Fix Cassandra schema and graph filter semantics (#680)

Schema fix (dtype/lang clustering key):
- Add dtype and lang to PRIMARY KEY in quads_by_entity table
- Add otype, dtype, lang to PRIMARY KEY in quads_by_collection table
- Fixes deduplication bug where literals with the same value but a different
  datatype or language tag were collapsed (e.g., "thing" vs "thing"@en)
- Update delete_collection to pass new clustering columns
- Update tech spec to reflect new schema
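Why the wider key matters: Cassandra writes are upserts, so two inserts that share a primary key land on the same row, and the later one overwrites the earlier. The following is a minimal in-memory simulation of that behaviour. The `Quad` fields and the `OLD_KEY`/`NEW_KEY` tuples are hypothetical stand-ins, not the actual `quads_by_entity` column list. Absent datatype and language are encoded as empty strings because Cassandra does not allow nulls in primary key columns.

```python
from typing import List, NamedTuple, Tuple


class Quad(NamedTuple):
    # Hypothetical row shape for illustration only.
    collection: str
    s: str
    p: str
    o: str
    g: str
    otype: str  # assumed encoding, e.g. "l" for literal
    dtype: str  # datatype IRI, "" when absent
    lang: str   # language tag, "" when absent


# Hypothetical primary keys before and after the fix.
OLD_KEY: Tuple[str, ...] = ("collection", "s", "p", "o", "g", "otype")
NEW_KEY: Tuple[str, ...] = OLD_KEY + ("dtype", "lang")


def upsert_all(rows: List[Quad], key_fields: Tuple[str, ...]) -> List[Quad]:
    """Simulate Cassandra upsert semantics: a write whose primary key matches
    an existing row replaces that row instead of adding a new one."""
    table = {}
    for row in rows:
        key = tuple(getattr(row, field) for field in key_fields)
        table[key] = row  # last write wins for an identical key
    return list(table.values())


plain = Quad("c1", "ex:a", "ex:label", "thing", "", "l", "", "")
tagged = Quad("c1", "ex:a", "ex:label", "thing", "", "l", "", "en")

print(len(upsert_all([plain, tagged], OLD_KEY)))  # 1: "thing"@en overwrote "thing"
print(len(upsert_all([plain, tagged], NEW_KEY)))  # 2: both literals kept
```

With the old key the language-tagged literal silently replaces the plain one; once dtype and lang are part of the key, both rows survive.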

Graph filter semantics (simplified, no wildcard constant):
- g=None means all graphs (no filter)
- g="" means default graph only
- g="uri" means specific named graph
- Remove GRAPH_WILDCARD usage from EntityCentricKnowledgeGraph
- Fix service.py streaming and non-streaming paths
- Fix CLI to preserve empty string for -g '' argument
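A minimal sketch of the three-way graph filter and of keeping `-g ''` distinct from "not given". The tuple layout `(s, p, o, g)` with `g == ""` for the default graph, and the helpers `filter_by_graph` and `parse_graph_arg`, are hypothetical; only the `-g` flag and the None / "" / URI semantics come from the commit message. The essential rule is to test `g is None` rather than truthiness, because `""` is falsy in Python. A truthiness check such as `args.graph or None` is one common way to lose the empty string; it is shown here as an illustration, not necessarily the bug that was fixed.

```python
import argparse
from typing import Iterable, List, Optional, Tuple

# Hypothetical quad layout: (s, p, o, g), with g == "" meaning the default graph.
Quad = Tuple[str, str, str, str]


def filter_by_graph(quads: Iterable[Quad], g: Optional[str]) -> List[Quad]:
    """g=None -> all graphs; g="" -> default graph only; g="uri" -> that graph."""
    if g is None:
        # No filter at all: quads from every graph are returned.
        return list(quads)
    # "" and a URI are both exact matches on the graph component.
    return [q for q in quads if q[3] == g]


def parse_graph_arg(argv: List[str]) -> Optional[str]:
    """Parse a -g option, keeping '' distinct from 'not given'."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-g", dest="graph", default=None)
    args = parser.parse_args(argv)
    # Return the value unchanged; `args.graph or None` would map '' to None
    # and silently widen a default-graph query to all graphs.
    return args.graph


quads = [
    ("ex:a", "ex:p", "1", ""),                       # default graph
    ("ex:a", "ex:p", "2", "http://example.org/g1"),  # named graph
]
print(len(filter_by_graph(quads, parse_graph_arg([]))))          # 2
print(len(filter_by_graph(quads, parse_graph_arg(["-g", ""]))))  # 1
```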
cybermaggedon 2026-03-10 12:52:51 +00:00 committed by GitHub
parent c951562189
commit 84941ce645
GPG key ID: B5690EEEBB952194
5 changed files with 102 additions and 65 deletions


@@ -305,9 +305,8 @@ class TestEntityCentricKnowledgeGraph:
         mock_session.execute.assert_called()
-    def test_graph_wildcard_returns_all_graphs(self, entity_kg):
-        """Test that g='*' returns quads from all graphs"""
-        from trustgraph.direct.cassandra_kg import GRAPH_WILDCARD
+    def test_graph_none_returns_all_graphs(self, entity_kg):
+        """Test that g=None returns quads from all graphs"""
         kg, mock_session = entity_kg
         mock_result = [
@@ -320,7 +319,7 @@ class TestEntityCentricKnowledgeGraph:
         ]
         mock_session.execute.return_value = mock_result
-        results = kg.get_s('test_collection', 'http://example.org/Alice', g=GRAPH_WILDCARD)
+        results = kg.get_s('test_collection', 'http://example.org/Alice', g=None)
         # Should return quads from both graphs
         assert len(results) == 2