mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-04-25 00:16:23 +02:00
Fix Cassandra schema and graph filter semantics (#680)

Schema fix (dtype/lang clustering key):
- Add dtype and lang to PRIMARY KEY in quads_by_entity table
- Add otype, dtype, lang to PRIMARY KEY in quads_by_collection table
- Fixes deduplication bug where literals with same value but different datatype or language tag were collapsed (e.g., "thing" vs "thing"@en)
- Update delete_collection to pass new clustering columns
- Update tech spec to reflect new schema

Graph filter semantics (simplified, no wildcard constant):
- g=None means all graphs (no filter)
- g="" means default graph only
- g="uri" means specific named graph
- Remove GRAPH_WILDCARD usage from EntityCentricKnowledgeGraph
- Fix service.py streaming and non-streaming paths
- Fix CLI to preserve empty string for -g '' argument
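
The three-way graph filter described in the commit message can be illustrated with a minimal sketch. The `Quad` type and the `filter_by_graph` helper are hypothetical names made up for this example and are not taken from the TrustGraph code; the sketch only assumes that the default graph is represented by the empty string, as the message states.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Quad(NamedTuple):
    """Hypothetical in-memory quad; g == "" stands for the default graph."""
    s: str
    p: str
    o: str
    g: str


def filter_by_graph(quads: Iterable[Quad], g: Optional[str]) -> Iterator[Quad]:
    """Yield the quads selected by the graph filter g.

    g=None   -> all graphs (no filter)
    g=""     -> default graph only
    g="uri"  -> that named graph only
    """
    for quad in quads:
        # Compare against None explicitly: a truthiness test such as
        # `if not g` would treat "" and None alike and lose the
        # "default graph only" case.
        if g is None or quad.g == g:
            yield quad


quads = [
    Quad("ex:a", "ex:knows", "ex:b", ""),       # default graph
    Quad("ex:a", "ex:knows", "ex:c", "ex:g1"),  # named graph
]

all_graphs = list(filter_by_graph(quads, None))
default_only = list(filter_by_graph(quads, ""))
named_only = list(filter_by_graph(quads, "ex:g1"))
```

The same distinction applies when parsing command-line arguments: an empty string passed for `-g` has to survive as `""` rather than being normalised to `None`, otherwise the default-graph filter silently turns into "all graphs".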
parent c951562189
commit 84941ce645
5 changed files with 102 additions and 65 deletions
@@ -42,7 +42,7 @@ CREATE TABLE quads_by_entity (
 d text, -- Dataset/graph of the quad
 dtype text, -- XSD datatype (when otype = 'L'), e.g. 'xsd:string'
 lang text, -- Language tag (when otype = 'L'), e.g. 'en', 'fr'
-PRIMARY KEY ((collection, entity), role, p, otype, s, o, d)
+PRIMARY KEY ((collection, entity), role, p, otype, s, o, d, dtype, lang)
 );
 ```
 
@@ -54,6 +54,7 @@ CREATE TABLE quads_by_entity (
 2. **p** — next most common filter, "give me all `knows` relationships"
 3. **otype** — enables filtering by URI-valued vs literal-valued relationships
 4. **s, o, d** — remaining columns for uniqueness
+5. **dtype, lang** — distinguish literals with same value but different type metadata (e.g., `"thing"` vs `"thing"@en` vs `"thing"^^xsd:string`)
 
 ### Table 2: quads_by_collection
 
@@ -69,11 +70,11 @@ CREATE TABLE quads_by_collection (
 otype text, -- 'U' (URI), 'L' (literal), 'T' (triple/reification)
 dtype text, -- XSD datatype (when otype = 'L')
 lang text, -- Language tag (when otype = 'L')
-PRIMARY KEY (collection, d, s, p, o)
+PRIMARY KEY (collection, d, s, p, o, otype, dtype, lang)
 );
 ```
 
-Clustered by dataset first, enabling deletion at either collection or dataset granularity.
+Clustered by dataset first, enabling deletion at either collection or dataset granularity. The `otype`, `dtype`, and `lang` columns are included in the clustering key to distinguish literals with the same value but different type metadata — in RDF, `"thing"`, `"thing"@en`, and `"thing"^^xsd:string` are semantically distinct values.
 
 ## Write Path
 
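
The deduplication bug fixed by the `quads_by_collection` key change can be reproduced with a small simulation. This is only a sketch: a Python dict stands in for a Cassandra table (a write with an existing primary key overwrites the row), `upsert` is a hypothetical helper, and representing an absent `dtype` or `lang` as an empty string is an assumption, not something the diff specifies.

```python
# Primary key columns before and after the change, taken from the diff.
OLD_KEY = ("collection", "d", "s", "p", "o")
NEW_KEY = OLD_KEY + ("otype", "dtype", "lang")


def upsert(table: dict, key_columns: tuple, row: dict) -> None:
    """Mimic Cassandra write semantics: same primary key means overwrite."""
    key = tuple(row[column] for column in key_columns)
    table[key] = row


# Three literals sharing the lexical value "thing" but differing in metadata.
base = {"collection": "c1", "d": "", "s": "ex:a", "p": "ex:label",
        "o": "thing", "otype": "L"}
rows = [
    {**base, "dtype": "", "lang": ""},            # "thing"
    {**base, "dtype": "", "lang": "en"},          # "thing"@en
    {**base, "dtype": "xsd:string", "lang": ""},  # "thing"^^xsd:string
]

old_table: dict = {}
new_table: dict = {}
for row in rows:
    upsert(old_table, OLD_KEY, row)
    upsert(new_table, NEW_KEY, row)

# Under the old key every write lands on the same row, so only the last
# write survives; under the new key each literal keeps its own row.
print(len(old_table), len(new_table))
```

Because the new columns become part of the primary key, any statement that addresses a full row, such as a per-row delete, has to supply them as well, which is consistent with the `delete_collection` update mentioned in the commit message.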