mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-05-11 16:22:37 +02:00
Subgraph provenance (#694)
Replace per-triple provenance reification with subgraph model

Extraction provenance previously created a full reification (statement URI, activity, agent) for every single extracted triple, producing ~13 provenance triples per knowledge triple. Since each chunk is processed by a single LLM call, this was both redundant and semantically inaccurate.

Now one subgraph object is created per chunk extraction, with tg:contains linking to each extracted triple. For 20 extractions from a chunk this reduces provenance from ~260 triples to ~33.

- Rename tg:reifies -> tg:contains, stmt_uri -> subgraph_uri
- Replace triple_provenance_triples() with subgraph_provenance_triples()
- Refactor kg-extract-definitions and kg-extract-relationships to generate provenance once per chunk instead of per triple
- Add subgraph provenance to kg-extract-ontology and kg-extract-agent (previously had none)
- Update CLI tools and tech specs to match

Also rename tg-show-document-hierarchy to tg-show-extraction-provenance.

Added extra typing for extraction provenance, fixed extraction prov CLI
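The arithmetic in the message can be illustrated with a small sketch. This is not the repository's subgraph_provenance_triples(); the function names, the placeholder TG namespace, and the fixed overhead of 13 triples (inferred from 33 - 20) are assumptions made for illustration, and the activity and agent triples are left out for brevity.

```python
# Illustrative sketch only; NOT the repository's subgraph_provenance_triples().
TG = "https://example.org/tg/"          # placeholder namespace (assumption)
PROV = "http://www.w3.org/ns/prov#"
TG_CONTAINS = TG + "contains"
PROV_WAS_DERIVED_FROM = PROV + "wasDerivedFrom"


def sketch_subgraph_provenance(subgraph_uri, chunk_uri, extracted):
    """Build provenance for one chunk extraction (hypothetical helper).

    One subgraph node is derived from the chunk, and one tg:contains
    triple points at each extracted (s, p, o) triple.
    """
    triples = [(subgraph_uri, PROV_WAS_DERIVED_FROM, chunk_uri)]
    for edge in extracted:
        triples.append((subgraph_uri, TG_CONTAINS, edge))
    return triples


def per_triple_cost(n, overhead=13):
    # Old model: a full reification is repeated for every extracted triple.
    return n * overhead


def subgraph_cost(n, overhead=13):
    # New model: one tg:contains per triple plus a single shared overhead
    # (the value 13 is inferred from the figures in the commit message).
    return n + overhead


edges = [("urn:s%d" % i, "urn:p", "urn:o%d" % i) for i in range(20)]
prov = sketch_subgraph_provenance("urn:subgraph:1", "urn:chunk:1", edges)
print(len(prov), per_triple_cost(20), subgraph_cost(20))
```

The saving grows with the number of triples extracted from a chunk, because the shared activity and agent description is written once rather than once per triple.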
parent 35128ff019
commit 64e3f6bd0d
20 changed files with 463 additions and 193 deletions
@@ -36,7 +36,7 @@ TG_SELECTED_EDGE = TG + "selectedEdge"
 TG_EDGE = TG + "edge"
 TG_REASONING = TG + "reasoning"
 TG_CONTENT = TG + "content"
-TG_REIFIES = TG + "reifies"
+TG_CONTAINS = TG + "contains"
 PROV = "http://www.w3.org/ns/prov#"
 PROV_STARTED_AT_TIME = PROV + "startedAtTime"
 PROV_WAS_DERIVED_FROM = PROV + "wasDerivedFrom"
@@ -185,18 +185,18 @@ async def _query_edge_provenance(ws_url, flow_id, edge_s, edge_p, edge_o, user,
     """
     Query for provenance of an edge (s, p, o) in the knowledge graph.
 
-    Finds statements that reify the edge via tg:reifies, then follows
+    Finds subgraphs that contain the edge via tg:contains, then follows
     prov:wasDerivedFrom to find source documents.
 
     Returns list of source URIs (chunks, pages, documents).
     """
-    # Query for statements that reify this edge: ?stmt tg:reifies <<s p o>>
+    # Query for subgraphs that contain this edge: ?subgraph tg:contains <<s p o>>
     request = {
         "id": "edge-prov-request",
         "service": "triples",
         "flow": flow_id,
         "request": {
-            "p": {"t": "i", "i": TG_REIFIES},
+            "p": {"t": "i", "i": TG_CONTAINS},
             "o": {
                 "t": "t",  # Quoted triple type
                 "tr": {