Subgraph provenance (#694)

Replace per-triple provenance reification with subgraph model Extraction provenance previously created a full reification (statement URI, activity, agent) for every single extracted triple, producing ~13 provenance triples per knowledge triple. Since each chunk is processed by a single LLM call, this was both redundant and semantically inaccurate. Now one subgraph object is created per chunk extraction, with tg:contains linking to each extracted triple. For 20 extractions from a chunk this reduces provenance from ~260 triples to ~33. - Rename tg:reifies -> tg:contains, stmt_uri -> subgraph_uri - Replace triple_provenance_triples() with subgraph_provenance_triples() - Refactor kg-extract-definitions and kg-extract-relationships to generate provenance once per chunk instead of per triple - Add subgraph provenance to kg-extract-ontology and kg-extract-agent (previously had none) - Update CLI tools and tech specs to match Also rename tg-show-document-hierarchy to tg-show-extraction-provenance. Added extra typing for extraction provenance, fixed extraction prov CLI
2026-06-14 09:15:13 +02:00 · 2026-03-13 11:37:59 +00:00 · 2026-03-13 11:37:59 +00:00 · 64e3f6bd0d
commit 64e3f6bd0d
parent 35128ff019
20 changed files with 463 additions and 193 deletions
--- a/docs/tech-specs/query-time-explainability.md
+++ b/docs/tech-specs/query-time-explainability.md
@ -193,7 +193,7 @@ When storing explainability data, URIs from `uri_map` are used.

 Selected edges can be traced back to source documents:

-1. Query for reifying statement: `?stmt tg:reifies <<s p o>>`
+1. Query for containing subgraph: `?subgraph tg:contains <<s p o>>`
 2. Follow `prov:wasDerivedFrom` chain to root document
 3. Each step in chain: chunk → page → document

@ -209,7 +209,7 @@ elif term.type == TRIPLE:

 This enables queries like:
 ```
-?stmt tg:reifies <<http://example.org/s http://example.org/p "value">>
+?subgraph tg:contains <<http://example.org/s http://example.org/p "value">>
 ```

 ## CLI Usage