Commit graph

5 commits

Author SHA1 Message Date
cybermaggedon
64e3f6bd0d
Subgraph provenance (#694)
Replace per-triple provenance reification with subgraph model

Extraction provenance previously created a full reification (statement
URI, activity, agent) for every single extracted triple, producing ~13
provenance triples per knowledge triple.  Since each chunk is processed
by a single LLM call, this was both redundant and semantically
inaccurate.

Now one subgraph object is created per chunk extraction, with
tg:contains linking to each extracted triple.  For 20 extractions from
a chunk this reduces provenance from ~260 triples to ~33.

- Rename tg:reifies -> tg:contains, stmt_uri -> subgraph_uri
- Replace triple_provenance_triples() with subgraph_provenance_triples()
- Refactor kg-extract-definitions and kg-extract-relationships to
  generate provenance once per chunk instead of per triple
- Add subgraph provenance to kg-extract-ontology and kg-extract-agent
  (previously had none)
- Update CLI tools and tech specs to match

Also rename tg-show-document-hierarchy to tg-show-extraction-provenance.

Added extra typing for extraction provenance, fixed extraction prov CLI
2026-03-13 11:37:59 +00:00
cybermaggedon
2b9232917c
Fix/extraction prov (#662)
Quoted triple fixes, including...

1. Updated triple_provenance_triples() in triples.py:
   - Now accepts a Triple object directly
   - Creates the reification triple using TRIPLE term type: stmt_uri tg:reifies
         <<extracted_triple>>
   - Includes it in the returned provenance triples
    
2. Updated definitions extractor:
   - Added imports for provenance functions and component version
   - Added ParameterSpec for optional llm-model and ontology flow parameters
   - For each definition triple, generates provenance with reification
    
3. Updated relationships extractor:
   - Same changes as definitions extractor
2026-03-06 12:23:58 +00:00
cybermaggedon
887fafcf8c
Fix/core save api (#172)
* Acknowledge messaages from Pulsar, doh!
* Change API to deliver a boolean e if value is an entity
* Change loaders to use new API
* Changes, entity-aware API is complete
2024-11-26 16:46:38 +00:00
cybermaggedon
24d099793d
Feature/doc metadata labels (#130)
* Add schema load util

* Added a sample schema turtle file will be useful for future testing and
tutorials.

* Fixed graph label metadata confusion, was created incorrect subjectOf
edges.
2024-10-29 21:18:02 +00:00
cybermaggedon
7954e863cc
Feature: document metadata (#123)
* Rework metadata structure in processing messages to be a subgraph
* Add subgraph creation for tg-load-pdf and tg-load-text based on command-line passing of doc attributes
* Document metadata is added to knowledge graph with subjectOf linkage to extracted entities
2024-10-23 18:04:04 +01:00