Mirror of https://github.com/trustgraph-ai/trustgraph.git, synced 2026-04-25 08:26:21 +02:00
1. Shared Provenance Module - URI generators, namespace constants,
   triple builders, vocabulary bootstrap
2. Librarian - Emits document metadata to graph on processing
   initiation (vocabulary bootstrap + PROV-O triples)
3. PDF Extractor - Saves pages as child documents, emits parent-child
   provenance edges, forwards page IDs
4. Chunker - Saves chunks as child documents, emits provenance edges,
   forwards chunk ID + content
5. Knowledge Extractors (both definitions and relationships):
   - Link entities to chunks via SUBJECT_OF (not top-level document)
   - Removed duplicate metadata emission (now handled by librarian)
   - Get chunk_doc_id and chunk_uri from incoming Chunk message
6. Embedding Provenance:
   - EntityContext schema has chunk_id field
   - EntityEmbeddings schema has chunk_id field
   - Definitions extractor sets chunk_id when creating EntityContext
   - Graph embeddings processor passes chunk_id through to
     EntityEmbeddings
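
The chunk_id pass-through in item 6 can be pictured with a minimal sketch. The dataclasses below are simplified stand-ins for the EntityContext and EntityEmbeddings schemas, not the project's actual definitions: only the chunk_id field is named in this commit, and the other fields, `embed_contexts`, and `toy_embed` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EntityContext:
    # Simplified stand-in; only chunk_id is taken from the commit message.
    entity: str
    context: str
    chunk_id: str = ""


@dataclass
class EntityEmbeddings:
    # Simplified stand-in; only chunk_id is taken from the commit message.
    entity: str
    vectors: List[List[float]]
    chunk_id: str = ""


def embed_contexts(
    contexts: List[EntityContext],
    embed: Callable[[str], List[float]],
) -> List[EntityEmbeddings]:
    """Embed each context and copy its chunk_id onto the result unchanged."""
    return [
        EntityEmbeddings(
            entity=c.entity,
            vectors=[embed(c.context)],
            chunk_id=c.chunk_id,
        )
        for c in contexts
    ]


def toy_embed(text: str) -> List[float]:
    # Placeholder embedding so the sketch runs without a model.
    return [float(len(text))]
```

Carrying chunk_id on the embedding record means a vector-store hit can be resolved to its source chunk without a separate lookup table.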
Provenance Flow:

Document → Page (PDF) → Chunk → Extracted Facts/Embeddings
    ↓          ↓          ↓                  ↓
librarian  librarian  librarian    (chunk_id reference)
 + graph    + graph    + graph
Each artifact is stored in librarian with parent-child linking, and PROV-O
edges are emitted to the knowledge graph for full traceability from any
extracted fact back to its source document.
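
As a rough illustration of that traceability, the sketch below builds derivation edges and walks them from an entity back to the source document. `prov:wasDerivedFrom` is a standard PROV-O property, but whether the provenance module uses that exact property is an assumption, as are the SUBJECT_OF URI, the `urn:example:` identifiers, and the helper names `derived_from` and `trace_to_source`.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Standard PROV-O property; its use for parent-child edges here is an assumption.
PROV_WAS_DERIVED_FROM = "http://www.w3.org/ns/prov#wasDerivedFrom"
# Assumed URI for the SUBJECT_OF predicate named in the commit message.
SUBJECT_OF = "https://schema.org/subjectOf"


def derived_from(child_uri: str, parent_uri: str) -> Triple:
    """Build one provenance edge: the child was derived from the parent."""
    return (child_uri, PROV_WAS_DERIVED_FROM, parent_uri)


def trace_to_source(triples: List[Triple], entity_uri: str) -> List[str]:
    """Return the chain chunk -> page -> document for an entity, or []."""
    parents = {s: o for s, p, o in triples if p == PROV_WAS_DERIVED_FROM}
    node = next(
        (o for s, p, o in triples if s == entity_uri and p == SUBJECT_OF),
        None,
    )
    path: List[str] = []
    seen = set()
    # Follow derivation edges upward; the seen set guards against cycles.
    while node is not None and node not in seen:
        seen.add(node)
        path.append(node)
        node = parents.get(node)
    return path


# Hypothetical identifiers, for illustration only.
DOC = "urn:example:doc/1"
PAGE = "urn:example:doc/1/p1"
CHUNK = "urn:example:doc/1/p1/c0"
ENTITY = "urn:example:entity/cat"

TRIPLES = [
    derived_from(PAGE, DOC),
    derived_from(CHUNK, PAGE),
    (ENTITY, SUBJECT_OF, CHUNK),
]
```

Calling `trace_to_source(TRIPLES, ENTITY)` follows the SUBJECT_OF link to the chunk, then the derivation edges up through the page to the document.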
Also updates the tests.

| Name |
|---|
| __TEMPLATE.md |
| architecture-principles.md |
| cassandra-consolidation.md |
| cassandra-performance-refactor.md |
| collection-management.md |
| entity-centric-graph.md |
| extraction-time-provenance.md |
| flow-class-definition.md |
| flow-configurable-parameters.md |
| graph-contexts.md |
| graphql-query.md |
| graphrag-performance-optimization.md |
| import-export-graceful-shutdown.md |
| jsonl-prompt-output.md |
| large-document-loading.md |
| logging-strategy.md |
| mcp-tool-arguments.md |
| mcp-tool-bearer-token.md |
| minio-to-s3-migration.md |
| more-config-cli.md |
| multi-tenant-support.md |
| neo4j-user-collection-isolation.md |
| ontology-extract-phase-2.md |
| ontology.md |
| ontorag.md |
| openapi-spec.md |
| pubsub.md |
| python-api-refactor.md |
| query-time-provenance.md |
| rag-streaming-support.md |
| schema-refactoring-proposal.md |
| streaming-llm-responses.md |
| structured-data-2.md |
| structured-data-descriptor.md |
| structured-data-schemas.md |
| structured-data.md |
| structured-diag-service.md |
| tool-group.md |
| tool-services.md |
| vector-store-lifecycle.md |