trustgraph

mirror of https://github.com/trustgraph-ai/trustgraph.git synced 2026-07-26 05:31:01 +02:00

cybermaggedon 286f762369 The id field in pipeline Metadata was being overwritten at each processing (#686)	2026-03-11 12:16:39 +00:00

The id field in pipeline Metadata was being overwritten at each processing stage (document → page → chunk), causing knowledge storage to create separate cores per chunk instead of grouping by document.

Add a root field that:
- Is set by librarian to the original document ID
- Is copied unchanged through PDF decoder, chunkers, and extractors
- Is used by knowledge storage for document_id grouping (with fallback to id)

Changes:
- Add root field to Metadata schema with empty string default
- Set root=document.id in librarian when initiating document processing
- Copy root through PDF decoder, recursive chunker, and all extractors
- Update knowledge storage to use root (or id as fallback) for grouping
- Add root handling to translators and gateway serialization
- Update test mock Metadata class to include root parameter
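
The behaviour described in the commit message can be illustrated with a minimal sketch. This is not the project's actual code: the `Metadata` dataclass, `derive`, and `grouping_key` below are hypothetical stand-ins, assuming only what the message states (a `root` field with an empty-string default, copied unchanged through each stage, and used for grouping with fallback to `id`).

```python
from collections import defaultdict
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Metadata:
    """Hypothetical stand-in for the pipeline Metadata schema."""
    id: str
    root: str = ""  # empty-string default, as described in the commit message


def derive(parent: Metadata, new_id: str) -> Metadata:
    """Model a processing stage: assign a new id, copy root unchanged."""
    return replace(parent, id=new_id)


def grouping_key(meta: Metadata) -> str:
    """Key used for grouping: root if set, otherwise fall back to id."""
    return meta.root or meta.id


# Assumed flow: the first stage sets root to the original document ID.
doc = Metadata(id="doc-1", root="doc-1")
page = derive(doc, "doc-1/p1")
chunks = [derive(page, f"doc-1/p1/c{i}") for i in range(3)]

# Group chunk ids by document rather than by per-chunk id.
groups = defaultdict(list)
for chunk in chunks:
    groups[grouping_key(chunk)].append(chunk.id)
```

In this sketch all three chunks land under the single key `doc-1`, whereas grouping by `id` alone would produce one group per chunk. A `Metadata` created without `root` falls back to its own `id`.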
__init__.py	Extending test coverage (#434)	2025-07-14 17:54:04 +01:00
conftest.py	The id field in pipeline Metadata was being overwritten at each processing (#686)	2026-03-11 12:16:39 +00:00
test_agent_extraction.py	Remove redundant metadata (#685)	2026-03-11 10:51:39 +00:00
test_agent_extraction_edge_cases.py	Remove redundant metadata (#685)	2026-03-11 10:51:39 +00:00
test_entity_extraction.py	Extending test coverage (#434)	2025-07-14 17:54:04 +01:00
test_graph_validation.py	Changed schema for Value -> Term, majorly breaking change (#622)	2026-01-27 13:48:08 +00:00
test_object_extraction_logic.py	Remove redundant metadata (#685)	2026-03-11 10:51:39 +00:00
test_object_validation.py	Structured data 2 (#645)	2026-02-23 15:56:29 +00:00
test_relationship_extraction.py	Extending test coverage (#434)	2025-07-14 17:54:04 +01:00
test_triple_construction.py	Remove redundant metadata (#685)	2026-03-11 10:51:39 +00:00