trustgraph/tests/unit/test_knowledge_graph
cybermaggedon 286f762369
The id field in pipeline Metadata was being overwritten at each processing (#686)
The id field in pipeline Metadata was being overwritten at each processing
stage (document → page → chunk), causing knowledge storage to create
separate cores per chunk instead of grouping by document.
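To illustrate the failure mode, here is a minimal sketch. It assumes a simplified, hypothetical `Metadata` dataclass with only an `id` field and a hypothetical `split` helper that stands in for each processing stage. The real trustgraph schema and stage code differ, and the `doc-1/p0/c0` id scheme is invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical, simplified stand-in for the pipeline Metadata record
# before the fix: it carries only an id, which every stage overwrites.
@dataclass
class Metadata:
    id: str


def split(meta: Metadata, parts: int, label: str) -> list[Metadata]:
    # Hypothetical stage: each output gets a fresh id derived from the
    # parent, so nothing in the output still equals the document id.
    return [Metadata(id=f"{meta.id}/{label}{i}") for i in range(parts)]


doc = Metadata(id="doc-1")
pages = split(doc, 2, "p")                             # document -> pages
chunks = [c for p in pages for c in split(p, 3, "c")]  # pages -> chunks

# Storage that groups by id ends up with one "core" per chunk.
cores = defaultdict(list)
for c in chunks:
    cores[c.id].append(c)

print(len(cores))  # 6: one core per chunk, not one per document
```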

Add a root field that:
- Is set by librarian to the original document ID
- Is copied unchanged through PDF decoder, chunkers, and extractors
- Is used by knowledge storage for document_id grouping (with fallback to id)
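The propagation rule above can be sketched as follows. This is a minimal illustration, not trustgraph's code: `Metadata`, `librarian_start`, and `split` are hypothetical stand-ins for the schema, the librarian, and the decoder/chunker stages. Each stage still mints a new `id`, but copies `root` unchanged, so grouping by `root` collapses all chunks back to their document.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical, simplified Metadata with the new root field
# (empty-string default, as the commit describes).
@dataclass
class Metadata:
    id: str
    root: str = ""


def librarian_start(document_id: str) -> Metadata:
    # Hypothetical librarian step: root is set to the original document ID.
    return Metadata(id=document_id, root=document_id)


def split(meta: Metadata, parts: int, label: str) -> list[Metadata]:
    # Hypothetical stage: a new id per output, root copied through as-is.
    return [
        Metadata(id=f"{meta.id}/{label}{i}", root=meta.root)
        for i in range(parts)
    ]


doc = librarian_start("doc-1")
pages = split(doc, 2, "p")                             # PDF decoder stand-in
chunks = [c for p in pages for c in split(p, 3, "c")]  # chunker stand-in

# Grouping by root yields a single core for the whole document.
cores = defaultdict(list)
for c in chunks:
    cores[c.root].append(c.id)

print(list(cores))          # ['doc-1']
print(len(cores["doc-1"]))  # 6
```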

Changes:
- Add root field to Metadata schema with empty string default
- Set root=document.id in librarian when initiating document processing
- Copy root through PDF decoder, recursive chunker, and all extractors
- Update knowledge storage to use root (or id as fallback) for grouping
- Add root handling to translators and gateway serialization
- Update test mock Metadata class to include root parameter
2026-03-11 12:16:39 +00:00
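The storage-side fallback ("use root, or id as fallback") can be sketched like this. The `document_id` and `group_by_document` functions and the simplified `Metadata` dataclass are hypothetical and only show the selection rule. Because `root` defaults to an empty string, Python's `or` picks `id` whenever `root` was never set.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical, simplified Metadata; root defaults to an empty string.
@dataclass
class Metadata:
    id: str
    root: str = ""


def document_id(meta: Metadata) -> str:
    # Prefer root; fall back to id when root is empty (i.e. unset).
    return meta.root or meta.id


def group_by_document(items: list[Metadata]) -> dict[str, list[str]]:
    # Hypothetical grouping step: collect item ids under their document key.
    groups: dict[str, list[str]] = defaultdict(list)
    for m in items:
        groups[document_id(m)].append(m.id)
    return dict(groups)


msgs = [
    Metadata(id="doc-1/c0", root="doc-1"),
    Metadata(id="doc-1/c1", root="doc-1"),
    Metadata(id="legacy-7"),  # root left at its empty default
]
print(group_by_document(msgs))
# {'doc-1': ['doc-1/c0', 'doc-1/c1'], 'legacy-7': ['legacy-7']}
```

Keying on an empty-string default rather than `None` keeps the fallback a one-liner and means producers that never set `root` still group sensibly by their own `id`.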
__init__.py Extending test coverage (#434) 2025-07-14 17:54:04 +01:00
conftest.py The id field in pipeline Metadata was being overwritten at each processing (#686) 2026-03-11 12:16:39 +00:00
test_agent_extraction.py Remove redundant metadata (#685) 2026-03-11 10:51:39 +00:00
test_agent_extraction_edge_cases.py Remove redundant metadata (#685) 2026-03-11 10:51:39 +00:00
test_entity_extraction.py Extending test coverage (#434) 2025-07-14 17:54:04 +01:00
test_graph_validation.py Changed schema for Value -> Term, majorly breaking change (#622) 2026-01-27 13:48:08 +00:00
test_object_extraction_logic.py Remove redundant metadata (#685) 2026-03-11 10:51:39 +00:00
test_object_validation.py Structured data 2 (#645) 2026-02-23 15:56:29 +00:00
test_relationship_extraction.py Extending test coverage (#434) 2025-07-14 17:54:04 +01:00
test_triple_construction.py Remove redundant metadata (#685) 2026-03-11 10:51:39 +00:00