trustgraph/trustgraph-base/trustgraph/messaging/translators
cybermaggedon 286f762369
The id field in pipeline Metadata was being overwritten at each processing stage (#686)
The id field in pipeline Metadata was overwritten at each processing
stage (document → page → chunk), so knowledge storage created a
separate core per chunk instead of grouping chunks by document.

Add a root field that:
- Is set by librarian to the original document ID
- Is copied unchanged through PDF decoder, chunkers, and extractors
- Is used by knowledge storage for document_id grouping (with fallback to id)
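
The behaviour described above can be sketched roughly as follows. This is an illustration only, not the TrustGraph implementation: the Metadata dataclass is a simplified stand-in for the real schema, and the helper names derive and document_id are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Metadata:
    """Simplified stand-in for the pipeline Metadata schema (assumption)."""
    id: str
    root: str = ""  # empty-string default, as in the commit description


def derive(parent: Metadata, new_id: str) -> Metadata:
    """Hypothetical stage helper: assign a new id, copy root unchanged."""
    return replace(parent, id=new_id)


def document_id(meta: Metadata) -> str:
    """Grouping key for knowledge storage: root, falling back to id."""
    return meta.root or meta.id


# Librarian sets root to the original document ID.
doc = Metadata(id="doc-1", root="doc-1")

# Each stage overwrites id but leaves root alone.
page = derive(doc, "doc-1/p1")
chunks = [derive(page, f"doc-1/p1/c{i}") for i in range(3)]

# A message produced before the root field existed has an empty root.
legacy = Metadata(id="old-chunk")

# Group by document: all chunks of doc-1 share one key.
groups = defaultdict(list)
for meta in chunks + [legacy]:
    groups[document_id(meta)].append(meta.id)
```

In this sketch the three chunks land under the single key doc-1, while the message without a root falls back to its own id.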

Changes:
- Add root field to Metadata schema with empty string default
- Set root=document.id in librarian when initiating document processing
- Copy root through PDF decoder, recursive chunker, and all extractors
- Update knowledge storage to use root (or id as fallback) for grouping
- Add root handling to translators and gateway serialization
- Update test mock Metadata class to include root parameter
2026-03-11 12:16:39 +00:00
__init__.py Row embeddings APIs exposed (#646) 2026-02-23 21:52:56 +00:00
agent.py Fix non streaming RAG problems (#607) 2026-01-12 18:45:52 +00:00
base.py Feature/translator classes (#414) 2025-06-20 16:59:55 +01:00
collection.py Fix/queue configurations (#585) 2025-12-06 14:54:47 +00:00
config.py Empty configuration is returned as empty list, previously was not in response (#436) 2025-07-15 14:30:37 +01:00
diagnosis.py Fix tests (#593) 2025-12-19 08:53:21 +00:00
document_loading.py The id field in pipeline Metadata was being overwritten at each processing (#686) 2026-03-11 12:16:39 +00:00
embeddings.py Batch embeddings (#668) 2026-03-08 18:36:54 +00:00
embeddings_query.py Embeddings API scores (#671) 2026-03-09 10:53:44 +00:00
flow.py Fix config inconsistency (#609) 2026-01-14 12:31:40 +00:00
knowledge.py The id field in pipeline Metadata was being overwritten at each processing (#686) 2026-03-11 12:16:39 +00:00
library.py Fix/librarian broken (#674) 2026-03-09 13:36:24 +00:00
metadata.py Incremental / large document loading (#659) 2026-03-04 16:57:58 +00:00
nlp_query.py Structured query support (#492) 2025-09-04 16:06:18 +01:00
primitives.py Fix/extraction prov (#662) 2026-03-06 12:23:58 +00:00
prompt.py Fix streaming API niggles (#599) 2026-01-06 16:41:35 +00:00
retrieval.py Terminology Rename, and named-graphs for explainability (#682) 2026-03-10 14:35:21 +00:00
rows_query.py Structured data 2 (#645) 2026-02-23 15:56:29 +00:00
structured_query.py Extend use of user + collection fields (#503) 2025-09-08 18:28:38 +01:00
text_completion.py Update to add streaming tests (#600) 2026-01-06 21:48:05 +00:00
tool.py MCP client support (#427) 2025-07-07 23:52:23 +01:00
triples.py Feature/streaming triples (#676) 2026-03-09 15:46:33 +00:00