Mirror of https://github.com/trustgraph-ai/trustgraph.git (synced 2026-04-25 00:16:23 +02:00)
The id field in pipeline Metadata was being overwritten at each processing stage (document → page → chunk), causing knowledge storage to create separate cores per chunk instead of grouping by document.

Add a root field that:
- Is set by librarian to the original document ID
- Is copied unchanged through PDF decoder, chunkers, and extractors
- Is used by knowledge storage for document_id grouping (with fallback to id)

Changes:
- Add root field to Metadata schema with empty string default
- Set root=document.id in librarian when initiating document processing
- Copy root through PDF decoder, recursive chunker, and all extractors
- Update knowledge storage to use root (or id as fallback) for grouping
- Add root handling to translators and gateway serialization
- Update test mock Metadata class to include root parameter

| Name | ||
|---|---|---|
| test_agent | ||
| test_base | ||
| test_chunking | ||
| test_cli | ||
| test_clients | ||
| test_config | ||
| test_cores | ||
| test_decoding | ||
| test_direct | ||
| test_embeddings | ||
| test_extract | ||
| test_gateway | ||
| test_knowledge_graph | ||
| test_query | ||
| test_retrieval | ||
| test_rev_gateway | ||
| test_storage | ||
| test_text_completion | ||
| __init__.py | ||
| test_prompt_manager.py | ||
| test_prompt_manager_edge_cases.py | ||
| test_python_api_client.py | ||
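
The root propagation described in the commit message can be illustrated with a minimal sketch. This is not the project's actual code: the `Metadata` dataclass here is a simplified stand-in for the real schema, and the helper names (`start_processing`, `derive`, `group_key`) and the `doc-1/p1/c0` ID format are hypothetical, chosen only to show how `id` changes per stage while `root` stays fixed.

```python
from collections import defaultdict
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Metadata:
    # Simplified, hypothetical stand-in for the pipeline Metadata schema.
    id: str
    root: str = ""  # empty string default, as the commit message describes


def start_processing(document_id: str) -> Metadata:
    # Librarian stage: root is set to the original document ID.
    return Metadata(id=document_id, root=document_id)


def derive(parent: Metadata, suffix: str) -> Metadata:
    # Decoder, chunker, and extractor stages: id is rewritten for the
    # new unit of work, while root is carried over unchanged.
    return replace(parent, id=f"{parent.id}/{suffix}")


def group_key(meta: Metadata) -> str:
    # Knowledge storage: group by root, falling back to id when root is empty.
    return meta.root or meta.id


doc = start_processing("doc-1")
page = derive(doc, "p1")
chunks = [derive(page, f"c{i}") for i in range(3)]

# All chunks of one document land in the same group.
groups = defaultdict(list)
for chunk in chunks:
    groups[group_key(chunk)].append(chunk.id)

# A message produced without a root still gets a usable key.
legacy = Metadata(id="doc-2/p1/c0")
```

Grouping on `id` alone would have produced one group per chunk here, which is the behaviour the commit message reports as the bug. The `root or id` fallback keeps messages that carry no root working as before.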