mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-05-04 12:52:36 +02:00
The id field in pipeline Metadata was being overwritten at each processing stage (#686)
The id field in pipeline Metadata was being overwritten at each processing stage (document → page → chunk), causing knowledge storage to create separate cores per chunk instead of grouping by document.

Add a root field that:
- Is set by librarian to the original document ID
- Is copied unchanged through PDF decoder, chunkers, and extractors
- Is used by knowledge storage for document_id grouping (with fallback to id)

Changes:
- Add root field to Metadata schema with empty string default
- Set root=document.id in librarian when initiating document processing
- Copy root through PDF decoder, recursive chunker, and all extractors
- Update knowledge storage to use root (or id as fallback) for grouping
- Add root handling to translators and gateway serialization
- Update test mock Metadata class to include root parameter
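The grouping rule described in the commit message (use root, fall back to id) can be illustrated with a minimal sketch. The `Metadata` dataclass below is a hypothetical stand-in, not the project's actual schema class, and `document_key`, `group_by_document`, and the slash-separated id format are names invented here for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Metadata:
    # Hypothetical stand-in for the pipeline Metadata schema.
    id: str = ""
    root: str = ""  # empty string default, as the commit message describes
    user: str = ""
    collection: str = ""


def document_key(metadata: Metadata) -> str:
    # Prefer root; fall back to id when root is empty (e.g. messages
    # produced before the root field existed).
    return metadata.root or metadata.id


def group_by_document(items):
    # Collect item ids under the document they belong to.
    groups = defaultdict(list)
    for metadata in items:
        groups[document_key(metadata)].append(metadata.id)
    return dict(groups)


chunks = [
    Metadata(id="doc-1/p1/c1", root="doc-1"),
    Metadata(id="doc-1/p1/c2", root="doc-1"),
    Metadata(id="legacy-7"),  # no root set, so id is used
]
groups = group_by_document(chunks)
```

Under these assumptions the two chunks of `doc-1` land in a single group, while the item without a root is grouped under its own id, which is the per-chunk behaviour the commit message describes as the bug.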
parent aa4f5c6c00
commit 286f762369
15 changed files with 48 additions and 4 deletions
@@ -334,6 +334,7 @@ class Processor(AsyncProcessor):
         triples_msg = Triples(
             metadata=Metadata(
                 id=doc_uri,
+                root=document.id,
                 user=processing.user,
                 collection=processing.collection,
             ),
@@ -380,6 +381,7 @@ class Processor(AsyncProcessor):
         doc = TextDocument(
             metadata = Metadata(
                 id = document.id,
+                root = document.id,
                 user = processing.user,
                 collection = processing.collection
             ),
@@ -390,6 +392,7 @@ class Processor(AsyncProcessor):
         doc = TextDocument(
             metadata = Metadata(
                 id = document.id,
+                root = document.id,
                 user = processing.user,
                 collection = processing.collection
             ),
@@ -405,6 +408,7 @@ class Processor(AsyncProcessor):
         doc = Document(
             metadata = Metadata(
                 id = document.id,
+                root = document.id,
                 user = processing.user,
                 collection = processing.collection
             ),
@@ -415,6 +419,7 @@ class Processor(AsyncProcessor):
         doc = Document(
             metadata = Metadata(
                 id = document.id,
+                root = document.id,
                 user = processing.user,
                 collection = processing.collection
             ),
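
The hunks above only show the librarian setting root to the document ID. How later stages might carry it forward can be sketched as follows, assuming a simplified, hypothetical `Metadata` dataclass rather than the real schema. The `derive` and `split_into_chunks` helpers and the id format are illustrative inventions, not functions from the repository.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Metadata:
    # Hypothetical stand-in for the pipeline Metadata schema.
    id: str
    root: str = ""
    user: str = ""
    collection: str = ""


def derive(parent: Metadata, child_id: str) -> Metadata:
    # A downstream stage assigns a new id to the derived item but copies
    # every other field, including root, unchanged.
    return replace(parent, id=child_id)


def split_into_chunks(page: Metadata, count: int) -> list:
    # Illustrative chunker: one Metadata per chunk, ids derived from the page id.
    return [derive(page, f"{page.id}/c{i}") for i in range(1, count + 1)]


# The librarian sets both id and root to the document ID at the start.
doc = Metadata(id="doc-1", root="doc-1", user="alice", collection="default")
page = derive(doc, "doc-1/p1")
chunks = split_into_chunks(page, 2)
```

In this sketch the id changes at every stage while root still names the original document, so a consumer can group by root regardless of how many stages an item has passed through.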