trustgraph/docs/tech-specs/document-embeddings-chunk-id.tr.md
Alex Jenkins 8954fa3ad7 Feat: TrustGraph i18n & Documentation Translation Updates (#781)
Native CLI i18n: The TrustGraph CLI has built-in translation support
that dynamically loads language strings. You can test and use
different languages by simply passing the --lang flag (e.g., --lang
es for Spanish, --lang ru for Russian) or by configuring your
environment's LANG variable.

Automated Docs Translations: This PR introduces autonomously
translated Markdown documentation into several target languages,
including Spanish, Swahili, Portuguese, Turkish, Hindi, Hebrew,
Arabic, Simplified Chinese, and Russian.
2026-04-14 12:08:32 +01:00

4.5 KiB
Raw Blame History

layout title parent
default Belge Gömme Parçası Kimliği Turkish (Beta)

Belge Gömme Parçası Kimliği

Beta Translation: This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

Genel Bakış

Belge gömme depolaması şu anda parça metnini doğrudan vektör deposu yüküne kaydederek, Garage'da bulunan verilerin çoğaltılmasına neden oluyor. Bu özellik, parça metni depolamasını chunk_id referanslarıyla değiştirmektedir.

Mevcut Durum

@dataclass
class ChunkEmbeddings:
    chunk: bytes = b""
    vectors: list[list[float]] = field(default_factory=list)

@dataclass
class DocumentEmbeddingsResponse:
    error: Error | None = None
    chunks: list[str] = field(default_factory=list)

Vektör depolama yükü:

payload={"doc": chunk}  # Duplicates Garage content

Tasarım

Şema Değişiklikleri

ChunkEmbeddings - chunk'ı chunk_id ile değiştirin:

@dataclass
class ChunkEmbeddings:
    chunk_id: str = ""
    vectors: list[list[float]] = field(default_factory=list)

DocumentEmbeddingsResponse - parçalar yerine chunk_id'leri döndür:

@dataclass
class DocumentEmbeddingsResponse:
    error: Error | None = None
    chunk_ids: list[str] = field(default_factory=list)

Vektör Depolama Veri Yapısı

Tüm depolar (Qdrant, Milvus, Pinecone):

payload={"chunk_id": chunk_id}

Belge RAG Değişiklikleri

Belge RAG işlemcisi, parça içeriğini Garage'dan alır:

# Get chunk_ids from embeddings store
chunk_ids = await self.rag.doc_embeddings_client.query(...)

# Fetch chunk content from Garage
docs = []
for chunk_id in chunk_ids:
    content = await self.rag.librarian_client.get_document_content(
        chunk_id, self.user
    )
    docs.append(content)

API/SDK Değişiklikleri

DocumentEmbeddingsClient, chunk_ids değerini döndürür:

return resp.chunk_ids  # Changed from resp.chunks

Kablolu format (DocumentEmbeddingsResponseTranslator):

result["chunk_ids"] = obj.chunk_ids  # Changed from chunks

CLI Değişiklikleri

CLI aracı, chunk_id'leri gösterir (çağrıcılar, gerekirse içeriği ayrı olarak alabilir).

Değiştirilecek Dosyalar

Şema

trustgraph-base/trustgraph/schema/knowledge/embeddings.py - ChunkEmbeddings trustgraph-base/trustgraph/schema/services/query.py - DocumentEmbeddingsResponse

Mesajlaşma/Çeviriciler

trustgraph-base/trustgraph/messaging/translators/embeddings_query.py - DocumentEmbeddingsResponseTranslator

İstemci

trustgraph-base/trustgraph/base/document_embeddings_client.py - chunk_id'leri döndür

Python SDK/API

trustgraph-base/trustgraph/api/flow.py - document_embeddings_query trustgraph-base/trustgraph/api/socket_client.py - document_embeddings_query trustgraph-base/trustgraph/api/async_flow.py - eğer uygunsa trustgraph-base/trustgraph/api/bulk_client.py - belge gömülerinin içe/dışa aktarımı trustgraph-base/trustgraph/api/async_bulk_client.py - belge gömülerinin içe/dışa aktarımı

Gömme Hizmeti

trustgraph-flow/trustgraph/embeddings/document_embeddings/embeddings.py - chunk_id'yi ilet

Depolama Yazıcıları

trustgraph-flow/trustgraph/storage/doc_embeddings/qdrant/write.py trustgraph-flow/trustgraph/storage/doc_embeddings/milvus/write.py trustgraph-flow/trustgraph/storage/doc_embeddings/pinecone/write.py

Sorgu Hizmetleri

trustgraph-flow/trustgraph/query/doc_embeddings/qdrant/service.py trustgraph-flow/trustgraph/query/doc_embeddings/milvus/service.py trustgraph-flow/trustgraph/query/doc_embeddings/pinecone/service.py

Ağ Geçidi

trustgraph-flow/trustgraph/gateway/dispatch/document_embeddings_query.py trustgraph-flow/trustgraph/gateway/dispatch/document_embeddings_export.py trustgraph-flow/trustgraph/gateway/dispatch/document_embeddings_import.py

Belge RAG

trustgraph-flow/trustgraph/retrieval/document_rag/rag.py - librarian istemcisini ekle trustgraph-flow/trustgraph/retrieval/document_rag/document_rag.py - Garage'dan al

CLI

trustgraph-cli/trustgraph/cli/invoke_document_embeddings.py trustgraph-cli/trustgraph/cli/save_doc_embeds.py trustgraph-cli/trustgraph/cli/load_doc_embeds.py

Faydalar

  1. Tek kaynaklı doğruluk - chunk metni yalnızca Garage'da bulunur.
  2. Vektör depolama alanında azalma.
  3. chunk_id aracılığıyla sorgu zamanı köken bilgisini sağlar.