| layout | title | parent |
|---|---|---|
| default | Document Embeddings Chunk ID | Turkish (Beta) |
Document Embeddings Chunk ID
Beta Translation: This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.
Overview
Document embeddings storage currently writes the chunk text directly into the vector store payload, which duplicates data that already lives in Garage. This feature replaces chunk text storage with chunk_id references.
Current State
@dataclass
class ChunkEmbeddings:
    chunk: bytes = b""
    vectors: list[list[float]] = field(default_factory=list)

@dataclass
class DocumentEmbeddingsResponse:
    error: Error | None = None
    chunks: list[str] = field(default_factory=list)
Vector store payload:
payload={"doc": chunk} # Duplicates Garage content
Design
Schema Changes
ChunkEmbeddings - replace chunk with chunk_id:
@dataclass
class ChunkEmbeddings:
    chunk_id: str = ""
    vectors: list[list[float]] = field(default_factory=list)
DocumentEmbeddingsResponse - return chunk_ids instead of chunks:
@dataclass
class DocumentEmbeddingsResponse:
    error: Error | None = None
    chunk_ids: list[str] = field(default_factory=list)
Vector Store Payload
All stores (Qdrant, Milvus, Pinecone):
payload={"chunk_id": chunk_id}
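As a rough illustration of the payload change, the sketch below writes one point per vector with only a chunk_id in the payload and returns chunk_ids from a similarity query. `InMemoryVectorStore`, its `write` and `query` methods, and the `doc-1/chunk-0` ID format are hypothetical stand-ins, not the actual Qdrant, Milvus, or Pinecone writers.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ChunkEmbeddings:
    chunk_id: str = ""
    vectors: list[list[float]] = field(default_factory=list)


class InMemoryVectorStore:
    """Hypothetical stand-in for the real vector store writers and query services."""

    def __init__(self):
        self.points = []  # list of (vector, payload) pairs

    def write(self, emb: ChunkEmbeddings) -> None:
        # One point per vector; the payload carries only the reference.
        for vec in emb.vectors:
            self.points.append((vec, {"chunk_id": emb.chunk_id}))

    def query(self, vec: list[float], limit: int = 10) -> list[str]:
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        # Rank stored points by cosine similarity, most similar first.
        ranked = sorted(self.points, key=lambda p: cosine(vec, p[0]), reverse=True)
        return [payload["chunk_id"] for _, payload in ranked[:limit]]


store = InMemoryVectorStore()
store.write(ChunkEmbeddings(chunk_id="doc-1/chunk-0", vectors=[[1.0, 0.0]]))
store.write(ChunkEmbeddings(chunk_id="doc-1/chunk-1", vectors=[[0.0, 1.0]]))
nearest = store.query([0.9, 0.1], limit=1)
```

The store never sees the chunk text, so a query result is only a list of references that the caller resolves elsewhere.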
Document RAG Changes
The document RAG processor fetches chunk content from Garage:
# Get chunk_ids from embeddings store
chunk_ids = await self.rag.doc_embeddings_client.query(...)

# Fetch chunk content from Garage
docs = []
for chunk_id in chunk_ids:
    content = await self.rag.librarian_client.get_document_content(
        chunk_id, self.user
    )
    docs.append(content)
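A self-contained sketch of the same two-step lookup may help when testing the flow without a running system. The stub classes, their constructor arguments, the `query(vectors, limit)` signature, and the bytes return type are all assumptions made for illustration. The sketch also swaps the sequential loop for `asyncio.gather`, which is only appropriate if the real librarian client tolerates concurrent requests.

```python
import asyncio


class StubEmbeddingsClient:
    """Hypothetical stand-in for doc_embeddings_client."""

    async def query(self, vectors, limit=10):
        # A real client would run a similarity search; this returns fixed IDs.
        return ["chunk-a", "chunk-b"][:limit]


class StubLibrarianClient:
    """Hypothetical stand-in for librarian_client, backed by a dict instead of Garage."""

    def __init__(self, blobs):
        self.blobs = blobs

    async def get_document_content(self, chunk_id, user):
        return self.blobs[chunk_id]


async def fetch_docs(embeddings_client, librarian_client, vectors, user):
    # Step 1: the embeddings store returns references only.
    chunk_ids = await embeddings_client.query(vectors)
    # Step 2: resolve each reference; gather keeps results in chunk_ids order.
    return await asyncio.gather(
        *(librarian_client.get_document_content(cid, user) for cid in chunk_ids)
    )


librarian = StubLibrarianClient(
    {"chunk-a": b"first chunk text", "chunk-b": b"second chunk text"}
)
docs = asyncio.run(
    fetch_docs(StubEmbeddingsClient(), librarian, [[0.1, 0.2]], "alice")
)
```

Fetching concurrently trades simplicity for latency: the loop in the design issues one request at a time, while the gathered version issues them together.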
API/SDK Changes
DocumentEmbeddingsClient returns chunk_ids:
return resp.chunk_ids # Changed from resp.chunks
Wire format (DocumentEmbeddingsResponseTranslator):
result["chunk_ids"] = obj.chunk_ids # Changed from chunks
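To make the wire-format change concrete, here is a minimal round-trip sketch. The `to_wire` and `from_wire` functions and the two-field `Error` shape are hypothetical; the real DocumentEmbeddingsResponseTranslator and Error schema may look different.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Error:
    # Hypothetical minimal shape; the real schema may carry other fields.
    type: str = ""
    message: str = ""


@dataclass
class DocumentEmbeddingsResponse:
    error: Optional[Error] = None
    chunk_ids: list[str] = field(default_factory=list)


def to_wire(obj: DocumentEmbeddingsResponse) -> dict:
    """Serialize a response to a plain dict (hypothetical helper)."""
    result = {}
    if obj.error is not None:
        result["error"] = {"type": obj.error.type, "message": obj.error.message}
    result["chunk_ids"] = obj.chunk_ids  # previously "chunks"
    return result


def from_wire(data: dict) -> DocumentEmbeddingsResponse:
    """Rebuild a response from its wire dict (hypothetical helper)."""
    err = data.get("error")
    return DocumentEmbeddingsResponse(
        error=Error(**err) if err else None,
        chunk_ids=list(data.get("chunk_ids", [])),
    )


wire = to_wire(DocumentEmbeddingsResponse(chunk_ids=["chunk-a", "chunk-b"]))
restored = from_wire(wire)
```

Because the key is renamed rather than added alongside the old one, any consumer still reading `chunks` from the wire dict will get nothing back and needs updating at the same time.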
CLI Changes
The CLI tool displays chunk_ids (callers can fetch the content separately if needed).
Files to Change
Schema
trustgraph-base/trustgraph/schema/knowledge/embeddings.py - ChunkEmbeddings
trustgraph-base/trustgraph/schema/services/query.py - DocumentEmbeddingsResponse
Messaging/Translators
trustgraph-base/trustgraph/messaging/translators/embeddings_query.py - DocumentEmbeddingsResponseTranslator
Client
trustgraph-base/trustgraph/base/document_embeddings_client.py - return chunk_ids
Python SDK/API
trustgraph-base/trustgraph/api/flow.py - document_embeddings_query
trustgraph-base/trustgraph/api/socket_client.py - document_embeddings_query
trustgraph-base/trustgraph/api/async_flow.py - if applicable
trustgraph-base/trustgraph/api/bulk_client.py - document embeddings import/export
trustgraph-base/trustgraph/api/async_bulk_client.py - document embeddings import/export
Embeddings Service
trustgraph-flow/trustgraph/embeddings/document_embeddings/embeddings.py - pass chunk_id through
Storage Writers
trustgraph-flow/trustgraph/storage/doc_embeddings/qdrant/write.py
trustgraph-flow/trustgraph/storage/doc_embeddings/milvus/write.py
trustgraph-flow/trustgraph/storage/doc_embeddings/pinecone/write.py
Query Services
trustgraph-flow/trustgraph/query/doc_embeddings/qdrant/service.py
trustgraph-flow/trustgraph/query/doc_embeddings/milvus/service.py
trustgraph-flow/trustgraph/query/doc_embeddings/pinecone/service.py
Gateway
trustgraph-flow/trustgraph/gateway/dispatch/document_embeddings_query.py
trustgraph-flow/trustgraph/gateway/dispatch/document_embeddings_export.py
trustgraph-flow/trustgraph/gateway/dispatch/document_embeddings_import.py
Document RAG
trustgraph-flow/trustgraph/retrieval/document_rag/rag.py - add librarian client
trustgraph-flow/trustgraph/retrieval/document_rag/document_rag.py - fetch from Garage
CLI
trustgraph-cli/trustgraph/cli/invoke_document_embeddings.py
trustgraph-cli/trustgraph/cli/save_doc_embeds.py
trustgraph-cli/trustgraph/cli/load_doc_embeds.py
Benefits
- Single source of truth - chunk text lives only in Garage.
- Reduced vector store storage.
- Enables query-time provenance via chunk_id.
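The storage reduction can be estimated with a quick comparison of serialized payload sizes. The sketch below uses JSON as a proxy for the stored payload, and both the sample text and the `doc-1/chunk-0` ID are made up; actual savings depend on chunk length, the ID format, and how each store encodes payloads and vectors.

```python
import json


def payload_bytes(payload: dict) -> int:
    """Size of a payload when serialized as UTF-8 JSON (a rough proxy only)."""
    return len(json.dumps(payload).encode("utf-8"))


chunk_text = "lorem ipsum " * 100  # made-up chunk of 1200 characters
chunk_id = "doc-1/chunk-0"         # hypothetical ID format

old_size = payload_bytes({"doc": chunk_text})
new_size = payload_bytes({"chunk_id": chunk_id})
saved = old_size - new_size
```

The saving applies per stored point, so a chunk embedded into several vectors previously duplicated its text once per vector.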