mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-04-25 08:26:21 +02:00
145 lines
4.7 KiB
Markdown
145 lines
4.7 KiB
Markdown
|
|
---
|
||
|
|
layout: default
|
||
|
|
title: "Kitambulisho cha Sehemu ya Matini (Document Embeddings Chunk ID)"
|
||
|
|
parent: "Swahili (Beta)"
|
||
|
|
---
|
||
|
|
|
||
|
|
# Kitambulisho cha Sehemu ya Matini (Document Embeddings Chunk ID)
|
||
|
|
|
||
|
|
> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.
|
||
|
|
|
||
|
|
## Muhtasari
|
||
|
|
|
||
|
|
Hifadhi ya matini ya maandishi kwa sasa huhifadhi matini ya sehemu moja kwa moja katika sehemu ya data ya hifadhi ya vector, na hivyo kurudia data ambayo ipo katika Garage. Hati hii inabadilisha uhifadhi wa matini ya sehemu kwa kutumia marejeleo ya `chunk_id`.
|
||
|
|
|
||
|
|
## Hali ya Sasa
|
||
|
|
|
||
|
|
```python
|
||
|
|
@dataclass
|
||
|
|
class ChunkEmbeddings:
|
||
|
|
chunk: bytes = b""
|
||
|
|
vectors: list[list[float]] = field(default_factory=list)
|
||
|
|
|
||
|
|
@dataclass
|
||
|
|
class DocumentEmbeddingsResponse:
|
||
|
|
error: Error | None = None
|
||
|
|
chunks: list[str] = field(default_factory=list)
|
||
|
|
```
|
||
|
|
|
||
|
|
Hifadhi ya data ya aina ya vector:
|
||
|
|
```python
|
||
|
|
payload={"doc": chunk} # Duplicates Garage content
|
||
|
|
```
|
||
|
|
|
||
|
|
## Ubunifu
|
||
|
|
|
||
|
|
### Mabadiliko ya Mpango
|
||
|
|
|
||
|
|
**ChunkEmbeddings** - badilisha "chunk" na "chunk_id":
|
||
|
|
```python
|
||
|
|
@dataclass
|
||
|
|
class ChunkEmbeddings:
|
||
|
|
chunk_id: str = ""
|
||
|
|
vectors: list[list[float]] = field(default_factory=list)
|
||
|
|
```
|
||
|
|
|
||
|
|
**Jibu la DocumentEmbeddingsResponse** - irudishe `chunk_ids` badala ya `chunks`:
|
||
|
|
```python
|
||
|
|
@dataclass
|
||
|
|
class DocumentEmbeddingsResponse:
|
||
|
|
error: Error | None = None
|
||
|
|
chunk_ids: list[str] = field(default_factory=list)
|
||
|
|
```
|
||
|
|
|
||
|
|
### Mfumo wa Hifadhi ya Vektor
|
||
|
|
|
||
|
|
Maduka yote (Qdrant, Milvus, Pinecone):
|
||
|
|
```python
|
||
|
|
payload={"chunk_id": chunk_id}
|
||
|
|
```
|
||
|
|
|
||
|
|
### Mabadiliko ya Mchakato wa RAG wa Hati
|
||
|
|
|
||
|
|
Mchakato wa RAG wa hati hupata maudhui ya sehemu kutoka kwa Garage:
|
||
|
|
|
||
|
|
```python
|
||
|
|
# Get chunk_ids from embeddings store
|
||
|
|
chunk_ids = await self.rag.doc_embeddings_client.query(...)
|
||
|
|
|
||
|
|
# Fetch chunk content from Garage
|
||
|
|
docs = []
|
||
|
|
for chunk_id in chunk_ids:
|
||
|
|
content = await self.rag.librarian_client.get_document_content(
|
||
|
|
chunk_id, self.user
|
||
|
|
)
|
||
|
|
docs.append(content)
|
||
|
|
```
|
||
|
|
|
||
|
|
### Mabadiliko ya API/SDK
|
||
|
|
|
||
|
|
**DocumentEmbeddingsClient** hurudia chunk_ids:
|
||
|
|
```python
|
||
|
|
return resp.chunk_ids # Changed from resp.chunks
|
||
|
|
```
|
||
|
|
|
||
|
|
**Muundo wa data** (Mfasiri wa Majibu ya Matangazo ya Hati):
|
||
|
|
```python
|
||
|
|
result["chunk_ids"] = obj.chunk_ids # Changed from chunks
|
||
|
|
```
|
||
|
|
|
||
|
|
### Mabadiliko ya CLI
|
||
|
|
|
||
|
|
Zana ya CLI inaonyesha kitambulisho cha vipande (watumiaji wanaweza kupata maudhui kando ikiwa ni lazima).
|
||
|
|
|
||
|
|
## Faili Zinazohitajika Kubadilishwa
|
||
|
|
|
||
|
|
### Mpango (Schema)
|
||
|
|
`trustgraph-base/trustgraph/schema/knowledge/embeddings.py` - ChunkEmbeddings
|
||
|
|
`trustgraph-base/trustgraph/schema/services/query.py` - DocumentEmbeddingsResponse
|
||
|
|
|
||
|
|
### Ujumbe/Watafsiri
|
||
|
|
`trustgraph-base/trustgraph/messaging/translators/embeddings_query.py` - DocumentEmbeddingsResponseTranslator
|
||
|
|
|
||
|
|
### Mteja (Client)
|
||
|
|
`trustgraph-base/trustgraph/base/document_embeddings_client.py` - rudisha kitambulisho cha vipande
|
||
|
|
|
||
|
|
### SDK/API ya Python
|
||
|
|
`trustgraph-base/trustgraph/api/flow.py` - document_embeddings_query
|
||
|
|
`trustgraph-base/trustgraph/api/socket_client.py` - document_embeddings_query
|
||
|
|
`trustgraph-base/trustgraph/api/async_flow.py` - ikiwa inafaa
|
||
|
|
`trustgraph-base/trustgraph/api/bulk_client.py` - uagizaji/uangamizi wa vipande vya maandishi
|
||
|
|
`trustgraph-base/trustgraph/api/async_bulk_client.py` - uagizaji/uangamizi wa vipande vya maandishi
|
||
|
|
|
||
|
|
### Huduma ya Vipande vya Maandishi (Embeddings Service)
|
||
|
|
`trustgraph-flow/trustgraph/embeddings/document_embeddings/embeddings.py` - pitisha kitambulisho cha kipande
|
||
|
|
|
||
|
|
### Waandishi wa Uhifadhi (Storage Writers)
|
||
|
|
`trustgraph-flow/trustgraph/storage/doc_embeddings/qdrant/write.py`
|
||
|
|
`trustgraph-flow/trustgraph/storage/doc_embeddings/milvus/write.py`
|
||
|
|
`trustgraph-flow/trustgraph/storage/doc_embeddings/pinecone/write.py`
|
||
|
|
|
||
|
|
### Huduma za Utafutaji (Query Services)
|
||
|
|
`trustgraph-flow/trustgraph/query/doc_embeddings/qdrant/service.py`
|
||
|
|
`trustgraph-flow/trustgraph/query/doc_embeddings/milvus/service.py`
|
||
|
|
`trustgraph-flow/trustgraph/query/doc_embeddings/pinecone/service.py`
|
||
|
|
|
||
|
|
### Lango (Gateway)
|
||
|
|
`trustgraph-flow/trustgraph/gateway/dispatch/document_embeddings_query.py`
|
||
|
|
`trustgraph-flow/trustgraph/gateway/dispatch/document_embeddings_export.py`
|
||
|
|
`trustgraph-flow/trustgraph/gateway/dispatch/document_embeddings_import.py`
|
||
|
|
|
||
|
|
### Utafutaji wa Hati (Document RAG)
|
||
|
|
`trustgraph-flow/trustgraph/retrieval/document_rag/rag.py` - ongeza mteja wa "librarian"
|
||
|
|
`trustgraph-flow/trustgraph/retrieval/document_rag/document_rag.py` - pata kutoka "Garage"
|
||
|
|
|
||
|
|
### CLI
|
||
|
|
`trustgraph-cli/trustgraph/cli/invoke_document_embeddings.py`
|
||
|
|
`trustgraph-cli/trustgraph/cli/save_doc_embeds.py`
|
||
|
|
`trustgraph-cli/trustgraph/cli/load_doc_embeds.py`
|
||
|
|
|
||
|
|
## Faida
|
||
|
|
|
||
|
|
1. Chanzo kimoja cha ukweli - maandishi ya vipande tu katika "Garage"
|
||
|
|
2. Kupunguzwa kwa uhifadhi wa hifadhi ya vector
|
||
|
|
3. Inawezesha uhakikisho wa muda wa utafutaji kupitia kitambulisho cha kipande.
|