Mirror of https://github.com/trustgraph-ai/trustgraph.git, synced 2026-04-28 01:46:22 +02:00

Structure the tech specs directory (#836)

Tech spec some subdirectories for different languages

Parent: 48da6c5f8b
Commit: e7efb673ef
423 changed files with 0 additions and 0 deletions

135
docs/tech-specs/sw/__TEMPLATE.sw.md
Normal file
@@ -0,0 +1,135 @@
---
layout: default
title: "Command-Line Information Loading Technical Specification"
parent: "Swahili (Beta)"
---

# Command-Line Information Loading Technical Specification

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Overview

This specification describes the command-line interfaces for loading information into TrustGraph, enabling users to ingest data from various sources through command-line tools. The integration supports four primary use cases:

1. **[Use case 1]**: [Description]
2. **[Use case 2]**: [Description]
3. **[Use case 3]**: [Description]
4. **[Use case 4]**: [Description]

## Goals

- **[Goal 1]**: [Description]
- **[Goal 2]**: [Description]
- **[Goal 3]**: [Description]
- **[Goal 4]**: [Description]
- **[Goal 5]**: [Description]
- **[Goal 6]**: [Description]
- **[Goal 7]**: [Description]
- **[Goal 8]**: [Description]

## Background

[Describe the current state and the limitations this specification addresses]

Current limitations include:
- [Limitation 1]
- [Limitation 2]
- [Limitation 3]
- [Limitation 4]

This specification addresses these gaps by [description]. By [capability], TrustGraph can:
- [Benefit 1]
- [Benefit 2]
- [Benefit 3]
- [Benefit 4]

## Technical Design

### Architecture

Loading information via the command line requires the following technical components:

1. **[Component 1]**
   - [Description of the component's functionality]
   - [Key features]
   - [Integration points]

   Module: [module-path]

2. **[Component 2]**
   - [Description of the component's functionality]
   - [Key features]
   - [Integration points]

   Module: [module-path]

3. **[Component 3]**
   - [Description of the component's functionality]
   - [Key features]
   - [Integration points]

   Module: [module-path]

### Data Models

#### [Data Model 1]

[Description of the data model and its structure]

Example:
```
[Example data structure]
```

This approach allows:
- [Benefit 1]
- [Benefit 2]
- [Benefit 3]
- [Benefit 4]

### APIs

New APIs:
- [API description 1]
- [API description 2]
- [API description 3]

Modified APIs:
- [Modified API 1] - [Description of changes]
- [Modified API 2] - [Description of changes]

### Implementation Details

[Implementation approach and conventions]

[Additional implementation notes]

## Security Considerations

[Security considerations specific to this implementation]

## Performance Considerations

[Performance considerations and potential bottlenecks]

## Testing Strategy

[Testing approach and strategy]

## Migration Plan

[Migration strategy, if applicable]

## Timeline

[Timeline information, if specified]

## Open Questions

- [Open question 1]
- [Open question 2]

## References

[References, if applicable]

280
docs/tech-specs/sw/agent-explainability.sw.md
Normal file
@@ -0,0 +1,280 @@
---
layout: default
title: "Agent Explainability: Provenance Recording"
parent: "Swahili (Beta)"
---

# Agent Explainability: Provenance Recording

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Overview

Add provenance recording to the ReAct agent loop so that agent sessions can be traced and debugged using the same explainability infrastructure as GraphRAG.

**Design Decisions:**
- Write to `urn:graph:retrieval` (the generic explainability graph)
- Linear dependency chain for now (analysis N → wasDerivedFrom → analysis N-1)
- Tools are opaque black boxes (record input/output only)
- DAG support is deferred to a future iteration

## Entity Types

GraphRAG and Agent both use PROV-O as the base ontology, with additional TrustGraph-specific types:

### GraphRAG Types
| Entity | PROV-O Type | TG Types | Description |
|--------|-------------|----------|-------------|
| Question | `prov:Activity` | `tg:Question`, `tg:GraphRagQuestion` | The user's query |
| Exploration | `prov:Entity` | `tg:Exploration` | Edges retrieved from the knowledge graph |
| Focus | `prov:Entity` | `tg:Focus` | Edges selected, with reasoning |
| Synthesis | `prov:Entity` | `tg:Synthesis` | Final answer |

### Agent Types
| Entity | PROV-O Type | TG Types | Description |
|--------|-------------|----------|-------------|
| Question | `prov:Activity` | `tg:Question`, `tg:AgentQuestion` | The user's query |
| Analysis | `prov:Entity` | `tg:Analysis` | Each think/act/observe cycle |
| Conclusion | `prov:Entity` | `tg:Conclusion` | Final answer |

### Document RAG Types
| Entity | PROV-O Type | TG Types | Description |
|--------|-------------|----------|-------------|
| Question | `prov:Activity` | `tg:Question`, `tg:DocRagQuestion` | The user's query |
| Exploration | `prov:Entity` | `tg:Exploration` | Chunks retrieved from the document store |
| Synthesis | `prov:Entity` | `tg:Synthesis` | Final answer |

**Note:** Document RAG uses a subset of the GraphRAG types (there is no Focus step because there is no edge selection/reasoning phase).

### Question Subtypes

All Question entities share `tg:Question` as a base type but carry a specific subtype that identifies the retrieval mechanism:

| Subtype | URI Pattern | Mechanism |
|---------|-------------|-----------|
| `tg:GraphRagQuestion` | `urn:trustgraph:question:{uuid}` | Knowledge graph RAG |
| `tg:DocRagQuestion` | `urn:trustgraph:docrag:{uuid}` | Document/chunk RAG |
| `tg:AgentQuestion` | `urn:trustgraph:agent:{uuid}` | ReAct agent |

This allows all questions to be queried via `tg:Question` while still filtering by mechanism via the subtype.

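
The subtype table can be read as a prefix lookup. The sketch below is illustrative only: the URI prefixes and subtype names come from the table, while `URI_PREFIX_TO_SUBTYPE` and `question_subtype` are hypothetical names, not part of the TrustGraph codebase.

```python
from typing import Optional

# Prefixes and subtypes are taken from the table above; the mapping
# itself and the function name are assumptions made for illustration.
URI_PREFIX_TO_SUBTYPE = {
    "urn:trustgraph:question:": "tg:GraphRagQuestion",
    "urn:trustgraph:docrag:": "tg:DocRagQuestion",
    "urn:trustgraph:agent:": "tg:AgentQuestion",
}


def question_subtype(uri: str) -> Optional[str]:
    """Return the tg: subtype implied by a question URI, or None if unknown."""
    for prefix, subtype in URI_PREFIX_TO_SUBTYPE.items():
        if uri.startswith(prefix):
            return subtype
    return None
```

A caller could use the result to pick a renderer, falling back to a generic view when `None` is returned.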
## Provenance Model

```
Question (urn:trustgraph:agent:{uuid})
    │
    │  tg:query = "User's question"
    │  prov:startedAtTime = timestamp
    │  rdf:type = prov:Activity, tg:Question
    │
    ↓  prov:wasDerivedFrom
    │
Analysis1 (urn:trustgraph:agent:{uuid}/i1)
    │
    │  tg:thought = "I need to query the knowledge base..."
    │  tg:action = "knowledge-query"
    │  tg:arguments = {"question": "..."}
    │  tg:observation = "Result from tool..."
    │  rdf:type = prov:Entity, tg:Analysis
    │
    ↓  prov:wasDerivedFrom
    │
Analysis2 (urn:trustgraph:agent:{uuid}/i2)
    │  ...
    ↓  prov:wasDerivedFrom
    │
Conclusion (urn:trustgraph:agent:{uuid}/final)
    │
    │  tg:answer = "The final response..."
    │  rdf:type = prov:Entity, tg:Conclusion
```

### Document RAG Provenance Model

```
Question (urn:trustgraph:docrag:{uuid})
    │
    │  tg:query = "User's question"
    │  prov:startedAtTime = timestamp
    │  rdf:type = prov:Activity, tg:Question
    │
    ↓  prov:wasGeneratedBy
    │
Exploration (urn:trustgraph:docrag:{uuid}/exploration)
    │
    │  tg:chunkCount = 5
    │  tg:selectedChunk = "chunk-id-1"
    │  tg:selectedChunk = "chunk-id-2"
    │  ...
    │  rdf:type = prov:Entity, tg:Exploration
    │
    ↓  prov:wasDerivedFrom
    │
Synthesis (urn:trustgraph:docrag:{uuid}/synthesis)
    │
    │  tg:content = "The synthesized answer..."
    │  rdf:type = prov:Entity, tg:Synthesis
```

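
To make the linear chain in the agent diagram concrete, the following sketch builds the URIs for one session together with the URI each one is derived from. The `/i{n}` and `/final` suffixes follow the diagram; `agent_chain` itself is a hypothetical helper written for illustration.

```python
from typing import List, Optional, Tuple


def agent_chain(session_id: str, iterations: int) -> List[Tuple[str, Optional[str]]]:
    """Return (uri, derived_from_uri) pairs for one agent session."""
    base = f"urn:trustgraph:agent:{session_id}"
    chain: List[Tuple[str, Optional[str]]] = [(base, None)]  # the Question has no parent
    parent = base
    for n in range(1, iterations + 1):
        uri = f"{base}/i{n}"           # Analysis N
        chain.append((uri, parent))    # derived from the previous link
        parent = uri
    chain.append((f"{base}/final", parent))  # Conclusion follows the last Analysis
    return chain
```

Because each entry has exactly one parent, the result is a simple list; DAG-shaped dependencies would need a different structure.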
## Changes Required

### 1. Schema Changes

**File:** `trustgraph-base/trustgraph/schema/services/agent.py`

Add `session_id` and `collection` fields to `AgentRequest`:
```python
from dataclasses import dataclass, field

# AgentStep is assumed to be defined earlier in the same module.
@dataclass
class AgentRequest:
    question: str = ""
    state: str = ""
    group: list[str] | None = None
    history: list[AgentStep] = field(default_factory=list)
    user: str = ""
    collection: str = "default"  # NEW: Collection for provenance traces
    streaming: bool = False
    session_id: str = ""  # NEW: For provenance tracking across iterations
```

**File:** `trustgraph-base/trustgraph/messaging/translators/agent.py`

Update the translator to handle `session_id` and `collection` in both `to_pulsar()` and `from_pulsar()`.

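
The translator change amounts to reading and writing two extra fields while tolerating their absence. The source does not show the real `to_pulsar()` / `from_pulsar()` signatures, so the sketch below works on plain dictionaries; `AgentRequestSketch`, `from_dict` and `to_dict` are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class AgentRequestSketch:
    # Trimmed-down stand-in for AgentRequest: only the fields discussed here.
    question: str = ""
    collection: str = "default"
    session_id: str = ""


def from_dict(data: dict) -> AgentRequestSketch:
    """Decode a payload, falling back to the defaults for absent fields."""
    return AgentRequestSketch(
        question=data.get("question", ""),
        collection=data.get("collection", "default"),
        session_id=data.get("session_id", ""),
    )


def to_dict(req: AgentRequestSketch) -> dict:
    """Encode every field, including the two new ones."""
    return {
        "question": req.question,
        "collection": req.collection,
        "session_id": req.session_id,
    }
```

Using `.get()` with the schema defaults means a payload produced by an older client decodes without error and simply carries an empty `session_id`.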
### 2. Add an Explainability Producer to the Agent Service

**File:** `trustgraph-flow/trustgraph/agent/react/service.py`

Register an "explainability" producer (the same pattern as GraphRAG):
```python
from ...base import ProducerSpec
from ...schema import Triples

# In __init__:
self.register_specification(
    ProducerSpec(
        name = "explainability",
        schema = Triples,
    )
)
```

### 3. Provenance Generation

**File:** `trustgraph-base/trustgraph/provenance/agent.py`

Create helper functions (similar to the GraphRAG ones such as `question_triples`, `exploration_triples`, etc.):
```python
import json

def agent_session_triples(session_uri, query, timestamp):
    """Generate triples for agent Question."""
    return [
        Triple(s=session_uri, p=RDF_TYPE, o=PROV_ACTIVITY),
        Triple(s=session_uri, p=RDF_TYPE, o=TG_QUESTION),
        Triple(s=session_uri, p=TG_QUERY, o=query),
        Triple(s=session_uri, p=PROV_STARTED_AT_TIME, o=timestamp),
    ]

def agent_iteration_triples(iteration_uri, parent_uri, thought, action, arguments, observation):
    """Generate triples for one Analysis step."""
    return [
        Triple(s=iteration_uri, p=RDF_TYPE, o=PROV_ENTITY),
        Triple(s=iteration_uri, p=RDF_TYPE, o=TG_ANALYSIS),
        Triple(s=iteration_uri, p=TG_THOUGHT, o=thought),
        Triple(s=iteration_uri, p=TG_ACTION, o=action),
        Triple(s=iteration_uri, p=TG_ARGUMENTS, o=json.dumps(arguments)),
        Triple(s=iteration_uri, p=TG_OBSERVATION, o=observation),
        Triple(s=iteration_uri, p=PROV_WAS_DERIVED_FROM, o=parent_uri),
    ]

def agent_final_triples(final_uri, parent_uri, answer):
    """Generate triples for Conclusion."""
    return [
        Triple(s=final_uri, p=RDF_TYPE, o=PROV_ENTITY),
        Triple(s=final_uri, p=RDF_TYPE, o=TG_CONCLUSION),
        Triple(s=final_uri, p=TG_ANSWER, o=answer),
        Triple(s=final_uri, p=PROV_WAS_DERIVED_FROM, o=parent_uri),
    ]
```

### 4. Type Definitions

**File:** `trustgraph-base/trustgraph/provenance/namespaces.py`

Add the explainability entity types and the agent predicates:
```python
# Explainability entity types (used by both GraphRAG and Agent)
TG_QUESTION = TG + "Question"
TG_EXPLORATION = TG + "Exploration"
TG_FOCUS = TG + "Focus"
TG_SYNTHESIS = TG + "Synthesis"
TG_ANALYSIS = TG + "Analysis"
TG_CONCLUSION = TG + "Conclusion"

# Agent predicates
TG_THOUGHT = TG + "thought"
TG_ACTION = TG + "action"
TG_ARGUMENTS = TG + "arguments"
TG_OBSERVATION = TG + "observation"
TG_ANSWER = TG + "answer"
```

## Files Modified

| File | Change |
|------|--------|
| `trustgraph-base/trustgraph/schema/services/agent.py` | Add `session_id` and `collection` to `AgentRequest` |
| `trustgraph-base/trustgraph/messaging/translators/agent.py` | Update the translator for the new fields |
| `trustgraph-base/trustgraph/provenance/namespaces.py` | Add entity types, agent predicates, and Document RAG predicates |
| `trustgraph-base/trustgraph/provenance/triples.py` | Add TG types to the GraphRAG triple builders; add Document RAG triple builders |
| `trustgraph-base/trustgraph/provenance/uris.py` | Add Document RAG URI generators |
| `trustgraph-base/trustgraph/provenance/__init__.py` | Export the new types, predicates, and Document RAG functions |
| `trustgraph-base/trustgraph/schema/services/retrieval.py` | Add `explain_id` and `explain_graph` to `DocumentRagResponse` |
| `trustgraph-base/trustgraph/messaging/translators/retrieval.py` | Update `DocumentRagResponseTranslator` for the explainability fields |
| `trustgraph-flow/trustgraph/agent/react/service.py` | Add the explainability producer and recording logic |
| `trustgraph-flow/trustgraph/retrieval/document_rag/document_rag.py` | Add the explainability callback and emit provenance triples |
| `trustgraph-flow/trustgraph/retrieval/document_rag/rag.py` | Add the explainability producer and wire up the callback |
| `trustgraph-cli/trustgraph/cli/show_explain_trace.py` | Handle agent trace types |
| `trustgraph-cli/trustgraph/cli/list_explain_traces.py` | List agent sessions alongside GraphRAG |

## Files Created

| File | Purpose |
|------|---------|
| `trustgraph-base/trustgraph/provenance/agent.py` | Agent-specific triple generators |

## CLI Changes

**Detection:** Both GraphRAG and Agent questions have type `tg:Question`. They are distinguished by:
1. URI pattern: `urn:trustgraph:agent:` vs `urn:trustgraph:question:`
2. Derived entities: `tg:Analysis` (agent) vs `tg:Exploration` (GraphRAG)

**`list_explain_traces.py`:**
- Shows a Type column (Agent vs GraphRAG)

**`show_explain_trace.py`:**
- Auto-detects the trace type
- Agent rendering shows: Question → Analysis step(s) → Conclusion

## Backward Compatibility

- `session_id` defaults to `""` - old requests still work, but will not have provenance
- `collection` defaults to `"default"` - a reasonable fallback
- The CLI handles both trace types gracefully

## Verification

```bash
# Run an agent query
tg-invoke-agent -q "What is the capital of France?"

# List traces (should show agent sessions with Type column)
tg-list-explain-traces -U trustgraph -C default

# Show agent trace
tg-show-explain-trace "urn:trustgraph:agent:xxx"
```

## Future Work (Not in This Project)

- DAG dependencies (when analysis N uses results from several earlier analyses)
- Tool-specific provenance linking (KnowledgeQuery → its GraphRAG trace)
- Streaming provenance emission (emit as we go, not batched at the end)

113
docs/tech-specs/sw/architecture-principles.sw.md
Normal file
@@ -0,0 +1,113 @@
---
layout: default
title: "Knowledge Graph Architecture Foundations"
parent: "Swahili (Beta)"
---

# Knowledge Graph Architecture Foundations

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Foundation 1: Subject-Predicate-Object (SPO) Graph Model
**Decision**: Adopt SPO/RDF as the core knowledge representation model

**Rationale**:
- Provides maximum flexibility and interoperability with existing graph technologies
- Enables straightforward translation to other graph query languages (e.g., SPO → Cypher, but not the reverse)
- Creates a foundation that "unlocks a lot" of later capabilities
- Supports both node-to-node relationships (SPO) and node-to-literal relationships (RDF)

**Implementation**:
- Core data structure: `node → edge → {node | literal}`
- Maintain compatibility with RDF standards while supporting extended SPO operations

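
The core structure `node → edge → {node | literal}` can be sketched with a few small types. This is an illustration of the shape only; `Node`, `Literal`, `Triple` and `outgoing` are hypothetical names and do not reflect TrustGraph's actual schema classes.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Node:
    uri: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Triple:
    s: Node                    # subject: always a node
    p: str                     # predicate: the edge
    o: Union[Node, Literal]    # object: another node or a literal value


def outgoing(triples: List[Triple], node: Node) -> List[Triple]:
    """Return every triple whose subject is the given node."""
    return [t for t in triples if t.s == node]
```

Keeping the object as a union of node and literal is what lets one structure cover both node-to-node and node-to-literal relationships.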
## Foundation 2: LLM-Native Knowledge Graph Integration
**Decision**: Optimize knowledge graph structure and operations for LLM interaction

**Rationale**:
- The primary use case involves LLMs working with knowledge graphs
- Graph technology choices must prioritize LLM compatibility over other considerations
- Enables natural language processing workflows that use structured knowledge

**Implementation**:
- Design graph schemas that LLMs can understand well
- Optimize for common LLM interaction patterns

## Foundation 3: Embedding-Based Graph Navigation
**Decision**: Implement a direct mapping from natural language queries to graph nodes via embeddings

**Rationale**:
- Enables the simplest possible path from an NLP query to graph navigation
- Avoids complex intermediate query-generation steps
- Provides semantic search capability within the graph structure

**Implementation**:
- `NLP Query → Graph Embeddings → Graph Nodes`
- Maintain embedding representations for all graph entities
- Support direct semantic similarity matching for query resolution

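
The `NLP Query → Graph Embeddings → Graph Nodes` path can be illustrated with a nearest-neighbour lookup. The sketch assumes embeddings are plain lists of floats and uses cosine similarity; the source does not say which similarity measure or vector store is used, and `cosine` and `nearest_nodes` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_nodes(query_vec: Sequence[float],
                  node_vecs: Dict[str, Sequence[float]],
                  k: int = 3) -> List[str]:
    """Return the k node identifiers whose embeddings best match the query."""
    ranked = sorted(
        node_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [node for node, _ in ranked[:k]]
```

A linear scan like this is only practical for small graphs; a production system would more plausibly delegate the search to a vector index.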
## Foundation 4: Distributed Entity Resolution with Deterministic Identifiers
**Decision**: Support parallel knowledge extraction using deterministic entity identification (the 80% rule)

**Rationale**:
- **Ideal**: Single-process extraction with complete state enables perfect entity resolution
- **Reality**: Scalability requirements demand parallel processing
- **Compromise**: Design for deterministic entity identification across distributed processes

**Implementation**:
- Develop mechanisms that generate consistent, unique identifiers across different knowledge extractors
- The same entity mentioned in different processes must resolve to the same identifier
- Accept that ~20% of difficult cases may require other processing models
- Design fallback mechanisms for complex entity resolution cases

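
One common way to obtain the same identifier in independent processes is to hash a normalised label into a name-based UUID. The sketch below takes that approach as an assumption: the source does not specify how TrustGraph derives identifiers, and the namespace URL, URN prefix and `entity_id` function are hypothetical.

```python
import uuid

# Hypothetical namespace; any fixed UUID shared by all extractors would do.
ENTITY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/entities")


def entity_id(label: str) -> str:
    """Derive a stable identifier from an entity label.

    Case and whitespace are normalised so trivially different mentions
    of the same entity collapse to one identifier.
    """
    normalised = " ".join(label.casefold().split())
    return f"urn:example:entity:{uuid.uuid5(ENTITY_NAMESPACE, normalised)}"
```

No coordination between extractors is needed because the identifier depends only on the label. Aliases, abbreviations and homonyms are not handled, which is the kind of case the fallback mechanisms would have to cover.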
## Foundation 5: Event-Driven Architecture with Publish-Subscribe
**Decision**: Implement a pub-sub messaging system for system coordination

**Rationale**:
- Enables loose coupling between knowledge extraction, storage, and query components
- Supports real-time updates and notifications across the system
- Enables scalable, distributed processing workflows

**Implementation**:
- Message-driven coordination between system components
- Event streams for knowledge updates, extraction completion, and query results

## Foundation 6: Reentrant Agent Communication
**Decision**: Support reentrant pub-sub operations for agent-based processing

**Rationale**:
- Enables sophisticated agent workflows in which agents can trigger and respond to one another
- Supports complex knowledge processing pipelines
- Allows recursive and iterative processing patterns

**Implementation**:
- The pub-sub system must handle reentrant calls safely
- Agent coordination mechanisms that prevent infinite loops
- Support for agent workflow orchestration

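
Reentrancy and loop prevention can be shown with a toy in-process bus. It is a sketch under stated assumptions: TrustGraph's real messaging layer is not described here, and the `Bus` class, its hop counter and the `max_hops` limit are hypothetical.

```python
from collections import defaultdict, deque


class Bus:
    """Toy pub-sub bus where handlers may publish while being dispatched."""

    def __init__(self, max_hops: int = 5):
        self.handlers = defaultdict(list)
        self.queue = deque()
        self.max_hops = max_hops

    def subscribe(self, topic, handler):
        self.handlers[topic].append(handler)

    def publish(self, topic, payload, hops: int = 0) -> bool:
        # Messages are queued rather than dispatched recursively, so a
        # handler that publishes cannot grow the call stack.
        if hops > self.max_hops:
            return False  # drop the message: the chain is too long
        self.queue.append((topic, payload, hops))
        return True

    def run(self) -> int:
        """Dispatch queued messages until none remain; return deliveries made."""
        delivered = 0
        while self.queue:
            topic, payload, hops = self.queue.popleft()
            for handler in self.handlers[topic]:
                handler(self, payload, hops)
                delivered += 1
        return delivered
```

Two agents that keep answering each other stop once the hop count exceeds `max_hops`, which gives a simple guard against infinite loops.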
## Foundation 7: Columnar Data Store Integration
**Decision**: Ensure query compatibility with columnar storage systems

**Rationale**:
- Enables efficient analytical queries over large knowledge datasets
- Supports business intelligence and reporting use cases
- Bridges graph-based knowledge representation with traditional analytical workflows

**Implementation**:
- Query translation layer: graph queries → columnar queries
- Hybrid storage strategy supporting both graph operations and analytical workloads
- Maintain query performance across both paradigms

---

## Architecture Principles Summary

1. **Flexibility First**: The SPO model provides maximum flexibility
2. **LLM Optimization**: All design decisions consider LLM interaction requirements
3. **Semantic Efficiency**: Direct embedding-to-node mapping for optimal query performance
4. **Pragmatic Scalability**: Balance perfect accuracy against distributed processing capability
5. **Identifier Support**: Deterministic entity identifiers and entity resolution
6. **Agent Communication**: Support for agent workflows
7. **Data Store Integration**: Support for analytical queries

These foundations establish a knowledge system architecture that balances theoretical rigor with practical requirements, optimized for LLM integration and distributed processing.

339
docs/tech-specs/sw/cassandra-consolidation.sw.md
Normal file
@@ -0,0 +1,339 @@
---
layout: default
title: "Technical Specification: Cassandra Configuration Consolidation"
parent: "Swahili (Beta)"
---

# Technical Specification: Cassandra Configuration Consolidation

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

**Status:** Draft
**Author:** Assistant
**Date:** 2024-09-03

## Overview

This specification addresses the inconsistent naming and configuration patterns for Cassandra connection parameters across the TrustGraph codebase. Currently, two different parameter naming schemes exist (`cassandra_*` vs `graph_*`), which leads to confusion and maintenance difficulty.

## Problem Statement

The codebase uses two distinct sets of Cassandra configuration parameters:

1. **Knowledge/Config/Library modules** use:
   - `cassandra_host` (list of hosts)
   - `cassandra_user`
   - `cassandra_password`

2. **Graph/Storage modules** use:
   - `graph_host` (single host, sometimes converted to a list)
   - `graph_username`
   - `graph_password`

3. **Inconsistent command-line exposure:**
   - Some processors (e.g., `kg-store`) do not expose Cassandra settings as command-line arguments
   - Other processors expose them with different names and formats
   - Help text does not reflect environment-variable default values

Both parameter sets connect to the same Cassandra cluster but use different naming conventions, causing:
- Configuration confusion for users
- Increased maintenance burden
- Inconsistent documentation
- Potential for misconfiguration
- Inability to override settings via command-line arguments in some processors

## Proposed Solution

### 1. Standardize Parameter Names

All modules will use consistent `cassandra_*` parameter names:
- `cassandra_host` - List of hosts (stored internally as a list)
- `cassandra_username` - Username for authentication
- `cassandra_password` - Password for authentication

### 2. Command-Line Arguments

All processors MUST expose Cassandra configuration via command-line arguments:
- `--cassandra-host` - Comma-separated list of hosts
- `--cassandra-username` - Username for authentication
- `--cassandra-password` - Password for authentication

### 3. Environment Variable Fallback

If command-line arguments are not explicitly provided, the system will check environment variables:
- `CASSANDRA_HOST` - Comma-separated list of hosts
- `CASSANDRA_USERNAME` - Username for authentication
- `CASSANDRA_PASSWORD` - Password for authentication

### 4. Default Values

If neither command-line arguments nor environment variables are specified:
- `cassandra_host` defaults to `["cassandra"]`
- `cassandra_username` defaults to `None` (no authentication)
- `cassandra_password` defaults to `None` (no authentication)

### 5. Help Text Requirements

The `--help` output must:
- Show environment-variable values as the defaults when they are set
- Never display password values (show `****` or `<set>` instead)
- Clearly indicate the resolution order in the help text

Example help output:
```
--cassandra-host HOST
    Cassandra host list, comma-separated (default: prod-cluster-1,prod-cluster-2)
    [from CASSANDRA_HOST environment variable]

--cassandra-username USERNAME
    Cassandra username (default: cassandra_user)
    [from CASSANDRA_USERNAME environment variable]

--cassandra-password PASSWORD
    Cassandra password (default: <set from environment>)
```

## Implementation Details

### Parameter Resolution Order

For each Cassandra parameter, the resolution order will be:
1. Command-line argument value
2. Environment variable (`CASSANDRA_*`)
3. Default value

### Host Parameter Handling

The `cassandra_host` parameter:
- The command line accepts a comma-separated string: `--cassandra-host "host1,host2,host3"`
- The environment variable accepts a comma-separated string: `CASSANDRA_HOST="host1,host2,host3"`
- Always stored internally as a list: `["host1", "host2", "host3"]`
- Single host: `"localhost"` → converted to `["localhost"]`
- Already a list: `["host1", "host2"]` → used as is

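
The resolution order and the host normalisation rules can be combined in one small function. This is a minimal sketch: `resolve_hosts` and its injectable `env` parameter are hypothetical and exist only to make the rules above testable; dropping empty entries such as a trailing comma is an added assumption.

```python
import os
from typing import List, Mapping, Optional, Union


def resolve_hosts(cli_value: Optional[Union[str, List[str]]] = None,
                  env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Resolve the Cassandra host list: CLI value, then environment, then default."""
    env = os.environ if env is None else env
    if cli_value is not None:
        value = cli_value                                 # 1. command-line argument
    else:
        value = env.get("CASSANDRA_HOST", "cassandra")    # 2. environment, 3. default
    if isinstance(value, str):
        # Split the comma-separated string and drop empty entries.
        return [h.strip() for h in value.split(",") if h.strip()]
    return list(value)                                    # already a list: use as is
```

For example, `resolve_hosts("host1, host2", env={})` yields `["host1", "host2"]`, and with no argument and an empty environment the result is `["cassandra"]`.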
### Authentication Logic

Authentication is used when both `cassandra_username` and `cassandra_password` are provided:
```python
if cassandra_username and cassandra_password:
    # Use SSL context and PlainTextAuthProvider
    pass
else:
    # Connect without authentication
    pass
```

## Files Requiring Changes

### Modules using `graph_*` parameters (to be changed):
- `trustgraph-flow/trustgraph/storage/triples/cassandra/write.py`
- `trustgraph-flow/trustgraph/storage/objects/cassandra/write.py`
- `trustgraph-flow/trustgraph/storage/rows/cassandra/write.py`
- `trustgraph-flow/trustgraph/query/triples/cassandra/service.py`

### Modules using `cassandra_*` parameters (to be updated with environment fallback):
- `trustgraph-flow/trustgraph/tables/config.py`
- `trustgraph-flow/trustgraph/tables/knowledge.py`
- `trustgraph-flow/trustgraph/tables/library.py`
- `trustgraph-flow/trustgraph/storage/knowledge/store.py`
- `trustgraph-flow/trustgraph/cores/knowledge.py`
- `trustgraph-flow/trustgraph/librarian/librarian.py`
- `trustgraph-flow/trustgraph/librarian/service.py`
- `trustgraph-flow/trustgraph/config/service/service.py`
- `trustgraph-flow/trustgraph/cores/service.py`

### Test files requiring updates:
- `tests/unit/test_cores/test_knowledge_manager.py`
- `tests/unit/test_storage/test_triples_cassandra_storage.py`
- `tests/unit/test_query/test_triples_cassandra_query.py`
- `tests/integration/test_objects_cassandra_integration.py`

## Implementation Approach

### Phase 1: Create a Common Configuration Helper
Create utility functions to keep Cassandra configuration consistent across all processors:

```python
import os
import argparse

def get_cassandra_defaults():
    """Get default values from environment variables or fallback."""
    return {
        'host': os.getenv('CASSANDRA_HOST', 'cassandra'),
        'username': os.getenv('CASSANDRA_USERNAME'),
        'password': os.getenv('CASSANDRA_PASSWORD')
    }

def add_cassandra_args(parser: argparse.ArgumentParser):
    """
    Add standardized Cassandra arguments to an argument parser.
    Shows environment variable values in help text.
    """
    defaults = get_cassandra_defaults()

    # Format help text with env var indication
    host_help = f"Cassandra host list, comma-separated (default: {defaults['host']})"
    if 'CASSANDRA_HOST' in os.environ:
        host_help += " [from CASSANDRA_HOST]"

    username_help = "Cassandra username"
    if defaults['username']:
        username_help += f" (default: {defaults['username']})"
        if 'CASSANDRA_USERNAME' in os.environ:
            username_help += " [from CASSANDRA_USERNAME]"

    # Never echo the password itself; only indicate that one is set
    password_help = "Cassandra password"
    if defaults['password']:
        password_help += " (default: <set>)"
        if 'CASSANDRA_PASSWORD' in os.environ:
            password_help += " [from CASSANDRA_PASSWORD]"

    parser.add_argument(
        '--cassandra-host',
        default=defaults['host'],
        help=host_help
    )

    parser.add_argument(
        '--cassandra-username',
        default=defaults['username'],
        help=username_help
    )

    parser.add_argument(
        '--cassandra-password',
        default=defaults['password'],
        help=password_help
    )

def resolve_cassandra_config(args) -> tuple[list[str], str|None, str|None]:
    """
    Convert argparse args to Cassandra configuration.

    Returns:
        tuple: (hosts_list, username, password)
    """
    # Convert host string to list
    if isinstance(args.cassandra_host, str):
        hosts = [h.strip() for h in args.cassandra_host.split(',')]
    else:
        hosts = args.cassandra_host

    return hosts, args.cassandra_username, args.cassandra_password
```

### Awamu ya 2: Sasisha Moduli Ukitumia Vigezo vya `graph_*`
|
||||
1. Badilisha majina ya vigezo kutoka `graph_*` hadi `cassandra_*`
|
||||
2. Badilisha mbinu (methods) maalum za `add_args()` kwa mbinu za kawaida za `add_cassandra_args()`
|
||||
3. Tumia kazi (functions) za kawaida za usaidizi wa usanidi
|
||||
4. Sasisha maandishi ya utangazaji (documentation strings)
|
||||
|
||||
Mfano wa mabadiliko:
|
||||
```python
|
||||
# OLD CODE
|
||||
@staticmethod
|
||||
def add_args(parser):
|
||||
parser.add_argument(
|
||||
'-g', '--graph-host',
|
||||
default="localhost",
|
||||
help=f'Graph host (default: localhost)'
|
||||
)
|
||||
parser.add_argument(
|
||||
'--graph-username',
|
||||
default=None,
|
||||
help=f'Cassandra username'
|
||||
)
|
||||
|
||||
# NEW CODE
|
||||
@staticmethod
|
||||
def add_args(parser):
|
||||
FlowProcessor.add_args(parser)
|
||||
add_cassandra_args(parser) # Use standard helper
|
||||
```

### Phase 3: Update Modules Using `cassandra_*` Parameters

1. Add command-line argument support where it is missing (e.g., `kg-store`)
2. Replace existing argument definitions with `add_cassandra_args()`
3. Use `resolve_cassandra_config()` for consistent resolution
4. Ensure consistent handling of the host list

### Phase 4: Update Tests and Documentation

1. Update all test files to use the new parameter names
2. Update CLI documentation
3. Update API documentation
4. Add environment variable documentation

## Backward Compatibility

To maintain backward compatibility during the transition:

1. **Deprecation warnings** for `graph_*` parameters
2. **Parameter aliasing** - accept both old and new names initially
3. **Phased rollout** over multiple releases
4. **Documentation updates** with a migration guide

Example backward compatibility code:

```python
import warnings

def __init__(self, **params):
    # Handle deprecated graph_* parameters
    if 'graph_host' in params:
        warnings.warn("graph_host is deprecated, use cassandra_host", DeprecationWarning)
        params.setdefault('cassandra_host', params.pop('graph_host'))

    if 'graph_username' in params:
        warnings.warn("graph_username is deprecated, use cassandra_username", DeprecationWarning)
        params.setdefault('cassandra_username', params.pop('graph_username'))

    # ... continue with standard resolution
```
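
The aliasing rule can also be written as a table-driven helper, which makes the "new name wins" behaviour easy to test. This is a minimal sketch: `remap_deprecated` and the `DEPRECATED` table are hypothetical names, and the `graph_password` entry is an assumption that follows the naming pattern rather than something shown in the code above.

```python
import warnings

# Hypothetical table of deprecated names and their replacements.
DEPRECATED = {
    "graph_host": "cassandra_host",
    "graph_username": "cassandra_username",
    "graph_password": "cassandra_password",  # assumed by analogy
}


def remap_deprecated(params):
    """Return a copy of params with deprecated keys moved to their new names."""
    out = dict(params)
    for old, new in DEPRECATED.items():
        if old in out:
            warnings.warn(
                f"{old} is deprecated, use {new}",
                DeprecationWarning,
                stacklevel=2,
            )
            value = out.pop(old)
            # If the caller supplied both names, the new one is kept.
            out.setdefault(new, value)
    return out


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    resolved = remap_deprecated(
        {"graph_host": "old", "cassandra_host": "new", "graph_username": "u"}
    )
```

Each deprecated key produces one warning, and the old key never survives into the resolved parameters.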

## Testing Strategy

1. **Unit tests** for the configuration resolution logic
2. **Integration tests** with various configuration combinations
3. **Environment variable tests**
4. **Backward compatibility tests** with deprecated parameters
5. **Docker compose tests** with environment variables

## Documentation Updates

1. Update all CLI command documentation
2. Update API documentation
3. Create a migration guide
4. Update Docker compose examples
5. Update the configuration reference documentation

## Risks and Mitigation

| Risk | Impact | Mitigation |
|------|--------|------------|
| Breaking changes for users | High | Implement a backward compatibility period |
| Configuration confusion during the transition | Medium | Clear documentation and deprecation warnings |
| Test failures | Medium | Comprehensive test updates |
| Docker deployment issues | High | Update all Docker compose examples |

## Success Criteria

- [ ] All modules use consistent `cassandra_*` parameter names
- [ ] All processors expose Cassandra settings via command-line arguments
- [ ] Command-line help text shows environment variable defaults
- [ ] Password values are never displayed in help text
- [ ] Environment variable fallback works correctly
- [ ] `cassandra_host` is consistently handled as a list internally
- [ ] Backward compatibility is maintained for at least 2 releases
- [ ] All tests pass with the new configuration system
- [ ] Documentation is fully updated
- [ ] Docker compose examples work with environment variables

## Timeline

- **Week 1:** Implement the common configuration helper and update `graph_*` modules
- **Week 2:** Add environment variable support to existing `cassandra_*` modules
- **Week 3:** Update tests and documentation
- **Week 4:** Integration testing and bug fixes

## Future Considerations

- Consider extending this pattern to other database configurations (e.g., Elasticsearch)
- Implement configuration validation and better error messages
- Add support for Cassandra connection configuration (e.g., pooling)
- Consider adding support for configuration files (.env files)

687
docs/tech-specs/sw/cassandra-performance-refactor.sw.md
Normal file
|
|
@ -0,0 +1,687 @@
|
|||
---
layout: default
title: "Tech Spec: Cassandra Knowledge Base Performance Refactor"
parent: "Swahili (Beta)"
---

# Tech Spec: Cassandra Knowledge Base Performance Refactor

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

**Status:** Draft
**Author:** Assistant
**Date:** 2025-09-18

## Overview

This specification addresses performance issues in TrustGraph's Cassandra knowledge base implementation and proposes optimizations for storing and querying RDF data.

## Current Implementation

### Schema Design

The current implementation uses a single-table design in `trustgraph-flow/trustgraph/direct/cassandra_kg.py`:

```sql
CREATE TABLE triples (
    collection text,
    s text,
    p text,
    o text,
    PRIMARY KEY (collection, s, p, o)
);
```

**Secondary indexes:**
- `triples_s` ON `s` (subject)
- `triples_p` ON `p` (predicate)
- `triples_o` ON `o` (object)

### Query Patterns

The current implementation supports 8 distinct query patterns:

1. **get_all(collection, limit=50)** - Retrieve all triples for a collection
   ```sql
   SELECT s, p, o FROM triples WHERE collection = ? LIMIT 50
   ```

2. **get_s(collection, s, limit=10)** - Query by subject
   ```sql
   SELECT p, o FROM triples WHERE collection = ? AND s = ? LIMIT 10
   ```

3. **get_p(collection, p, limit=10)** - Query by predicate
   ```sql
   SELECT s, o FROM triples WHERE collection = ? AND p = ? LIMIT 10
   ```

4. **get_o(collection, o, limit=10)** - Query by object
   ```sql
   SELECT s, p FROM triples WHERE collection = ? AND o = ? LIMIT 10
   ```

5. **get_sp(collection, s, p, limit=10)** - Query by subject + predicate
   ```sql
   SELECT o FROM triples WHERE collection = ? AND s = ? AND p = ? LIMIT 10
   ```

6. **get_po(collection, p, o, limit=10)** - Query by predicate + object ⚠️
   ```sql
   SELECT s FROM triples WHERE collection = ? AND p = ? AND o = ? LIMIT 10 ALLOW FILTERING
   ```

7. **get_os(collection, o, s, limit=10)** - Query by object + subject ⚠️
   ```sql
   SELECT p FROM triples WHERE collection = ? AND o = ? AND s = ? LIMIT 10 ALLOW FILTERING
   ```

8. **get_spo(collection, s, p, o, limit=10)** - Exact triple match
   ```sql
   SELECT s as x FROM triples WHERE collection = ? AND s = ? AND p = ? AND o = ? LIMIT 10
   ```

### Current Architecture

**File: `trustgraph-flow/trustgraph/direct/cassandra_kg.py`**
- A single `KnowledgeGraph` class handles all operations
- Connection pooling through a global `_active_clusters` list
- Fixed table name: `"triples"`
- One keyspace per user model
- SimpleStrategy replication with a factor of 1

**Integration points:**
- **Write path:** `trustgraph-flow/trustgraph/storage/triples/cassandra/write.py`
- **Query path:** `trustgraph-flow/trustgraph/query/triples/cassandra/service.py`
- **Knowledge store:** `trustgraph-flow/trustgraph/tables/knowledge.py`

## Performance Issues Identified

### Schema-Level Issues

1. **Inefficient Primary Key Design**
   - Current: `PRIMARY KEY (collection, s, p, o)`
   - Results in poor clustering for common access patterns
   - Forces expensive use of secondary indexes

2. **Secondary Index Overuse** ⚠️
   - Three secondary indexes on high-cardinality columns (s, p, o)
   - Secondary indexes in Cassandra are expensive and do not scale well
   - Queries 6 and 7 require `ALLOW FILTERING`, which indicates poor data modeling

3. **Hot Partition Risk**
   - The single partition key `collection` can create hot partitions
   - A large collection ends up concentrated on a single node
   - There is no distribution strategy for load balancing

### Query-Level Issues

1. **ALLOW FILTERING Usage** ⚠️
   - Two query types (get_po, get_os) require `ALLOW FILTERING`
   - These queries scan far more data than they return and are extremely expensive
   - Performance degrades quickly as the data size grows

2. **Inefficient Access Patterns**
   - No optimization for common RDF query patterns
   - No compound indexes for frequently used query combinations
   - No consideration of graph traversal patterns

3. **Lack of Query Optimization**
   - No caching of prepared statements
   - No query hints or optimization strategies
   - No consideration of pagination beyond a simple LIMIT

## Problem Statement

The current Cassandra knowledge base implementation has two critical performance bottlenecks:

### 1. Inefficient get_po Query Performance

The `get_po(collection, p, o)` query does not scale because it requires `ALLOW FILTERING`:

```sql
SELECT s FROM triples WHERE collection = ? AND p = ? AND o = ? LIMIT 10 ALLOW FILTERING
```

**Why this is a problem:**
- `ALLOW FILTERING` forces Cassandra to scan everything stored for the collection
- Performance degrades linearly with data size
- This is a common RDF query pattern (finding the subjects that have a specific predicate-object relationship)
- It puts significant load on the cluster as the data grows
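
The rule behind this can be approximated in a few lines. The sketch below is a simplified model, not Cassandra's actual planner: it considers equality restrictions only and ignores secondary indexes. The function name `needs_filtering` is invented here, and the second key layout is one alternative shown purely for contrast.

```python
def needs_filtering(partition_key, clustering, restricted):
    """Return True if equality restrictions on `restricted` cannot be served
    by a primary-key lookup alone (simplified: equality only, no indexes)."""
    restricted = set(restricted)
    # Every partition key column must be bound to locate the partition.
    if not set(partition_key) <= restricted:
        return True
    remaining = restricted - set(partition_key)
    # Bound clustering columns must form a contiguous prefix of the order.
    for column in clustering:
        if column in remaining:
            remaining.discard(column)
        else:
            break
    return bool(remaining)


current = (("collection",), ("s", "p", "o"))
alternative = (("collection", "p"), ("o", "s"))  # one possible layout

po_on_current = needs_filtering(*current, ["collection", "p", "o"])
sp_on_current = needs_filtering(*current, ["collection", "s", "p"])
po_on_alternative = needs_filtering(*alternative, ["collection", "p", "o"])
```

With the current key, `get_po` skips the leading clustering column `s`, so the restriction on `p` and `o` is not a prefix and the query needs filtering, whereas `get_sp` is a clean prefix.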

### 2. Suboptimal Clustering Strategy

The current primary key `PRIMARY KEY (collection, s, p, o)` provides little clustering benefit:

**Problems with the current clustering:**
- `collection` as the partition key does not distribute data effectively
- Most collections contain diverse data, which makes clustering ineffective
- No consideration of common access patterns in RDF queries
- Large collections create hot partitions on single nodes
- The clustering columns (s, p, o) are not optimized for typical graph traversal patterns

**Impact:**
- Queries do not benefit from data locality
- Poor cache utilization
- Uneven load distribution across cluster nodes
- Scalability bottlenecks as collections grow

## Proposed Solution: 4-Table Denormalization Strategy

### Overview

Replace the single `triples` table with four purpose-built tables, each optimized for specific query patterns. This removes the need for secondary indexes and ALLOW FILTERING while providing good performance for all query types. The fourth table makes collection deletion efficient despite the compound partition keys.

### New Schema Design

**Table 1: Subject-Centric Queries (triples_s)**
```sql
CREATE TABLE triples_s (
    collection text,
    s text,
    p text,
    o text,
    PRIMARY KEY ((collection, s), p, o)
);
```
- **Optimizes:** get_s, get_sp
- **Partition Key:** (collection, s) - Better distribution than collection alone
- **Clustering:** (p, o) - Enables efficient predicate/object lookups for a subject

**Table 2: Predicate-Object Queries (triples_p)**
```sql
CREATE TABLE triples_p (
    collection text,
    p text,
    o text,
    s text,
    PRIMARY KEY ((collection, p), o, s)
);
```
- **Optimizes:** get_p, get_po (eliminates ALLOW FILTERING!)
- **Partition Key:** (collection, p) - Direct access by predicate
- **Clustering:** (o, s) - Efficient object-subject traversal

**Table 3: Object-Centric Queries (triples_o)**
```sql
CREATE TABLE triples_o (
    collection text,
    o text,
    s text,
    p text,
    PRIMARY KEY ((collection, o), s, p)
);
```
- **Optimizes:** get_o, get_os
- **Partition Key:** (collection, o) - Direct access by object
- **Clustering:** (s, p) - Efficient subject-predicate traversal

**Table 4: Collection Management and SPO Queries (triples_collection)**
```sql
CREATE TABLE triples_collection (
    collection text,
    s text,
    p text,
    o text,
    PRIMARY KEY (collection, s, p, o)
);
```
- **Optimizes:** get_spo, delete_collection
- **Partition Key:** collection only - Enables efficient collection-level operations
- **Clustering:** (s, p, o) - Standard triple ordering
- **Purpose:** Dual use, for exact SPO lookups and as a deletion index

### Query Mapping

| Original Query | Target Table | Performance Improvement |
|----------------|--------------|-------------------------|
| get_all(collection) | triples_s | ALLOW FILTERING (acceptable for a scan) |
| get_s(collection, s) | triples_s | Direct partition access |
| get_p(collection, p) | triples_p | Direct partition access |
| get_o(collection, o) | triples_o | Direct partition access |
| get_sp(collection, s, p) | triples_s | Partition + clustering |
| get_po(collection, p, o) | triples_p | **No more ALLOW FILTERING!** |
| get_os(collection, o, s) | triples_o | Partition + clustering |
| get_spo(collection, s, p, o) | triples_collection | Exact key lookup |
| delete_collection(collection) | triples_collection | Read index, batch delete |
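
The mapping above can be expressed as a lookup keyed by which triple fields are bound. This is an illustrative sketch only: `ROUTES` and `route` are hypothetical names, and the spec does not say that the implementation routes queries this way.

```python
# Table chosen for each combination of bound triple fields, following the
# query mapping table. Each key is the set of bound field names.
ROUTES = {
    frozenset(): "triples_s",               # get_all (scan)
    frozenset("s"): "triples_s",            # get_s
    frozenset("p"): "triples_p",            # get_p
    frozenset("o"): "triples_o",            # get_o
    frozenset("sp"): "triples_s",           # get_sp
    frozenset("po"): "triples_p",           # get_po
    frozenset("os"): "triples_o",           # get_os
    frozenset("spo"): "triples_collection", # get_spo
}


def route(s=None, p=None, o=None):
    """Return the table name that serves a query binding the given fields."""
    bound = frozenset(
        name for name, value in (("s", s), ("p", p), ("o", o)) if value is not None
    )
    return ROUTES[bound]


po_table = route(p="rdf:type", o="ex:Person")
os_table = route(s="ex:alice", o="ex:Person")
spo_table = route(s="ex:alice", p="rdf:type", o="ex:Person")
```

Because three fields give exactly eight bound/unbound combinations, the dictionary covers every query pattern and a missing entry would surface immediately as a `KeyError`.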

### Collection Deletion Strategy

With compound partition keys, we cannot simply execute `DELETE FROM table WHERE collection = ?`. Instead:

1. **Read phase:** Query `triples_collection` to enumerate all triples:
   ```sql
   SELECT s, p, o FROM triples_collection WHERE collection = ?
   ```
   This is efficient because `collection` is the partition key for this table.

2. **Delete phase:** For each triple (s, p, o), delete from all 4 tables using the full primary key:
   ```sql
   DELETE FROM triples_s WHERE collection = ? AND s = ? AND p = ? AND o = ?
   DELETE FROM triples_p WHERE collection = ? AND p = ? AND o = ? AND s = ?
   DELETE FROM triples_o WHERE collection = ? AND o = ? AND s = ? AND p = ?
   DELETE FROM triples_collection WHERE collection = ? AND s = ? AND p = ? AND o = ?
   ```
   Deletes are batched in groups of 100 for efficiency.

**Trade-off analysis:**
- ✅ Maintains good query performance through distributed partitions
- ✅ No hot partitions for large collections
- ❌ More complex deletion logic (read, then delete)
- ❌ Deletion time is proportional to collection size
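
The read-then-delete loop hinges on two small pieces: splitting the enumerated triples into groups of 100, and ordering the bind values to match each table's primary key. The sketch below shows both without a database connection. The helper names `chunks` and `delete_params` are made up for illustration.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    iterator = iter(iterable)
    while chunk := list(islice(iterator, size)):
        yield chunk


def delete_params(collection, s, p, o):
    """Bind values for one triple, ordered to match each table's primary key."""
    return {
        "triples_s": (collection, s, p, o),
        "triples_p": (collection, p, o, s),
        "triples_o": (collection, o, s, p),
        "triples_collection": (collection, s, p, o),
    }


# 250 triples should produce two full groups and one partial group.
triples = [(f"s{i}", "p", f"o{i}") for i in range(250)]
batch_sizes = [len(chunk) for chunk in chunks(triples, 100)]
first = delete_params("docs", *triples[0])
```

Note that a group of 100 triples corresponds to 400 individual delete statements, one per table, so the group size may need tuning against the cluster's batch size limits.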

### Benefits

1. **Eliminates ALLOW FILTERING** - Every query has an efficient access path (except the get_all scan)
2. **No Secondary Indexes** - Each table IS the index for its query pattern
3. **Better Data Distribution** - Compound partition keys spread the load effectively
4. **Predictable Performance** - Query time is proportional to result size, not total data
5. **Leverages Cassandra's Strengths** - Designed for Cassandra's architecture
6. **Enables Collection Deletion** - triples_collection serves as the deletion index

## Implementation Plan

### Files Requiring Changes

#### Primary Implementation File

**`trustgraph-flow/trustgraph/direct/cassandra_kg.py`** - Complete rewrite required.

**Methods to refactor:**
```python
# Schema initialization
def init(self) -> None: ...  # Replace single table with four tables

# Insert operations
def insert(self, collection, s, p, o) -> None: ...  # Write to all four tables

# Query operations (API unchanged, implementation optimized)
def get_all(self, collection, limit=50): ...       # Use triples_s
def get_s(self, collection, s, limit=10): ...      # Use triples_s
def get_p(self, collection, p, limit=10): ...      # Use triples_p
def get_o(self, collection, o, limit=10): ...      # Use triples_o
def get_sp(self, collection, s, p, limit=10): ...  # Use triples_s
def get_po(self, collection, p, o, limit=10): ...  # Use triples_p (NO ALLOW FILTERING!)
def get_os(self, collection, o, s, limit=10): ...  # Use triples_o
def get_spo(self, collection, s, p, o, limit=10): ...  # Use triples_collection

# Collection management
def delete_collection(self, collection) -> None: ...  # Delete from all four tables
```

#### Integration Files (No Logic Changes Required)

**`trustgraph-flow/trustgraph/storage/triples/cassandra/write.py`**
- No changes needed - uses the existing KnowledgeGraph API
- Benefits automatically from the performance improvements

**`trustgraph-flow/trustgraph/query/triples/cassandra/service.py`**
- No changes needed - uses the existing KnowledgeGraph API
- Benefits automatically from the performance improvements

### Test Files Requiring Updates

#### Unit Tests
**`tests/unit/test_storage/test_triples_cassandra_storage.py`**
- Update test expectations for the schema changes
- Add tests for multi-table consistency
- Verify that no query plan uses ALLOW FILTERING

**`tests/unit/test_query/test_triples_cassandra_query.py`**
- Update performance assertions
- Test all 8 query patterns against the new tables
- Verify that queries are routed to the correct tables

#### Integration Tests
**`tests/integration/test_cassandra_integration.py`**
- End-to-end testing with the new schema
- Performance benchmarking comparisons
- Data consistency verification across tables

**`tests/unit/test_storage/test_cassandra_config_integration.py`**
- Update schema validation tests
- Test migration scenarios

### Implementation Strategy

#### Phase 1: Schema and Core Methods
1. **Rewrite the `init()` method** - Create four tables instead of one
2. **Rewrite the `insert()` method** - Batch writes to all four tables
3. **Implement prepared statements** - For better performance
4. **Add table routing logic** - Direct queries to the best table
5. **Implement collection deletion** - Read from triples_collection, batch delete from all tables

#### Phase 2: Query Method Optimization
1. **Rewrite each `get_*` method** to use the best table
2. **Remove all ALLOW FILTERING usage**
3. **Implement efficient use of clustering keys**
4. **Add query performance logging**

#### Phase 3: Collection Management
1. **Update `delete_collection()`** - Remove from all four tables
2. **Add consistency verification** - Ensure all tables stay in sync
3. **Implement batch operations** - For atomic multi-table operations

### Key Implementation Details

#### Batch Write Strategy
```python
from cassandra.query import BatchStatement

def insert(self, collection, s, p, o):
    batch = BatchStatement()

    # Insert into all four tables
    batch.add(self.insert_subject_stmt, (collection, s, p, o))
    batch.add(self.insert_po_stmt, (collection, p, o, s))
    batch.add(self.insert_object_stmt, (collection, o, s, p))
    batch.add(self.insert_collection_stmt, (collection, s, p, o))

    self.session.execute(batch)
```
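
To see why one logical insert must fan out to every table, it can help to model the four layouts in memory. The class below is a toy stand-in written for this purpose, with dictionary keys playing the role of partition keys. It does not model Cassandra batches, replication or failure handling, and `FourTableModel` is an invented name.

```python
from collections import defaultdict


class FourTableModel:
    """In-memory stand-in for the four tables; dict keys mimic partition keys."""

    def __init__(self):
        self.triples_s = defaultdict(set)           # (collection, s) -> {(p, o)}
        self.triples_p = defaultdict(set)           # (collection, p) -> {(o, s)}
        self.triples_o = defaultdict(set)           # (collection, o) -> {(s, p)}
        self.triples_collection = defaultdict(set)  # collection -> {(s, p, o)}

    def insert(self, collection, s, p, o):
        # One logical write is stored once per layout.
        self.triples_s[(collection, s)].add((p, o))
        self.triples_p[(collection, p)].add((o, s))
        self.triples_o[(collection, o)].add((s, p))
        self.triples_collection[collection].add((s, p, o))

    def get_po(self, collection, p, o):
        # Reads a single (collection, p) partition and matches on object.
        return sorted(s for obj, s in self.triples_p[(collection, p)] if obj == o)


model = FourTableModel()
model.insert("docs", "ex:alice", "rdf:type", "ex:Person")
model.insert("docs", "ex:bob", "rdf:type", "ex:Person")
model.insert("docs", "ex:rex", "rdf:type", "ex:Dog")
people = model.get_po("docs", "rdf:type", "ex:Person")
```

If any one of the four writes were skipped, the query served by that layout would silently miss the triple, which is why the spec groups the writes into a single batch.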

#### Query Routing Logic
```python
def get_po(self, collection, p, o, limit=10):
    # Route to triples_p table - NO ALLOW FILTERING!
    return self.session.execute(
        self.get_po_stmt,
        (collection, p, o, limit)
    )

def get_spo(self, collection, s, p, o, limit=10):
    # Route to triples_collection table for exact SPO lookup
    return self.session.execute(
        self.get_spo_stmt,
        (collection, s, p, o, limit)
    )
```

#### Collection Deletion Logic
```python
from cassandra.query import BatchStatement, SimpleStatement

def delete_collection(self, collection):
    # Step 1: Read all triples from collection table
    rows = self.session.execute(
        f"SELECT s, p, o FROM {self.collection_table} WHERE collection = %s",
        (collection,)
    )

    # Step 2: Batch delete from all 4 tables
    batch = BatchStatement()
    count = 0

    for row in rows:
        s, p, o = row.s, row.p, row.o

        # Delete using the full primary key for each table.
        # Non-prepared statements take %s placeholders in the Python driver;
        # `?` is only valid in prepared statements.
        batch.add(SimpleStatement(
            f"DELETE FROM {self.subject_table} WHERE collection = %s AND s = %s AND p = %s AND o = %s"
        ), (collection, s, p, o))

        batch.add(SimpleStatement(
            f"DELETE FROM {self.po_table} WHERE collection = %s AND p = %s AND o = %s AND s = %s"
        ), (collection, p, o, s))

        batch.add(SimpleStatement(
            f"DELETE FROM {self.object_table} WHERE collection = %s AND o = %s AND s = %s AND p = %s"
        ), (collection, o, s, p))

        batch.add(SimpleStatement(
            f"DELETE FROM {self.collection_table} WHERE collection = %s AND s = %s AND p = %s AND o = %s"
        ), (collection, s, p, o))

        count += 1

        # Execute every 100 triples to avoid oversized batches
        if count % 100 == 0:
            self.session.execute(batch)
            batch = BatchStatement()

    # Execute remaining deletions
    if count % 100 != 0:
        self.session.execute(batch)

    logger.info(f"Deleted {count} triples from collection {collection}")
```

#### Prepared Statement Optimization
```python
def prepare_statements(self):
    # Cache prepared statements for better performance
    self.insert_subject_stmt = self.session.prepare(
        f"INSERT INTO {self.subject_table} (collection, s, p, o) VALUES (?, ?, ?, ?)"
    )
    self.insert_po_stmt = self.session.prepare(
        f"INSERT INTO {self.po_table} (collection, p, o, s) VALUES (?, ?, ?, ?)"
    )
    self.insert_object_stmt = self.session.prepare(
        f"INSERT INTO {self.object_table} (collection, o, s, p) VALUES (?, ?, ?, ?)"
    )
    self.insert_collection_stmt = self.session.prepare(
        f"INSERT INTO {self.collection_table} (collection, s, p, o) VALUES (?, ?, ?, ?)"
    )
    # ... query statements
```

## Migration Strategy

### Data Migration Approach

#### Option 1: Blue-Green Deployment (Recommended)
1. **Deploy the new schema alongside the existing one** - Use different table names temporarily
2. **Dual-write period** - Write to both the old and new schemas during the transition
3. **Background migration** - Copy existing data to the new tables
4. **Switch reads** - Route queries to the new tables once the data has been migrated
5. **Drop the old table** - After a verification period

#### Option 2: In-Place Migration
1. **Schema addition** - Create the new tables in the existing keyspace
2. **Data migration script** - Batch copy from the old table to the new tables
3. **Application update** - Deploy the new code once the migration completes
4. **Old table cleanup** - Remove the old table and its indexes

### Backward Compatibility

#### Deployment Strategy
```python
import os

# Environment variable to control table usage during migration
USE_LEGACY_TABLES = os.getenv('CASSANDRA_USE_LEGACY', 'false').lower() == 'true'

class KnowledgeGraph:
    # Constructor arguments are elided in the original spec
    def __init__(self, *args, **kwargs):
        if USE_LEGACY_TABLES:
            self.init_legacy_schema()
        else:
            self.init_optimized_schema()
```
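
One design detail worth weighing is when the flag is read. The sketch below reads it at call time from an injectable mapping, which makes the toggle testable without mutating the process environment. `use_legacy_tables`, `select_tables` and the wider set of accepted truthy values are assumptions for illustration; the snippet above only accepts `true`.

```python
import os

# Assumed set of accepted values; the spec's snippet only checks for "true".
TRUTHY = {"1", "true", "yes", "on"}


def use_legacy_tables(env=os.environ):
    """Return True when CASSANDRA_USE_LEGACY holds a truthy value."""
    return env.get("CASSANDRA_USE_LEGACY", "false").strip().lower() in TRUTHY


def select_tables(env=os.environ):
    # Table names follow the schemas described in this spec.
    if use_legacy_tables(env):
        return ["triples"]
    return ["triples_s", "triples_p", "triples_o", "triples_collection"]


legacy = select_tables({"CASSANDRA_USE_LEGACY": "TRUE"})
optimized = select_tables({})
```

A module-level constant, as in the spec's snippet, is evaluated once at import, so a changed environment variable only takes effect after a restart.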

#### Migration Script
```python
# Assumption: `batched` is itertools.batched (Python 3.12+); the original
# does not say where it comes from.
from itertools import batched

from cassandra.query import BatchStatement

def migrate_data():
    # Read from old table
    old_triples = session.execute("SELECT collection, s, p, o FROM triples")

    # Batch write to new tables
    for batch in batched(old_triples, 100):
        batch_stmt = BatchStatement()
        for row in batch:
            # Add to all four new tables
            batch_stmt.add(insert_subject_stmt, (row.collection, row.s, row.p, row.o))
            batch_stmt.add(insert_po_stmt, (row.collection, row.p, row.o, row.s))
            batch_stmt.add(insert_object_stmt, (row.collection, row.o, row.s, row.p))
            batch_stmt.add(insert_collection_stmt, (row.collection, row.s, row.p, row.o))
        session.execute(batch_stmt)
```

### Validation Strategy

#### Data Consistency Checks
```python
def validate_migration(collection):
    # Count total records in old vs new tables.
    # triples_collection is used for the new count because it is the only
    # new table partitioned by collection alone.
    old_count = session.execute(
        "SELECT COUNT(*) FROM triples WHERE collection = %s", (collection,)
    ).one()[0]
    new_count = session.execute(
        "SELECT COUNT(*) FROM triples_collection WHERE collection = %s", (collection,)
    ).one()[0]

    assert old_count == new_count, f"Record count mismatch: {old_count} vs {new_count}"

    # Spot check random samples
    sample_queries = generate_test_queries()
    for query in sample_queries:
        old_result = execute_legacy_query(query)
        new_result = execute_optimized_query(query)
        assert old_result == new_result, f"Query results differ for {query}"
```
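
Counts alone cannot show which table is missing a row. A complementary check, sketched here on in-memory rows, normalizes each table's column order back to (collection, s, p, o) and reports what each table lacks relative to the union. `normalize` and `find_drift` are hypothetical helpers; in practice the rows would come from per-table reads.

```python
# Column order of each table, matching the schemas in this spec.
COLUMN_ORDER = {
    "triples_s": ("collection", "s", "p", "o"),
    "triples_p": ("collection", "p", "o", "s"),
    "triples_o": ("collection", "o", "s", "p"),
    "triples_collection": ("collection", "s", "p", "o"),
}


def normalize(table, rows):
    """Reorder rows from a table's column order into (collection, s, p, o)."""
    order = COLUMN_ORDER[table]
    index = [order.index(name) for name in ("collection", "s", "p", "o")]
    return {tuple(row[i] for i in index) for row in rows}


def find_drift(tables):
    """Return, per table, the triples it lacks relative to the union of all tables."""
    normalized = {name: normalize(name, rows) for name, rows in tables.items()}
    union = set().union(*normalized.values())
    return {name: union - rows for name, rows in normalized.items() if union - rows}


tables = {
    "triples_s": [("c", "s1", "p1", "o1")],
    "triples_p": [("c", "p1", "o1", "s1")],
    "triples_o": [],  # simulated partial write
    "triples_collection": [("c", "s1", "p1", "o1")],
}
drift = find_drift(tables)
```

An empty result means the tables agree. Here the simulated partial write shows up as a single missing triple in `triples_o`.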

## Testing Strategy

### Performance Testing

#### Benchmark Scenarios
1. **Query Performance Comparison**
   - Before/after performance metrics for all 8 query types
   - Focus on the `get_po` improvement (removing `ALLOW FILTERING`)
   - Measure query latency at various data sizes

2. **Load Testing**
   - Concurrent query execution
   - Write throughput with batch operations
   - Memory and CPU utilization

3. **Scalability Testing**
   - Performance with increasing collection sizes
   - Distribution of multi-collection queries
   - Cluster node utilization

#### Test Data Sets
- **Small:** 10K triples per collection
- **Medium:** 100K triples per collection
- **Large:** 1M+ triples per collection
- **Multiple collections:** Test partition distribution

### Functional Testing

#### Unit Test Updates
```python
from unittest.mock import patch

# Example test structure for new implementation
# (`kg` and `assert_triple_exists` are fixtures/helpers defined elsewhere)
class TestCassandraKGPerformance:
    def test_get_po_no_allow_filtering(self):
        # Verify get_po queries don't use ALLOW FILTERING
        with patch('cassandra.cluster.Session.execute') as mock_execute:
            kg.get_po('test_collection', 'predicate', 'object')
            executed_query = mock_execute.call_args[0][0]
            assert 'ALLOW FILTERING' not in str(executed_query)

    def test_multi_table_consistency(self):
        # Verify all tables stay in sync
        kg.insert('test', 's1', 'p1', 'o1')

        # Check all tables contain the triple
        assert_triple_exists('triples_s', 'test', 's1', 'p1', 'o1')
        assert_triple_exists('triples_p', 'test', 'p1', 'o1', 's1')
        assert_triple_exists('triples_o', 'test', 'o1', 's1', 'p1')
        assert_triple_exists('triples_collection', 'test', 's1', 'p1', 'o1')
```

#### Integration Test Updates
```python
class TestCassandraIntegration:
    def test_query_performance_regression(self):
        # Ensure new implementation is faster than old
        old_time = benchmark_legacy_get_po()
        new_time = benchmark_optimized_get_po()
        assert new_time < old_time * 0.5  # At least 50% improvement

    def test_end_to_end_workflow(self):
        # Test complete write -> query -> delete cycle
        # Verify no performance degradation in integration
        pass
```
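
The benchmark helpers referenced in the test above are not defined in this spec. One way they might be built is on top of a small timing utility such as the following sketch, where `median_latency` and `meets_target` are invented names and the median is chosen to dampen outliers from a single slow call.

```python
import statistics
import time


def median_latency(fn, repeats=5):
    """Median wall-clock seconds over `repeats` calls of fn()."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def meets_target(old_seconds, new_seconds, max_ratio=0.5):
    """True when the new latency is below max_ratio of the old latency."""
    return new_seconds < old_seconds * max_ratio


# Placeholder workload; a real benchmark would call get_po against a cluster.
baseline = median_latency(lambda: sum(range(1000)))
```

The `max_ratio=0.5` default mirrors the 50% improvement threshold asserted in the integration test.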

### Rollback Plan

#### Quick Rollback Strategy
1. **Environment variable toggle** - Switch back to the legacy table immediately
2. **Keep the legacy table** - Do not drop it until performance is proven
3. **Monitoring alerts** - Automated rollback triggers based on error rates and latency

#### Rollback Validation
```python
import os

def rollback_to_legacy():
    # Set environment variable
    os.environ['CASSANDRA_USE_LEGACY'] = 'true'

    # Restart services to pick up change
    restart_cassandra_services()

    # Validate functionality
    run_smoke_tests()
```
|
||||
|
||||
## Risks and Considerations

### Performance Risks

- **Write latency increase** - 4 write operations per insert (33% more than the 3-table approach)
- **Storage overhead** - 4x storage requirement (33% more than the 3-table approach)
- **Batch write failures** - Proper error handling is required
- **Deletion complexity** - Deleting a collection requires a read-then-delete loop

### Operational Risks

- **Migration complexity** - Data migration for large datasets
- **Consistency challenges** - Ensuring all tables stay synchronized
- **Monitoring gaps** - New metrics are needed for multi-table operations

### Mitigation Strategies

1. **Gradual rollout** - Start with small collections
2. **Comprehensive monitoring** - Track all performance metrics
3. **Automated validation** - Continuous consistency checking
4. **Quick rollback capability** - Environment-based table selection

## Success Criteria

### Performance Improvements

- [ ] **Eliminate ALLOW FILTERING** - `get_po` and `get_os` queries run without filtering
- [ ] **Query latency reduction** - 50% or greater improvement in query response times
- [ ] **Better load distribution** - No hot partitions, even load across cluster nodes
- [ ] **Scalable performance** - Query time proportional to result size, not total data

### Functional Requirements

- [ ] **API compatibility** - All existing code continues to work unchanged
- [ ] **Data consistency** - All tables remain synchronized
- [ ] **Zero data loss** - Migration preserves all existing triples
- [ ] **Backward compatibility** - Ability to roll back to the legacy schema

### Operational Requirements

- [ ] **Safe migration** - Blue-green deployment with rollback capability
- [ ] **Monitoring coverage** - Comprehensive metrics for multi-table operations
- [ ] **Test coverage** - All query patterns tested with performance benchmarks
- [ ] **Documentation** - Updated deployment and operational procedures

## Timeline

### Phase 1: Implementation

- [ ] Rewrite `cassandra_kg.py` with the multi-table schema
- [ ] Implement batch write operations
- [ ] Add prepared statement optimization
- [ ] Update unit tests

### Phase 2: Integration Testing

- [ ] Update integration tests
- [ ] Performance benchmarking
- [ ] Load testing with realistic data volumes
- [ ] Validation scripts for data consistency

### Phase 3: Migration Planning

- [ ] Blue-green deployment scripts
- [ ] Data migration tools
- [ ] Monitoring dashboard updates
- [ ] Rollback procedures

### Phase 4: Production Deployment

- [ ] Gradual rollout to production
- [ ] Performance monitoring and validation
- [ ] Legacy table cleanup
- [ ] Documentation updates

## Conclusion

This multi-table denormalization strategy directly addresses the two critical performance bottlenecks:

1. **Eliminates expensive ALLOW FILTERING** by providing an optimal table structure for each query pattern
2. **Improves clustering effectiveness** through composite partition keys that distribute load properly

The approach leverages Cassandra's strengths while maintaining complete API compatibility, so existing code benefits automatically from the performance improvements.

410
docs/tech-specs/sw/collection-management.sw.md
Normal file
@ -0,0 +1,410 @@
---
layout: default
title: "Collection Management Technical Specification"
parent: "Swahili (Beta)"
---

# Collection Management Technical Specification

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Overview

This specification describes the collection management capabilities for TrustGraph. It requires explicit collection creation and provides direct control over the collection lifecycle. Collections must be explicitly created before use, which ensures proper synchronization between the librarian metadata and all storage backends. The feature supports four primary use cases:

1. **Collection Creation**: Explicitly create collections before storing data
2. **Collection Listing**: View all existing collections in the system
3. **Collection Metadata Management**: Update collection names, descriptions, and tags
4. **Collection Deletion**: Remove collections and their associated data across all storage types

## Goals

- **Explicit Collection Creation**: Require collections to be created before data can be stored
- **Storage Synchronization**: Ensure collections exist in all storage backends (vectors, objects, triples)
- **Collection Visibility**: Enable users to list and inspect all collections in their environment
- **Collection Cleanup**: Allow deletion of collections that are no longer needed
- **Collection Organization**: Support labels and tags for better collection tracking and discovery
- **Metadata Management**: Associate meaningful metadata with collections for operational clarity
- **Collection Discovery**: Make it easier to find specific collections through filtering and search
- **Operational Transparency**: Provide clear visibility into collection lifecycle and usage
- **Resource Management**: Enable cleanup of unused collections to optimize resource utilization
- **Data Integrity**: Prevent orphaned collections in storage that have no metadata tracking

## Background

Previously, collections in TrustGraph were created implicitly during data loading operations. This led to synchronization issues where collections could exist in storage backends without corresponding metadata in the librarian, which created management challenges and potentially orphaned data.

The explicit collection creation model addresses these issues by:

- Requiring collections to be created before use via `tg-set-collection`
- Broadcasting collection creation to all storage backends
- Maintaining synchronized state between librarian metadata and storage
- Preventing writes to non-existent collections
- Providing clear collection lifecycle management

This specification defines the explicit collection management model. By requiring explicit collection creation, TrustGraph ensures:

- Collections are tracked in librarian metadata from creation
- All storage backends are aware of a collection before receiving data for it
- No orphaned collections exist in storage
- Clear operational visibility and control over the collection lifecycle
- Consistent error handling when operations reference non-existent collections

## Technical Design

### Architecture

The collection management system will be implemented within the existing TrustGraph infrastructure:

1. **Librarian Service Integration**
   - Collection management operations will be added to the existing librarian service
   - No new service is required; it leverages the existing authentication and access patterns
   - Handles collection listing, deletion, and metadata management

   Module: trustgraph-librarian

2. **Cassandra Collection Metadata Table**
   - New table in the existing librarian keyspace
   - Stores collection metadata with user-scoped access
   - Primary key: (user_id, collection_id) for proper multi-tenancy

   Module: trustgraph-librarian

3. **Collection Management CLI**
   - Command-line interface for collection operations
   - Provides list, delete, label, and tag management commands
   - Integrates with the existing CLI framework

   Module: trustgraph-cli

### Data Models

#### Cassandra Collection Metadata Table

Collection metadata will be stored in a structured Cassandra table in the librarian keyspace:

```sql
CREATE TABLE collections (
    user text,
    collection text,
    name text,
    description text,
    tags set<text>,
    created_at timestamp,
    updated_at timestamp,
    PRIMARY KEY (user, collection)
);
```

Table structure:

- **user** + **collection**: Composite primary key ensuring user isolation
- **name**: Human-readable collection name
- **description**: Detailed description of the collection's purpose
- **tags**: Set of tags for categorization and filtering
- **created_at**: Collection creation timestamp
- **updated_at**: Last modification timestamp

This approach allows:

- Multi-tenant collection management with user isolation
- Efficient querying by user and collection
- A flexible tagging system for organization
- Lifecycle tracking for operational insights

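To make the composite key and tag filtering concrete, here is a small in-memory sketch of the same record shape. It is only an illustration: `CollectionRecord` and `InMemoryCollections` are hypothetical names, nothing here talks to Cassandra, and treating the tag filter as "must carry all given tags" is an assumption the spec does not settle.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now():
    return datetime.now(timezone.utc)


@dataclass
class CollectionRecord:
    """In-memory stand-in for one row of the `collections` table."""
    user: str
    collection: str
    name: str = ""
    description: str = ""
    tags: set = field(default_factory=set)
    created_at: datetime = field(default_factory=_now)
    updated_at: datetime = field(default_factory=_now)


class InMemoryCollections:
    """Rows keyed by (user, collection), mirroring the composite primary key."""

    def __init__(self):
        self._rows = {}

    def upsert(self, record):
        key = (record.user, record.collection)
        existing = self._rows.get(key)
        if existing is not None:
            # Preserve the original creation time when updating.
            record.created_at = existing.created_at
        self._rows[key] = record

    def list_for_user(self, user, tags=None):
        """Return one user's collections, optionally filtered by tags."""
        rows = [r for (u, _), r in self._rows.items() if u == user]
        if tags:
            # Assumption: a row matches only if it carries every given tag.
            rows = [r for r in rows if set(tags) <= r.tags]
        return sorted(rows, key=lambda r: r.collection)
```

Because the user is the first key component, listing never crosses tenant boundaries, which is the isolation property the table design aims for.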
#### Collection Lifecycle

Collections are explicitly created in the librarian before data operations can proceed:

1. **Collection Creation** (Two Paths):

   **Path A: User-Initiated Creation** via `tg-set-collection`:
   - The user provides the collection ID, name, description, and tags
   - The librarian creates a metadata record in the `collections` table
   - The librarian broadcasts "create-collection" to all storage backends
   - All storage processors create the collection and confirm success
   - The collection is now ready for data operations

   **Path B: Automatic Creation on Document Submission**:
   - The user submits a document specifying a collection ID
   - The librarian checks whether the collection exists in the metadata table
   - If it does not exist: the librarian creates metadata with defaults (name=collection_id, empty description/tags)
   - The librarian broadcasts "create-collection" to all storage backends
   - All storage processors create the collection and confirm success
   - Document processing proceeds with the collection now established

   Both paths ensure the collection exists in the librarian metadata AND in all storage backends before data operations.

2. **Storage Validation**: Write operations validate that the collection exists:
   - Storage processors check the collection state before accepting writes
   - Writes to non-existent collections return an error
   - This prevents direct writes from bypassing the librarian's collection creation logic

3. **Query Behavior**: Query operations handle non-existent collections gracefully:
   - Queries against non-existent collections return empty results
   - No error is raised for query operations
   - This allows exploration without requiring the collection to exist

4. **Metadata Updates**: Users can update collection metadata after creation:
   - Update the name, description, and tags via `tg-set-collection`
   - Updates apply to the librarian metadata only
   - Storage backends keep the collection, but metadata updates are not propagated to them

5. **Explicit Deletion**: Users delete collections via `tg-delete-collection`:
   - The librarian broadcasts "delete-collection" to all storage backends
   - It waits for confirmation from all storage processors
   - It deletes the librarian metadata record only after storage cleanup is complete
   - This ensures no orphaned data remains in storage

**Key Principle**: The librarian is the single point of control for collection creation. Whether initiated by a user command or by document submission, the librarian ensures proper metadata tracking and storage backend synchronization before allowing data operations.

Operations required:

- **Create Collection**: User operation via `tg-set-collection` OR automatic on document submission
- **Update Collection Metadata**: User operation to modify the name, description, and tags
- **Delete Collection**: User operation to remove a collection and its data across all stores
- **List Collections**: User operation to view collections, with filtering by tags

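The asymmetry between steps 2 and 3 (writes are strict, queries are lenient) can be illustrated with a toy store. This is a sketch under assumptions: `InMemoryTripleStore` and `CollectionNotFound` are hypothetical names, and a real backend would check its own collection state rather than a Python dict.

```python
class CollectionNotFound(Exception):
    """Raised when a write targets a collection that has not been created."""


class InMemoryTripleStore:
    """Toy store: strict on writes, lenient on queries."""

    def __init__(self):
        self._data = {}  # (user, collection) -> list of (s, p, o)

    def create_collection(self, user, collection):
        # Idempotent: re-creating an existing collection keeps its data.
        self._data.setdefault((user, collection), [])

    def collection_exists(self, user, collection):
        return (user, collection) in self._data

    def write(self, user, collection, triple):
        if not self.collection_exists(user, collection):
            raise CollectionNotFound(f"{user}/{collection} does not exist")
        self._data[(user, collection)].append(triple)

    def query(self, user, collection):
        # A missing collection yields an empty result rather than an error.
        return list(self._data.get((user, collection), []))
```

Rejecting writes keeps data from landing in a collection the librarian never registered, while lenient queries let clients probe without first checking that a collection exists.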
#### Multi-Store Collection Management

Collections exist across multiple storage backends in TrustGraph:

- **Vector Stores** (Qdrant, Milvus, Pinecone): Store embeddings and vector data
- **Object Stores** (Cassandra): Store documents and file data
- **Triple Stores** (Cassandra, Neo4j, Memgraph, FalkorDB): Store graph/RDF data

Each store type implements:

- **Collection State Tracking**: Maintain knowledge of which collections exist
- **Collection Creation**: Accept and process "create-collection" operations
- **Collection Validation**: Check that a collection exists before accepting writes
- **Collection Deletion**: Remove all data for the specified collection

The librarian service coordinates collection operations across all store types, ensuring:

- Collections are created in all backends before use
- All backends confirm creation before success is returned
- The collection lifecycle is synchronized across storage types
- Error handling is consistent when collections do not exist

#### Collection State Tracking by Storage Type

Each storage backend tracks collection state differently, depending on its capabilities:

**Cassandra Triple Store:**

- Uses the existing `triples_collection` table
- Creates a system marker when the collection is created
- Query: `SELECT collection FROM triples_collection WHERE collection = ? LIMIT 1`
- Efficient single-partition check for collection existence

**Qdrant/Milvus/Pinecone Vector Stores:**

- Native collection APIs provide existence checking
- Collections are created with the proper vector configuration
- The `collection_exists()` method uses the storage API
- Collection creation validates dimension requirements

**Neo4j/Memgraph/FalkorDB Graph Stores:**

- Use `:CollectionMetadata` nodes to track collections
- Node properties: `{user, collection, created_at}`
- Query: `MATCH (c:CollectionMetadata {user: $user, collection: $collection})`
- Kept separate from data nodes for clean separation
- Enables efficient collection listing and validation

**Cassandra Object Store:**

- Uses a collection metadata table or marker rows
- Similar pattern to the triple store
- Validates the collection before document writes

### APIs

Collection Management APIs (Librarian):

- **Create/Update Collection**: Create a new collection or update existing metadata via `tg-set-collection`
- **List Collections**: Retrieve collections for a user with optional tag filtering
- **Delete Collection**: Remove a collection and its associated data, cascading to all store types

Storage Management APIs (All Storage Processors):

- **Create Collection**: Handle the "create-collection" operation and establish the collection in storage
- **Delete Collection**: Handle the "delete-collection" operation and remove all collection data
- **Collection Exists Check**: Internal validation before accepting write operations

Data Operation APIs (Modified Behavior):

- **Write APIs**: Validate that the collection exists before accepting data, and return an error if it does not
- **Query APIs**: Return empty results for non-existent collections without raising an error

### Implementation Details

The implementation will follow existing TrustGraph patterns for service integration and CLI command structure.

#### Collection Deletion Cascade

When a user initiates collection deletion through the librarian service:

1. **Metadata Validation**: Verify that the collection exists and the user has permission to delete it.
2. **Store Cascade**: The librarian coordinates deletion across all store writers:
   - Vector store writer: Remove embeddings and vector indexes for the user and collection.
   - Object store writer: Remove documents and files for the user and collection.
   - Triple store writer: Remove graph data and triples for the user and collection.
3. **Metadata Cleanup**: Remove the collection metadata record from Cassandra.
4. **Error Handling**: If any store deletion fails, maintain consistency through rollback or retry mechanisms.

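The ordering in the cascade (stores first, metadata last) can be sketched as below. This is a simplified illustration, not the librarian's code: `delete_collection_cascade`, the `metadata` dict, and the per-store callables are hypothetical stand-ins, the permission check from step 1 is left out, and the sketch picks the retry option for step 4 by leaving metadata in place on failure.

```python
def delete_collection_cascade(user, collection, metadata, stores):
    """Delete from every store, then drop metadata only if all succeeded.

    `metadata` is a dict keyed by (user, collection). `stores` maps a store
    name to a callable(user, collection) that raises on failure.
    """
    if (user, collection) not in metadata:
        raise KeyError(f"unknown collection {user}/{collection}")

    failed = []
    for name, delete in stores.items():
        try:
            delete(user, collection)
        except Exception as exc:
            # Keep going so the caller sees every failing store.
            failed.append((name, str(exc)))

    if failed:
        # Metadata is kept so the deletion can be retried later.
        return {"deleted": False, "failed": failed}

    del metadata[(user, collection)]
    return {"deleted": True, "failed": []}
```

Keeping the metadata record until every store has confirmed means a partially deleted collection stays visible and can be deleted again, rather than leaving untracked data behind.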
#### Collection Management Interface

**⚠️ LEGACY APPROACH - REPLACED BY CONFIG-BASED PATTERN**

The queue-based architecture described below has been replaced with a config-based approach using `CollectionConfigHandler`. All storage backends now receive collection updates via config push messages instead of dedicated management queues.

~~All store writers implement a standardized collection management interface with a common schema:~~

~~**Message Schema (`StorageManagementRequest`):**~~

```json
{
  "operation": "create-collection" | "delete-collection",
  "user": "user123",
  "collection": "documents-2024"
}
```

~~**Queue Architecture:**~~

- ~~**Vector Store Management Queue** (`vector-storage-management`): Vector/embedding stores~~
- ~~**Object Store Management Queue** (`object-storage-management`): Object/document stores~~
- ~~**Triple Store Management Queue** (`triples-storage-management`): Graph/RDF stores~~
- ~~**Storage Response Queue** (`storage-management-response`): All responses are sent here~~

**Current Implementation:**

All storage backends now use `CollectionConfigHandler`:

- **Config Push Integration**: Storage services register for config push notifications
- **Automatic Synchronization**: Collections are created/deleted based on config changes
- **Declarative Model**: Collections are defined in the config service, and backends sync to match
- **No Request/Response**: Eliminates coordination overhead and response tracking
- **Collection State Tracking**: Maintained via the `known_collections` cache
- **Idempotent Operations**: Safe to process the same config multiple times

Each storage backend implements:

- `create_collection(user: str, collection: str, metadata: dict)` - Create collection structures
- `delete_collection(user: str, collection: str)` - Remove all collection data
- `collection_exists(user: str, collection: str) -> bool` - Validate before writes

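The declarative, idempotent behaviour described above amounts to reconciling a desired set of collections against the locally known set. The sketch below shows that idea only; it is not the `CollectionConfigHandler` implementation. The method and attribute names come from this spec, but the class `ReconcilingBackend` and the shape of the config payload (a dict mapping `(user, collection)` to metadata) are assumptions.

```python
class ReconcilingBackend:
    """Toy backend that converges on the collections named in a config push."""

    def __init__(self):
        self.known_collections = set()  # {(user, collection)}
        self.log = []                   # operations performed, for inspection

    def create_collection(self, user, collection, metadata):
        self.known_collections.add((user, collection))
        self.log.append(("create", user, collection))

    def delete_collection(self, user, collection):
        self.known_collections.discard((user, collection))
        self.log.append(("delete", user, collection))

    def collection_exists(self, user, collection):
        return (user, collection) in self.known_collections

    def on_collection_config(self, desired):
        """Assumed payload: {(user, collection): metadata_dict}."""
        wanted = set(desired)
        # Create what is wanted but unknown, delete what is known but unwanted.
        for key in sorted(wanted - self.known_collections):
            self.create_collection(key[0], key[1], desired[key])
        for key in sorted(self.known_collections - wanted):
            self.delete_collection(key[0], key[1])
```

Because only the difference between the two sets is acted on, replaying the same config push produces no further operations, which is what makes repeated delivery safe.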
#### Cassandra Triple Store Refactor

As part of this implementation, the Cassandra triple store will be refactored from a table-per-collection model to a unified table model:

**Current Architecture:**

- Keyspace per user, separate table per collection
- Schema: `(s, p, o)` with `PRIMARY KEY (s, p, o)`
- Table names: each user collection becomes a separate Cassandra table

**New Architecture:**

- Keyspace per user, a single "triples" table for all collections
- Schema: `(collection, s, p, o)` with `PRIMARY KEY (collection, s, p, o)`
- Collection isolation through partitioning on `collection`

**Changes Required:**

1. **TrustGraph Class Refactor** (`trustgraph/direct/cassandra.py`):
   - Remove the `table` parameter from the constructor and use the unified "triples" table
   - Add a `collection` parameter to all methods
   - Update the schema to include `collection` as the first column
   - **Index Updates:** New indexes will be created to support all 8 query patterns:
     - Index on `(s)` for subject-based queries
     - Index on `(p)` for predicate-based queries
     - Index on `(o)` for object-based queries
     - Note: Cassandra does not support multi-column secondary indexes, so these are single-column indexes

   **Query Pattern Performance:**

   - ✅ `get_all()` - partition scan on `collection`
   - ✅ `get_s(s)` - uses the primary key efficiently (`collection, s`)
   - ✅ `get_p(p)` - uses `idx_p` with filtering on `collection`
   - ✅ `get_o(o)` - uses `idx_o` with filtering on `collection`
   - ✅ `get_sp(s, p)` - uses the primary key efficiently (`collection, s, p`)
   - ⚠️ `get_po(p, o)` - requires `ALLOW FILTERING` (uses either `idx_p` or `idx_o` plus filtering)
   - ✅ `get_os(o, s)` - uses `idx_o` with additional filtering on `s`
   - ✅ `get_spo(s, p, o)` - uses the primary key efficiently

   **Note on ALLOW FILTERING:** The `get_po` query pattern requires `ALLOW FILTERING` because it constrains both predicate and object without a suitable compound index. This is acceptable because this query pattern is less common than subject-based queries in typical triple store usage.

2. **Storage Writer Updates** (`trustgraph/storage/triples/cassandra/write.py`):
   - Maintain a single TrustGraph connection per user instead of one per (user, collection)
   - Pass the collection to insert operations
   - Better resource utilization with fewer connections

3. **Query Service Updates** (`trustgraph/query/triples/cassandra/service.py`):
   - A single TrustGraph connection per user
   - Pass the collection to all query operations
   - Keep the same query logic, with an added collection parameter

**Benefits:**

- **Simplified Collection Deletion:** Delete using the `collection` partition key across all 4 tables
- **Resource Efficiency:** Fewer database connections and table objects
- **Cross-Collection Operations:** Easier to implement operations that span multiple collections
- **Consistent Architecture:** Aligns with the unified collection metadata approach
- **Collection Validation:** Easy to check collection existence via the `triples_collection` table

Collection operations will be atomic where possible and will provide appropriate error handling and validation.

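The eight query patterns listed under "Query Pattern Performance" follow mechanically from which of `s`, `p`, and `o` are bound. The lookup below restates that list as code, as a reading aid only: `access_path` is a hypothetical helper, and the returned strings are descriptive labels rather than CQL.

```python
def access_path(s=None, p=None, o=None):
    """Name the access path the unified table would use for a query pattern."""
    bound = (s is not None, p is not None, o is not None)
    paths = {
        (False, False, False): "partition scan on collection",
        (True, False, False): "primary key prefix (collection, s)",
        (False, True, False): "idx_p with collection filtering",
        (False, False, True): "idx_o with collection filtering",
        (True, True, False): "primary key prefix (collection, s, p)",
        (False, True, True): "ALLOW FILTERING via idx_p or idx_o",
        (True, False, True): "idx_o with additional filtering on s",
        (True, True, True): "full primary key (collection, s, p, o)",
    }
    return paths[bound]
```

Only the predicate-plus-object combination falls outside both the primary key prefix and a single index, which is why `get_po` is the one pattern flagged with a warning.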
## Security Considerations

Collection management operations require appropriate authorization to prevent unauthorized access to, or deletion of, collections. Access control will align with the existing TrustGraph security models.

## Performance Considerations

Collection listing operations may need pagination in environments with large numbers of collections. Metadata queries should be optimized for common filtering patterns.

## Testing Strategy

Comprehensive testing will cover:

- The collection creation workflow end to end
- Storage backend synchronization
- Write validation for non-existent collections
- Query handling of non-existent collections
- The collection deletion cascade across all stores
- Error handling and recovery scenarios
- Unit tests for each storage backend
- Integration tests for cross-store operations

## Implementation Status

### ✅ Completed Components

1. **Librarian Collection Management Service** (`trustgraph-flow/trustgraph/librarian/collection_manager.py`)
   - Collection metadata CRUD operations (list, update, delete)
   - Cassandra collection metadata table integration via `LibraryTableStore`
   - Collection deletion cascade coordination across all storage types
   - Asynchronous request/response handling with proper error management

2. **Collection Metadata Schema** (`trustgraph-base/trustgraph/schema/services/collection.py`)
   - `CollectionManagementRequest` and `CollectionManagementResponse` schemas
   - `CollectionMetadata` schema for collection records
   - Collection request/response queue topic definitions

3. **Storage Management Schema** (`trustgraph-base/trustgraph/schema/services/storage.py`)
   - `StorageManagementRequest` and `StorageManagementResponse` schemas
   - Storage management queue topics defined
   - Message format for storage-level collection operations

4. **Cassandra 4-Table Schema** (`trustgraph-flow/trustgraph/direct/cassandra_kg.py`)
   - Compound partition keys for query performance
   - `triples_collection` table for SPO queries and deletion tracking
   - Collection deletion implemented with a read-then-delete pattern

### ✅ Migration to Config-Based Pattern - COMPLETED

**All storage backends have been migrated from the queue-based pattern to the config-based `CollectionConfigHandler` pattern.**

Completed migrations:

- ✅ `trustgraph-flow/trustgraph/storage/triples/cassandra/write.py`
- ✅ `trustgraph-flow/trustgraph/storage/triples/neo4j/write.py`
- ✅ `trustgraph-flow/trustgraph/storage/triples/memgraph/write.py`
- ✅ `trustgraph-flow/trustgraph/storage/triples/falkordb/write.py`
- ✅ `trustgraph-flow/trustgraph/storage/doc_embeddings/qdrant/write.py`
- ✅ `trustgraph-flow/trustgraph/storage/graph_embeddings/qdrant/write.py`
- ✅ `trustgraph-flow/trustgraph/storage/doc_embeddings/milvus/write.py`
- ✅ `trustgraph-flow/trustgraph/storage/graph_embeddings/milvus/write.py`
- ✅ `trustgraph-flow/trustgraph/storage/doc_embeddings/pinecone/write.py`
- ✅ `trustgraph-flow/trustgraph/storage/graph_embeddings/pinecone/write.py`
- ✅ `trustgraph-flow/trustgraph/storage/objects/cassandra/write.py`

All backends now:

- Inherit from `CollectionConfigHandler`
- Register for config push notifications via `self.register_config_handler(self.on_collection_config)`
- Implement `create_collection(user, collection, metadata)` and `delete_collection(user, collection)`
- Use `collection_exists(user, collection)` to validate before writes
- Synchronize automatically with config service changes

The legacy queue-based infrastructure has been removed:

- ✅ Removed the `StorageManagementRequest` and `StorageManagementResponse` schemas
- ✅ Removed the storage management queue topic definitions
- ✅ Removed the queue consumer/producer from all backends
- ✅ Removed the `on_storage_management` handlers from all backends

144
docs/tech-specs/sw/document-embeddings-chunk-id.sw.md
Normal file
@ -0,0 +1,144 @@
---
layout: default
title: "Document Embeddings Chunk ID"
parent: "Swahili (Beta)"
---

# Document Embeddings Chunk ID

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Overview

Document embeddings storage currently stores chunk text directly in the vector store payload, duplicating data that already exists in Garage. This spec replaces chunk text storage with `chunk_id` references.

## Current State

```python
from dataclasses import dataclass, field

# `Error` is the schema's existing error type; its definition is not shown here.

@dataclass
class ChunkEmbeddings:
    chunk: bytes = b""
    vectors: list[list[float]] = field(default_factory=list)

@dataclass
class DocumentEmbeddingsResponse:
    error: Error | None = None
    chunks: list[str] = field(default_factory=list)
```

Vector store payload:

```python
payload={"doc": chunk}  # Duplicates Garage content
```

## Design

### Schema Changes

**ChunkEmbeddings** - replace `chunk` with `chunk_id`:

```python
@dataclass
class ChunkEmbeddings:
    chunk_id: str = ""
    vectors: list[list[float]] = field(default_factory=list)
```

**DocumentEmbeddingsResponse** - return `chunk_ids` instead of `chunks`:

```python
@dataclass
class DocumentEmbeddingsResponse:
    error: Error | None = None
    chunk_ids: list[str] = field(default_factory=list)
```

### Vector Store Payload

All stores (Qdrant, Milvus, Pinecone):

```python
payload={"chunk_id": chunk_id}
```

### Document RAG Changes

The document RAG processor fetches chunk content from Garage:

```python
# Get chunk_ids from embeddings store
chunk_ids = await self.rag.doc_embeddings_client.query(...)

# Fetch chunk content from Garage
docs = []
for chunk_id in chunk_ids:
    content = await self.rag.librarian_client.get_document_content(
        chunk_id, self.user
    )
    docs.append(content)
```

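The two-step lookup can be exercised on its own with stub clients. The following is a runnable sketch under assumptions, not the processor's code: `StubEmbeddingsClient`, `StubLibrarianClient`, and `fetch_docs` are hypothetical, the chunk IDs and contents are made up, and the fetches are issued concurrently with `asyncio.gather` as a variation on the sequential loop shown in the spec.

```python
import asyncio


class StubEmbeddingsClient:
    """Hypothetical stand-in that returns chunk IDs for a query."""

    async def query(self, text, limit=2):
        return ["doc1/c0", "doc1/c1"][:limit]


class StubLibrarianClient:
    """Hypothetical stand-in for the librarian lookup backed by Garage."""

    def __init__(self, chunks):
        self._chunks = chunks

    async def get_document_content(self, chunk_id, user):
        return self._chunks[chunk_id]


async def fetch_docs(embeddings, librarian, user, text):
    chunk_ids = await embeddings.query(text)
    # Fetch all chunks concurrently; gather keeps the order of chunk_ids.
    return await asyncio.gather(
        *(librarian.get_document_content(cid, user) for cid in chunk_ids)
    )


docs = asyncio.run(fetch_docs(
    StubEmbeddingsClient(),
    StubLibrarianClient({"doc1/c0": "first chunk", "doc1/c1": "second chunk"}),
    "user123",
    "example question",
))
```

Whether concurrent fetching is appropriate depends on how the librarian handles parallel requests; the sequential loop is the safer default when that is unknown.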
### API/SDK Changes

**DocumentEmbeddingsClient** returns `chunk_ids`:

```python
return resp.chunk_ids  # Changed from resp.chunks
```

**Wire format** (DocumentEmbeddingsResponseTranslator):

```python
result["chunk_ids"] = obj.chunk_ids  # Changed from chunks
```

### CLI Changes

The CLI tool displays chunk IDs (callers can fetch the content separately if needed).

## Files to Modify

### Schema
- `trustgraph-base/trustgraph/schema/knowledge/embeddings.py` - ChunkEmbeddings
- `trustgraph-base/trustgraph/schema/services/query.py` - DocumentEmbeddingsResponse

### Messaging/Translators
- `trustgraph-base/trustgraph/messaging/translators/embeddings_query.py` - DocumentEmbeddingsResponseTranslator

### Client
- `trustgraph-base/trustgraph/base/document_embeddings_client.py` - return chunk IDs

### Python SDK/API
- `trustgraph-base/trustgraph/api/flow.py` - document_embeddings_query
- `trustgraph-base/trustgraph/api/socket_client.py` - document_embeddings_query
- `trustgraph-base/trustgraph/api/async_flow.py` - if applicable
- `trustgraph-base/trustgraph/api/bulk_client.py` - import/export of document embeddings
- `trustgraph-base/trustgraph/api/async_bulk_client.py` - import/export of document embeddings

### Embeddings Service
- `trustgraph-flow/trustgraph/embeddings/document_embeddings/embeddings.py` - pass the chunk ID through

### Storage Writers
- `trustgraph-flow/trustgraph/storage/doc_embeddings/qdrant/write.py`
- `trustgraph-flow/trustgraph/storage/doc_embeddings/milvus/write.py`
- `trustgraph-flow/trustgraph/storage/doc_embeddings/pinecone/write.py`

### Query Services
- `trustgraph-flow/trustgraph/query/doc_embeddings/qdrant/service.py`
- `trustgraph-flow/trustgraph/query/doc_embeddings/milvus/service.py`
- `trustgraph-flow/trustgraph/query/doc_embeddings/pinecone/service.py`

### Gateway
- `trustgraph-flow/trustgraph/gateway/dispatch/document_embeddings_query.py`
- `trustgraph-flow/trustgraph/gateway/dispatch/document_embeddings_export.py`
- `trustgraph-flow/trustgraph/gateway/dispatch/document_embeddings_import.py`

### Document RAG
- `trustgraph-flow/trustgraph/retrieval/document_rag/rag.py` - add the librarian client
- `trustgraph-flow/trustgraph/retrieval/document_rag/document_rag.py` - fetch content from Garage

### CLI
- `trustgraph-cli/trustgraph/cli/invoke_document_embeddings.py`
- `trustgraph-cli/trustgraph/cli/save_doc_embeds.py`
- `trustgraph-cli/trustgraph/cli/load_doc_embeds.py`
|
||||
|
||||
## Benefits

1. Single source of truth - chunk text lives only in Garage
2. Reduced storage in the vector store
3. Enables query-time provenance via the chunk ID
|
||||
675
docs/tech-specs/sw/embeddings-batch-processing.sw.md
Normal file
|
|
@ -0,0 +1,675 @@
|
|||
---
|
||||
layout: default
|
||||
title: "Embeddings Batch Processing Technical Specification"
|
||||
parent: "Swahili (Beta)"
|
||||
---
|
||||
|
||||
# Embeddings Batch Processing Technical Specification
|
||||
|
||||
> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.
|
||||
|
||||
## Overview

This specification describes optimizations to the embeddings service to support batch processing of multiple texts in a single request. The current implementation processes one text at a time and so misses the significant performance benefits that embedding models provide when processing batches. It has four main inefficiencies:

1. **Single-Text Processing Inefficiency**: The current implementation wraps a single text in a list, underutilizing FastEmbed's batch capabilities.
2. **Request-Per-Text Overhead**: Each text requires a separate Pulsar message round trip.
3. **Model Inference Inefficiency**: Embedding models have a fixed per-batch overhead; small batches waste GPU/CPU resources.
4. **Serial Processing in Callers**: Key services loop over items and call the embeddings service one item at a time.
|
||||
|
||||
## Goals

- **Batch API Support**: Enable processing of multiple texts in a single request.
- **Backward Compatibility**: Continue to support single-text requests.
- **Significant Throughput Improvement**: Target a 5-10x throughput improvement for bulk operations.
- **Reduced Latency per Text**: Lower the amortized latency when embedding multiple texts.
- **Memory Efficiency**: Process batches without excessive memory consumption.
- **Provider Agnostic**: Support batching across FastEmbed, Ollama, and other providers.
- **Caller Migration**: Update all services that call embeddings to use the batch API where beneficial.
|
||||
|
||||
## Background

### Current Implementation - Embeddings Service

The embeddings implementation in `trustgraph-flow/trustgraph/embeddings/fastembed/processor.py` has a significant performance inefficiency:

```python
# fastembed/processor.py line 56
async def on_embeddings(self, text, model=None):
    use_model = model or self.default_model
    self._load_model(use_model)

    vecs = self.embeddings.embed([text])  # Single text wrapped in list

    return [v.tolist() for v in vecs]
```

**Problems:**

1. **Batch Size 1**: FastEmbed's `embed()` method is designed for batch processing, but we always call it with `[text]`, a batch of size 1.

2. **Per-Request Overhead**: Each embedding request incurs:
   - Pulsar message serialization/deserialization
   - Network round-trip latency
   - Model inference startup overhead
   - Python async scheduling overhead

3. **Schema Limitation**: The `EmbeddingsRequest` schema supports only a single text:
   ```python
   @dataclass
   class EmbeddingsRequest:
       text: str = ""  # Single text only
   ```
|
||||
|
||||
### Current Callers - Serial Processing

#### 1. API Gateway

**File:** `trustgraph-flow/trustgraph/gateway/dispatch/embeddings.py`

The gateway accepts single-text embedding requests over HTTP/WebSocket and forwards them to the embeddings service. There is currently no batch endpoint.

```python
class EmbeddingsRequestor(ServiceRequestor):
    # Handles single EmbeddingsRequest -> EmbeddingsResponse
    request_schema=EmbeddingsRequest,  # Single text only
    response_schema=EmbeddingsResponse,
```

**Impact:** External clients (web apps, scripts) must make N HTTP requests to embed N texts.
|
||||
|
||||
#### 2. Document Embeddings Service

**File:** `trustgraph-flow/trustgraph/embeddings/document_embeddings/embeddings.py`

Processes document chunks one at a time:

```python
async def on_message(self, msg, consumer, flow):
    v = msg.value()

    # Single chunk per request
    resp = await flow("embeddings-request").request(
        EmbeddingsRequest(text=v.chunk)
    )
    vectors = resp.vectors
```

**Impact:** Each document chunk requires a separate embedding request. A document with 100 chunks = 100 embedding requests.
|
||||
|
||||
#### 3. Graph Embeddings Service

**File:** `trustgraph-flow/trustgraph/embeddings/graph_embeddings/embeddings.py`

Loops over entities and embeds each one serially:

```python
async def on_message(self, msg, consumer, flow):
    for entity in v.entities:
        # Serial embedding - one entity at a time
        vectors = await flow("embeddings-request").embed(
            text=entity.context
        )
        entities.append(EntityEmbeddings(
            entity=entity.entity,
            vectors=vectors,
            chunk_id=entity.chunk_id,
        ))
```

**Impact:** A message with 50 entities = 50 serial embedding requests. This is a major bottleneck during knowledge graph construction.
|
||||
|
||||
#### 4. Row Embeddings Service

**File:** `trustgraph-flow/trustgraph/embeddings/row_embeddings/embeddings.py`

Loops over the unique texts and embeds each one serially:

```python
async def on_message(self, msg, consumer, flow):
    for text, (index_name, index_value) in texts_to_embed.items():
        # Serial embedding - one text at a time
        vectors = await flow("embeddings-request").embed(text=text)

        embeddings_list.append(RowIndexEmbedding(
            index_name=index_name,
            index_value=index_value,
            text=text,
            vectors=vectors
        ))
```

**Impact:** Processing a table with 100 unique indexed values = 100 serial embedding requests.
|
||||
|
||||
#### 5. EmbeddingsClient (Base Client)

**File:** `trustgraph-base/trustgraph/base/embeddings_client.py`

The client used by all flow processors supports only single-text embedding:

```python
class EmbeddingsClient(RequestResponse):
    async def embed(self, text, timeout=30):
        resp = await self.request(
            EmbeddingsRequest(text=text),  # Single text
            timeout=timeout
        )
        return resp.vectors
```

**Impact:** Every caller that uses this client is limited to single-text operations.
|
||||
|
||||
#### 6. Command-Line Tools

**File:** `trustgraph-cli/trustgraph/cli/invoke_embeddings.py`

The CLI tool accepts a single text argument:

```python
def query(url, flow_id, text, token=None):
    result = flow.embeddings(text=text)  # Single text
    vectors = result.get("vectors", [])
```

**Impact:** Users cannot batch-embed from the command line. Processing a file of N texts requires N invocations.
|
||||
|
||||
#### 7. Python SDK

The Python SDK provides two client classes for interacting with TrustGraph services. Both support only single-text embedding.

**File:** `trustgraph-base/trustgraph/api/flow.py`

```python
class FlowInstance:
    def embeddings(self, text):
        """Get embeddings for a single text"""
        input = {"text": text}
        return self.request("service/embeddings", input)["vectors"]
```

**File:** `trustgraph-base/trustgraph/api/socket_client.py`

```python
class SocketFlowInstance:
    def embeddings(self, text: str, **kwargs: Any) -> Dict[str, Any]:
        """Get embeddings for a single text via WebSocket"""
        request = {"text": text}
        return self.client._send_request_sync(
            "embeddings", self.flow_id, request, False
        )
```

**Impact:** Python developers using the SDK must loop over texts and make a separate API call for each one. SDK users have no batch embedding support.
|
||||
|
||||
### Performance Impact

For a typical document ingestion (1000 text chunks):
- **Current**: 1000 separate requests, 1000 model inference calls.
- **Batched (batch_size=32)**: 32 requests, 32 model inference calls (a 96.8% reduction).

For graph embedding (a message with 50 entities):
- **Current**: 50 serial `await` calls, roughly 5-10 seconds.
- **Batched**: 1-2 batch calls, roughly 0.5-1 second (a 5-10x improvement).

FastEmbed and similar libraries achieve near-linear throughput scaling with batch size, up to hardware limits (typically 32-128 texts per batch).
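
The request counts quoted above follow from a ceiling division over the batch size. A minimal sketch of that arithmetic is shown below; `batched_request_count` and `request_reduction` are illustrative helper names, not part of TrustGraph.

```python
import math


def batched_request_count(num_texts: int, batch_size: int) -> int:
    """Number of requests needed to embed num_texts in batches of batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(num_texts / batch_size)


def request_reduction(num_texts: int, batch_size: int) -> float:
    """Percentage of requests saved versus one request per text."""
    if num_texts == 0:
        return 0.0
    batched = batched_request_count(num_texts, batch_size)
    return 100.0 * (1 - batched / num_texts)


print(batched_request_count(1000, 32))        # 32
print(round(request_reduction(1000, 32), 1))  # 96.8
print(batched_request_count(50, 32))          # 2
```

The last line matches the "1-2 batch calls" figure for a 50-entity message: at a batch size of 32 it takes two calls, and one call at any batch size of 50 or more.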
|
||||
|
||||
## Technical Design

### Architecture

The batch embedding optimization requires changes to the following components:

#### 1. **Schema Enhancement**
- Extend `EmbeddingsRequest` to support multiple texts.
- Extend `EmbeddingsResponse` to return multiple vector sets.
- Maintain compatibility with single-text requests.

Module: `trustgraph-base/trustgraph/schema/services/llm.py`

#### 2. **Base Service Enhancement**
- Update `EmbeddingsService` to handle batch requests.
- Add batch size configuration.
- Implement batch-aware request handling.

Module: `trustgraph-base/trustgraph/base/embeddings_service.py`

#### 3. **Provider Processor Updates**
- Update the FastEmbed processor to pass the full batch to `embed()`.
- Update the Ollama processor to handle batches (if supported).
- Add sequential processing as a fallback for providers that do not support batching.

Modules:
- `trustgraph-flow/trustgraph/embeddings/fastembed/processor.py`
- `trustgraph-flow/trustgraph/embeddings/ollama/processor.py`
|
||||
|
||||
#### 4. **Client Enhancement**
- Add a batch embedding method to `EmbeddingsClient`.
- Support both single-text and batch APIs.
- Add automatic batching for large inputs.

Module: `trustgraph-base/trustgraph/base/embeddings_client.py`
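
Automatic batching for large inputs might be sketched as follows. `split_batches`, `embed_in_batches`, and the `embed_batch` callable are hypothetical names used for illustration only; in practice `embed_batch` would wrap whatever batch call the client exposes.

```python
import asyncio
from typing import Awaitable, Callable

VectorSet = list[list[float]]


def split_batches(texts: list[str], batch_size: int) -> list[list[str]]:
    """Split texts into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]


async def embed_in_batches(
    texts: list[str],
    embed_batch: Callable[[list[str]], Awaitable[list[VectorSet]]],
    batch_size: int = 32,
) -> list[VectorSet]:
    """Embed texts batch by batch, keeping results aligned with input order."""
    results: list[VectorSet] = []
    for batch in split_batches(texts, batch_size):
        results.extend(await embed_batch(batch))
    return results


async def _fake_embed_batch(batch: list[str]) -> list[VectorSet]:
    # Stand-in for a real batch call: one single-vector set per text.
    return [[[float(len(text))]] for text in batch]


texts = ["a", "bb", "ccc", "dddd", "eeeee"]
vectors = asyncio.run(embed_in_batches(texts, _fake_embed_batch, batch_size=2))
print(len(vectors))  # 5
```

Batches are sent one after another here to keep memory use bounded; sending them concurrently is a possible variation but would need its own limits.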
|
||||
|
||||
#### 5. **Caller Updates - Flow Processors**
- Update `graph_embeddings` to batch entity contexts.
- Update `row_embeddings` to batch index texts.
- Update `document_embeddings` if batching across messages is feasible.

Modules:
- `trustgraph-flow/trustgraph/embeddings/graph_embeddings/embeddings.py`
- `trustgraph-flow/trustgraph/embeddings/row_embeddings/embeddings.py`
- `trustgraph-flow/trustgraph/embeddings/document_embeddings/embeddings.py`

#### 6. **API Gateway Enhancement**
- Add a batch embedding endpoint.
- Support an array of texts in the request body.

Module: `trustgraph-flow/trustgraph/gateway/dispatch/embeddings.py`

#### 7. **CLI Tool Enhancement**
- Add support for multiple texts or file input.
- Add a batch size parameter.

Module: `trustgraph-cli/trustgraph/cli/invoke_embeddings.py`

#### 8. **Python SDK Enhancement**
- Add an `embeddings_batch()` method to `FlowInstance`.
- Add an `embeddings_batch()` method to `SocketFlowInstance`.
- Support both single-text and batch APIs for SDK users.

Modules:
- `trustgraph-base/trustgraph/api/flow.py`
- `trustgraph-base/trustgraph/api/socket_client.py`
|
||||
|
||||
### Data Models

#### EmbeddingsRequest

```python
@dataclass
class EmbeddingsRequest:
    texts: list[str] = field(default_factory=list)
```

Usage:
- Single text: `EmbeddingsRequest(texts=["hello world"])`
- Batch: `EmbeddingsRequest(texts=["text1", "text2", "text3"])`

#### EmbeddingsResponse

```python
@dataclass
class EmbeddingsResponse:
    error: Error | None = None
    vectors: list[list[list[float]]] = field(default_factory=list)
```

Response structure:
- `vectors[i]` contains the vector set for `texts[i]`.
- Each vector set is a `list[list[float]]` (models may return multiple vectors per text).
- Example: 3 texts → `vectors` has 3 entries, each holding the embeddings for that text.
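
To make the nesting concrete, the sketch below pairs each input text with the first vector of its vector set. The sample values and the helper name `pair_first_vectors` are illustrative assumptions, not part of the schema.

```python
def pair_first_vectors(
    texts: list[str],
    vectors: list[list[list[float]]],
) -> list[tuple[str, list[float]]]:
    """Pair each text with the first vector in its vector set."""
    if len(texts) != len(vectors):
        raise ValueError("expected one vector set per input text")
    return [(text, vector_set[0]) for text, vector_set in zip(texts, vectors)]


# Made-up two-dimensional vectors, shaped like EmbeddingsResponse.vectors.
texts = ["text1", "text2", "text3"]
vectors = [
    [[0.1, 0.2]],  # vector set for texts[0]
    [[0.3, 0.4]],  # vector set for texts[1]
    [[0.5, 0.6]],  # vector set for texts[2]
]

for text, vector in pair_first_vectors(texts, vectors):
    print(text, vector)
```

Checking that the two lists have equal length guards against silently misaligned results if a provider drops or duplicates an entry.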
|
||||
|
||||
### APIs

#### EmbeddingsClient

```python
class EmbeddingsClient(RequestResponse):
    async def embed(
        self,
        texts: list[str],
        timeout: float = 300,
    ) -> list[list[list[float]]]:
        """
        Embed one or more texts in a single request.

        Args:
            texts: List of texts to embed
            timeout: Timeout for the operation

        Returns:
            List of vector sets, one per input text
        """
        resp = await self.request(
            EmbeddingsRequest(texts=texts),
            timeout=timeout
        )
        if resp.error:
            raise RuntimeError(resp.error.message)
        return resp.vectors
```

#### API Gateway Embeddings Endpoint

The endpoint is updated to support both single-text and batch embedding:

```
POST /api/v1/embeddings
Content-Type: application/json

{
    "texts": ["text1", "text2", "text3"],
    "flow_id": "default"
}

Response:
{
    "vectors": [
        [[0.1, 0.2, ...]],
        [[0.3, 0.4, ...]],
        [[0.5, 0.6, ...]]
    ]
}
```
|
||||
|
||||
### Implementation Details

#### Phase 1: Schema Changes

**EmbeddingsRequest:**
```python
@dataclass
class EmbeddingsRequest:
    texts: list[str] = field(default_factory=list)
```

**EmbeddingsResponse:**
```python
@dataclass
class EmbeddingsResponse:
    error: Error | None = None
    vectors: list[list[list[float]]] = field(default_factory=list)
```

**Updated `EmbeddingsService.on_request`:**
```python
async def on_request(self, msg, consumer, flow):
    request = msg.value()
    id = msg.properties()["id"]
    model = flow("model")

    vectors = await self.on_embeddings(request.texts, model=model)
    response = EmbeddingsResponse(error=None, vectors=vectors)

    await flow("response").send(response, properties={"id": id})
```
|
||||
|
||||
#### Phase 2: FastEmbed Processor Update

**Current (inefficient):**
```python
async def on_embeddings(self, text, model=None):
    use_model = model or self.default_model
    self._load_model(use_model)
    vecs = self.embeddings.embed([text])  # Batch of 1
    return [v.tolist() for v in vecs]
```

**Updated:**
```python
async def on_embeddings(self, texts: list[str], model=None):
    """Embed texts - processes all texts in single model call"""
    if not texts:
        return []

    use_model = model or self.default_model
    self._load_model(use_model)

    # FastEmbed handles the full batch efficiently
    all_vecs = list(self.embeddings.embed(texts))

    # Return list of vector sets, one per input text
    return [[v.tolist()] for v in all_vecs]
```
|
||||
|
||||
#### Phase 3: Graph Embeddings Service Update

**Current (serial):**
```python
async def on_message(self, msg, consumer, flow):
    entities = []
    for entity in v.entities:
        vectors = await flow("embeddings-request").embed(text=entity.context)
        entities.append(EntityEmbeddings(...))
```

**Updated (batch):**
```python
async def on_message(self, msg, consumer, flow):
    v = msg.value()

    # Collect all contexts
    contexts = [entity.context for entity in v.entities]

    # Single batch embedding call
    all_vectors = await flow("embeddings-request").embed(texts=contexts)

    # Pair results with entities
    entities = [
        EntityEmbeddings(
            entity=entity.entity,
            vectors=vectors[0],  # First vector from the set
            chunk_id=entity.chunk_id,
        )
        for entity, vectors in zip(v.entities, all_vectors)
    ]
```
|
||||
|
||||
#### Phase 4: Row Embeddings Service Update

**Current (serial):**
```python
for text, (index_name, index_value) in texts_to_embed.items():
    vectors = await flow("embeddings-request").embed(text=text)
    embeddings_list.append(RowIndexEmbedding(...))
```

**Updated (batch):**
```python
# Collect texts and metadata
texts = list(texts_to_embed.keys())
metadata = list(texts_to_embed.values())

# Single batch embedding call
all_vectors = await flow("embeddings-request").embed(texts=texts)

# Pair results
embeddings_list = [
    RowIndexEmbedding(
        index_name=meta[0],
        index_value=meta[1],
        text=text,
        vectors=vectors[0]  # First vector from the set
    )
    for text, meta, vectors in zip(texts, metadata, all_vectors)
]
```
|
||||
|
||||
#### Phase 5: CLI Tool Enhancement

**Updated CLI:**
```python
def main():
    parser = argparse.ArgumentParser(...)

    parser.add_argument(
        'text',
        nargs='*',  # Zero or more texts
        help='Text(s) to convert to embedding vectors',
    )

    parser.add_argument(
        '-f', '--file',
        help='File containing texts (one per line)',
    )

    parser.add_argument(
        '--batch-size',
        type=int,
        default=32,
        help='Batch size for processing (default: 32)',
    )
```

Usage:
```bash
# Single text (existing)
tg-invoke-embeddings "hello world"

# Multiple texts
tg-invoke-embeddings "text one" "text two" "text three"

# From file
tg-invoke-embeddings -f texts.txt --batch-size 64
```
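
The parser above only declares the arguments. One way the positional texts and the `--file` input could be merged into a single list is sketched below; `build_parser` and `collect_texts` are hypothetical helper names, and skipping blank lines in the file is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the arguments described in the spec."""
    parser = argparse.ArgumentParser(prog="tg-invoke-embeddings")
    parser.add_argument("text", nargs="*",
                        help="Text(s) to convert to embedding vectors")
    parser.add_argument("-f", "--file",
                        help="File containing texts (one per line)")
    parser.add_argument("--batch-size", type=int, default=32,
                        help="Batch size for processing (default: 32)")
    return parser


def collect_texts(args: argparse.Namespace) -> list[str]:
    """Combine positional texts with non-empty lines read from --file."""
    texts = list(args.text)
    if args.file:
        with open(args.file, encoding="utf-8") as f:
            texts.extend(line.strip() for line in f if line.strip())
    return texts


args = build_parser().parse_args(["text one", "text two", "--batch-size", "64"])
print(collect_texts(args), args.batch_size)
```

The resulting list can then be split according to `--batch-size` before being sent to the embeddings service.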
|
||||
|
||||
#### Phase 6: Python SDK Enhancement

**FlowInstance (HTTP client):**

```python
class FlowInstance:
    def embeddings(self, texts: list[str]) -> list[list[list[float]]]:
        """
        Get embeddings for one or more texts.

        Args:
            texts: List of texts to embed

        Returns:
            List of vector sets, one per input text
        """
        input = {"texts": texts}
        return self.request("service/embeddings", input)["vectors"]
```

**SocketFlowInstance (WebSocket client):**

```python
class SocketFlowInstance:
    def embeddings(self, texts: list[str], **kwargs: Any) -> list[list[list[float]]]:
        """
        Get embeddings for one or more texts via WebSocket.

        Args:
            texts: List of texts to embed

        Returns:
            List of vector sets, one per input text
        """
        request = {"texts": texts}
        response = self.client._send_request_sync(
            "embeddings", self.flow_id, request, False
        )
        return response["vectors"]
```

**SDK usage example:**

```python
# Single text
vectors = flow.embeddings(["hello world"])
print(f"Dimensions: {len(vectors[0][0])}")

# Batch embedding
texts = ["text one", "text two", "text three"]
all_vectors = flow.embeddings(texts)

# Process results
for text, vecs in zip(texts, all_vectors):
    print(f"{text}: {len(vecs[0])} dimensions")
```
|
||||
|
||||
## Security Considerations

- **Request Size Limits**: Enforce a maximum batch size to prevent resource exhaustion.
- **Timeout Handling**: Scale timeouts appropriately with batch size.
- **Memory Limits**: Monitor memory usage for large batches.
- **Input Validation**: Validate every text in a batch before processing.
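
A minimal sketch of batch validation along these lines is shown below. `validate_batch` and `MAX_TEXT_CHARS` are hypothetical, and the value 128 for `MAX_BATCH_SIZE` is taken from the recommended defaults in this spec; the rule that empty texts are rejected is an assumption.

```python
MAX_BATCH_SIZE = 128     # recommended default from this spec
MAX_TEXT_CHARS = 32_000  # hypothetical per-text limit, for illustration only


def validate_batch(
    texts: list[str],
    max_batch_size: int = MAX_BATCH_SIZE,
    max_text_chars: int = MAX_TEXT_CHARS,
) -> None:
    """Raise if the batch is too large or contains an unusable text."""
    if not isinstance(texts, list):
        raise TypeError("texts must be a list of strings")
    if len(texts) > max_batch_size:
        raise ValueError(
            f"batch of {len(texts)} texts exceeds the limit of {max_batch_size}"
        )
    for i, text in enumerate(texts):
        if not isinstance(text, str):
            raise TypeError(f"texts[{i}] is not a string")
        if not text.strip():
            raise ValueError(f"texts[{i}] is empty")
        if len(text) > max_text_chars:
            raise ValueError(f"texts[{i}] exceeds {max_text_chars} characters")


validate_batch(["hello world", "second text"])  # passes silently
```

Validating the whole batch before calling the model means a bad entry fails fast, without spending inference time on the rest of the batch.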
|
||||
|
||||
## Performance Considerations

### Expected Improvements

**Throughput:**
- Single text: ~10-50 texts/second (depending on the model)
- Batch (size 32): ~200-500 texts/second (5-10x improvement)

**Latency per Text:**
- Single text: 50-200 ms per text
- Batch (size 32): 5-20 ms per text (amortized)

**Service-Specific Improvements:**

| Service | Current | Batched | Improvement |
|---------|---------|---------|-------------|
| Graph Embeddings (50 entities) | 5-10s | 0.5-1s | 5-10x |
| Row Embeddings (100 texts) | 10-20s | 1-2s | 5-10x |
| Document Ingestion (1000 chunks) | 100-200s | 10-30s | 5-10x |

### Configuration Parameters

```python
# Recommended defaults
DEFAULT_BATCH_SIZE = 32
MAX_BATCH_SIZE = 128
BATCH_TIMEOUT_MULTIPLIER = 2.0
```
|
||||
|
||||
## Testing Strategy

### Unit Testing
- Single-text embedding (backward compatibility)
- Empty batch handling
- Enforcement of the maximum batch size
- Error handling for batch failures

### Integration Testing
- End-to-end batch embedding through Pulsar
- Batch processing in the graph embeddings service
- Batch processing in the row embeddings service
- API gateway batch endpoint

### Performance Testing
- Throughput benchmark: single-text vs batch embedding
- Memory usage under different batch sizes
- Latency distribution analysis
|
||||
|
||||
## Migration Plan

This is a breaking-change release. All phases are implemented together.

### Phase 1: Schema Changes
- Replace `text: str` with `texts: list[str]` in EmbeddingsRequest
- Change the type of `vectors` to `list[list[list[float]]]` in EmbeddingsResponse

### Phase 2: Processor Updates
- Update the `on_embeddings` signature in the FastEmbed and Ollama processors
- Process the full batch in a single model call

### Phase 3: Client Updates
- Update `EmbeddingsClient.embed()` to accept `texts: list[str]`

### Phase 4: Caller Updates
- Update graph_embeddings to batch entity contexts
- Update row_embeddings to batch index texts
- Update document_embeddings to use the new schema
- Update the CLI tool

### Phase 5: API Gateway
- Update the embeddings endpoint for the new schema

### Phase 6: Python SDK
- Update the `FlowInstance.embeddings()` signature
- Update the `SocketFlowInstance.embeddings()` signature
|
||||
|
||||
## Open Questions

- **Streaming Large Batches**: Should we support streaming results for very large batches (>100 texts)?
- **Provider-Specific Limits**: How should we handle providers with different maximum batch sizes?
- **Partial Failure Handling**: If one text in a batch fails, should the whole batch fail or should partial results be returned?
- **Cross-Message Batching**: Should document embeddings be batched across multiple Chunk messages, or stay with per-message processing?
|
||||
|
||||
## References

- [FastEmbed Documentation](https://github.com/qdrant/fastembed)
- [Ollama Embeddings API](https://github.com/ollama/ollama)
- [EmbeddingsService Implementation](trustgraph-base/trustgraph/base/embeddings_service.py)
- [GraphRAG Performance Optimization](graphrag-performance-optimization.md)
|
||||
268
docs/tech-specs/sw/entity-centric-graph.sw.md
Normal file
|
|
@ -0,0 +1,268 @@
|
|||
---
|
||||
layout: default
|
||||
title: "Entity-Centric Knowledge Graph Storage on Cassandra"
|
||||
parent: "Swahili (Beta)"
|
||||
---
|
||||
|
||||
# Entity-Centric Knowledge Graph Storage on Cassandra
|
||||
|
||||
> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.
|
||||
|
||||
## Overview

This document describes a storage model for RDF-style knowledge graphs on Apache Cassandra. The model takes an **entity-centric** approach in which every entity knows every quad it participates in and the role it plays in that quad. This replaces the traditional approach of many tables holding permutations of SPO with just two tables.
|
||||
|
||||
## Background and Motivation

### The Traditional Approach

A conventional RDF quad store on Cassandra needs many denormalised tables to cover its query patterns, typically 6 or more tables representing different permutations of Subject, Predicate, Object, and Dataset (SPOD). Every quad is written to every table, which causes significant write amplification, operational overhead, and schema complexity.

In addition, label resolution (fetching human-readable names for entities) requires extra queries, which is especially costly in AI and GraphRAG use cases where labels are essential for LLM context.
|
||||
|
||||
### The Entity-Centric Insight

Every quad `(D, S, P, O)` involves up to 4 entities. By writing one row for each entity's participation in the quad, we guarantee that **any query with at least one known element hits a partition key**. This covers all 16 query patterns with a single data table.

Key benefits:

- **2 tables** instead of 7+
- **4 entity-table writes per quad** (plus one row in the collection table) instead of 6+
- **Label resolution for free**: an entity's labels are stored alongside its relationships, which naturally warms the application cache
- **All 16 query patterns** are served by single-partition reads
- **Simpler operations**: one data table to tune, compact, and repair
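
The figure of 16 comes from each of the four quad positions being either known or unknown in a query. The sketch below enumerates the combinations. The routing in `entry_table` is an assumption made for illustration: a pattern with nothing known is assumed to be answered from `quads_by_collection`, whose partition key is the collection.

```python
from itertools import product

POSITIONS = ("D", "S", "P", "O")


def query_patterns() -> list[dict[str, bool]]:
    """Every known/unknown combination of the four quad positions."""
    return [
        dict(zip(POSITIONS, known))
        for known in product((True, False), repeat=len(POSITIONS))
    ]


def entry_table(pattern: dict[str, bool]) -> str:
    """Pick the table a query would start from (assumed routing)."""
    if any(pattern.values()):
        # At least one known element: read that entity's partition.
        return "quads_by_entity"
    # Nothing known: assumed to fall back to the collection-level table.
    return "quads_by_collection"


patterns = query_patterns()
by_entity = [p for p in patterns if entry_table(p) == "quads_by_entity"]
print(len(patterns), len(by_entity))  # 16 15
```

Under this assumption, 15 of the 16 patterns start from an entity partition and the remaining one reads the collection partition, so each pattern still begins with a single-partition read.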
|
||||
|
||||
## Schema

### Table 1: quads_by_entity

The primary data table. Every entity has a partition containing all the quads it participates in. The table is named after its query pattern (lookup by entity).

```sql
CREATE TABLE quads_by_entity (
    collection text,       -- Collection/tenant scope (always specified)
    entity     text,       -- The entity this row is about
    role       text,       -- 'S', 'P', 'O', 'G': how this entity participates
    p          text,       -- Predicate of the quad
    otype      text,       -- 'U' (URI), 'L' (literal), 'T' (triple/reification)
    s          text,       -- Subject of the quad
    o          text,       -- Object of the quad
    d          text,       -- Dataset/graph of the quad
    dtype      text,       -- XSD datatype (when otype = 'L'), e.g. 'xsd:string'
    lang       text,       -- Language tag (when otype = 'L'), e.g. 'en', 'fr'
    PRIMARY KEY ((collection, entity), role, p, otype, s, o, d, dtype, lang)
);
```
|
||||
|
||||
**Partition key:** `(collection, entity)`, scoped to the collection, with one partition per entity.

**Rationale for the clustering column order:**

1. **role**: most queries start with "where is this entity a subject/object"
2. **p**: the next most common filter, "give me all `knows` relationships"
3. **otype**: allows filtering URI-valued relationships from literal-valued ones
4. **s, o, d**: the remaining columns, for uniqueness
5. **dtype, lang**: distinguish literals with the same value but different type metadata (e.g., `"thing"` vs `"thing"@en` vs `"thing"^^xsd:string`)
|
||||
|
||||
### Table 2: quads_by_collection

Supports collection-level queries and deletion. It provides a manifest of all quads belonging to a collection. The table is named after its query pattern (lookup by collection).

```sql
CREATE TABLE quads_by_collection (
    collection text,
    d          text,       -- Dataset/graph of the quad
    s          text,       -- Subject of the quad
    p          text,       -- Predicate of the quad
    o          text,       -- Object of the quad
    otype      text,       -- 'U' (URI), 'L' (literal), 'T' (triple/reification)
    dtype      text,       -- XSD datatype (when otype = 'L')
    lang       text,       -- Language tag (when otype = 'L')
    PRIMARY KEY (collection, d, s, p, o, otype, dtype, lang)
);
```

Rows are clustered by dataset first, which allows deletion at either collection or dataset granularity. The `otype`, `dtype`, and `lang` columns are part of the clustering key so that literals with the same value but different type metadata stay distinct: in RDF, `"thing"`, `"thing"@en`, and `"thing"^^xsd:string` are semantically different values.

## Write Path

For each incoming quad `(D, S, P, O)` within a collection `C`, write **4 rows** to `quads_by_entity` and **1 row** to `quads_by_collection`.

### Example

Given a quad in collection `tenant1`:

```
Dataset:   https://example.org/graph1
Subject:   https://example.org/Alice
Predicate: https://example.org/knows
Object:    https://example.org/Bob
```

Write 4 rows to `quads_by_entity`:

| collection | entity | role | p | otype | s | o | d |
|---|---|---|---|---|---|---|---|
| tenant1 | https://example.org/graph1 | G | https://example.org/knows | U | https://example.org/Alice | https://example.org/Bob | https://example.org/graph1 |
| tenant1 | https://example.org/Alice | S | https://example.org/knows | U | https://example.org/Alice | https://example.org/Bob | https://example.org/graph1 |
| tenant1 | https://example.org/knows | P | https://example.org/knows | U | https://example.org/Alice | https://example.org/Bob | https://example.org/graph1 |
| tenant1 | https://example.org/Bob | O | https://example.org/knows | U | https://example.org/Alice | https://example.org/Bob | https://example.org/graph1 |

Write 1 row to `quads_by_collection`:

| collection | d | s | p | o | otype | dtype | lang |
|---|---|---|---|---|---|---|---|
| tenant1 | https://example.org/graph1 | https://example.org/Alice | https://example.org/knows | https://example.org/Bob | U | | |

### Literal Example

For a label triple:

```
Dataset:   https://example.org/graph1
Subject:   https://example.org/Alice
Predicate: http://www.w3.org/2000/01/rdf-schema#label
Object:    "Alice Smith" (lang: en)
```

The `otype` is `'L'`, `dtype` is `'xsd:string'`, and `lang` is `'en'`. The literal value `"Alice Smith"` is stored in `o`. Only 3 rows are needed in `quads_by_entity`: no row is written with the literal as the entity, because literals are not independently queryable entities.
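
The write path above can be sketched in a few lines of Python. This is an illustrative sketch, not the TrustGraph implementation: the `Quad` type and `rows_for_quad` helper are hypothetical names, empty strings stand in for unset `dtype`/`lang`, and the sketch assumes that only literal objects (`otype = 'L'`) are skipped as entities.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    d: str             # dataset/graph
    s: str             # subject
    p: str             # predicate
    o: str             # object value
    otype: str = "U"   # 'U' (URI), 'L' (literal), 'T' (triple)
    dtype: str = ""    # datatype, literals only
    lang: str = ""     # language tag, literals only


def rows_for_quad(collection: str, q: Quad):
    """Return (entity_rows, collection_row) for one incoming quad."""
    # Columns shared by every quads_by_entity row for this quad.
    shared = (q.p, q.otype, q.s, q.o, q.d, q.dtype, q.lang)
    participants = [(q.d, "G"), (q.s, "S"), (q.p, "P")]
    if q.otype != "L":
        # Assumption: only literals are not written as entities.
        participants.append((q.o, "O"))
    entity_rows = [
        (collection, entity, role) + shared for entity, role in participants
    ]
    collection_row = (
        collection, q.d, q.s, q.p, q.o, q.otype, q.dtype, q.lang
    )
    return entity_rows, collection_row
```

Calling `rows_for_quad` with the Alice/Bob quad yields four entity rows with roles G, S, P, O; the label quad yields three.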

## Query Patterns

### All 16 DSPO Patterns

In the table below, "Full prefix" means the query uses a contiguous prefix of the clustering columns. "Partition scan + filter" means Cassandra reads a slice of a single partition and filters in memory, which is still efficient but not a pure prefix match.

| # | Known | Lookup entity | Clustering prefix | Efficiency |
|---|---|---|---|---|
| 1 | D,S,P,O | entity=S, role='S', p=P | Full match | Full prefix |
| 2 | D,S,P,? | entity=S, role='S', p=P | Filter on D | Partition scan + filter |
| 3 | D,S,?,O | entity=S, role='S' | Filter on D, O | Partition scan + filter |
| 4 | D,?,P,O | entity=O, role='O', p=P | Filter on D | Partition scan + filter |
| 5 | ?,S,P,O | entity=S, role='S', p=P | Filter on O | Partition scan + filter |
| 6 | D,S,?,? | entity=S, role='S' | Filter on D | Partition scan + filter |
| 7 | D,?,P,? | entity=P, role='P' | Filter on D | Partition scan + filter |
| 8 | D,?,?,O | entity=O, role='O' | Filter on D | Partition scan + filter |
| 9 | ?,S,P,? | entity=S, role='S', p=P | none | **Full prefix** |
| 10 | ?,S,?,O | entity=S, role='S' | Filter on O | Partition scan + filter |
| 11 | ?,?,P,O | entity=O, role='O', p=P | none | **Full prefix** |
| 12 | D,?,?,? | entity=D, role='G' | none | **Full prefix** |
| 13 | ?,S,?,? | entity=S, role='S' | none | **Full prefix** |
| 14 | ?,?,P,? | entity=P, role='P' | none | **Full prefix** |
| 15 | ?,?,?,O | entity=O, role='O' | none | **Full prefix** |
| 16 | ?,?,?,? | none | Full scan | Exploration only |

**Key finding**: 7 of the 15 non-trivial patterns are full clustering-prefix matches. The remaining 8 are single-partition reads with in-partition filtering. Every query with at least one known element hits a partition key.

Pattern 16 (?,?,?,?) does not occur in practice because the collection is always specified, which reduces it to pattern 12.
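
The selection rules in the table can be expressed as a small planner. The sketch below is one possible reading of the table, not code from the project: `plan_lookup` is a hypothetical helper, and it treats the fully bound pattern 1 as a full prefix match, as the table does.

```python
def plan_lookup(d=None, s=None, p=None, o=None):
    """Pick the partition entity, clustering prefix and residual filters."""
    # Preference order taken from the table: subject, object, predicate, graph.
    if s is not None:
        entity, role = s, "S"
    elif o is not None:
        entity, role = o, "O"
    elif p is not None:
        entity, role = p, "P"
    elif d is not None:
        entity, role = d, "G"
    else:
        return {"entity": None, "role": None, "prefix": [],
                "filters": [], "efficiency": "full scan"}

    prefix = ["role"]
    if role in ("S", "O") and p is not None:
        prefix.append("p")

    filters = []
    all_known = None not in (d, s, p, o)
    if not all_known:
        # Known elements that are not part of the prefix are filtered
        # inside the partition.
        if d is not None and role != "G":
            filters.append("d")
        if o is not None and role == "S":
            filters.append("o")

    efficiency = "partition scan + filter" if filters else "full prefix"
    return {"entity": entity, "role": role, "prefix": prefix,
            "filters": filters, "efficiency": efficiency}
```

Enumerating all sixteen combinations of known and unknown elements with this helper reproduces the 7 / 8 / 1 split stated in the key finding.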

### Common Query Examples

**Everything about an entity:**

```sql
SELECT * FROM quads_by_entity
WHERE collection = 'tenant1' AND entity = 'https://example.org/Alice';
```

**All outgoing relationships of an entity:**

```sql
SELECT * FROM quads_by_entity
WHERE collection = 'tenant1' AND entity = 'https://example.org/Alice'
  AND role = 'S';
```

**A specific predicate of an entity:**

```sql
SELECT * FROM quads_by_entity
WHERE collection = 'tenant1' AND entity = 'https://example.org/Alice'
  AND role = 'S' AND p = 'https://example.org/knows';
```

**Label for an entity (specific language):**

```sql
SELECT * FROM quads_by_entity
WHERE collection = 'tenant1' AND entity = 'https://example.org/Alice'
  AND role = 'S' AND p = 'http://www.w3.org/2000/01/rdf-schema#label'
  AND otype = 'L';
```

Then filter the results by `lang = 'en'` on the application side if needed.
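
The application-side language filter could look like the following sketch. It assumes rows arrive as dictionaries keyed by column name; `filter_by_lang` and `first_label` are hypothetical helpers, not part of any TrustGraph API.

```python
def filter_by_lang(rows, lang):
    """Keep literal rows whose language tag matches `lang`."""
    return [r for r in rows if r.get("otype") == "L" and r.get("lang") == lang]


def first_label(rows, lang):
    """Return the first matching label value, or None if there is none."""
    matches = filter_by_lang(rows, lang)
    return matches[0]["o"] if matches else None
```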

**Only URI-valued relationships (entity-to-entity links):**

```sql
SELECT * FROM quads_by_entity
WHERE collection = 'tenant1' AND entity = 'https://example.org/Alice'
  AND role = 'S' AND p = 'https://example.org/knows' AND otype = 'U';
```

**Reverse lookup (what points to this entity):**

```sql
SELECT * FROM quads_by_entity
WHERE collection = 'tenant1' AND entity = 'https://example.org/Bob'
  AND role = 'O';
```

## Label Resolution and Cache Warming

One of the most significant advantages of the entity-centric model is that **label resolution becomes a free side effect**.

In the traditional multi-table model, fetching labels requires separate queries: fetch the triples, identify the entity URIs in the results, then fetch `rdfs:label` for each one. This N+1 pattern is expensive.

In the entity-centric model, querying an entity returns **all** of its quads, including its labels, types, and other properties. When the application caches query results, labels are pre-warmed before anything asks for them.

Two usage regimes show that this works well in practice:

- **Human-facing queries**: typically small result sets where labels are essential. Entity reads pre-warm the cache.
- **AI/bulk queries**: large result sets with hard limits. Labels are either unnecessary or needed only for a small subset of entities that are already in the cache.

The theoretical concern of resolving labels for huge result sets (e.g. 30,000 entities) is mitigated by the practical observation that no human or AI consumer can usefully process that many labels. Application-level query limits keep cache pressure manageable.

## Wide Partitions and Reification

Reification (RDF-star style statements about statements) creates hub entities, e.g. a source document that supports many extracted facts. This can produce wide partitions.

Mitigating factors:

- **Application-level query limits**: all GraphRAG and human-facing queries enforce hard limits, so wide partitions are never fully scanned on the hot read path.
- **Cassandra handles partial partition reads efficiently**: a clustering-column scan with early termination is fast even on large partitions.
- **Collection deletion** (the only operation that may traverse full partitions) is an accepted background process.

## Collection Deletion

Triggered by an API call and runs in the background (eventually consistent).

1. Read `quads_by_collection` for the target collection to get all quads.
2. Extract the unique entities from the quads (the s, p, o, d values).
3. For each unique entity, delete the partition from `quads_by_entity`.
4. Delete the rows from `quads_by_collection`.

The `quads_by_collection` table provides the index needed to locate all entity partitions without a full table scan. Partition-level deletes are efficient because `(collection, entity)` is the partition key.
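
The four deletion steps can be illustrated with in-memory stand-ins for the two tables. This is a sketch under stated assumptions: the dictionaries below are hypothetical substitutes for Cassandra, `delete_collection` is not an existing function, and literal objects are skipped because they never own a partition.

```python
def delete_collection(quads_by_collection, quads_by_entity, collection):
    """Delete one collection; returns the number of partitions dropped.

    quads_by_collection: dict collection -> list of (d, s, p, o, otype)
    quads_by_entity:     dict (collection, entity) -> list of rows
    """
    # Step 1: read the manifest for the target collection.
    quads = quads_by_collection.get(collection, [])

    # Step 2: collect the unique entities that own a partition.
    entities = set()
    for d, s, p, o, otype in quads:
        entities.update((d, s, p))
        if otype != "L":
            entities.add(o)

    # Step 3: drop each (collection, entity) partition.
    dropped = 0
    for entity in entities:
        if quads_by_entity.pop((collection, entity), None) is not None:
            dropped += 1

    # Step 4: drop the manifest rows themselves.
    quads_by_collection.pop(collection, None)
    return dropped
```

Because the manifest is read first and removed last, a run that is interrupted part-way can be repeated against the same manifest.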

## Migration Path from the Multi-Table Model

The entity-centric model can coexist with the existing multi-table model during migration:

1. Deploy the `quads_by_entity` and `quads_by_collection` tables alongside the existing tables.
2. Dual-write new quads to both the old and the new tables.
3. Backfill existing data into the new tables.
4. Migrate read paths one query pattern at a time.
5. Decommission the old tables once all reads are migrated.

## Summary

| Aspect | Traditional (6-table) | Entity-centric (2-table) |
|---|---|---|
| Tables | 7+ | 2 |
| Writes per quad | 6+ | 5 (4 data + 1 manifest) |
| Label resolution | Separate round trips | Free via cache warming |
| Query patterns | 16 across 6 tables | 16 on 1 table |
| Schema complexity | High | Low |
| Operational overhead | 6 tables to tune/repair | 1 data table |
| Reification support | Additional complexity | Natural fit |
| Object type filtering | Not available | Native (via otype clustering) |
|
||||
228
docs/tech-specs/sw/explainability-cli.sw.md
Normal file
|
|
@ -0,0 +1,228 @@

---
layout: default
title: "Explainability CLI Technical Specification"
parent: "Swahili (Beta)"
---

# Explainability CLI Technical Specification

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Status

Draft

## Overview

This specification describes CLI tools for debugging and exploring explainability data in TrustGraph. These tools let users trace how answers were derived and debug the provenance chain from edges back to source documents.

Three CLI tools:

1. **`tg-show-document-hierarchy`** - Show the document → page → chunk → edge hierarchy
2. **`tg-list-explain-traces`** - List all GraphRAG sessions with their questions
3. **`tg-show-explain-trace`** - Show the full explainability trace for a session

## Goals

- **Debugging**: Enable developers to inspect document processing results
- **Auditability**: Trace any extracted fact back to its source document
- **Transparency**: Show how GraphRAG derived an answer
- **Usability**: Simple CLI interface with sensible defaults

## Background

TrustGraph has two provenance systems:

1. **Extraction-time provenance** (see `extraction-time-provenance.md`): Records document → page → chunk → edge relationships during ingestion. Stored in the named graph `urn:graph:source` using `prov:wasDerivedFrom`.

2. **Query-time explainability** (see `query-time-explainability.md`): Records the question → exploration → focus → synthesis chain during GraphRAG queries. Stored in the named graph `urn:graph:retrieval`.

Current limitations:

- No easy way to visualize the document hierarchy after processing
- Explainability data can only be inspected by querying raw triples manually
- No consolidated view of a GraphRAG session

## Technical Design

### Tool 1: tg-show-document-hierarchy

**Purpose**: Given a document ID, traverse and display all derived entities.

**Usage**:

```bash
tg-show-document-hierarchy "urn:trustgraph:doc:abc123"
tg-show-document-hierarchy --show-content --max-content 500 "urn:trustgraph:doc:abc123"
```

**Arguments**:

| Arg | Description |
|-----|-------------|
| `document_id` | Document URI (positional) |
| `-u/--api-url` | Gateway URL (default: `$TRUSTGRAPH_URL`) |
| `-t/--token` | Auth token (default: `$TRUSTGRAPH_TOKEN`) |
| `-U/--user` | User ID (default: `trustgraph`) |
| `-C/--collection` | Collection (default: `default`) |
| `--show-content` | Include blob/document content |
| `--max-content` | Max characters per blob (default: 200) |
| `--format` | Output: `tree` (default), `json` |
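
The argument table maps directly onto Python's `argparse`. The sketch below is illustrative only: `build_parser` is a hypothetical function, and the real tool may structure its parser differently.

```python
import argparse
import os


def build_parser():
    """Build a parser mirroring the argument table (sketch, not the real CLI)."""
    parser = argparse.ArgumentParser(prog="tg-show-document-hierarchy")
    parser.add_argument("document_id", help="Document URI")
    parser.add_argument("-u", "--api-url",
                        default=os.getenv("TRUSTGRAPH_URL"), help="Gateway URL")
    parser.add_argument("-t", "--token",
                        default=os.getenv("TRUSTGRAPH_TOKEN"), help="Auth token")
    parser.add_argument("-U", "--user", default="trustgraph", help="User ID")
    parser.add_argument("-C", "--collection", default="default",
                        help="Collection")
    parser.add_argument("--show-content", action="store_true",
                        help="Include blob/document content")
    parser.add_argument("--max-content", type=int, default=200,
                        help="Max characters per blob")
    parser.add_argument("--format", choices=["tree", "json"], default="tree",
                        help="Output format")
    return parser
```

Reading the environment variables at parser construction time keeps explicit `--api-url` and `--token` flags as overrides.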

**Behavior**:

1. Query triples: `?child prov:wasDerivedFrom <document_id>` in `urn:graph:source`
2. Recursively query the children of each result
3. Build the tree structure: Document → Pages → Chunks
4. If `--show-content` is set, fetch content from the librarian API
5. Display as an indented tree or as JSON
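
Steps 1 to 3 and 5 amount to building a tree from `prov:wasDerivedFrom` edges. A minimal sketch, assuming the triples have already been fetched and reduced to `(child, parent)` pairs; `build_hierarchy` and `render_tree` are hypothetical helpers.

```python
from collections import defaultdict


def build_hierarchy(derived_from, root):
    """Build a nested tree from (child, parent) pairs rooted at `root`."""
    children = defaultdict(list)
    for child, parent in derived_from:
        children[parent].append(child)

    def walk(node, seen):
        if node in seen:
            # Guard against cycles in malformed provenance data.
            return {"id": node, "children": []}
        seen = seen | {node}
        return {"id": node,
                "children": [walk(c, seen) for c in sorted(children[node])]}

    return walk(root, frozenset())


def render_tree(node, depth=0):
    """Render the tree as indented lines, two spaces per level."""
    lines = ["  " * depth + node["id"]]
    for child in node["children"]:
        lines.extend(render_tree(child, depth + 1))
    return lines
```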

**Example Output**:

```
Document: urn:trustgraph:doc:abc123
  Title: "Sample PDF"
  Type: application/pdf

  └── Page 1: urn:trustgraph:doc:abc123/p1
      ├── Chunk 0: urn:trustgraph:doc:abc123/p1/c0
      │   Content: "The quick brown fox..." [truncated]
      └── Chunk 1: urn:trustgraph:doc:abc123/p1/c1
          Content: "Machine learning is..." [truncated]
```

### Tool 2: tg-list-explain-traces

**Purpose**: List all GraphRAG sessions (questions) in a collection.

**Usage**:

```bash
tg-list-explain-traces
tg-list-explain-traces --limit 20 --format json
```

**Arguments**:

| Arg | Description |
|-----|-------------|
| `-u/--api-url` | Gateway URL |
| `-t/--token` | Auth token |
| `-U/--user` | User ID |
| `-C/--collection` | Collection |
| `--limit` | Max results (default: 50) |
| `--format` | Output: `table` (default), `json` |

**Behavior**:

1. Query: `?session tg:query ?text` in `urn:graph:retrieval`
2. Query timestamps: `?session prov:startedAtTime ?time`
3. Display as a table
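
Joining the two query results is a simple dictionary lookup. The sketch below assumes both result sets are available as `(session, value)` pairs and that sessions are listed newest first; that ordering and the `list_sessions` helper are assumptions, not documented behavior.

```python
def list_sessions(query_triples, time_triples, limit=50):
    """Join question text with start times; inputs are (session, value) pairs."""
    times = dict(time_triples)
    rows = [
        {"session": session, "question": text, "time": times.get(session, "")}
        for session, text in query_triples
    ]
    # ISO-style timestamps sort correctly as plain strings.
    rows.sort(key=lambda row: row["time"], reverse=True)
    return rows[:limit]
```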

**Example Output**:

```
Session ID                                    | Question                       | Time
----------------------------------------------|--------------------------------|---------------------
urn:trustgraph:question:abc123                | What was the War on Terror?    | 2024-01-15 10:30:00
urn:trustgraph:question:def456                | Who founded OpenAI?            | 2024-01-15 09:15:00
```

### Tool 3: tg-show-explain-trace

**Purpose**: Show the full explainability trace for a GraphRAG session.

**Usage**:

```bash
tg-show-explain-trace "urn:trustgraph:question:abc123"
tg-show-explain-trace --max-answer 1000 --show-provenance "urn:trustgraph:question:abc123"
```

**Arguments**:

| Arg | Description |
|-----|-------------|
| `question_id` | Question URI (positional) |
| `-u/--api-url` | Gateway URL |
| `-t/--token` | Auth token |
| `-U/--user` | User ID |
| `-C/--collection` | Collection |
| `--max-answer` | Max characters for the answer (default: 500) |
| `--show-provenance` | Trace edges back to source documents |
| `--format` | Output: `text` (default), `json` |

**Behavior**:

1. Fetch the question text from the `tg:query` predicate.
2. Find the exploration: `?exp prov:wasGeneratedBy <question_id>`
3. Find the focus: `?focus prov:wasDerivedFrom <exploration_id>`
4. Fetch the selected edges: `<focus_id> tg:selectedEdge ?edge`
5. For each edge, fetch `tg:edge` (quoted triple) and `tg:reasoning`.
6. Find the synthesis: `?synth prov:wasDerivedFrom <focus_id>`
7. Fetch the answer from `tg:document` via the librarian.
8. If `--show-provenance` is set, trace the edges back to source documents.

**Example Output**:

```
=== GraphRAG Session: urn:trustgraph:question:abc123 ===

Question: What was the War on Terror?
Time: 2024-01-15 10:30:00

--- Exploration ---
Retrieved 50 edges from knowledge graph

--- Focus (Edge Selection) ---
Selected 12 edges:

  1. (War on Terror, definition, "A military campaign...")
     Reasoning: Directly defines the subject of the query
     Source: chunk → page 2 → "Beyond the Vigilant State"

  2. (Guantanamo Bay, part_of, War on Terror)
     Reasoning: Shows key component of the campaign

--- Synthesis ---
Answer:
  The War on Terror was a military campaign initiated...
  [truncated at 500 chars]
```

## Files to Create

| File | Purpose |
|------|---------|
| `trustgraph-cli/trustgraph/cli/show_document_hierarchy.py` | Tool 1 |
| `trustgraph-cli/trustgraph/cli/list_explain_traces.py` | Tool 2 |
| `trustgraph-cli/trustgraph/cli/show_explain_trace.py` | Tool 3 |

## Files to Modify

| File | Change |
|------|--------|
| `trustgraph-cli/setup.py` | Add `console_scripts` entries |

## Implementation Notes

1. **Binary content safety**: Try to decode as UTF-8; if that fails, show `[Binary: {size} bytes]`
2. **Truncation**: Honor `--max-content`/`--max-answer` and mark cut content with a `[truncated]` indicator
3. **Quoted triples**: Parse the RDF-star format from the `tg:edge` predicate
4. **Patterns**: Follow the existing CLI patterns from `query_graph.py`
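
Notes 1 and 2 can be combined in one small display helper. This is a sketch of the idea; `render_content` is a hypothetical name and the exact placement of the `[truncated]` marker is an assumption.

```python
def render_content(data: bytes, max_chars: int) -> str:
    """Return a printable form of `data`, safe for binary and long content."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: report the size instead of dumping raw bytes.
        return f"[Binary: {len(data)} bytes]"
    if len(text) > max_chars:
        return text[:max_chars] + " [truncated]"
    return text
```

Truncating after decoding counts characters rather than bytes, so multi-byte characters are never split.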

## Security Considerations

- All queries respect user/collection boundaries
- Token authentication is supported via `--token` or `$TRUSTGRAPH_TOKEN`

## Testing Strategy

Manual verification with sample data:

```bash
# Load a test document
tg-load-pdf -f test.pdf -c test-collection

# Verify hierarchy
tg-show-document-hierarchy "urn:trustgraph:doc:test"

# Run a GraphRAG query with explainability
tg-invoke-graph-rag --explainable -q "Test question"

# List and inspect traces
tg-list-explain-traces
tg-show-explain-trace "urn:trustgraph:question:xxx"
```

## References

- Query-time explainability: `docs/tech-specs/query-time-explainability.md`
- Extraction-time provenance: `docs/tech-specs/extraction-time-provenance.md`
- Existing CLI example: `trustgraph-cli/trustgraph/cli/invoke_graph_rag.py`
|
||||
355
docs/tech-specs/sw/extraction-flows.sw.md
Normal file
|
|
@ -0,0 +1,355 @@

---
layout: default
title: "Extraction Flows"
parent: "Swahili (Beta)"
---

# Extraction Flows

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

This document describes how data flows through the TrustGraph extraction pipeline, from document submission through to storage in the knowledge stores.

## Overview

```
┌──────────┐     ┌─────────────┐     ┌─────────┐     ┌────────────────────┐
│ Librarian│────▶│ PDF Decoder │────▶│ Chunker │────▶│ Knowledge          │
│          │     │ (PDF only)  │     │         │     │ Extraction         │
│          │────────────────────────▶│         │     │                    │
└──────────┘     └─────────────┘     └─────────┘     └────────────────────┘
                                          │                    │
                                          │                    ├──▶ Triples
                                          │                    ├──▶ Entity Contexts
                                          │                    └──▶ Rows
                                          │
                                          └──▶ Document Embeddings
```

## Content Storage

### Blob Storage (S3/Minio)

Document content is stored in S3-compatible blob storage:

- Path format: `doc/{object_id}` where object_id is a UUID
- All document types are stored here: source documents, pages, chunks

### Metadata Storage (Cassandra)

Document metadata stored in Cassandra includes:

- Document ID, title, kind (MIME type)
- `object_id` reference to blob storage
- `parent_id` for child documents (pages, chunks)
- `document_type`: "source", "page", "chunk", "answer"

### Inline vs. Streaming Threshold

Content transmission uses a size-based strategy:

- **< 2MB**: Content is included inline in the message (base64-encoded)
- **≥ 2MB**: Only `document_id` is sent; the processor fetches the content via the librarian API
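
The size-based strategy can be sketched as a single branch. The names below (`INLINE_LIMIT`, `build_message`) and the message shape are hypothetical, and the sketch assumes "2MB" means 2 × 1024 × 1024 bytes.

```python
import base64

# Assumption: the 2MB threshold is interpreted as 2 MiB.
INLINE_LIMIT = 2 * 1024 * 1024


def build_message(document_id: str, content: bytes) -> dict:
    """Inline small content as base64; send only the reference otherwise."""
    if len(content) < INLINE_LIMIT:
        encoded = base64.b64encode(content).decode("ascii")
        return {"document_id": document_id, "data": encoded}
    # Large content: the consumer fetches it from the librarian by ID.
    return {"document_id": document_id, "data": None}
```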

## Stage 1: Document Submission (Librarian)

### Entry Point

Documents enter the system via the librarian's `add-document` operation:

1. Content is uploaded to blob storage
2. A metadata record is created in Cassandra
3. The document ID is returned

### Triggering Extraction

The `add-processing` operation triggers extraction:

- Specifies `document_id`, `flow` (pipeline ID), and `collection` (target store)
- The librarian's `load_document()` fetches the content and publishes it to the flow's input queue

### Schema: Document

```
Document
├── metadata: Metadata
│   ├── id: str                  # Document identifier
│   ├── user: str                # Tenant/user ID
│   ├── collection: str          # Target collection
│   └── metadata: list[Triple]   # (largely unused, historical)
├── data: bytes                  # PDF content (base64, if inline)
└── document_id: str             # Librarian reference (if streaming)
```

**Routing:** Based on the `kind` field:

- `application/pdf` → `document-load` queue → PDF Decoder
- `text/plain` → `text-load` queue → Chunker
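
The routing rule is a lookup from MIME type to queue name. In this sketch `ROUTES` and `route` are hypothetical, and raising an error for unlisted kinds is an assumption, since the text does not say how other types are handled.

```python
# Queue names are taken from the routing list; the mapping itself is a sketch.
ROUTES = {
    "application/pdf": "document-load",
    "text/plain": "text-load",
}


def route(kind: str) -> str:
    """Return the input queue for a document kind (MIME type)."""
    try:
        return ROUTES[kind]
    except KeyError:
        # Assumed behavior for kinds not covered by the routing list.
        raise ValueError(f"unsupported document kind: {kind}") from None
```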

## Stage 2: PDF Decoder

Converts PDF documents into text pages.

### Process

1. Fetch the content (inline `data` or via `document_id` from the librarian)
2. Extract pages using PyPDF
3. For each page:
   - Save it as a child document in the librarian (`{doc_id}/p{page_num}`)
   - Emit provenance triples (page derived from document)
   - Forward it to the chunker

### Schema: TextDocument

```
TextDocument
├── metadata: Metadata
│   ├── id: str                  # Page URI (e.g., https://trustgraph.ai/doc/xxx/p1)
│   ├── user: str
│   ├── collection: str
│   └── metadata: list[Triple]
├── text: bytes                  # Page text content (if inline)
└── document_id: str             # Librarian reference (e.g., "doc123/p1")
```

## Stage 3: Chunker

Splits text into chunks of a configured size.

### Parameters (flow-configurable)

- `chunk_size`: Target chunk size in characters (default: 2000)
- `chunk_overlap`: Overlap between chunks (default: 100)

### Process

1. Fetch the text content (inline or via the librarian)
2. Split it using a recursive character splitter
3. For each chunk:
   - Save it as a child document in the librarian (`{parent_id}/c{index}`)
   - Emit provenance triples (chunk derived from page/document)
   - Forward it to the extraction processors
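
To show how `chunk_size` and `chunk_overlap` interact, here is a deliberately simplified fixed-window splitter. It is not the recursive character splitter named in the process above, which also prefers natural separators; `chunk_text` is a hypothetical helper for illustration only.

```python
def chunk_text(text, chunk_size=2000, chunk_overlap=100):
    """Split `text` into (index, chunk) pairs using a sliding window."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for index, start in enumerate(range(0, len(text), step)):
        chunks.append((index, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            # The window already reached the end of the text.
            break
    return chunks
```

Each chunk starts `chunk_size - chunk_overlap` characters after the previous one, so consecutive chunks share `chunk_overlap` characters.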

### Schema: Chunk

```
Chunk
├── metadata: Metadata
│   ├── id: str                  # Chunk URI
│   ├── user: str
│   ├── collection: str
│   └── metadata: list[Triple]
├── chunk: bytes                 # Chunk text content
└── document_id: str             # Librarian chunk ID (e.g., "doc123/p1/c3")
```

### Document ID Hierarchy

Child documents encode their lineage in the ID:

- Source: `doc123`
- Page: `doc123/p5`
- Chunk from page: `doc123/p5/c2`
- Chunk from text: `doc123/c2`
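
The ID scheme above is plain string composition, which also makes the lineage recoverable from the ID alone. The helpers below are hypothetical, and `lineage` assumes the source document ID itself contains no `/`.

```python
def page_id(doc_id: str, page_num: int) -> str:
    """ID of a page extracted from a source document."""
    return f"{doc_id}/p{page_num}"


def chunk_id(parent_id: str, index: int) -> str:
    """ID of a chunk derived from a page or directly from a text document."""
    return f"{parent_id}/c{index}"


def lineage(child_id: str) -> list:
    """Return the IDs from the source document down to `child_id`."""
    # Assumption: the source ID has no '/' of its own.
    parts = child_id.split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]
```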

## Stage 4: Knowledge Extraction

Multiple extraction patterns are available, selected by flow configuration.

### Pattern A: Basic GraphRAG

Two parallel processors:

**kg-extract-definitions**

- Input: Chunk
- Output: Triples (entity definitions), EntityContexts
- Extracts: entity labels, definitions

**kg-extract-relationships**

- Input: Chunk
- Output: Triples (relationships), EntityContexts
- Extracts: subject-predicate-object relationships

### Pattern B: Ontology-Driven (kg-extract-ontology)

- Input: Chunk
- Output: Triples, EntityContexts
- Uses a configured ontology to guide extraction

### Pattern C: Agent-Based (kg-extract-agent)

- Input: Chunk
- Output: Triples, EntityContexts
- Uses an agent framework for extraction

### Pattern D: Row Extraction (kg-extract-rows)

- Input: Chunk
- Output: Rows (structured data, not triples)
- Uses a schema definition to extract structured records

### Schema: Triples

```
Triples
├── metadata: Metadata
│   ├── id: str
│   ├── user: str
│   ├── collection: str
│   └── metadata: list[Triple]   # (set to [] by extractors)
└── triples: list[Triple]
    └── Triple
        ├── s: Term              # Subject
        ├── p: Term              # Predicate
        ├── o: Term              # Object
        └── g: str | None        # Named graph
```

### Schema: EntityContexts

```
EntityContexts
├── metadata: Metadata
└── entities: list[EntityContext]
    └── EntityContext
        ├── entity: Term         # Entity identifier (IRI)
        ├── context: str         # Textual description for embedding
        └── chunk_id: str        # Source chunk ID (provenance)
```

### Schema: Rows

```
Rows
├── metadata: Metadata
├── row_schema: RowSchema
│   ├── name: str
│   ├── description: str
│   └── fields: list[Field]
└── rows: list[dict[str, str]]   # Extracted records
```

## Stage 5: Embeddings Generation

### Graph Embeddings

Converts entity contexts into vector embeddings.

**Process:**

1. Receive EntityContexts
2. Call the embeddings service with the context text
3. Output GraphEmbeddings (entity → vector mapping)

**Schema: GraphEmbeddings**

```
GraphEmbeddings
├── metadata: Metadata
└── entities: list[EntityEmbeddings]
    └── EntityEmbeddings
        ├── entity: Term         # Entity identifier
        ├── vector: list[float]  # Embedding vector
        └── chunk_id: str        # Source chunk (provenance)
```

### Document Embeddings

Converts chunk text directly into vector embeddings.

**Process:**

1. Receive Chunk
2. Call the embeddings service with the chunk text
3. Output DocumentEmbeddings

**Schema: DocumentEmbeddings**

```
DocumentEmbeddings
├── metadata: Metadata
└── chunks: list[ChunkEmbeddings]
    └── ChunkEmbeddings
        ├── chunk_id: str        # Chunk identifier
        └── vector: list[float]  # Embedding vector
```

### Row Embeddings

Converts row index fields into vector embeddings.

**Process:**

1. Receive Rows
2. Embed the configured index fields
3. Output to the row vector store

## Stage 6: Storage

### Triple Store

- Receives: Triples
- Storage: Cassandra (entity-centric tables)
- Named graphs separate core knowledge from provenance:
  - `""` (default): Core knowledge facts
  - `urn:graph:source`: Extraction provenance
  - `urn:graph:retrieval`: Query-time explainability

### Vector Store (Graph Embeddings)

- Receives: GraphEmbeddings
- Storage: Qdrant, Milvus, or Pinecone
- Indexed by: entity IRI
- Metadata: chunk_id for provenance

### Vector Store (Document Embeddings)

- Receives: DocumentEmbeddings
- Storage: Qdrant, Milvus, or Pinecone
- Indexed by: chunk_id

### Row Store

- Receives: Rows
- Storage: Cassandra
- Schema-driven table structure

### Row Vector Store

- Receives: Row embeddings
- Storage: Vector store
- Indexed by: row index fields

## Metadata Field Analysis

### Actively Used Fields

| Field | Usage |
|-------|-------|
| `metadata.id` | Document/chunk identifier, logging, provenance |
| `metadata.user` | Multi-tenancy, storage routing |
| `metadata.collection` | Target collection selection |
| `document_id` | Librarian reference, provenance linking |
| `chunk_id` | Provenance tracking through the pipeline |

### Potentially Redundant Fields

| Field | Status |
|-------|--------|
| `metadata.metadata` | Set to `[]` by all extractors; document-level metadata is now handled by the librarian at submission time |
|
||||
|
||||
### Bytes Field Pattern

All content fields (`data`, `text`, `chunk`) are `bytes`, but every processor decodes them to UTF-8 text immediately. No processor works with the raw bytes.
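
The decode-on-receipt pattern described above can be sketched as follows. The `ChunkMessage` class and `decode_content` helper are hypothetical names used only for illustration; they are not taken from the TrustGraph schema.

```python
from dataclasses import dataclass


@dataclass
class ChunkMessage:
    """Hypothetical message shape: content arrives as bytes."""
    chunk: bytes


def decode_content(raw: bytes) -> str:
    # Processors decode to UTF-8 text straight away and never
    # operate on the raw bytes.
    return raw.decode("utf-8")


msg = ChunkMessage(chunk="Grafu ya maarifa".encode("utf-8"))
text = decode_content(msg.chunk)
print(text)
```

Decoding once at the processor boundary keeps the rest of the processor working with `str` only.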

## Flow Configuration

Flows are defined externally and provided to the librarian via the config service. Each flow specifies:

- Input queues (`text-load`, `document-load`)
- Processor chain
- Parameters (chunk size, extraction method, etc.)

Example flow patterns:
- `pdf-graphrag`: PDF → Decoder → Chunker → Definitions + Relationships → Embeddings
- `text-graphrag`: Text → Chunker → Definitions + Relationships → Embeddings
- `pdf-ontology`: PDF → Decoder → Chunker → Ontology Extraction → Embeddings
- `text-rows`: Text → Chunker → Row Extraction → Row Store
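
As a rough illustration of what such a flow definition might carry, the sketch below models two of the example flows as plain Python data. The `FlowDefinition` shape, the processor names, the queue assignments and the `chunk_size` value are all assumptions made for the example; the actual config service format is not shown in this document.

```python
from typing import Dict, List, TypedDict


class FlowDefinition(TypedDict):
    """Hypothetical shape for one flow entry (not the real config schema)."""
    input_queue: str
    processors: List[str]
    parameters: Dict[str, object]


FLOWS: Dict[str, FlowDefinition] = {
    "pdf-graphrag": {
        "input_queue": "document-load",  # assumed queue for PDF input
        "processors": ["decoder", "chunker",
                       "definitions+relationships", "embeddings"],
        "parameters": {"chunk_size": 2000},  # illustrative value
    },
    "text-rows": {
        "input_queue": "text-load",  # assumed queue for text input
        "processors": ["chunker", "row-extraction", "row-store"],
        "parameters": {"chunk_size": 2000},  # illustrative value
    },
}


def describe(name: str) -> str:
    """Render a flow as 'queue: step → step → ...'."""
    flow = FLOWS[name]
    return f"{flow['input_queue']}: " + " → ".join(flow["processors"])


print(describe("text-rows"))
```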

docs/tech-specs/sw/extraction-provenance-subgraph.sw.md
Normal file
@@ -0,0 +1,258 @@
---
layout: default
title: "Extraction Provenance: Subgraph Model"
parent: "Swahili (Beta)"
---

# Extraction Provenance: Subgraph Model

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Problem

Extraction-time provenance currently generates a full reification for every
extracted triple: a `stmt_uri`, an `activity_uri`, and the associated PROV-O
metadata for each knowledge fact. Processing one chunk that yields 20
relationships produces ~220 provenance triples on top of the ~20 knowledge
triples, roughly a 10:1 overhead.

This is expensive (storage, indexing, transmission) and also semantically
inaccurate. Each chunk is processed by a single LLM call that produces all of
its triples in one pass. The current per-triple model obscures this by
creating the illusion of 20 independent extraction events.

In addition, two of the four extraction processors (kg-extract-ontology,
kg-extract-agent) have no provenance at all, which leaves gaps in the
audit trail.

## Solution

Replace per-triple reification with a **subgraph model**: one provenance
record per chunk extraction, shared by all triples produced from that chunk.

### Terminology Change

| Old | New |
|-----|-----|
| `stmt_uri` (`https://trustgraph.ai/stmt/{uuid}`) | `subgraph_uri` (`https://trustgraph.ai/subgraph/{uuid}`) |
| `statement_uri()` | `subgraph_uri()` |
| `tg:reifies` (1:1, identity) | `tg:contains` (1:many, containment) |

### Target Structure

All provenance triples go in the named graph `urn:graph:source`.

```
# Subgraph contains each extracted triple (RDF-star quoted triples)
<subgraph> tg:contains <<s1 p1 o1>> .
<subgraph> tg:contains <<s2 p2 o2>> .
<subgraph> tg:contains <<s3 p3 o3>> .

# Derivation from source chunk
<subgraph> prov:wasDerivedFrom <chunk_uri> .
<subgraph> prov:wasGeneratedBy <activity> .

# Activity: one per chunk extraction
<activity> rdf:type prov:Activity .
<activity> rdfs:label "{component_name} extraction" .
<activity> prov:used <chunk_uri> .
<activity> prov:wasAssociatedWith <agent> .
<activity> prov:startedAtTime "2026-03-13T10:00:00Z" .
<activity> tg:componentVersion "0.25.0" .
<activity> tg:llmModel "gpt-4" .          # if available
<activity> tg:ontology <ontology_uri> .   # if available

# Agent: stable per component
<agent> rdf:type prov:Agent .
<agent> rdfs:label "{component_name}" .
```

### Volume Comparison

For a chunk producing N extracted triples:

| | Old (per-triple) | New (subgraph) |
|---|---|---|
| `tg:contains` / `tg:reifies` | N | N |
| Activity triples | ~9 x N | ~9 |
| Agent triples | 2 x N | 2 |
| Statement/subgraph metadata | 2 x N | 2 |
| **Total provenance triples** | **~13N** | **N + 13** |
| **Example (N=20)** | **~260** | **33** |
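
The totals in the table can be checked with a few lines of arithmetic. The sketch below takes the table's own estimates as given (roughly 13 provenance triples per extracted triple in the old model; N `tg:contains` links plus about 9 activity, 2 agent and 2 subgraph-metadata triples in the new one). The function names are illustrative only.

```python
def per_triple_total(n: int) -> int:
    """Old model: the table's approximate total of ~13 triples per fact."""
    return 13 * n


def subgraph_total(n: int) -> int:
    """New model: N tg:contains links plus one shared block per chunk."""
    activity, agent, subgraph_meta = 9, 2, 2
    return n + activity + agent + subgraph_meta


for n in (1, 20, 100):
    old, new = per_triple_total(n), subgraph_total(n)
    print(f"N={n}: old~{old}, new={new}")
```

The old model's cost scales with N at every row of the table, while the new model only pays N for the containment links; the shared block stays constant.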

## Scope

### Processors to Update (existing provenance, per-triple)

**kg-extract-definitions**
(`trustgraph-flow/trustgraph/extract/kg/definitions/extract.py`)

Currently calls `statement_uri()` + `triple_provenance_triples()` inside
the per-definition loop.

Changes:
- Move `subgraph_uri()` and `activity_uri()` creation before the loop
- Collect `tg:contains` triples inside the loop
- Emit the shared activity/agent/derivation block once after the loop

**kg-extract-relationships**
(`trustgraph-flow/trustgraph/extract/kg/relationships/extract.py`)

Same pattern as definitions. Same changes.

### Processors to Add Provenance (currently missing)

**kg-extract-ontology**
(`trustgraph-flow/trustgraph/extract/kg/ontology/extract.py`)

Currently emits triples with no provenance. Add subgraph provenance
using the same pattern: one subgraph per chunk, `tg:contains` for each
extracted triple.

**kg-extract-agent**
(`trustgraph-flow/trustgraph/extract/kg/agent/extract.py`)

Currently emits triples with no provenance. Add subgraph provenance
using the same pattern.

### Shared Provenance Library Changes

**`trustgraph-base/trustgraph/provenance/triples.py`**

- Replace `triple_provenance_triples()` with `subgraph_provenance_triples()`
- The new function accepts a list of extracted triples instead of a single one
- It generates one `tg:contains` per triple and a shared activity/agent block
- Remove the old `triple_provenance_triples()`

**`trustgraph-base/trustgraph/provenance/uris.py`**

- Replace `statement_uri()` with `subgraph_uri()`

**`trustgraph-base/trustgraph/provenance/namespaces.py`**

- Replace `TG_REIFIES` with `TG_CONTAINS`

### Not in Scope

- **kg-extract-topics**: older-style processor, not currently used in
  standard flows
- **kg-extract-rows**: produces rows rather than triples, so it has a
  different provenance model
- **Query-time provenance** (`urn:graph:retrieval`): a separate concern that
  already uses a different pattern (question/exploration/focus/synthesis)
- **Document/page/chunk provenance** (PDF decoder, chunker): already uses
  `derived_entity_triples()`, which is per-entity rather than per-triple, so
  there is no redundancy issue

## Implementation Details

### Processor Loop Restructuring

Before (per-triple, in relationships):
```python
for rel in rels:
    # ... build relationship_triple ...
    stmt_uri = statement_uri()
    prov_triples = triple_provenance_triples(
        stmt_uri=stmt_uri,
        extracted_triple=relationship_triple,
        ...
    )
    triples.extend(set_graph(prov_triples, GRAPH_SOURCE))
```

After (subgraph model):
```python
sg_uri = subgraph_uri()
extracted_triples = []  # collected across the loop, used once below

for rel in rels:
    # ... build relationship_triple ...
    extracted_triples.append(relationship_triple)

# One provenance block for the whole chunk, emitted after the loop
prov_triples = subgraph_provenance_triples(
    subgraph_uri=sg_uri,
    extracted_triples=extracted_triples,
    chunk_uri=chunk_uri,
    component_name=default_ident,
    component_version=COMPONENT_VERSION,
    llm_model=llm_model,
    ontology_uri=ontology_uri,
)
triples.extend(set_graph(prov_triples, GRAPH_SOURCE))
```

### New Helper Signature

```python
def subgraph_provenance_triples(
    subgraph_uri: str,
    extracted_triples: List[Triple],
    chunk_uri: str,
    component_name: str,
    component_version: str,
    llm_model: Optional[str] = None,
    ontology_uri: Optional[str] = None,
    timestamp: Optional[str] = None,
) -> List[Triple]:
    """
    Build provenance triples for a subgraph of extracted knowledge.

    Creates:
    - tg:contains link for each extracted triple (RDF-star quoted)
    - One prov:wasDerivedFrom link to source chunk
    - One activity with agent metadata
    """
```
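
To make the shape of the output concrete, here is a minimal, self-contained sketch of a function with the same parameters. It is not the TrustGraph implementation: the `Triple` dataclass is a stand-in, predicates are written as prefixed names rather than full IRIs, quoted triples are modelled as Python tuples, and the activity and agent URI patterns are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple, Union
from uuid import uuid4

# A term is a URI/literal string, or a quoted (s, p, o) triple.
Term = Union[str, Tuple[str, str, object]]


@dataclass(frozen=True)
class Triple:
    """Hypothetical stand-in for TrustGraph's Triple type."""
    s: str
    p: str
    o: Term


def sketch_subgraph_provenance(
    subgraph_uri: str,
    extracted_triples: List[Triple],
    chunk_uri: str,
    component_name: str,
    component_version: str,
    llm_model: Optional[str] = None,
    ontology_uri: Optional[str] = None,
    timestamp: Optional[str] = None,
) -> List[Triple]:
    # Assumed URI shapes; the real helpers may mint these differently.
    activity = f"https://trustgraph.ai/activity/{uuid4()}"
    agent = f"https://trustgraph.ai/agent/{component_name}"
    timestamp = timestamp or datetime.now(timezone.utc).isoformat()

    # One tg:contains link per extracted triple (quoted as a tuple here).
    out = [Triple(subgraph_uri, "tg:contains", (t.s, t.p, t.o))
           for t in extracted_triples]

    # Shared block, emitted once per chunk extraction.
    out += [
        Triple(subgraph_uri, "prov:wasDerivedFrom", chunk_uri),
        Triple(subgraph_uri, "prov:wasGeneratedBy", activity),
        Triple(activity, "rdf:type", "prov:Activity"),
        Triple(activity, "rdfs:label", f"{component_name} extraction"),
        Triple(activity, "prov:used", chunk_uri),
        Triple(activity, "prov:wasAssociatedWith", agent),
        Triple(activity, "prov:startedAtTime", timestamp),
        Triple(activity, "tg:componentVersion", component_version),
        Triple(agent, "rdf:type", "prov:Agent"),
        Triple(agent, "rdfs:label", component_name),
    ]
    if llm_model is not None:
        out.append(Triple(activity, "tg:llmModel", llm_model))
    if ontology_uri is not None:
        out.append(Triple(activity, "tg:ontology", ontology_uri))
    return out


facts = [Triple("ex:a", "ex:knows", "ex:b"),
         Triple("ex:b", "ex:knows", "ex:c")]
prov = sketch_subgraph_provenance(
    "https://trustgraph.ai/subgraph/demo", facts, "chunk:123-1-1",
    "kg-extract-relationships", "0.25.0",
)
print(len(prov))
```

Only the `tg:contains` links grow with the number of extracted triples; everything else is emitted once per call.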

### Breaking Change

This is a breaking change to the provenance model. Provenance has not been released, so no migration is needed. The old `tg:reifies` code can be removed outright, and the `statement_uri` code can be deleted entirely.

docs/tech-specs/sw/extraction-time-provenance.sw.md
Normal file
@@ -0,0 +1,935 @@
---
layout: default
title: "Extraction-Time Provenance: Source Layer"
parent: "Swahili (Beta)"
---

# Extraction-Time Provenance: Source Layer

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Overview

This document captures notes on extraction-time provenance for future specification work. Extraction-time provenance records the "source layer": where data originally came from, and how it was extracted and transformed.

This is separate from query-time provenance (see `query-time-provenance.md`), which records agent reasoning.

## Problem Statement

### Current Implementation

Provenance currently works as follows:
- Document metadata is stored as RDF triples in the knowledge graph.
- A document ID ties the metadata to the document, so the document appears as a node in the graph.
- When edges (relationships/facts) are extracted from a document, a `subjectOf` relationship links the extracted edge back to the source document.

### Problems with the Current Approach

1. **Repetitive metadata loading:** Document metadata is bundled and loaded again with every batch of triples extracted from that document. This is wasteful and redundant: the same metadata travels as cargo with every extraction output.

2. **Shallow provenance:** The current `subjectOf` relationship only links facts directly to the top-level document. There is no visibility into the transformation chain: which page the fact came from, which chunk, and which extraction method was used.

### Desired State

1. **Load metadata once:** Document metadata should be loaded once and attached to the top-level document node, not repeated with every batch of triples.

2. **Rich provenance DAG:** Capture the entire transformation chain from the source document, through all intermediate artifacts, down to the extracted facts. For example, a PDF document transformation:

```
PDF file (source document with metadata)
  → Page 1 (decoded text)
    → Chunk 1
      → Extracted edge/fact (via subjectOf)
      → Extracted edge/fact
    → Chunk 2
      → Extracted edge/fact
  → Page 2
    → Chunk 3
      → ...
```

3. **Unified storage:** The provenance DAG is stored in the same knowledge graph as the extracted knowledge. This allows provenance to be queried in the same way as knowledge, following edges back up the chain from any fact to its exact source location.

4. **Stable IDs:** Each intermediate artifact (page, chunk) has a stable ID as a node in the graph.

5. **Parent-child linking:** Derived documents are linked to their parents all the way up to the top-level source document, using consistent relationship types.

6. **Precise fact attribution:** The `subjectOf` relationship on extracted edges points to the immediate parent (the chunk), not to the top-level document. Full provenance is recovered by traversing up the DAG.

## Use Cases

### UC1: Source Attribution in GraphRAG Responses

**Scenario:** A user runs a GraphRAG query and receives a response from the agent.

**Flow:**
1. The user submits a query to the GraphRAG agent.
2. The agent retrieves relevant facts from the knowledge graph to formulate a response.
3. Per the query-time provenance spec, the agent reports which facts contributed to the response.
4. Each fact links to its source chunk via the provenance DAG.
5. Chunks link to pages, and pages link to source documents.

**UX Outcome:** The interface displays the LLM response alongside source attribution. The user can:
- See which facts supported the response.
- Drill down from facts → chunks → pages → documents.
- Read the original source documents to verify claims.
- Understand exactly where in a document (which page, which section) a fact originated.

**Value:** Users can verify AI-generated responses against primary sources, which builds trust and enables fact-checking.
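
The drill-down from a fact to its source document amounts to following parent links until none remain. The sketch below shows this over a plain dictionary; the `parent_of` mapping and the `fact:`/`chunk:`/`page:`/`doc:` identifiers are made up for the example and stand in for whatever edges the knowledge graph actually holds.

```python
from typing import Dict, List


def provenance_chain(node_id: str, parent_of: Dict[str, str]) -> List[str]:
    """Follow parent links from a node up to the root of the DAG."""
    chain = [node_id]
    seen = {node_id}
    while chain[-1] in parent_of:
        parent = parent_of[chain[-1]]
        if parent in seen:  # guard against malformed (cyclic) data
            raise ValueError(f"cycle detected at {parent}")
        chain.append(parent)
        seen.add(parent)
    return chain


# Illustrative edges: fact → chunk → page → document
parent_of = {
    "fact:1": "chunk:123-1-1",
    "chunk:123-1-1": "page:123-1",
    "page:123-1": "doc:123",
}

print(" → ".join(provenance_chain("fact:1", parent_of)))
```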

### UC2: Debugging Extraction Quality

A fact looks wrong. Trace back through chunk → page → document to see the original text. Was it a bad extraction, or was the source itself wrong?

### UC3: Incremental Re-extraction

A source document is updated. Which chunks and facts were derived from it? Invalidate and regenerate only those, rather than reprocessing everything.

### UC4: Data Deletion / Right to be Forgotten

A source document must be removed (GDPR, legal, etc.). Traverse the DAG to find and remove all derived facts.

### UC5: Conflict Resolution

Two facts contradict each other. Trace both back to their sources to understand why, and decide which to trust (the more authoritative source, the more recent one, etc.).

### UC6: Source Authority Weighting

Some sources are more authoritative than others. Facts can be weighted or filtered based on the authority/quality of their origin documents.

### UC7: Extraction Pipeline Comparison

Compare outputs from different extraction methods/versions. Which extractor produced better facts from the same source?
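
UC3 and UC4 both need the opposite direction of travel: starting from a source document, find everything derived from it. A breadth-first walk over child links is one way to sketch that. The `children_of` mapping and the identifiers below are illustrative assumptions, not a TrustGraph API.

```python
from collections import deque
from typing import Dict, List


def derived_artifacts(root_id: str,
                      children_of: Dict[str, List[str]]) -> List[str]:
    """Return every artifact derived from root_id, nearest first."""
    found: List[str] = []
    seen = {root_id}
    queue = deque([root_id])
    while queue:
        node = queue.popleft()
        for child in children_of.get(node, []):
            if child not in seen:
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found


# Illustrative DAG: document → pages → chunks → facts
children_of = {
    "doc:123": ["page:123-1", "page:123-2"],
    "page:123-1": ["chunk:123-1-1"],
    "chunk:123-1-1": ["fact:1", "fact:2"],
}

to_invalidate = derived_artifacts("doc:123", children_of)
print(to_invalidate)
```

For deletion the resulting list would be removed along with the root; for re-extraction it would be invalidated and regenerated.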

## Integration Points

### Librarian

The librarian component already provides document storage with unique document IDs. The provenance system integrates with this existing infrastructure.

#### Existing Capabilities (already implemented)

**Parent-Child Document Linking:**
- `parent_id` field in `DocumentMetadata` - links a child document to its parent document
- `document_type` field - values: `"source"` (original) or `"extracted"` (derived)
- `add-child-document` API - creates a child document with `document_type = "extracted"` automatically
- `list-children` API - returns all child documents of a parent document
- Cascade deletion - removing a parent document automatically removes all of its child documents

**Document Identification:**
- Document IDs are specified by the client (not auto-generated)
- Documents are keyed by the composite `(user, document_id)` in Cassandra
- Object IDs (UUIDs) are generated internally for blob storage

**Metadata Support:**
- `metadata: list[Triple]` field - RDF triples for structured metadata
- `title`, `comments`, `tags` - basic document metadata
- `time` - timestamp, `kind` - MIME type

**Storage Architecture:**
- Metadata is stored in Cassandra (`librarian` keyspace, `document` table)
- Content is stored in MinIO/S3 blob storage (`library` bucket)
- Smart content delivery: documents < 2MB are embedded, larger documents are streamed

#### Key Files

- `trustgraph-flow/trustgraph/librarian/librarian.py` - Core librarian operations
- `trustgraph-flow/trustgraph/librarian/service.py` - Service processor, document loading
- `trustgraph-flow/trustgraph/tables/library.py` - Cassandra table store
- `trustgraph-base/trustgraph/schema/services/library.py` - Schema definitions

#### Gaps to Address

The librarian has the building blocks, but currently:
1. Parent-child linking is one level deep; there are no helpers for multi-level DAG traversal
2. There is no standard vocabulary of relationship types (e.g., `derivedFrom`, `extractedFrom`)
3. Provenance metadata (extraction method, confidence, chunk position) is not standardized
4. There is no query API to traverse the full provenance chain from a fact back to its source

## End-to-End Flow Design

Each processor in the pipeline follows a consistent pattern:
- Receive a document ID from upstream
- Fetch the content from the librarian
- Produce child artifacts
- For each child: save it to the librarian, emit an edge to the graph, and forward its ID downstream
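
The four-step pattern above can be sketched with an in-memory stand-in for the librarian. Everything here is hypothetical scaffolding for illustration (`InMemoryLibrarian`, `run_stage`, the ID scheme, and the use of a form feed as a page separator); the real processors talk to the librarian service and the graph over messaging.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Edge = Tuple[str, str, str]


@dataclass
class InMemoryLibrarian:
    """Hypothetical stand-in for the librarian; not the TrustGraph API."""
    content: Dict[str, str] = field(default_factory=dict)
    parent: Dict[str, Optional[str]] = field(default_factory=dict)

    def save(self, doc_id: str, text: str,
             parent_id: Optional[str] = None) -> None:
        self.content[doc_id] = text
        self.parent[doc_id] = parent_id

    def fetch(self, doc_id: str) -> str:
        return self.content[doc_id]


def run_stage(doc_id: str, librarian: InMemoryLibrarian,
              split: Callable[[str], List[str]],
              graph_edges: List[Edge], kind: str) -> List[str]:
    """Fetch by ID, produce children, save each, emit an edge, forward IDs."""
    text = librarian.fetch(doc_id)
    child_ids = []
    for index, part in enumerate(split(text), start=1):
        child_id = f"{doc_id}/{kind}-{index}"  # assumed ID scheme
        librarian.save(child_id, part, parent_id=doc_id)
        graph_edges.append((child_id, "prov:wasDerivedFrom", doc_id))
        child_ids.append(child_id)  # forwarded to the next stage
    return child_ids


lib = InMemoryLibrarian()
lib.save("doc-1", "page one\fpage two")
edges: List[Edge] = []

pages = run_stage("doc-1", lib, lambda t: t.split("\f"), edges, "page")
chunks = [c for p in pages
          for c in run_stage(p, lib, lambda t: t.split(), edges, "chunk")]
print(pages, len(chunks), len(edges))
```

Because `run_stage` treats its input ID generically, the same function serves both the page-splitting and the chunking step.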

### Processing Flows

There are two flows, depending on the document type:

#### PDF Document Flow

```
┌─────────────────────────────────────────────────────────────────────────┐
│ Librarian (initiate processing)                                         │
│   1. Emit root document metadata to knowledge graph (once)              │
│   2. Send root document ID to PDF extractor                             │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│ PDF Extractor (per page)                                                │
│   1. Fetch PDF content from librarian using document ID                 │
│   2. Extract pages as text                                              │
│   3. For each page:                                                     │
│      a. Save page as child document in librarian (parent = root doc)    │
│      b. Emit parent-child edge to knowledge graph                       │
│      c. Send page document ID to chunker                                │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Chunker (per chunk)                                                     │
│   1. Fetch page content from librarian using document ID                │
│   2. Split text into chunks                                             │
│   3. For each chunk:                                                    │
│      a. Save chunk as child document in librarian (parent = page)       │
│      b. Emit parent-child edge to knowledge graph                       │
│      c. Send chunk document ID + chunk content to next processor        │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
            ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
            Post-chunker optimization: messages carry both
            chunk ID (for provenance) and content (to avoid
            librarian round-trip). Chunks are small (2-4KB).
            ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
                                     │
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Knowledge Extractor (per chunk)                                         │
│   1. Receive chunk ID + content directly (no librarian fetch needed)    │
│   2. Extract facts/triples and embeddings from chunk content            │
│   3. For each triple:                                                   │
│      a. Emit triple to knowledge graph                                  │
│      b. Emit reified edge linking triple → chunk ID (edge pointing      │
│         to edge - first use of reification support)                     │
│   4. For each embedding:                                                │
│      a. Emit embedding with its entity ID                               │
│      b. Link entity ID → chunk ID in knowledge graph                    │
└─────────────────────────────────────────────────────────────────────────┘
```

#### Text Document Flow

Text documents skip the PDF extractor and go directly to the chunker:

```
┌─────────────────────────────────────────────────────────────────────────┐
│ Librarian (initiate processing)                                         │
│   1. Emit root document metadata to knowledge graph (once)              │
│   2. Send root document ID directly to chunker (skip PDF extractor)     │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Chunker (per chunk)                                                     │
│   1. Fetch text content from librarian using document ID                │
│   2. Split text into chunks                                             │
│   3. For each chunk:                                                    │
│      a. Save chunk as child document in librarian (parent = root doc)   │
│      b. Emit parent-child edge to knowledge graph                       │
│      c. Send chunk document ID + chunk content to next processor        │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Knowledge Extractor                                                     │
│ (same as PDF flow)                                                      │
└─────────────────────────────────────────────────────────────────────────┘
```

The resulting DAG is one level shorter:

```
PDF:  Document → Pages → Chunks → Triples/Embeddings
Text: Document → Chunks → Triples/Embeddings
```

The design accommodates both flows because the chunker treats its input generically: it uses whatever document ID it receives as the parent, regardless of whether that is a source document or a page.

### Metadata Schema (PROV-O)

Provenance metadata uses the W3C PROV-O ontology. This provides a standard vocabulary and enables future signing/authentication of extraction outputs.

### PROV-O Core Concepts

| PROV-O Type | TrustGraph Usage |
|-------------|------------------|
| `prov:Entity` | Document, Page, Chunk, Triple, Embedding |
| `prov:Activity` | Instances of extraction operations |
| `prov:Agent` | TG components (PDF extractor, chunker, etc.) with versions |

### PROV-O Relationships

| Predicate | Meaning | Example |
|-----------|---------|---------|
| `prov:wasDerivedFrom` | Entity derived from another entity | Page wasDerivedFrom Document |
| `prov:wasGeneratedBy` | Entity generated by an activity | Page wasGeneratedBy PDFExtractionActivity |
| `prov:used` | Activity used an entity as input | PDFExtractionActivity used Document |
| `prov:wasAssociatedWith` | Activity performed by an agent | PDFExtractionActivity wasAssociatedWith tg:PDFExtractor |

### Metadata at Each Level

**Source Document (emitted by Librarian):**
```
doc:123 a prov:Entity .
doc:123 dc:title "Research Paper" .
doc:123 dc:source <https://example.com/paper.pdf> .
doc:123 dc:date "2024-01-15" .
doc:123 dc:creator "Author Name" .
doc:123 tg:pageCount 42 .
doc:123 tg:mimeType "application/pdf" .
```

**Page (emitted by PDF Extractor):**
```
page:123-1 a prov:Entity .
page:123-1 prov:wasDerivedFrom doc:123 .
page:123-1 prov:wasGeneratedBy activity:pdf-extract-456 .
page:123-1 tg:pageNumber 1 .

activity:pdf-extract-456 a prov:Activity .
activity:pdf-extract-456 prov:used doc:123 .
activity:pdf-extract-456 prov:wasAssociatedWith tg:PDFExtractor .
activity:pdf-extract-456 tg:componentVersion "1.2.3" .
activity:pdf-extract-456 prov:startedAtTime "2024-01-15T10:30:00Z" .
```

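
The page-level triples above follow a fixed shape, so a processor could generate them from a handful of inputs. The following is a minimal sketch, not TrustGraph's implementation: the function name `page_provenance`, the `(subject, predicate, object)` tuple representation, and the ID formatting are assumptions made for illustration.

```python
from datetime import datetime, timezone


def page_provenance(doc_id, page_no, activity_id, version, started_at):
    """Return the provenance triples for one extracted page as (s, p, o) tuples."""
    page = f"page:{doc_id}-{page_no}"      # hypothetical ID scheme, mirrors the example
    doc = f"doc:{doc_id}"
    activity = f"activity:{activity_id}"
    return [
        (page, "rdf:type", "prov:Entity"),
        (page, "prov:wasDerivedFrom", doc),
        (page, "prov:wasGeneratedBy", activity),
        (page, "tg:pageNumber", page_no),
        (activity, "rdf:type", "prov:Activity"),
        (activity, "prov:used", doc),
        (activity, "prov:wasAssociatedWith", "tg:PDFExtractor"),
        (activity, "tg:componentVersion", version),
        (activity, "prov:startedAtTime", started_at.isoformat()),
    ]


triples = page_provenance(
    "123", 1, "pdf-extract-456", "1.2.3",
    datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc),
)
for s, p, o in triples:
    print(s, p, o)
```
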
**Chunk (emitted by Chunker):**
```
chunk:123-1-1 a prov:Entity .
chunk:123-1-1 prov:wasDerivedFrom page:123-1 .
chunk:123-1-1 prov:wasGeneratedBy activity:chunk-789 .
chunk:123-1-1 tg:chunkIndex 1 .
chunk:123-1-1 tg:charOffset 0 .
chunk:123-1-1 tg:charLength 2048 .

activity:chunk-789 a prov:Activity .
activity:chunk-789 prov:used page:123-1 .
activity:chunk-789 prov:wasAssociatedWith tg:Chunker .
activity:chunk-789 tg:componentVersion "1.0.0" .
activity:chunk-789 tg:chunkSize 2048 .
activity:chunk-789 tg:chunkOverlap 200 .
```

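
Because each level records a `prov:wasDerivedFrom` edge to its parent, tracing a chunk back to its source document is a walk up those edges. A minimal sketch, assuming the edges have been loaded into a plain child-to-parent dictionary (the function name and the dictionary are illustrative, not part of TrustGraph):

```python
def trace_to_source(entity, derived_from):
    """Follow prov:wasDerivedFrom edges from an entity up to its root."""
    chain = [entity]
    seen = {entity}
    while entity in derived_from:
        entity = derived_from[entity]
        if entity in seen:  # guard against accidental cycles in the data
            raise ValueError(f"cycle detected at {entity}")
        seen.add(entity)
        chain.append(entity)
    return chain


# child -> parent edges taken from the examples above
derived_from = {
    "page:123-1": "doc:123",
    "chunk:123-1-1": "page:123-1",
}

print(trace_to_source("chunk:123-1-1", derived_from))
```
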
**Triple (emitted by Knowledge Extractor):**
```
# The extracted triple (edge)
entity:JohnSmith rel:worksAt entity:AcmeCorp .

# Subgraph containing the extracted triples
subgraph:001 tg:contains <<entity:JohnSmith rel:worksAt entity:AcmeCorp>> .
subgraph:001 prov:wasDerivedFrom chunk:123-1-1 .
subgraph:001 prov:wasGeneratedBy activity:extract-999 .

activity:extract-999 a prov:Activity .
activity:extract-999 prov:used chunk:123-1-1 .
activity:extract-999 prov:wasAssociatedWith tg:KnowledgeExtractor .
activity:extract-999 tg:componentVersion "2.1.0" .
activity:extract-999 tg:llmModel "claude-3" .
activity:extract-999 tg:ontology <http://example.org/ontologies/business-v1> .
```

**Embedding (stored in the vector store, not the triple store):**

Embeddings are stored in the vector store together with metadata, not as RDF triples. Each embedding record contains:

| Field | Description | Example |
|-------|-------------|---------|
| vector | The embedding vector | [0.123, -0.456, ...] |
| entity | URI of the node the embedding represents | `entity:JohnSmith` |
| chunk_id | Source chunk (provenance) | `chunk:123-1-1` |
| model | Embedding model used | `text-embedding-ada-002` |
| component_version | Version of the TG embedder | `1.0.0` |

The `entity` field links the embedding to the knowledge graph (node URI). The `chunk_id` field provides provenance back to the source chunk, which allows tracing up to the original document.

#### TrustGraph Namespace Extensions

Custom predicates under the `tg:` namespace for extraction-specific metadata:

| Predicate | Domain | Description |
|-----------|--------|-------------|
| `tg:contains` | Subgraph | Points to a triple contained in this extraction subgraph |
| `tg:pageCount` | Document | Total number of pages in the source document |
| `tg:mimeType` | Document | MIME type of the source document |
| `tg:pageNumber` | Page | Page number within the source document |
| `tg:chunkIndex` | Chunk | Index of the chunk within its parent |
| `tg:charOffset` | Chunk | Character offset within the parent text |
| `tg:charLength` | Chunk | Length of the chunk in characters |
| `tg:chunkSize` | Activity | Configured chunk size |
| `tg:chunkOverlap` | Activity | Configured overlap between chunks |
| `tg:componentVersion` | Activity | Version of the TG component |
| `tg:llmModel` | Activity | LLM used for extraction |
| `tg:ontology` | Activity | URI of the ontology used to guide extraction |
| `tg:embeddingModel` | Activity | Model used for embeddings |
| `tg:sourceText` | Statement | Exact text from which the triple was extracted |
| `tg:sourceCharOffset` | Statement | Character offset within the chunk where the source text starts |
| `tg:sourceCharLength` | Statement | Length of the source text in characters |

#### Vocabulary Bootstrap (Per Collection)

The knowledge graph is ontology-neutral and starts out empty. When PROV-O provenance data is written to a collection for the first time, the vocabulary must be bootstrapped with RDF labels for all classes and predicates. This ensures human-readable display in queries and the UI.

**PROV-O Classes:**
```
prov:Entity rdfs:label "Entity" .
prov:Activity rdfs:label "Activity" .
prov:Agent rdfs:label "Agent" .
```

**PROV-O Predicates:**
```
prov:wasDerivedFrom rdfs:label "was derived from" .
prov:wasGeneratedBy rdfs:label "was generated by" .
prov:used rdfs:label "used" .
prov:wasAssociatedWith rdfs:label "was associated with" .
prov:startedAtTime rdfs:label "started at" .
```

**TrustGraph Predicates:**
```
tg:contains rdfs:label "contains" .
tg:pageCount rdfs:label "page count" .
tg:mimeType rdfs:label "MIME type" .
tg:pageNumber rdfs:label "page number" .
tg:chunkIndex rdfs:label "chunk index" .
tg:charOffset rdfs:label "character offset" .
tg:charLength rdfs:label "character length" .
tg:chunkSize rdfs:label "chunk size" .
tg:chunkOverlap rdfs:label "chunk overlap" .
tg:componentVersion rdfs:label "component version" .
tg:llmModel rdfs:label "LLM model" .
tg:ontology rdfs:label "ontology" .
tg:embeddingModel rdfs:label "embedding model" .
tg:sourceText rdfs:label "source text" .
tg:sourceCharOffset rdfs:label "source character offset" .
tg:sourceCharLength rdfs:label "source character length" .
```

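
One way to load these labels so that repeated runs do not create duplicates is to check for each label triple before writing it. The sketch below is illustrative only: a Python `set` stands in for the per-collection triple store, `bootstrap_vocabulary` is a hypothetical helper, and only a few of the labels are included to keep it short.

```python
# A subset of the label triples listed above, keyed by term.
LABELS = {
    "prov:Entity": "Entity",
    "prov:Activity": "Activity",
    "prov:Agent": "Agent",
    "prov:wasDerivedFrom": "was derived from",
    "tg:contains": "contains",
    "tg:pageCount": "page count",
}


def bootstrap_vocabulary(store, labels=LABELS):
    """Add rdfs:label triples to `store`; return how many were newly added."""
    added = 0
    for term, label in labels.items():
        triple = (term, "rdfs:label", label)
        if triple not in store:   # skip labels that are already present
            store.add(triple)
            added += 1
    return added


store = set()  # stand-in for a per-collection triple store (assumption)
first = bootstrap_vocabulary(store)
second = bootstrap_vocabulary(store)
print(first, second, len(store))
```
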
**Implementation note:** This vocabulary bootstrap should be idempotent, that is, safe to run multiple times without creating duplicates. It can be triggered when the first document in a collection is processed, or as a separate collection initialization step.

#### Sub-Chunk Provenance (Aspirational)

For finer-grained provenance, it would be valuable to record exactly where within a chunk a triple was extracted from. This enables:

- Highlighting the exact source text in the UI
- Verifying extraction accuracy against the source
- Debugging extraction quality at the sentence level

**Example with position tracking:**
```
# The extracted triple
entity:JohnSmith rel:worksAt entity:AcmeCorp .

# Subgraph with sub-chunk provenance
subgraph:001 tg:contains <<entity:JohnSmith rel:worksAt entity:AcmeCorp>> .
subgraph:001 prov:wasDerivedFrom chunk:123-1-1 .
subgraph:001 tg:sourceText "John Smith has worked at Acme Corp since 2019" .
subgraph:001 tg:sourceCharOffset 1547 .
subgraph:001 tg:sourceCharLength 45 .
```

**Example with text range (alternative):**
```
subgraph:001 tg:contains <<entity:JohnSmith rel:worksAt entity:AcmeCorp>> .
subgraph:001 prov:wasDerivedFrom chunk:123-1-1 .
subgraph:001 tg:sourceRange "1547-1592" .
subgraph:001 tg:sourceText "John Smith has worked at Acme Corp since 2019" .
```

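
If the extractor returns the source sentence but no position, the offset and length can be recovered from the chunk text afterwards. A minimal sketch, assuming an exact substring match is good enough (a fuzzy matcher would be needed when the LLM paraphrases); `locate_source_text` and the sample chunk are made up for illustration:

```python
def locate_source_text(chunk_text, source_text):
    """Return (offset, length) of source_text in chunk_text, or None if absent."""
    offset = chunk_text.find(source_text)
    if offset < 0:
        return None
    return offset, len(source_text)


chunk_text = "Background. John Smith has worked at Acme Corp since 2019. More text."
source = "John Smith has worked at Acme Corp since 2019"

span = locate_source_text(chunk_text, source)
print(span)

# The recorded span must slice back to exactly the source text.
offset, length = span
assert chunk_text[offset:offset + length] == source
```
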
**Implementation considerations:**

- LLM-based extraction may not naturally provide character positions.
- It may be possible to prompt the LLM to return the source sentence or phrase alongside the extracted triples.
- Alternatively, post-processing could match extracted entities back to the source text.
- There is a trade-off between extraction complexity and provenance granularity.
- This may be easier to achieve with structured extraction methods than with free-form LLM extraction.

This is marked as an aspirational goal: basic chunk-level provenance should be implemented first, with sub-chunk tracking as a later enhancement if feasible.

### Dual-Storage Model

The provenance DAG is built up progressively as documents flow through the pipeline:

| Store | What Is Stored | Purpose |
|-------|---------------|---------|
| Librarian | Document content + parent-child links | Content retrieval, cascading deletion |
| Knowledge Graph | Parent-child edges + metadata | Provenance queries, fact attribution |

Both stores maintain the same DAG structure. The librarian holds content; the graph holds relationships and enables traversal queries.

### Key Design Principles

1. **Document ID as the unit of flow** - Processors pass IDs, not content. Content is fetched from the librarian when needed.
2. **Emit once at source** - Metadata is written to the graph once, when processing begins, and is not repeated downstream.
3. **Consistent processor pattern** - Every processor follows the same receive/fetch/produce/save/emit/forward pattern.
4. **Progressive DAG construction** - Each processor adds its own level to the DAG. The full provenance chain is built up step by step.
5. **Post-chunker optimization** - After chunking, messages carry both the ID and the content. Chunks are small (2-4KB), so including the content avoids unnecessary round-trips to the librarian while preserving provenance through the ID.

## Implementation Tasks

### Librarian Changes

#### Current State

- Initiates document processing by sending the document ID to the first processor.
- No connection to the triple store - metadata is bundled with the extraction outputs.
- `add-child-document` creates one-level parent-child links.
- `list-children` returns immediate children only.

#### Required Changes

**1. New interface: triple store connection**

- The librarian needs to emit document metadata edges directly to the knowledge graph when it initiates processing.
- Add a triple store client/publisher to the librarian service.
- On processing initiation: emit the root document's metadata as graph edges (once).

**2. Document type vocabulary**

- Standardize the `document_type` values for child documents:
  - `source` - the originally uploaded document.
  - `page` - a page extracted from a source (PDF, etc.).
  - `chunk` - a text chunk derived from a page or a source.

#### Interface Changes Summary

| Interface | Change |
|-----------|--------|
| Triple store | New outbound connection - emit document metadata edges |
| Processing initiation | Emit metadata to the graph before forwarding the document ID |

### PDF Extractor Changes

#### Current State

- Receives document content (or streams large documents).
- Extracts text from PDF pages.
- Forwards page content to the chunker.
- No interaction with the librarian or the triple store.

#### Required Changes

**1. New interface: librarian client**

- The PDF extractor needs to save each page as a child document in the librarian.
- Add a librarian client to the PDF extractor service.
- For each page: call `add-child-document` with parent = root document ID.

**2. New interface: triple store connection**

- The PDF extractor needs to emit parent-child edges to the knowledge graph.
- Add a triple store client/publisher.
- For each page: emit an edge linking the page document to its parent document.

**3. Change output format**

- Instead of forwarding page content directly, forward the page document ID.
- The chunker will fetch the content from the librarian using the ID.

#### Interface Changes Summary

| Interface | Change |
|-----------|--------|
| Librarian | New outbound - save child documents |
| Triple store | New outbound - emit parent-child edges |
| Output message | Change from content to document ID |

### Chunker Changes

#### Current State

- Receives page/text content.
- Splits it into chunks.
- Forwards chunk content to downstream processors.
- No interaction with the librarian or the triple store.

#### Required Changes

**1. Change input handling**

- Receive a document ID instead of content, and fetch the content from the librarian.
- Add a librarian client to the chunker service.
- Fetch the page content using the document ID.

**2. New interface: librarian client (write)**

- Save each chunk as a child document in the librarian.
- For each chunk: call `add-child-document` with parent = page document ID.

**3. New interface: triple store connection**

- Emit parent-child edges to the knowledge graph.
- Add a triple store client/publisher.
- For each chunk: emit an edge linking the chunk document to the page document.

**4. Change output format**

- Forward both the chunk document ID and the chunk content (the post-chunker optimization).
- Downstream processors receive the ID for provenance plus the content to work with.

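
Taken together, the four changes above amount to one loop per incoming page: fetch by ID, split, save each chunk as a child, emit an edge, and forward ID plus content. The sketch below illustrates that shape only. `InMemoryLibrarian`, the Python signature given to `add_child_document`, the fixed-size `split_text` splitter, the chunk ID scheme, and the `set` used as the graph are all assumptions, not TrustGraph APIs.

```python
class InMemoryLibrarian:
    """Hypothetical stand-in for the librarian service."""

    def __init__(self):
        self.content = {}
        self.parent = {}

    def fetch(self, doc_id):
        return self.content[doc_id]

    def add_child_document(self, parent_id, child_id, text, document_type):
        self.content[child_id] = text
        self.parent[child_id] = (parent_id, document_type)


def split_text(text, size, overlap):
    """Fixed-size character chunks; consecutive chunks share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def process_page(page_id, librarian, graph, size=2048, overlap=200):
    """Fetch a page by ID, chunk it, record children and edges, forward ID + content."""
    text = librarian.fetch(page_id)
    forwarded = []
    for index, chunk in enumerate(split_text(text, size, overlap), start=1):
        chunk_id = f"{page_id}/chunk-{index}"  # illustrative ID scheme only
        librarian.add_child_document(page_id, chunk_id, chunk, "chunk")
        graph.add((chunk_id, "prov:wasDerivedFrom", page_id))
        forwarded.append({"id": chunk_id, "content": chunk})
    return forwarded


librarian = InMemoryLibrarian()
librarian.content["page:123-1"] = "abcdefghij"
graph = set()
messages = process_page("page:123-1", librarian, graph, size=4, overlap=1)
print([m["content"] for m in messages])
```
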
#### Interface Changes Summary

| Interface | Change |
|-----------|--------|
| Input message | Change from content to document ID |
| Librarian | New outbound (read + write) - fetch content, save child documents |
| Triple store | New outbound - emit parent-child edges |
| Output message | Change from content-only to ID + content |

### Knowledge Extractor Changes

#### Current State

- Receives chunk content.
- Extracts triples and embeddings.
- Emits to the triple store and the embedding store.
- The `subjectOf` relationship points to the top-level document (not the chunk).

#### Required Changes

**1. Change input handling**

- Receive the chunk document ID alongside the content.
- Use the chunk ID for provenance linking (the content is already included as an optimization).

**2. Update triple provenance**

- Link extracted triples to the chunk (not the top-level document).
- Use reification to create an edge that points to an edge.
- `subjectOf` relationship: triple → chunk document ID.
- This is the first use of the existing reification support.

**3. Update embedding provenance**

- Link the embedding entity ID to the chunk.
- Emit an edge: embedding entity ID → chunk document ID.

#### Interface Changes Summary

| Interface | Change |
|-----------|--------|
| Input message | Expects chunk ID + content (not content only) |
| Triple store | Use reification for triple → chunk provenance |
| Embedding provenance | Link entity ID → chunk ID |

## References

- Query-time provenance: `docs/tech-specs/query-time-provenance.md`
- PROV-O standard for provenance modeling
- Existing source metadata in the knowledge graph (needs audit)

399
docs/tech-specs/sw/flow-class-definition.sw.md
Normal file
|
|
@ -0,0 +1,399 @@
|
|||
---
layout: default
title: "Flow Class Definition Specification"
parent: "Swahili (Beta)"
---

# Flow Class Definition Specification

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Overview

A flow class defines a complete dataflow template in the TrustGraph system. When instantiated, it creates an interconnected network of processors that handle data ingestion, processing, storage, and querying as a single system.

## Structure

A flow class definition has five main sections:

### 1. Class Section

Defines service processors that are instantiated once per flow class. These processors handle requests from all flow instances of this class.

```json
"class": {
  "service-name:{class}": {
    "request": "queue-pattern:{class}",
    "response": "queue-pattern:{class}",
    "settings": {
      "setting-name": "fixed-value",
      "parameterized-setting": "{parameter-name}"
    }
  }
}
```

**Characteristics:**

- Shared across all flow instances of the same class.
- Provide common or stateless services (LLMs, embedding models).
- Use the `{class}` template variable for queue naming.
- Settings can be fixed values or parameterized with the `{parameter-name}` syntax.
- Examples: `embeddings:{class}`, `text-completion:{class}`, `graph-rag:{class}`

### 2. Flow Section

Defines flow-specific processors that are instantiated for each individual flow instance. Each flow gets its own set of these processors.

```json
"flow": {
  "processor-name:{id}": {
    "input": "queue-pattern:{id}",
    "output": "queue-pattern:{id}",
    "settings": {
      "setting-name": "fixed-value",
      "parameterized-setting": "{parameter-name}"
    }
  }
}
```

**Characteristics:**

- A unique instance per flow.
- Handle flow-specific data and state.
- Use the `{id}` template variable for queue naming.
- Settings can be fixed values or parameterized with the `{parameter-name}` syntax.
- Examples: `chunker:{id}`, `pdf-decoder:{id}`, `kg-extract-relationships:{id}`.

<<<<<<< HEAD
|
||||
### 3. Sehemu ya Vifaa
|
||||
Inaelezea pointi za kuingilia na mikataba ya mwingiliano kwa mtiririko. Haya huunda safu ya API kwa mifumo ya nje na mawasiliano ya vipengele vya ndani.
|
||||
|
||||
Vifaa vinaweza kuwa na aina mbili:
|
||||
=======
|
||||
### Sura ya 3. Sehemu ya Vifaa vya Kuunganisha
|
||||
Inaelezea pointi za kuingilia na mikataba ya kuingiliana kwa mtiririko. Hizi huunda safu ya API kwa mifumo ya nje na mawasiliano ya vipengele vya ndani.
|
||||
|
||||
Vifaa vya kuunganisha vinaweza kuwa na aina mbili:
|
||||
>>>>>>> 82edf2d (New md files from RunPod)
|
||||
|
||||
**Mfumo wa "Tuma na Usahau"** (folyo moja):
|
||||
```json
"interfaces": {
  "document-load": "persistent://tg/flow/document-load:{id}",
  "triples-store": "persistent://tg/flow/triples-store:{id}"
}
```

**Request/Response Pattern** (object with request/response fields):
```json
"interfaces": {
  "embeddings": {
    "request": "non-persistent://tg/request/embeddings:{class}",
    "response": "non-persistent://tg/response/embeddings:{class}"
  }
}
```

**Types of Interfaces:**
- **Entry Points**: Where external systems inject data (`document-load`, `agent`)
- **Service Interfaces**: Request/response patterns for services (`embeddings`, `text-completion`)
- **Data Interfaces**: Connection points for fire-and-forget data flow (`triples-store`, `entity-contexts-load`)

### 4. Parameters Section

Maps flow-specific parameter names to centrally stored parameter definitions:
```json
"parameters": {
  "model": "llm-model",
  "temp": "temperature",
  "chunk": "chunk-size"
}
```
**Characteristics:**
- Keys are the parameter names used in processor settings (e.g., `{model}`)
- Values reference parameter definitions stored in schema/config
- Enables reuse of common parameter definitions across flows
- Reduces duplication of parameter schemas

### 5. Metadata

Additional information about the flow blueprint:
```json
"description": "Human-readable description",
"tags": ["capability-1", "capability-2"]
```
## Template Variables

### System Variables

#### {id}
- Replaced with the unique flow instance identifier
- Creates isolated resources for each flow
- Examples: `flow-123`, `customer-A-flow`

#### {class}
- Replaced with the flow blueprint name
- Creates shared resources across flows of the same class
- Examples: `standard-rag`, `enterprise-rag`

### Parameter Variables

#### {parameter-name}
- Custom parameters defined at flow launch time
- Parameter names match keys in the flow's `parameters` section
- Used in processor settings to customize behavior
- Examples: `{model}`, `{temp}`, `{chunk}`
- Replaced with the values provided when launching the flow
- Validated against centrally stored parameter definitions

## Processor Settings

Settings provide configuration values to processors at instantiation time. They can be:

### Fixed Settings

Direct values that do not change:
```json
"settings": {
  "model": "gemma3:12b",
  "temperature": 0.7,
  "max_retries": 3
}
```

### Parameterized Settings

Values that use parameters provided at flow launch:
```json
"settings": {
  "model": "{model}",
  "temperature": "{temp}",
  "endpoint": "https://{region}.api.example.com"
}
```
Parameter names in settings correspond to keys in the flow's `parameters` section.

### Settings Examples

**LLM Processor with Parameters:**
```json
// In parameters section:
"parameters": {
  "model": "llm-model",
  "temp": "temperature",
  "tokens": "max-tokens",
  "key": "openai-api-key"
}

// In processor definition:
"text-completion:{class}": {
  "request": "non-persistent://tg/request/text-completion:{class}",
  "response": "non-persistent://tg/response/text-completion:{class}",
  "settings": {
    "model": "{model}",
    "temperature": "{temp}",
    "max_tokens": "{tokens}",
    "api_key": "{key}"
  }
}
```

**Chunker with Fixed and Parameterized Settings:**
```json
// In parameters section:
"parameters": {
  "chunk": "chunk-size"
}

// In processor definition:
"chunker:{id}": {
  "input": "persistent://tg/flow/chunk:{id}",
  "output": "persistent://tg/flow/chunk-load:{id}",
  "settings": {
    "chunk_size": "{chunk}",
    "chunk_overlap": 100,
    "encoding": "utf-8"
  }
}
```
## Queue Patterns (Pulsar)

Flow blueprints use Apache Pulsar for messaging. Queue names follow the Pulsar format:
```
<persistence>://<tenant>/<namespace>/<topic>
```
### Components:
- **persistence**: `persistent` or `non-persistent` (Pulsar persistence mode)
- **tenant**: `tg` for flow blueprint definitions supplied by TrustGraph
- **namespace**: Indicates the messaging pattern
  - `flow`: Fire-and-forget services
  - `request`: Request portion of request/response services
  - `response`: Response portion of request/response services
- **topic**: The specific queue/topic name, with template variables

### Persistent Queues
- Pattern: `persistent://tg/flow/<topic>:{id}`
- Used for fire-and-forget services and durable data flow
- Data persists in Pulsar storage across restarts
- Example: `persistent://tg/flow/chunk-load:{id}`

### Non-Persistent Queues
- Pattern: `non-persistent://tg/request/<topic>:{class}` or `non-persistent://tg/response/<topic>:{class}`
- Used for request/response messaging patterns
- Ephemeral, not persisted to disk by Pulsar
- Lower latency, suitable for RPC-style communication
- Example: `non-persistent://tg/request/embeddings:{class}`
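The queue name format described above can be illustrated with a small parser. This is a sketch for illustration only, not TrustGraph or Pulsar client code: the `QueueName` class and `parse_queue_name` function are hypothetical names, and the only rules assumed are the four components and the two persistence modes listed in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueName:
    persistence: str  # "persistent" or "non-persistent"
    tenant: str       # e.g. "tg"
    namespace: str    # "flow", "request" or "response"
    topic: str        # may still contain {id} / {class} placeholders

    def __str__(self) -> str:
        return f"{self.persistence}://{self.tenant}/{self.namespace}/{self.topic}"


def parse_queue_name(name: str) -> QueueName:
    """Split <persistence>://<tenant>/<namespace>/<topic> into its parts."""
    persistence, sep, rest = name.partition("://")
    if not sep or persistence not in ("persistent", "non-persistent"):
        raise ValueError(f"unknown persistence mode in {name!r}")
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <tenant>/<namespace>/<topic> in {name!r}")
    return QueueName(persistence, *parts)


parsed = parse_queue_name("non-persistent://tg/request/embeddings:{class}")
```

Here `parsed.namespace` identifies the messaging pattern, so a caller could tell a request queue from a fire-and-forget queue without inspecting the topic.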

## Dataflow Architecture

The flow blueprint creates a unified dataflow where:

1. **Document Processing Pipeline**: Flows from ingestion through transformation to storage
2. **Query Services**: Integrated processors that query the same data stores and services
3. **Shared Services**: Centralized processors that all flows can use
4. **Storage Writers**: Persist processed data to the appropriate stores

All processors (both `{id}` and `{class}`) work together as one cohesive dataflow graph, not as separate systems.

## Example Flow Instantiation

Given:
- Flow Instance ID: `customer-A-flow`
- Flow Blueprint: `standard-rag`
- Flow parameter mappings:
  - `"model": "llm-model"`
  - `"temp": "temperature"`
  - `"chunk": "chunk-size"`
- User-provided parameters:
  - `model`: `gpt-4`
  - `temp`: `0.5`
  - `chunk`: `512`

Template expansions:
- `persistent://tg/flow/chunk-load:{id}` → `persistent://tg/flow/chunk-load:customer-A-flow`
- `non-persistent://tg/request/embeddings:{class}` → `non-persistent://tg/request/embeddings:standard-rag`
- `"model": "{model}"` → `"model": "gpt-4"`
- `"temperature": "{temp}"` → `"temperature": "0.5"`
- `"chunk_size": "{chunk}"` → `"chunk_size": "512"`

This creates:
- An isolated document processing pipeline for `customer-A-flow`
- A shared embedding service for all `standard-rag` flows
- A complete dataflow from document ingestion through querying
- Processors configured with the provided parameter values
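The template expansions in this example can be sketched in a few lines of Python. This is a minimal illustration, not TrustGraph's implementation: `expand` and the `values` dictionary are hypothetical names, and the placeholder grammar (letters, digits, `_` and `-` inside braces) is an assumption.

```python
import re

# Assumed placeholder grammar: {name}, where name uses letters, digits, "_" or "-".
PLACEHOLDER = re.compile(r"\{([A-Za-z0-9_-]+)\}")


def expand(template: str, values: dict) -> str:
    """Replace each {name} with values[name]; unknown names are left untouched."""
    def substitute(match):
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)
    return PLACEHOLDER.sub(substitute, template)


# System variables ({id}, {class}) and user parameters share one lookup table here.
values = {
    "id": "customer-A-flow",
    "class": "standard-rag",
    "model": "gpt-4",
    "temp": "0.5",
    "chunk": "512",
}

queue = expand("persistent://tg/flow/chunk-load:{id}", values)
service = expand("non-persistent://tg/request/embeddings:{class}", values)
model = expand("{model}", values)
```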

## Benefits

1. **Resource Efficiency**: Expensive services are shared across flows
2. **Flow Isolation**: Each flow has its own data processing pipeline
3. **Scalability**: Multiple flows can be instantiated from the same template
4. **Modularity**: Clear separation between shared and flow-specific components
5. **Unified Architecture**: Querying and processing are part of the same dataflow
712
docs/tech-specs/sw/flow-configurable-parameters.sw.md
Normal file
|
|
@ -0,0 +1,712 @@
|
|||
---
layout: default
title: "Flow Blueprint Configurable Parameters Technical Specification"
parent: "Swahili (Beta)"
---

# Flow Blueprint Configurable Parameters Technical Specification

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.
## Overview

This specification describes the implementation of configurable parameters for flow blueprints in TrustGraph. Parameters let users customize processor parameters at flow launch time by providing values that replace parameter placeholders in the flow blueprint definition.

Parameters work through template variable substitution in processor parameters, similar to how the `{id}` and `{class}` variables work, but with user-provided values.

The integration supports four primary use cases:

1. **Model Selection**: Allowing users to choose different LLM models (e.g., `gemma3:8b`, `gpt-4`, `claude-3`) for processors
2. **Resource Configuration**: Adjusting processor parameters such as chunk sizes, batch sizes, and concurrency limits
3. **Behavioral Tuning**: Modifying processor behavior through parameters such as temperature, max-tokens, or retrieval thresholds
4. **Environment-Specific Parameters**: Configuring endpoints, API keys, or region-specific URLs per deployment

## Goals

- **Dynamic Processor Configuration**: Enable runtime configuration of processor parameters through parameter substitution
- **Parameter Validation**: Provide type checking and validation for parameters at flow launch time
- **Default Values**: Support sensible defaults while allowing overrides for advanced users
- **Template Substitution**: Seamlessly replace parameter placeholders in processor parameters
- **UI Integration**: Enable parameter input through both API and UI interfaces
- **Type Safety**: Ensure parameter types match the expected processor parameter types
- **Documentation**: Self-documenting parameter schemas within flow blueprint definitions
- **Backward Compatibility**: Maintain compatibility with existing flow blueprints that do not use parameters
## Background

Flow blueprints in TrustGraph now support processor parameters that can contain either fixed values or parameter placeholders. This creates an opportunity for runtime customization.

Current processor parameters support:
- Fixed values: `"model": "gemma3:12b"`
- Parameter placeholders: `"model": "gemma3:{model-size}"`

This specification defines how parameters are:
- Declared in flow blueprint definitions
- Validated when flows are launched
- Substituted in processor parameters
- Exposed through APIs and the UI

By using parameterized processor parameters, TrustGraph can:
- Reduce flow blueprint duplication by using parameters for variations
- Let users tune processor behavior without modifying definitions
- Support environment-specific configurations through parameter values
- Maintain type safety through parameter schema validation

## Technical Design

### Architecture

The configurable parameters system requires the following technical components:

1. **Parameter Schema Definition**
   - JSON Schema-based parameter definitions within flow blueprint metadata
   - Type definitions including string, number, boolean, enum, and object types
   - Validation rules including min/max values, patterns, and required fields

   Module: trustgraph-flow/trustgraph/flow/definition.py

2. **Parameter Resolution Engine**
   - Runtime parameter validation against the schema
   - Default value application for unspecified parameters
   - Parameter injection into the flow execution context
   - Type coercion and conversion as needed

   Module: trustgraph-flow/trustgraph/flow/parameter_resolver.py

3. **Parameter Store Integration**
   - Retrieval of parameter definitions from the schema/config store
   - Caching of frequently used parameter definitions
   - Validation against centrally stored schemas

   Module: trustgraph-flow/trustgraph/flow/parameter_store.py

4. **Flow Launcher Extensions**
   - API extensions to accept parameter values during flow launch
   - Parameter mapping resolution (flow names to definition names)
   - Error handling for invalid parameter combinations

   Module: trustgraph-flow/trustgraph/flow/launcher.py

5. **UI Parameter Forms**
   - Dynamic form generation from flow parameter metadata
   - Ordered parameter display using the `order` field
   - Descriptive parameter labels using the `description` field
   - Input validation against parameter type definitions
   - Parameter presets and templates

   Module: trustgraph-ui/components/flow-parameters/
### Data Models

#### Parameter Definitions (Stored in Schema/Config)

Parameter definitions are stored centrally in the schema/config store.
```json
{
  "llm-model": {
    "type": "string",
    "description": "LLM model to use",
    "default": "gpt-4",
    "enum": [
      {
        "id": "gpt-4",
        "description": "OpenAI GPT-4 (Most Capable)"
      },
      {
        "id": "gpt-3.5-turbo",
        "description": "OpenAI GPT-3.5 Turbo (Fast & Efficient)"
      },
      {
        "id": "claude-3",
        "description": "Anthropic Claude 3 (Thoughtful & Safe)"
      },
      {
        "id": "gemma3:8b",
        "description": "Google Gemma 3 8B (Open Source)"
      }
    ],
    "required": false
  },
  "model-size": {
    "type": "string",
    "description": "Model size variant",
    "default": "8b",
    "enum": ["2b", "8b", "12b", "70b"],
    "required": false
  },
  "temperature": {
    "type": "number",
    "description": "Model temperature for generation",
    "default": 0.7,
    "minimum": 0.0,
    "maximum": 2.0,
    "required": false
  },
  "chunk-size": {
    "type": "integer",
    "description": "Document chunk size",
    "default": 512,
    "minimum": 128,
    "maximum": 2048,
    "required": false
  }
}
```
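To make the validation rules in these definitions concrete, the following sketch checks a single value against one definition. It is an illustration under stated assumptions rather than the actual resolver: the `validate` function is hypothetical, it covers only the `type`, `minimum`, `maximum` and `enum` fields shown above, and it assumes `enum` entries are either plain values or objects with an `id`.

```python
def validate(value, definition: dict) -> list:
    """Return a list of error messages; an empty list means the value is valid."""
    type_checks = {
        "string": lambda v: isinstance(v, str),
        # bool is a subclass of int in Python, so reject it explicitly.
        "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
        "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
        "boolean": lambda v: isinstance(v, bool),
    }
    expected = definition.get("type")
    check = type_checks.get(expected)
    if check is not None and not check(value):
        return [f"expected {expected}, got {type(value).__name__}"]

    errors = []
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        if "minimum" in definition and value < definition["minimum"]:
            errors.append(f"{value} is below minimum {definition['minimum']}")
        if "maximum" in definition and value > definition["maximum"]:
            errors.append(f"{value} is above maximum {definition['maximum']}")
    if "enum" in definition:
        # Entries may be plain values or {"id": ..., "description": ...} objects.
        allowed = [e["id"] if isinstance(e, dict) else e for e in definition["enum"]]
        if value not in allowed:
            errors.append(f"{value!r} is not one of {allowed}")
    return errors


CHUNK_SIZE = {"type": "integer", "default": 512, "minimum": 128, "maximum": 2048}
LLM_MODEL = {
    "type": "string",
    "default": "gpt-4",
    "enum": [{"id": "gpt-4"}, {"id": "claude-3"}],
}

ok = validate(1024, CHUNK_SIZE)
too_big = validate(4096, CHUNK_SIZE)
bad_model = validate("gpt-5", LLM_MODEL)
```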

#### Flow Blueprint with Parameter References

Flow blueprints define parameter metadata with type references, descriptions, and ordering:
```json
{
  "flow_class": "document-analysis",
  "parameters": {
    "llm-model": {
      "type": "llm-model",
      "description": "Primary LLM model for text completion",
      "order": 1
    },
    "llm-rag-model": {
      "type": "llm-model",
      "description": "LLM model for RAG operations",
      "order": 2,
      "advanced": true,
      "controlled-by": "llm-model"
    },
    "llm-temperature": {
      "type": "temperature",
      "description": "Generation temperature for creativity control",
      "order": 3,
      "advanced": true
    },
    "chunk-size": {
      "type": "chunk-size",
      "description": "Document chunk size for processing",
      "order": 4,
      "advanced": true
    },
    "chunk-overlap": {
      "type": "integer",
      "description": "Overlap between document chunks",
      "order": 5,
      "advanced": true,
      "controlled-by": "chunk-size"
    }
  },
  "class": {
    "text-completion:{class}": {
      "request": "non-persistent://tg/request/text-completion:{class}",
      "response": "non-persistent://tg/response/text-completion:{class}",
      "parameters": {
        "model": "{llm-model}",
        "temperature": "{llm-temperature}"
      }
    },
    "rag-completion:{class}": {
      "request": "non-persistent://tg/request/rag-completion:{class}",
      "response": "non-persistent://tg/response/rag-completion:{class}",
      "parameters": {
        "model": "{llm-rag-model}",
        "temperature": "{llm-temperature}"
      }
    }
  },
  "flow": {
    "chunker:{id}": {
      "input": "persistent://tg/flow/chunk:{id}",
      "output": "persistent://tg/flow/chunk-load:{id}",
      "parameters": {
        "chunk_size": "{chunk-size}",
        "chunk_overlap": "{chunk-overlap}"
      }
    }
  }
}
```

The `parameters` section maps flow-specific parameter names (keys) to parameter metadata objects containing:
- `type`: Reference to a centrally defined parameter definition (e.g., "llm-model")
- `description`: Human-readable description for UI display
- `order`: Display order in parameter forms (lower numbers appear first)
- `advanced` (optional): Boolean flag indicating whether this is an advanced parameter (default: false). When set to true, the UI may hide this parameter by default or place it in an "Advanced" section
- `controlled-by` (optional): Name of another parameter that controls this parameter's value in simple mode. When specified, this parameter inherits its value from the controlling parameter unless explicitly overridden

This approach allows:
- Reusable parameter type definitions across multiple flow blueprints
- Centralized parameter type management and validation
- Flow-specific parameter descriptions and ordering
- A better UI experience with descriptive parameter forms
- Consistent parameter validation across flows
- Easy addition of new standard parameter types
- A simplified UI with a basic/advanced mode split
- Parameter value inheritance for related settings
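As an illustration of how a UI might consume the `order` and `advanced` fields, the sketch below sorts a blueprint's parameter metadata and splits it into basic and advanced groups. The `form_sections` helper is hypothetical, and placing parameters without an `order` last is an assumption, not documented behavior.

```python
def form_sections(parameters: dict) -> tuple:
    """Return (basic, advanced) lists of parameter names in display order."""
    ordered = sorted(
        parameters.items(),
        # Assumption: parameters without an explicit order are shown last.
        key=lambda item: item[1].get("order", float("inf")),
    )
    basic = [name for name, meta in ordered if not meta.get("advanced", False)]
    advanced = [name for name, meta in ordered if meta.get("advanced", False)]
    return basic, advanced


# Metadata copied from the "document-analysis" blueprint above (descriptions omitted).
PARAMETERS = {
    "chunk-overlap": {"type": "integer", "order": 5, "advanced": True,
                      "controlled-by": "chunk-size"},
    "llm-model": {"type": "llm-model", "order": 1},
    "chunk-size": {"type": "chunk-size", "order": 4, "advanced": True},
    "llm-rag-model": {"type": "llm-model", "order": 2, "advanced": True,
                      "controlled-by": "llm-model"},
    "llm-temperature": {"type": "temperature", "order": 3, "advanced": True},
}

basic, advanced = form_sections(PARAMETERS)
```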

#### Flow Launch Request

The flow launch API accepts parameters using the flow's parameter names:
```json
{
  "flow_class": "document-analysis",
  "flow_id": "customer-A-flow",
  "parameters": {
    "llm-model": "claude-3",
    "llm-temperature": 0.5,
    "chunk-size": 1024
  }
}
```
Note: In this example, `llm-rag-model` is not provided explicitly but will inherit the value "claude-3" from `llm-model` because of its `controlled-by` relationship. Similarly, `chunk-overlap` could receive a value calculated from `chunk-size`.

The system will:
1. Extract parameter metadata from the flow blueprint definition
2. Map flow parameter names to their type definitions (e.g., `llm-model` → `llm-model` type)
3. Resolve controlled-by relationships (e.g., `llm-rag-model` inherits from `llm-model`)
4. Validate user-provided and inherited values against the parameter type definitions
5. Substitute the resolved values into processor parameters during flow instantiation

### Implementation Details

#### Parameter Resolution Process

When a flow is started, the system performs the following parameter resolution steps:

1. **Flow Blueprint Loading**: Load the flow blueprint definition and extract parameter metadata
2. **Metadata Extraction**: Extract `type`, `description`, `order`, `advanced`, and `controlled-by` for each parameter defined in the `parameters` section of the flow blueprint
3. **Type Definition Lookup**: For each parameter in the flow blueprint:
   - Retrieve the parameter type definition from the schema/config store using the `type` field
   - Type definitions are stored with type "parameter-type" in the config system
   - Each type definition contains the parameter's schema, default value, and validation rules
4. **Default Value Resolution**:
   - For each parameter defined in the flow blueprint:
     - Check whether the user provided a value for this parameter
     - If no user value is provided, use the `default` value from the parameter type definition
     - Build a complete parameter map containing both user-provided and default values
5. **Parameter Inheritance Resolution** (controlled-by relationships):
   - For parameters with a `controlled-by` field, check whether a value was explicitly provided
   - If no explicit value was provided, inherit the value from the controlling parameter
   - If the controlling parameter also has no value, use the default from the type definition
   - Verify that there are no circular dependencies in `controlled-by` relationships
6. **Validation**: Validate the complete parameter set (user-provided, defaults, and inherited) against the type definitions
7. **Storage**: Store the complete resolved parameter set with the flow instance for auditability
8. **Template Substitution**: Replace parameter placeholders in processor parameters with the resolved values
9. **Processor Instantiation**: Create processors with the substituted parameters
|
||||
|
||||
**Important Implementation Notes:**
- The flow service MUST merge user-provided parameters with the defaults from the parameter type definitions.
- The complete parameter set (including applied defaults) MUST be stored with the flow for traceability.
- Parameter resolution happens at flow start time, not at processor instantiation time.
- Required parameters that have no default MUST cause the flow start to fail with a clear error message.

#### Parameter Inheritance with "controlled-by"

The `controlled-by` field enables parameter value inheritance, which is particularly useful for simplifying the user experience while retaining flexibility:

**Example Scenario:**
- The `llm-model` parameter controls the primary LLM model.
- The `llm-rag-model` parameter has `"controlled-by": "llm-model"`.
- In simple mode, setting `llm-model` to "gpt-4" automatically sets `llm-rag-model` to "gpt-4" as well.
- In advanced mode, users can override `llm-rag-model` with a different value.

**Resolution Rules:**
1. If a parameter has an explicitly provided value, use that value.
2. If no explicit value is provided and `controlled-by` is set, use the controlling parameter's value.
3. If the controlling parameter has no value, fall back to the default from the type definition.
4. Circular dependencies in `controlled-by` relationships result in a validation error.

**UI Behavior:**
- In basic/simple mode: Parameters with `controlled-by` may be hidden or shown as read-only with the inherited value.
- In advanced mode: All parameters are shown and can be configured individually.
- When a controlling parameter changes, dependent parameters update automatically unless they have been explicitly overridden.

#### Pulsar Integration
|
||||
|
||||
1. **Start-Flow Operation**
   - The Pulsar start-flow operation needs to accept a `parameters` field containing a map of parameter values.
   - The Pulsar schema for the start-flow request must be updated to include the optional `parameters` field.
   - Example request:
   ```json
   {
     "flow_class": "document-analysis",
     "flow_id": "customer-A-flow",
     "parameters": {
       "model": "claude-3",
       "size": "12b",
       "temp": 0.5,
       "chunk": 1024
     }
   }
   ```
|
||||
|
||||
2. **Get-Flow Operation**
   - The Pulsar schema for the get-flow response must be updated to include the `parameters` field.
   - This allows clients to retrieve the parameter values that were used when the flow was started.
   - Example response:
   ```json
   {
     "flow_id": "customer-A-flow",
     "flow_class": "document-analysis",
     "status": "running",
     "parameters": {
       "model": "claude-3",
       "size": "12b",
       "temp": 0.5,
       "chunk": 1024
     }
   }
   ```
|
||||
|
||||
#### Flow Service Implementation

The flow config service (`trustgraph-flow/trustgraph/config/service/flow.py`) requires the following enhancements:

1. **Parameter Resolution Function**
   ```python
   async def resolve_parameters(self, flow_class, user_params):
       """
       Resolve parameters by merging user-provided values with defaults.

       Args:
           flow_class: The flow blueprint definition dict
           user_params: User-provided parameters dict

       Returns:
           Complete parameter dict with user values and defaults merged
       """
   ```

   This function should:
   - Extract parameter metadata from the `parameters` section of the flow blueprint.
   - For each parameter, fetch its type definition from the config store.
   - Apply default values for any parameters not provided by the user.
   - Handle `controlled-by` inheritance relationships.
   - Return the complete parameter set.

2. **Modified `handle_start_flow` Method**
   - Call `resolve_parameters` after loading the flow blueprint.
   - Use the complete resolved parameter set for template substitution.
   - Store the complete parameter set (not just the user-provided values) with the flow.
   - Validate that all required parameters have values.

3. **Parameter Type Fetching**
   - Parameter type definitions are stored in config with type "parameter-type".
   - Each type definition contains a schema, a default value, and validation rules.
   - Cache frequently used parameter types to reduce config lookups.
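
The resolution behaviour described above (explicit values, `controlled-by` inheritance, type defaults, and rejection of circular dependencies) can be sketched in plain Python. This is a minimal, synchronous illustration and not the service's actual code: the function names `find_cycle` and `resolve_parameters_sketch`, and the shape of the `param_meta` and `type_defs` dicts, are assumptions made for the example. It also assumes that a controlling parameter's "value" means its resolved value (explicit, inherited, or defaulted).

```python
def find_cycle(param_meta):
    """Return a list describing a controlled-by cycle, or None if there is none."""
    for start in param_meta:
        seen = [start]
        current = param_meta[start].get("controlled-by")
        # Each parameter has at most one controller, so we just walk the chain.
        while current is not None and current in param_meta:
            if current in seen:
                return seen[seen.index(current):] + [current]
            seen.append(current)
            current = param_meta[current].get("controlled-by")
    return None


def resolve_parameters_sketch(param_meta, type_defs, user_params):
    """Merge user values, inherited values and type defaults (hypothetical helper).

    param_meta:  {name: {"type": ..., "controlled-by": ...}} from the blueprint
    type_defs:   {type name: {"default": ...}} from the config store
    user_params: {name: value} supplied when starting the flow
    """
    cycle = find_cycle(param_meta)
    if cycle:
        raise ValueError("circular controlled-by dependency: " + " -> ".join(cycle))

    unknown = sorted(set(user_params) - set(param_meta))
    if unknown:
        raise ValueError("unknown parameters: " + ", ".join(unknown))

    resolved = {}

    def resolve(name):
        if name in resolved:
            return resolved[name]
        meta = param_meta[name]
        value = user_params.get(name)              # rule 1: explicit value wins
        controller = meta.get("controlled-by")
        if value is None and controller in param_meta:
            value = resolve(controller)            # rule 2: inherit from controller
        if value is None:                          # rule 3: fall back to type default
            value = type_defs.get(meta["type"], {}).get("default")
        resolved[name] = value
        return value

    for name in param_meta:
        resolve(name)

    missing = sorted(n for n, v in resolved.items() if v is None)
    if missing:
        raise ValueError("no value or default for: " + ", ".join(missing))
    return resolved
```

Checking for cycles up front, rather than during the recursive walk, means a circular definition is rejected even when an explicit user value would otherwise short-circuit the lookup.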
|
||||
|
||||
#### Config System Integration

3. **Flow Object Storage**
   - When a flow is added to the config system by the flow component in the config manager, the flow object must include the resolved parameter values.
   - The config manager must store both the parameters provided by the user and the resolved values (with defaults applied).
   - Flow objects in the config system should include:
     - `parameters`: The resolved parameter values used for the flow.
|
||||
|
||||
#### CLI Integration

4. **Library CLI Commands**
   - CLI commands that start flows need parameter support:
     - Accept parameter values via command-line flags or config files.
     - Validate parameters against the flow blueprint definition before submission.
     - Support parameter file input (JSON/YAML) for complex parameter sets.
   - CLI commands that show flows need to display parameter information:
     - Show the parameter values that were used when the flow was started.
     - Show the parameters available for a flow blueprint.
     - Show parameter schemas and default values.
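
As a rough illustration of the flag and file input described above, the sketch below collects parameters with Python's standard `argparse` and `json` modules. The flag names `--param` and `--param-file`, the program name, and the function `parse_flow_parameters` are hypothetical and are not taken from the TrustGraph CLI. Only JSON files are handled here, to keep the example free of third-party dependencies.

```python
import argparse
import json


def parse_flow_parameters(argv):
    """Collect flow parameters from a JSON file and KEY=VALUE flags (sketch)."""
    parser = argparse.ArgumentParser(prog="start-flow-sketch")
    parser.add_argument("--param", action="append", default=[],
                        metavar="KEY=VALUE",
                        help="hypothetical flag: one parameter value, repeatable")
    parser.add_argument("--param-file", metavar="PATH",
                        help="hypothetical flag: JSON object of parameter values")
    args = parser.parse_args(argv)

    params = {}
    if args.param_file:
        with open(args.param_file, encoding="utf-8") as handle:
            loaded = json.load(handle)
        if not isinstance(loaded, dict):
            parser.error("parameter file must contain a JSON object")
        for key, value in loaded.items():
            # Non-string JSON values are re-encoded as text,
            # so 0.7 becomes "0.7" and true becomes "true".
            params[str(key)] = value if isinstance(value, str) else json.dumps(value)

    for item in args.param:
        key, sep, value = item.partition("=")
        if not sep or not key:
            parser.error(f"expected KEY=VALUE, got {item!r}")
        params[key] = value  # flags override entries from the file
    return params
```

Letting flags override file entries is a design choice for the sketch; it allows a shared parameter file to be reused with small per-run changes.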
|
||||
|
||||
#### Processor Base Class Integration

5. **ParameterSpec Support**
   - Processor base classes need to support parameter substitution through the existing ParametersSpec mechanism.
   - The ParametersSpec class (located in the same module as ConsumerSpec and ProducerSpec) should be enhanced if necessary to support parameter template substitution.
   - Processors should be able to invoke ParametersSpec to configure their parameters with the parameter values resolved at flow launch time.
   - The ParametersSpec implementation needs to:
     - Accept parameter configurations that contain parameter placeholders (e.g., `{model}`, `{temperature}`).
     - Support parameter substitution at runtime, when the processor is instantiated.
     - Validate that substituted values match the expected types and constraints.
     - Provide error handling for missing or invalid parameter references.

#### Substitution Rules

- Parameters use the `{parameter-name}` format in processor parameters.
- Parameter names in processor parameters match the keys in the flow's `parameters` section.
- Substitution occurs alongside the `{id}` and `{class}` replacement.
- Invalid parameter references result in launch-time errors.
- Type validation happens based on the centrally stored parameter definition.
- **IMPORTANT**: All parameter values are stored and transmitted as strings.
  - Numbers are converted to strings (e.g., `0.7` becomes `"0.7"`).
  - Booleans are converted to lowercase strings (e.g., `true` becomes `"true"`).
  - This is required by the Pulsar schema, which defines `parameters = Map(String())`.
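
The string-only rule can be illustrated with a small conversion helper. This is a sketch under assumptions: the names `to_wire_string` and `stringify_parameters` are invented for the example, and rejecting anything other than booleans, numbers, and strings is a choice made here rather than something the specification states.

```python
def to_wire_string(value):
    """Convert one parameter value to the string form used on the wire (sketch)."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float, str)):
        return str(value)
    raise TypeError(f"unsupported parameter value type: {type(value).__name__}")


def stringify_parameters(params):
    """Return a copy of params in which every value is a string."""
    return {name: to_wire_string(value) for name, value in params.items()}
```

For example, `stringify_parameters({"temp": 0.7, "stream": True})` yields `{"temp": "0.7", "stream": "true"}`, which fits a `Map(String())` field.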
|
||||
|
||||
Example resolution:
```
Flow parameter mapping: "model": "llm-model"
Processor parameter: "model": "{model}"
User provides: "model": "gemma3:8b"
Final parameter: "model": "gemma3:8b"

Example with type conversion:
Parameter type default: 0.7 (number)
Stored in flow: "0.7" (string)
Substituted in processor: "0.7" (string)
```
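
A placeholder substitution step along these lines could look like the following sketch. It assumes placeholders contain only letters, digits, `_`, and `-`, and that the system values `id` and `class` may not be supplied as flow parameters; the function names are hypothetical and this is not the ParametersSpec implementation.

```python
import re

# Matches {name} where name is made of letters, digits, "_" or "-" (assumption).
PLACEHOLDER = re.compile(r"\{([A-Za-z0-9_-]+)\}")


def substitute(template, values):
    """Replace every {name} in template; unknown names raise KeyError."""
    def replace(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"unknown parameter reference: {{{name}}}")
        return values[name]
    return PLACEHOLDER.sub(replace, template)


def substitute_processor_params(processor_params, flow_params, flow_id, flow_class):
    """Apply flow parameters plus {id} and {class} to processor parameters."""
    reserved = {"id", "class"} & set(flow_params)
    if reserved:
        raise ValueError("reserved names used as parameters: " + ", ".join(sorted(reserved)))
    values = {**flow_params, "id": flow_id, "class": flow_class}
    return {
        key: substitute(value, values) if isinstance(value, str) else value
        for key, value in processor_params.items()
    }
```

Raising on an unknown reference, instead of leaving the placeholder in place, matches the rule that invalid parameter references produce launch-time errors.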
|
||||
|
||||
## Testing Strategy

- Unit tests for parameter schema validation
- Integration tests for parameter substitution in processor parameters
- End-to-end tests for launching flows with different parameter values
- UI tests for parameter form generation and validation
- Performance tests for flows with many parameters
- Edge cases: missing parameters, invalid types, invalid parameter references
|
||||
|
||||
## Migration Plan

1. The system should continue to support flow blueprints that declare no parameters.
2. The system should continue to support flows started with no parameters specified: this works for flows that have no parameters, and for flows that have parameters (they have default values).

## Open Questions

Q: Should parameters support complex nested objects or stay with simple types?
A: Parameter values will be string-encoded, so we should stick to strings.

Q: Should parameter placeholders be allowed in queue names or only in parameters?
A: Only in parameters, to rule out strange injections and edge cases.

Q: How should conflicts between parameter names and system variables such as `id` and `class` be handled?
A: It is invalid to specify id and class when launching a flow.

Q: Should we support computed parameters (derived from other parameters)?
A: Only string substitution, to rule out strange injections and edge cases.
|
||||
|
||||
## References

- JSON Schema Specification: https://json-schema.org/
- Flow Blueprint Definition Specification: docs/tech-specs/flow-class-definition.md
|
||||
359
docs/tech-specs/sw/graph-contexts.sw.md
Normal file
|
|
@ -0,0 +1,359 @@
|
|||
---
layout: default
title: "Graph Contexts Technical Specification"
parent: "Swahili (Beta)"
---

# Graph Contexts Technical Specification
|
||||
|
||||
> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.
|
||||
|
||||
## Overview

This specification describes changes to TrustGraph's core graph primitives to align with RDF 1.2 and to support full RDF Dataset semantics. This is a breaking change for the 2.x release series.

### Versioning

- **2.0**: Early adopter release. Core features are available, but may not be fully production-ready.
- **2.1 / 2.2**: Production release. Stability and completeness validated.

The flexibility on maturity is intentional - early adopters can access new capabilities before all features are production-hardened.

## Goals

The primary goal of this work is to enable metadata about facts/statements:

- **Temporal information**: Associate facts with time metadata
  - When a fact was believed to be true
  - When a fact became true
  - When a fact was found to be false

- **Provenance/Sources**: Track which sources support a fact
  - "This fact was supported by source X"
  - Link facts back to their origin documents

- **Veracity/Trust**: Record assertions about truth
  - "Person P asserted that this is true"
  - "Person Q claims that this is false"
  - Enable trust scoring and conflict detection

**Hypothesis**: Reification (RDF-star / quoted triples) is the key mechanism for achieving these outcomes, as all of them require making statements about statements.

## Background

To express "the fact (Alice knows Bob) was discovered on 2024-01-15" or "source X supports the claim (Y causes Z)", you need to reference an edge as a thing you can make statements about. Standard triples do not support this.

### Current Limitations

The current `Value` class in `trustgraph-base/trustgraph/schema/core/primitives.py` can represent:
- URI nodes (`is_uri=True`)
- Literal values (`is_uri=False`)

The `type` field exists but is not used to represent XSD datatypes.
|
||||
|
||||
## Technical Design

### RDF Features to Support

#### Core Features (Related to the Reification Goals)

These features relate directly to the temporal, provenance, and veracity goals:

1. **RDF 1.2 Quoted Triples (RDF-star)**
   - Edges that point at other edges
   - A triple can appear as the subject or object of another triple
   - Enables statements about statements (reification)
   - Core mechanism for annotating individual facts

2. **RDF Dataset / Named Graphs**
   - Support for multiple named graphs within a dataset
   - Each graph is identified by an IRI
   - Moves from triples (s, p, o) to quads (s, p, o, g)
   - Includes a default graph alongside the named graphs
   - The graph IRI can be the subject of statements, for example:
     ```
     <graph-source-A> <discoveredOn> "2024-01-15"
     <graph-source-A> <hasVeracity> "high"
     ```
   - Note: Named graphs are a separate feature from reification. They have other uses beyond annotating statements (partitioning, access control, dataset organisation) and should be treated as a distinct capability.

3. **Blank Nodes (Limited Support)**
   - Nodes without a global URI
   - Supported for compatibility when loading external RDF data
   - **Limited status**: No guarantees about stable identity after loading
   - Find them via wildcard queries (match by connections, not by ID)
   - Not a first-class feature - do not rely on precise blank node handling

#### Opportunistic Fixes (2.0 Breaking Change)

These features are not directly related to the reification goals, but are valuable improvements to include while making breaking changes:

4. **Literal Datatypes**
   - Properly use the `type` field for XSD datatypes
   - Examples: xsd:string, xsd:integer, xsd:dateTime, etc.
   - Removes a current limitation: dates and numbers cannot be represented properly

5. **Language Tags**
   - Support for language attributes on string literals (@en, @fr, etc.)
   - Note: A literal can have a language tag OR a datatype, not both (except for rdf:langString)
   - Important for AI/multilingual use cases
|
||||
|
||||
### Data Structures

#### Term (renamed from Value)

The `Value` class will be renamed to `Term` to better reflect RDF terminology. The rename serves two purposes:
1. It aligns naming with RDF concepts (a term can be an IRI, a literal, a blank node, or a quoted triple - not just a "value")
2. It forces code review at the breaking-change interface - any code still referencing `Value` is visibly broken and needs updating

A Term can represent:

- **IRI/URI** - A named node/resource
- **Blank Node** - An unnamed node with local scope
- **Literal** - A data value that has either:
  - A datatype (XSD type), OR
  - A language tag
- **Quoted Triple** - A triple used as a term (RDF 1.2)

##### Chosen Approach: Single Class with a Type Discriminator

Serialization requirements drive the design - a type discriminator is needed in the wire format regardless of the Python representation. A single class with a type field is a natural fit and is consistent with the current `Value` pattern.

Single-character type codes give compact serialization:

```python
from dataclasses import dataclass

# Term type constants
IRI = "i"      # IRI/URI node
BLANK = "b"    # Blank node
LITERAL = "l"  # Literal value
TRIPLE = "t"   # Quoted triple (RDF 1.2)

@dataclass
class Term:
    type: str = ""  # One of: IRI, BLANK, LITERAL, TRIPLE

    # For IRI terms (type == IRI)
    iri: str = ""

    # For blank nodes (type == BLANK)
    id: str = ""

    # For literals (type == LITERAL)
    value: str = ""
    datatype: str = ""  # XSD datatype URI (mutually exclusive with language)
    language: str = ""  # Language tag (mutually exclusive with datatype)

    # For quoted triples (type == TRIPLE)
    triple: "Triple | None" = None
```

Usage examples:

```python
# IRI term
node = Term(type=IRI, iri="http://example.org/Alice")

# Literal with a datatype
age = Term(type=LITERAL, value="42", datatype="xsd:integer")

# Literal with a language tag
label = Term(type=LITERAL, value="Hello", language="en")

# Blank node
anon = Term(type=BLANK, id="_:b1")

# Quoted triple (a statement about a statement)
inner = Triple(
    s=Term(type=IRI, iri="http://example.org/Alice"),
    p=Term(type=IRI, iri="http://example.org/knows"),
    o=Term(type=IRI, iri="http://example.org/Bob"),
)
reified = Term(type=TRIPLE, triple=inner)
```
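
Because a single class carries the fields for every term kind, some consistency checking is useful, for instance enforcing that a literal has a datatype or a language tag but not both. The sketch below restates the two dataclasses so that it is self-contained and adds a hypothetical `validate_term` helper; the helper, its error messages, and the use of the full rdf:langString IRI as the exception are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

IRI, BLANK, LITERAL, TRIPLE = "i", "b", "l", "t"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass
class Term:
    type: str = ""
    iri: str = ""
    id: str = ""
    value: str = ""
    datatype: str = ""
    language: str = ""
    triple: Optional["Triple"] = None


@dataclass
class Triple:
    s: Optional[Term] = None
    p: Optional[Term] = None
    o: Optional[Term] = None
    g: Optional[str] = None


def validate_term(term):
    """Return a list of problems found in term (empty when it is consistent)."""
    errors = []
    if term.type == IRI:
        if not term.iri:
            errors.append("IRI term requires 'iri'")
    elif term.type == BLANK:
        if not term.id:
            errors.append("blank node requires 'id'")
    elif term.type == LITERAL:
        if term.datatype and term.language and term.datatype != RDF_LANGSTRING:
            errors.append("literal may carry a datatype or a language tag, not both")
    elif term.type == TRIPLE:
        if term.triple is None:
            errors.append("quoted triple term requires 'triple'")
        else:
            # Quoted triples nest, so each position is validated recursively.
            for position in ("s", "p", "o"):
                inner = getattr(term.triple, position)
                if inner is None:
                    errors.append(f"quoted triple is missing '{position}'")
                else:
                    errors.extend(f"{position}: {e}" for e in validate_term(inner))
    else:
        errors.append(f"unknown term type {term.type!r}")
    return errors
```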
|
||||
|
||||
##### Alternatives Considered

**Option B: Union of specialised classes** (`Term = IRI | BlankNode | Literal | QuotedTriple`)
- Rejected: Serialization still needs a type discriminator, which adds complexity.

**Option C: Base class with subclasses**
- Rejected: The same serialization problem, plus dataclass inheritance quirks.

#### Triple / Quad

The `Triple` class gains an optional graph field to become a quad:

```python
@dataclass
class Triple:
    s: Term | None = None    # Subject
    p: Term | None = None    # Predicate
    o: Term | None = None    # Object
    g: str | None = None     # Graph name (IRI), None = default graph
```

Design decisions:
- **Field name**: `g` for consistency with `s`, `p`, `o`
- **Optional**: `None` means the default (unnamed) graph
- **Type**: A string (IRI) rather than a Term
  - Graph names are always IRIs
  - Blank nodes as graph names were ruled out (they can cause confusion)
  - No need for the full Term machinery

Note: The class name stays `Triple` even though it is now a quad. This avoids churn, and "triple" is still the common terminology. The graph context is metadata about where the triple lives.
|
||||
|
||||
### Candidate Query Patterns

The current query engine accepts combinations of S, P, and O terms. With quoted triples, a triple itself becomes a valid term in those positions. Below are example queries that support the stated goals.

#### Graph Parameter Semantics

Following SPARQL conventions for backward compatibility:

- **`g` omitted / None**: Query the default graph only
- **`g` = specific IRI**: Query that named graph only
- **`g` = wildcard / `*`**: Query across all graphs (equivalent to SPARQL `GRAPH ?g { ... }`)

This keeps simple queries simple and makes named graph queries opt-in.

Cross-graph queries (g=wildcard) are fully supported. The Cassandra schema includes dedicated tables (SPOG, POSG, OSPG) in which g is a clustering column rather than a partition key, which allows efficient queries across all graphs.
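
The three cases for `g` can be illustrated with an in-memory filter. This is a sketch only: it uses plain strings for terms rather than `Term` objects, the names `Quad`, `graph_matches`, and `query` are invented for the example, and it says nothing about how the Cassandra-backed store evaluates queries.

```python
from typing import NamedTuple, Optional

ALL_GRAPHS = "*"  # wildcard marker, as in the list above


class Quad(NamedTuple):
    s: str
    p: str
    o: str
    g: Optional[str] = None  # None = default graph


def graph_matches(query_g, quad_g):
    """Apply the graph parameter semantics to a single quad."""
    if query_g is None:
        return quad_g is None        # default graph only
    if query_g == ALL_GRAPHS:
        return True                  # every graph, default included
    return quad_g == query_g         # one specific named graph


def query(quads, s=None, p=None, o=None, g=None):
    """Return the quads matching the pattern; None for s/p/o means 'any'."""
    return [
        q for q in quads
        if graph_matches(g, q.g)
        and (s is None or q.s == s)
        and (p is None or q.p == p)
        and (o is None or q.o == o)
    ]
```

In this sketch the wildcard also matches the default graph, a convenience chosen for the example that differs from SPARQL, where `GRAPH ?g` ranges over named graphs only.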
|
||||
|
||||
#### Temporal Queries

**Find all facts discovered after a given date:**
```
S: ?                          # any quoted triple
P: <discoveredOn>
O: > "2024-01-15"^^xsd:date
G: null
```
**Note:** `^^xsd:date` is required to indicate that `2024-01-15` is a date, not a string.

**Note:** You cannot ask "find triples where the predicate of the quoted-triple subject is X".

#### Provenance Queries

```python
# Find facts supported by source X
# Find statements where the source is X
# Find terms where the source is X
```

#### Veracity/Trust Queries

```python
# Find statements that have been asserted to be true
# Find terms where the veracity is true
```

Any valid RDF vocabulary is permitted, including user-defined terms. Strategy: avoid locking anything in unless it is required.
|
||||
|
||||
## Masuala ya Usalama
|
||||
|
||||
Picha hazina kipengele cha usalama. Watumiaji na makusudi ndio mipaka
|
||||
ya usalama. Picha ni tu kwa shirika la data na usaidizi wa ufufunzaji.
|
||||
|
||||
## Performance Considerations

- Quoted triples add nesting depth, which may affect query performance
- Named-graph indexing strategies are needed for graph-scoped queries
- The Cassandra schema design will need to accommodate quad storage
  efficiently
|
||||
|
||||
### Vector Store Boundary

Vector stores always reference IRIs only:
- Never edges (quoted triples)
- Never literal values
- Never blank nodes

This keeps the vector store simple: it handles the semantic similarity of
named entities. The graph structure handles relationships, reification,
and metadata. Quoted triples and named graphs do not complicate vector
operations.
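The rule above amounts to a filter applied before anything reaches the embedding store. A minimal sketch, assuming a hypothetical `Term` record with a `kind` tag (the real term representation may differ):

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Term:
    kind: str   # assumed tags: "iri", "literal", "blank", "triple"
    value: str


def embeddable(terms: Iterable[Term]) -> List[Term]:
    """Keep only IRIs; quoted triples, literals and blank nodes are dropped."""
    return [t for t in terms if t.kind == "iri"]
```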
|
||||
|
||||
## Testing Strategy

Use the existing test strategy. Because this is a breaking change, focus
heavily on the end-to-end test suite to verify that the new structures
work correctly across all components.
|
||||
|
||||
## Migration Plan

- 2.0 is a breaking release; no backward compatibility is required
- Existing data may need migrating to the new schema (to be decided based
  on the final design)
- Consider migration tooling for converting existing triples
|
||||
|
||||
## Open Questions

- **Blank nodes**: Limited support has been established. A decision on the
  skolemization strategy may be needed (generate IRIs on load, or preserve
  blank node identifiers).
- **Query syntax**: What is the concrete syntax for specifying quoted
  triples in queries? The query API needs to be defined.
- ~~**Predicate vocabulary**~~: Resolved. Any RDF terms are permitted,
  including user-defined terms. Only a few values are locked in (e.g.,
  rdfs:label is used in some places).
  Policy: avoid locking anything in unless it is necessary.
- ~~**Vector store impact**~~: Resolved. Vector stores always reference
  IRIs only, never edges, literal values, or blank nodes.
  Quoted triples and reification do not affect the vector store.
- ~~**Named graph semantics**~~: Resolved. Queries default to the default
  graph (matches SPARQL behaviour, backward compatible). An explicit graph
  parameter is required to query named graphs or all graphs.
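For the blank-node question, the "generate IRIs on load" option could look roughly like the sketch below. It is an assumption-laden illustration: the base IRI is a placeholder under `example.org`, the `/.well-known/genid/` path follows the Skolem IRI convention from RDF 1.1 Concepts, and `skolemize` is a hypothetical helper name.

```python
import uuid
from typing import Dict

# Placeholder base; a real deployment would use an authority it controls.
GENID_BASE = "https://example.org/.well-known/genid/"


def skolemize(blank_id: str, mapping: Dict[str, str], base: str = GENID_BASE) -> str:
    """Return a stable Skolem IRI for a blank node label within one load.

    The same label always maps to the same IRI for a given mapping, so
    references to one blank node stay connected after loading.
    """
    if blank_id not in mapping:
        mapping[blank_id] = base + uuid.uuid4().hex
    return mapping[blank_id]
```

The mapping would be scoped to a single document load, since blank node labels are only meaningful within the document that uses them.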
|
||||
|
||||
## References

- [RDF 1.2 Concepts](https://www.w3.org/TR/rdf12-concepts/)
- [RDF-star and SPARQL-star](https://w3c.github.io/rdf-star/)
- [RDF Dataset](https://www.w3.org/TR/rdf11-concepts/#section-dataset)
|
||||
585
docs/tech-specs/sw/graphql-query.sw.md
Normal file
|
|
@ -0,0 +1,585 @@
|
|||
---
|
||||
layout: default
|
||||
title: "Vipimo vya Kiufundi vya Umasilisho wa GraphQL"
|
||||
parent: "Swahili (Beta)"
|
||||
---
|
||||
|
||||
# GraphQL Query Technical Specification

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Overview

This specification describes the implementation of a GraphQL query interface for TrustGraph's structured data storage in Apache Cassandra. Building on the structured data capabilities outlined in the structured-data.md specification, this document details how GraphQL queries will be executed against Cassandra tables holding extracted and ingested objects.

The GraphQL query service will provide a flexible, type-safe interface for querying structured data stored in Cassandra. It will adapt automatically to schema changes, support complex queries including relationships between objects, and integrate fully with TrustGraph's existing message-based architecture.

## Goals

- **Dynamic Schema Support**: Adapt automatically to schema changes in configuration without stopping the service
- **GraphQL Standards Compliance**: Provide a standard GraphQL interface compatible with existing GraphQL tooling and clients
- **Efficient Cassandra Queries**: Translate GraphQL queries into efficient Cassandra CQL queries that respect partition keys and indexes
- **Relationship Resolution**: Support GraphQL resolvers for relationships between different object types
- **Type Safety**: Ensure type-safe query execution and response generation based on schema definitions
- **Scalable Performance**: Handle concurrent queries efficiently using connection pooling and query optimization
- **Request/Response Integration**: Maintain compatibility with TrustGraph's Pulsar-based request/response pattern
- **Error Handling**: Provide comprehensive error reporting for schema mismatches, query errors, and data validation issues

## Background

The structured data storage implementation (trustgraph-flow/trustgraph/storage/objects/cassandra/) writes objects to Cassandra tables based on schema definitions stored in TrustGraph's configuration system. These tables use a composite partition key together with schema-defined primary keys, which enables efficient queries within collections.

Current limitations that this specification addresses:
- No query interface for structured data stored in Cassandra
- No way to use GraphQL's query capabilities for structured data
- No support for relationship traversal between related objects
- Lack of a standard query language for structured data access

The GraphQL query service will close these gaps by:
- Providing a standard GraphQL interface for querying Cassandra tables
- Generating GraphQL schemas dynamically from TrustGraph configuration
- Translating GraphQL queries efficiently to Cassandra CQL
- Supporting relationship resolution through field resolvers

## Technical Design

### Architecture

The GraphQL query service will be implemented as a new TrustGraph processor following established patterns:
|
||||
|
||||
**Module Location**: `trustgraph-flow/trustgraph/query/objects/cassandra/`

**Key Components**:
|
||||
|
||||
1. **GraphQL Query Service Processor**
   - Extends the FlowProcessor base class
   - Implements the same request/response pattern as other query services
   - Monitors configuration for schema updates
   - Keeps the GraphQL schema synchronized with configuration

2. **Dynamic Schema Generator**
   - Converts TrustGraph RowSchema definitions to GraphQL types
   - Creates GraphQL object types with correct field definitions
   - Generates the root Query type with collection-based resolvers
   - Updates the GraphQL schema when configuration changes

3. **Query Executor**
   - Parses incoming GraphQL queries using the Strawberry library
   - Validates queries against the current schema
   - Executes queries and returns structured responses
   - Handles errors gracefully with detailed error messages

4. **Cassandra Query Translator**
   - Converts GraphQL selections to CQL queries
   - Optimizes queries based on available indexes and partition keys
   - Handles filtering, pagination, and sorting
   - Manages connection pooling and session lifecycle

5. **Relationship Resolver**
   - Implements field resolvers for object relationships
   - Performs batch loading to avoid N+1 queries
   - Caches resolved relationships within the request context
   - Supports forward and reverse relationship traversal

### Configuration Schema Monitoring

The service will register a configuration handler to receive schema updates:
|
||||
|
||||
```python
self.register_config_handler(self.on_schema_config)
```
|
||||
|
||||
When schemas change:
1. Parse the new schema definitions from configuration
2. Regenerate GraphQL types and resolvers
3. Update the active schema
4. Clear any schema-dependent caches
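The four steps above can be sketched as a handler body. Everything here is an assumption made for illustration: the handler signature `(config, version)`, the layout of the configuration as `{"schema": {name: json_text}}`, the `fields` key inside each schema, and the `SchemaState` class itself. The real processor would build Strawberry types in step 2; this sketch only records field names so that it stays self-contained.

```python
import json


class SchemaState:
    """Hypothetical holder for the schema-derived state of the service."""

    def __init__(self):
        self.schemas = {}      # parsed schema definitions, keyed by name
        self.type_fields = {}  # stand-in for generated GraphQL types
        self.cache = {}        # any cache whose entries depend on the schema
        self.version = None

    def on_schema_config(self, config, version):
        # 1. Parse the new schema definitions from configuration.
        raw = config.get("schema", {})
        schemas = {name: json.loads(body) for name, body in raw.items()}

        # 2. Regenerate types (placeholder: just the field names per type).
        type_fields = {
            name: [f["name"] for f in schema.get("fields", [])]
            for name, schema in schemas.items()
        }

        # 3. Swap in the new state only after parsing succeeded.
        self.schemas, self.type_fields, self.version = schemas, type_fields, version

        # 4. Clear caches that were built against the old schema.
        self.cache.clear()
```

Building the new state in local variables first means a malformed schema raises before the active state is touched.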
|
||||
|
||||
### GraphQL Schema Generation

For each RowSchema in configuration, generate:

1. **GraphQL Object Type**:
   - Map field types (string → String, integer → Int, float → Float, boolean → Boolean)
   - Mark required fields as non-nullable in GraphQL
   - Add field descriptions from the schema

2. **Root Query Fields**:
   - Collection query (e.g., `customers`, `transactions`)
   - Filter arguments based on indexed fields
   - Pagination support (limit, offset)
   - Sorting options for sortable fields

3. **Relationship Fields**:
   - Identify foreign key relationships from the schema
   - Create field resolvers for related objects
   - Support both single-object and list relationships
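The type mapping and nullability rule from item 1 can be sketched by emitting GraphQL SDL text. This is an illustration only: the field dictionaries (`name`, `type`, `required`, `description`) are an assumed shape, `to_sdl` is a hypothetical helper, and the actual service is expected to build types through Strawberry rather than SDL strings.

```python
import json

# Scalar mapping taken from the list above.
SCALARS = {"string": "String", "integer": "Int", "float": "Float", "boolean": "Boolean"}


def to_sdl(type_name, fields):
    """Render one object type as GraphQL SDL."""
    lines = [f"type {type_name} {{"]
    for field in fields:
        gql_type = SCALARS[field["type"]]  # unknown types raise KeyError
        if field.get("required"):
            gql_type += "!"                # required means non-nullable
        if field.get("description"):
            # json.dumps gives a safely quoted single-line description string
            lines.append("  " + json.dumps(field["description"]))
        lines.append(f"  {field['name']}: {gql_type}")
    lines.append("}")
    return "\n".join(lines)
```

For example, a required `customer_id` string field with a description renders as a described `customer_id: String!` member of the type.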
|
||||
|
||||
### Query Execution Flow

1. **Request Reception**:
   - Receive an `ObjectsQueryRequest` from Pulsar
   - Extract the GraphQL query string and variables
   - Identify the user and collection context

2. **Query Validation**:
   - Parse the GraphQL query using Strawberry
   - Validate it against the current schema
   - Check field selections and argument types

3. **CQL Generation**:
   - Analyze the GraphQL selections
   - Build a CQL query with the correct `WHERE` clauses
   - Include the collection in the partition key
   - Apply filters based on the GraphQL arguments

4. **Query Execution**:
   - Execute the CQL query against Cassandra
   - Map results to the GraphQL response structure
   - Resolve any relationship fields
   - Format the response according to the GraphQL specification

5. **Response Delivery**:
   - Create an `ObjectsQueryResponse` with the results
   - Include any execution errors
   - Send the response via Pulsar with the correlation ID
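Step 3 (CQL generation) can be sketched as a small query builder. The assumptions are loud ones: the keyspace is taken to be the user, the partition column is assumed to be named `collection`, filters are equality-only, and `build_cql` is a hypothetical helper. Values are passed as parameters using the `%s` placeholder style of the Python Cassandra driver's simple statements, while identifiers are checked against a strict pattern because they cannot be bound as parameters.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_cql(keyspace, table, collection, fields, filters, limit=None):
    """Return (cql, params) for a collection-scoped SELECT."""
    for identifier in [keyspace, table, *fields, *filters]:
        if not _IDENTIFIER.match(identifier):
            raise ValueError(f"unsafe identifier: {identifier!r}")

    # The collection always comes first so the partition key is restricted.
    where = ["collection = %s"] + [f"{name} = %s" for name in filters]
    params = [collection, *filters.values()]

    cql = (
        f"SELECT {', '.join(fields)} FROM {keyspace}.{table} "
        f"WHERE {' AND '.join(where)}"
    )
    if limit is not None:
        cql += " LIMIT %s"
        params.append(int(limit))
    return cql, params
```

Keeping values out of the query text is also what the input sanitization goal in the security section asks for.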
|
||||
|
||||
### Data Models

> **Note**: An existing `StructuredQueryRequest/Response` schema lives in `trustgraph-base/trustgraph/schema/services/structured_query.py`. However, it lacks essential fields (user, collection) and uses suboptimal types. The schemas below show the recommended evolution, which could either replace the existing schemas or be created as new `ObjectsQueryRequest/Response` types.
|
||||
|
||||
#### Request Schema (ObjectsQueryRequest)

```python
from pulsar.schema import Record, String, Map, Array

class ObjectsQueryRequest(Record):
    user = String()              # Cassandra keyspace (follows pattern from TriplesQueryRequest)
    collection = String()        # Data collection identifier (required for partition key)
    query = String()             # GraphQL query string
    variables = Map(String())    # GraphQL variables (consider enhancing to support all JSON types)
    operation_name = String()    # Operation to execute for multi-operation documents
```
|
||||
|
||||
**Rationale for changes from the existing StructuredQueryRequest:**
- Added `user` and `collection` fields to match the pattern of other query services
- These fields are essential for identifying the Cassandra keyspace and collection
- Variables remain `Map(String())` for now, but should support all JSON types
|
||||
|
||||
#### Response Schema (ObjectsQueryResponse)

```python
from pulsar.schema import Record, String, Map, Array
from ..core.primitives import Error

class GraphQLError(Record):
    message = String()
    path = Array(String())        # Path to the field that caused the error
    extensions = Map(String())    # Additional error metadata

class ObjectsQueryResponse(Record):
    error = Error()                 # System-level error (connection, timeout, etc.)
    data = String()                 # JSON-encoded GraphQL response data
    errors = Array(GraphQLError)    # GraphQL field-level errors
    extensions = Map(String())      # Query metadata (execution time, etc.)
```
|
||||
|
||||
**Rationale for changes from the existing StructuredQueryResponse:**
- Distinguishes between system errors (`error`) and GraphQL errors (`errors`)
- Uses structured GraphQLError objects instead of an array of strings
- Adds an `extensions` field to follow the GraphQL specification
- Keeps data as a JSON string for compatibility, although native types would be preferable
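How the response fields fit together can be sketched with plain dictionaries standing in for the Pulsar records. This is an assumption-based illustration: `build_response` is a hypothetical helper, and because the schema declares `Map(String())`, extension values are stringified here.

```python
import json


def build_response(data=None, errors=None, extensions=None):
    """Assemble a dict mirroring the ObjectsQueryResponse fields."""
    return {
        "error": None,  # reserved for system-level failures, not set here
        "data": None if data is None else json.dumps(data),
        "errors": [
            {
                "message": str(e.get("message", "")),
                "path": [str(p) for p in e.get("path", [])],
                "extensions": {k: str(v) for k, v in e.get("extensions", {}).items()},
            }
            for e in (errors or [])
        ],
        "extensions": {k: str(v) for k, v in (extensions or {}).items()},
    }
```

A partially successful query can therefore carry both `data` and field-level `errors`, while `error` stays reserved for failures such as a lost connection.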
|
||||
|
||||
### Cassandra Query Optimization

The service will optimize Cassandra queries by:

1. **Respecting Partition Keys**:
   - Always include the collection in queries
   - Use schema-defined primary keys efficiently
   - Avoid full table scans

2. **Using Indexes**:
   - Use secondary indexes for filtering
   - Combine multiple filters where appropriate
   - Warn when queries may be inefficient

3. **Batch Loading**:
   - Collect relationship queries
   - Execute them in batches to reduce round trips
   - Cache results within the request context

4. **Connection Management**:
   - Maintain persistent Cassandra sessions
   - Use connection pooling
   - Handle reconnection on failure
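The batch loading idea in item 3 can be sketched as follows. It assumes a hypothetical `fetch_many` callable that takes a list of keys and returns a dictionary of found rows in one round trip; the per-request cache is an ordinary dictionary.

```python
def batch_load(keys, fetch_many, cache):
    """Resolve many related objects with at most one backend call.

    keys:       keys requested by sibling resolvers, duplicates allowed
    fetch_many: callable taking a list of keys and returning {key: row}
    cache:      per-request dict so repeated keys are fetched only once
    """
    # dict.fromkeys de-duplicates while preserving first-seen order.
    missing = [k for k in dict.fromkeys(keys) if k not in cache]
    if missing:
        cache.update(fetch_many(missing))
    # Keys the backend did not return resolve to None.
    return [cache.get(k) for k in keys]
```

With this shape, resolving the `customer` field for a page of orders costs one lookup for the distinct customer ids instead of one lookup per order.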
|
||||
|
||||
### Example GraphQL Queries

#### Simple Collection Query
```graphql
{
  customers(status: "active") {
    customer_id
    name
    email
    registration_date
  }
}
```
|
||||
|
||||
#### Query with Relationships
```graphql
{
  orders(order_date_gt: "2024-01-01") {
    order_id
    total_amount
    customer {
      name
      email
    }
    items {
      product_name
      quantity
      price
    }
  }
}
```
|
||||
|
||||
#### Paginated Query
```graphql
{
  products(limit: 20, offset: 40) {
    product_id
    name
    price
    category
  }
}
```
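CQL offers `LIMIT` but, to the best of my knowledge, no `OFFSET` clause, so the `offset` argument has to be served some other way. One simple option, shown here purely as a sketch, is to skip rows on the client while iterating the result set; `page` is a hypothetical helper and `rows` is any iterable of result rows.

```python
from itertools import islice


def page(rows, limit, offset=0):
    """Return `limit` rows after skipping the first `offset` rows."""
    return list(islice(rows, offset, offset + limit))
```

The cost of this approach grows with the offset, because skipped rows are still read; for deep pagination, cursor-style paging based on the driver's paging state may be a better fit.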
|
||||
|
||||
### Implementation Dependencies

- **Strawberry GraphQL**: For GraphQL schema definition and query execution
- **Cassandra Driver**: For database connectivity (already used in the storage module)
- **TrustGraph Base**: For FlowProcessor and schema definitions
- **Configuration System**: For schema monitoring and updates
|
||||
|
||||
### Command-Line Interface

The service will provide a CLI command: `kg-query-objects-graphql-cassandra`

Arguments:
- `--cassandra-host`: Contact point for the Cassandra cluster
- `--cassandra-username`: Authentication username
- `--cassandra-password`: Authentication password
- `--config-type`: Configuration type for schemas (default: "schema")
- Standard FlowProcessor arguments (Pulsar configuration, etc.)
|
||||
|
||||
## API Integration

### Pulsar Topics

**Input Topic**: `objects-graphql-query-request`
- Schema: ObjectsQueryRequest
- Receives GraphQL queries from gateway services

**Output Topic**: `objects-graphql-query-response`
- Schema: ObjectsQueryResponse
- Returns query results and errors

### Gateway Integration

The gateway and reverse-gateway will need endpoints to:
1. Accept GraphQL queries from clients
2. Forward them to the query service via Pulsar
3. Return responses to clients
4. Support GraphQL introspection queries

### Agent Tool Integration

A new agent tool class will enable:
- GraphQL query generation from natural language
- Direct GraphQL query execution
- Result interpretation and formatting
- Integration with agent decision flows
|
||||
|
||||
## Security Considerations

- **Query Depth Limiting**: Prevent deeply nested queries that could cause performance problems
- **Query Complexity Analysis**: Limit query complexity to prevent resource exhaustion
- **Field-Level Permissions**: Future support for field-level access control based on user roles
- **Input Sanitization**: Validate and sanitize all query inputs to prevent injection attacks
- **Rate Limiting**: Apply a query rate limit per user/collection
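Query depth limiting can be illustrated with a deliberately simple check that counts brace nesting in the query text. It is a sketch under stated assumptions: a real implementation would walk the parsed document instead, `query_depth` and `MAX_DEPTH` are hypothetical names, and braces used for input object literals in arguments are counted as nesting here.

```python
MAX_DEPTH = 5  # assumed policy value, not taken from the specification


def query_depth(query: str) -> int:
    """Return the maximum selection-set nesting depth of a query string."""
    depth = max_depth = 0
    in_string = False
    previous = ""
    for ch in query:
        if ch == '"' and previous != "\\":
            in_string = not in_string       # ignore braces inside strings
        elif not in_string:
            if ch == "{":
                depth += 1
                max_depth = max(max_depth, depth)
            elif ch == "}":
                depth -= 1
        previous = ch
    return max_depth


def check_depth(query: str, limit: int = MAX_DEPTH) -> None:
    """Raise ValueError when a query nests deeper than the limit."""
    depth = query_depth(query)
    if depth > limit:
        raise ValueError(f"query depth {depth} exceeds limit {limit}")
```

Rejecting a query before execution keeps the cost of the check independent of how much data the query would have touched.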
|
||||
|
||||
## Performance Considerations

- **Query Planning**: Analyze queries before execution to optimize CQL generation
- **Result Caching**: Consider caching frequently accessed data at the field resolver level
- **Connection Pooling**: Maintain efficient connection pools to Cassandra
- **Batch Operations**: Combine multiple queries where appropriate to reduce latency
- **Monitoring**: Track query performance metrics for optimization
|
||||
|
||||
## Testing Strategy

### Unit Tests
- Schema generation from RowSchema definitions
- GraphQL query parsing and validation
- CQL query generation logic
- Field resolver implementations

### Contract Tests
- Pulsar message contract compliance
- GraphQL schema validity
- Response format verification
- Error structure validation

### Integration Tests
- End-to-end query execution against a test Cassandra instance
- Schema update handling
- Relationship resolution
- Pagination and filtering
- Error scenarios

### Performance Tests
- Query throughput under load
- Response time for queries of varying complexity
- Memory usage with large result sets
- Connection pool efficiency
|
||||
|
||||
## Migration Plan

No migration is required because this is a new capability. The service will:
1. Read existing schemas from configuration
2. Connect to the existing Cassandra tables created by the storage module
3. Start accepting queries as soon as it is deployed

## Timeline

- Week 1-2: Core service implementation and schema generation
- Week 3: Query execution and CQL translation
- Week 4: Relationship resolution and optimization
- Week 5: Testing and performance tuning
- Week 6: Gateway integration and documentation
|
||||
|
||||
## Open Questions

1. **Schema Evolution**: How should the service handle queries during schema changes?
   - Option: Queue queries during schema updates
   - Option: Support multiple schema versions simultaneously

2. **Caching Strategy**: Should query results be cached?
   - Consider: Time-based expiration
   - Consider: Event-based invalidation

3. **Federation Support**: Should the service support GraphQL federation to combine with other data sources?
   - Would enable unified queries across structured and graph data

4. **Subscription Support**: Should the service support GraphQL subscriptions for real-time updates?
   - Would require WebSocket support in the gateway

5. **Custom Scalars**: Should custom scalar types be supported for domain-specific data types?
   - Examples: DateTime, UUID, JSON fields
|
||||
|
||||
## References

- Structured Data Technical Specification: `docs/tech-specs/structured-data.md`
- Strawberry GraphQL Documentation: https://strawberry.rocks/
- GraphQL Specification: https://spec.graphql.org/
- Apache Cassandra CQL Reference: https://cassandra.apache.org/doc/stable/cassandra/cql/
- TrustGraph Flow Processor Documentation: Internal documentation
|
||||
899
docs/tech-specs/sw/graphrag-performance-optimization.sw.md
Normal file
|
|
@ -0,0 +1,899 @@
|
|||
---
|
||||
layout: default
|
||||
title: "Vipimo vya Ufanisi wa GraphRAG kwa Uboreshaji wa Kawaida"
|
||||
parent: "Swahili (Beta)"
|
||||
---
|
||||
|
||||
# GraphRAG Performance Optimisation Technical Specification

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.
|
||||
|
||||
## Overview

This specification describes comprehensive performance optimizations for the GraphRAG (Graph Retrieval-Augmented Generation) algorithm in TrustGraph. The current implementation has significant performance bottlenecks that limit scalability and response times. This specification addresses four main optimization areas:

1. **Graph Traversal Optimization**: Eliminate inefficient recursive database queries and implement batched graph exploration
2. **Label Resolution Optimization**: Replace sequential label lookups with parallel/batched operations
3. **Caching Strategy Enhancement**: Implement intelligent caching with LRU eviction and prefetching
4. **Query Optimization**: Add result memoization and embedding caching to improve response times

## Goals

- **Reduce Database Query Volume**: Achieve a 50-80% reduction in total database queries through batching and caching
- **Improve Response Times**: Target 3-5x faster subgraph construction and 2-3x faster label resolution
- **Improve Scalability**: Support larger knowledge graphs with better memory management
- **Maintain Accuracy**: Preserve existing GraphRAG functionality and result quality
- **Enable Concurrency**: Improve parallel processing for multiple concurrent requests
- **Reduce Memory Footprint**: Implement efficient data structures and memory management
- **Add Observability**: Include performance metrics and monitoring capabilities
- **Ensure Reliability**: Add proper error handling and timeout mechanisms

## Background

The current GraphRAG implementation in `trustgraph-flow/trustgraph/retrieval/graph_rag/graph_rag.py` has several critical performance issues that severely limit system scalability:

### Current Performance Problems

**1. Inefficient Graph Traversal (`follow_edges` function, lines 79-127)**
- Makes 3 separate database queries per entity per depth level
- Query pattern: subject-based, predicate-based, and object-based queries for each entity
- No batching: each query processes only one entity at a time
- No cycle detection: the same nodes can be revisited multiple times
- Recursive implementation without memoization leads to exponential complexity
- Time complexity: O(entities × max_path_length × triple_limit³)

**2. Sequential Label Resolution (`get_labelgraph` function, lines 144-171)**
- Processes each triple component (subject, predicate, object) sequentially
- Each `maybe_label` call can trigger a database query
- No parallel execution or batching of label queries
- Results in up to subgraph_size × 3 individual database calls

**3. Primitive Caching Strategy (`maybe_label` function, lines 62-77)**
- Simple dictionary cache with no size limit or TTL
- No eviction policy, which leads to unbounded memory growth
- Cache misses trigger individual database queries
- No prefetching or intelligent cache warming

**4. Suboptimal Query Patterns**
- Entity vector similarity queries are not cached between similar requests
- No result memoization for repeated query patterns
- No query optimization for common access patterns

**5. Critical Object Lifetime Issues (`rag.py:96-102`)**
- **GraphRag object recreated per request**: A fresh instance is created for every query, losing all cache benefits
- **Query object extremely short-lived**: Created and destroyed within a single query execution (lines 201-207)
- **Label cache reset per request**: Cache warming and accumulated knowledge are lost between requests
- **Client recreation overhead**: Database clients may be re-established for each request
- **No cross-request optimization**: Cannot benefit from query patterns or result sharing

### Performance Impact Analysis

Current worst-case scenario for a typical query:
- **Entity Retrieval**: 1 vector similarity query
- **Graph Traversal**: entities × max_path_length × 3 × triple_limit queries
- **Label Resolution**: subgraph_size × 3 individual label queries

For the default parameters (50 entities, path length 2, triple limit 30, subgraph size 150):
- **Minimum queries**: 1 + (50 × 2 × 3 × 30) + (150 × 3) = **9,451 database queries**
- **Response time**: 15-30 seconds for moderately sized graphs
- **Memory usage**: Unbounded cache growth over time
- **Cache effectiveness**: 0% - caches are reset on every request
- **Object creation overhead**: GraphRag + Query objects are created and destroyed on every request

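
The query count above can be reproduced with a few lines of arithmetic. This is only a sketch of the estimate as stated in this section; the function name `estimate_query_count` and its parameter names are illustrative and are not part of the TrustGraph code base.

```python
def estimate_query_count(entities: int, max_path_length: int,
                         triple_limit: int, subgraph_size: int) -> int:
    """Estimate database queries for one request, following the breakdown above."""
    entity_retrieval = 1  # one vector similarity query
    # three query shapes (subject, predicate, object) per entity per level
    graph_traversal = entities * max_path_length * 3 * triple_limit
    # one label lookup per triple component
    label_resolution = subgraph_size * 3
    return entity_retrieval + graph_traversal + label_resolution
```

Calling `estimate_query_count(50, 2, 30, 150)` with the default parameters gives the 9,451 figure quoted above, and makes it easy to see that the traversal term dominates.
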

This specification addresses these gaps with batched queries, intelligent caching, and parallel processing. By optimizing query patterns and data access, TrustGraph can:
- Support enterprise-scale knowledge graphs with millions of entities
- Provide sub-second response times for typical queries
- Handle hundreds of concurrent GraphRAG requests
- Scale efficiently with graph size and complexity

## Technical Design

### Architecture

The GraphRAG performance optimization requires the following technical components:

#### 1. **Object Lifetime Architectural Refactor**
- **Make GraphRag long-lived**: Move the GraphRag instance to the Processor level so that it persists across requests
- **Preserve caches**: Maintain the label cache, embedding cache, and query result cache between requests
- **Optimize the Query object**: Refactor Query into a lightweight execution context rather than a data container
- **Connection persistence**: Maintain database client connections across requests

Module: `trustgraph-flow/trustgraph/retrieval/graph_rag/rag.py` (modified)

#### 2. **Optimized Graph Traversal Engine**
- Replace the recursive `follow_edges` with an iterative breadth-first search
- Implement batched entity processing at each traversal level
- Add cycle detection by tracking visited nodes
- Include early termination when limits are reached

Module: `trustgraph-flow/trustgraph/retrieval/graph_rag/optimized_traversal.py`

#### 3. **Parallel Label Resolution System**
- Batch label queries for multiple entities at once
- Implement async/await patterns for concurrent database access
- Add intelligent prefetching for common label patterns
- Include label cache warming strategies

Module: `trustgraph-flow/trustgraph/retrieval/graph_rag/label_resolver.py`

#### 4. **Conservative Label Caching Layer**
- LRU cache with a short TTL for labels only (5 minutes), to balance performance against consistency
- Track cache metrics and hit ratio
- **No embedding caching**: Embeddings are already cached per query, so there is no cross-query benefit
- **No query result caching**: Avoided because of consistency concerns when the graph changes

Module: `trustgraph-flow/trustgraph/retrieval/graph_rag/cache_manager.py`

#### 5. **Query Optimization Framework**
- Query pattern analysis and optimization suggestions
- Batch query coordinator for database access
- Connection pooling and query timeout management
- Performance monitoring and metrics collection

Module: `trustgraph-flow/trustgraph/retrieval/graph_rag/query_optimizer.py`

### Data Models

#### Optimized Graph Traversal State

The traversal engine maintains state to avoid redundant operations:

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class TraversalState:
    visited_entities: Set[str]
    current_level_entities: Set[str]
    next_level_entities: Set[str]
    subgraph: Set[Tuple[str, str, str]]  # (subject, predicate, object) triples
    depth: int
    query_batch: List[TripleQuery]
```

This approach allows:
- Efficient cycle detection by tracking visited entities
- Batched query preparation at each traversal level
- Memory-efficient state management
- Early termination when size limits are reached

#### Enhanced Cache Structure

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CacheEntry:
    value: Any
    timestamp: float
    access_count: int
    ttl: Optional[float]


class CacheManager:
    label_cache: LRUCache[str, CacheEntry]
    embedding_cache: LRUCache[str, CacheEntry]
    query_result_cache: LRUCache[str, CacheEntry]
    cache_stats: CacheStatistics
```

#### Batch Query Structures

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BatchTripleQuery:
    entities: List[str]
    query_type: QueryType  # SUBJECT, PREDICATE, OBJECT
    limit_per_entity: int


@dataclass
class BatchLabelQuery:
    entities: List[str]
    predicate: str = LABEL
```

### APIs

#### New APIs:

**GraphTraversal API**
```python
async def optimized_follow_edges_batch(
    entities: List[str],
    max_depth: int,
    triple_limit: int,
    max_subgraph_size: int
) -> Set[Tuple[str, str, str]]: ...
```

**Batch Label Resolution API**
```python
async def resolve_labels_batch(
    entities: List[str],
    cache_manager: CacheManager
) -> Dict[str, str]: ...
```

**Cache Management API**
```python
class CacheManager:
    async def get_or_fetch_label(self, entity: str) -> str: ...
    async def get_or_fetch_embeddings(self, query: str) -> List[float]: ...
    async def cache_query_result(self, query_hash: str, result: Any, ttl: int): ...
    def get_cache_statistics(self) -> CacheStatistics: ...
```

#### Modified APIs:

**GraphRag.query()** - Enhanced with performance optimizations:
- Add a `cache_manager` parameter for cache control
- Include a `performance_metrics` return value
- Add a `query_timeout` parameter for reliability

**`Query` class** - Refactored for batch processing:
- Replace per-entity processing with batch operations
- Add async context managers for resource cleanup
- Include progress callbacks for long-running operations

### Implementation Details

#### Phase 0: Critical Architectural Lifetime Refactor

**Current Problematic Implementation:**
```python
# INEFFICIENT: GraphRag recreated every request
class Processor(FlowProcessor):
    async def on_request(self, msg, consumer, flow):
        # PROBLEM: New GraphRag instance per request!
        self.rag = GraphRag(
            embeddings_client=flow("embeddings-request"),
            graph_embeddings_client=flow("graph-embeddings-request"),
            triples_client=flow("triples-request"),
            prompt_client=flow("prompt-request"),
            verbose=True,
        )
        # Cache starts empty every time - no benefit from previous requests
        response = await self.rag.query(...)


# VERY SHORT-LIVED: Query object created/destroyed per request
class GraphRag:
    # **kwargs stands in for the remaining parameters, which are elided here
    async def query(self, query, user="trustgraph", collection="default", **kwargs):
        q = Query(rag=self, user=user, collection=collection, **kwargs)  # Created
        kg = await q.get_labelgraph(query)  # Used briefly
        # q automatically destroyed when function exits
```

**Optimized Long-Lived Architecture:**
```python
from dataclasses import dataclass


class Processor(FlowProcessor):
    def __init__(self, **params):
        super().__init__(**params)
        self.rag_instance = None  # Will be initialized once
        self.client_connections = {}

    async def initialize_rag(self, flow):
        """Initialize GraphRag once, reuse for all requests"""
        if self.rag_instance is None:
            self.rag_instance = LongLivedGraphRag(
                embeddings_client=flow("embeddings-request"),
                graph_embeddings_client=flow("graph-embeddings-request"),
                triples_client=flow("triples-request"),
                prompt_client=flow("prompt-request"),
                verbose=True,
            )
        return self.rag_instance

    async def on_request(self, msg, consumer, flow):
        v = msg.value()  # request payload

        # REUSE the same GraphRag instance - caches persist!
        rag = await self.initialize_rag(flow)

        # Query object becomes lightweight execution context
        response = await rag.query_with_context(
            query=v.query,
            context=QueryContext(
                user=v.user,
                collection=v.collection,
                entity_limit=entity_limit,
                # ... other params
            )
        )


class LongLivedGraphRag:
    def __init__(self, embeddings_client, graph_embeddings_client,
                 triples_client, prompt_client, verbose=False):
        self.embeddings_client = embeddings_client
        self.graph_embeddings_client = graph_embeddings_client
        self.triples_client = triples_client
        self.prompt_client = prompt_client
        self.verbose = verbose
        # CONSERVATIVE caches - balance performance vs consistency
        self.label_cache = LRUCacheWithTTL(max_size=5000, default_ttl=300)  # 5min TTL for freshness
        # Note: No embedding cache - already cached per-query, no cross-query benefit
        # Note: No query result cache due to consistency concerns
        self.performance_metrics = PerformanceTracker()

    async def query_with_context(self, query: str, context: QueryContext):
        # Use lightweight QueryExecutor instead of heavyweight Query object
        executor = QueryExecutor(self, context)  # Minimal object
        return await executor.execute(query)


@dataclass
class QueryContext:
    """Lightweight execution context - no heavy operations"""
    user: str
    collection: str
    entity_limit: int
    triple_limit: int
    max_subgraph_size: int
    max_path_length: int


class QueryExecutor:
    """Lightweight execution context - replaces old Query class"""
    def __init__(self, rag: LongLivedGraphRag, context: QueryContext):
        self.rag = rag
        self.context = context
        # No heavy initialization - just references

    async def execute(self, query: str):
        # All heavy lifting uses persistent rag caches
        return await self.rag.execute_optimized_query(query, self.context)
```

This architectural change provides:
- **10-20% reduction in database queries** for graphs with common relationships (compared with 0% currently)
- **Elimination of object creation overhead** on every request
- **Persistent connection pooling** and client reuse
- **Cross-request optimization** within cache TTL windows

**Important Cache Consistency Limitation:**
Long-term caching introduces a risk of stale data when entities or labels are deleted or modified in the underlying graph. The LRU cache with TTL balances performance gains against data freshness, but it cannot detect graph changes in real time.

#### Phase 1: Graph Traversal Optimization

**Current Implementation Problems:**
```python
# INEFFICIENT: 3 queries per entity per level
async def follow_edges(self, ent, subgraph, path_length):
    # Query 1: s=ent, p=None, o=None
    res = await self.rag.triples_client.query(s=ent, p=None, o=None, limit=self.triple_limit)
    # Query 2: s=None, p=ent, o=None
    res = await self.rag.triples_client.query(s=None, p=ent, o=None, limit=self.triple_limit)
    # Query 3: s=None, p=None, o=ent
    res = await self.rag.triples_client.query(s=None, p=None, o=ent, limit=self.triple_limit)
```

**Optimized Implementation:**
```python
from typing import List, Set


async def optimized_traversal(self, entities: List[str], max_depth: int) -> Set[Triple]:
    visited = set()
    current_level = set(entities)
    subgraph = set()

    for depth in range(max_depth):
        if not current_level or len(subgraph) >= self.max_subgraph_size:
            break

        # Batch all queries for current level
        batch_queries = []
        for entity in current_level:
            if entity not in visited:
                batch_queries.extend([
                    TripleQuery(s=entity, p=None, o=None),
                    TripleQuery(s=None, p=entity, o=None),
                    TripleQuery(s=None, p=None, o=entity)
                ])

        # Execute all queries concurrently
        results = await self.execute_batch_queries(batch_queries)

        # Process results and prepare next level
        next_level = set()
        for result in results:
            subgraph.update(result.triples)
            next_level.update(result.new_entities)

        visited.update(current_level)
        current_level = next_level - visited

    return subgraph
```
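
The optimized traversal relies on an `execute_batch_queries` helper that this specification does not define. As a minimal, self-contained sketch of the same idea, the following runs the three lookups per frontier entity concurrently with `asyncio.gather`. `InMemoryTriples` and `batched_bfs` are hypothetical names used for illustration only, and the in-memory client merely stands in for the real triples client.

```python
import asyncio
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class InMemoryTriples:
    """Hypothetical stand-in for the triples client, for illustration only."""

    def __init__(self, triples: List[Triple]):
        self.triples = triples

    async def query(self, s: Optional[str] = None, p: Optional[str] = None,
                    o: Optional[str] = None, limit: int = 30) -> List[Triple]:
        # None acts as a wildcard for that position
        matches = [
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        ]
        return matches[:limit]


async def batched_bfs(client: InMemoryTriples, seeds: List[str], max_depth: int,
                      triple_limit: int = 30,
                      max_subgraph_size: int = 150) -> Set[Triple]:
    visited: Set[str] = set()
    current_level: Set[str] = set(seeds)
    subgraph: Set[Triple] = set()

    for _ in range(max_depth):
        if not current_level or len(subgraph) >= max_subgraph_size:
            break  # early termination

        # Subject, predicate and object lookups for every frontier entity,
        # issued concurrently instead of one at a time
        queries = []
        for entity in sorted(current_level):
            queries.append(client.query(s=entity, limit=triple_limit))
            queries.append(client.query(p=entity, limit=triple_limit))
            queries.append(client.query(o=entity, limit=triple_limit))
        results = await asyncio.gather(*queries)

        visited.update(current_level)
        next_level: Set[str] = set()
        for triples in results:
            for triple in triples:
                if len(subgraph) >= max_subgraph_size:
                    break
                subgraph.add(triple)
                next_level.update(triple)  # any position may be a new entity
        current_level = next_level - visited  # cycle detection

    return subgraph
```

Because the frontier is always reduced by the visited set, a cyclic graph terminates once no unseen entity remains, even when `max_depth` is larger than the graph's diameter.
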

#### Phase 2: Parallel Label Resolution

**Current Sequential Implementation:**
```python
# INEFFICIENT: Sequential processing
for edge in subgraph:
    s = await self.maybe_label(edge[0])  # Individual query
    p = await self.maybe_label(edge[1])  # Individual query
    o = await self.maybe_label(edge[2])  # Individual query
```

**Optimized Parallel Implementation:**
```python
from typing import List


async def resolve_labels_parallel(self, subgraph: List[Triple]) -> List[Triple]:
    # Collect all unique entities needing labels
    entities_to_resolve = set()
    for s, p, o in subgraph:
        entities_to_resolve.update([s, p, o])

    # Remove already cached entities
    # (label_cache is treated as a dict-like mapping in this sketch)
    uncached_entities = [e for e in entities_to_resolve if e not in self.label_cache]

    # Batch query for all uncached labels
    if uncached_entities:
        label_results = await self.batch_label_query(uncached_entities)
        self.label_cache.update(label_results)

    # Apply labels to subgraph, falling back to the raw identifier
    return [
        (self.label_cache.get(s, s), self.label_cache.get(p, p), self.label_cache.get(o, o))
        for s, p, o in subgraph
    ]
```
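
To make the saving concrete, the sketch below counts backend lookups when identifiers are deduplicated and fetched concurrently. It assumes a hypothetical `CountingLabelStore` backend and a plain dictionary as the cache; neither is part of TrustGraph, and the concurrency bound of 8 is an arbitrary illustrative value.

```python
import asyncio
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


class CountingLabelStore:
    """Hypothetical label backend that counts how many lookups it serves."""

    def __init__(self, labels: Dict[str, str]):
        self.labels = labels
        self.lookups = 0

    async def fetch_label(self, entity: str) -> str:
        self.lookups += 1
        await asyncio.sleep(0)  # yield control, as a real network call would
        return self.labels.get(entity, entity)  # fall back to the identifier


async def resolve_labels(store: CountingLabelStore, subgraph: List[Triple],
                         cache: Dict[str, str],
                         max_concurrency: int = 8) -> List[Triple]:
    # Deduplicate: each distinct, uncached identifier is looked up once
    needed = sorted({x for triple in subgraph for x in triple if x not in cache})

    semaphore = asyncio.Semaphore(max_concurrency)  # bound in-flight lookups

    async def fetch(entity: str) -> Tuple[str, str]:
        async with semaphore:
            return entity, await store.fetch_label(entity)

    cache.update(await asyncio.gather(*(fetch(e) for e in needed)))
    return [(cache[s], cache[p], cache[o]) for s, p, o in subgraph]
```

For a subgraph of two triples that share the same three identifiers, the sequential approach would issue six lookups, while this version issues three on the first call and none on a repeat call with the same cache.
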

#### Phase 3: Advanced Caching Strategy

**LRU Cache with TTL:**
```python
import time
from collections import OrderedDict
from typing import Any, Optional


class LRUCacheWithTTL:
    def __init__(self, max_size: int, default_ttl: int = 3600):
        self.cache = OrderedDict()
        self.max_size = max_size
        self.default_ttl = default_ttl
        self.access_times = {}  # key -> time the entry was last written

    async def get(self, key: str) -> Optional[Any]:
        if key in self.cache:
            # Check TTL expiration
            if time.time() - self.access_times[key] > self.default_ttl:
                del self.cache[key]
                del self.access_times[key]
                return None

            # Move to end (most recently used)
            self.cache.move_to_end(key)
            return self.cache[key]
        return None

    async def put(self, key: str, value: Any):
        if key in self.cache:
            self.cache.move_to_end(key)
        else:
            if len(self.cache) >= self.max_size:
                # Remove least recently used
                oldest_key = next(iter(self.cache))
                del self.cache[oldest_key]
                del self.access_times[oldest_key]

        self.cache[key] = value
        self.access_times[key] = time.time()
```
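
Eviction and expiry are easier to reason about when the clock can be controlled. The following is a small synchronous variant written for that purpose, not the proposed implementation: `TtlLruCache` and its `clock` parameter are assumptions introduced here so that the behaviour can be exercised deterministically.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional, Tuple


class TtlLruCache:
    """Illustrative LRU cache with per-entry expiry and an injectable clock."""

    def __init__(self, max_size: int, ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self.ttl = ttl
        self.clock = clock
        self.entries: "OrderedDict[str, Tuple[Any, float]]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        item = self.entries.get(key)
        if item is None:
            return None
        value, stored_at = item
        if self.clock() - stored_at > self.ttl:
            del self.entries[key]  # expired
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: Any) -> None:
        if key in self.entries:
            self.entries.move_to_end(key)
        elif len(self.entries) >= self.max_size:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[key] = (value, self.clock())
```

With a fake clock, a test can fill the cache, read one key to refresh its recency, insert a further key to force eviction of the untouched one, and then advance time past the TTL to confirm that the remaining entries expire.
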

#### Phase 4: Query Optimization and Monitoring

**Performance Metrics Collection:**
```python
from dataclasses import dataclass


@dataclass
class PerformanceMetrics:
    total_queries: int
    cache_hits: int
    cache_misses: int
    avg_response_time: float
    subgraph_construction_time: float
    label_resolution_time: float
    total_entities_processed: int
    memory_usage_mb: float
```
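
Two derived values are commonly computed from counters like these: the cache hit ratio and a running average of response time. The helpers below are a sketch under that assumption; `cache_hit_ratio` and `updated_average` are illustrative names, not functions defined by this specification.

```python
def cache_hit_ratio(cache_hits: int, cache_misses: int) -> float:
    """Fraction of lookups served from cache; 0.0 when there were no lookups."""
    lookups = cache_hits + cache_misses
    return cache_hits / lookups if lookups else 0.0


def updated_average(previous_avg: float, previous_count: int, sample: float) -> float:
    """Fold one new sample into a running mean without storing all samples."""
    return previous_avg + (sample - previous_avg) / (previous_count + 1)
```

The incremental form of the average avoids keeping every response time in memory, which matters for a long-lived GraphRag instance.
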

**Query Timeout and Circuit Breaker:**
```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class GraphRagTimeoutError(Exception):
    """Raised when a GraphRAG query exceeds its time budget."""


async def execute_with_timeout(self, query_func, timeout: int = 30):
    try:
        return await asyncio.wait_for(query_func(), timeout=timeout)
    except asyncio.TimeoutError:
        logger.error(f"Query timeout after {timeout}s")
        raise GraphRagTimeoutError(f"Query exceeded timeout of {timeout}s")
```
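
The snippet above covers the timeout only. One possible shape for the circuit breaker part is sketched below, assuming a simple consecutive-failure threshold and a fixed cool-down. The class names, the default threshold of 3, and the 30 second reset are all illustrative choices rather than values taken from this specification.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised when calls are rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None while the breaker is closed

    def before_call(self) -> None:
        """Reject the call while the breaker is open and still cooling down."""
        if self.opened_at is None:
            return
        if self.clock() - self.opened_at < self.reset_after:
            raise CircuitOpenError("too many recent failures")
        # Cool-down elapsed: allow one trial call (half-open state);
        # a single further failure re-opens the breaker
        self.opened_at = None
        self.failures = self.failure_threshold - 1

    def record_success(self) -> None:
        self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

A caller would invoke `before_call()` ahead of each query, then `record_failure()` when the query times out and `record_success()` otherwise, so that a struggling database is given time to recover instead of receiving more work.
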
|
||||
|
||||
<<<<<<< HEAD
|
||||
## Mawasilisho ya Ulinganishaji wa Kumbukumbu (Cache)
|
||||
|
||||
**Ulinganishaji wa Uharibifu wa Data:**
|
||||
**Kumbukumbu ya lebo (TTL ya dakika 5)**: Hatari ya kuonyesha lebo za vitu ambazo zimefutwa/kubadilishwa
|
||||
**Hakuna uwekaji kumbukumbu wa embeddings**: Haihitajiki - embeddings tayari zimehifadhiwa kwa kila swali
|
||||
**Hakuna uwekaji kumbukumbu wa matokeo**: Inazuia matokeo ya subgrafu ya zamani kutoka kwa vitu/uhusiano ambao wamefutwa
|
||||
|
||||
**Mikakati ya Kupunguza Madhara:**
|
||||
**Manufaa ya TTL ya kihafidhia:** Kusawazisha faida za utendaji (10-20%) na usafi wa data
|
||||
**Viunganishi vya kutengua kumbukumbu:** Uunganishi wa hiari na matukio ya mabadiliko ya grafu
|
||||
**Dashibodi za ufuatiliaji:** Kufuatilia viwango vya hit ya kumbukumbu dhidi ya matukio ya usafi
|
||||
**Mawasilisho ya kumbukumbu yanayoweza kusanidi:** Kuruhusu urekebishaji kwa kila usakinishaji kulingana na masafa ya mabadiliko
|
||||
|
||||
**Mawasilisho Yanayopendekezwa ya Kumbukumbu Kulingana na Kasi ya Mabadiliko ya Grafu:**
|
||||
**Mabadiliko ya juu (>100 mabadiliko/saa)**: TTL=60s, saizi ndogo za kumbukumbu
|
||||
**Mabadiliko ya wastani (10-100 mabadiliko/saa)**: TTL=300s (ya kawaida)
|
||||
**Mabadiliko ya chini (<10 mabadiliko/saa)**: TTL=600s, saizi kubwa za kumbukumbu
|
||||
|
||||
## Mawasilisho ya Usalama
|
||||
|
||||
**Kuzuia Uingizwaji wa Swali:**
|
||||
=======
## Cache Consistency Considerations

**Data Staleness Trade-offs:**
- **Label cache (5-minute TTL)**: Risk of serving labels for entities that have been deleted or renamed
- **No embeddings caching**: Not needed; embeddings are already cached per query
- **No result caching**: Prevents stale subgraph results from deleted entities or relationships

**Mitigation Strategies:**
- **Conservative TTL values**: Balance performance gains (10-20%) against data freshness
- **Cache invalidation hooks**: Optional integration with graph mutation events
- **Monitoring dashboards**: Track cache hit rates against staleness incidents
- **Configurable cache policies**: Allow per-deployment tuning based on mutation frequency

**Recommended Cache Settings by Graph Mutation Rate:**
- **High mutation (>100 changes/hour)**: TTL=60s, smaller cache sizes
- **Medium mutation (10-100 changes/hour)**: TTL=300s (default)
- **Low mutation (<10 changes/hour)**: TTL=600s, larger cache sizes

## Security Considerations

**Query Injection Prevention:**
>>>>>>> 82edf2d (New md files from RunPod)
- Validate all entity identifiers and query parameters
- Use parameterized queries for all database interactions
- Implement query complexity limits to prevent denial-of-service (DoS) attacks

**Resource Protection:**
- Enforce maximum subgraph size limits
- Implement query timeouts to prevent resource exhaustion
- Add memory usage monitoring and limits

**Access Control:**
- Maintain the existing user and collection isolation
- Add audit logging for performance-impacting operations
- Implement rate limiting for expensive operations

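The resource-protection items in this section (a query timeout and a maximum subgraph size) could be enforced with a small wrapper around each query. This is a minimal sketch, not the TrustGraph implementation: `guarded_query`, `SubgraphTooLarge`, and the default limits are hypothetical names and values chosen only for illustration.

```python
import asyncio


class SubgraphTooLarge(Exception):
    """Raised when a query returns more edges than the configured limit."""


async def guarded_query(query_coro, timeout_s=10.0, max_edges=1000):
    """Await a query coroutine with a timeout and a result-size cap.

    `query_coro` is any awaitable returning a list of edges (a hypothetical
    interface); the defaults are illustrative, not values from this spec.
    """
    # asyncio.wait_for cancels the query and raises TimeoutError on expiry
    edges = await asyncio.wait_for(query_coro, timeout=timeout_s)
    if len(edges) > max_edges:
        raise SubgraphTooLarge(
            f"{len(edges)} edges exceeds the limit of {max_edges}"
        )
    return edges
```

Keeping both checks in one place means every caller gets the same limits, and the limits can later be read from configuration rather than hard-coded.
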
## Performance Considerations

### Expected Performance Improvements

**Query Reduction:**
- Current: ~9,000+ queries for a typical request
- Optimized: ~50-100 batched queries (98% reduction)

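The query reduction in this section comes from batching: rather than issuing one query per visited entity, each traversal level is fetched in a small number of batched calls. The following is a minimal sketch of that idea. `batched_traversal` and the `fetch_edges_batch` callback are hypothetical stand-ins for the real `follow_edges` logic and the graph store client, neither of which is shown in this document.

```python
from typing import Callable, Iterable

Edge = tuple[str, str, str]  # (subject, predicate, object)


def batched_traversal(
    seeds: Iterable[str],
    fetch_edges_batch: Callable[[list[str]], list[Edge]],
    max_depth: int = 2,
    batch_size: int = 100,
) -> set[Edge]:
    """Breadth-first traversal that issues one query per batch of nodes."""
    visited: set[str] = set(seeds)
    frontier: list[str] = sorted(visited)
    subgraph: set[Edge] = set()

    for _ in range(max_depth):
        next_frontier: set[str] = set()
        # One call per batch of frontier nodes instead of one call per node
        for i in range(0, len(frontier), batch_size):
            batch = frontier[i:i + batch_size]
            for s, p, o in fetch_edges_batch(batch):
                subgraph.add((s, p, o))
                if o not in visited:
                    visited.add(o)
                    next_frontier.add(o)
        if not next_frontier:
            break
        frontier = sorted(next_frontier)

    return subgraph
```

With this shape, the number of queries grows with the traversal depth and the frontier size divided by `batch_size`, not with the number of entities visited.
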
**Response Time Improvements:**
- Graph traversal: 15-20s → 3-5s (4-5x faster)
- Label resolution: 8-12s → 2-4s (3x faster)
- Overall query: 25-35s → 6-10s (3-4x improvement)

**Memory Efficiency:**
- Bounded cache sizes prevent memory leaks
- Efficient data structures reduce the memory footprint by ~40%
- Better garbage collection through proper resource cleanup

**Realistic Performance Expectations:**
- **Label cache**: 10-20% query reduction for graphs with common relationships
- **Batching optimization**: 50-80% query reduction (the primary optimization)
- **Object lifetime optimization**: Eliminates per-request overhead
- **Overall improvement**: 3-4x response time improvement, primarily from batching

**Scalability Improvements:**
- Support for 3-5x larger knowledge graphs (limited by data consistency requirements)
- 3-5x higher concurrent request capacity
- Better resource utilization through connection reuse

### Performance Monitoring

**Real-time Metrics:**
- Query execution times by operation type
- Cache hit rates and effectiveness
- Database connection pool utilization
- Memory usage and garbage collection impact

**Performance Benchmarking:**
- Automated performance testing
- Load testing with realistic data
- Comparison benchmarks against the current implementation

## Testing Strategy

### Unit Testing
- Individual component testing for traversal, caching, and label resolution
- Mocked database interactions for performance testing
- Cache eviction and TTL expiry testing
- Error handling and timeout scenarios

### Integration Testing
- End-to-end GraphRAG query testing with the optimizations enabled
- Database interaction testing with real data
- Concurrent request handling and resource management
- Memory leak detection and resource cleanup verification

### Performance Testing
- Benchmark testing against the current implementation
- Load testing with varying graph sizes and complexity
- Stress testing of memory and connection limits
- Regression testing for the performance improvements

### Compatibility Testing
- Verify compatibility with the existing GraphRAG API
- Test with different graph storage backends
- Validate result accuracy against the current implementation

## Implementation Plan

### Direct Implementation Approach
Since the APIs are allowed to change, implement the optimizations directly without migration complexity:

1. **Replace the `follow_edges` method**: Rewrite it using batched traversal
2. **Optimize `get_labelgraph`**: Implement parallel label resolution
3. **Add a long-lived GraphRag**: Modify Processor to maintain a persistent instance
4. **Implement label caching**: Add an LRU cache with TTL to the GraphRag class

### Scope of Changes
- **Query class**: Replace ~50 lines in `follow_edges`, add ~30 lines of batch handling
- **GraphRag class**: Add a caching layer (~40 lines)
- **Processor class**: Modify to use a persistent GraphRag instance (~20 lines)
- **Total**: ~140 lines of changes, mostly within existing classes

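The label caching step (an LRU cache with a TTL) could look roughly like the sketch below. It is only an illustration under stated assumptions: the `LabelCache` class, its method names, and the injectable `clock` parameter are hypothetical and not part of the GraphRag class as it exists today. The 300-second default mirrors the default TTL recommended earlier in this spec.

```python
import time
from collections import OrderedDict


class LabelCache:
    """Small LRU cache whose entries also expire after `ttl` seconds."""

    def __init__(self, max_size=1000, ttl=300.0, clock=time.monotonic):
        self._data = OrderedDict()  # key -> (value, expiry time)
        self.max_size = max_size
        self.ttl = ttl
        self._clock = clock  # injectable so tests can control time

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss
            del self._data[key]
            return None
        # Mark as most recently used
        self._data.move_to_end(key)
        return value

    def put(self, key, value):
        self._data[key] = (value, self._clock() + self.ttl)
        self._data.move_to_end(key)
        # Evict least recently used entries beyond the size bound
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)
```

Bounding the size keeps memory use predictable, while the TTL limits how long a deleted or renamed label can still be served.
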
## Timeline

**Week 1: Core Implementation**
- Replace `follow_edges` with batched traversal
- Implement parallel label resolution in `get_labelgraph`
- Add a long-lived GraphRag instance to Processor
- Implement the label caching layer

**Week 2: Testing and Integration**
- Unit tests for the new traversal and caching logic
- Performance benchmarking against the current implementation
- Integration testing with real graph data
- Code review and optimization

**Week 3: Deployment**
- Deploy the optimized implementation
- Monitor performance improvements
- Fine-tune cache TTL and batch sizes based on real usage

## Open Questions

- **Database Connection Pooling**: Should we implement custom connection pooling or rely on the existing database client's pooling?
- **Cache Persistence**: Should the label and embedding caches persist across service restarts?
- **Distributed Caching**: For multi-instance deployments, should we implement distributed caching with Redis/Memcached?
- **Query Result Format**: Should we optimize the internal triple representation for better memory efficiency?
- **Monitoring Integration**: Which metrics should be exposed to existing monitoring systems (Prometheus, etc.)?

## References

- [Original GraphRAG Implementation](trustgraph-flow/trustgraph/retrieval/graph_rag/graph_rag.py)
- [TrustGraph Architecture Principles](architecture-principles.md)
- [Collection Management Guidelines](collection-management.md)

738
docs/tech-specs/sw/import-export-graceful-shutdown.sw.md
Normal file
|
|
@ -0,0 +1,738 @@
|
|||
---
layout: default
title: "Import/Export Graceful Shutdown Technical Specification"
parent: "Swahili (Beta)"
---

# Import/Export Graceful Shutdown Technical Specification

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Problem

The TrustGraph gateway currently loses messages when a websocket closes during import and export operations. This happens because of race conditions in which in-flight messages are discarded before they reach their destination (Pulsar queues for imports, websocket clients for exports).

### Import-Side Issues
1. The publisher's `asyncio.Queue` buffer is not drained on shutdown.
2. The websocket is closed before queued messages are confirmed to have reached Pulsar.
3. There is no acknowledgment mechanism for successful message delivery.

### Export-Side Issues
1. Messages are acknowledged in Pulsar before they are successfully delivered to clients.
2. Fixed timeouts cause messages to be dropped when queues are full.
3. There is no backpressure mechanism for handling slow consumers.
4. There are multiple buffering points where data can be lost.

## Design Overview

```
Import Flow:
Client -> Websocket -> TriplesImport -> Publisher -> Pulsar Queue

Export Flow:
Pulsar Queue -> Subscriber -> TriplesExport -> Websocket -> Client
```

## Proposed Fixes

### 1. Publisher Improvements (Import Side)

#### A. Graceful Queue Draining

**File**: `trustgraph-base/trustgraph/base/publisher.py`

```python
import asyncio
import logging
import time

from pulsar.schema import JsonSchema

logger = logging.getLogger(__name__)


class Publisher:

    def __init__(self, client, topic, schema=None, max_size=10,
                 chunking_enabled=True, drain_timeout=5.0):
        self.client = client
        self.topic = topic
        self.schema = schema
        self.q = asyncio.Queue(maxsize=max_size)
        self.chunking_enabled = chunking_enabled
        self.running = True
        self.draining = False  # New state for graceful shutdown
        self.task = None
        self.drain_timeout = drain_timeout

    async def stop(self):
        """Initiate graceful shutdown with draining"""
        self.running = False
        self.draining = True

        if self.task:
            # Wait for run() to complete draining
            await self.task

    async def run(self):
        """Enhanced run method with integrated draining logic"""
        while self.running or self.draining:
            try:
                producer = self.client.create_producer(
                    topic=self.topic,
                    schema=JsonSchema(self.schema),
                    chunking_enabled=self.chunking_enabled,
                )

                drain_end_time = None

                while self.running or self.draining:
                    try:
                        # Start drain timeout when entering drain mode
                        if self.draining and drain_end_time is None:
                            drain_end_time = time.time() + self.drain_timeout
                            logger.info(f"Publisher entering drain mode, timeout={self.drain_timeout}s")

                        # Check drain timeout
                        if self.draining and time.time() > drain_end_time:
                            if not self.q.empty():
                                logger.warning(f"Drain timeout reached with {self.q.qsize()} messages remaining")
                            self.draining = False
                            break

                        # Calculate wait timeout based on mode
                        if self.draining:
                            # Shorter timeout during draining to exit quickly when empty
                            timeout = min(0.1, drain_end_time - time.time())
                        else:
                            # Normal operation timeout
                            timeout = 0.25

                        # Get message from queue
                        id, item = await asyncio.wait_for(
                            self.q.get(),
                            timeout=timeout
                        )

                        # Send the message (single place for sending)
                        if id:
                            producer.send(item, { "id": id })
                        else:
                            producer.send(item)

                    except asyncio.TimeoutError:
                        # If draining and queue is empty, we're done
                        if self.draining and self.q.empty():
                            logger.info("Publisher queue drained successfully")
                            self.draining = False
                            break
                        continue

                    except asyncio.QueueEmpty:
                        # If draining and queue is empty, we're done
                        if self.draining and self.q.empty():
                            logger.info("Publisher queue drained successfully")
                            self.draining = False
                            break
                        continue

                # Flush producer before closing
                if producer:
                    producer.flush()
                    producer.close()

            except Exception as e:
                logger.error(f"Exception in publisher: {e}", exc_info=True)

            if not self.running and not self.draining:
                return

            # If handler drops out, sleep a retry
            await asyncio.sleep(1)

    async def send(self, id, item):
        """Send still works normally - just adds to queue"""
        if self.draining:
            # Optionally reject new messages during drain
            raise RuntimeError("Publisher is shutting down, not accepting new messages")
        await self.q.put((id, item))
```

**Key Design Benefits:**
- **Single Send Location:** All `producer.send()` calls happen in one place inside the `run()` method.
- **Clean State Machine:** Three clear states: running, draining, stopped.
- **Timeout Protection:** Draining never hangs indefinitely.
- **Better Observability:** Clear logging of drain progress and state transitions.
- **Optional Message Rejection:** New messages can be rejected during the shutdown phase.

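The drain-with-deadline behaviour described in this section can be tried in isolation, without Pulsar. The sketch below is an assumption-laden illustration rather than the Publisher code itself: `drain_queue` and its `send` callback are hypothetical names, and the callback stands in for `producer.send()`.

```python
import asyncio
import time


async def drain_queue(q, send, drain_timeout=5.0, poll=0.1):
    """Send queued items until the queue is empty or the deadline passes.

    Returns the number of items still queued when draining stopped.
    """
    deadline = time.monotonic() + drain_timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # Deadline reached: give up on whatever is left
        try:
            item = await asyncio.wait_for(q.get(), timeout=min(poll, remaining))
        except asyncio.TimeoutError:
            if q.empty():
                break  # Nothing left to send
            continue
        send(item)
    return q.qsize()
```

A return value of zero means every queued message was handed to `send` before the deadline; a non-zero value is the count that a caller might log as lost.
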
#### B. Improved Shutdown Order

**File:** `trustgraph-flow/trustgraph/gateway/dispatch/triples_import.py`

```python
class TriplesImport:

    async def destroy(self):
        """Enhanced destroy with proper shutdown order"""
        # Step 1: Stop accepting new messages
        self.running.stop()

        # Step 2: Wait for publisher to drain its queue
        logger.info("Draining publisher queue...")
        await self.publisher.stop()

        # Step 3: Close websocket only after queue is drained
        if self.ws:
            await self.ws.close()
```

### 2. Subscriber Improvements (Export Side)

#### A. Integrated Draining Model

**File**: `trustgraph-base/trustgraph/base/subscriber.py`

```python
import asyncio
import logging
import time
import uuid

import _pulsar
from pulsar.schema import JsonSchema

logger = logging.getLogger(__name__)


class Subscriber:

    def __init__(self, client, topic, subscription, consumer_name,
                 schema=None, max_size=100, metrics=None,
                 backpressure_strategy="block", drain_timeout=5.0):
        # ... existing init ...
        self.backpressure_strategy = backpressure_strategy
        self.running = True
        self.draining = False  # New state for graceful shutdown
        self.drain_timeout = drain_timeout
        self.pending_acks = {}  # Track messages awaiting delivery

    async def stop(self):
        """Initiate graceful shutdown with draining"""
        self.running = False
        self.draining = True

        if self.task:
            # Wait for run() to complete draining
            await self.task

    async def run(self):
        """Enhanced run method with integrated draining logic"""
        while self.running or self.draining:
            if self.metrics:
                self.metrics.state("stopped")

            try:
                self.consumer = self.client.subscribe(
                    topic = self.topic,
                    subscription_name = self.subscription,
                    consumer_name = self.consumer_name,
                    schema = JsonSchema(self.schema),
                )

                if self.metrics:
                    self.metrics.state("running")

                logger.info("Subscriber running...")
                drain_end_time = None

                while self.running or self.draining:
                    # Start drain timeout when entering drain mode
                    if self.draining and drain_end_time is None:
                        drain_end_time = time.time() + self.drain_timeout
                        logger.info(f"Subscriber entering drain mode, timeout={self.drain_timeout}s")

                        # Stop accepting new messages from Pulsar during drain
                        self.consumer.pause_message_listener()

                    # Check drain timeout
                    if self.draining and time.time() > drain_end_time:
                        async with self.lock:
                            total_pending = sum(
                                q.qsize() for q in
                                list(self.q.values()) + list(self.full.values())
                            )
                        if total_pending > 0:
                            logger.warning(f"Drain timeout reached with {total_pending} messages in queues")
                        self.draining = False
                        break

                    # Check if we can exit drain mode
                    if self.draining:
                        async with self.lock:
                            all_empty = all(
                                q.empty() for q in
                                list(self.q.values()) + list(self.full.values())
                            )
                        if all_empty and len(self.pending_acks) == 0:
                            logger.info("Subscriber queues drained successfully")
                            self.draining = False
                            break

                    # Process messages only if not draining
                    if not self.draining:
                        try:
                            msg = await asyncio.to_thread(
                                self.consumer.receive,
                                timeout_millis=250
                            )
                        except _pulsar.Timeout:
                            continue
                        except Exception as e:
                            logger.error(f"Exception in subscriber receive: {e}", exc_info=True)
                            raise e

                        if self.metrics:
                            self.metrics.received()

                        # Process the message
                        await self._process_message(msg)
                    else:
                        # During draining, just wait for queues to empty
                        await asyncio.sleep(0.1)

            except Exception as e:
                logger.error(f"Subscriber exception: {e}", exc_info=True)

            finally:
                # Negative acknowledge any pending messages
                for msg in self.pending_acks.values():
                    self.consumer.negative_acknowledge(msg)
                self.pending_acks.clear()

                if self.consumer:
                    self.consumer.unsubscribe()
                    self.consumer.close()
                    self.consumer = None

            if self.metrics:
                self.metrics.state("stopped")

            if not self.running and not self.draining:
                return

            # If handler drops out, sleep a retry
            await asyncio.sleep(1)

    async def _process_message(self, msg):
        """Process a single message with deferred acknowledgment"""
        # Store message for later acknowledgment
        msg_id = str(uuid.uuid4())
        self.pending_acks[msg_id] = msg

        try:
            id = msg.properties()["id"]
        except:
            id = None

        value = msg.value()
        delivery_success = False

        async with self.lock:
            # Deliver to specific subscribers
            if id in self.q:
                delivery_success = await self._deliver_to_queue(
                    self.q[id], value
                )

            # Deliver to all subscribers
            for q in self.full.values():
                if await self._deliver_to_queue(q, value):
                    delivery_success = True

        # Acknowledge only on successful delivery
        if delivery_success:
            self.consumer.acknowledge(msg)
            del self.pending_acks[msg_id]
        else:
            # Negative acknowledge for retry
            self.consumer.negative_acknowledge(msg)
            del self.pending_acks[msg_id]

    async def _deliver_to_queue(self, queue, value):
        """Deliver message to queue with backpressure handling"""
        try:
            if self.backpressure_strategy == "block":
                # Block until space available (no timeout)
                await queue.put(value)
                return True

            elif self.backpressure_strategy == "drop_oldest":
                # Drop oldest message if queue full
                if queue.full():
                    try:
                        queue.get_nowait()
                        if self.metrics:
                            self.metrics.dropped()
                    except asyncio.QueueEmpty:
                        pass
                await queue.put(value)
                return True

            elif self.backpressure_strategy == "drop_new":
                # Drop new message if queue full
                if queue.full():
                    if self.metrics:
                        self.metrics.dropped()
                    return False
                await queue.put(value)
                return True

        except Exception as e:
            logger.error(f"Failed to deliver message: {e}")
            return False
```

**Key Design Benefits (matching the Publisher pattern):**
- **Single Processing Location**: All message processing happens in the `run()` method
- **Clean State Machine**: Three clear states: running, draining, stopped
- **Pause During Drain**: Stops accepting new messages from Pulsar while draining the existing queues
- **Timeout Protection**: Draining never hangs indefinitely
- **Proper Cleanup**: Negatively acknowledges any undelivered messages on shutdown

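The three backpressure strategies named in this section differ only in what happens when a consumer queue is full. The sketch below shows the difference on a plain `asyncio.Queue`; the standalone `deliver` function is a hypothetical simplification for illustration and omits the metrics and locking of the Subscriber class.

```python
import asyncio


async def deliver(queue, value, strategy="block"):
    """Put `value` on `queue`; return True if it was enqueued."""
    if strategy == "drop_new":
        if queue.full():
            return False  # Caller should negatively acknowledge
        queue.put_nowait(value)
        return True
    if strategy == "drop_oldest":
        if queue.full():
            queue.get_nowait()  # Discard the oldest queued item
        queue.put_nowait(value)
        return True
    if strategy == "block":
        await queue.put(value)  # Waits until a consumer frees a slot
        return True
    raise ValueError(f"Unknown backpressure strategy: {strategy!r}")
```

Only `drop_new` can report a failed delivery, which is the case where deferred acknowledgment lets Pulsar redeliver the message later; `block` trades that for slowing the reader down to the pace of the slowest consumer.
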
#### B. Export Handler Improvements

**File**: `trustgraph-flow/trustgraph/gateway/dispatch/triples_export.py`

```python
import asyncio
import logging
import queue
import uuid

logger = logging.getLogger(__name__)

# Subscriber, Triples and serialize_triples come from the existing TrustGraph modules


class TriplesExport:

    async def destroy(self):
        """Enhanced destroy with graceful shutdown"""
        # Step 1: Signal stop to prevent new messages
        self.running.stop()

        # Step 2: Wait briefly for in-flight messages
        await asyncio.sleep(0.5)

        # Step 3: Unsubscribe and stop subscriber (triggers queue drain)
        if hasattr(self, 'subs'):
            await self.subs.unsubscribe_all(self.id)
            await self.subs.stop()

        # Step 4: Close websocket last
        if self.ws and not self.ws.closed:
            await self.ws.close()

    async def run(self):
        """Enhanced run with better error handling"""
        self.subs = Subscriber(
            client = self.pulsar_client,
            topic = self.queue,
            consumer_name = self.consumer,
            subscription = self.subscriber,
            schema = Triples,
            backpressure_strategy = "block"  # Configurable
        )

        await self.subs.start()

        self.id = str(uuid.uuid4())
        q = await self.subs.subscribe_all(self.id)

        consecutive_errors = 0
        max_consecutive_errors = 5

        while self.running.get():
            try:
                resp = await asyncio.wait_for(q.get(), timeout=0.5)
                await self.ws.send_json(serialize_triples(resp))
                consecutive_errors = 0  # Reset on success

            except asyncio.TimeoutError:
                continue

            except queue.Empty:
                continue

            except Exception as e:
                logger.error(f"Exception sending to websocket: {str(e)}")
                consecutive_errors += 1

                if consecutive_errors >= max_consecutive_errors:
                    logger.error("Too many consecutive errors, shutting down")
                    break

                # Brief pause before retry
                await asyncio.sleep(0.1)

        # Graceful cleanup handled in destroy()
```

### 3. Socket-Level Improvements

**File**: `trustgraph-flow/trustgraph/gateway/endpoint/socket.py`

```python
import asyncio
import logging

from aiohttp import WSMsgType

logger = logging.getLogger(__name__)


class SocketEndpoint:

    async def listener(self, ws, dispatcher, running):
        """Enhanced listener with graceful shutdown"""
        async for msg in ws:
            if msg.type == WSMsgType.TEXT:
                await dispatcher.receive(msg)
                continue
            elif msg.type == WSMsgType.BINARY:
                await dispatcher.receive(msg)
                continue
            else:
                # Graceful shutdown on close
                logger.info("Websocket closing, initiating graceful shutdown")
                running.stop()

                # Allow time for dispatcher cleanup
                await asyncio.sleep(1.0)
                break

    async def handle(self, request):
        """Enhanced handler with better cleanup"""
        # ... existing setup code ...

        # Bind the name up front so the finally block is safe if setup fails
        dispatcher = None

        try:
            async with asyncio.TaskGroup() as tg:
                running = Running()

                dispatcher = await self.dispatcher(
                    ws, running, request.match_info
                )

                worker_task = tg.create_task(
                    self.worker(ws, dispatcher, running)
                )

                lsnr_task = tg.create_task(
                    self.listener(ws, dispatcher, running)
                )

        except ExceptionGroup as e:
            logger.error("Exception group occurred:", exc_info=True)

            # Attempt graceful dispatcher shutdown
            try:
                await asyncio.wait_for(
                    dispatcher.destroy(),
                    timeout=5.0
                )
            except asyncio.TimeoutError:
                logger.warning("Dispatcher shutdown timed out")
            except Exception as de:
                logger.error(f"Error during dispatcher cleanup: {de}")

        except Exception as e:
            logger.error(f"Socket exception: {e}", exc_info=True)

        finally:
            # Ensure dispatcher cleanup
            if dispatcher and hasattr(dispatcher, 'destroy'):
                try:
                    await dispatcher.destroy()
                except:
                    pass

            # Ensure websocket is closed
            if ws and not ws.closed:
                await ws.close()

        return ws
```

## Configuration Options

Add configuration support for tuning behavior:

```python
# config.py
class GracefulShutdownConfig:
    # Publisher settings
    PUBLISHER_DRAIN_TIMEOUT = 5.0  # Seconds to wait for queue drain
    PUBLISHER_FLUSH_TIMEOUT = 2.0  # Producer flush timeout

    # Subscriber settings
    SUBSCRIBER_DRAIN_TIMEOUT = 5.0  # Seconds to wait for queue drain
    BACKPRESSURE_STRATEGY = "block"  # Options: "block", "drop_oldest", "drop_new"
    SUBSCRIBER_MAX_QUEUE_SIZE = 100  # Maximum queue size before backpressure

    # Socket settings
    SHUTDOWN_GRACE_PERIOD = 1.0  # Seconds to wait for graceful shutdown
    MAX_CONSECUTIVE_ERRORS = 5  # Maximum errors before forced shutdown

    # Monitoring
    LOG_QUEUE_STATS = True  # Log queue statistics on shutdown
    METRICS_ENABLED = True  # Enable metrics collection
```

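One way such settings might be tuned per deployment is to read overrides from the environment and validate them once at startup. This is a sketch under assumptions: `ShutdownSettings`, `load_settings`, and the environment variable names are hypothetical, and the spec does not say how configuration is actually loaded.

```python
import os
from dataclasses import dataclass

VALID_STRATEGIES = ("block", "drop_oldest", "drop_new")


@dataclass(frozen=True)
class ShutdownSettings:
    publisher_drain_timeout: float = 5.0
    subscriber_drain_timeout: float = 5.0
    backpressure_strategy: str = "block"


def load_settings(env=None):
    """Build settings from a mapping of overrides (defaults to os.environ)."""
    env = os.environ if env is None else env
    strategy = env.get("BACKPRESSURE_STRATEGY", "block")
    if strategy not in VALID_STRATEGIES:
        # Fail fast instead of silently ignoring a misspelled strategy
        raise ValueError(f"Unknown backpressure strategy: {strategy!r}")
    return ShutdownSettings(
        publisher_drain_timeout=float(env.get("PUBLISHER_DRAIN_TIMEOUT", 5.0)),
        subscriber_drain_timeout=float(env.get("SUBSCRIBER_DRAIN_TIMEOUT", 5.0)),
        backpressure_strategy=strategy,
    )
```

Validating the strategy name here matters because an unrecognized value would otherwise only surface later, when a message fails to be delivered.
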
## Mkakati wa Majaribio
|
||||
|
||||
### Majaribio ya Kitengo
|
||||
|
```python
async def test_publisher_queue_drain():
    """Verify Publisher drains queue on shutdown"""
    publisher = Publisher(...)

    # Fill queue with messages
    for i in range(10):
        await publisher.send(f"id-{i}", {"data": i})

    # Stop publisher
    await publisher.stop()

    # Verify all messages were sent
    assert publisher.q.empty()
    assert mock_producer.send.call_count == 10


async def test_subscriber_deferred_ack():
    """Verify Subscriber only acks on successful delivery"""
    subscriber = Subscriber(..., backpressure_strategy="drop_new")

    # Fill queue to capacity
    queue = await subscriber.subscribe("test")
    for i in range(100):
        await queue.put({"data": i})

    # Try to add message when full
    msg = create_mock_message()
    await subscriber._process_message(msg)

    # Verify negative acknowledgment
    assert msg.negative_acknowledge.called
    assert not msg.acknowledge.called
```

### Integration Tests

```python
async def test_import_graceful_shutdown():
    """Test import path handles shutdown gracefully"""
    # Setup
    import_handler = TriplesImport(...)
    await import_handler.start()

    # Send messages
    messages = []
    for i in range(100):
        msg = {"metadata": {...}, "triples": [...]}
        await import_handler.receive(msg)
        messages.append(msg)

    # Shutdown while messages in flight
    await import_handler.destroy()

    # Verify all messages reached Pulsar
    received = await pulsar_consumer.receive_all()
    assert len(received) == 100


async def test_export_no_message_loss():
    """Test export path doesn't lose acknowledged messages"""
    # Setup Pulsar with test messages
    for i in range(100):
        await pulsar_producer.send({"data": i})

    # Start export handler
    export_handler = TriplesExport(...)
    export_task = asyncio.create_task(export_handler.run())

    # Receive some messages
    received = []
    for _ in range(50):
        msg = await websocket.receive()
        received.append(msg)

    # Force shutdown
    await export_handler.destroy()

    # Continue receiving until websocket closes
    while not websocket.closed:
        try:
            msg = await websocket.receive()
            received.append(msg)
        except Exception:
            # Any receive error means the socket is done
            break

    # Verify no acknowledged messages were lost
    assert len(received) >= 50
```

## Rollout Plan

### Phase 1: Critical Fixes (Week 1)
- Fix Subscriber acknowledgment timing (prevents message loss)
- Add Publisher queue draining
- Deploy to the test environment

### Phase 2: Graceful Shutdown (Week 2)
- Implement shutdown coordination
- Add backpressure strategies
- Performance testing

### Phase 3: Monitoring and Tuning (Week 3)
- Add queue depth metrics
- Add alerts for message loss
- Tune timeout values based on production data

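Publisher queue draining, the second Phase 1 item, can be sketched as a loop with a deadline. This is an illustrative sketch under stated assumptions rather than the Publisher's code: `drain_queue` and its `send` callback are hypothetical names, and `drain_timeout` stands in for a setting such as `PUBLISHER_DRAIN_TIMEOUT`.

```python
import asyncio
import time


async def drain_queue(queue: asyncio.Queue, send, drain_timeout: float = 5.0) -> int:
    """Send queued items until the queue is empty or the deadline passes.

    Returns the number of items left undelivered (0 means nothing was lost).
    """
    deadline = time.monotonic() + drain_timeout
    while not queue.empty():
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        item = queue.get_nowait()
        try:
            # Never let a single send outlive the overall drain deadline
            await asyncio.wait_for(send(item), timeout=remaining)
        except asyncio.TimeoutError:
            # The send did not finish in time; count the item as undelivered
            queue.put_nowait(item)
            break
    return queue.qsize()
```

The return value is the natural source for a dropped-message counter: anything above zero means the shutdown lost data.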
## Monitoring and Alerts

### Metrics to Track
- `publisher.queue.depth` - Current Publisher queue size
- `publisher.messages.dropped` - Messages lost during shutdown
- `subscriber.messages.negatively_acknowledged` - Failed deliveries
- `websocket.graceful_shutdowns` - Graceful shutdowns that completed successfully
- `websocket.forced_shutdowns` - Forced or timed-out shutdowns

### Alerts
- Publisher queue depth > 80% of capacity
- Any message loss during shutdown
- Subscriber negative acknowledgment rate > 1%
- Shutdown timeout exceeded

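The four alert conditions can be expressed as one pure function over a snapshot of the metrics. The sketch below is only an illustration: `ShutdownStats`, `evaluate_alerts`, and the alert names are hypothetical, and the negative acknowledgment rate is assumed to mean negative acknowledgments divided by all acknowledgments.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ShutdownStats:
    queue_depth: int          # current publisher queue size
    queue_capacity: int       # maximum publisher queue size
    messages_dropped: int     # messages lost during shutdown
    acked: int                # subscriber acknowledgments
    nacked: int               # subscriber negative acknowledgments
    shutdown_seconds: float   # how long the last shutdown took
    shutdown_timeout: float   # configured shutdown timeout


def evaluate_alerts(s: ShutdownStats) -> List[str]:
    """Return the names of the alert conditions that currently hold."""
    alerts = []
    if s.queue_capacity > 0 and s.queue_depth / s.queue_capacity > 0.80:
        alerts.append("publisher_queue_depth")
    if s.messages_dropped > 0:
        alerts.append("messages_dropped")
    total = s.acked + s.nacked
    if total > 0 and s.nacked / total > 0.01:
        alerts.append("subscriber_nack_rate")
    if s.shutdown_seconds > s.shutdown_timeout:
        alerts.append("shutdown_timeout_exceeded")
    return alerts
```

Keeping the thresholds in one function makes them easy to unit test before wiring them into a real alerting system.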
## Backwards Compatibility

All changes maintain backwards compatibility:
- Default behavior is unchanged without configuration
- Existing deployments continue to work
- Graceful degradation if the new features are unavailable

## Security Considerations

- No new attack vectors are introduced
- Backpressure prevents memory exhaustion attacks
- Configurable limits prevent resource abuse

## Performance Impact

- Minimal overhead during normal operation
- Shutdown may take up to 5 seconds longer (configurable)
- Memory usage is bounded by the queue size limits
- CPU impact is negligible (<1% increase)

626
docs/tech-specs/sw/jsonl-prompt-output.sw.md
Normal file
|
|
@ -0,0 +1,626 @@
|
---
layout: default
title: "JSONL Output Technical Specification"
parent: "Swahili (Beta)"
---

# JSONL Output Technical Specification

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Introduction

This specification describes the implementation of the JSONL (JSON Lines) output format for prompt responses in TrustGraph. JSONL allows structured data to be extracted reliably even when a response is truncated, addressing the critical problem of JSON arrays being corrupted when LLM responses reach token limits.

This specification supports the following use cases:

1. **Truncation-Resilient Extraction**: Extract valid results even when the response is cut off mid-answer.
2. **Large-Scale Extraction**: Handle extraction of many items without risk of total failure due to token limits.
3. **Mixed-Type Extraction**: Support extraction of different item types (definitions, relationships, entities, rows) in a single prompt.
4. **Ease of Configuration**: Keep the specification easy to design and maintain.

**Key changes:**

* **`response-type`**: Set to `"jsonl"`.
* **`schema`**: Adjusted to match the JSONL format.
* **`parse_jsonl()`**: New function for extracting JSONL output.

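A truncation-tolerant parser of the kind `parse_jsonl()` suggests could look like the sketch below. The signature and the skip-on-error behavior are assumptions made for illustration; the source names the function but does not show its body.

```python
import json
from typing import Any, List


def parse_jsonl(text: str) -> List[Any]:
    """Parse JSON Lines output, keeping every line that is valid JSON.

    Blank lines and lines that fail to parse (for example a final line cut
    off by a token limit) are skipped instead of failing the whole response.
    """
    results = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            results.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # skip a malformed or truncated line
    return results
```

Because each line is parsed on its own, a response cut off in the middle of the last object still yields every complete object before it, which a single JSON array cannot offer.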
**Related documents:**

* **`docs/tech-specs/streaming-llm-responses.md`**: Relationship to response streaming
* **`jsonlines.org`**: Information about JSON Lines
* **`json-schema.org/understanding-json-schema/reference/combining.html#oneof`**: Explanation of `oneOf` in JSON Schema

**Note:**

* Earlier specifications that used `"json"` will need to be updated to match the JSONL format.

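Mixed-type extraction with a `oneOf`-style schema implies that each line identifies which kind of item it carries. The sketch below groups records by a discriminator field. The field name `type`, the set of type names, and the helper `group_by_type` are all hypothetical, since the source does not specify them.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Hypothetical discriminator values, one per item type in the mixed output
KNOWN_TYPES = {"definition", "relationship", "entity", "row"}


def group_by_type(lines: List[str]) -> Dict[str, List[dict]]:
    """Group JSONL records by their `type` field, ignoring unusable lines."""
    grouped = defaultdict(list)
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or malformed line
        if not isinstance(record, dict):
            continue
        kind = record.get("type")
        if isinstance(kind, str) and kind in KNOWN_TYPES:
            grouped[kind].append(record)
    return dict(grouped)
```

A full implementation would presumably validate each record against the matching `oneOf` branch; this sketch only shows the dispatch step.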
**Implementation process:**

1. **Design**: Implement the specification and the format.
2. **Testing**: Test functionality and efficiency.
3. **Revision**: Apply corrections and improvements.
4. **Release**: Release the revised specification.

**Risk discussion:**

* **Data extraction**: Ensure data stays safe during extraction.
* **Response output**: Ensure response output is functional and efficient.
* **Design**: Ensure a sound design that is easy to use.
* **Testing**: Ensure the specification is functional and efficient.
* **Revision**: Correct and improve the specification.
* **Release**: Release the revised specification.

**Related topics:**

* **Data extraction**: Safe and efficient extraction of data.
* **Response output**: Response output and functionality.
* **Design**: Ease of use and design.
* **Testing**: Testing of functionality and efficiency.
* **Revision**: Corrections and improvements.
* **Release**: Release of the specification.

||||
|
||||
**Ufafanuzi wa Mada**:
|
||||
|
||||
* **Utoaji wa Data**: Utoaji wa data katika mfumo salama na ufanisi.
|
||||
* **Utoaji wa Anwani**: Utoaji wa anwani na utendakazi.
|
||||
* **Usanifu**: Urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Uchunguzi wa utendakazi na ufanisi.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
**Mjadala wa Mada**:
|
||||
|
||||
* **Utoaji wa Data**: Hakikisha usalama na ufanisi wa utoaji wa data.
|
||||
* **Utoaji wa Anwani**: Hakikisha utendakazi na ufanisi wa utoaji wa anwani.
|
||||
* **Usanifu**: Hakikisha urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Hakikisha utendakazi na ufanisi wa mashirika.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji wa mashirika.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
**Umuhimu wa Mada**:
|
||||
|
||||
* **Utoaji wa Data**: Utoaji wa data salama na ufanisi.
|
||||
* **Utoaji wa Anwani**: Utoaji wa anwani na utendakazi.
|
||||
* **Usanifu**: Urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Uchunguzi wa utendakazi na ufanisi.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
**Mada Zinazohusiana**:
|
||||
|
||||
* **Utoaji wa Data**: Utoaji wa data katika mfumo salama na ufanisi.
|
||||
* **Utoaji wa Anwani**: Utoaji wa anwani na utendakazi.
|
||||
* **Usanifu**: Urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Uchunguzi wa utendakazi na ufanisi.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
**Mjadala wa Mada**:
|
||||
|
||||
* **Utoaji wa Data**: Hakikisha usalama na ufanisi wa utoaji wa data.
|
||||
* **Utoaji wa Anwani**: Hakikisha utendakazi na ufanisi wa utoaji wa anwani.
|
||||
* **Usanifu**: Hakikisha urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Hakikisha utendakazi na ufanisi wa mashirika.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji wa mashirika.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
**Umuhimu wa Mada**:
|
||||
|
||||
* **Utoaji wa Data**: Utoaji wa data salama na ufanisi.
|
||||
* **Utoaji wa Anwani**: Utoaji wa anwani na utendakazi.
|
||||
* **Usanifu**: Urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Uchunguzi wa utendakazi na ufanisi.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
**Ufafanuzi wa Mada**:
|
||||
|
||||
* **Utoaji wa Data**: Utoaji wa data katika mfumo salama na ufanisi.
|
||||
* **Utoaji wa Anwani**: Utoaji wa anwani na utendakazi.
|
||||
* **Usanifu**: Urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Uchunguzi wa utendakazi na ufanisi.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
**Mjadala wa Mada**:
|
||||
|
||||
* **Utoaji wa Data**: Hakikisha usalama na ufanisi wa utoaji wa data.
|
||||
* **Utoaji wa Anwani**: Hakikisha utendakazi na ufanisi wa utoaji wa anwani.
|
||||
* **Usanifu**: Hakikisha urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Hakikisha utendakazi na ufanisi wa mashirika.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji wa mashirika.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
**Mada Zinazohusiana**:
|
||||
|
||||
* **Utoaji wa Data**: Utoaji wa data katika mfumo salama na ufanisi.
|
||||
* **Utoaji wa Anwani**: Utoaji wa anwani na utendakazi.
|
||||
* **Usanifu**: Urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Uchunguzi wa utendakazi na ufanisi.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
**Mjadala wa Mada**:
|
||||
|
||||
* **Utoaji wa Data**: Hakikisha usalama na ufanisi wa utoaji wa data.
|
||||
* **Utoaji wa Anwani**: Hakikisha utendakazi na ufanisi wa utoaji wa anwani.
|
||||
* **Usanifu**: Hakikisha urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Hakikisha utendakazi na ufanisi wa mashirika.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji wa mashirika.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
**Umuhimu wa Mada**:
|
||||
|
||||
* **Utoaji wa Data**: Utoaji wa data salama na ufanisi.
|
||||
* **Utoaji wa Anwani**: Utoaji wa anwani na utendakazi.
|
||||
* **Usanifu**: Urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Uchunguzi wa utendakazi na ufanisi.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
**Mada Zinazohusiana**:
|
||||
|
||||
* **Utoaji wa Data**: Utoaji wa data katika mfumo salama na ufanisi.
|
||||
* **Utoaji wa Anwani**: Utoaji wa anwani na utendakazi.
|
||||
* **Usanifu**: Urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Uchunguzi wa utendakazi na ufanisi.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
**Mjadala wa Mada**:
|
||||
|
||||
* **Utoaji wa Data**: Hakikisha usalama na ufanisi wa utoaji wa data.
|
||||
* **Utoaji wa Anwani**: Hakikisha utendakazi na ufanisi wa utoaji wa anwani.
|
||||
* **Usanifu**: Hakikisha urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Uchunguzi wa utendakazi na ufanisi.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
**Umuhimu wa Mada**:
|
||||
|
||||
* **Utoaji wa Data**: Utoaji wa data salama na ufanisi.
|
||||
* **Utoaji wa Anwani**: Utoaji wa anwani na utendakazi.
|
||||
* **Usanifu**: Urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Uchunguzi wa utendakazi na ufanisi.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
**Mada Zinazohusiana**:
|
||||
|
||||
* **Utoaji wa Data**: Utoaji wa data katika mfumo salama na ufanisi.
|
||||
* **Utoaji wa Anwani**: Utoaji wa anwani na utendakazi.
|
||||
* **Usanifu**: Urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Uchunguzi wa utendakazi na ufanisi.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
**Mjadala wa Mada**:
|
||||
|
||||
* **Utoaji wa Data**: Hakikisha usalama na ufanisi wa utoaji wa data.
|
||||
* **Utoaji wa Anwani**: Hakikisha utendakazi na ufanisi wa utoaji wa anwani.
|
||||
* **Usanifu**: Hakikisha urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Uchunguzi wa utendakazi na ufanisi.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
**Umuhimu wa Mada**:
|
||||
|
||||
* **Utoaji wa Data**: Utoaji wa data salama na ufanisi.
|
||||
* **Utoaji wa Anwani**: Utoaji wa anwani na utendakazi.
|
||||
* **Usanifu**: Urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Uchunguzi wa utendakazi na ufanisi.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
**Mada Zinazohusiana**:
|
||||
|
||||
* **Utoaji wa Data**: Utoaji wa data katika mfumo salama na ufanisi.
|
||||
* **Utoaji wa Anwani**: Utoaji wa anwani na utendakazi.
|
||||
* **Usanifu**: Urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Uchunguzi wa utendakazi na ufanisi.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
**Mjadala wa Mada**:
|
||||
|
||||
* **Utoaji wa Data**: Hakikisha usalama na ufanisi wa utoaji wa data.
|
||||
* **Utoaji wa Anwani**: Hakikisha utendakazi na ufanisi wa utoaji wa anwani.
|
||||
* **Usanifu**: Urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Uchunguzi wa utendakazi na ufanisi.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
**Umuhimu wa Mada**:
|
||||
|
||||
* **Utoaji wa Data**: Utoaji wa data salama na ufanisi.
|
||||
* **Utoaji wa Anwani**: Utoaji wa anwani na utendakazi.
|
||||
* **Usanifu**: Urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Uchunguzi wa utendakazi na ufanisi.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
**Mada Zinazohusiana**:
|
||||
|
||||
* **Utoaji wa Data**: Utoaji wa data katika mfumo salama na ufanisi.
|
||||
* **Utoaji wa Anwani**: Utoaji wa anwani na utendakazi.
|
||||
* **Usanifu**: Urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Uchunguzi wa utendakazi na ufanisi.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
**Mjadala wa Mada**:
|
||||
|
||||
* **Utoaji wa Data**: Hakikisha usalama na ufanisi wa utoaji wa data.
|
||||
* **Utoaji wa Anwani**: Hakikisha utendakazi na ufanisi wa utoaji wa anwani.
|
||||
* **Usanifu**: Urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Uchunguzi wa utendakazi na ufanisi.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
**Umuhimu wa Mada**:
|
||||
|
||||
* **Utoaji wa Data**: Utoaji wa data salama na ufanisi.
|
||||
* **Utoaji wa Anwani**: Utoaji wa anwani na utendakazi.
|
||||
* **Usanifu**: Urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Uchunguzi wa utendakazi na ufanisi.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
**Mada Zinazohusiana**:
|
||||
|
||||
* **Utoaji wa Data**: Utoaji wa data katika mfumo salama na ufanisi.
|
||||
* **Utoaji wa Anwani**: Utoaji wa anwani na utendakazi.
|
||||
* **Usanifu**: Urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Uchunguzi wa utendakazi na ufanisi.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
**Mjadala wa Mada**:
|
||||
|
||||
* **Utoaji wa Data**: Hakikisha usalama na ufanisi wa utoaji wa data.
|
||||
* **Utoaji wa Anwani**: Hakikisha utendakazi na ufanisi wa utoaji wa anwani.
|
||||
* **Usanifu**: Urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Uchunguzi wa utendakaji na ufanisi.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
**Umuhimu wa Mada**:
|
||||
|
||||
* **Utoaji wa Data**: Utoaji wa data salama na ufanisi.
|
||||
* **Utoaji wa Anwani**: Utoaji wa anwani na utendakaji.
|
||||
* **Usanifu**: Urahisi wa matumizi na usanifu.
|
||||
* **Uchunguzi**: Uchunguzi wa utendakaji na ufanisi.
|
||||
* **Marekebisho**: Marekebisho na uboreshaji.
|
||||
* **Utoaji**: Utoaji wa mashirika yaliyorekebishwa.
|
||||
* **Utoaji wa Matokeo Bila Kukatizwa**: Utoaji wa matokeo halali hata pale ombi linapotokea.
|
||||
* **Utoaji wa Upeo Mkuu**: Kushughulikia uondoaji wa vitu vingi bila hatari ya kushindwa kabisa.
|
||||
* **Utoaji wa Aina Mbalimbali**: Kusaidia uondoaji wa aina tofauti za vitu (ufafanuzi, uhusiano, vitu, safu) katika ombi moja.
|
||||
* **Urahisi wa Usanidi**: Urahisi wa usanifu na matengenezo ya mashirika.
|
||||
|
||||
1311
docs/tech-specs/sw/large-document-loading.sw.md
Normal file
File diff suppressed because it is too large
Load diff
358
docs/tech-specs/sw/logging-strategy.sw.md
Normal file
|
|
@ -0,0 +1,358 @@
|
|||
---
|
||||
layout: default
|
||||
title: "Mbinu ya Uandikaji (Logging) ya TrustGraph"
|
||||
parent: "Swahili (Beta)"
|
||||
---
|
||||
|
||||
# TrustGraph Logging Strategy
|
||||
|
||||
> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.
|
||||
|
||||
## Overview

TrustGraph uses Python's built-in `logging` module for all logging operations, with centralized configuration and optional Loki integration for log aggregation. This gives every component of the system a consistent, flexible approach to logging.
|
||||
|
||||
## Default Configuration

### Logging Level

* **Default level**: `INFO`
* **Configurable via**: the `--log-level` command-line argument
* **Choices**: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`

### Output Destinations

1. **Console (stdout)**: Always enabled; ensures compatibility with containerized environments.
2. **Loki**: Optional centralized log aggregation (enabled by default, can be disabled).
|
||||
|
||||
## Centralized Logging Module

All logging configuration is managed by the `trustgraph.base.logging` module, which provides:

* `add_logging_args(parser)`: adds the standard logging CLI arguments.
* `setup_logging(args)`: configures logging from the parsed arguments.

This module is used by all server-side components:

* AsyncProcessor-based services
* API Gateway
* MCP Server
|
||||
|
||||
## Implementation Guidelines

### 1. Logger Initialization

Each module should create its own logger using the module's `__name__`:

```python
import logging

logger = logging.getLogger(__name__)
```

The logger name is automatically used as a label in Loki for filtering and searching.
|
||||
|
||||
### 2. Service Initialization

All server-side services automatically receive their logging configuration through the centralized module:

```python
import argparse
import logging

from trustgraph.base import add_logging_args, setup_logging

def main():
    parser = argparse.ArgumentParser()

    # Add standard logging arguments (includes Loki configuration)
    add_logging_args(parser)

    # Add your service-specific arguments
    parser.add_argument('--port', type=int, default=8080)

    args = parser.parse_args()
    args = vars(args)

    # Setup logging early in startup
    setup_logging(args)

    # Rest of your service initialization
    logger = logging.getLogger(__name__)
    logger.info("Service starting...")
```
|
||||
|
||||
### 3. Command-Line Arguments

All services support the following logging arguments:

**Log level:**
```bash
--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
```

**Loki configuration:**
```bash
--loki-enabled              # Enable Loki (default)
--no-loki-enabled           # Disable Loki
--loki-url URL              # Loki push URL (default: http://loki:3100/loki/api/v1/push)
--loki-username USERNAME    # Optional authentication
--loki-password PASSWORD    # Optional authentication
```

**Examples:**
```bash
# Default - INFO level, Loki enabled
./my-service

# Debug mode, console only
./my-service --log-level DEBUG --no-loki-enabled

# Custom Loki server with auth
./my-service --loki-url http://loki.prod:3100/loki/api/v1/push \
             --loki-username admin --loki-password secret
```
|
||||
|
||||
### 4. Environment Variables

The Loki configuration supports environment variables as a fallback:

```bash
export LOKI_URL=http://loki.prod:3100/loki/api/v1/push
export LOKI_USERNAME=admin
export LOKI_PASSWORD=secret
```

Command-line arguments take precedence over environment variables.
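
The precedence rule can be illustrated with plain `argparse`. This is only a sketch of one common way to get that behaviour (environment variables supply the defaults, explicit flags override them); `build_parser` is a hypothetical helper and is not taken from `trustgraph.base`, whose actual implementation may differ.

```python
import argparse
import os

def build_parser(env=None):
    """Build a parser whose Loki defaults fall back to environment variables.

    `env` is injectable for testing; it defaults to the real environment.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--loki-url",
        default=env.get("LOKI_URL", "http://loki:3100/loki/api/v1/push"),
    )
    parser.add_argument("--loki-username", default=env.get("LOKI_USERNAME"))
    parser.add_argument("--loki-password", default=env.get("LOKI_PASSWORD"))
    return parser

# Environment only: the environment value is used
env = {"LOKI_URL": "http://loki.prod:3100/loki/api/v1/push"}
from_env = vars(build_parser(env).parse_args([]))

# Environment plus an explicit flag: the flag wins
from_cli = vars(
    build_parser(env).parse_args(["--loki-url", "http://other:3100/loki/api/v1/push"])
)
```

Because the environment is only consulted when computing defaults, any flag given on the command line overrides it without extra logic.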
|
||||
|
||||
### 5. Logging Best Practices

#### Log Level Usage

* **DEBUG**: Detailed information for diagnosing problems (variable values, function entry/exit)
* **INFO**: General informational messages (service started, configuration loaded, processing milestones)
* **WARNING**: Warnings about potentially harmful situations (deprecated features, recoverable errors)
* **ERROR**: Error messages for serious problems (failed operations, exceptions)
* **CRITICAL**: Critical messages for system failures that need immediate attention

#### Message Format

```python
# Good - includes context
logger.info(f"Processing document: {doc_id}, size: {doc_size} bytes")
logger.error(f"Failed to connect to database: {error}", exc_info=True)

# Avoid - lacks context
logger.info("Processing document")
logger.error("Connection failed")
```

#### Performance Considerations

```python
# %-style arguments defer string formatting until the record is emitted
# (the argument expression itself is still evaluated)
logger.debug("Expensive operation result: %s", expensive_function())

# Check log level for very expensive debug operations
if logger.isEnabledFor(logging.DEBUG):
    debug_data = compute_expensive_debug_info()
    logger.debug(f"Debug data: {debug_data}")
```
|
||||
|
||||
### 6. Structured Logging with Loki

For complex data, use structured logging with extra tags for Loki:

```python
logger.info("Request processed", extra={
    'tags': {
        'request_id': request_id,
        'user_id': user_id,
        'status': 'success'
    }
})
```

These tags become searchable labels in Loki, alongside the automatic labels:

* `severity`: log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
* `logger`: module name (from `__name__`)
|
||||
|
||||
### 7. Exception Logging

Always include the stack trace when logging exceptions:

```python
try:
    process_data()
except Exception as e:
    logger.error(f"Failed to process data: {e}", exc_info=True)
    raise
```
|
||||
|
||||
### 8. Async Logging Considerations

The logging system uses non-blocking, queued handlers for Loki:

* Console output is synchronous (fast)
* Loki output is queued with a 500-message buffer
* A background thread handles transmission to Loki
* The main application code is never blocked

```python
import asyncio
import logging

async def async_operation():
    logger = logging.getLogger(__name__)
    # Logging is thread-safe and won't block async operations
    logger.info(f"Starting async operation in task: {asyncio.current_task().get_name()}")
```
|
||||
|
||||
## Loki Integration

### Architecture

The logging system uses Python's built-in `QueueHandler` and `QueueListener` for non-blocking Loki integration:

1. **QueueHandler**: Log records are placed on a 500-message queue (non-blocking)
2. **Background thread**: A `QueueListener` sends the records to Loki asynchronously
3. **Graceful degradation**: If Loki is unavailable, console logging continues
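
The queue pattern described above can be sketched with the standard library alone. In this minimal example `CollectingHandler` is a hypothetical stand-in for the real Loki handler so that the snippet runs without a Loki server; the queue size of 500 mirrors the figure quoted above, and the actual wiring inside TrustGraph may differ.

```python
import logging
import logging.handlers
import queue

class CollectingHandler(logging.Handler):
    """Hypothetical stand-in for a slow network handler such as Loki."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        # A real handler would push the record over the network here
        self.records.append(record.getMessage())

# Bounded queue: producers enqueue without waiting on the network
log_queue = queue.Queue(maxsize=500)
slow_handler = CollectingHandler()

# The application logs to the QueueHandler; the listener thread drains
# the queue and forwards records to the slow handler
queue_handler = logging.handlers.QueueHandler(log_queue)
listener = logging.handlers.QueueListener(log_queue, slow_handler)

logger = logging.getLogger("queue_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(queue_handler)

listener.start()
logger.info("Request processed")
listener.stop()  # processes everything already queued, then joins the thread
```

The calling code only pays the cost of a queue insertion; delivery happens on the listener's background thread.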
|
||||
|
||||
### Automatic Labels

Every log record sent to Loki includes:

* `processor`: processor identity (e.g., `config-svc`, `text-completion`, `embeddings`)
* `severity`: log level (DEBUG, INFO, etc.)
* `logger`: module name (e.g., `trustgraph.gateway.service`, `trustgraph.agent.react.service`)
|
||||
|
||||
### Custom Labels

Add custom labels through the `extra` parameter:

```python
logger.info("User action", extra={
    'tags': {
        'user_id': user_id,
        'action': 'document_upload',
        'collection': collection_name
    }
})
```
|
||||
|
||||
### Querying Logs in Loki

```logql
# All logs from a specific processor (recommended - matches Prometheus metrics)
{processor="config-svc"}
{processor="text-completion"}
{processor="embeddings"}

# Error logs from a specific processor
{processor="config-svc", severity="ERROR"}

# Error logs from all processors
{severity="ERROR"}

# Logs from a specific processor with text filter
{processor="text-completion"} |= "Processing"

# All logs from API gateway
{processor="api-gateway"}

# Logs from processors matching pattern
{processor=~".*-completion"}

# Logs with custom tags
{processor="api-gateway"} | json | user_id="12345"
```
|
||||
|
||||
### Graceful Degradation

If Loki is unavailable or `python-logging-loki` is not installed:

* A warning message is printed to the console
* Console logging continues as normal
* The application keeps running
* There is no retry logic for the Loki connection (fail fast, degrade gracefully)
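
The fallback behaviour listed above might be structured roughly as follows. This is a sketch under assumptions, not the TrustGraph implementation: `build_handlers` is a hypothetical helper, and the optional import relies on the `logging_loki` module shipped by the `python-logging-loki` package and its `LokiHandler(url=..., version=...)` constructor.

```python
import logging
import sys

def build_handlers(loki_enabled, loki_url):
    """Return the console handler plus a Loki handler when one can be created."""
    # Console output is always present
    handlers = [logging.StreamHandler(sys.stdout)]
    if not loki_enabled:
        return handlers
    try:
        # Optional dependency: only imported when Loki is requested
        import logging_loki
        handlers.append(logging_loki.LokiHandler(url=loki_url, version="1"))
    except Exception as exc:
        # Fail fast: warn once on the console and carry on without Loki
        print(f"WARNING: Loki logging disabled: {exc}", file=sys.stderr)
    return handlers

console_only = build_handlers(False, "http://loki:3100/loki/api/v1/push")
```

Whatever happens to the Loki handler, the first handler in the list is the console handler, so local output is never lost.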
|
||||
|
||||
## Testing

During testing, consider using a different logging configuration:

```python
# In test setup
import logging

from trustgraph.base import setup_logging

# Reduce noise during tests
logging.getLogger().setLevel(logging.WARNING)

# Or disable Loki for tests
setup_logging({'log_level': 'WARNING', 'loki_enabled': False})
```
|
||||
|
||||
## Monitoring Integration

### Standard Format

All log records use a consistent format:

```
2025-01-09 10:30:45,123 - trustgraph.gateway.service - INFO - Request processed
```

Format components:

* Timestamp (ISO-style, with milliseconds)
* Logger name (module path)
* Log level
* Message
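
A line in this shape can be produced with a standard `logging.Formatter`. The format string below is an assumption inferred from the sample line, not copied from the TrustGraph source; the in-memory stream is used only so the result can be inspected.

```python
import io
import logging

# Assumed format string, reconstructed from the sample log line
LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(LOG_FORMAT))

logger = logging.getLogger("trustgraph.gateway.service")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

logger.info("Request processed")

# Split the emitted line back into its four components
line = stream.getvalue().strip()
timestamp, name, level, message = line.split(" - ")
```

The default `asctime` rendering already uses the `YYYY-MM-DD HH:MM:SS,mmm` layout, so no `datefmt` is needed.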
|
||||
|
||||
### Loki Queries for Monitoring

Common monitoring queries:

```logql
# Error rate by processor
sum by (processor) (rate({severity="ERROR"}[5m]))

# Top error-producing processors
topk(5, sum by (processor) (count_over_time({severity="ERROR"}[1h])))

# Recent errors with processor name
{severity="ERROR"} | line_format "{{.processor}}: {{.message}}"

# Exceptions from all agent processors
{processor=~".*agent.*"} |= "exception"

# Specific processor error count
count_over_time({processor="config-svc", severity="ERROR"}[1h])
```
|
||||
|
||||
## Security Considerations

* **Never log sensitive information** (passwords, API keys, personal data, tokens)
* **Sanitize user input** before logging it
* **Use placeholders** for sensitive fields: `user_id=****1234`
* **Loki authentication**: Use `--loki-username` and `--loki-password` for secure deployments
* **Secure transport**: Use HTTPS for the Loki URL in production: `https://loki.prod:3100/loki/api/v1/push`
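
The placeholder style shown above (`user_id=****1234`) can be produced by a small masking helper. `mask_value` is a hypothetical function written for illustration; it is not part of TrustGraph.

```python
def mask_value(value, visible=4):
    """Replace all but the last `visible` characters with asterisks."""
    text = str(value)
    if len(text) <= visible:
        # Too short to reveal anything safely: mask it completely
        return "*" * len(text)
    return "*" * (len(text) - visible) + text[-visible:]

# Build a log-safe field from a sensitive identifier
safe_field = f"user_id={mask_value('00001234')}"
```

Masking at the call site keeps the raw value out of every handler, including Loki labels.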
|
||||
|
||||
## Dependencies

The centralized logging module requires:

* `python-logging-loki`: for Loki integration (optional; degrades gracefully if missing)

It is already included in `trustgraph-base/pyproject.toml` and `requirements.txt`.
|
||||
|
||||
## Migration Path

For existing code:

1. **Services already using AsyncProcessor**: No changes needed; Loki support is automatic
2. **Services not using AsyncProcessor** (api-gateway, mcp-server): Already updated
3. **CLI tools**: Out of scope; continue using print() or simple logging

### From print() to logging:

```python
# Before
print(f"Processing document {doc_id}")

# After
logger = logging.getLogger(__name__)
logger.info(f"Processing document {doc_id}")
```
|
||||
|
||||
## Configuration Summary

| Argument | Default | Environment Variable | Description |
|----------|---------|---------------------|-------------|
| `--log-level` | `INFO` | - | Logging level for console and Loki |
| `--loki-enabled` | `True` | - | Enable Loki logging |
| `--loki-url` | `http://loki:3100/loki/api/v1/push` | `LOKI_URL` | Loki push endpoint |
| `--loki-username` | `None` | `LOKI_USERNAME` | Loki authentication username |
| `--loki-password` | `None` | `LOKI_PASSWORD` | Loki authentication password |
|
||||
264
docs/tech-specs/sw/mcp-tool-arguments.sw.md
Normal file
|
|
@ -0,0 +1,264 @@
|
|||
---
|
||||
layout: default
|
||||
title: "Vipimo vya Majadiliano ya Zana ya MCP"
|
||||
parent: "Swahili (Beta)"
|
||||
---
|
||||
|
||||
# MCP Tool Arguments Specification
|
||||
|
||||
> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.
|
||||
|
||||
## Overview

* **Feature Name:** MCP Tool Arguments Support
* **Author:** Claude Code Assistant
* **Date:** 2025-08-21
* **Status:** Complete
|
||||
|
||||
### Executive Summary

Enable ReACT agents to invoke MCP (Model Context Protocol) tools with properly defined arguments by adding argument support to MCP tool configurations, in the same way prompt template tools work today.
|
||||
|
||||
### Problem Statement

Currently, MCP tools in the ReACT agent framework cannot specify their expected arguments. The `McpToolImpl.get_arguments()` method returns an empty list, forcing LLMs (Large Language Models) to guess the correct parameter structure from tool names and descriptions alone. This leads to:

* Unreliable tool invocations caused by parameter guessing
* A poor user experience when tools fail because of incorrect arguments
* No validation of tool parameters before execution
* Missing parameter documentation in agent prompts
|
||||
|
||||
### Goals

* [ ] Allow MCP tool configurations to specify expected arguments (name, type, description)
* [ ] Update the agent manager to expose MCP tool arguments to LLMs through prompts
* [ ] Maintain compatibility with existing MCP tool configurations
* [ ] Support argument validation in the same way as prompt template tools
|
||||
|
||||
### Non-Goals

* Dynamic argument discovery from MCP servers (future enhancement)
* Argument type validation beyond basic structure
* Complex argument schemas (nested objects, arrays)
|
||||
|
||||
## Background and Context

### Current State

MCP tools are configured in the ReACT agent framework with minimal metadata:

```json
{
  "type": "mcp-tool",
  "name": "get_bank_balance",
  "description": "Get bank account balance",
  "mcp-tool": "get_bank_balance"
}
```

The `McpToolImpl.get_arguments()` method returns `[]`, so LLMs receive no guidance about arguments in their prompts.
|
||||
|
||||
### Limitations

1. **No argument specification**: MCP tools cannot define their parameters.
2. **LLM parameter guessing**: Agents must infer parameters from tool names and descriptions.
3. **Missing prompt information**: Agent prompts show no argument details for MCP tools.
4. **No validation**: Invalid parameters are only caught when the MCP tool is executed.
|
||||
|
||||
### Related Components

* **trustgraph-flow/agent/react/service.py**: Tool configuration loading and AgentManager creation.
* **trustgraph-flow/agent/react/tools.py**: McpToolImpl implementation.
* **trustgraph-flow/agent/react/agent_manager.py**: Prompt generation including tool arguments.
* **trustgraph-cli**: CLI tools for MCP tool management.
* **Workbench**: External UI for agent tool configuration.
|
||||
|
||||
## Requirements

### Functional Requirements

1. **MCP Tool Configuration Arguments**: MCP tool configurations MUST support an optional `arguments` array with name, type, and description fields.
2. **Argument Exposure**: `McpToolImpl.get_arguments()` MUST return the configured arguments instead of an empty list.
3. **Prompt Integration**: Agent prompts MUST include MCP tool argument details when arguments are specified.
4. **Backward Compatibility**: Existing MCP tool configurations without arguments MUST continue to work.
5. **CLI Support**: The existing `tg-invoke-mcp-tool` CLI supports arguments (already implemented).
|
||||
|
||||
### Non-Functional Requirements

1. **Backward Compatibility**: No breaking changes for existing MCP tool configurations.
2. **Performance**: No significant performance impact on agent prompt generation.
3. **Consistency**: Argument handling MUST match the patterns used by prompt template tools.
|
||||
|
||||
### User Stories

1. As an **agent developer**, I want to specify MCP tool arguments in configuration so that LLMs can invoke tools with the correct parameters.
2. As a **workbench user**, I want to configure MCP tool arguments in the UI so that agents use tools properly.
3. As an **LLM in a ReACT agent**, I want to see tool argument specifications in prompts so that I can supply correct parameters.
|
||||
|
||||
## Design

### High-Level Architecture

Extend the MCP tool configuration to match the prompt template pattern by:

1. Adding an optional `arguments` array to MCP tool configurations.
2. Changing `McpToolImpl` to accept and return the configured arguments.
3. Updating configuration loading to handle MCP tool arguments.
4. Ensuring that agent prompts include MCP tool argument information.

### Configuration Schema

```json
{
  "type": "mcp-tool",
  "name": "get_bank_balance",
  "description": "Get bank account balance",
  "mcp-tool": "get_bank_balance",
  "arguments": [
    {
      "name": "account_id",
      "type": "string",
      "description": "Bank account identifier"
    },
    {
      "name": "date",
      "type": "string",
      "description": "Date for balance query (optional, format: YYYY-MM-DD)"
    }
  ]
}
```

### Data Flow

1. **Configuration Loading**: The MCP tool configuration, including its arguments, is loaded by `on_tools_config()`.
2. **Tool Creation**: The arguments are parsed and passed to `McpToolImpl` through its constructor.
3. **Prompt Generation**: `agent_manager.py` reads `tool.arguments` so that they are included in the LLM prompt.
4. **Tool Invocation**: The LLM supplies parameters, which are passed to the MCP service unchanged.

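The prompt-generation step of the flow above can be illustrated with a minimal sketch. The `Argument` dataclass and the `describe_tool` helper are hypothetical names chosen for this example; the real prompt template in `agent_manager.py` may lay the text out differently.

```python
from dataclasses import dataclass


@dataclass
class Argument:
    # Hypothetical stand-in for the agent framework's argument type
    name: str
    type: str
    description: str


def describe_tool(name, description, arguments):
    """Render one tool and its arguments as plain text for an LLM prompt."""
    lines = [f"- {name}: {description}"]
    for arg in arguments:
        lines.append(f"    - {arg.name} ({arg.type}): {arg.description}")
    return "\n".join(lines)


args = [
    Argument("account_id", "string", "Bank account identifier"),
    Argument("date", "string",
             "Date for balance query (optional, format: YYYY-MM-DD)"),
]
prompt_fragment = describe_tool(
    "get_bank_balance", "Get bank account balance", args
)
print(prompt_fragment)
```

A tool with an empty argument list renders as a single line, which is how a legacy MCP tool would appear in the prompt under this sketch.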
### API Changes

No external API changes; this is purely internal configuration and argument handling.

### Component Details

#### Component 1: service.py (Tool Configuration Loading)

- **Purpose**: Parse MCP tool configurations and create tool instances
- **Changes Required**: Add argument parsing for MCP tools (as is already done for prompt tools)
- **New Functionality**: Extract the `arguments` array from the MCP tool configuration and create `Argument` objects

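As a rough illustration of the parsing step, the sketch below turns the `arguments` array of a tool configuration into objects. The `Argument` dataclass, the `parse_arguments` helper, and the fallback defaults for `type` and `description` are assumptions made for this example, not the code in `service.py`.

```python
import json
from dataclasses import dataclass


@dataclass
class Argument:
    # Hypothetical stand-in for the framework's Argument class
    name: str
    type: str
    description: str


def parse_arguments(tool_config):
    """Build Argument objects from an MCP tool configuration dict.

    A missing "arguments" key yields an empty list, so configurations
    written before this change keep working.
    """
    return [
        Argument(
            name=item["name"],
            type=item.get("type", "string"),          # assumed default
            description=item.get("description", ""),  # assumed default
        )
        for item in tool_config.get("arguments", [])
    ]


config = json.loads("""
{
  "type": "mcp-tool",
  "name": "get_bank_balance",
  "mcp-tool": "get_bank_balance",
  "arguments": [
    {"name": "account_id", "type": "string",
     "description": "Bank account identifier"}
  ]
}
""")
parsed = parse_arguments(config)
legacy = parse_arguments({"type": "mcp-tool", "name": "old_tool"})
```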
#### Component 2: tools.py (McpToolImpl)

- **Purpose**: Wrapper for MCP tool execution
- **Changes Required**: Accept arguments in the constructor and return them from `get_arguments()`
- **New Functionality**: Store and expose the configured arguments instead of returning an empty list

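The change to the wrapper might look like the following sketch. `McpToolImplSketch` is a hypothetical stand-in: the constructor parameters other than `arguments` are assumptions, and the real `McpToolImpl` also carries the invocation logic, which is omitted here.

```python
class McpToolImplSketch:
    """Hypothetical stand-in for McpToolImpl (invocation logic omitted)."""

    def __init__(self, context, mcp_tool_id, arguments=None):
        self.context = context
        self.mcp_tool_id = mcp_tool_id
        # Copy the list so later changes by the caller do not leak in;
        # None or an empty list both mean "no arguments configured".
        self.arguments = list(arguments) if arguments else []

    def get_arguments(self):
        # Previously this would always have been an empty list
        return self.arguments


legacy_tool = McpToolImplSketch(None, "get_bank_balance")
new_tool = McpToolImplSketch(None, "get_bank_balance", ["account_id", "date"])
```

Defaulting `arguments` to `None` keeps existing call sites valid, which is what the backward-compatibility requirement asks for.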
#### Component 3: Workbench (External Repository)

- **Purpose**: UI for agent tool configuration
- **Changes Required**: Add an argument specification UI for MCP tools
- **New Functionality**: Allow users to add, edit, and remove arguments for MCP tools

#### Component 4: CLI Tools

- **Purpose**: Command-line tool management
- **Changes Required**: Support argument specification in the MCP tool create/update commands
- **New Functionality**: Accept an arguments parameter in the tool configuration commands

## Implementation Plan

### Phase 1: Core Agent Framework Changes

- [ ] Update the `McpToolImpl` constructor to accept an `arguments` parameter
- [ ] Change `McpToolImpl.get_arguments()` to return the stored arguments
- [ ] Change the MCP tool configuration parsing in `service.py` to handle arguments
- [ ] Add unit tests for MCP tool argument handling
- [ ] Verify that agent prompts include MCP tool arguments

### Phase 2: External Tool Support

- [ ] Update the CLI tools to support MCP tool argument specifications
- [ ] Document the argument configuration format for users
- [ ] Update the Workbench UI to support MCP tool argument configuration
- [ ] Add examples and documentation

### Code Changes Summary

| File | Change Type | Description |
|------|------------|-------------|
| `tools.py` | Modified | Update McpToolImpl to accept and store arguments |
| `service.py` | Modified | Parse arguments from the MCP tool configuration (lines 108-113) |
| `test_react_processor.py` | Modified | Add tests for MCP tool arguments |
| CLI tools | Modified | Support argument specification in commands |
| Workbench | Modified | Add UI for MCP tool argument configuration |

## Testing Strategy

### Unit Tests

- **MCP Tool Argument Parsing**: Verify that `service.py` correctly parses arguments from MCP tool configurations
- **McpToolImpl Arguments**: Verify that `get_arguments()` returns the configured arguments instead of an empty list
- **Backward Compatibility**: Verify that MCP tools without arguments continue to work (returning an empty list)
- **Agent Prompt Generation**: Verify that agent prompts include MCP tool argument details

### Integration Tests

- **End-to-End Tool Invocation**: A test agent can invoke tools using MCP tool arguments.
- **Configuration Loading**: Test the full configuration loading process with MCP tool arguments.
- **Cross-Component**: Verify that arguments flow correctly from configuration through tool creation to prompt generation.

### Manual Testing

- **Agent Behavior**: Manually check that the model receives and uses the argument information in the ReACT loop.
- **CLI Integration**: Test that `tg-invoke-mcp-tool` works with MCP tools that are configured with arguments.
- **Workbench Integration**: Test that the UI supports MCP tool argument configuration.

## Migration and Rollout

### Migration Strategy

No migration is required; this is an additive feature:

- Existing MCP tool configurations that have no `arguments` continue to work unchanged.
- `McpToolImpl.get_arguments()` returns an empty list for legacy tools.
- New configurations can optionally include `arguments`.

### Rollout Plan

1. **Phase 1**: Deploy the core framework changes to development/staging.
2. **Phase 2**: Deploy the CLI tool updates and documentation.
3. **Phase 3**: Deploy the Workbench UI updates for argument configuration.
4. **Phase 4**: Production rollout with monitoring.

### Rollback Plan

- The core changes are backward compatible, so no functional rollback is needed.
- If issues arise, disable argument parsing by reverting the MCP tool configuration loading logic.
- The Workbench and CLI changes can be rolled back independently.

## Security Considerations

- **No new attack surface**: Arguments are parsed from existing configuration sources; no new inputs are introduced.
- **Parameter validation**: Arguments are passed through to MCP tools unchanged; validation remains at the MCP tool level.
- **Configuration integrity**: Argument specifications are part of the tool configuration, so the same security model applies.

## Performance Impact

- **Minimal overhead**: Argument parsing happens only during configuration loading, not on every request.
- **Prompt size increase**: Agent prompts will include MCP tool argument details, which increases token usage.
- **Memory usage**: A negligible increase for storing argument specifications in tool objects.

## Documentation

### User Documentation

- [ ] Update the MCP tool configuration guide with argument examples.
- [ ] Add argument specification to the CLI tool help text.
- [ ] Create examples of common MCP tool argument patterns.

### Developer Documentation

- [ ] Update the `McpToolImpl` class documentation.
- [ ] Add inline comments for the argument parsing logic.
- [ ] Document the argument flow in the system architecture.

## Open Questions

1. **Argument validation**: Should we validate argument types/formats beyond a basic structure check?
2. **Dynamic discovery**: Should a future enhancement query MCP servers for tool schemas automatically?

## Alternatives Considered

1. **Dynamic MCP schema discovery**: Query MCP servers for tool argument schemas at runtime; rejected because of complexity and reliability concerns.
2. **Separate argument registry**: Store MCP tool arguments in a separate configuration section; rejected for consistency with the prompt template approach.
3. **Type validation**: Full JSON schema validation for arguments; deferred as a future enhancement so that the initial implementation can move forward.

## References

- [MCP Protocol Specification](https://github.com/modelcontextprotocol/spec)
- [Prompt Template Tool Implementation](./trustgraph-flow/trustgraph/agent/react/service.py#L114-129)
- [Current MCP Tool Implementation](./trustgraph-flow/trustgraph/agent/react/tools.py#L58-86)

## Appendix

[Any additional notes, diagrams, or examples]

562
docs/tech-specs/sw/mcp-tool-bearer-token.sw.md
Normal file
|
|
@ -0,0 +1,562 @@
|
|||
---
layout: default
title: "MCP Tool Bearer Token Authentication Specification"
parent: "Swahili (Beta)"
---

# MCP Tool Bearer Token Authentication Specification

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

> **⚠️ IMPORTANT: SINGLE-TENANT ONLY**
>
> This specification describes a **basic, service-level authentication mechanism** for MCP tools. It is **not a complete authentication solution** and is **not suitable** for:
> - Multi-user environments
> - Multi-tenant deployments
> - Federated authentication
> - User context propagation
> - Per-user authorization
>
> This feature provides **one static token per MCP tool**, shared across all users and all sessions. If you need per-user or per-tenant authentication, this is not the right solution.

## Overview

- **Feature Name**: MCP Tool Bearer Token Authentication Support
- **Author**: Claude Code Assistant
- **Date**: 2025-11-11
- **Status**: In Development

### Summary

Allow MCP tool configurations to specify an optional bearer token for authenticating with protected MCP servers. This lets TrustGraph invoke MCP tools hosted on servers that require authentication, without changing the agent or the tool invocation interfaces.

**IMPORTANT**: This is a basic authentication mechanism designed for single-tenant, service-to-service authentication scenarios. It is not suitable for:

- Multi-user environments where different users need different credentials
- Multi-tenant deployments that require per-tenant isolation
- Federated authentication scenarios
- User-level authentication or authorization
- Dynamic credential management or token refresh

This feature provides one static, system-wide token per MCP tool configuration, shared across all users and all invocations of that tool.

### Problem

Currently, MCP tools can connect only to publicly accessible MCP servers. Many production MCP deployments require bearer token authentication for security. Without authentication support:

- MCP tools cannot connect to protected MCP servers
- Users must either expose MCP servers publicly or put them behind reverse proxies
- There is no established way to pass credentials to MCP connections
- Security best practices cannot be enforced on MCP endpoints

### Goals

- [ ] Allow MCP tool configurations to specify an optional `auth-token` parameter
- [ ] Update the MCP tool service to use bearer tokens when connecting to MCP servers
- [ ] Update the CLI tools to support setting and displaying auth tokens
- [ ] Maintain backward compatibility with unauthenticated MCP configurations
- [ ] Document the security considerations of token storage

### Non-Goals

- Dynamic token refresh or OAuth flows (static tokens only)
- Encryption of stored tokens (configuration system security is out of scope)
- Other authentication methods (Basic auth, API keys, etc.)
- Token validation or expiry checking
- **Per-user authentication**: This feature does not support user-specific credentials
- **Multi-tenant isolation**: This feature does not provide per-tenant token management
- **Federated authentication**: This feature does not integrate with identity providers (SSO, OAuth, SAML, etc.)
- **Context-aware authentication**: Tokens are not selected based on user or session context

## Background and Context

### Current State

MCP tool configurations are stored in the `mcp` configuration group with this structure:

```json
{
  "remote-name": "tool_name",
  "url": "http://mcp-server:3000/api"
}
```

The MCP tool service connects to servers using `streamablehttp_client(url)` without any authentication headers.

### Limitations

**Current System Limitations:**

1. **No authentication support**: Cannot connect to protected MCP servers.
2. **Security exposure**: MCP servers must be publicly accessible or rely on network-level security alone.
3. **Production deployment issues**: Cannot follow security best practices for API endpoints.

**Limitations of This Solution:**

1. **Single-tenant only**: One static token per MCP tool, shared across all users.
2. **No per-user credentials**: Cannot authenticate as different users or pass user context.
3. **No multi-tenant support**: Cannot isolate credentials per tenant or organization.
4. **Static tokens only**: No support for token refresh, rotation, or expiry handling.
5. **Service-level authentication**: Authenticates the TrustGraph service, not individual users.
6. **Shared security context**: All invocations of an MCP tool use the same credential.

### Use Case Applicability

**✅ Appropriate Use Cases:**

- Single-tenant TrustGraph deployments.
- Service-to-service authentication (TrustGraph → MCP server).
- Development and testing environments.
- Internal MCP tools accessed by the TrustGraph system.
- Scenarios in which all users have the same level of access to the MCP tool.
- Static, long-lived service credentials.

**❌ Inappropriate Use Cases:**

- Multi-user systems that require per-user authentication.
- Multi-tenant SaaS deployments with per-tenant isolation requirements.
- Federated authentication scenarios (SSO, OAuth, SAML).
- Systems that need to propagate user context to MCP servers.
- Environments that require dynamic token refresh or short-lived tokens.
- Applications in which different users need different permission levels.
- Compliance requirements for user-level audit trails.

**Example of an Appropriate Use Case:**

A single-organization TrustGraph deployment in which all employees use the same internal MCP tool (e.g., a company database lookup). The MCP server requires authentication to block external access, but all internal users have the same level of access.

**Example of an Inappropriate Use Case:**

A multi-tenant TrustGraph SaaS platform in which Tenant A and Tenant B each need to reach their own isolated MCP servers with separate credentials. This feature does not support per-tenant credential management.

### Related Components

- **trustgraph-flow/trustgraph/agent/mcp_tool/service.py**: MCP tool invocation service.
- **trustgraph-cli/trustgraph/cli/set_mcp_tool.py**: CLI tool for creating/updating MCP configurations.
- **trustgraph-cli/trustgraph/cli/show_mcp_tools.py**: CLI tool for displaying MCP configurations.
- **MCP Python SDK**: `streamablehttp_client` from `mcp.client.streamable_http`

## Requirements

### Functional Requirements

1. **MCP Configuration Auth Token**: MCP tool configurations MUST support an optional `auth-token` field.
2. **Bearer Token Usage**: The MCP tool service MUST send an `Authorization: Bearer {token}` header when an auth token is configured.
3. **CLI Support**: `tg-set-mcp-tool` MUST accept an optional `--auth-token` parameter.
4. **Token Display**: `tg-show-mcp-tools` MUST indicate when an auth token is configured (masked for security).
5. **Backward Compatibility**: Existing MCP tool configurations without authentication MUST continue to work.

### Non-Functional Requirements

1. **Backward Compatibility**: No breaking changes for existing MCP tool configurations.
2. **Performance**: No significant performance impact on MCP tool invocation.
3. **Security**: Tokens are stored in the configuration (see Security Considerations).

### User Stories

1. As a **DevOps engineer**, I want to configure bearer tokens for MCP tools so that I can secure MCP server endpoints.
2. As a **CLI user**, I want to set auth tokens when creating MCP tools so that I can connect to protected servers.
3. As a **system administrator**, I want to see which MCP tools have authentication configured so that I can audit the security settings.

## Design

### High-Level Architecture

Extend the MCP tool configuration and service to support bearer token authentication:

1. Add `auth-token` to the MCP tool configuration schema.
2. Change the MCP tool service to read the auth token and pass it to the HTTP client.
3. Update the CLI tools to support setting and displaying auth tokens.
4. Document security considerations and best practices.

### Configuration Schema

**Current Schema**:

```json
{
  "remote-name": "tool_name",
  "url": "http://mcp-server:3000/api"
}
```

**New Schema** (with an optional auth token):

```json
{
  "remote-name": "tool_name",
  "url": "http://mcp-server:3000/api",
  "auth-token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
}
```

**Field Descriptions:**

- `remote-name` (optional): Name used by the MCP server (defaults to the configuration key)
- `url` (required): MCP server endpoint URL
- `auth-token` (optional): Bearer token for authentication

### Data Flow

1. **Configuration Storage**: The user runs `tg-set-mcp-tool --id my-tool --tool-url http://server/api --auth-token xyz123`
2. **Configuration Loading**: The MCP tool service receives the configuration update through the `on_mcp_config()` callback
3. **Tool Invocation**: When a tool is invoked:
   - The service reads `auth-token` from the configuration (if present)
   - It builds a headers dictionary: `{"Authorization": "Bearer {token}"}`
   - It passes the headers to `streamablehttp_client(url, headers=headers)`
   - The MCP server validates the token and processes the request

### API Changes

No external API changes; only the configuration schema changes.

### Component Details

#### Component 1: service.py (MCP Tool Service)

**File:** `trustgraph-flow/trustgraph/agent/mcp_tool/service.py`

**Purpose:** Invoke MCP tools on remote servers

**Changes Required** (in the `invoke_tool()` method):

1. Check for `auth-token` in the `self.mcp_services[name]` configuration
2. Build a headers dictionary with an `Authorization` header if a token is present
3. Pass the headers to `streamablehttp_client(url, headers=headers)`

**Current Code** (lines 42-89):

```python
async def invoke_tool(self, name, parameters):
    try:
        if name not in self.mcp_services:
            raise RuntimeError(f"MCP service {name} not known")
        if "url" not in self.mcp_services[name]:
            raise RuntimeError(f"MCP service {name} URL not defined")

        url = self.mcp_services[name]["url"]

        if "remote-name" in self.mcp_services[name]:
            remote_name = self.mcp_services[name]["remote-name"]
        else:
            remote_name = name

        logger.info(f"Invoking {remote_name} at {url}")

        # Connect to a streamable HTTP server
        async with streamablehttp_client(url) as (
            read_stream,
            write_stream,
            _,
        ):
            # ... rest of method
```

**Modified Code**:

```python
async def invoke_tool(self, name, parameters):
    try:
        if name not in self.mcp_services:
            raise RuntimeError(f"MCP service {name} not known")
        if "url" not in self.mcp_services[name]:
            raise RuntimeError(f"MCP service {name} URL not defined")

        url = self.mcp_services[name]["url"]

        if "remote-name" in self.mcp_services[name]:
            remote_name = self.mcp_services[name]["remote-name"]
        else:
            remote_name = name

        # Build headers with optional bearer token
        headers = {}
        if "auth-token" in self.mcp_services[name]:
            token = self.mcp_services[name]["auth-token"]
            headers["Authorization"] = f"Bearer {token}"

        logger.info(f"Invoking {remote_name} at {url}")

        # Connect to a streamable HTTP server with headers
        async with streamablehttp_client(url, headers=headers) as (
            read_stream,
            write_stream,
            _,
        ):
            # ... rest of method (unchanged)
```

#### Component 2: set_mcp_tool.py (CLI Configuration Tool)

**File**: `trustgraph-cli/trustgraph/cli/set_mcp_tool.py`

**Purpose**: Create/update MCP tool configurations

**Changes Required**:

1. Add an optional `--auth-token` argument to argparse
2. Include `auth-token` in the configuration JSON when it is provided

**Current Arguments**:

- `--id` (required): MCP tool identifier
- `--remote-name` (optional): Name of the remote MCP tool
- `--tool-url` (required): MCP tool URL endpoint
- `-u, --api-url` (optional): TrustGraph API URL

**New Argument**:

- `--auth-token` (optional): Bearer token for authentication

**Modified Configuration Building**:

```python
# Build configuration object
config = {
    "url": args.tool_url,
}

if args.remote_name:
    config["remote-name"] = args.remote_name

if args.auth_token:
    config["auth-token"] = args.auth_token

# Store configuration
api.config().put([
    ConfigValue(type="mcp", key=args.id, value=json.dumps(config))
])
```

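The argument list above could be wired into argparse roughly as follows. This is a sketch under stated assumptions: `build_parser` and `build_config` are hypothetical helper names, the help strings are illustrative, and the default for `--api-url` is left unset because the specification does not state it.

```python
import argparse


def build_parser():
    """Parser covering the existing arguments plus the new --auth-token."""
    parser = argparse.ArgumentParser(prog="tg-set-mcp-tool")
    parser.add_argument("-u", "--api-url", help="TrustGraph API URL")
    parser.add_argument("--id", required=True, help="MCP tool identifier")
    parser.add_argument("--remote-name", help="Remote MCP tool name")
    parser.add_argument("--tool-url", required=True,
                        help="MCP tool URL endpoint")
    parser.add_argument("--auth-token",
                        help="Bearer token for authentication (optional)")
    return parser


def build_config(args):
    """Only include optional keys that were actually supplied."""
    config = {"url": args.tool_url}
    if args.remote_name:
        config["remote-name"] = args.remote_name
    if args.auth_token:
        config["auth-token"] = args.auth_token
    return config


with_token = build_config(build_parser().parse_args(
    ["--id", "my-tool", "--tool-url", "http://server/api",
     "--auth-token", "xyz123"]
))
without_token = build_config(build_parser().parse_args(
    ["--id", "my-tool", "--tool-url", "http://server/api"]
))
```

Leaving `auth-token` out of the stored JSON when the flag is absent keeps unauthenticated configurations identical to what the tool writes today.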
#### Component 3: show_mcp_tools.py (CLI Display Tool)

**File**: `trustgraph-cli/trustgraph/cli/show_mcp_tools.py`

**Purpose**: Display MCP tool configurations

**Changes Required**:

1. Add an "Auth" column to the output table
2. Show "Yes" or "No" depending on whether `auth-token` is present
3. Never display the actual token value (security)

**Current Output**:

```
ID          Remote Name    URL
----------  -------------  ------------------------
my-tool     my-tool        http://server:3000/api
```

**New Output**:

```
ID          Remote Name    URL                       Auth
----------  -------------  ------------------------  ------
my-tool     my-tool        http://server:3000/api    Yes
other-tool  other-tool     http://other:3000/api     No
```

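One way to produce such a table without ever touching the token value is sketched below. `format_tools` is a hypothetical helper written for this example; the actual CLI may use a table library and a different column layout.

```python
def format_tools(tools):
    """Render a mapping of tool id -> config dict as a text table.

    Only the presence of "auth-token" is reported, never its value.
    """
    header = ("ID", "Remote Name", "URL", "Auth")
    rows = [header]
    for tool_id, cfg in tools.items():
        rows.append((
            tool_id,
            cfg.get("remote-name", tool_id),  # falls back to the config key
            cfg["url"],
            "Yes" if cfg.get("auth-token") else "No",
        ))

    # Column width is the longest cell in that column
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]

    def fmt(row):
        cells = (cell.ljust(width) for cell, width in zip(row, widths))
        return "  ".join(cells).rstrip()

    lines = [fmt(header), "  ".join("-" * width for width in widths)]
    lines += [fmt(row) for row in rows[1:]]
    return "\n".join(lines)


table = format_tools({
    "my-tool": {"url": "http://server:3000/api", "auth-token": "xyz123"},
    "other-tool": {"url": "http://other:3000/api"},
})
print(table)
```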
#### Component 4: Documentation

**File**: `docs/cli/tg-set-mcp-tool.md`

**Changes Required**:

1. Document the new `--auth-token` parameter
2. Provide a usage example with authentication
3. Document the security considerations

## Implementation Plan

### Phase 1: Create the Technical Specification

- [x] Write a comprehensive technical specification documenting all changes

### Phase 2: Update the MCP Tool Service

- [ ] Change `invoke_tool()` in `service.py` to read `auth-token` from the configuration
- [ ] Build the headers dictionary and pass it to `streamablehttp_client`
- [ ] Test with an authenticated MCP server

### Phase 3: Update the CLI Tools

- [ ] Add the `--auth-token` argument to `set_mcp_tool.py`
- [ ] Include `auth-token` in the configuration JSON
- [ ] Add an "Auth" column to the `show_mcp_tools.py` output
- [ ] Test the CLI tool changes

### Phase 4: Update the Documentation

- [ ] Document `--auth-token` in `tg-set-mcp-tool.md`
- [ ] Add a security considerations section
- [ ] Provide a usage example

### Phase 5: Testing

- [ ] Test that an MCP tool with `auth-token` connects successfully
- [ ] Test backward compatibility (tools without `auth-token` continue to work)
- [ ] Test that the CLI tools accept and store `auth-token` correctly
- [ ] Test that the show command displays the authentication status correctly

### Code Changes Summary

| File | Change Type | Lines | Description |
|------|------------|-------|-------------|
| `service.py` | Modified | ~52-66 | Add `auth-token` reading and header building |
| `set_mcp_tool.py` | Modified | ~30-60 | Add the `--auth-token` argument and store it in the configuration |
| `show_mcp_tools.py` | Modified | ~40-70 | Add an Auth column to the display |
| `tg-set-mcp-tool.md` | Modified | Various | Document the new parameter |

## Testing Strategy

### Unit Tests

- **Auth Token Reading**: Test that `invoke_tool()` correctly reads `auth-token` from the configuration
- **Header Building**: Test that the Authorization header is built correctly with the `Bearer` prefix
- **Backward Compatibility**: Test that tools without `auth-token` work unchanged
- **CLI Argument Parsing**: Test that the `--auth-token` argument is parsed correctly

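The header-building and backward-compatibility checks above might be written along these lines. `build_headers` is a hypothetical helper that isolates the header logic from `invoke_tool()` so it can be tested without a network connection; the specification itself builds the headers inline.

```python
def build_headers(service_config):
    """Return HTTP headers for an MCP connection (hypothetical helper)."""
    headers = {}
    if "auth-token" in service_config:
        headers["Authorization"] = f"Bearer {service_config['auth-token']}"
    return headers


def test_header_has_bearer_prefix():
    cfg = {"url": "http://server/api", "auth-token": "xyz123"}
    assert build_headers(cfg) == {"Authorization": "Bearer xyz123"}


def test_no_token_means_no_headers():
    # Legacy configuration: behaves exactly as before the change
    assert build_headers({"url": "http://server/api"}) == {}


test_header_has_bearer_prefix()
test_no_token_means_no_headers()
```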
### Integration Tests

- **Authenticated Connection**: Test that the MCP tool service connects to an authenticated server
- **End-to-End**: Test CLI → configuration storage → service invocation with an auth token
- **Token Not Required**: Test that connections to an unauthenticated server continue to work

### Manual Testing

- **Real MCP Server**: Test against a real MCP server that requires bearer token authentication
- **CLI Workflow**: Test the full workflow: set a tool with authentication → invoke the tool → verify success
- **Display Masking**: Verify that the authentication status is shown but the token value is not

## Migration and Rollout

### Migration Strategy

No migration is required; this is additive functionality:

- Existing MCP tool configurations without `auth-token` continue to work unchanged
- New configurations can optionally include the `auth-token` field
- The CLI tools accept, but do not require, the `--auth-token` parameter

### Rollout Plan

1. **Phase 1**: Deploy the core service changes to development/staging
2. **Phase 2**: Deploy the CLI tool updates
3. **Phase 3**: Update the documentation
4. **Phase 4**: Production rollout with monitoring

### Rollback Plan

- The core changes are backward compatible; existing tools are not affected
- If issues arise, `auth-token` handling can be disabled by removing the header-building logic
- The CLI tool changes are independent and can be rolled back separately

## Security Considerations

### ⚠️ Critical Limitation: Single-Tenant Authentication Only

**This authentication mechanism is not suitable for multi-user or multi-tenant environments.**

- **Shared credentials**: All users and all invocations share one token per MCP tool
- **No user context**: The MCP server cannot distinguish between different TrustGraph users
- **No tenant isolation**: All tenants share the same credential for each MCP tool
- **Audit trail limitation**: The MCP server sees every request as coming from the same credential
- **Permission scope**: Different permission levels cannot be enforced for different users

**Do not use this feature if:**

- You serve multiple organizations
- You need per-user authentication
- You need time-limited credentials
- You need per-tenant authentication

**Alternative solutions for multi-user/multi-tenant scenarios:**

- Implement user context propagation through custom headers
- Deploy a separate TrustGraph instance per tenant
- Use network-level isolation (VPCs, service meshes)
- Implement a proxy layer that handles per-user authentication

### Token Storage

**Risk**: Auth tokens are stored in plaintext in the configuration system

**Mitigations**:

- Document that tokens are stored unencrypted
- Recommend short-lived tokens where possible
- Recommend proper access control on the configuration store
- Consider encrypted token storage as a future enhancement

### Token Exposure

**Risk**: Tokens could be exposed in logs or CLI output

**Mitigations**:

- Do not log token values (log only "auth configured: yes/no")
- The CLI show command displays only a masked status, never the actual token
- Do not include tokens in error messages

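A small redaction step is one way to follow the logging guidance above. The sketch below is illustrative only: `redact_config` and the `auth-configured` key are hypothetical names, not part of the service.

```python
def redact_config(service_config):
    """Return a copy that is safe to log.

    The token value is dropped and replaced by a presence flag, so log
    lines can say whether auth is configured without leaking the secret.
    """
    safe = {k: v for k, v in service_config.items() if k != "auth-token"}
    safe["auth-configured"] = "auth-token" in service_config
    return safe


cfg = {"url": "https://mcp-server:3000/api", "auth-token": "xyz123"}
safe_cfg = redact_config(cfg)
print(f"MCP service config: {safe_cfg}")
```

The original dictionary is left untouched, so the token is still available for building the `Authorization` header.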
### Network Security

**Risk**: Tokens are transmitted over unencrypted connections

**Mitigations**:

- Document the recommendation to use HTTPS URLs for MCP servers
- Warn users about the risk of plaintext transmission over HTTP

### Configuration Access
**Risk**: Unauthorized access to the configuration system exposes tokens

**Mitigations**:
- Document the importance of securing access to the configuration system
- Recommend the principle of least privilege for configuration access
- Consider audit logging for configuration changes (future enhancement)

### Multi-User Environments
**Risk**: In multi-user deployments, all users share the same MCP credentials

**Understanding the risk**:
- User A and User B use the same token when accessing an MCP tool
- The MCP server cannot distinguish between different TrustGraph users
- There is no way to enforce per-user permissions or limits
- Logs on the MCP server show every request as coming from the same credentials
- If one user's session is compromised, the attacker has the same MCP access as every user

**This is NOT a bug - it is a fundamental limitation of this design.**

## Performance Impact
- **Minimal overhead**: Header construction adds negligible processing time
- **Network impact**: The additional HTTP header adds ~50-200 bytes per request
- **Memory usage**: Negligible increase from storing the token string in the configuration

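The header construction mentioned above can be sketched as follows. This is a minimal illustration, not the code in `service.py`: the helper name `build_mcp_headers` is hypothetical, and the only fact assumed is the RFC 6750 bearer scheme (`Authorization: Bearer <token>`).

```python
from typing import Dict, Optional


def build_mcp_headers(auth_token: Optional[str] = None) -> Dict[str, str]:
    """Return extra HTTP headers for an MCP request (hypothetical helper)."""
    headers: Dict[str, str] = {}
    if auth_token:
        # RFC 6750 bearer scheme: "Authorization: Bearer <token>"
        headers["Authorization"] = f"Bearer {auth_token}"
    return headers


def header_overhead(headers: Dict[str, str]) -> int:
    """Approximate bytes added on the wire: name, ': ', value, CRLF."""
    return sum(len(name) + len(value) + 4 for name, value in headers.items())
```

With no token configured the helper returns an empty dict, so unauthenticated tools send exactly the same request as before.
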
## Documentation

### User Documentation
- [ ] Update `tg-set-mcp-tool.md` with the `--auth-token` parameter
- [ ] Add a security considerations section
- [ ] Provide a usage example with a bearer token
- [ ] Document the implications of token storage

### Developer Documentation
- [ ] Add inline comments for authentication token handling in `service.py`
- [ ] Document the header construction logic
- [ ] Update the MCP tool configuration schema documentation

## Open Questions
1. **Token encryption**: Should we implement encrypted token storage in the configuration system?
2. **Token refresh**: Future support for OAuth refresh flows or token rotation?
3. **Alternative authentication methods**: Should we support Basic authentication, API keys, or other methods?

## Alternatives Considered

1. **Environment variables for tokens**: Store tokens in environment variables instead of the configuration
   - **Rejected**: Complicates deployment and configuration management

2. **Separate secrets store**: Use a dedicated secrets management system
   - **Deferred**: Out of scope for the initial implementation; consider as a future enhancement

3. **Multiple authentication methods**: Support Basic, API key, OAuth, etc.
   - **Rejected**: Bearer tokens cover most use cases; keep the initial implementation simple

4. **Encrypted token storage**: Encrypt tokens in the configuration system
   - **Deferred**: Configuration system security is a broader concern; defer to future work

5. **Per-invocation tokens**: Allow tokens to be passed at invocation time
   - **Rejected**: Violates separation of concerns; the agent should not handle credentials

## References
- [MCP Protocol Specification](https://github.com/modelcontextprotocol/spec)
- [HTTP Bearer Authentication (RFC 6750)](https://tools.ietf.org/html/rfc6750)
- [Current MCP Tool Service](../trustgraph-flow/trustgraph/agent/mcp_tool/service.py)
- [MCP Tool Arguments Specification](./mcp-tool-arguments.md)

## Appendix

### Example Usage

**Setting an MCP tool with authentication:**
```bash
tg-set-mcp-tool \
  --id secure-tool \
  --tool-url https://secure-server.example.com/mcp \
  --auth-token eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
```

**Showing MCP tools:**
```bash
tg-show-mcp-tools

ID            Remote Name   URL                                     Auth
-----------   -----------   -------------------------------------   ----
secure-tool   secure-tool   https://secure-server.example.com/mcp   Yes
public-tool   public-tool   http://localhost:3000/mcp               No
```

### Configuration Example

**Stored in the configuration system:**
```json
{
  "type": "mcp",
  "key": "secure-tool",
  "value": "{\"url\": \"https://secure-server.example.com/mcp\", \"auth-token\": \"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...\"}"
}
```

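Because the `value` field is itself a JSON-encoded string, a reader has to decode it a second time. The sketch below shows that, and shows how a listing could report `Auth` as Yes/No without revealing the token. The function name `describe_mcp_tool` and the sample token are assumptions made for illustration; this is not the actual CLI code.

```python
import json


def describe_mcp_tool(entry: dict) -> dict:
    """Summarise a stored MCP tool entry without revealing its token."""
    # The "value" field holds a JSON document encoded as a string.
    settings = json.loads(entry["value"])
    return {
        "id": entry["key"],
        "url": settings.get("url"),
        # Report only whether a token is present, never the token itself.
        "auth": "Yes" if settings.get("auth-token") else "No",
    }


# Illustrative entry; "example-token" is a placeholder, not a real credential.
entry = {
    "type": "mcp",
    "key": "secure-tool",
    "value": json.dumps({
        "url": "https://secure-server.example.com/mcp",
        "auth-token": "example-token",
    }),
}
summary = describe_mcp_tool(entry)
```
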
### Security Best Practices

1. **Use HTTPS**: Always use HTTPS URLs for MCP servers that require authentication.
2. **Short-lived tokens**: Use tokens with an expiry time where possible.
3. **Least privilege**: Grant tokens the minimum permissions required.
4. **Access control**: Restrict access to the configuration system.
5. **Token rotation**: Rotate tokens regularly.
6. **Audit logging**: Monitor configuration changes to detect security incidents.
266
docs/tech-specs/sw/minio-to-s3-migration.sw.md
Normal file
@ -0,0 +1,266 @@
|
|||
---
layout: default
title: "Tech Spec: S3-Compatible Storage Backend Support"
parent: "Swahili (Beta)"
---

# Tech Spec: S3-Compatible Storage Backend Support

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Overview

The Librarian service uses S3-compatible object storage to store document blobs. This spec documents the implementation that enables support for any S3-compatible storage backend, including MinIO, Ceph RADOS Gateway (RGW), AWS S3, Cloudflare R2, DigitalOcean Spaces, and others.

## Architecture

### Storage Components
- **Blob Storage:** S3-compatible object storage via the `minio` Python client library
- **Metadata Storage:** Cassandra (stores the object_id mapping and document metadata)
- **Affected Component:** Librarian service only
- **Storage Pattern:** Hybrid storage with metadata in Cassandra and content in S3-compatible storage

### Implementation
- **Library:** `minio` Python client (supports any S3-compatible API)
- **Location:** `trustgraph-flow/trustgraph/librarian/blob_store.py`
- **Operations:**
  - `add()` - Store a blob under a unique identifier (object_id)
  - `get()` - Retrieve a blob by its unique identifier (object_id)
  - `remove()` - Delete a blob by its unique identifier (object_id)
  - `ensure_bucket()` - Create the bucket if it does not exist
- **Bucket:** `library`
- **Object Path:** `doc/{object_id}`
- **Supported MIME Types:** `text/plain`, `application/pdf`

### Key Files
1. `trustgraph-flow/trustgraph/librarian/blob_store.py` - BlobStore implementation
2. `trustgraph-flow/trustgraph/librarian/librarian.py` - BlobStore initialization
3. `trustgraph-flow/trustgraph/librarian/service.py` - Service configuration
4. `trustgraph-flow/pyproject.toml` - Dependencies (`minio` package)
5. `docs/apis/api-librarian.md` - API documentation

## Supported Storage Backends

The implementation works with any S3-compatible object storage system:

### Tested/Supported
- **Ceph RADOS Gateway (RGW)** - Distributed storage system with an S3 API (default configuration)
- **MinIO** - Lightweight self-hosted object storage
- **Garage** - Lightweight, geo-distributed S3-compatible object storage

### Should Work (S3-Compatible)
- **AWS S3** - Amazon's cloud object storage
- **Cloudflare R2** - Cloudflare's S3-compatible storage
- **DigitalOcean Spaces** - DigitalOcean's object storage
- **Wasabi** - S3-compatible cloud object storage
- **Backblaze B2** - S3-compatible object storage for backups
- Any other service that implements the S3 REST API

## Configuration

### CLI Arguments

```bash
librarian \
  --object-store-endpoint <hostname:port> \
  --object-store-access-key <access_key> \
  --object-store-secret-key <secret_key> \
  [--object-store-use-ssl] \
  [--object-store-region <region>]
```

**Note:** Do not include `http://` or `https://` in the endpoint. Use `--object-store-use-ssl` to enable HTTPS.

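If an endpoint is only available as a full URL, it has to be split into the `hostname:port` form plus an SSL flag before it is passed to the librarian. A minimal sketch of that conversion, assuming a hypothetical helper `split_endpoint` that is not part of TrustGraph:

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_endpoint(value: str) -> Tuple[str, bool]:
    """Turn 'https://host:port' or 'host:port' into (host:port, use_ssl)."""
    if "://" not in value:
        # Already in the bare form; SSL must then be chosen via the flag.
        return value, False
    parts = urlsplit(value)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme}")
    return parts.netloc, parts.scheme == "https"
```

The first element maps to `--object-store-endpoint`; the second tells the caller whether to add `--object-store-use-ssl`.
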
### Environment Variables (Alternative)

```bash
OBJECT_STORE_ENDPOINT=<hostname:port>
OBJECT_STORE_ACCESS_KEY=<access_key>
OBJECT_STORE_SECRET_KEY=<secret_key>
OBJECT_STORE_USE_SSL=true|false  # Optional, default: false
OBJECT_STORE_REGION=<region>     # Optional
```

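The variables above could be read along the following lines. This is a sketch under assumptions, not the librarian's actual code: the function name `object_store_settings` is hypothetical, and the returned keys merely mirror the `BlobStore` constructor arguments.

```python
import os
from typing import Mapping, Optional


def object_store_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect object-store settings from environment variables."""
    env = os.environ if env is None else env
    return {
        # The three required values raise KeyError when missing.
        "endpoint": env["OBJECT_STORE_ENDPOINT"],
        "access_key": env["OBJECT_STORE_ACCESS_KEY"],
        "secret_key": env["OBJECT_STORE_SECRET_KEY"],
        # Optional; anything other than "true" leaves SSL disabled.
        "use_ssl": env.get("OBJECT_STORE_USE_SSL", "false").strip().lower() == "true",
        # Optional; an empty string is treated as "not set".
        "region": env.get("OBJECT_STORE_REGION") or None,
    }
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise in tests.
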
### Examples

**Ceph RADOS Gateway (default):**
```bash
--object-store-endpoint ceph-rgw:7480 \
--object-store-access-key object-user \
--object-store-secret-key object-password
```

**MinIO:**
```bash
--object-store-endpoint minio:9000 \
--object-store-access-key minioadmin \
--object-store-secret-key minioadmin
```

**Garage (S3-compatible):**
```bash
--object-store-endpoint garage:3900 \
--object-store-access-key GK000000000000000000000001 \
--object-store-secret-key b171f00be9be4c32c734f4c05fe64c527a8ab5eb823b376cfa8c2531f70fc427
```

**AWS S3 with SSL:**
```bash
--object-store-endpoint s3.amazonaws.com \
--object-store-access-key AKIAIOSFODNN7EXAMPLE \
--object-store-secret-key wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY \
--object-store-use-ssl \
--object-store-region us-east-1
```

## Authentication

All S3-compatible backends require AWS Signature Version 4 (or v2) authentication:

- **Access Key** - Public identifier (like a username)
- **Secret Key** - Private signing key (like a password)

The MinIO Python client handles all signature calculation automatically.

### Creating Credentials

**For MinIO:**
```bash
# Use default credentials or create user via MinIO Console
minioadmin / minioadmin
```

**For Ceph RGW:**
```bash
radosgw-admin user create --uid="trustgraph" --display-name="TrustGraph Service"
# Returns access_key and secret_key
```

**For AWS S3:**
- Create an IAM user with S3 permissions
- Generate an access key in the AWS Console

## Library Choice: MinIO Python Client

**Rationale:**
- Lightweight (~500KB versus ~50MB for boto3)
- S3-compatible - works with any S3 API endpoint
- Simpler API than boto3 for basic operations
- Already in use, so no migration is needed
- Proven with MinIO and other S3 systems

## BlobStore Implementation

**Location:** `trustgraph-flow/trustgraph/librarian/blob_store.py`

```python
from minio import Minio
import io
import logging

logger = logging.getLogger(__name__)

class BlobStore:
    """
    S3-compatible blob storage for document content.
    Supports MinIO, Ceph RGW, AWS S3, and other S3-compatible backends.
    """

    def __init__(self, endpoint, access_key, secret_key, bucket_name,
                 use_ssl=False, region=None):
        """
        Initialize S3-compatible blob storage.

        Args:
            endpoint: S3 endpoint (e.g., "minio:9000", "ceph-rgw:7480")
            access_key: S3 access key
            secret_key: S3 secret key
            bucket_name: Bucket name for storage
            use_ssl: Use HTTPS instead of HTTP (default: False)
            region: S3 region (optional, e.g., "us-east-1")
        """
        self.client = Minio(
            endpoint=endpoint,
            access_key=access_key,
            secret_key=secret_key,
            secure=use_ssl,
            region=region,
        )

        self.bucket_name = bucket_name

        protocol = "https" if use_ssl else "http"
        logger.info(f"Connected to S3-compatible storage at {protocol}://{endpoint}")

        self.ensure_bucket()

    def ensure_bucket(self):
        """Create bucket if it doesn't exist"""
        found = self.client.bucket_exists(bucket_name=self.bucket_name)
        if not found:
            self.client.make_bucket(bucket_name=self.bucket_name)
            logger.info(f"Created bucket {self.bucket_name}")
        else:
            logger.debug(f"Bucket {self.bucket_name} already exists")

    async def add(self, object_id, blob, kind):
        """Store blob in S3-compatible storage"""
        self.client.put_object(
            bucket_name=self.bucket_name,
            object_name=f"doc/{object_id}",
            length=len(blob),
            data=io.BytesIO(blob),
            content_type=kind,
        )
        logger.debug("Add blob complete")

    async def remove(self, object_id):
        """Delete blob from S3-compatible storage"""
        self.client.remove_object(
            bucket_name=self.bucket_name,
            object_name=f"doc/{object_id}",
        )
        logger.debug("Remove blob complete")

    async def get(self, object_id):
        """Retrieve blob from S3-compatible storage"""
        resp = self.client.get_object(
            bucket_name=self.bucket_name,
            object_name=f"doc/{object_id}",
        )
        try:
            return resp.read()
        finally:
            # Return the HTTP connection to the pool once the body is read
            resp.close()
            resp.release_conn()
```

## Key Benefits

1. **No Vendor Lock-in** - Works with any S3-compatible storage.
2. **Lightweight** - The MinIO client is about 500KB.
3. **Simple Configuration** - Just an endpoint and credentials.
4. **No Data Migration** - Drop-in replacement between backends.
5. **Battle-Tested** - The MinIO client works with all major S3 implementations.

## Implementation Status

All code has been updated to use the S3 parameter names.

- ✅ `blob_store.py` - Updated to accept `endpoint`, `access_key`, `secret_key`
- ✅ `librarian.py` - Parameter names updated.
- ✅ `service.py` - CLI arguments and configuration updated.
- ✅ Documentation updated.

## Future Enhancements

1. **SSL/TLS Support** - Add a `--s3-use-ssl` flag for HTTPS.
2. **Retry Logic** - Implement exponential backoff for transient failures.
3. **Presigned URLs** - Generate temporary URLs for upload/download.
4. **Multi-Region Support** - Replicate data across regions.
5. **CDN Integration** - Serve data through a CDN.
6. **Storage Classes** - Use S3 storage classes for cost optimization.
7. **Lifecycle Policies** - Archive/delete data automatically.
8. **Versioning** - Store multiple versions of the data.

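For the retry item, exponential backoff could look roughly like the sketch below. It is illustrative only and not part of the current implementation: the helper name `with_backoff`, the default attempt count, and the base delay are all assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(operation: Callable[[], T], attempts: int = 4,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation(), doubling the wait after each failed attempt."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            # Real code should catch only errors known to be transient.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError("attempts must be at least 1")
```

The `sleep` parameter is injectable so the delay schedule can be checked without actually waiting.
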
## References

- MinIO Python client: https://min.io/docs/minio/linux/developers/python/API.html
- Ceph RGW S3 API: https://docs.ceph.com/en/latest/radosgw/s3/
- S3 API reference: https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html
205
docs/tech-specs/sw/more-config-cli.sw.md
Normal file
@ -0,0 +1,205 @@
|
|||
---
layout: default
title: "More Configuration CLI Technical Specification"
parent: "Swahili (Beta)"
---

# More Configuration CLI Technical Specification

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Overview

This specification describes additional command-line (CLI) configuration capabilities for TrustGraph, which let users manage individual configuration items through dedicated CLI commands. The integration supports four main use cases:

1. **List Configuration Items**: Display the configuration keys of a given type.
2. **Get Configuration Item**: Retrieve specific configuration values.
3. **Put Configuration Item**: Set or update individual configuration items.
4. **Delete Configuration Item**: Remove configuration items.

## Goals

- **Granular Control**: Enable management of individual configuration items rather than bulk operations.
- **Type-Based Listing**: Let users explore configuration items by type.
- **Single-Item Operations**: Provide commands to get/put/delete individual configuration items.
- **API Integration**: Use the existing Config API for all operations.
- **Consistent CLI Pattern**: Follow the established TrustGraph CLI conventions and patterns.
- **Error Handling**: Provide clear error messages for invalid operations.
- **JSON Output**: Support structured output for programmatic use.
- **Documentation**: Include comprehensive help and usage examples.

## Background

Currently, TrustGraph provides configuration management through the Config API and a single CLI command, `tg-show-config`, which displays the entire configuration. While this works for viewing the configuration, it offers no fine-grained management.

Current limitations include:
- No way to list configuration items by type from the CLI.
- No CLI command to retrieve specific configuration values.
- No CLI command to set individual configuration items.
- No CLI command to delete specific configuration items.

This specification addresses these gaps by adding four new CLI commands that provide fine-grained configuration management. By exposing the Config API operations through CLI commands, TrustGraph can:
- Enable scripted configuration management.
- Allow exploration of the configuration structure by type.
- Support targeted configuration updates.
- Provide fine-grained configuration control.

## Technical Design

### Architecture

The additional CLI configuration commands require the following technical components:

1. **tg-list-config-items**
   - Lists the configuration keys for a given type.
   - Calls the `Config.list(type)` API method.
   - Outputs a list of configuration keys.

   Module: `trustgraph.cli.list_config_items`

2. **tg-get-config-item**
   - Retrieves specific configuration item(s).
   - Calls the `Config.get(keys)` API method.
   - Outputs configuration values in JSON format.

   Module: `trustgraph.cli.get_config_item`

3. **tg-put-config-item**
   - Sets or updates a configuration item.
   - Calls the `Config.put(values)` API method.
   - Accepts type, key, and value parameters.

   Module: `trustgraph.cli.put_config_item`

4. **tg-delete-config-item**
   - Removes a configuration item.
   - Calls the `Config.delete(keys)` API method.
   - Accepts type and key parameters.

   Module: `trustgraph.cli.delete_config_item`

### Data Models

#### ConfigKey and ConfigValue

The commands use the existing data structures from `trustgraph.api.types`:

```python
import dataclasses

@dataclasses.dataclass
class ConfigKey:
    type : str   # configuration type, e.g. "prompt"
    key : str    # item key within that type

@dataclasses.dataclass
class ConfigValue:
    type : str   # configuration type
    key : str    # item key within that type
    value : str  # stored value
```

This approach allows:
- Consistent data handling across the CLI and the API.
- Type-safe configuration operations.
- Structured input/output formats.
- Integration with the existing Config API.

### CLI Command Specifications

#### tg-list-config-items
```bash
tg-list-config-items --type <config-type> [--format text|json] [--api-url <url>]
```
- **Purpose**: List all configuration keys for the given type.
- **API Call**: `Config.list(type)`
- **Output**:
  - `text` (default): Configuration keys separated by newlines.
  - `json`: JSON array of configuration keys.

#### tg-get-config-item
```bash
tg-get-config-item --type <type> --key <key> [--format text|json] [--api-url <url>]
```
- **Purpose**: Retrieve a specific configuration item.
- **API Call**: `Config.get(keys)`
- **Output**:
  - `text` (default): The configuration value.
  - `json`: The configuration value in JSON format.

#### tg-put-config-item
```bash
tg-put-config-item --type <type> --key <key> --value <value>
```
- **Purpose**: Set a configuration value.
- **API Call**: `Config.put(values)`
- **Input**:
  - `type`: Configuration type.
  - `key`: Configuration key.
  - `value`: Configuration value.

#### tg-delete-config-item
```bash
tg-delete-config-item --type <type> --key <key>
```
- **Purpose**: Remove a configuration item.
- **API Call**: `Config.delete(keys)`
- **Input**:
  - `type`: Configuration type.
  - `key`: Configuration key.

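The two output modes of `tg-list-config-items` can be illustrated with a small formatter. This is a sketch under assumptions: the function name `format_config_keys` is hypothetical, and the real command may structure its output code differently.

```python
import json
from typing import List


def format_config_keys(keys: List[str], fmt: str = "text") -> str:
    """Render configuration keys as newline-separated text or a JSON array."""
    if fmt == "json":
        return json.dumps(keys)
    if fmt == "text":
        return "\n".join(keys)
    raise ValueError(f"unknown format: {fmt}")
```

Rejecting unknown formats explicitly mirrors the `text|json` choice in the command synopsis.
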
## Usage Examples

#### Listing configuration items
```bash
# List prompt keys (text format)
tg-list-config-items --type prompt
template-1
template-2
system-prompt

# List prompt keys (JSON format)
tg-list-config-items --type prompt --format json
["template-1", "template-2", "system-prompt"]
```

#### Getting a configuration item
```bash
# Get a prompt value (text format)
tg-get-config-item --type prompt --key template-1
You are a helpful assistant. Please respond to: {query}

# Get a prompt value (JSON format)
tg-get-config-item --type prompt --key template-1 --format json
"You are a helpful assistant. Please respond to: {query}"
```

#### Putting a configuration item
```bash
# Set from the command line
tg-put-config-item --type prompt --key new-template --value "Custom prompt: {input}"

# Set from a file via a pipe
cat ./prompt-template.txt | tg-put-config-item --type prompt --key complex-template --stdin

# Set from a file via redirection
tg-put-config-item --type prompt --key complex-template --stdin < ./prompt-template.txt

# Set from command output
echo "Generated template: {query}" | tg-put-config-item --type prompt --key auto-template --stdin
```

#### Deleting a configuration item
```bash
tg-delete-config-item --type prompt --key old-template
```

## Open Questions

- Should the commands support batch operations (multiple keys) in addition to single items?
- What output format should be used for success confirmations?
- How should configuration types be documented for, or discovered by, users?

## References

- Existing Config API: `trustgraph/api/config.py`
- CLI pattern: `trustgraph-cli/trustgraph/cli/show_config.py`
- Data types: `trustgraph/api/types.py`
780
docs/tech-specs/sw/multi-tenant-support.sw.md
Normal file
@ -0,0 +1,780 @@
|
|||
---
layout: default
title: "Technical Specification: Multi-Tenant Support"
parent: "Swahili (Beta)"
---

# Technical Specification: Multi-Tenant Support

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Overview

Enable multi-tenant deployments by fixing the parameter name mismatches that prevent queue customization and by adding Cassandra keyspace parameterization.

## Architecture Context

### Flow-Based Queue Resolution

TrustGraph uses a **flow-based architecture** for queue resolution, which inherently supports multi-tenancy:

- **Flow Definitions** are stored in Cassandra and specify queue names through interface definitions.
- **Queue names use templates** with `{id}` variables that are replaced by the flow instance ID.
- **Services resolve queues dynamically** by looking up the flow configuration at request time.
- **Each tenant can have its own flows** with different queue names, which provides isolation.

Example flow interface definition:
```json
{
  "interfaces": {
    "triples-store": "persistent://tg/flow/triples-store:{id}",
    "graph-embeddings-store": "persistent://tg/flow/graph-embeddings-store:{id}"
  }
}
```

When tenant A starts flow `tenant-a-prod` and tenant B starts flow `tenant-b-prod`, they automatically get isolated queues:
- `persistent://tg/flow/triples-store:tenant-a-prod`
- `persistent://tg/flow/triples-store:tenant-b-prod`

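The template substitution described above amounts to a string replacement. The sketch below illustrates the idea; the function name `resolve_queue` is hypothetical and the real resolution code in TrustGraph may differ.

```python
def resolve_queue(template: str, flow_id: str) -> str:
    """Substitute the flow instance ID into a queue name template."""
    return template.replace("{id}", flow_id)


interfaces = {
    "triples-store": "persistent://tg/flow/triples-store:{id}",
    "graph-embeddings-store": "persistent://tg/flow/graph-embeddings-store:{id}",
}

# Each tenant's flow ID yields a distinct set of queue names.
tenant_a = {name: resolve_queue(t, "tenant-a-prod") for name, t in interfaces.items()}
tenant_b = {name: resolve_queue(t, "tenant-b-prod") for name, t in interfaces.items()}
```

Because the flow ID is part of every queue name, two flows never share a queue as long as their IDs differ.
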
**Services correctly designed for multi-tenancy:**
- ✅ **Knowledge Management (cores)** - Resolves queues dynamically from the flow configuration passed in the request.

**Services needing fixes:**
- 🔴 **Config Service** - A parameter name mismatch prevents queue customization
- 🔴 **Librarian Service** - Hardcoded storage management topics (discussed below)
- 🔴 **All Services** - Cannot customize the Cassandra keyspace

## Problem Statement

### Issue #1: Parameter Name Mismatch in AsyncProcessor
- **CLI defines:** `--config-queue` (unclear naming)
- **Argparse converts to:** `config_queue` (in the params dict)
- **Code looks for:** `config_push_queue`
- **Result:** The parameter is ignored and the default `persistent://tg/config/config` is used
- **Impact:** Affects all 32+ services that inherit from AsyncProcessor
- **Blocks:** Multi-tenant deployments cannot use tenant-specific queues
- **Solution:** Rename the CLI parameter to `--config-push-queue` for clarity (a breaking change is acceptable because the feature is currently broken)

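The mismatch can be reproduced with `argparse` alone, which derives the destination name from the flag by replacing hyphens with underscores. The sketch below is a standalone illustration, not the AsyncProcessor code; the tenant queue name `persistent://tg/config/tenant-a` is a made-up example.

```python
import argparse

DEFAULT_QUEUE = "persistent://tg/config/config"


def parse(flag: str, argv: list) -> dict:
    """Parse argv with a single option and return the params dict."""
    parser = argparse.ArgumentParser()
    parser.add_argument(flag, default=DEFAULT_QUEUE)
    return vars(parser.parse_args(argv))


custom = "persistent://tg/config/tenant-a"  # hypothetical tenant queue

# Old flag: stored under "config_queue", so the lookup below misses it.
broken = parse("--config-queue", ["--config-queue", custom])
broken_queue = broken.get("config_push_queue", DEFAULT_QUEUE)

# Renamed flag: stored under "config_push_queue", so the lookup finds it.
fixed = parse("--config-push-queue", ["--config-push-queue", custom])
fixed_queue = fixed.get("config_push_queue", DEFAULT_QUEUE)
```

With the old flag the user's value is silently dropped; after the rename the same lookup returns it, with no change to the lookup code.
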
### Issue #2: Parameter Name Mismatch in Config Service
- **CLI defines:** `--push-queue` (unclear naming)
- **Argparse converts to:** `push_queue` (in the params dict)
- **Code looks for:** `config_push_queue`
- **Result:** The parameter is ignored
- **Impact:** The config service cannot use a custom push queue
- **Solution:** Rename the CLI parameter to `--config-push-queue` for consistency and clarity (a breaking change is acceptable)

### Issue #3: Hardcoded Cassandra Keyspace
- **Current:** The Cassandra keyspace is hardcoded as `"config"`, `"knowledge"`, `"librarian"` in various services
- **Result:** The keyspace cannot be customized for multi-tenant deployments
- **Impact:** Config, cores, and librarian services
- **Blocks:** Multiple tenants cannot use separate Cassandra keyspaces

### Issue #4: Collection Management Architecture ✅ COMPLETED
- **Previously:** Collections were stored in the librarian's Cassandra keyspace in a separate collections table
- **Previously:** The librarian used 4 hardcoded storage management topics to coordinate collection creation/deletion:
  - `vector_storage_management_topic`
  - `object_storage_management_topic`
  - `triples_storage_management_topic`
  - `storage_management_response_topic`
- **Problems (Resolved):**
  - Hardcoded topics could not be customized for multi-tenant deployments
  - Complex async coordination between the librarian and 4+ storage services
  - Separate Cassandra table and management infrastructure
  - Non-persistent request/response queues for critical operations
- **Solution Implemented:** Migrated collections to config service storage, using config push for distribution
- **Status:** All storage backends have been migrated to the `CollectionConfigHandler` pattern

## Solution

This spec addresses Issues #1, #2, #3, and #4.

### Part 1: Fix Parameter Name Mismatches

#### Change 1: AsyncProcessor Base Class - Rename CLI Parameter
**File:** `trustgraph-base/trustgraph/base/async_processor.py`

**Lines:** 260-264

**Current:**
```python
parser.add_argument(
    '--config-queue',
    default=default_config_queue,
    help=f'Config push queue {default_config_queue}',
)
```

**Fixed:**
```python
parser.add_argument(
    '--config-push-queue',
    default=default_config_queue,
    help=f'Config push queue (default: {default_config_queue})',
)
```

**Rationale:**
- Clearer, more explicit naming
- Matches the internal name `config_push_queue`
- A breaking change is acceptable because the feature does not currently work
- No code change is needed in params.get() - it already looks for the correct name

#### Change 2: Config Service - Rename CLI Parameter
**File:** `trustgraph-flow/trustgraph/config/service/service.py`

**Lines:** 276-279

**Current:**
```python
parser.add_argument(
    '--push-queue',
    default=default_config_push_queue,
    help=f'Config push queue (default: {default_config_push_queue})'
)
```

**Fixed:**
```python
parser.add_argument(
    '--config-push-queue',
    default=default_config_push_queue,
    help=f'Config push queue (default: {default_config_push_queue})'
)
```

**Rationale:**
- Clearer naming - "config-push-queue" is more descriptive than just "push-queue".
- Matches the internal name `config_push_queue`.
- Consistent with AsyncProcessor's `--config-push-queue` parameter.
- A breaking change is acceptable because the feature does not currently work.
- No code change is needed in params.get() - it already looks for the correct name.

### Part 2: Add Cassandra Keyspace Parameterization

#### Change 3: Add Keyspace Parameter to the cassandra_config Module

**File:** `trustgraph-base/trustgraph/base/cassandra_config.py`

**Add CLI argument** (in the `add_cassandra_args()` function):
```python
parser.add_argument(
    '--cassandra-keyspace',
    default=None,
    help='Cassandra keyspace (default: service-specific)'
)
```

**Add environment variable support** (in the `resolve_cassandra_config()` function):
```python
keyspace = params.get(
    "cassandra_keyspace",
    os.environ.get("CASSANDRA_KEYSPACE")
)
```

**Update the return value** of `resolve_cassandra_config()`:
- Currently returns: `(hosts, username, password)`
- Change to return: `(hosts, username, password, keyspace)`

**Rationale:**
- Consistent with the existing Cassandra configuration pattern
- Available to all services via `add_cassandra_args()`
- Supports both CLI and environment variable configuration

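
Changes 4 to 6 call `resolve_cassandra_config(params, default_keyspace=...)`, which implies a lookup order of CLI value, then environment variable, then service default. The sketch below covers only that lookup; `resolve_keyspace` and its `environ` argument are hypothetical names for illustration, not part of the module. It also assumes that `params` may contain an explicit `None` when the flag is omitted (as `argparse` produces with `default=None`), in which case a plain `dict.get()` fallback would not reach the environment variable.

```python
import os

def resolve_keyspace(params, default_keyspace=None, environ=None):
    """Return the keyspace from CLI params, then environment, then default."""
    environ = os.environ if environ is None else environ

    # Test the value itself: an explicit None in params must not mask
    # the environment variable or the service default.
    value = params.get("cassandra_keyspace")
    if value is None:
        value = environ.get("CASSANDRA_KEYSPACE")
    if value is None:
        value = default_keyspace
    return value

# Hypothetical inputs showing the three levels of precedence
from_default = resolve_keyspace({"cassandra_keyspace": None}, "config", {})
from_env = resolve_keyspace(
    {"cassandra_keyspace": None}, "config",
    {"CASSANDRA_KEYSPACE": "tg_dev_config"},
)
from_cli = resolve_keyspace(
    {"cassandra_keyspace": "cli_ks"}, "config",
    {"CASSANDRA_KEYSPACE": "tg_dev_config"},
)
```

Passing `environ` explicitly keeps the function easy to test without touching the real process environment.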
#### Change 4: Config Service - Use Parameterized Keyspace

**File:** `trustgraph-flow/trustgraph/config/service/service.py`

**Line 30** - Remove the hardcoded keyspace:
```python
# DELETE THIS LINE:
keyspace = "config"
```

**Lines 69-73** - Update Cassandra config resolution:

**Current:**
```python
cassandra_host, cassandra_username, cassandra_password = \
    resolve_cassandra_config(params)
```

**Fixed:**
```python
cassandra_host, cassandra_username, cassandra_password, keyspace = \
    resolve_cassandra_config(params, default_keyspace="config")
```

**Rationale:**
- Maintains backward compatibility with "config" as the default.
- Allows override via `--cassandra-keyspace` or `CASSANDRA_KEYSPACE`.

#### Change 5: Cores/Knowledge Service - Use Parameterized Keyspace

**File:** `trustgraph-flow/trustgraph/cores/service.py`

**Line 37** - Remove the hardcoded keyspace:
```python
# DELETE THIS LINE:
keyspace = "knowledge"
```

**Update Cassandra config resolution** (in the same place as in the config service):
```python
cassandra_host, cassandra_username, cassandra_password, keyspace = \
    resolve_cassandra_config(params, default_keyspace="knowledge")
```

#### Change 6: Librarian Service - Use Parameterized Keyspace

**File:** `trustgraph-flow/trustgraph/librarian/service.py`

**Line 51** - Remove the hardcoded keyspace:
```python
# DELETE THIS LINE:
keyspace = "librarian"
```

**Update Cassandra config resolution** (in the same place as in the config service):
```python
cassandra_host, cassandra_username, cassandra_password, keyspace = \
    resolve_cassandra_config(params, default_keyspace="librarian")
```

### Part 3: Migrate Collection Management to the Config Service

#### Overview

Migrate collections from the Cassandra librarian keyspace to config service storage. This eliminates the hardcoded storage management topics and simplifies the architecture by reusing the existing config push mechanism for distribution.

#### Current Architecture

```
API Request → Gateway → Librarian Service
                            ↓
                    CollectionManager
                            ↓
            Cassandra Collections Table (librarian keyspace)
                            ↓
        Broadcast to 4 Storage Management Topics (hardcoded)
                            ↓
            Wait for 4+ Storage Service Responses
                            ↓
                    Response to Gateway
```

#### New Architecture

```
API Request → Gateway → Librarian Service
                            ↓
                    CollectionManager
                            ↓
            Config Service API (put/delete/getvalues)
                            ↓
    Cassandra Config Table (class='collections', key='user:collection')
                            ↓
        Config Push (to all subscribers on config-push-queue)
                            ↓
    All Storage Services receive config update independently
```

#### Change 7: Collection Manager - Use Config Service API

**File:** `trustgraph-flow/trustgraph/librarian/collection_manager.py`

**Remove:**
- `LibraryTableStore` usage (lines 33, 40-41)
- Storage management producers initialization (lines 86-140)
- `on_storage_response` method (lines 400-430)
- `pending_deletions` tracking (lines 57, 90-96, and usage elsewhere)

**Add:**
- Config service client for API calls (request/response pattern)

**Config Client Setup:**
```python
# In __init__, add config request/response producers/consumers
from trustgraph.schema.services.config import ConfigRequest, ConfigResponse

# Producer for config requests
self.config_request_producer = Producer(
    client=pulsar_client,
    topic=config_request_queue,
    schema=ConfigRequest,
)

# Consumer for config responses (with correlation ID)
self.config_response_consumer = Consumer(
    taskgroup=taskgroup,
    client=pulsar_client,
    flow=None,
    topic=config_response_queue,
    subscriber=f"{id}-config",
    schema=ConfigResponse,
    handler=self.on_config_response,
)

# Tracking for pending config requests
self.pending_config_requests = {}  # request_id -> asyncio.Event
```

**Modify `list_collections` (lines 145-180):**
```python
import asyncio
import json
import uuid

async def list_collections(self, user, tag_filter=None, limit=None):
    """List collections from config service"""
    # Send getvalues request to config service
    request = ConfigRequest(
        id=str(uuid.uuid4()),
        operation='getvalues',
        type='collections',
    )

    # Send request and wait for response
    response = await self.send_config_request(request)

    # Parse collections from response
    collections = []
    for key, value_json in response.values.items():
        if ":" in key:
            coll_user, collection = key.split(":", 1)
            if coll_user == user:
                metadata = json.loads(value_json)
                collections.append(CollectionMetadata(**metadata))

    # Apply tag filtering in-memory (as before)
    if tag_filter:
        collections = [c for c in collections if any(tag in c.tags for tag in tag_filter)]

    # Apply limit
    if limit:
        collections = collections[:limit]

    return collections

async def send_config_request(self, request):
    """Send config request and wait for response"""
    event = asyncio.Event()
    self.pending_config_requests[request.id] = event

    await self.config_request_producer.send(request)
    await event.wait()

    # Drop the event entry too, so the tracking dict does not grow per request
    self.pending_config_requests.pop(request.id, None)
    return self.pending_config_requests.pop(request.id + "_response")

async def on_config_response(self, message, consumer, flow):
    """Handle config response"""
    response = message.value()
    if response.id in self.pending_config_requests:
        self.pending_config_requests[response.id + "_response"] = response
        self.pending_config_requests[response.id].set()
```

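
The request/response correlation used by `send_config_request` can also be expressed with one future per request, which allows a timeout and guarantees cleanup. This is only a sketch of that alternative, not the method the spec prescribes: `PendingRequests`, `fake_send` and the 5-second timeout are hypothetical, and the Pulsar producer and consumer are replaced by an in-process stand-in.

```python
import asyncio
import uuid

class PendingRequests:
    """Correlate responses to requests by id (hypothetical helper)."""

    def __init__(self):
        self._futures = {}  # request_id -> asyncio.Future

    async def call(self, send, request_id, timeout=5.0):
        """Send a request and wait for the matching response."""
        future = asyncio.get_running_loop().create_future()
        self._futures[request_id] = future
        try:
            await send(request_id)
            return await asyncio.wait_for(future, timeout)
        finally:
            # Always clean up, including on timeout or send failure.
            self._futures.pop(request_id, None)

    def resolve(self, request_id, response):
        """Deliver a response; unknown or late ids are ignored."""
        future = self._futures.get(request_id)
        if future is not None and not future.done():
            future.set_result(response)

async def demo():
    pending = PendingRequests()

    async def fake_send(request_id):
        # Stand-in for the config service: reply on a later loop iteration.
        asyncio.get_running_loop().call_soon(
            pending.resolve, request_id, {"id": request_id, "values": {}}
        )

    request_id = str(uuid.uuid4())
    response = await pending.call(fake_send, request_id)
    return request_id, response, len(pending._futures)
```

A timeout avoids a caller waiting forever if the config service never answers; what the caller should do on timeout is left open here.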
**Modify `update_collection` (lines 182-312):**
```python
async def update_collection(self, user, collection, name, description, tags):
    """Update collection via config service"""
    # Create metadata
    metadata = CollectionMetadata(
        user=user,
        collection=collection,
        name=name,
        description=description,
        tags=tags,
    )

    # Send put request to config service
    request = ConfigRequest(
        id=str(uuid.uuid4()),
        operation='put',
        type='collections',
        key=f'{user}:{collection}',
        value=json.dumps(metadata.to_dict()),
    )

    response = await self.send_config_request(request)

    if response.error:
        raise RuntimeError(f"Config update failed: {response.error.message}")

    # Config service will trigger config push automatically
    # Storage services will receive update and create collections
```

**Modify `delete_collection` (lines 314-398):**
```python
async def delete_collection(self, user, collection):
    """Delete collection via config service"""
    # Send delete request to config service
    request = ConfigRequest(
        id=str(uuid.uuid4()),
        operation='delete',
        type='collections',
        key=f'{user}:{collection}',
    )

    response = await self.send_config_request(request)

    if response.error:
        raise RuntimeError(f"Config delete failed: {response.error.message}")

    # Config service will trigger config push automatically
    # Storage services will receive update and delete collections
```

**Collection Metadata Format:**
- Stored in the config table as: `class='collections', key='user:collection'`
- The value is the JSON-serialized CollectionMetadata (without timestamp fields)
- Fields: `user`, `collection`, `name`, `description`, `tags`
- Example: `class='collections', key='alice:my-docs', value='{"user":"alice","collection":"my-docs","name":"My Documents","description":"...","tags":["work"]}'`

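
The storage format above can be illustrated with a small round trip. This is a sketch using plain dicts rather than the real `CollectionMetadata` record; `encode_entry` and `decode_entry` are hypothetical helper names.

```python
import json

FIELDS = ("user", "collection", "name", "description", "tags")

def encode_entry(metadata):
    """Build the (key, value) pair stored under class='collections'."""
    key = f"{metadata['user']}:{metadata['collection']}"
    # Only the five documented fields are serialized; timestamps are dropped.
    value = json.dumps({field: metadata[field] for field in FIELDS})
    return key, value

def decode_entry(key, value):
    """Parse a stored pair back into a metadata dict."""
    # Split on the first colon only, matching key.split(":", 1) in the spec.
    user, collection = key.split(":", 1)
    metadata = json.loads(value)
    if (metadata["user"], metadata["collection"]) != (user, collection):
        raise ValueError(f"key {key!r} does not match stored metadata")
    return metadata

example = {
    "user": "alice",
    "collection": "my-docs",
    "name": "My Documents",
    "description": "...",
    "tags": ["work"],
}
key, value = encode_entry(example)
```

Because the key is split on the first colon, a user identifier that itself contains `:` would be parsed incorrectly; this sketch does not address that case.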
#### Change 8: Librarian Service - Remove Storage Management Infrastructure

**File:** `trustgraph-flow/trustgraph/librarian/service.py`

**Remove:**
- Storage management producers (lines 173-190):
  - `vector_storage_management_producer`
  - `object_storage_management_producer`
  - `triples_storage_management_producer`
- Storage response consumer (lines 192-201)
- `on_storage_response` handler (lines 467-473)

**Modify:**
- CollectionManager initialization (lines 215-224) - remove the storage producer parameters

**Note:** The external collection API remains unchanged:
- `list-collections`
- `update-collection`
- `delete-collection`

#### Change 9: Remove the Collections Table from LibraryTableStore

**File:** `trustgraph-flow/trustgraph/tables/library.py`

**Delete:**
- Collections table CREATE statement (lines 114-127)
- Collections prepared statements (lines 205-240)
- All collection methods (lines 578-717):
  - `ensure_collection_exists`
  - `list_collections`
  - `update_collection`
  - `delete_collection`
  - `get_collection`
  - `create_collection`

**Rationale:**
- Collections are now stored in the config table.
- Breaking change is acceptable - no data migration is needed.
- Simplifies the librarian service significantly.

#### Change 10: Storage Services - Config-Based Collection Management ✅ COMPLETED

**Status:** All 11 storage backends have been migrated to use `CollectionConfigHandler`.

**Affected Services (11 total):**
- Document embeddings: milvus, pinecone, qdrant
- Graph embeddings: milvus, pinecone, qdrant
- Object storage: cassandra
- Triples storage: cassandra, falkordb, memgraph, neo4j

**Files:**
- `trustgraph-flow/trustgraph/storage/doc_embeddings/milvus/write.py`
- `trustgraph-flow/trustgraph/storage/doc_embeddings/pinecone/write.py`
- `trustgraph-flow/trustgraph/storage/doc_embeddings/qdrant/write.py`
- `trustgraph-flow/trustgraph/storage/graph_embeddings/milvus/write.py`
- `trustgraph-flow/trustgraph/storage/graph_embeddings/pinecone/write.py`
- `trustgraph-flow/trustgraph/storage/graph_embeddings/qdrant/write.py`
- `trustgraph-flow/trustgraph/storage/objects/cassandra/write.py`
- `trustgraph-flow/trustgraph/storage/triples/cassandra/write.py`
- `trustgraph-flow/trustgraph/storage/triples/falkordb/write.py`
- `trustgraph-flow/trustgraph/storage/triples/memgraph/write.py`
- `trustgraph-flow/trustgraph/storage/triples/neo4j/write.py`

**Implementation Pattern (all services):**

1. **Register the config handler in `__init__`:**
```python
# Add after AsyncProcessor initialization
self.register_config_handler(self.on_collection_config)
self.known_collections = set()  # Track (user, collection) tuples
```

2. **Implement the config handler:**
```python
async def on_collection_config(self, config, version):
    """Handle collection configuration updates"""
    logger.info(f"Collection config version: {version}")

    if "collections" not in config:
        return

    # Parse collections from config
    # Key format: "user:collection" in config["collections"]
    config_collections = set()
    for key in config["collections"].keys():
        if ":" in key:
            user, collection = key.split(":", 1)
            config_collections.add((user, collection))

    # Determine changes
    to_create = config_collections - self.known_collections
    to_delete = self.known_collections - config_collections

    # Create new collections (idempotent)
    for user, collection in to_create:
        try:
            await self.create_collection_internal(user, collection)
            self.known_collections.add((user, collection))
            logger.info(f"Created collection: {user}/{collection}")
        except Exception as e:
            logger.error(f"Failed to create {user}/{collection}: {e}")

    # Delete removed collections (idempotent)
    for user, collection in to_delete:
        try:
            await self.delete_collection_internal(user, collection)
            self.known_collections.discard((user, collection))
            logger.info(f"Deleted collection: {user}/{collection}")
        except Exception as e:
            logger.error(f"Failed to delete {user}/{collection}: {e}")
```

3. **Initialize known collections on startup:**
```python
async def start(self):
    """Start the processor"""
    await super().start()
    await self.sync_known_collections()

async def sync_known_collections(self):
    """Query backend to populate known_collections set"""
    # Backend-specific implementation:
    # - Milvus/Pinecone/Qdrant: List collections/indexes matching naming pattern
    # - Cassandra: Query keyspaces or collection metadata
    # - Neo4j/Memgraph/FalkorDB: Query CollectionMetadata nodes
    pass
```

4. **Refactor the existing handler methods:**
```python
# Rename and remove response sending:
# handle_create_collection → create_collection_internal
# handle_delete_collection → delete_collection_internal

async def create_collection_internal(self, user, collection):
    """Create collection (idempotent)"""
    # Same logic as current handle_create_collection
    # But remove response producer calls
    # Handle "already exists" gracefully
    pass

async def delete_collection_internal(self, user, collection):
    """Delete collection (idempotent)"""
    # Same logic as current handle_delete_collection
    # But remove response producer calls
    # Handle "not found" gracefully
    pass
```

5. **Remove the storage management infrastructure:**
   - Remove the `self.storage_request_consumer` setup and start
   - Remove the `self.storage_response_producer` setup
   - Remove the `on_storage_management` dispatcher method
   - Remove the storage management metrics
   - Remove the imports: `StorageManagementRequest`, `StorageManagementResponse`

**Backend-Specific Considerations:**

**Vector stores (Milvus, Pinecone, Qdrant):** Track the logical `(user, collection)` in `known_collections`, but note that several backend collections may be created per logical collection, one per dimension. Continue the lazy creation pattern. Delete operations must remove all dimension variants.

**Cassandra Objects:** Collections are row properties, not structures. Track keyspace-level information.

**Graph stores (Neo4j, Memgraph, FalkorDB):** Query `CollectionMetadata` nodes on startup. Create or delete metadata nodes on sync.

**Cassandra Triples:** Use the `KnowledgeGraph` API for collection operations.

**Key Design Points:**

- **Eventual consistency:** There is no request/response mechanism; the config push is broadcast
- **Idempotency:** All create/delete operations must be safe to retry
- **Error handling:** Log errors but do not block config updates
- **Self-healing:** Failed operations are retried on the next config push
- **Collection key format:** `"user:collection"` in `config["collections"]`

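
The self-healing and idempotency points above can be demonstrated end to end with an in-memory backend. This is a sketch under stated assumptions: `CollectionReconciler` and `FlakyBackend` are hypothetical stand-ins, not TrustGraph classes. Unlike the handler shown earlier, this sketch treats a missing `collections` class as an empty set, so deletions still run when the last collection is removed; whether that behavior is desired is a design decision this spec does not settle.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

class CollectionReconciler:
    """Reconcile backend collections against pushed config (sketch)."""

    def __init__(self, backend):
        self.backend = backend          # hypothetical object with create/delete
        self.known_collections = set()  # (user, collection) tuples

    async def on_config(self, config):
        wanted = set()
        for key in config.get("collections", {}):
            if ":" in key:
                wanted.add(tuple(key.split(":", 1)))

        for user, collection in wanted - self.known_collections:
            try:
                await self.backend.create(user, collection)
                self.known_collections.add((user, collection))
            except Exception as e:
                # Not recorded as known, so the next push retries it.
                logger.error(f"Failed to create {user}/{collection}: {e}")

        for user, collection in self.known_collections - wanted:
            try:
                await self.backend.delete(user, collection)
                self.known_collections.discard((user, collection))
            except Exception as e:
                # Still recorded as known, so the next push retries it.
                logger.error(f"Failed to delete {user}/{collection}: {e}")

class FlakyBackend:
    """In-memory stand-in whose first create call fails."""

    def __init__(self):
        self.collections = set()
        self.failures_left = 1

    async def create(self, user, collection):
        if self.failures_left:
            self.failures_left -= 1
            raise RuntimeError("backend unavailable")
        self.collections.add((user, collection))

    async def delete(self, user, collection):
        self.collections.discard((user, collection))

async def demo():
    backend = FlakyBackend()
    reconciler = CollectionReconciler(backend)
    config = {"collections": {"alice:my-docs": "{}"}}
    await reconciler.on_config(config)   # create fails, nothing recorded
    after_first = set(backend.collections)
    await reconciler.on_config(config)   # same config pushed again: retried
    after_second = set(backend.collections)
    await reconciler.on_config({})       # collection removed from config
    return after_first, after_second, set(backend.collections)
```

The retry only happens when another config push arrives, so a failed operation can stay pending until some unrelated config change occurs.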
#### Change 11: Update Collection Schema - Remove Timestamps

**File:** `trustgraph-base/trustgraph/schema/services/collection.py`

**Modify CollectionMetadata (lines 13-21):**
Remove the `created_at` and `updated_at` fields:
```python
class CollectionMetadata(Record):
    user = String()
    collection = String()
    name = String()
    description = String()
    tags = Array(String())
    # Remove: created_at = String()
    # Remove: updated_at = String()
```

**Modify `CollectionManagementRequest` (lines 25-47):**
Remove the timestamp fields:
```python
class CollectionManagementRequest(Record):
    operation = String()
    user = String()
    collection = String()
    timestamp = String()
    name = String()
    description = String()
    tags = Array(String())
    # Remove: created_at = String()
    # Remove: updated_at = String()
    tag_filter = Array(String())
    limit = Integer()
```

**Rationale:**
- Timestamps add no value for collections.
- The config service maintains its own version tracking.
- Simplifies the schema and reduces storage.

#### Benefits of the Config Service Migration

1. ✅ **Eliminates hardcoded storage management topics** - Resolves the multi-tenant blocker.
2. ✅ **Simpler coordination** - No complex asynchronous waiting for responses from 4+ storage services.
3. ✅ **Eventual consistency** - Storage services update independently via config push.
4. ✅ **Better reliability** - Persistent config push instead of non-persistent request/response.
5. ✅ **Unified configuration model** - Collections are treated as configuration.
6. ✅ **Reduces complexity** - Removes ~300 lines of coordination code.
7. ✅ **Multi-tenant ready** - Config already supports tenant isolation via keyspace.
8. ✅ **Version tracking** - The config service versioning mechanism provides an audit trail.

## Implementation Notes

### Backward Compatibility

**Parameter Changes:**
- The CLI parameter renames are breaking changes, but acceptable (the feature is currently non-functional).
- Services work without parameters (they use defaults).
- Default keyspaces are preserved: "config", "knowledge", "librarian".
- Default queue: `persistent://tg/config/config`

**Collection Management:**
- **Breaking change:** The collections table is removed from the librarian keyspace.
- **No data migration is provided** - acceptable at this stage.
- The external collection API is unchanged (list/update/delete operations).
- The collection metadata format is simplified (timestamps removed).

### Testing Requirements

**Parameter Testing:**
1. Verify that the `--config-push-queue` parameter works on the graph-embeddings service.
2. Verify that the `--config-push-queue` parameter works on the text-completion service.
3. Verify that the `--config-push-queue` parameter works on the config service.
4. Verify that the `--cassandra-keyspace` parameter works for the config service.
5. Verify that the `--cassandra-keyspace` parameter works for the cores service.
6. Verify that the `--cassandra-keyspace` parameter works for the librarian service.
7. Verify that services work without parameters (using defaults).
8. Verify a multi-tenant deployment with custom queue names and keyspaces.

**Collection Management Testing:**
9. Verify the `list-collections` operation via the config service.
10. Verify that `update-collection` creates/updates entries in the config table.
11. Verify that `delete-collection` removes entries from the config table.
12. Verify that a config push is triggered on collection updates.
13. Verify that tag filtering works with config-based storage.
14. Verify that collection operations work without timestamp fields.

### Multi-Tenant Deployment Example
```bash
# Tenant: tg-dev
graph-embeddings \
  -p pulsar+ssl://broker:6651 \
  --pulsar-api-key <KEY> \
  --config-push-queue persistent://tg-dev/config/config

config-service \
  -p pulsar+ssl://broker:6651 \
  --pulsar-api-key <KEY> \
  --config-push-queue persistent://tg-dev/config/config \
  --cassandra-keyspace tg_dev_config
```

## Impact Analysis

### Services Affected by Changes 1-2 (CLI Parameter Rename)
All services inheriting from AsyncProcessor or FlowProcessor:
- config-service
- cores-service
- librarian-service
- graph-embeddings
- document-embeddings
- text-completion-* (all providers)
- extract-* (all extractors)
- query-* (all query services)
- retrieval-* (all RAG services)
- storage-* (all storage services)
- And 20+ more services

### Services Affected by Changes 3-6 (Cassandra Keyspace)
- config-service
- cores-service
- librarian-service

### Services Affected by Changes 7-11 (Collection Management)

**Immediate Changes:**
- librarian-service (collection_manager.py, service.py)
- tables/library.py (remove the collections table)
- schema/services/collection.py (remove timestamps)

**Completed Changes (Change 10):** ✅
- All storage services (11 total) - migrated to config push for collections via `CollectionConfigHandler`
- Storage management schema removed from `storage.py`

## Future Considerations

### Per-User Keyspace Model

Some services use **per-user keyspaces** dynamically, where each user gets their own Cassandra keyspace:

**Services with per-user keyspaces:**
1. **Triples Query Service** (`trustgraph-flow/trustgraph/query/triples/cassandra/service.py:65`)
   - Uses `keyspace=query.user`
2. **Objects Query Service** (`trustgraph-flow/trustgraph/query/objects/cassandra/service.py:479`)
   - Uses `keyspace=self.sanitize_name(user)`
3. **KnowledgeGraph Direct Access** (`trustgraph-flow/trustgraph/direct/cassandra_kg.py:18`)
   - Default parameter `keyspace="trustgraph"`

**Status:** These are **not modified** in this specification.

**Future Review Required:**
- Evaluate whether the per-user keyspace model creates tenant isolation issues
- Consider whether multi-tenant deployments need a keyspace prefix pattern (e.g., `tenant_a_user1`)
- Review the potential for user ID collisions across tenants
- Assess whether a single shared keyspace per tenant with user-based row isolation is preferable

**Note:** This does not block the current multi-tenant implementation, but it should be reviewed before production multi-tenant deployments.

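
To make the prefix pattern mentioned above concrete, here is a sketch of how a `tenant_a_user1` style name could be derived. The scheme and the `tenant_keyspace` function are hypothetical and not part of this specification. The 48-character limit and the letter-first rule reflect Cassandra's keyspace naming constraints as assumed here and should be checked against the Cassandra version in use.

```python
import re

MAX_KEYSPACE_LENGTH = 48  # Cassandra keyspace name limit, as assumed here

def tenant_keyspace(tenant, user):
    """Build a keyspace name such as 'tenant_a_user1' (hypothetical scheme)."""
    def clean(part):
        # Keep lowercase letters, digits and underscores only.
        return re.sub(r"[^a-z0-9_]", "_", part.lower())

    name = f"{clean(tenant)}_{clean(user)}"
    if not name[0].isalpha():
        # Prefix so the name starts with a letter.
        name = "k_" + name
    if len(name) > MAX_KEYSPACE_LENGTH:
        raise ValueError(f"keyspace name too long: {name!r}")
    return name

example = tenant_keyspace("tenant-a", "user1")
```

Sanitizing can map distinct identifiers to the same name: tenant `a_b` with user `c` and tenant `a` with user `b_c` both yield `a_b_c`. That is one form of the collision risk listed above, and a real scheme would need to rule it out.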
## Implementation Phases

### Phase 1: Parameter Fixes (Changes 1-6)
- Fix the `--config-push-queue` parameter naming
- Add `--cassandra-keyspace` parameter support
- **Outcome:** Multi-tenant queue and keyspace configuration is enabled

### Phase 2: Collection Management Migration (Changes 7-9, 11)
- Migrate collection storage to the config service
- Remove the collections table from the librarian
- Update the collection schema (remove timestamps)
- **Outcome:** Eliminates the hardcoded storage management topics and simplifies the librarian

### Phase 3: Storage Service Updates (Change 10) ✅ COMPLETED
- Updated all storage services to use config push for collections via `CollectionConfigHandler`
- Removed the storage management request/response infrastructure
- Removed the legacy schema definitions
- **Outcome:** Fully config-based collection management achieved

## References
- GitHub Issue: https://github.com/trustgraph-ai/trustgraph/issues/582
- Related Files:
  - `trustgraph-base/trustgraph/base/async_processor.py`
  - `trustgraph-base/trustgraph/base/cassandra_config.py`
  - `trustgraph-base/trustgraph/schema/core/topic.py`
  - `trustgraph-base/trustgraph/schema/services/collection.py`
  - `trustgraph-flow/trustgraph/config/service/service.py`
  - `trustgraph-flow/trustgraph/cores/service.py`
  - `trustgraph-flow/trustgraph/librarian/service.py`
  - `trustgraph-flow/trustgraph/librarian/collection_manager.py`
  - `trustgraph-flow/trustgraph/tables/library.py`

192
docs/tech-specs/sw/neo4j-user-collection-isolation.sw.md
Normal file
@ -0,0 +1,192 @@
---
layout: default
title: "Neo4j User/Collection Isolation Support"
parent: "Swahili (Beta)"
---

# Neo4j User/Collection Isolation Support

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Problem Statement

The Neo4j triples storage and query implementation currently provides no user/collection isolation, which creates a multi-tenancy security problem. All triples are stored in a single graph space, with no mechanism to prevent users from accessing other users' data or from mixing collections.

This differs from the other storage backends in TrustGraph:
- **Cassandra**: Uses a separate keyspace per user and a table per collection.
- **Vector stores** (Milvus, Qdrant, Pinecone): Use collection-specific namespaces.
- **Neo4j**: Currently shares all data in a single graph (a security vulnerability).

## Current Architecture

### Data Model
- **Nodes**: `:Node` label with a `uri` property, `:Literal` label with a `value` property.
- **Relationships**: `:Rel` label with a `uri` property.
- **Indexes**: `Node.uri`, `Literal.value`, `Rel.uri`.

### Message Flow
- `Triples` messages carry `metadata.user` and `metadata.collection` fields.
- The storage service receives the user/collection information but ignores it.
- The query service expects `user` and `collection` in `TriplesQueryRequest` but ignores them.

### Current Security Issue
```cypher
// Any user can query any data - no isolation
MATCH (src:Node)-[rel:Rel]->(dest:Node)
RETURN src.uri, rel.uri, dest.uri
```

## Proposed Solution: Property-Based Filtering (Recommended)

### Overview
Add `user` and `collection` properties to all nodes and relationships, then filter every operation by these properties. This approach provides strong isolation while preserving query flexibility and compatibility with existing systems.

### Data Model Changes

#### Enhanced Node Structure
```cypher
// Node entities
CREATE (n:Node {
  uri: "http://example.com/entity1",
  user: "john_doe",
  collection: "production_v1"
})

// Literal entities
CREATE (n:Literal {
  value: "literal value",
  user: "john_doe",
  collection: "production_v1"
})
```

#### Enhanced Relationship Structure
```cypher
// Relationships with user/collection properties
CREATE (src)-[:Rel {
  uri: "http://example.com/predicate1",
  user: "john_doe",
  collection: "production_v1"
}]->(dest)
```

#### Updated Indexes
```cypher
// Compound indexes for efficient filtering
CREATE INDEX node_user_collection_uri FOR (n:Node) ON (n.user, n.collection, n.uri);
CREATE INDEX literal_user_collection_value FOR (n:Literal) ON (n.user, n.collection, n.value);
```

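
A query that respects this data model has to filter the source node, the relationship and the destination node, and should pass the user and collection as Cypher parameters rather than interpolating them into the query text. The sketch below builds such a query as a `(cypher, parameters)` pair; `build_triples_query` is a hypothetical helper, not the query service's actual code, and it covers only the node-to-node case.

```python
def build_triples_query(user, collection, s=None, p=None, limit=100):
    """Return (cypher, parameters) for a node-to-node triple lookup."""
    # Every pattern element carries the isolation properties.
    where = [
        "src.user = $user", "src.collection = $collection",
        "rel.user = $user", "rel.collection = $collection",
        "dest.user = $user", "dest.collection = $collection",
    ]
    params = {"user": user, "collection": collection, "limit": int(limit)}

    # Optional subject and predicate filters, also passed as parameters.
    if s is not None:
        where.append("src.uri = $s")
        params["s"] = s
    if p is not None:
        where.append("rel.uri = $p")
        params["p"] = p

    cypher = (
        "MATCH (src:Node)-[rel:Rel]->(dest:Node) "
        "WHERE " + " AND ".join(where) + " "
        "RETURN src.uri AS s, rel.uri AS p, dest.uri AS o "
        "LIMIT $limit"
    )
    return cypher, params

cypher, params = build_triples_query(
    "john_doe", "production_v1", s="http://example.com/entity1' OR 1=1"
)
```

Because user-supplied values never become part of the query text, an injection attempt like the one in the example is treated as an ordinary string value. The pair would typically be handed to the database driver's query call together.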
## Implementation Plan

### Phase 1: Foundation (Week 1)
1. [ ] Update the storage service to accept and store the user/collection properties.
2. [ ] Add compound indexes for efficient querying.
3. [ ] Implement backward compatibility.
4. [ ] Create unit tests for the new functionality.

### Phase 2: Query Updates (Week 2)
1. [ ] Update all query patterns to include user/collection filters.
2. [ ] Add query validation and security checks.
3. [ ] Update the integration tests.
4. [ ] Performance testing with filtered queries.

### Phase 3: Migration and Deployment (Week 3)
1. [ ] Create data migration scripts for existing Neo4j instances.
2. [ ] Documentation and deployment instructions.
3. [ ] Monitoring and alerting for isolation violations.
4. [ ] End-to-end testing with multiple user/collection scenarios.

### Phase 4: Hardening (Week 4)
1. [ ] Remove the backward compatibility mode.
2. [ ] Add comprehensive monitoring.
3. [ ] Security review and penetration testing.
4. [ ] Performance optimization.

## Testing Strategy

### Unit Tests
```python
def test_user_collection_isolation():
    # Store triples for user1/collection1
    processor.store_triples(triples_user1_coll1)

    # Store triples for user2/collection2
    processor.store_triples(triples_user2_coll2)

    # Querying as user1 should return only user1's data
    results = processor.query_triples(query_user1_coll1)
    assert all_results_belong_to_user1_coll1(results)

    # Querying as user2 should return only user2's data
    results = processor.query_triples(query_user2_coll2)
    assert all_results_belong_to_user2_coll2(results)
```

### Integration Tests
- Multi-user scenarios with overlapping data.
- Cross-collection queries (should fail).
- Migration testing with existing data.
- Performance benchmarks with large datasets.

### Security Tests
- Attempt to query other users' data.
- Injection attacks (SQL-injection style, which for Neo4j means Cypher injection) on the user/collection parameters.
- Verify complete isolation under various query patterns.

## Performance Considerations

### Index Strategy
- Compound indexes on `(user, collection, uri)` for optimal filtering.
- Consider partial indexes if some collections are much larger than others.
- Monitor index usage and query performance.

### Query Optimization
- Use EXPLAIN to verify index usage in filtered queries.
- Consider caching query results for frequently accessed data.
- Profile memory usage with large numbers of users/collections.

### Scalability
- Each user/collection pair forms a separate data island.
- Monitor database size and session usage.
- Consider horizontal or vertical scaling strategies if needed.

## Security and Compliance

### Data Isolation Guarantees
- **Physical**: All user data is stored with explicit user/collection properties.
- **Logical**: All queries are filtered by user/collection context.
- **Access control**: Service-level validation prevents unauthorized access.

### Audit Requirements
- Log all data access with user/collection context.
- Track migration activities and data movements.
- Monitor for attempted isolation violations.

### Compliance Considerations
- GDPR: Improved ability to locate and delete user-specific data.
- SOC2: Multi-tenant isolation and access controls.
- HIPAA: Tenant isolation for healthcare data.

## Risks and Mitigations

| Risk | Impact | Likelihood | Mitigation |
|------|--------|------------|------------|
| Query missing the user/collection filter | High | Medium | Mandatory validation, comprehensive testing |
| Performance degradation | Medium | Low | Index optimization, query profiling |
| Data corruption during migration | High | Low | Backup strategy, rollback procedures |
| Complex multi-collection queries | Medium | Medium | Document query patterns, provide examples |

## Success Criteria

1. **Security**: Zero cross-user data access in production.
2. **Performance**: <10% performance impact on filtered queries.
3. **Migration**: 100% of existing data migrated without loss.
4. **Usability**: All existing query patterns work with the user/collection context.
5. **Compliance**: Full audit trail of user/collection data access.

## Conclusion

The property-based filtering approach offers the best balance of security, performance, and maintainability for adding user/collection isolation to Neo4j. It aligns with the existing multi-tenancy patterns in TrustGraph while building on Neo4j's strengths in graph querying and indexing.

This solution ensures that the Neo4j backend in TrustGraph meets the same security standards as the other storage backends, preventing data isolation vulnerabilities while preserving the flexibility and power of graph queries.

769
docs/tech-specs/sw/ontology-extract-phase-2.sw.md
Normal file
@ -0,0 +1,769 @@
---
layout: default
title: "Ontology Knowledge Extraction - Phase 2 Refactor"
parent: "Swahili (Beta)"
---

# Ontology Knowledge Extraction - Phase 2 Refactor

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

**Status**: Draft
**Author**: Analysis Session 2025-12-03
**Related**: `ontology.md`, `ontorag.md`

## Overview

This document identifies inconsistencies in the current ontology-based knowledge extraction system and proposes a refactor to improve LLM performance and reduce information loss.

## Current Implementation

### How It Works Now

1. **Ontology Loading** (`ontology_loader.py`)
   - Loads the ontology JSON with keys such as `"fo/Recipe"`, `"fo/Food"`, `"fo/produces"`
   - Class IDs include the namespace prefix in the key itself
   - Example from `food.ontology`:
   ```json
   "classes": {
     "fo/Recipe": {
       "uri": "http://purl.org/ontology/fo/Recipe",
       "rdfs:comment": "A Recipe is a combination..."
     }
   }
   ```

2. **Prompt Construction** (`extract.py:299-307`, `ontology-prompt.md`)
   - The template receives the `classes`, `object_properties`, and `datatype_properties` dicts
   - The template iterates: `{% for class_id, class_def in classes.items() %}`
   - The LLM sees: `**fo/Recipe**: A Recipe is a combination...`
   - The example output format shows:
   ```json
   {"subject": "recipe:cornish-pasty", "predicate": "rdf:type", "object": "Recipe"}
   {"subject": "recipe:cornish-pasty", "predicate": "has_ingredient", "object": "ingredient:flour"}
   ```

3. **Response Parsing** (`extract.py:382-428`)
   - Expects a JSON array: `[{"subject": "...", "predicate": "...", "object": "..."}]`
   - Validates against the ontology subset
   - Expands URIs via `expand_uri()` (extract.py:473-521)

4. **URI Expansion** (`extract.py:473-521`)
   - Checks whether the value is in the `ontology_subset.classes` dict
   - If found, extracts the URI from the class definition
   - If not found, constructs a URI: `f"https://trustgraph.ai/ontology/{ontology_id}#{value}"`

### Data Flow Example

**Ontology JSON → Loader → Prompt:**
```
"fo/Recipe" → classes["fo/Recipe"] → LLM sees "**fo/Recipe**"
```

**LLM → Parser → Output:**
```
"Recipe" → not in classes["fo/Recipe"] → constructs URI → LOSES original URI
"fo/Recipe" → found in classes → uses original URI → PRESERVES URI
```

## Problems Identified

### 1. **Inconsistent Examples in the Prompt**

**Issue**: The prompt template displays class IDs with prefixes (`fo/Recipe`), but the example output uses unprefixed class names (`Recipe`).

**Location**: `ontology-prompt.md:5-52`

```markdown
## Ontology Classes:
- **fo/Recipe**: A Recipe is...

## Example Output:
{"subject": "recipe:cornish-pasty", "predicate": "rdf:type", "object": "Recipe"}
```

**Impact**: The LLM receives conflicting signals about which format it should use.

### 2. **Information Loss in URI Expansion**

**Issue**: When the LLM returns unprefixed class names, as the example suggests, `expand_uri()` cannot find them in the ontology dict and constructs fallback URIs, losing the original URIs.

**Location**: `extract.py:494-500`

```python
if value in ontology_subset.classes:  # Looks for "Recipe"
    class_def = ontology_subset.classes[value]  # But key is "fo/Recipe"
    if isinstance(class_def, dict) and 'uri' in class_def:
        return class_def['uri']  # Never reached!
return f"https://trustgraph.ai/ontology/{ontology_id}#{value}"  # Fallback
```

**Impact**:
- Original URI: `http://purl.org/ontology/fo/Recipe`
- Constructed URI: `https://trustgraph.ai/ontology/food#Recipe`
- The semantic meaning is lost, which breaks interoperability.

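The lookup-then-fallback behaviour described above can be reproduced with a small self-contained sketch. The function below is a simplified stand-in written for illustration, not the actual `expand_uri()` in `extract.py`; the module-level `classes` dict and `ONTOLOGY_ID` constant are assumptions that replace the real `ontology_subset` object.

```python
ONTOLOGY_ID = "food"  # assumed ontology identifier for this sketch

# Keys carry the namespace prefix, as in the ontology JSON.
classes = {"fo/Recipe": {"uri": "http://purl.org/ontology/fo/Recipe"}}


def expand_uri(value: str) -> str:
    """Simplified stand-in: exact-key lookup, then a constructed fallback URI."""
    class_def = classes.get(value)
    if isinstance(class_def, dict) and "uri" in class_def:
        return class_def["uri"]
    return f"https://trustgraph.ai/ontology/{ONTOLOGY_ID}#{value}"


prefixed = expand_uri("fo/Recipe")   # key matches, original URI is kept
unprefixed = expand_uri("Recipe")    # key misses, fallback URI is built
```

Here `prefixed` is the original `purl.org` URI, while `unprefixed` falls through to the constructed `trustgraph.ai` URI, which is the information loss this section describes.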
### 3. **Unclear Entity Instance Format**

**Issue**: There is no clear guidance on the URI format for entity instances.

**Examples in the prompt**:
- `"recipe:cornish-pasty"` (namespace-like prefix)
- `"ingredient:flour"` (a different prefix)

**Actual behavior** (extract.py:517-520):
```python
# Treat as entity instance - construct unique URI
normalized = value.replace(" ", "-").lower()
return f"https://trustgraph.ai/{ontology_id}/{normalized}"
```

**Impact**: The LLM has to guess the prefixing convention with no ontology context to guide it.

### 4. **No Namespace Prefix Guidance**

**Issue**: The ontology JSON contains namespace definitions (lines 10-25 in food.ontology):
```json
"namespaces": {
  "fo": "http://purl.org/ontology/fo/",
  "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
  ...
}
```

But these are never shown to the LLM. The LLM does not know:
- What "fo" means
- Which prefix to use for entities
- Which namespace applies to which elements

### 5. **Labels Not Used in the Prompt**

**Issue**: Every class has `rdfs:label` fields (e.g., `{"value": "Recipe", "lang": "en-gb"}`), but the prompt template does not use them.

**Current**: Shows only `class_id` and `comment`
```jinja
- **{{class_id}}**{% if class_def.comment %}: {{class_def.comment}}{% endif %}
```

**Available but unused:**
```python
"rdfs:label": [{"value": "Recipe", "lang": "en-gb"}]
```

**Impact**: The labels could provide human-readable names alongside the technical IDs.

## Proposed Solutions

### Option A: Normalize to Unprefixed IDs

**Approach**: Strip the prefixes from class IDs before showing them to the LLM.

**Changes**:
1. Modify `build_extraction_variables()` to transform the keys:
   ```python
   classes_for_prompt = {
       k.split('/')[-1]: v  # "fo/Recipe" → "Recipe"
       for k, v in ontology_subset.classes.items()
   }
   ```

2. Update the prompt example to match (it already uses unprefixed names).

3. Modify `expand_uri()` to handle both formats:
   ```python
   # Try exact match first
   if value in ontology_subset.classes:
       return ontology_subset.classes[value]['uri']

   # Try with prefix
   for prefix in ['fo/', 'rdf:', 'rdfs:']:
       prefixed = f"{prefix}{value}"
       if prefixed in ontology_subset.classes:
           return ontology_subset.classes[prefixed]['uri']
   ```

**Pros:**
- Cleaner and more human-readable.
- Matches the existing prompt examples.
- LLMs work better with simpler tokens.

**Cons:**
- Class-name collisions if multiple ontologies define the same class name.
- Loses namespace information.
- Requires fallback logic for lookups.

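The collision risk with stripped prefixes is easy to demonstrate. The sketch below groups class IDs by their unprefixed name and reports any name shared by more than one ID. The `cooking/Recipe` entry and its URI are hypothetical, added only to provoke a collision, and `find_collisions` is an illustrative helper rather than existing code.

```python
from collections import defaultdict

classes = {
    "fo/Recipe": {"uri": "http://purl.org/ontology/fo/Recipe"},
    # Hypothetical second ontology that happens to reuse the class name.
    "cooking/Recipe": {"uri": "http://example.com/cooking/Recipe"},
    "fo/Food": {"uri": "http://purl.org/ontology/fo/Food"},
}


def find_collisions(class_ids):
    """Map each unprefixed name to the prefixed IDs that share it (2+ only)."""
    by_short_name = defaultdict(list)
    for class_id in class_ids:
        by_short_name[class_id.split("/")[-1]].append(class_id)
    return {name: ids for name, ids in by_short_name.items() if len(ids) > 1}


collisions = find_collisions(classes)
```

With the data above, `collisions` maps `"Recipe"` to both prefixed IDs, so a dict keyed on the stripped name would silently keep only one of them.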
### Option B: Use Full Prefixed IDs Consistently

**Approach:** Update the examples to use prefixed IDs that match what is shown in the class list.

**Changes:**
1. Update the prompt example (ontology-prompt.md:46-52):
   ```json
   [
     {"subject": "recipe:cornish-pasty", "predicate": "rdf:type", "object": "fo/Recipe"},
     {"subject": "recipe:cornish-pasty", "predicate": "rdfs:label", "object": "Cornish Pasty"},
     {"subject": "recipe:cornish-pasty", "predicate": "fo/produces", "object": "food:cornish-pasty"},
     {"subject": "food:cornish-pasty", "predicate": "rdf:type", "object": "fo/Food"}
   ]
   ```

2. Add a namespace explanation to the prompt:
   ```markdown
   ## Namespace Prefixes:
   - **fo/**: Food Ontology (http://purl.org/ontology/fo/)
   - **rdf:**: RDF
   - **rdfs:**: RDF Schema

   Use these prefixes exactly as shown when referencing classes and properties.
   ```

3. Keep `expand_uri()` as-is (it works correctly when matches are found).

**Pros:**
- Input and output formats are consistent.
- No information loss.
- Preserves namespace semantics.
- Works with multiple ontologies.

**Cons:**
- More verbose tokens for the LLM.
- Requires the LLM to track prefixes.

### Option C: Hybrid - Show Both Label and ID

**Approach:** Enhance the prompt to show both the human-readable label and the technical ID.

**Changes:**
1. Update the prompt template:
   ```jinja
   {% for class_id, class_def in classes.items() %}
   - **{{class_id}}** (label: "{{class_def.labels[0].value if class_def.labels else class_id}}"){% if class_def.comment %}: {{class_def.comment}}{% endif %}
   {% endfor %}
   ```

   Example output:
   ```markdown
   - **fo/Recipe** (label: "Recipe"): A Recipe is a combination...
   ```

2. Update the instructions:
   ```markdown
   When referencing classes:
   - Use the full prefixed ID (e.g., "fo/Recipe") in JSON output
   - The label (e.g., "Recipe") is for human understanding only
   ```

**Pros:**
- Clearest for the LLM.
- Preserves all information.
- Explicit about what to use.

**Cons:**
- Longer prompt.
- More complex template.

## Implemented Approach

**Simplified Entity-Relationship-Attribute Format** - completely replaces the old triple-based format.

The new approach was chosen because:

1. **No Information Loss:** Original URIs are preserved correctly.
2. **Simpler Logic:** No transformation is needed; direct dict lookups work.
3. **Namespace Safety:** Handles multiple ontologies without collisions.
4. **Semantic Correctness:** Maintains RDF/OWL semantics.

## Implementation Complete

### What Was Built:

1. **New Prompt Template** (`prompts/ontology-extract-v2.txt`)
   - ✅ Clear sections: Entity Types, Relationships, Attributes.
   - ✅ Example using full type identifiers (`fo/Recipe`, `fo/has_ingredient`).
   - ✅ Instructions to use the exact identifiers from the schema.
   - ✅ New JSON format with entities/relationships/attributes arrays.

2. **Entity Normalization** (`entity_normalizer.py`)
   - ✅ `normalize_entity_name()` - Converts names to a URI-safe format.
   - ✅ `normalize_type_identifier()` - Handles slashes in types (`fo/Recipe` → `fo-recipe`).
   - ✅ `build_entity_uri()` - Creates unique URIs from the (name, type) tuple.
   - ✅ `EntityRegistry` - Tracks entities for deduplication.

3. **JSON Parser** (`simplified_parser.py`)
   - ✅ Parses the new format: `{entities: [...], relationships: [...], attributes: [...]}`.
   - ✅ Supports kebab-case and snake_case field names.
   - ✅ Returns structured dataclasses.
   - ✅ Graceful error handling with logging.

4. **Triple Converter** (`triple_converter.py`)
   - ✅ `convert_entity()` - Generates type + label triples automatically.
   - ✅ `convert_relationship()` - Connects entity URIs via properties.
   - ✅ `convert_attribute()` - Adds literal values.
   - ✅ Looks up full URIs from the ontology definitions.

5. **Updated Main Processor** (`extract.py`)
   - ✅ Removed the old triple-based extraction code.
   - ✅ Added `extract_with_simplified_format()`.
   - ✅ Now uses only the simplified format.
   - ✅ Calls the prompt with ID `extract-with-ontologies-v2`.

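To illustrate the parser behaviour listed above (structured dataclasses, tolerance for both kebab-case and snake_case field names), here is a minimal sketch. It is not the contents of `simplified_parser.py`; the `Relationship` dataclass, the `_get` helper, and `parse_relationships` are hypothetical names chosen for this example, and only the `relationships` array is handled.

```python
import json
from dataclasses import dataclass


@dataclass
class Relationship:
    subject: str
    subject_type: str
    relation: str
    object: str
    object_type: str


def _get(item: dict, name: str) -> str:
    """Read a field by its snake_case name, falling back to kebab-case."""
    if name in item:
        return item[name]
    return item[name.replace("_", "-")]  # raises KeyError if absent in both forms


def parse_relationships(text: str) -> list[Relationship]:
    """Parse the 'relationships' array of an LLM response into dataclasses."""
    data = json.loads(text)
    fields = ("subject", "subject_type", "relation", "object", "object_type")
    return [
        Relationship(**{f: _get(item, f) for f in fields})
        for item in data.get("relationships", [])
    ]


# One item deliberately mixes "subject-type" (kebab) and "object_type" (snake).
raw = json.dumps({"relationships": [{
    "subject": "Cornish pasty", "subject-type": "Recipe",
    "relation": "has_ingredient",
    "object": "beef", "object_type": "Food",
}]})
rels = parse_relationships(raw)
```

Normalizing field names at the parsing boundary keeps the rest of the pipeline working with a single attribute spelling, whichever variant the LLM emits.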
## Test Cases

### Test 1: URI Preservation
```python
# Given ontology class
classes = {"fo/Recipe": {"uri": "http://purl.org/ontology/fo/Recipe", ...}}

# When LLM returns
llm_output = {"subject": "x", "predicate": "rdf:type", "object": "fo/Recipe"}

# Then expanded URI should be
assert expanded == "http://purl.org/ontology/fo/Recipe"
# Not: "https://trustgraph.ai/ontology/food#Recipe"
```

### Test 2: Multi-Ontology Collision
```python
# Given two ontologies
ont1 = {"fo/Recipe": {...}}
ont2 = {"cooking/Recipe": {...}}

# LLM should use full prefix to disambiguate
llm_output = {"object": "fo/Recipe"}  # Not just "Recipe"
```

### Test 3: Entity Instance Format
```python
# Given prompt with food ontology
# LLM should create instances like
{"subject": "recipe:cornish-pasty"}  # Namespace-style
{"subject": "food:beef"}  # Consistent prefix
```

## Open Questions

1. **Should entity instances use namespace prefixes?**
   - Current: `"recipe:cornish-pasty"` (arbitrary)
   - Alternative: use the ontology prefix, `"fo:cornish-pasty"`?
   - Alternative: no prefix, expanding `"cornish-pasty"` → full URI?

2. **How should domain/range be shown in the prompt?**
   - Currently shows: `(Recipe → Food)`
   - Should it be: `(fo/Recipe → fo/Food)`?

3. **Should we validate domain/range constraints?**
   - TODO comment at extract.py:470
   - Would catch more errors but is more complex

4. **What about inverse properties and equivalences?**
   - The ontology has `owl:inverseOf`, `owl:equivalentClass`
   - Not currently used in extraction
   - Should they be?

## Success Metrics

- ✅ Zero URI information loss (100% preservation of original URIs)
- ✅ LLM output format matches the input format
- ✅ No ambiguous examples in the prompt
- ✅ Tests pass with multiple ontologies
- ✅ Improved extraction quality (measured by the percentage of valid triples)

## Alternative Approach: Simplified Extraction Format

### Philosophy

Instead of asking the LLM to understand RDF/OWL semantics, ask it to do what it is good at: **finding entities and relationships in text**.

Let the code handle URI construction, RDF conversion, and the semantic web formalities.

### Example: Entity Classification

**Input Text:**
```
Cornish pasty is a traditional British pastry filled with meat and vegetables.
```

**Ontology Schema (shown to the LLM):**
```markdown
## Entity Types:
- Recipe: A recipe is a combination of ingredients and a method
- Food: A food is something that can be eaten
- Ingredient: An ingredient combines a quantity and a food
```

**What the LLM Returns (Simple JSON):**
```json
{
  "entities": [
    {
      "entity": "Cornish pasty",
      "type": "Recipe"
    }
  ]
}
```

**What the Code Produces (RDF Triples):**
```python
# 1. Normalize entity name + type to ID (type prevents collisions)
entity_id = "recipe-cornish-pasty"  # normalize("Cornish pasty", "Recipe")
entity_uri = "https://trustgraph.ai/food/recipe-cornish-pasty"

# Note: Same name, different type = different URI
# "Cornish pasty" (Recipe) → recipe-cornish-pasty
# "Cornish pasty" (Food) → food-cornish-pasty

# 2. Generate triples
triples = [
    # Type triple
    Triple(
        s=Value(value=entity_uri, is_uri=True),
        p=Value(value="http://www.w3.org/1999/02/22-rdf-syntax-ns#type", is_uri=True),
        o=Value(value="http://purl.org/ontology/fo/Recipe", is_uri=True)
    ),
    # Label triple (automatic)
    Triple(
        s=Value(value=entity_uri, is_uri=True),
        p=Value(value="http://www.w3.org/2000/01/rdf-schema#label", is_uri=True),
        o=Value(value="Cornish pasty", is_uri=False)
    )
]
```

### Benefits

1. **The LLM doesn't need to:**
   - Understand URI syntax
   - Invent identifier prefixes (`recipe:`, `ingredient:`)
   - Know about `rdf:type` or `rdfs:label`
   - Construct semantic web identifiers

2. **The LLM only needs to:**
   - Find entities in the text
   - Map them to ontology classes
   - Extract relationships and attributes

3. **The code handles:**
   - URI normalization and construction
   - RDF triple generation
   - Automatic label assignment
   - Namespace management

### Why This Works Better

- **Simpler prompt** = less confusion = fewer errors
- **Consistent IDs** = the code controls the normalization rules
- **Auto-generated labels** = no missing rdfs:label triples
- **LLM focuses on extraction** = which is what it is good at

### Example: Entity Relationships

**Input Text:**
```
Cornish pasty is a traditional British pastry filled with beef and potatoes.
```

**Ontology Schema (shown to the LLM):**
```markdown
## Entity Types:
- Recipe: A recipe is a combination of ingredients and a method
- Food: A food is something that can be eaten
- Ingredient: An ingredient combines a quantity and a food

## Relationships:
- has_ingredient: Relates a recipe to an ingredient it uses (Recipe → Ingredient)
- food: Relates an ingredient to the food that is required (Ingredient → Food)
```

**What the LLM Returns (Simple JSON):**
```json
{
  "entities": [
    {
      "entity": "Cornish pasty",
      "type": "Recipe"
    },
    {
      "entity": "beef",
      "type": "Food"
    },
    {
      "entity": "potatoes",
      "type": "Food"
    }
  ],
  "relationships": [
    {
      "subject": "Cornish pasty",
      "subject-type": "Recipe",
      "relation": "has_ingredient",
      "object": "beef",
      "object-type": "Food"
    },
    {
      "subject": "Cornish pasty",
      "subject-type": "Recipe",
      "relation": "has_ingredient",
      "object": "potatoes",
      "object-type": "Food"
    }
  ]
}
```

**What the Code Produces (RDF Triples):**
```python
# Normalize (name, type) pairs to URIs; the type prefix prevents collisions
cornish_pasty_uri = "https://trustgraph.ai/food/recipe-cornish-pasty"
beef_uri = "https://trustgraph.ai/food/food-beef"
potatoes_uri = "https://trustgraph.ai/food/food-potatoes"

# Standard RDF/RDFS predicate URIs
rdf_type = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
rdfs_label = "http://www.w3.org/2000/01/rdf-schema#label"

# Look up relation URI from ontology
has_ingredient_uri = "http://purl.org/ontology/fo/ingredients"  # from fo/has_ingredient

triples = [
    # Entity type triples (as before; Value wrappers omitted for brevity)
    Triple(s=cornish_pasty_uri, p=rdf_type, o="http://purl.org/ontology/fo/Recipe"),
    Triple(s=cornish_pasty_uri, p=rdfs_label, o="Cornish pasty"),

    Triple(s=beef_uri, p=rdf_type, o="http://purl.org/ontology/fo/Food"),
    Triple(s=beef_uri, p=rdfs_label, o="beef"),

    Triple(s=potatoes_uri, p=rdf_type, o="http://purl.org/ontology/fo/Food"),
    Triple(s=potatoes_uri, p=rdfs_label, o="potatoes"),

    # Relationship triples
    Triple(
        s=Value(value=cornish_pasty_uri, is_uri=True),
        p=Value(value=has_ingredient_uri, is_uri=True),
        o=Value(value=beef_uri, is_uri=True)
    ),
    Triple(
        s=Value(value=cornish_pasty_uri, is_uri=True),
        p=Value(value=has_ingredient_uri, is_uri=True),
        o=Value(value=potatoes_uri, is_uri=True)
    )
]
```

**Key Points:**
- The LLM returns natural-language entity names: `"Cornish pasty"`, `"beef"`, `"potatoes"`
- The LLM includes types to disambiguate: `subject-type`, `object-type`
- The LLM uses the relation name from the schema: `"has_ingredient"`
- The code derives consistent IDs from (name, type): `("Cornish pasty", "Recipe")` → `recipe-cornish-pasty`
- The code looks up the relation URI from the ontology: `fo/has_ingredient` → full URI
- The same (name, type) tuple always gets the same URI (deduplication)

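The (name, type) → URI rule can be sketched in a few lines. The names `build_entity_uri` and `EntityRegistry` echo the components listed under the implementation summary, but the bodies, signatures, the `BASE` constant, and the default `ontology_id` here are assumptions for illustration, not the actual `entity_normalizer.py` code.

```python
import re

BASE = "https://trustgraph.ai"  # assumed base, matching the example URIs


def normalize(text: str) -> str:
    """Lower-case and collapse runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_entity_uri(name: str, entity_type: str, ontology_id: str = "food") -> str:
    """Build a URI whose last segment is '<type>-<name>', both normalized."""
    return f"{BASE}/{ontology_id}/{normalize(entity_type)}-{normalize(name)}"


class EntityRegistry:
    """Remember the URI issued for each (name, type) pair (deduplication)."""

    def __init__(self) -> None:
        self._uris: dict[tuple[str, str], str] = {}

    def get_or_create(self, name: str, entity_type: str) -> str:
        key = (name, entity_type)
        if key not in self._uris:
            self._uris[key] = build_entity_uri(name, entity_type)
        return self._uris[key]

    def __len__(self) -> int:
        return len(self._uris)


registry = EntityRegistry()
recipe_uri = registry.get_or_create("Cornish pasty", "Recipe")
food_uri = registry.get_or_create("Cornish pasty", "Food")
recipe_again = registry.get_or_create("Cornish pasty", "Recipe")
```

In this sketch the same name with two types yields two distinct URIs, and asking for an existing pair returns the URI already issued instead of adding a new entry.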
### Example: Entity Name Disambiguation

**Problem:** The same name can refer to different entity types.

**Real-world case:**
```
"Cornish pasty" can be:
- A Recipe (instructions for making it)
- A Food (the dish itself)
```

**How It Is Handled:**

The LLM returns both as separate entities:
```json
{
  "entities": [
    {"entity": "Cornish pasty", "type": "Recipe"},
    {"entity": "Cornish pasty", "type": "Food"}
  ],
  "relationships": [
    {
      "subject": "Cornish pasty",
      "subject-type": "Recipe",
      "relation": "produces",
      "object": "Cornish pasty",
      "object-type": "Food"
    }
  ]
}
```

**Code Resolution:**
```python
# Different types → different URIs
recipe_uri = normalize("Cornish pasty", "Recipe")
# → "https://trustgraph.ai/food/recipe-cornish-pasty"

food_uri = normalize("Cornish pasty", "Food")
# → "https://trustgraph.ai/food/food-cornish-pasty"

# Relationship connects them correctly
triple = Triple(
    s=recipe_uri,  # The Recipe
    p="http://purl.org/ontology/fo/produces",
    o=food_uri  # The Food
)
```

**Why This Works:**
- The type is included in ALL references (entities, relationships, attributes).
- The code uses the `(name, type)` tuple as the lookup key.
- No ambiguity, no collisions.

### Example: Entity Attributes

**Input Text:**
```
This Cornish pasty recipe serves 4-6 people and takes 45 minutes to prepare.
```

**Ontology Schema (shown to the LLM):**
```markdown
## Entity Types:
- Recipe: A recipe is a combination of ingredients and a method

## Attributes:
- serves: Indicates what the recipe is intended to serve (Recipe → text)
- preparation_time: Time needed to prepare the recipe (Recipe → text)
```

**What the LLM Returns (Simple JSON):**
```json
{
  "entities": [
    {
      "entity": "Cornish pasty recipe",
      "type": "Recipe"
    }
  ],
  "attributes": [
    {
      "entity": "Cornish pasty recipe",
      "entity-type": "Recipe",
      "attribute": "serves",
      "value": "4-6 people"
    },
    {
      "entity": "Cornish pasty recipe",
      "entity-type": "Recipe",
      "attribute": "preparation_time",
      "value": "45 minutes"
    }
  ]
}
```

**What the Code Produces (RDF Triples):**
```python
# Normalize (name, type) to a URI, using the same type-prefix rule as above
recipe_uri = "https://trustgraph.ai/food/recipe-cornish-pasty-recipe"

# Standard RDF/RDFS predicate URIs
rdf_type = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
rdfs_label = "http://www.w3.org/2000/01/rdf-schema#label"

# Look up attribute URIs from ontology
serves_uri = "http://purl.org/ontology/fo/serves"  # from fo/serves
prep_time_uri = "http://purl.org/ontology/fo/preparation_time"  # from fo/preparation_time

triples = [
    # Entity type triple
    Triple(
        s=Value(value=recipe_uri, is_uri=True),
        p=Value(value=rdf_type, is_uri=True),
        o=Value(value="http://purl.org/ontology/fo/Recipe", is_uri=True)
    ),

    # Label triple (automatic)
    Triple(
        s=Value(value=recipe_uri, is_uri=True),
        p=Value(value=rdfs_label, is_uri=True),
        o=Value(value="Cornish pasty recipe", is_uri=False)
    ),

    # Attribute triples (objects are literals, not URIs)
    Triple(
        s=Value(value=recipe_uri, is_uri=True),
        p=Value(value=serves_uri, is_uri=True),
        o=Value(value="4-6 people", is_uri=False)  # Literal value!
    ),
    Triple(
        s=Value(value=recipe_uri, is_uri=True),
        p=Value(value=prep_time_uri, is_uri=True),
        o=Value(value="45 minutes", is_uri=False)  # Literal value!
    )
]
```

**Key Points:**
- The LLM extracts literal values: `"4-6 people"`, `"45 minutes"`
- The LLM includes the entity type for disambiguation: `entity-type`
- The LLM uses the attribute name from the schema: `"serves"`, `"preparation_time"`
- The code looks up the attribute URI from the ontology's datatype properties
- **The object is a literal** (`is_uri=False`), not a URI reference
- Values stay as natural text; no normalization is needed

**Difference from Relationships:**
- Relationships: both subject and object are entities (URIs)
- Attributes: the subject is an entity (URI), the object is a literal value (string/number)

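The distinction between relationship and attribute triples comes down to the `is_uri` flag on the object. The sketch below uses stand-in `Value` and `Triple` dataclasses shaped like the ones in the snippets above; they are assumptions for illustration and not the actual TrustGraph schema classes, and the two helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    value: str
    is_uri: bool


@dataclass(frozen=True)
class Triple:
    s: Value
    p: Value
    o: Value


def relationship_triple(subject_uri: str, predicate_uri: str, object_uri: str) -> Triple:
    """Relationship: the object is another entity, so it is a URI."""
    return Triple(Value(subject_uri, True), Value(predicate_uri, True), Value(object_uri, True))


def attribute_triple(subject_uri: str, predicate_uri: str, literal: str) -> Triple:
    """Attribute: the object is a literal and is kept as plain text."""
    return Triple(Value(subject_uri, True), Value(predicate_uri, True), Value(literal, False))


recipe = "https://trustgraph.ai/food/recipe-cornish-pasty"
rel = relationship_triple(
    recipe,
    "http://purl.org/ontology/fo/ingredients",
    "https://trustgraph.ai/food/food-beef",
)
attr = attribute_triple(recipe, "http://purl.org/ontology/fo/serves", "4-6 people")
```

Keeping the two constructors separate makes it harder to store a literal such as `"4-6 people"` as if it were a URI.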
### Mfano Kamili: Vitu + Mahusiano + Sifa
|
||||
|
||||
**Maandishi ya Ingizo:**
|
||||
```
|
||||
Cornish pasty is a savory pastry filled with beef and potatoes.
|
||||
This recipe serves 4 people.
|
||||
```
|
||||
|
||||
**Hili Ni Lile Ambalo Mfumo wa Lugha Kubwa Hurudisha:**
|
||||
```json
|
||||
{
|
||||
"entities": [
|
||||
{
|
||||
"entity": "Cornish pasty",
|
||||
"type": "Recipe"
|
||||
},
|
||||
{
|
||||
"entity": "beef",
|
||||
"type": "Food"
|
||||
},
|
||||
{
|
||||
"entity": "potatoes",
|
||||
"type": "Food"
|
||||
}
|
||||
],
|
||||
"relationships": [
|
||||
{
|
||||
"subject": "Cornish pasty",
|
||||
"subject-type": "Recipe",
|
||||
"relation": "has_ingredient",
|
||||
"object": "beef",
|
||||
"object-type": "Food"
|
||||
},
|
||||
{
|
||||
"subject": "Cornish pasty",
|
||||
"subject-type": "Recipe",
|
||||
"relation": "has_ingredient",
|
||||
"object": "potatoes",
|
||||
"object-type": "Food"
|
||||
}
|
||||
],
|
||||
"attributes": [
|
||||
{
|
||||
"entity": "Cornish pasty",
|
||||
"entity-type": "Recipe",
|
||||
"attribute": "serves",
|
||||
"value": "4 people"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Result:** 9 RDF triples generated:
- 3 entity type triples (rdf:type)
- 3 entity label triples (rdfs:label) - automatic
- 2 relationship triples (`has_ingredient`)
- 1 attribute triple (`serves`)

All from simple, natural-language extraction by the LLM!
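
To make the triple counts concrete, here is a minimal sketch of the conversion, assuming one type triple and one label triple per entity, one triple per relationship, and one per attribute. The `Value` and `Triple` dataclasses are local stand-ins for the classes used in the earlier example, and the `http://example.org/` namespace and the `uri()` helper are hypothetical; the real extractor resolves URIs from the ontology instead.

```python
from dataclasses import dataclass

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "http://example.org/"  # hypothetical namespace, for illustration only


@dataclass(frozen=True)
class Value:
    value: str
    is_uri: bool


@dataclass(frozen=True)
class Triple:
    s: Value
    p: Value
    o: Value


def uri(name: str) -> Value:
    """Hypothetical URI minting: lowercase the name and join words with '-'."""
    return Value(BASE + name.lower().replace(" ", "-"), True)


def build_triples(extraction: dict) -> list:
    triples = []
    for e in extraction.get("entities", []):
        subject = uri(e["entity"])
        # One type triple and one automatic label triple per entity
        triples.append(Triple(subject, Value(RDF_TYPE, True), uri(e["type"])))
        triples.append(Triple(subject, Value(RDFS_LABEL, True), Value(e["entity"], False)))
    for r in extraction.get("relationships", []):
        # Relationship: subject and object are both URIs
        triples.append(Triple(uri(r["subject"]), uri(r["relation"]), uri(r["object"])))
    for a in extraction.get("attributes", []):
        # Attribute: the object is a literal, not a URI
        triples.append(Triple(uri(a["entity"]), uri(a["attribute"]), Value(a["value"], False)))
    return triples


extraction = {
    "entities": [
        {"entity": "Cornish pasty", "type": "Recipe"},
        {"entity": "beef", "type": "Food"},
        {"entity": "potatoes", "type": "Food"},
    ],
    "relationships": [
        {"subject": "Cornish pasty", "subject-type": "Recipe",
         "relation": "has_ingredient", "object": "beef", "object-type": "Food"},
        {"subject": "Cornish pasty", "subject-type": "Recipe",
         "relation": "has_ingredient", "object": "potatoes", "object-type": "Food"},
    ],
    "attributes": [
        {"entity": "Cornish pasty", "entity-type": "Recipe",
         "attribute": "serves", "value": "4 people"},
    ],
}

triples = build_triples(extraction)
print(len(triples))
```

The sketch ignores the `subject-type`, `object-type`, and `entity-type` fields; a real implementation would likely use them to disambiguate entities that share a name.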
|
||||
|
||||
## References

- Current implementation: `trustgraph-flow/trustgraph/extract/kg/ontology/extract.py`
- Prompt template: `ontology-prompt.md`
- Tests: `tests/unit/test_extract/test_ontology/`
- Example ontology: `e2e/test-data/food.ontology`
|
||||
155
docs/tech-specs/sw/ontology.sw.md
Normal file
|
|
@ -0,0 +1,155 @@
|
|||
---
layout: default
title: "Ontology Structure Technical Specification"
parent: "Swahili (Beta)"
---

# Ontology Structure Technical Specification

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Overview

This specification describes the structure and format of ontologies within the TrustGraph system. Ontologies provide formal knowledge models that define classes, properties, and relationships, and they support reasoning and inference. The system uses an OWL-inspired format (OWL: Web Ontology Language) that represents OWL/RDFS concepts but is optimized for TrustGraph's specific requirements.
|
||||
|
||||
**Naming Convention**: This project uses kebab-case for all identifiers (configuration keys, API endpoints, module names, etc.) rather than snake_case.
|
||||
|
||||
## Goals

- **Class and Property Management**: Define OWL-like classes with properties, domains, ranges, and type constraints.
- **Rich Semantic Support**: Enable RDFS/OWL properties including labels, multi-language support, and formal constraints.
- **Multi-Ontology Support**: Allow multiple ontologies to coexist and interoperate.
- **Validation and Reasoning**: Ensure that ontologies conform to OWL-like standards, with consistency checking and inference support.
- **Standards Compatibility**: Support import and export in standard formats (Turtle, RDF/XML, OWL/XML) while maintaining internal optimization.
|
||||
|
||||
## Background

TrustGraph stores ontologies as configuration items in a flexible key-value system. Although the format is inspired by OWL (Web Ontology Language), it is optimized for TrustGraph's specific use cases and does not follow every OWL feature.

Ontologies in TrustGraph enable:
- Definition of formal object types and their properties.
- Specification of property domains and ranges, with type constraints.
- Logical reasoning and inference.
- Complex relationships and cardinality constraints.
- Multi-language support for internationalization.
|
||||
|
||||
## Ontology Structure

### Configuration Storage

Ontologies are stored as configuration items with the following pattern:
- **Type**: `ontology`
- **Key**: Unique ontology identifier (e.g., `natural-world`, `domain-model`)
- **Value**: Complete ontology in JSON format.

### JSON Structure

The ontology JSON format consists of four main sections:

#### 1. Metadata

Contains administrative and descriptive information about the ontology:
|
||||
|
||||
```json
|
||||
{
|
||||
"metadata": {
|
||||
"name": "Ulimwengu wa asili",
|
||||
"description": "Ontolojia inayofafanua mazingira ya asili",
|
||||
"version": "1.0.0",
|
||||
"created": "2025-09-20T12:07:37.068Z",
|
||||
"modified": "2025-09-20T12:12:20.725Z",
|
||||
"creator": "mtumiaji-sasa",
|
||||
"namespace": "http://trustgraph.ai/ontologies/natural-world",
|
||||
"imports": ["http://www.w3.org/2002/07/owl#"]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Fields:**
- `name`: Human-readable name of the ontology.
- `description`: Brief description of the ontology's purpose.
- `version`: Version number.
- `created`: ISO 8601 timestamp of creation.
- `modified`: ISO 8601 timestamp of the last modification.
- `creator`: Identifier of the user/system that created it.
- `namespace`: Base URI for the ontology's elements.
- `imports`: List of imported ontology URIs.

#### 2. Classes

Defines the object types and their hierarchical relationships:
|
||||
|
||||
```json
|
||||
{
|
||||
"classes": {
|
||||
"lifeform": {
|
||||
"uri": "http://trustgraph.ai/ontologies/natural-world#lifeform",
|
||||
"type": "owl:Class",
|
||||
"rdfs:label": [{"value": "Lifeform", "lang": "en"}],
|
||||
"rdfs:comment": "Kiumbe hai"
|
||||
},
|
||||
"animal": {
|
||||
"uri": "http://trustgraph.ai/ontologies/natural-world#animal",
|
||||
"type": "owl:Class",
|
||||
"rdfs:label": [{"value": "Animal", "lang": "en"}],
|
||||
"rdfs:comment": "Kiumbe cha wanyama",
|
||||
"rdfs:subClassOf": "lifeform"
|
||||
},
|
||||
"cat": {
|
||||
"uri": "http://trustgraph.ai/ontologies/natural-world#cat",
|
||||
"type": "owl:Class",
|
||||
"rdfs:label": [{"value": "Cat", "lang": "en"}],
|
||||
"rdfs:comment": "Paka",
|
||||
"rdfs:subClassOf": "animal"
|
||||
},
|
||||
"dog": {
|
||||
"uri": "http://trustgraph.ai/ontologies/natural-world#dog",
|
||||
"type": "owl:Class",
|
||||
"rdfs:label": [{"value": "Dog", "lang": "en"}],
|
||||
"rdfs:comment": "Mbwa",
|
||||
"rdfs:subClassOf": "animal",
|
||||
"owl:disjointWith": ["cat"]
|
||||
}
|
||||
}
}
|
||||
```
|
||||
|
||||
#### 3. Properties

(The example defines no properties, so this section is omitted.)

#### 4. Datatype Property Definitions

(None are defined in the example, so this section is omitted.)
|
||||
|
||||
## Validation Rules

### Structural Validation

1. **URI Consistency**: All URIs should follow the pattern `{namespace}#{identifier}`.
2. **Class Hierarchy**: No circular inheritance in `rdfs:subClassOf`.
3. **Property Domains/Ranges**: Must reference existing classes or valid XSD types.
4. **Disjoint Classes**: Cannot be subclasses of each other.
5. **Inverse Properties**: Must be bidirectional if specified.

### Semantic Validation

1. **Unique Identifiers**: Class and property identifiers must be unique within an ontology.
2. **Language Tags**: Must follow the BCP 47 language tag format.
3. **Cardinality Constraints**: `minCardinality` ≤ `maxCardinality` when both are specified.
4. **Functional Properties**: Cannot have `maxCardinality` > 1.
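
A few of the structural rules can be sketched in a handful of lines. The code below is an illustration only, not the project's validator: `validate_classes` and `ancestors` are hypothetical names, and it assumes `rdfs:subClassOf` holds a single class identifier, as in the example above. It checks the URI pattern, circular inheritance, and disjoint classes related by subclassing.

```python
from typing import Dict, List, Set


def ancestors(classes: Dict[str, dict], name: str) -> Set[str]:
    """Collect every superclass reachable through rdfs:subClassOf."""
    found: Set[str] = set()
    parent = classes[name].get("rdfs:subClassOf")
    while parent is not None and parent not in found:
        found.add(parent)
        parent = classes.get(parent, {}).get("rdfs:subClassOf")
    return found


def validate_classes(namespace: str, classes: Dict[str, dict]) -> List[str]:
    errors: List[str] = []
    for ident, cls in classes.items():
        # Rule: URIs follow {namespace}#{identifier}
        if cls.get("uri") != f"{namespace}#{ident}":
            errors.append(f"{ident}: URI does not match {{namespace}}#{{identifier}}")
        # Rule: no circular inheritance (a class must not be its own ancestor)
        if ident in ancestors(classes, ident):
            errors.append(f"{ident}: circular rdfs:subClassOf chain")
        # Rule: disjoint classes cannot be subclasses of each other
        for other in cls.get("owl:disjointWith", []):
            if other not in classes:
                continue
            if other in ancestors(classes, ident) or ident in ancestors(classes, other):
                errors.append(f"{ident}: disjoint with {other} but related by subclassing")
    return errors


NS = "http://trustgraph.ai/ontologies/natural-world"


def cls(ident, parent=None, disjoint=()):
    """Build a class entry shaped like the JSON example."""
    entry = {"uri": f"{NS}#{ident}", "type": "owl:Class"}
    if parent:
        entry["rdfs:subClassOf"] = parent
    if disjoint:
        entry["owl:disjointWith"] = list(disjoint)
    return entry


classes = {
    "lifeform": cls("lifeform"),
    "animal": cls("animal", "lifeform"),
    "cat": cls("cat", "animal"),
    "dog": cls("dog", "animal", ["cat"]),
}
print(validate_classes(NS, classes))
```

Making `cat` a subclass of `dog` while `dog` stays disjoint with `cat` would produce one error from the disjointness rule.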
|
||||
|
||||
## Import/Export Format Support

Although the internal format is JSON, the system can convert to and from standard ontology formats:

- **Turtle (.ttl)** - Compact RDF serialization.
- **RDF/XML (.rdf, .owl)** - W3C standard format.
- **OWL/XML (.owx)** - OWL-specific XML format.
- **JSON-LD (.jsonld)** - JSON for Linked Data.
|
||||
|
||||
## References
|
||||
|
||||
- [OWL 2 Web Ontology Language](https://www.w3.org/TR/owl2-overview/)
|
||||
- [RDF Schema 1.1](https://www.w3.org/TR/rdf-schema/)
|
||||
- [XML Schema Datatypes](https://www.w3.org/TR/xmlschema-2/)
|
||||
- [BCP 47 Language Tags](https://tools.ietf.org/html/bcp47)
|
||||
1075
docs/tech-specs/sw/ontorag.sw.md
Normal file
File diff suppressed because it is too large
239
docs/tech-specs/sw/openapi-spec.sw.md
Normal file
|
|
@ -0,0 +1,239 @@
|
|||
---
layout: default
title: "OpenAPI Technical Specification"
parent: "Swahili (Beta)"
---

# OpenAPI Technical Specification

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Goal

Create a comprehensive, modular OpenAPI 3.1 specification for the TrustGraph API gateway that:
- Documents all REST endpoints
- Uses external `$ref` for modularity and maintainability
- Maps directly to the message translator code
- Provides accurate request/response schemas
|
||||
|
||||
## Source of Truth

The API is defined by:
- **Message Translators**: `trustgraph-base/trustgraph/messaging/translators/*.py`
- **Dispatcher Manager**: `trustgraph-flow/trustgraph/gateway/dispatch/manager.py`
- **Endpoint Manager**: `trustgraph-flow/trustgraph/gateway/endpoint/manager.py`

## Directory Structure
|
||||
|
||||
```
|
||||
openapi/
|
||||
├── openapi.yaml # Main entry point
|
||||
├── paths/
|
||||
│ ├── config.yaml # Global services
|
||||
│ ├── flow.yaml
|
||||
│ ├── librarian.yaml
|
||||
│ ├── knowledge.yaml
|
||||
│ ├── collection-management.yaml
|
||||
│ ├── flow-services/ # Flow-hosted services
|
||||
│ │ ├── agent.yaml
|
||||
│ │ ├── document-rag.yaml
|
||||
│ │ ├── graph-rag.yaml
|
||||
│ │ ├── text-completion.yaml
|
||||
│ │ ├── prompt.yaml
|
||||
│ │ ├── embeddings.yaml
|
||||
│ │ ├── mcp-tool.yaml
|
||||
│ │ ├── triples.yaml
|
||||
│ │ ├── objects.yaml
|
||||
│ │ ├── nlp-query.yaml
|
||||
│ │ ├── structured-query.yaml
|
||||
│ │ ├── structured-diag.yaml
|
||||
│ │ ├── graph-embeddings.yaml
|
||||
│ │ ├── document-embeddings.yaml
|
||||
│ │ ├── text-load.yaml
|
||||
│ │ └── document-load.yaml
|
||||
│ ├── import-export/
|
||||
│ │ ├── core-import.yaml
|
||||
│ │ ├── core-export.yaml
|
||||
│ │ └── flow-import-export.yaml # WebSocket import/export
|
||||
│ ├── websocket.yaml
|
||||
│ └── metrics.yaml
|
||||
├── components/
|
||||
│ ├── schemas/
|
||||
│ │ ├── config/
|
||||
│ │ ├── flow/
|
||||
│ │ ├── librarian/
|
||||
│ │ ├── knowledge/
|
||||
│ │ ├── collection/
|
||||
│ │ ├── ai-services/
|
||||
│ │ ├── common/
|
||||
│ │ └── errors/
|
||||
│ ├── parameters/
|
||||
│ ├── responses/
|
||||
│ └── examples/
|
||||
└── security/
|
||||
└── bearerAuth.yaml
|
||||
```
|
||||
|
||||
## Service Mapping

### Global Services (`/api/v1/{kind}`)
- `config` - Configuration management
- `flow` - Flow lifecycle
- `librarian` - Document library
- `knowledge` - Knowledge cores
- `collection-management` - Collection metadata

### Flow-Hosted Services (`/api/v1/flow/{flow}/service/{kind}`)

**Request/Response:**
- `agent`, `text-completion`, `prompt`, `mcp-tool`
- `graph-rag`, `document-rag`
- `embeddings`, `graph-embeddings`, `document-embeddings`
- `triples`, `objects`, `nlp-query`, `structured-query`, `structured-diag`

**Fire-and-Forget:**
- `text-load`, `document-load`

### Import/Export
- `/api/v1/import-core` (POST)
- `/api/v1/export-core` (GET)
- `/api/v1/flow/{flow}/import/{kind}` (WebSocket)
- `/api/v1/flow/{flow}/export/{kind}` (WebSocket)

### Other
- `/api/v1/socket` (WebSocket multiplexed)
- `/api/metrics` (Prometheus)
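
As a quick illustration of the two URL shapes, the sketch below builds paths for global and flow-hosted services. The helper names and the `GLOBAL_SERVICES` set are hypothetical conveniences drawn from the list above; they are not part of the gateway code.

```python
from urllib.parse import quote

# Global service kinds listed in this specification
GLOBAL_SERVICES = {
    "config", "flow", "librarian", "knowledge", "collection-management",
}


def global_service_path(kind: str) -> str:
    """Path for a global service: /api/v1/{kind}."""
    if kind not in GLOBAL_SERVICES:
        raise ValueError(f"unknown global service: {kind}")
    return f"/api/v1/{kind}"


def flow_service_path(flow: str, kind: str) -> str:
    """Path for a flow-hosted service: /api/v1/flow/{flow}/service/{kind}."""
    # Percent-encode the flow identifier so it is safe as a single path segment
    return f"/api/v1/flow/{quote(flow, safe='')}/service/{kind}"


print(global_service_path("config"))
print(flow_service_path("default", "graph-rag"))
```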
|
||||
|
||||
## Approach

### Phase 1: Setup
1. Create the directory structure
2. Create the main `openapi.yaml` with metadata, servers, and security
3. Create reusable components (errors, common parameters, security schemes)

### Phase 2: Common Schemas
Create shared schemas used across services:
- `RdfValue`, `Triple` - RDF/triple structures
- `ErrorObject` - Error response
- `DocumentMetadata`, `ProcessingMetadata` - Metadata structures
- Common parameters: `FlowId`, `User`, `Collection`

### Phase 3: Global Services
For each global service (config, flow, librarian, knowledge, collection-management):
1. Create the path file in `paths/`
2. Create the request schema in `components/schemas/{service}/`
3. Create the response schema
4. Add examples
5. Reference it from the main `openapi.yaml`

### Phase 4: Flow-Hosted Services
For each flow-hosted service:
1. Create the path file in `paths/flow-services/`
2. Create the request/response schemas in `components/schemas/ai-services/`
3. Add streaming flag documentation where applicable
4. Reference it from the main `openapi.yaml`

### Phase 5: Import/Export & WebSocket
1. Document the core import/export endpoints
2. Document the WebSocket protocol patterns
3. Document the flow-level import/export WebSocket endpoints

### Phase 6: Validation
1. Validate with OpenAPI validator tools
2. Test with Swagger UI
3. Verify that all translators are covered
|
||||
|
||||
## Field Naming Convention

All JSON fields use **kebab-case**:
- `flow-id`, `blueprint-name`, `doc-limit`, `entity-limit`, etc.

## Creating Schema Files

For each translator in `trustgraph-base/trustgraph/messaging/translators/`:

1. **Read the translator's `to_pulsar()` method** - Defines the request schema
2. **Read the translator's `from_pulsar()` method** - Defines the response schema
3. **Extract field names and types**
4. **Create the OpenAPI schema** with:
   - Field names (kebab-case)
   - Types (string, integer, boolean, object, array)
   - Required fields
   - Defaults
   - Descriptions

### Example Mapping Process
|
||||
|
||||
```python
|
||||
# From retrieval.py DocumentRagRequestTranslator
|
||||
def to_pulsar(self, data: Dict[str, Any]) -> DocumentRagQuery:
|
||||
return DocumentRagQuery(
|
||||
query=data["query"], # required string
|
||||
user=data.get("user", "trustgraph"), # optional string, default "trustgraph"
|
||||
collection=data.get("collection", "default"), # optional string, default "default"
|
||||
doc_limit=int(data.get("doc-limit", 20)), # optional integer, default 20
|
||||
streaming=data.get("streaming", False) # optional boolean, default false
|
||||
)
|
||||
```
|
||||
|
||||
Maps to:
|
||||
|
||||
```yaml
|
||||
# components/schemas/ai-services/DocumentRagRequest.yaml
|
||||
type: object
|
||||
required:
|
||||
- query
|
||||
properties:
|
||||
query:
|
||||
type: string
|
||||
description: Search query
|
||||
user:
|
||||
type: string
|
||||
default: trustgraph
|
||||
collection:
|
||||
type: string
|
||||
default: default
|
||||
doc-limit:
|
||||
type: integer
|
||||
default: 20
|
||||
description: Maximum number of documents to retrieve
|
||||
streaming:
|
||||
type: boolean
|
||||
default: false
|
||||
description: Enable streaming responses
|
||||
```
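
To see the schema's required field and defaults in action, the sketch below normalizes an incoming request body the same way the translator excerpt does. `normalize_document_rag_request` is a hypothetical helper written for this illustration; it returns a plain dict rather than a `DocumentRagQuery` so that it runs without the TrustGraph packages.

```python
from typing import Any, Dict


def normalize_document_rag_request(data: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the required-field check and defaults described by the schema."""
    if "query" not in data:
        raise ValueError("missing required field: query")
    if not isinstance(data["query"], str):
        raise TypeError("query must be a string")
    return {
        "query": data["query"],
        "user": data.get("user", "trustgraph"),
        "collection": data.get("collection", "default"),
        # JSON fields are kebab-case, so the limit arrives as "doc-limit"
        "doc-limit": int(data.get("doc-limit", 20)),
        "streaming": bool(data.get("streaming", False)),
    }


print(normalize_document_rag_request({"query": "What is a Cornish pasty?"}))
```

Only `query` is mandatory; every other field falls back to the default declared in the YAML schema.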
|
||||
|
||||
## Streaming Responses

Services that support streaming return multiple responses with an `end_of_stream` flag:
- `agent`, `text-completion`, `prompt`
- `document-rag`, `graph-rag`

Document this pattern in each service's response schema.
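
On the client side, the streaming pattern amounts to reading responses until one carries `end_of_stream`. The sketch below assumes each response is a dict whose text lives in a `response` field; that field name is an assumption made for illustration and may differ per service.

```python
from typing import Dict, Iterable


def collect_stream(messages: Iterable[Dict], text_field: str = "response") -> str:
    """Concatenate streamed chunks until a message sets end_of_stream."""
    parts = []
    for msg in messages:
        if msg.get("error"):
            raise RuntimeError(f"service error: {msg['error']}")
        parts.append(msg.get(text_field, ""))
        if msg.get("end_of_stream"):
            break  # final chunk received; ignore anything after it
    return "".join(parts)


chunks = [
    {"response": "Cornish ", "end_of_stream": False},
    {"response": "pasty", "end_of_stream": True},
]
print(collect_stream(chunks))
```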
|
||||
|
||||
## Error Responses

All services can return:
|
||||
```yaml
|
||||
error:
|
||||
oneOf:
|
||||
- type: string
|
||||
- $ref: '#/components/schemas/ErrorObject'
|
||||
```
|
||||
|
||||
Where `ErrorObject` is:
|
||||
```yaml
|
||||
type: object
|
||||
properties:
|
||||
type:
|
||||
type: string
|
||||
message:
|
||||
type: string
|
||||
```
|
||||
|
||||
## References

- Translators: `trustgraph-base/trustgraph/messaging/translators/`
- Dispatcher mapping: `trustgraph-flow/trustgraph/gateway/dispatch/manager.py`
- Endpoint routing: `trustgraph-flow/trustgraph/gateway/endpoint/manager.py`
- Service summary: `API_SERVICES_SUMMARY.md`
|
||||
965
docs/tech-specs/sw/pubsub.sw.md
Normal file
|
|
@ -0,0 +1,965 @@
|
|||
---
layout: default
title: "Pub/Sub Infrastructure"
parent: "Swahili (Beta)"
---

# Pub/Sub Infrastructure

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Overview

This document catalogs every connection between the TrustGraph codebase and the pub/sub infrastructure. The system is currently hardcoded to use Apache Pulsar. This analysis identifies all integration points to inform a future refactoring toward a configurable pub/sub abstraction.
|
||||
|
||||
## Current State: Pulsar Integration Points

### 1. Direct Pulsar Client Usage

**Location:** `trustgraph-flow/trustgraph/gateway/service.py`

The API gateway imports and instantiates the Pulsar client directly:

- **Line 20:** `import pulsar`
- **Lines 54-61:** Direct instantiation of `pulsar.Client()` with `pulsar.AuthenticationToken()` where needed
- **Lines 33-35:** Default Pulsar host configuration from environment variables
- **Lines 178-192:** CLI arguments for `--pulsar-host`, `--pulsar-api-key`, and `--pulsar-listener`
- **Lines 78, 124:** Passes `pulsar_client` to `ConfigReceiver` and `DispatcherManager`

This is the only location that instantiates a Pulsar client directly, outside the abstraction layer.
|
||||
|
||||
### 2. Base Processor Framework

**Location:** `trustgraph-base/trustgraph/base/async_processor.py`

The base class for all processors provides Pulsar connectivity:

- **Line 9:** `import _pulsar` (for exception handling)
- **Line 18:** `from . pubsub import PulsarClient`
- **Line 38:** Creates `pulsar_client_object = PulsarClient(**params)`
- **Lines 104-108:** Properties that expose `pulsar_host` and `pulsar_client`
- **Line 250:** Static method `add_args()` calls `PulsarClient.add_args(parser)` for CLI arguments
- **Lines 223-225:** Exception handling for `_pulsar.Interrupted`

All processors inherit from `AsyncProcessor`, which makes this the central integration point.
|
||||
|
||||
### 3. Consumer Abstraction

**Location:** `trustgraph-base/trustgraph/base/consumer.py`

Consumes messages from queues and invokes handler functions:

**Pulsar imports:**
- **Line 12:** `from pulsar.schema import JsonSchema`
- **Line 13:** `import pulsar`
- **Line 14:** `import _pulsar`

**Pulsar-specific usage:**
- **Lines 100, 102:** `pulsar.InitialPosition.Earliest` / `pulsar.InitialPosition.Latest`
- **Line 108:** `JsonSchema(self.schema)` wrapper
- **Line 110:** `pulsar.ConsumerType.Shared`
- **Lines 104-111:** `self.client.subscribe()` with Pulsar-specific parameters
- **Lines 143, 150, 65:** `consumer.unsubscribe()` and `consumer.close()` methods
- **Line 162:** `_pulsar.Timeout` exception
- **Lines 182, 205, 232:** `consumer.acknowledge()` / `consumer.negative_acknowledge()`

**Spec file:** `trustgraph-base/trustgraph/base/consumer_spec.py`
- **Line 22:** References `processor.pulsar_client`
|
||||
|
||||
### 4. Producer Abstraction

**Location:** `trustgraph-base/trustgraph/base/producer.py`

Sends messages to queues:

**Pulsar imports:**
- **Line 2:** `from pulsar.schema import JsonSchema`

**Pulsar-specific usage:**
- **Line 49:** `JsonSchema(self.schema)` wrapper
- **Lines 47-51:** `self.client.create_producer()` with Pulsar-specific parameters (topic, schema, chunking_enabled)
- **Lines 31, 76:** `producer.close()` method
- **Lines 64-65:** `producer.send()` with message and properties

**Spec file:** `trustgraph-base/trustgraph/base/producer_spec.py`
- **Line 18:** References `processor.pulsar_client`
|
||||
|
||||
### 5. Publisher Abstraction

**Location:** `trustgraph-base/trustgraph/base/publisher.py`

Asynchronous message publishing with queue buffering:

**Pulsar imports:**
- **Line 2:** `from pulsar.schema import JsonSchema`
- **Line 6:** `import pulsar`

**Pulsar-specific usage:**
- **Line 52:** `JsonSchema(self.schema)` wrapper
- **Lines 50-54:** `self.client.create_producer()` with Pulsar-specific parameters
- **Lines 101, 103:** `producer.send()` with message and optional properties
- **Lines 106-107:** `producer.flush()` and `producer.close()` methods
|
||||
|
||||
### 6. Subscriber Abstraction

**Location:** `trustgraph-base/trustgraph/base/subscriber.py`

Distributes messages from queues to multiple recipients:

**Pulsar imports:**
- **Line 6:** `from pulsar.schema import JsonSchema`
- **Line 8:** `import _pulsar`

**Pulsar-specific usage:**
- **Line 55:** `JsonSchema(self.schema)` wrapper
- **Line 57:** `self.client.subscribe(**subscribe_args)`
- **Lines 101, 136, 160, 167-172:** Pulsar exceptions: `_pulsar.Timeout`, `_pulsar.InvalidConfiguration`, `_pulsar.AlreadyClosed`
- **Lines 159, 166, 170:** Consumer methods: `negative_acknowledge()`, `unsubscribe()`, `close()`
- **Lines 247, 251:** Message acknowledgment: `acknowledge()`, `negative_acknowledge()`

**Spec file:** `trustgraph-base/trustgraph/base/subscriber_spec.py`
- **Line 19:** References `processor.pulsar_client`
|
||||
|
||||
### 7. Schema System (Heart of Darkness)

**Location:** `trustgraph-base/trustgraph/schema/`

Every message schema in the system is defined using Pulsar's schema framework.

**Core primitives:** `schema/core/primitives.py`
- **Line 2:** `from pulsar.schema import Record, String, Boolean, Array, Integer`
- All schemas inherit from Pulsar's `Record` base class
- All field types are Pulsar types: `String()`, `Integer()`, `Boolean()`, `Array()`, `Map()`, `Double()`

**Example schemas:**
- `schema/services/llm.py` (Line 2): `from pulsar.schema import Record, String, Array, Double, Integer, Boolean`
- `schema/services/config.py` (Line 2): `from pulsar.schema import Record, Bytes, String, Boolean, Array, Map, Integer`

**Topic naming:** `schema/core/topic.py`
- **Lines 2-3:** Topic format: `{kind}://{tenant}/{namespace}/{topic}`
- This URI structure is Pulsar-specific (e.g., `persistent://tg/flow/config`)

**Impact:**
- All request/response message definitions throughout the codebase use Pulsar schemas
- This includes the services for: config, flow, llm, prompt, query, storage, agent, collection, diagnosis, library, lookup, nlp_query, objects_query, retrieval, structured_query
- Schema definitions are imported and used extensively across all processors and services
|
||||
|
||||
## Summary

### Pulsar Dependencies by Category

1. **Client instantiation:**
   - Direct: `gateway/service.py`
   - Abstracted: `async_processor.py` → `pubsub.py` (PulsarClient)

2. **Message transport:**
   - Consumer: `consumer.py`, `consumer_spec.py`
   - Producer: `producer.py`, `producer_spec.py`
   - Publisher: `publisher.py`
   - Subscriber: `subscriber.py`, `subscriber_spec.py`

3. **Schema system:**
   - Base types: `schema/core/primitives.py`
   - All service schemas: `schema/services/*.py`
   - Topic naming: `schema/core/topic.py`

4. **Pulsar-specific concepts required:**
   - Topic-based messaging
   - Schema system (Record, field types)
   - Shared subscriptions
   - Message acknowledgment (positive/negative)
   - Consumer positioning (earliest/latest)
   - Message properties
   - Initial positions and consumer types
   - Chunking support
   - Persistent vs. non-persistent topics
|
||||
|
||||
### Refactoring Challenges

The good news: the abstraction layer (Consumer, Producer, Publisher, Subscriber) cleanly encapsulates most Pulsar interactions.

The challenges:
1. **Schema system pervasiveness:** Every message definition uses `pulsar.schema.Record` and Pulsar field types
2. **Pulsar-specific enums:** `InitialPosition`, `ConsumerType`
3. **Pulsar exceptions:** `_pulsar.Timeout`, `_pulsar.Interrupted`, `_pulsar.InvalidConfiguration`, `_pulsar.AlreadyClosed`
4. **Method signatures:** `acknowledge()`, `negative_acknowledge()`, `subscribe()`, `create_producer()`, etc.
5. **Topic URI format:** Pulsar's `kind://tenant/namespace/topic` structure

### Next Steps

To make the pub/sub infrastructure configurable, we need to:

1. Create an abstraction interface for the client/schema system
2. Abstract the Pulsar-specific enums and exceptions
3. Create schema wrappers or alternative schema definitions
4. Implement the interface for alternative systems (Kafka, RabbitMQ, Redis Streams, etc.)
5. Update `pubsub.py` to be configurable and to support multiple backends
6. Provide a migration path for existing deployments
|
||||
|
||||
## Draft Approach 1: Adapter Pattern with Schema Translation Layer

### Key Insight
The schema system is the foundation of this approach.

**1. Keep Pulsar schemas as the internal representation**
- Don't rewrite all the schema definitions
- Schemas remain `pulsar.schema.Record` internally
- Use adapters to translate at the boundary between our code and the pub/sub backend

**2. Create a pub/sub abstraction layer:**
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────┐
|
||||
│ Existing Code (unchanged) │
|
||||
│ - Uses Pulsar schemas internally │
|
||||
│ - Consumer/Producer/Publisher │
|
||||
└──────────────┬──────────────────────┘
|
||||
│
|
||||
┌──────────────┴──────────────────────┐
|
||||
│ PubSubFactory (configurable) │
|
||||
│ - Creates backend-specific client │
|
||||
└──────────────┬──────────────────────┘
|
||||
│
|
||||
┌──────┴──────┐
|
||||
│ │
|
||||
┌───────▼─────┐ ┌────▼─────────┐
|
||||
│ PulsarAdapter│ │ KafkaAdapter │ etc...
|
||||
│ (passthrough)│ │ (translates) │
|
||||
└──────────────┘ └──────────────┘
|
||||
```
|
||||
|
||||
**3. Define abstract interfaces:**
- `PubSubClient` - client connection
- `PubSubProducer` - sending messages
- `PubSubConsumer` - receiving messages
- `SchemaAdapter` - translating Pulsar schemas to/from JSON or backend-specific formats

**4. Implementation details:**

For the **Pulsar adapter**: Near passthrough, minimal translation.

For **other backends** (Kafka, RabbitMQ, etc.):
- Serialize Pulsar Record objects to JSON/bytes
- Map concepts such as:
  - `InitialPosition.Earliest/Latest` → Kafka's `auto.offset.reset`
  - `acknowledge()` → Kafka's commit
  - `negative_acknowledge()` → re-queue or DLQ pattern
  - Topic URIs → backend-specific topic names
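
One way the abstract interfaces could look is sketched below. The class names `PubSubClient`, `PubSubProducer`, and `PubSubConsumer` come from the list above, but every method signature is an assumption, and `InMemoryClient` is a hypothetical toy backend included only to show that the interfaces can be satisfied without Pulsar. Its `negative_acknowledge()` follows the re-queue option mentioned above.

```python
from abc import ABC, abstractmethod
from collections import defaultdict, deque
from typing import Any, Deque, Dict


class PubSubProducer(ABC):
    @abstractmethod
    def send(self, message: Any) -> None: ...


class PubSubConsumer(ABC):
    @abstractmethod
    def receive(self) -> Any: ...

    @abstractmethod
    def acknowledge(self, message: Any) -> None: ...

    @abstractmethod
    def negative_acknowledge(self, message: Any) -> None: ...


class PubSubClient(ABC):
    @abstractmethod
    def create_producer(self, topic: str) -> PubSubProducer: ...

    @abstractmethod
    def subscribe(self, topic: str) -> PubSubConsumer: ...


class _MemoryProducer(PubSubProducer):
    def __init__(self, queue: Deque[Any]) -> None:
        self._queue = queue

    def send(self, message: Any) -> None:
        self._queue.append(message)


class _MemoryConsumer(PubSubConsumer):
    def __init__(self, queue: Deque[Any]) -> None:
        self._queue = queue

    def receive(self) -> Any:
        return self._queue.popleft()  # raises IndexError when the queue is empty

    def acknowledge(self, message: Any) -> None:
        pass  # the message was already removed by receive()

    def negative_acknowledge(self, message: Any) -> None:
        self._queue.appendleft(message)  # re-queue for redelivery


class InMemoryClient(PubSubClient):
    """Toy backend: one in-process queue per topic."""

    def __init__(self) -> None:
        self._topics: Dict[str, Deque[Any]] = defaultdict(deque)

    def create_producer(self, topic: str) -> PubSubProducer:
        return _MemoryProducer(self._topics[topic])

    def subscribe(self, topic: str) -> PubSubConsumer:
        return _MemoryConsumer(self._topics[topic])


client = InMemoryClient()
producer = client.create_producer("demo")
consumer = client.subscribe("demo")
producer.send({"prompt": "Hello world"})
first = consumer.receive()
consumer.negative_acknowledge(first)  # simulate a failed handler
second = consumer.receive()           # the same message is delivered again
consumer.acknowledge(second)
```

A real adapter would also need to carry schema information and translate backend exceptions, which this sketch leaves out.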
|
||||
|
||||
### Analysis

**Pros:**
- ✅ Minimal code changes to existing services
- ✅ Schemas stay as they are (no massive rewrite)
- ✅ Gradual migration path
- ✅ Pulsar users see no difference
- ✅ New backends are added via adapters

**Cons:**
- ⚠️ Still carries a Pulsar dependency (for schema definitions)
- ⚠️ Some impedance mismatch when translating concepts

### Alternative Consideration

Create a **TrustGraph schema system** that does not depend on any pub/sub backend (using dataclasses or Pydantic), then generate Pulsar/Kafka/etc. schemas from it. This requires rewriting every schema file and could introduce breaking changes.

### Recommendation for Draft 1

Start with the **adapter approach** because:
1. It is pragmatic - it works with the existing code
2. It proves the concept with minimal risk
3. It can evolve into a native schema system later if needed
4. It is configuration-driven: a single environment variable switches backends
|
||||
|
||||
## Draft Approach 2: Backend-Agnostic Schema System with Dataclasses

### Core Concept

Use Python **dataclasses** as the neutral schema definition format. Each pub/sub backend provides its own serialization/deserialization for dataclasses, which removes the need for Pulsar schemas to remain in the codebase.

### Schema Polymorphism at the Factory Level

Instead of translating Pulsar schemas, **each backend provides its own schema handling** that works with standard Python dataclasses.

### Publisher Flow
|
||||
|
||||
```python
|
||||
# 1. Get the configured backend from factory
|
||||
pubsub = get_pubsub() # Returns PulsarBackend, MQTTBackend, etc.
|
||||
|
||||
# 2. Get schema class from the backend
|
||||
# (Can be imported directly - backend-agnostic)
|
||||
from trustgraph.schema.services.llm import TextCompletionRequest
|
||||
|
||||
# 3. Create a producer/publisher for a specific topic
|
||||
producer = pubsub.create_producer(
|
||||
topic="text-completion-requests",
|
||||
schema=TextCompletionRequest # Tells backend what schema to use
|
||||
)
|
||||
|
||||
# 4. Create message instances (same API regardless of backend)
|
||||
request = TextCompletionRequest(
|
||||
system="You are helpful",
|
||||
prompt="Hello world",
|
||||
streaming=False
|
||||
)
|
||||
|
||||
# 5. Send the message
|
||||
producer.send(request) # Backend serializes appropriately
|
||||
```
|
||||
|
||||
### Consumer Flow
|
||||
|
||||
```python
|
||||
# 1. Get the configured backend
|
||||
pubsub = get_pubsub()
|
||||
|
||||
# 2. Create a consumer
|
||||
consumer = pubsub.subscribe(
|
||||
topic="text-completion-requests",
|
||||
schema=TextCompletionRequest # Tells backend how to deserialize
|
||||
)
|
||||
|
||||
# 3. Receive and deserialize
|
||||
msg = consumer.receive()
|
||||
request = msg.value() # Returns TextCompletionRequest dataclass instance
|
||||
|
||||
# 4. Use the data (type-safe access)
|
||||
print(request.system) # "You are helpful"
|
||||
print(request.prompt) # "Hello world"
|
||||
print(request.streaming) # False
|
||||
```
|
||||
|
||||
### What Happens Behind the Scenes

**For the Pulsar backend:**
- `create_producer()` → creates a Pulsar producer with a JSON schema or a dynamically generated Record
- `send(request)` → serializes the dataclass to JSON/Pulsar format and sends it to Pulsar
- `receive()` → receives a Pulsar message and deserializes it back into the dataclass

**For the MQTT backend:**
- `create_producer()` → connects to the MQTT broker; no schema registration is needed
- `send(request)` → converts the dataclass to JSON and publishes it to the MQTT topic
- `receive()` → subscribes to the MQTT topic and deserializes the JSON back into the dataclass

**For the Kafka backend:**
- `create_producer()` → creates a Kafka producer and registers an Avro schema if needed
- `send(request)` → serializes the dataclass to Avro format and sends it to Kafka
- `receive()` → receives a Kafka message and deserializes the Avro back into the dataclass

### Key Design Points

1. **Schema object creation:** The dataclass instance (`TextCompletionRequest(...)`) is identical regardless of backend.
2. **The backend handles encoding:** Each backend knows how to serialize a dataclass into its own wire format.
3. **Schema is specified at creation:** When you create a producer or consumer, you specify the schema type.
4. **Type safety is preserved:** You get back a proper `TextCompletionRequest` object, not a dict.
5. **No backend leakage:** Application code never imports backend-specific libraries.

### Migration Example

**Current (Pulsar-specific):**
```python
# schema/services/llm.py
from pulsar.schema import Record, String, Boolean, Integer

class TextCompletionRequest(Record):
    system = String()
    prompt = String()
    streaming = Boolean()
```

**New (backend-agnostic):**
```python
# schema/services/llm.py
from dataclasses import dataclass

@dataclass
class TextCompletionRequest:
    system: str
    prompt: str
    streaming: bool = False
```

### Backend Integration

Each backend handles serialization/deserialization of dataclasses:

**Pulsar backend:**
- Dynamically generates `pulsar.schema.Record` classes from the dataclasses
- Or serializes dataclasses to JSON and uses Pulsar's JSON schema
- Maintains compatibility with existing Pulsar deployments

**MQTT/Redis backend:**
- Serializes dataclass instances directly to JSON
- Uses `dataclasses.asdict()` / `from_dict()`
- Lightweight, no schema registry needed

**Kafka backend:**
- Generates Avro schemas from the dataclass definitions
- Uses the Confluent schema registry
- Type-safe serialization with schema evolution support

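The JSON path for the MQTT/Redis backends can be sketched as follows. Note that `dataclasses.asdict()` is part of the standard library, but `from_dict()` is not; the helper below is a hypothetical implementation, assuming fields are primitives, `list[T]`, `dict[str, T]` of primitives, or nested dataclasses. The `to_json`/`from_json` names are also illustrative only.

```python
import dataclasses
import json
from dataclasses import dataclass
from typing import Any, get_args, get_origin, get_type_hints


@dataclass
class Value:
    value: str
    is_uri: bool


@dataclass
class Triple:
    s: Value
    p: Value
    o: Value


@dataclass
class GraphQuery:
    triples: list[Triple]
    metadata: dict[str, str]


def from_dict(cls: Any, data: Any) -> Any:
    """Hypothetical helper: rebuild a (possibly nested) dataclass from plain data."""
    if dataclasses.is_dataclass(cls):
        hints = get_type_hints(cls)
        return cls(**{k: from_dict(hints[k], v) for k, v in data.items()})
    if get_origin(cls) is list:
        (item_type,) = get_args(cls)
        return [from_dict(item_type, item) for item in data]
    return data  # primitives and dicts of primitives pass through unchanged


def to_json(obj: Any) -> bytes:
    # asdict() recurses into nested dataclasses, lists and dicts
    return json.dumps(dataclasses.asdict(obj)).encode("utf-8")


def from_json(cls: Any, payload: bytes) -> Any:
    return from_dict(cls, json.loads(payload.decode("utf-8")))


query = GraphQuery(
    triples=[Triple(Value("a", True), Value("b", True), Value("c", False))],
    metadata={"source": "example"},
)
round_tripped = from_json(GraphQuery, to_json(query))
```

Dataclasses compare by field value, so `round_tripped == query` holds when the round trip preserves every field.
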
### Architecture

```
┌─────────────────────────────────────┐
│          Application Code           │
│   - Uses dataclass schemas          │
│   - Backend-agnostic                │
└──────────────┬──────────────────────┘
               │
┌──────────────┴──────────────────────┐
│     PubSubFactory (configurable)    │
│   - get_pubsub() returns backend    │
└──────────────┬──────────────────────┘
               │
        ┌──────┴──────────┐
        │                 │
┌───────▼─────────┐  ┌────▼──────────────┐
│ PulsarBackend   │  │ MQTTBackend       │
│ - JSON schema   │  │ - JSON serialize  │
│ - or dynamic    │  │ - Simple queues   │
│   Record gen    │  │                   │
└─────────────────┘  └───────────────────┘
```

### Implementation Details

**1. Schema definitions:** Plain dataclasses with type hints
- `str`, `int`, `bool`, `float` for primitives
- `list[T]` for arrays
- `dict[str, T]` for maps
- Nested dataclasses for complex types

**2. Each backend provides:**
- Serializer: `dataclass → bytes/wire format`
- Deserializer: `bytes/wire format → dataclass`
- Schema registration (if needed, as with Pulsar/Kafka)

**3. Consumer/producer abstraction:**
- Already exists (consumer.py, producer.py)
- Update to use the backend's serialization
- Remove direct Pulsar imports

**4. Type mappings:**
- Pulsar `String()` → Python `str`
- Pulsar `Integer()` → Python `int`
- Pulsar `Boolean()` → Python `bool`
- Pulsar `Array(T)` → Python `list[T]`
- Pulsar `Map(K, V)` → Python `dict[K, V]`
- Pulsar `Double()` → Python `float`
- Pulsar `Bytes()` → Python `bytes`

### Migration Path

1. **Create dataclass versions** of all schemas in `trustgraph/schema/`
2. **Update the backend classes** (Consumer, Producer, Publisher, Subscriber) to use backend-provided serialization
3. **Implement PulsarBackend** with JSON schema or dynamic Record generation
4. **Test with Pulsar** to ensure backward compatibility with existing deployments
5. **Add new backends** (MQTT, Kafka, Redis, etc.) as needed
6. **Remove Pulsar imports** from the schema files

### Benefits

- ✅ **No pub/sub dependency** in schema definitions
- ✅ **Standard Python** - easy to understand, type-check, and document
- ✅ **Modern tooling** - works with mypy, IDE autocomplete, and linters
- ✅ **Backend-optimized** - each backend uses its native serialization
- ✅ **No translation overhead** - direct serialization, no adapters
- ✅ **Type safety** - real objects with proper types
- ✅ **Easy validation** - can use Pydantic if needed

### Challenges and Solutions

- **Challenge:** Pulsar's `Record` has runtime field validation
  - **Solution:** Use Pydantic dataclasses for validation if needed, or Python 3.10+ dataclass features with `__post_init__`

- **Challenge:** Some Pulsar-specific features (such as the `Bytes` type)
  - **Solution:** Map to the `bytes` type in the dataclass; the backend handles the encoding appropriately

- **Challenge:** Topic naming (`persistent://tenant/namespace/topic`)
  - **Solution:** Abstract topic names in the schema definitions; the backend converts them to the proper format

- **Challenge:** Schema evolution and versioning
  - **Solution:** Each backend handles this according to its capabilities (Pulsar schema versions, Kafka schema registry, etc.)

- **Challenge:** Nested complex types
  - **Solution:** Use nested dataclasses; backends serialize/deserialize them recursively

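The `__post_init__` option for runtime validation could look like the minimal sketch below. It assumes every field annotation is a plain class (`str`, `bool`, and so on); generic annotations such as `list[str]` are skipped rather than checked, and the error message format is an illustrative choice.

```python
from dataclasses import dataclass, fields


@dataclass
class TextCompletionRequest:
    system: str
    prompt: str
    streaming: bool = False

    def __post_init__(self) -> None:
        # Runs after the generated __init__; reject values of the wrong type.
        for f in fields(self):
            value = getattr(self, f.name)
            # Only plain classes can be passed to isinstance(); skip the rest.
            if isinstance(f.type, type) and not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, "
                    f"got {type(value).__name__}"
                )


def is_valid(**kwargs) -> bool:
    """Small helper for the usage example: True if construction succeeds."""
    try:
        TextCompletionRequest(**kwargs)
        return True
    except TypeError:
        return False
```

With this in place, `is_valid(system="s", prompt="p")` succeeds while `is_valid(system="s", prompt=5)` fails at construction time, which is roughly the behaviour Pulsar's `Record` provides today.
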
### Design Decisions

1. **Plain dataclasses or Pydantic?**
   - ✅ **Decision: Use plain Python dataclasses**
   - Simpler, no additional dependencies
   - Validation is not required in practice
   - Easier to understand and maintain

2. **Schema evolution:**
   - ✅ **Decision: No versioning mechanism needed**
   - Schemas are stable and long-lived
   - Updates typically add new fields (backward compatible)
   - Backends handle schema evolution according to their capabilities

3. **Backward compatibility:**
   - ✅ **Decision: Major version change, backward compatibility not required**
   - This will be a breaking change with migration instructions
   - A clean break allows a better design
   - A migration guide will be provided for existing deployments

4. **Nested types and complex structures:**
   - ✅ **Decision: Use nested dataclasses naturally**
   - Python dataclasses handle nesting well
   - `list[T]` for arrays, `dict[K, V]` for maps
   - Backends serialize/deserialize recursively
   - Example:

   ```python
   from dataclasses import dataclass

   @dataclass
   class Value:
       value: str
       is_uri: bool

   @dataclass
   class Triple:
       s: Value  # Nested dataclass
       p: Value
       o: Value

   @dataclass
   class GraphQuery:
       triples: list[Triple]  # Array of nested dataclasses
       metadata: dict[str, str]
   ```

5. **Default values and optional fields:**
   - ✅ **Decision: Mix of required fields, fields with defaults, and optional fields**
   - Required fields: no default value
   - Fields with defaults: always present, with a sensible default
   - Truly optional fields: `T | None = None`, omitted from serialization when `None`
   - Example:

   ```python
   @dataclass
   class TextCompletionRequest:
       system: str                   # Required, no default
       prompt: str                   # Required, no default
       streaming: bool = False       # Optional with default value
       metadata: dict | None = None  # Truly optional, can be absent
   ```

   **Important serialization semantics:**

   When `metadata = None`, the `metadata` field is not present:
   ```json
   {
       "system": "...",
       "prompt": "...",
       "streaming": false
   }
   ```

   When `metadata = {}` (explicitly empty), the field is present but empty:
   ```json
   {
       "system": "...",
       "prompt": "...",
       "streaming": false,
       "metadata": {}
   }
   ```

   **Key distinction:**
   - `None` → field absent from the JSON (not serialized)
   - Empty value (`{}`, `[]`, `""`) → field present with an empty value
   - This matters semantically: "not provided" versus "explicitly empty"
   - Serializers must omit `None` fields rather than encoding them as `null`

## Draft 3: Implementation Details

### Generic Queue Naming Format

Replace backend-specific queue names with a generic format that backends can map appropriately.

**Format:** `{qos}/{tenant}/{namespace}/{queue-name}`

Where:
- `qos`: Quality of Service level
  - `q0` = best-effort (fire and forget, no acknowledgment)
  - `q1` = at-least-once (requires acknowledgment)
  - `q2` = exactly-once (two-phase acknowledgment)
- `tenant`: Logical grouping for multi-tenancy
- `namespace`: Sub-grouping within the tenant
- `queue-name`: The actual queue/topic name

**Examples:**
```
q1/tg/flow/text-completion-requests
q2/tg/config/config-push
q0/tg/metrics/stats
```

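A backend will usually want to validate a generic queue name before mapping it. The sketch below shows one way to do that; `QueueName` and `parse_queue_name()` are hypothetical names introduced for illustration and are not part of the spec.

```python
from typing import NamedTuple

VALID_QOS = ("q0", "q1", "q2")


class QueueName(NamedTuple):
    qos: str
    tenant: str
    namespace: str
    queue: str


def parse_queue_name(generic: str) -> QueueName:
    """Split `qos/tenant/namespace/queue-name` and reject malformed input."""
    # maxsplit=3 keeps any further '/' characters inside the queue name
    parts = generic.split("/", 3)
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected qos/tenant/namespace/queue-name, got {generic!r}")
    if parts[0] not in VALID_QOS:
        raise ValueError(f"unknown QoS level {parts[0]!r}")
    return QueueName(*parts)


parsed = parse_queue_name("q1/tg/flow/text-completion-requests")
```

Failing early with a `ValueError` keeps a typo such as `q3/...` from silently turning into a non-persistent topic or a `KeyError` deep inside a backend.
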
### Backend Topic Mapping

Each backend maps the generic format to its native format:

**Pulsar backend:**
```python
def map_topic(self, generic_topic: str) -> str:
    # Parse: q1/tg/flow/text-completion-requests
    qos, tenant, namespace, queue = generic_topic.split('/', 3)

    # Map QoS to persistence
    persistence = 'persistent' if qos in ['q1', 'q2'] else 'non-persistent'

    # Return Pulsar URI: persistent://tg/flow/text-completion-requests
    return f"{persistence}://{tenant}/{namespace}/{queue}"
```

**MQTT backend:**
```python
def map_topic(self, generic_topic: str) -> tuple[str, int]:
    # Parse: q1/tg/flow/text-completion-requests
    qos, tenant, namespace, queue = generic_topic.split('/', 3)

    # Map QoS level
    qos_level = {'q0': 0, 'q1': 1, 'q2': 2}[qos]

    # Build MQTT topic including tenant/namespace for proper namespacing
    mqtt_topic = f"{tenant}/{namespace}/{queue}"

    return mqtt_topic, qos_level
```

### Updated Topic Helper Function

```python
# schema/core/topic.py
def topic(queue_name, qos='q1', tenant='tg', namespace='flow'):
    """
    Create a generic topic identifier that can be mapped by backends.

    Args:
        queue_name: The queue/topic name
        qos: Quality of service
            - 'q0' = best-effort (no ack)
            - 'q1' = at-least-once (ack required)
            - 'q2' = exactly-once (two-phase ack)
        tenant: Tenant identifier for multi-tenancy
        namespace: Namespace within tenant

    Returns:
        Generic topic string: qos/tenant/namespace/queue_name

    Examples:
        topic('my-queue')  # q1/tg/flow/my-queue
        topic('config', qos='q2', namespace='config')  # q2/tg/config/config
    """
    return f"{qos}/{tenant}/{namespace}/{queue_name}"
```

### Configuration and Initialization

**Command-Line Arguments and Environment Variables:**

```python
# In base/async_processor.py - add_args() method
# (requires `import os` at module level)
@staticmethod
def add_args(parser):
    # Pub/sub backend selection
    parser.add_argument(
        '--pubsub-backend',
        default=os.getenv('PUBSUB_BACKEND', 'pulsar'),
        choices=['pulsar', 'mqtt'],
        help='Pub/sub backend (default: pulsar, env: PUBSUB_BACKEND)'
    )

    # Pulsar-specific configuration
    parser.add_argument(
        '--pulsar-host',
        default=os.getenv('PULSAR_HOST', 'pulsar://localhost:6650'),
        help='Pulsar host (default: pulsar://localhost:6650, env: PULSAR_HOST)'
    )

    parser.add_argument(
        '--pulsar-api-key',
        default=os.getenv('PULSAR_API_KEY', None),
        help='Pulsar API key (env: PULSAR_API_KEY)'
    )

    parser.add_argument(
        '--pulsar-listener',
        default=os.getenv('PULSAR_LISTENER', None),
        help='Pulsar listener name (env: PULSAR_LISTENER)'
    )

    # MQTT-specific configuration
    parser.add_argument(
        '--mqtt-host',
        default=os.getenv('MQTT_HOST', 'localhost'),
        help='MQTT broker host (default: localhost, env: MQTT_HOST)'
    )

    parser.add_argument(
        '--mqtt-port',
        type=int,
        default=int(os.getenv('MQTT_PORT', '1883')),
        help='MQTT broker port (default: 1883, env: MQTT_PORT)'
    )

    parser.add_argument(
        '--mqtt-username',
        default=os.getenv('MQTT_USERNAME', None),
        help='MQTT username (env: MQTT_USERNAME)'
    )

    parser.add_argument(
        '--mqtt-password',
        default=os.getenv('MQTT_PASSWORD', None),
        help='MQTT password (env: MQTT_PASSWORD)'
    )
```

**Factory Function:**

```python
# In base/pubsub.py or base/pubsub_factory.py
def get_pubsub(**config) -> PubSubBackend:
    """
    Create and return a pub/sub backend based on configuration.

    Args:
        config: Configuration dict from command-line args
                Must include 'pubsub_backend' key

    Returns:
        Backend instance (PulsarBackend, MQTTBackend, etc.)
    """
    backend_type = config.get('pubsub_backend', 'pulsar')

    if backend_type == 'pulsar':
        return PulsarBackend(
            host=config.get('pulsar_host'),
            api_key=config.get('pulsar_api_key'),
            listener=config.get('pulsar_listener'),
        )
    elif backend_type == 'mqtt':
        return MQTTBackend(
            host=config.get('mqtt_host'),
            port=config.get('mqtt_port'),
            username=config.get('mqtt_username'),
            password=config.get('mqtt_password'),
        )
    else:
        raise ValueError(f"Unknown pub/sub backend: {backend_type}")
```

**Usage in AsyncProcessor:**

```python
# In async_processor.py
class AsyncProcessor:
    def __init__(self, **params):
        self.id = params.get("id")

        # Create backend from config (replaces PulsarClient)
        self.pubsub = get_pubsub(**params)

        # Rest of initialization...
```

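One detail worth spelling out is how the command-line flags reach the factory: `argparse` turns `--pubsub-backend` into the attribute `pubsub_backend`, so `vars(args)` already has the key names the factory looks up. The sketch below uses a trimmed-down `add_args()` with only three of the flags, and passes an explicit argument list instead of reading `sys.argv`; how the real processor builds `params` is an assumption here.

```python
import argparse
import os


def add_args(parser: argparse.ArgumentParser) -> None:
    # Trimmed-down copy of the arguments above, for illustration only
    parser.add_argument(
        '--pubsub-backend',
        default=os.getenv('PUBSUB_BACKEND', 'pulsar'),
        choices=['pulsar', 'mqtt'],
    )
    parser.add_argument('--mqtt-host', default=os.getenv('MQTT_HOST', 'localhost'))
    parser.add_argument(
        '--mqtt-port',
        type=int,
        default=int(os.getenv('MQTT_PORT', '1883')),
    )


parser = argparse.ArgumentParser()
add_args(parser)

# Explicit argv so the example does not depend on the real command line
args = parser.parse_args(['--pubsub-backend', 'mqtt', '--mqtt-port', '8883'])

# Dashes in flag names become underscores in the resulting dict keys
config = vars(args)
```

A processor could then call `get_pubsub(**config)`; explicit flags take precedence over environment variables because the environment only supplies the argparse defaults.
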
### Backend Interface

```python
from __future__ import annotations  # allow forward references in annotations

from typing import Any, Protocol


class PubSubBackend(Protocol):
    """Protocol defining the interface all pub/sub backends must implement."""

    def create_producer(self, topic: str, schema: type, **options) -> BackendProducer:
        """
        Create a producer for a topic.

        Args:
            topic: Generic topic format (qos/tenant/namespace/queue)
            schema: Dataclass type for messages
            options: Backend-specific options (e.g., chunking_enabled)

        Returns:
            Backend-specific producer instance
        """
        ...

    def create_consumer(
        self,
        topic: str,
        subscription: str,
        schema: type,
        initial_position: str = 'latest',
        consumer_type: str = 'shared',
        **options
    ) -> BackendConsumer:
        """
        Create a consumer for a topic.

        Args:
            topic: Generic topic format (qos/tenant/namespace/queue)
            subscription: Subscription/consumer group name
            schema: Dataclass type for messages
            initial_position: 'earliest' or 'latest' (MQTT may ignore)
            consumer_type: 'shared', 'exclusive', 'failover' (MQTT may ignore)
            options: Backend-specific options

        Returns:
            Backend-specific consumer instance
        """
        ...

    def close(self) -> None:
        """Close the backend connection."""
        ...
```

```python
class BackendProducer(Protocol):
    """Protocol for backend-specific producer."""

    # None instead of a mutable `{}` default, which would be shared across calls
    def send(self, message: Any, properties: dict | None = None) -> None:
        """Send a message (dataclass instance) with optional properties."""
        ...

    def flush(self) -> None:
        """Flush any buffered messages."""
        ...

    def close(self) -> None:
        """Close the producer."""
        ...
```

```python
class BackendConsumer(Protocol):
    """Protocol for backend-specific consumer."""

    def receive(self, timeout_millis: int = 2000) -> Message:
        """
        Receive a message from the topic.

        Raises:
            TimeoutError: If no message received within timeout
        """
        ...

    def acknowledge(self, message: Message) -> None:
        """Acknowledge successful processing of a message."""
        ...

    def negative_acknowledge(self, message: Message) -> None:
        """Negative acknowledge - triggers redelivery."""
        ...

    def unsubscribe(self) -> None:
        """Unsubscribe from the topic."""
        ...

    def close(self) -> None:
        """Close the consumer."""
        ...
```

```python
class Message(Protocol):
    """Protocol for a received message."""

    def value(self) -> Any:
        """Get the deserialized message (dataclass instance)."""
        ...

    def properties(self) -> dict:
        """Get message properties/metadata."""
        ...
```

### Refactoring Existing Classes

The existing `Consumer`, `Producer`, `Publisher`, and `Subscriber` classes remain largely unchanged:

**Current responsibilities (keep):**
- Async threading model and task groups
- Reconnection logic and retry handling
- Metrics collection
- Rate limiting
- Concurrency management

**Changes needed:**
- Remove direct Pulsar imports (`pulsar.schema`, `pulsar.InitialPosition`, etc.)
- Accept `BackendProducer`/`BackendConsumer` instead of a Pulsar client
- Delegate the actual send/receive operations to the backend instances
- Map generic concepts to backend calls

**Example refactoring:**

```python
# OLD - consumer.py
class Consumer:
    def __init__(self, client, topic, subscriber, schema, ...):
        self.client = client  # Direct Pulsar client
        # ...

    async def consumer_run(self):
        # Uses pulsar.InitialPosition, pulsar.ConsumerType
        self.consumer = self.client.subscribe(
            topic=self.topic,
            schema=JsonSchema(self.schema),
            initial_position=pulsar.InitialPosition.Earliest,
            consumer_type=pulsar.ConsumerType.Shared,
        )

# NEW - consumer.py
class Consumer:
    def __init__(self, backend_consumer, schema, ...):
        self.backend_consumer = backend_consumer  # Backend-specific consumer
        self.schema = schema
        # ...

    async def consumer_run(self):
        # Backend consumer already created with the right settings,
        # so just use it directly
        while self.running:
            try:
                msg = await asyncio.to_thread(
                    self.backend_consumer.receive,
                    timeout_millis=2000
                )
            except TimeoutError:
                continue  # No message within the timeout; poll again
            await self.handle_message(msg)
```

### Backend-Specific Behaviors

**Pulsar backend:**
- Maps `q0` → `non-persistent://`, `q1`/`q2` → `persistent://`
- Supports all consumer types (shared, exclusive, failover)
- Supports initial position (earliest/latest)
- Native message acknowledgment
- Schema registry support

**MQTT backend:**
- Maps `q0`/`q1`/`q2` → MQTT QoS levels 0/1/2
- Includes tenant/namespace in the topic path for namespacing
- Auto-generates client IDs from subscription names
- Ignores initial position (no message history in basic MQTT)
- Ignores consumer type (MQTT uses client IDs, not consumer groups)
- Simple publish/subscribe model

### Design Decisions Summary

1. ✅ **Generic queue naming:** `qos/tenant/namespace/queue-name` format
2. ✅ **QoS in the queue ID:** Determined by the queue definition, not by configuration
3. ✅ **Reconnection:** Handled by the Consumer/Producer classes, not by the backends
4. ✅ **MQTT topics:** Include tenant/namespace for proper namespacing
5. ✅ **Message history:** MQTT ignores the `initial_position` parameter (future enhancement)
6. ✅ **Client IDs:** The MQTT backend auto-generates them from the subscription name

### Future Enhancements

**MQTT message history:**
- Could add an optional persistence layer (e.g., retained messages, an external store)
- Would allow supporting `initial_position='earliest'`
- Not required for the initial implementation

1516
docs/tech-specs/sw/python-api-refactor.sw.md
Normal file
File diff suppressed because it is too large
271
docs/tech-specs/sw/query-time-explainability.sw.md
Normal file
@ -0,0 +1,271 @@
---
layout: default
title: "Query-Time Explainability"
parent: "Swahili (Beta)"
---

# Query-Time Explainability

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Status

Implemented

## Overview

This specification describes how GraphRAG records and communicates explainability data during query execution. The goal is full traceability: from the final answer, back through the selected edges, to the source documents.

Query-time explainability captures what the GraphRAG pipeline did during reasoning. It connects to extraction-time provenance, which records where the knowledge-graph facts came from.

## Terminology

| Term | Definition |
|------|------------|
| **Explainability** | Record of how a result was derived |
| **Session** | A single GraphRAG query execution |
| **Edge Selection** | LLM-driven selection of relevant edges, with reasoning |
| **Provenance Chain** | Path from edge → chunk → page → document |

## Architecture

### Explainability Flow

```
GraphRAG Query
    │
    ├─► Session Activity
    │       └─► Query text, timestamp
    │
    ├─► Retrieval Entity
    │       └─► All edges retrieved from subgraph
    │
    ├─► Selection Entity
    │       └─► Selected edges with LLM reasoning
    │           └─► Each edge links to extraction provenance
    │
    └─► Answer Entity
            └─► Reference to synthesized response (in librarian)
```

### Two-Stage GraphRAG Pipeline

1. **Edge Selection**: The LLM selects relevant edges from the subgraph and provides reasoning for each.
2. **Synthesis**: The LLM generates the answer from the selected edges only.

This separation enables explainability: we know exactly which edges contributed.

### Storage

- Explainability triples are stored in a configurable collection (default: `explainability`)
- Uses the PROV-O ontology for provenance relationships
- RDF-star reification for edge references
- Answer content is stored in the librarian service (not inline, as it is too large)

### Real-Time Streaming

Explainability events stream to the client as the query executes:

1. Session created → event emitted
2. Edges retrieved → event emitted
3. Edges selected with reasoning → event emitted
4. Answer synthesized → event emitted

The client receives `explain_id` and `explain_collection` to fetch the full details.

## URI Structure

All URIs use the `urn:trustgraph:` namespace with UUIDs:

| Entity | URI Pattern |
|--------|-------------|
| Session | `urn:trustgraph:session:{uuid}` |
| Retrieval | `urn:trustgraph:prov:retrieval:{uuid}` |
| Selection | `urn:trustgraph:prov:selection:{uuid}` |
| Answer | `urn:trustgraph:prov:answer:{uuid}` |
| Edge Selection | `urn:trustgraph:prov:edge:{uuid}:{index}` |

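The patterns in the table can be sketched as a handful of small builder functions. This is an illustration only: the function names are hypothetical and are not taken from `uris.py`, and the example assumes that one UUID is generated per session and reused across all of that session's URIs.

```python
import uuid


def session_uri(session_id: str) -> str:
    return f"urn:trustgraph:session:{session_id}"


def retrieval_uri(session_id: str) -> str:
    return f"urn:trustgraph:prov:retrieval:{session_id}"


def selection_uri(session_id: str) -> str:
    return f"urn:trustgraph:prov:selection:{session_id}"


def answer_uri(session_id: str) -> str:
    return f"urn:trustgraph:prov:answer:{session_id}"


def edge_selection_uri(session_id: str, index: int) -> str:
    # One URI per selected edge, numbered by its position in the selection
    return f"urn:trustgraph:prov:edge:{session_id}:{index}"


# Assumption: a single UUID identifies the whole query session
sid = str(uuid.uuid4())
uris = [
    session_uri(sid),
    retrieval_uri(sid),
    selection_uri(sid),
    answer_uri(sid),
    edge_selection_uri(sid, 0),
]
```

Sharing the identifier makes it easy to find every entity belonging to a session by matching on the trailing UUID.
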
## RDF Model (PROV-O)

### Session Activity

```turtle
<session-uri> a prov:Activity ;
    rdfs:label "GraphRAG query session" ;
    prov:startedAtTime "2024-01-15T10:30:00Z" ;
    tg:query "What was the War on Terror?" .
```

### Retrieval Entity

```turtle
<retrieval-uri> a prov:Entity ;
    rdfs:label "Retrieved edges" ;
    prov:wasGeneratedBy <session-uri> ;
    tg:edgeCount 50 .
```

### Selection Entity

```turtle
<selection-uri> a prov:Entity ;
    rdfs:label "Selected edges" ;
    prov:wasDerivedFrom <retrieval-uri> ;
    tg:selectedEdge <edge-sel-0> ;
    tg:selectedEdge <edge-sel-1> .

<edge-sel-0> tg:edge << <s> <p> <o> >> ;
    tg:reasoning "This edge establishes the key relationship..." .
```

### Answer Entity

```turtle
<answer-uri> a prov:Entity ;
    rdfs:label "GraphRAG answer" ;
    prov:wasDerivedFrom <selection-uri> ;
    tg:document <urn:trustgraph:answer:{uuid}> .
```

`tg:document` references the answer stored in the librarian service.

## Namespace Constants

Defined in `trustgraph-base/trustgraph/provenance/namespaces.py`:

| Constant | URI |
|----------|-----|
| `TG_QUERY` | `https://trustgraph.ai/ns/query` |
| `TG_EDGE_COUNT` | `https://trustgraph.ai/ns/edgeCount` |
| `TG_SELECTED_EDGE` | `https://trustgraph.ai/ns/selectedEdge` |
| `TG_EDGE` | `https://trustgraph.ai/ns/edge` |
| `TG_REASONING` | `https://trustgraph.ai/ns/reasoning` |
| `TG_CONTENT` | `https://trustgraph.ai/ns/content` |
| `TG_DOCUMENT` | `https://trustgraph.ai/ns/document` |

## GraphRagResponse Schema

```python
@dataclass
class GraphRagResponse:
    error: Error | None = None
    response: str = ""
    end_of_stream: bool = False
    explain_id: str | None = None
    explain_collection: str | None = None
    message_type: str = ""  # "chunk" or "explain"
    end_of_session: bool = False
```

### Message Types

| message_type | Purpose |
|--------------|---------|
| `chunk` | Response text (streaming or final) |
| `explain` | Explainability event with an IRI reference |

### Session Lifecycle

1. Multiple `explain` messages (session, retrieval, selection, answer)
2. Multiple `chunk` messages (streaming response)
3. Final `chunk` with `end_of_session=True`

## Edge Selection Format

The LLM returns JSONL with the selected edges:

```jsonl
{"id": "edge-hash-1", "reasoning": "This edge shows the key relationship..."}
{"id": "edge-hash-2", "reasoning": "Provides supporting evidence..."}
```

The `id` is a hash of `(labeled_s, labeled_p, labeled_o)` computed by `edge_id()`.

## URI Preservation

### The Problem

GraphRAG shows human-readable labels to the LLM, but explainability needs the original URIs for provenance tracing.

### Solution

`get_labelgraph()` returns both:
- `labeled_edges`: list of `(label_s, label_p, label_o)` for the LLM
- `uri_map`: dict mapping `edge_id(labels)` → `(uri_s, uri_p, uri_o)`

When storing explainability data, the URIs from `uri_map` are used.

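The round trip from labels to the LLM and back to URIs can be sketched as below. Several parts are assumptions made for illustration: the real `edge_id()` may use a different hash, `build_uri_map()` and `resolve_selection()` are hypothetical names rather than functions from `graph_rag.py`, and the example URIs are invented.

```python
import hashlib
import json

Edge = tuple[str, str, str]


def edge_id(s: str, p: str, o: str) -> str:
    """Stand-in hash over the labels; the real edge_id() may differ."""
    joined = "\x1f".join((s, p, o))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


def build_uri_map(
    edges: list[tuple[Edge, Edge]],
) -> tuple[list[Edge], dict[str, Edge]]:
    """Each input item pairs a labeled triple with its original URI triple."""
    labeled_edges = [labels for labels, _ in edges]
    uri_map = {edge_id(*labels): uris for labels, uris in edges}
    return labeled_edges, uri_map


def resolve_selection(jsonl: str, uri_map: dict[str, Edge]) -> list[tuple[Edge, str]]:
    """Parse the LLM's JSONL reply and map each selected id back to URIs."""
    selected = []
    for line in jsonl.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            item = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than failing the query
        if not isinstance(item, dict):
            continue
        uris = uri_map.get(item.get("id"))
        if uris is not None:  # ignore ids the LLM invented
            selected.append((uris, item.get("reasoning", "")))
    return selected


labels = ("Guantanamo", "definition", "A detention facility")
uris = ("http://example.org/guantanamo", "http://example.org/definition",
        "A detention facility")
labeled_edges, uri_map = build_uri_map([(labels, uris)])

reply = "\n".join([
    json.dumps({"id": edge_id(*labels), "reasoning": "Key relationship"}),
    json.dumps({"id": "not-a-real-id", "reasoning": "Hallucinated"}),
    "this line is not JSON",
])
resolved = resolve_selection(reply, uri_map)
```

Looking ids up in `uri_map` also acts as a filter: an id that was never shown to the LLM cannot resolve to an edge, so it never reaches the stored explainability data.
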
## Provenance Tracing

### From Edge to Source

Selected edges can be traced back to the source documents:

1. Query for the containing subgraph: `?subgraph tg:contains <<s p o>>`
2. Follow the `prov:wasDerivedFrom` chain to the root document
3. Each step in the chain: chunk → page → document

### Cassandra Quoted Triple Support

The Cassandra query service supports matching quoted triples:

```python
# In get_term_value():
elif term.type == TRIPLE:
    return serialize_triple(term.triple)
```

This enables queries such as:
```
?subgraph tg:contains <<http://example.org/s http://example.org/p "value">>
```

## CLI Usage

```bash
tg-invoke-graph-rag --explainable -q "What was the War on Terror?"
```

### Output Format

```
[session] urn:trustgraph:session:abc123

[retrieval] urn:trustgraph:prov:retrieval:abc123

[selection] urn:trustgraph:prov:selection:abc123
    Selected 12 edge(s)
      Edge: (Guantanamo, definition, A detention facility...)
        Reason: Directly connects Guantanamo to the War on Terror
        Source: Chunk 1 → Page 2 → Beyond the Vigilant State

[answer] urn:trustgraph:prov:answer:abc123

Based on the provided knowledge statements...
```

### Features

- Real-time explainability events during the query
- Label resolution for edge components via `rdfs:label`
- Source chain tracing via `prov:wasDerivedFrom`
- Label caching to avoid repeated queries

## Implemented Files

| File | Purpose |
|------|---------|
| `trustgraph-base/trustgraph/provenance/uris.py` | URI generators |
| `trustgraph-base/trustgraph/provenance/namespaces.py` | RDF namespace constants |
| `trustgraph-base/trustgraph/provenance/triples.py` | Triple builders |
| `trustgraph-base/trustgraph/schema/services/retrieval.py` | GraphRagResponse schema |
| `trustgraph-flow/trustgraph/retrieval/graph_rag/graph_rag.py` | Core GraphRAG with URI preservation |
| `trustgraph-flow/trustgraph/retrieval/graph_rag/rag.py` | Service and manager integration |
| `trustgraph-flow/trustgraph/query/triples/cassandra/service.py` | Quoted triple query support |
| `trustgraph-cli/trustgraph/cli/invoke_graph_rag.py` | CLI with explainability display |

## References

- PROV-O (W3C Provenance Ontology): https://www.w3.org/TR/prov-o/
- RDF-star: https://w3c.github.io/rdf-star/
- Extraction-time provenance: `docs/tech-specs/extraction-time-provenance.md`

296
docs/tech-specs/sw/rag-streaming-support.sw.md
Normal file
|
|
@ -0,0 +1,296 @@
|
|||
---
layout: default
title: "Streaming Support Technical Specification"
parent: "Swahili (Beta)"
---

# Streaming Support Technical Specification

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Overview

This specification describes adding streaming support to the GraphRAG and DocumentRAG services, enabling real-time, token-by-token responses for knowledge graph and document retrieval queries. It extends the existing streaming architecture already implemented for the LLM text-completion, prompt, and agent services.

## Goals

- **Consistent streaming experience**: Provide the same streaming experience across all TrustGraph services.
- **Minimal API changes**: Add streaming support with a single `streaming` flag, following existing patterns.
- **Backward compatibility**: Keep the current non-streaming behaviour as the default.
- **Reuse existing infrastructure**: Use the PromptClient streaming that is already implemented.
- **Gateway support**: Enable streaming through the websocket gateway to client applications.

## Background

Streaming services implemented so far:

- **LLM text-completion service**: Phase 1 - streaming from LLM providers.
- **Prompt service**: Phase 2 - streaming through prompt templates.
- **Agent service**: Phase 3-4 - streaming ReAct responses as action/observation/answer chunks.

Current limitations of the RAG services:

- GraphRAG and DocumentRAG support only blocking responses.
- Users must wait for the complete LLM response before seeing any output.
- Poor experience for long responses from knowledge graph or document queries.
- Inconsistent experience compared with other TrustGraph services.

This specification addresses these gaps by adding streaming support to GraphRAG and DocumentRAG. By enabling token-by-token responses, TrustGraph can:

- Provide a consistent streaming experience across all query types.
- Reduce the perceived latency of RAG queries.
- Give better progress feedback for long-running queries.
- Support real-time display in client applications.

## Technical Design

### Architecture

The RAG streaming implementation builds on existing infrastructure:

1. **PromptClient streaming** (already implemented)
   - `kg_prompt()` and `document_prompt()` already accept `streaming` and `chunk_callback` parameters.
   - They call `prompt()` internally with streaming support.
   - No changes to PromptClient are required.

   Module: `trustgraph-base/trustgraph/base/prompt_client.py`

2. **GraphRAG service** (needs to pass the streaming parameter through)
   - Add a `streaming` parameter to the `query()` method.
   - Pass the streaming flag and callbacks to `prompt_client.kg_prompt()`.
   - The GraphRagRequest schema needs a `streaming` field.

   Modules:
   - `trustgraph-flow/trustgraph/retrieval/graph_rag/graph_rag.py`
   - `trustgraph-flow/trustgraph/retrieval/graph_rag/rag.py` (Processor)
   - `trustgraph-base/trustgraph/schema/graph_rag.py` (request schema)
   - `trustgraph-flow/trustgraph/gateway/dispatch/graph_rag.py` (gateway)

3. **DocumentRAG service** (needs to pass the streaming parameter through)
   - Add a `streaming` parameter to the `query()` method.
   - Pass the streaming flag and callbacks to `prompt_client.document_prompt()`.
   - The DocumentRagRequest schema needs a `streaming` field.

   Modules:
   - `trustgraph-flow/trustgraph/retrieval/document_rag/document_rag.py`
   - `trustgraph-flow/trustgraph/retrieval/document_rag/rag.py` (Processor)
   - `trustgraph-base/trustgraph/schema/document_rag.py` (request schema)
   - `trustgraph-flow/trustgraph/gateway/dispatch/document_rag.py` (gateway)

### Data Flow

**Non-streaming (current)**:
```
Client → Gateway → RAG Service → PromptClient.kg_prompt(streaming=False)
                                   ↓
                                Prompt Service → LLM
                                   ↓
                                Complete response
                                   ↓
Client ← Gateway ← RAG Service ← Response
```

**Streaming (proposed)**:
```
Client → Gateway → RAG Service → PromptClient.kg_prompt(streaming=True, chunk_callback=cb)
                                   ↓
                                Prompt Service → LLM (streaming)
                                   ↓
                                Chunk → callback → RAG Response (chunk)
                                   ↓                       ↓
Client ← Gateway ← ────────────────────────────────── Response stream
```

### API

**GraphRAG changes**:

1. **GraphRag.query()** - add streaming parameters
```python
async def query(
    self, query, user, collection,
    verbose=False, streaming=False, chunk_callback=None  # NEW
):
    # ... existing entity/triple retrieval ...

    if streaming and chunk_callback:
        resp = await self.prompt_client.kg_prompt(
            query, kg,
            streaming=True,
            chunk_callback=chunk_callback
        )
    else:
        resp = await self.prompt_client.kg_prompt(query, kg)

    return resp
```

2. **GraphRagRequest schema** - add a streaming field
```python
class GraphRagRequest(Record):
    query = String()
    user = String()
    collection = String()
    streaming = Boolean()  # NEW
```

3. **GraphRagResponse schema** - add streaming fields (following the Agent pattern)
```python
class GraphRagResponse(Record):
    response = String()    # Legacy: complete response
    chunk = String()       # NEW: streaming chunk
    end_of_stream = Boolean()  # NEW: indicates last chunk
```

4. **Processor** - pass the stream through
```python
async def handle(self, msg):
    # ... existing code ...

    async def send_chunk(chunk):
        await self.respond(GraphRagResponse(
            chunk=chunk,
            end_of_stream=False,
            response=None
        ))

    if request.streaming:
        full_response = await self.rag.query(
            query=request.query,
            user=request.user,
            collection=request.collection,
            streaming=True,
            chunk_callback=send_chunk
        )
        # Send final message
        await self.respond(GraphRagResponse(
            chunk=None,
            end_of_stream=True,
            response=full_response
        ))
    else:
        # Existing non-streaming path
        response = await self.rag.query(...)
        await self.respond(GraphRagResponse(response=response))
```

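The message sequence produced by the Processor change can be checked with a small simulation. This is a sketch under stated assumptions: the dataclass `GraphRagResponse` and the `FakeRag` class are stand-ins written for this example, not the real schema or service, and the three fixed chunks replace real LLM output.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class GraphRagResponse:
    """Stand-in for the schema record, with the three proposed fields."""
    response: Optional[str] = None
    chunk: Optional[str] = None
    end_of_stream: bool = False


class FakeRag:
    """Hypothetical RAG service that emits three fixed chunks."""

    async def query(self, query, user, collection,
                    streaming=False, chunk_callback=None):
        parts = ["Graph ", "RAG ", "answer"]
        if streaming and chunk_callback:
            for part in parts:
                await chunk_callback(part)
        return "".join(parts)


async def handle(rag, respond, query, streaming):
    async def send_chunk(chunk):
        await respond(GraphRagResponse(chunk=chunk, end_of_stream=False))

    if streaming:
        full = await rag.query(query, "user", "default",
                               streaming=True, chunk_callback=send_chunk)
        # The final message closes the stream and carries the full text.
        await respond(GraphRagResponse(end_of_stream=True, response=full))
    else:
        full = await rag.query(query, "user", "default")
        await respond(GraphRagResponse(response=full))


def run(streaming):
    sent = []

    async def respond(message):
        sent.append(message)

    asyncio.run(handle(FakeRag(), respond, "q", streaming))
    return sent


streamed = run(streaming=True)
blocking = run(streaming=False)
```

In streaming mode the client receives one message per chunk followed by a closing message; in non-streaming mode it receives a single message, as before.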
**DocumentRAG changes**:

Same structure as GraphRAG:

1. Add `streaming` and `chunk_callback` parameters to `DocumentRag.query()`
2. Add a `streaming` field to `DocumentRagRequest`
3. Add `chunk` and `end_of_stream` fields to `DocumentRagResponse`
4. Update the Processor to handle streaming with callbacks

**Gateway changes**:

Both `graph_rag.py` and `document_rag.py` in gateway/dispatch need updating to forward streaming chunks to the websocket:

```python
async def handle(self, message, session, websocket):
    # ... existing code ...

    if request.streaming:
        async def recipient(resp):
            if resp.chunk:
                await websocket.send(json.dumps({
                    "id": message["id"],
                    "response": {"chunk": resp.chunk},
                    "complete": resp.end_of_stream
                }))
            return resp.end_of_stream

        await self.rag_client.request(request, recipient=recipient)
    else:
        # Existing non-streaming path
        resp = await self.rag_client.request(request)
        await websocket.send(...)
```

### Implementation Notes

**Implementation order**:

1. Add the schema fields (Request + Response for both RAG services)
2. Update the GraphRag.query() and DocumentRag.query() methods
3. Update the Processors to handle streaming
4. Update the gateway dispatch handlers
5. Add a `--no-streaming` flag to `tg-invoke-graph-rag` and `tg-invoke-document-rag` (streaming is enabled by default, following the agent CLI pattern)

**Callback pattern**:

Follow the same async callback pattern established by Agent streaming:

- The Processor defines an `async def send_chunk(chunk)` callback
- It passes the callback to the RAG service
- The RAG service passes the callback to PromptClient
- PromptClient invokes the callback for each LLM chunk
- The Processor sends a streaming response message for each chunk

**Error handling**:

- Errors during streaming should send an error response with `end_of_stream=True`
- Follow the existing error propagation patterns from Agent streaming

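The error-handling rule can be sketched as follows. This is only an illustration: the dict-shaped messages, the `stream_with_error_handling()` helper, and the failing producer are assumptions made for the example; the real services use schema records and the Agent streaming error conventions.

```python
import asyncio


async def stream_with_error_handling(produce, respond):
    """Forward chunks; if production fails, close the stream with an error."""

    async def send_chunk(chunk):
        await respond({"chunk": chunk, "end_of_stream": False})

    try:
        full = await produce(send_chunk)
    except Exception as exc:
        # The client must never be left waiting on a half-open stream.
        await respond({"error": str(exc), "end_of_stream": True})
        return
    await respond({"response": full, "end_of_stream": True})


async def failing_producer(send_chunk):
    """Hypothetical producer that fails after emitting one chunk."""
    await send_chunk("partial ")
    raise RuntimeError("LLM connection lost")


def collect(producer):
    sent = []

    async def respond(message):
        sent.append(message)

    asyncio.run(stream_with_error_handling(producer, respond))
    return sent


messages = collect(failing_producer)
```

Whatever happens mid-stream, the last message always has `end_of_stream` set, so a client loop that waits for that flag terminates.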
## Security Considerations

No new security considerations beyond the existing RAG services:

- Streaming responses use the same user/collection isolation
- No changes to authentication or authorization
- Chunk buffers do not expose sensitive data

## Performance Considerations

**Benefits**:

- Reduced perceived latency (the first chunks arrive quickly)
- Better user experience for long responses
- Lower memory usage (no need to buffer the complete response)

**Potential concerns**:

- More Pulsar messages for streaming responses
- Slightly higher CPU usage from chunking and callback overhead
- Mitigated by: streaming is opt-in; the default remains non-streaming

**Testing considerations**:

- Test with large knowledge graphs (many triples)
- Test with many retrieved documents
- Measure the overhead of streaming versus non-streaming

## Testing Strategy

**Unit tests**:

- Test GraphRag.query() with streaming=True/False
- Test DocumentRag.query() with streaming=True/False
- Mock PromptClient to verify callback invocation

**Integration tests**:

- Test the full GraphRAG streaming flow (similar to the existing agent streaming tests)
- Test the full DocumentRAG streaming flow
- Test Gateway streaming forwarding
- Test CLI streaming output

**Manual testing**:

- `tg-invoke-graph-rag -q "What is machine learning?"` (streaming by default)
- `tg-invoke-document-rag -q "Summarize the documents about AI"` (streaming by default)
- `tg-invoke-graph-rag --no-streaming -q "..."` (test non-streaming mode)
- Verify that incremental output appears in streaming mode

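A unit test along the lines of "mock PromptClient to verify callback invocation" might look like the sketch below. It uses `unittest.mock.AsyncMock` from the standard library; the trimmed `GraphRag` class and the `fake_kg_prompt()` helper are stand-ins written for this example, with retrieval omitted, and are not the real service code.

```python
import asyncio
from unittest.mock import AsyncMock


class GraphRag:
    """Trimmed stand-in: only the prompt call is modelled."""

    def __init__(self, prompt_client):
        self.prompt_client = prompt_client

    async def query(self, query, streaming=False, chunk_callback=None):
        kg = []  # entity/triple retrieval omitted in this sketch
        if streaming and chunk_callback:
            return await self.prompt_client.kg_prompt(
                query, kg, streaming=True, chunk_callback=chunk_callback
            )
        return await self.prompt_client.kg_prompt(query, kg)


async def fake_kg_prompt(query, kg, streaming=False, chunk_callback=None):
    """Behaves like a streaming prompt client with three fixed chunks."""
    if streaming and chunk_callback:
        for chunk in ("Hello", ", ", "world"):
            await chunk_callback(chunk)
    return "Hello, world"


def run_query(streaming):
    client = AsyncMock()
    client.kg_prompt.side_effect = fake_kg_prompt
    callback = AsyncMock()
    rag = GraphRag(client)
    result = asyncio.run(rag.query(
        "q", streaming=streaming,
        chunk_callback=callback if streaming else None,
    ))
    return result, callback, client


streamed_result, streamed_callback, _ = run_query(streaming=True)
plain_result, plain_callback, plain_client = run_query(streaming=False)
```

The same shape applies to `DocumentRag.query()` with `document_prompt()` in place of `kg_prompt()`.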
## Migration Plan

No migration is required:

- Streaming is opt-in via the `streaming` parameter (defaults to `False`)
- Existing clients continue to work unchanged
- New clients can opt in to streaming

## Timeline

Estimated implementation time: 4-6 hours

- Phase 1 (2 hours): GraphRAG streaming support
- Phase 2 (2 hours): DocumentRAG streaming support
- Phase 3 (1-2 hours): Gateway updates and CLI flags
- Testing: included in each phase

## Open Questions

- Should we add streaming support to the NLP Query service as well?
- Do we want to stream intermediate steps (e.g., "Retrieving entities...", "Querying graph...") or only the LLM output?
- Should GraphRAG/DocumentRAG responses include chunk metadata (e.g., chunk number, expected total)?

## References

- Existing implementation: `docs/tech-specs/streaming-llm-responses.md`
- Agent streaming: `trustgraph-flow/trustgraph/agent/react/agent_manager.py`
- PromptClient streaming: `trustgraph-base/trustgraph/base/prompt_client.py`

99
docs/tech-specs/sw/schema-refactoring-proposal.sw.md
Normal file
|
|
@ -0,0 +1,99 @@
|
|||
---
layout: default
title: "Schema Directory Refactoring Proposal"
parent: "Swahili (Beta)"
---

# Schema Directory Refactoring Proposal

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Current Issues

1. **Flat structure** - A single directory holding every schema makes the relationships hard to understand.
2. **Mixed concerns** - Core types, domain objects, and API contracts are all mixed together.
3. **Unclear naming** - Files such as "object.py", "types.py", and "topic.py" do not clearly indicate their purpose.
4. **No clear layering** - It is hard to see what depends on what.

## Proposed Structure

```
trustgraph-base/trustgraph/schema/
├── __init__.py
├── core/              # Core primitive types used everywhere
│   ├── __init__.py
│   ├── primitives.py  # Error, Value, Triple, Field, RowSchema
│   ├── metadata.py    # Metadata record
│   └── topic.py       # Topic utilities
│
├── knowledge/         # Knowledge domain models and extraction
│   ├── __init__.py
│   ├── graph.py       # EntityContext, EntityEmbeddings, Triples
│   ├── document.py    # Document, TextDocument, Chunk
│   ├── knowledge.py   # Knowledge extraction types
│   ├── embeddings.py  # All embedding-related types (moved from multiple files)
│   └── nlp.py         # Definition, Topic, Relationship, Fact types
│
└── services/          # Service request/response contracts
    ├── __init__.py
    ├── llm.py         # TextCompletion, Embeddings, Tool requests/responses
    ├── retrieval.py   # GraphRAG, DocumentRAG queries/responses
    ├── query.py       # GraphEmbeddingsRequest/Response, DocumentEmbeddingsRequest/Response
    ├── agent.py       # Agent requests/responses
    ├── flow.py        # Flow requests/responses
    ├── prompt.py      # Prompt service requests/responses
    ├── config.py      # Configuration service
    ├── library.py     # Librarian service
    └── lookup.py      # Lookup service
```

## Key Changes

1. **Hierarchical organization** - Clear separation between core types, knowledge models, and service contracts.
2. **Better naming**:
   - `types.py` → `core/primitives.py` (clearer purpose)
   - `object.py` → split into the appropriate files according to its actual contents
   - `documents.py` → `knowledge/document.py` (singular, consistent)
   - `models.py` → `services/llm.py` (clearer about what kind of models)
   - `prompt.py` → split: service parts go to `services/prompt.py`, data types to `knowledge/nlp.py`

3. **Logical grouping**:
   - All embedding types are consolidated in `knowledge/embeddings.py`
   - All LLM-related service contracts live in `services/llm.py`
   - Request/response pairs are clearly separated in the services directory
   - Knowledge extraction types are grouped with the other knowledge domain models

4. **Dependency clarity**:
   - Core types have no dependencies
   - Knowledge models depend only on core types
   - Service contracts may depend on both core types and knowledge models

## Migration Benefits

1. **Easier navigation** - Developers can quickly find what they need.
2. **Better modularity** - Clear boundaries between different concerns.
3. **Simpler imports** - More intuitive import paths.
4. **Future-proof** - Easy to add new knowledge types or services without clutter.

## Example Import Changes

```python
# Before
from trustgraph.schema import Error, Triple, GraphEmbeddings, TextCompletionRequest

# After
from trustgraph.schema.core import Error, Triple
from trustgraph.schema.knowledge import GraphEmbeddings
from trustgraph.schema.services import TextCompletionRequest
```

## Implementation Notes

1. Maintain backward compatibility by keeping the imports in the root `__init__.py`
2. Move files gradually, updating imports as needed
3. Consider adding a `legacy.py` that imports everything for the transition period
4. Update the documentation to reflect the new structure

|
||||
578
docs/tech-specs/sw/streaming-llm-responses.sw.md
Normal file
|
|
@ -0,0 +1,578 @@
|
|||
---
layout: default
title: "Streaming LLM Responses Technical Specification"
parent: "Swahili (Beta)"
---

# Streaming LLM Responses Technical Specification

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Overview

This specification describes the implementation of streaming support for LLM
responses in TrustGraph. Streaming delivers generated tokens in real time as the
LLM produces them, rather than waiting for the complete response to be generated.

This implementation supports the following use cases:

1. **Real-time user interfaces**: Stream tokens to the UI as they are generated,
   giving immediate visual feedback.
2. **Reduced time to first token**: Users see output immediately instead of
   waiting for generation to complete.
3. **Long response handling**: Handle very long outputs that might otherwise
   time out or exceed memory limits.
4. **Interactive applications**: Enable responsive chat and agent interfaces.

## Goals

- **Backward compatibility**: Existing non-streaming clients continue to work
  unchanged.
- **Consistent API design**: Streaming and non-streaming use the same schema
  with minimal differences.
- **Provider flexibility**: Support streaming where it is available, with a
  graceful fallback where it is not.
- **Phased rollout**: Implement incrementally to reduce risk.
- **End-to-end support**: Streaming from the LLM provider through to client
  applications via Pulsar, the Gateway API, and the Python API.

## Background

### Current Architecture

The current LLM text-completion flow works as follows:

1. The client sends a `TextCompletionRequest` with `system` and `prompt` fields.
2. The LLM service processes the request and waits for generation to complete.
3. A single `TextCompletionResponse` is returned with the complete `response`.

Current schema (`trustgraph-base/trustgraph/schema/services/llm.py`):

```python
class TextCompletionRequest(Record):
    system = String()
    prompt = String()

class TextCompletionResponse(Record):
    error = Error()
    response = String()
    in_token = Integer()
    out_token = Integer()
    model = String()
```

### Current Limitations

- **Latency**: Users must wait for generation to finish completely before seeing any output.
- **Timeout risk**: Long generations can exceed client timeout limits.
- **Poor UX**: The lack of feedback during generation gives the impression of a slow system.
- **Resource usage**: Complete responses must be buffered in memory.

This specification addresses these limitations by enabling incremental response
delivery while maintaining full backward compatibility.

## Technical Design

### Phase 1: Infrastructure

Phase 1 lays the foundation for streaming through changes to the schemas, APIs,
and CLI tools.

#### Schema Changes

##### LLM Schema (`trustgraph-base/trustgraph/schema/services/llm.py`)

**Request changes:**

```python
class TextCompletionRequest(Record):
    system = String()
    prompt = String()
    streaming = Boolean()  # NEW: Default false for backward compatibility
```

- `streaming`: when `true`, requests streaming delivery of the response.
- Default: `false` (existing behaviour is preserved).

**Response changes:**

```python
class TextCompletionResponse(Record):
    error = Error()
    response = String()
    in_token = Integer()
    out_token = Integer()
    model = String()
    end_of_stream = Boolean()  # NEW: Indicates final message
```

- `end_of_stream`: when `true`, indicates that this is the final (or only) response.
- For non-streaming requests: a single response with `end_of_stream=true`.
- For streaming requests: multiple responses, all with `end_of_stream=false`
  except the final one.

##### Prompt Schema (`trustgraph-base/trustgraph/schema/services/prompt.py`)

The prompt service wraps text completion, so it follows the same pattern:

**Request changes:**

```python
class PromptRequest(Record):
    id = String()
    terms = Map(String())
    streaming = Boolean()  # NEW: Default false
```

**Response changes:**

```python
class PromptResponse(Record):
    error = Error()
    text = String()
    object = String()
    end_of_stream = Boolean()  # NEW: Indicates final message
```

#### Gateway API Changes

The Gateway API must expose streaming capabilities to HTTP/WebSocket clients.

**REST API updates:**

- `POST /api/v1/text-completion`: accept a `streaming` parameter in the request body
- Response behaviour depends on the streaming flag:
  - `streaming=false`: a single JSON response (current behaviour)
  - `streaming=true`: a Server-Sent Events (SSE) stream or WebSocket messages

**Response format (streaming):**

Each streamed chunk follows the same schema structure:
```json
{
  "response": "partial text...",
  "end_of_stream": false,
  "model": "model-name"
}
```

Final chunk:
```json
{
  "response": "final text chunk",
  "end_of_stream": true,
  "in_token": 150,
  "out_token": 500,
  "model": "model-name"
}
```

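On the client side, the chunk format above can be reassembled with a few lines of code. The following is a minimal sketch that assumes the messages arrive as JSON strings in order; the `assemble_stream()` helper is hypothetical, and the transport (SSE or WebSocket) is left out.

```python
import json


def assemble_stream(messages):
    """Concatenate `response` fragments until `end_of_stream` is true."""
    parts = []
    for raw in messages:
        message = json.loads(raw)
        parts.append(message.get("response") or "")
        if message.get("end_of_stream"):
            # Token counts are only present on the final chunk.
            usage = {
                key: message.get(key)
                for key in ("in_token", "out_token", "model")
            }
            return "".join(parts), usage
    raise ValueError("stream ended without end_of_stream=true")


stream = [
    '{"response": "Hello, ", "end_of_stream": false, "model": "model-name"}',
    '{"response": "world", "end_of_stream": true, "in_token": 150, '
    '"out_token": 500, "model": "model-name"}',
]

text, usage = assemble_stream(stream)
```

Raising when the stream stops without a final chunk lets callers distinguish a truncated response from a complete one.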
#### Python API Changes

The Python client API must support both streaming and non-streaming modes
while maintaining backward compatibility.

**LlmClient updates** (`trustgraph-base/trustgraph/clients/llm_client.py`):

```python
class LlmClient(BaseClient):
    def request(self, system, prompt, timeout=300, streaming=False):
        """
        Non-streaming request (backward compatible).
        Returns complete response string.
        """
        # Existing behavior when streaming=False

    async def request_stream(self, system, prompt, timeout=300):
        """
        Streaming request.
        Yields response chunks as they arrive.
        """
        # New async generator method
```

**PromptClient updates** (`trustgraph-base/trustgraph/base/prompt_client.py`):

Same pattern, with a `streaming` parameter and an async generator variant.

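How a caller might consume the two entry points is sketched below. `FakeLlmClient` is a stand-in invented for this example: it yields two fixed chunks instead of talking to Pulsar, and its blocking `request()` simply drains the generator, which is one possible design rather than the documented one.

```python
import asyncio


class FakeLlmClient:
    """Hypothetical client exposing the two methods described above."""

    async def request_stream(self, system, prompt, timeout=300):
        for chunk in ("Streaming ", "works"):
            await asyncio.sleep(0)  # yield control, as real I/O would
            yield chunk

    def request(self, system, prompt, timeout=300):
        async def gather():
            stream = self.request_stream(system, prompt, timeout)
            return "".join([chunk async for chunk in stream])

        return asyncio.run(gather())


async def collect_chunks(client):
    received = []
    async for chunk in client.request_stream("You are helpful.", "Say hi"):
        received.append(chunk)  # a UI would render each chunk here
    return received


chunks = asyncio.run(collect_chunks(FakeLlmClient()))
complete = FakeLlmClient().request("You are helpful.", "Say hi")
```

Keeping `request()` synchronous and adding a separate async generator leaves existing callers untouched while letting new callers iterate over chunks.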
#### CLI Tool Changes

**tg-invoke-llm** (`trustgraph-cli/trustgraph/cli/invoke_llm.py`):

```
tg-invoke-llm [system] [prompt] [--no-streaming] [-u URL] [-f flow-id]
```

- Streaming is enabled by default for a better interactive experience.
- The `--no-streaming` flag disables streaming.
- When streaming: write tokens to stdout as they arrive.
- When not streaming: wait for the complete response, then print it.

**tg-invoke-prompt** (`trustgraph-cli/trustgraph/cli/invoke_prompt.py`):

```
tg-invoke-prompt [template-id] [var=value...] [--no-streaming] [-u URL] [-f flow-id]
```

Same pattern as `tg-invoke-llm`.

#### LLM Service Base Class Changes

**LlmService** (`trustgraph-base/trustgraph/base/llm_service.py`):

```python
class LlmService(FlowProcessor):
    async def on_request(self, msg, consumer, flow):
        request = msg.value()
        streaming = getattr(request, 'streaming', False)

        if streaming and self.supports_streaming():
            async for chunk in self.generate_content_stream(...):
                await self.send_response(chunk, end_of_stream=False)
            await self.send_response(final_chunk, end_of_stream=True)
        else:
            response = await self.generate_content(...)
            await self.send_response(response, end_of_stream=True)

    def supports_streaming(self):
        """Override in subclass to indicate streaming support."""
        return False

    async def generate_content_stream(self, system, prompt, model, temperature):
        """Override in subclass to implement streaming."""
        raise NotImplementedError()
```

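The fallback behaviour of the base class can be exercised with two toy providers. This is a self-contained sketch, not the real `LlmService`: `BlockingProvider`, `StreamingProvider`, and the tuple-based `send_response()` are stand-ins, and closing the stream with an empty final message is an assumption (the specification leaves the contents of the final chunk open).

```python
import asyncio


class BlockingProvider:
    """Provider without streaming support (compatibility mode)."""

    def supports_streaming(self):
        return False

    async def generate_content(self, prompt):
        return "complete answer"


class StreamingProvider(BlockingProvider):
    """Provider that overrides both hooks to stream."""

    def supports_streaming(self):
        return True

    async def generate_content_stream(self, prompt):
        for piece in ("complete ", "answer"):
            yield piece


async def on_request(provider, prompt, streaming, send_response):
    if streaming and provider.supports_streaming():
        async for piece in provider.generate_content_stream(prompt):
            await send_response(piece, end_of_stream=False)
        # Assumption: an empty final message closes the stream.
        await send_response("", end_of_stream=True)
    else:
        # Fallback: one complete response, flagged as final.
        text = await provider.generate_content(prompt)
        await send_response(text, end_of_stream=True)


def run(provider, streaming):
    sent = []

    async def send_response(text, end_of_stream):
        sent.append((text, end_of_stream))

    asyncio.run(on_request(provider, "hi", streaming, send_response))
    return sent


streamed = run(StreamingProvider(), streaming=True)
fallback = run(BlockingProvider(), streaming=True)
```

A client that asks for streaming from a provider without support still gets a valid, single-message stream, so it does not need provider-specific handling.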

---

### Phase 2: VertexAI Proof of Concept

Phase 2 implements streaming in a single provider (VertexAI) to validate the
infrastructure and enable end-to-end testing.

#### VertexAI Implementation

**Module:** `trustgraph-vertexai/trustgraph/model/text_completion/vertexai/llm.py`

**Changes:**

1. Override `supports_streaming()` to return `True`
2. Implement the `generate_content_stream()` async generator
3. Handle both Gemini and Claude models (via the VertexAI Anthropic API)

**Gemini streaming:**

```python
async def generate_content_stream(self, system, prompt, model, temperature):
    model_instance = self.get_model(model, temperature)
    response = model_instance.generate_content(
        [system, prompt],
        stream=True  # Enable streaming
    )
    for chunk in response:
        yield LlmChunk(
            text=chunk.text,
            in_token=None,  # Available only in final chunk
            out_token=None,
        )
    # Final chunk includes token counts from response.usage_metadata
```

**Claude (via VertexAI Anthropic) streaming:**

```python
async def generate_content_stream(self, system, prompt, model, temperature):
    with self.anthropic_client.messages.stream(...) as stream:
        for text in stream.text_stream:
            yield LlmChunk(text=text)
    # Token counts from stream.get_final_message()
```

#### Testing

- Unit tests for streaming response assembly
- Integration tests with VertexAI (Gemini and Claude)
- End-to-end tests: CLI -> Gateway -> Pulsar -> VertexAI -> back
- Backward compatibility tests: non-streaming requests still work

---

### Phase 3: All LLM Providers

Phase 3 extends streaming support to all LLM providers in the system.

#### Provider Implementation Status

Each provider must do one of the following:

1. **Full streaming support**: implement `generate_content_stream()`
2. **Compatibility mode**: handle the `end_of_stream` flag correctly
   (return a single response with `end_of_stream=true`)

| Provider | Package | Streaming Support |
|----------|---------|-------------------|
| OpenAI | trustgraph-flow | Full (native streaming API) |
| Claude/Anthropic | trustgraph-flow | Full (native streaming API) |
| Ollama | trustgraph-flow | Full (native streaming API) |
| Cohere | trustgraph-flow | Full (native streaming API) |
| Mistral | trustgraph-flow | Full (native streaming API) |
| Azure OpenAI | trustgraph-flow | Full (native streaming API) |
| Google AI Studio | trustgraph-flow | Full (native streaming API) |
| VertexAI | trustgraph-vertexai | Full (Phase 2) |
| Bedrock | trustgraph-bedrock | Full (native streaming API) |
| LM Studio | trustgraph-flow | Full (OpenAI-compatible) |
| LlamaFile | trustgraph-flow | Full (OpenAI-compatible) |
| vLLM | trustgraph-flow | Full (OpenAI-compatible) |
| TGI | trustgraph-flow | TBD |
| Azure | trustgraph-flow | TBD |

#### Implementation Pattern

For OpenAI-compatible providers (OpenAI, LM Studio, LlamaFile, vLLM):
|
||||
|
||||
```python
|
||||
async def generate_content_stream(self, system, prompt, model, temperature):
    response = await self.client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": prompt}
        ],
        temperature=temperature,
        stream=True
    )
    async for chunk in response:
        if chunk.choices[0].delta.content:
            yield LlmChunk(text=chunk.choices[0].delta.content)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Phase 4: Agent API

Phase 4 extends streaming to the Agent API. This is more complex because the Agent API is already inherently multi-message (thought → action → observation → repeat → final answer).

#### Current Agent Schema
|
||||
|
||||
```python
|
||||
class AgentStep(Record):
|
||||
thought = String()
|
||||
action = String()
|
||||
arguments = Map(String())
|
||||
observation = String()
|
||||
user = String()
|
||||
|
||||
class AgentRequest(Record):
|
||||
question = String()
|
||||
state = String()
|
||||
group = Array(String())
|
||||
history = Array(AgentStep())
|
||||
user = String()
|
||||
|
||||
class AgentResponse(Record):
|
||||
answer = String()
|
||||
error = Error()
|
||||
thought = String()
|
||||
observation = String()
|
||||
```
|
||||
|
||||
#### Proposed Agent Schema Changes

**Request Changes:**
|
||||
|
||||
```python
|
||||
class AgentRequest(Record):
|
||||
question = String()
|
||||
state = String()
|
||||
group = Array(String())
|
||||
history = Array(AgentStep())
|
||||
user = String()
|
||||
streaming = Boolean() # NEW: Default false
|
||||
```
|
||||
|
||||
**Response Changes:**

The agent produces several kinds of output during its reasoning cycle:
- Thoughts (reasoning)
- Actions (tool calls)
- Observations (tool results)
- Answer (final response)
- Errors

Because `chunk_type` identifies what kind of content is being sent, the separate `answer`, `error`, `thought`, and `observation` fields can be collapsed into a single `content` field:
|
||||
|
||||
```python
|
||||
class AgentResponse(Record):
|
||||
chunk_type = String() # "thought", "action", "observation", "answer", "error"
|
||||
content = String() # The actual content (interpretation depends on chunk_type)
|
||||
end_of_message = Boolean() # Current thought/action/observation/answer is complete
|
||||
end_of_dialog = Boolean() # Entire agent dialog is complete
|
||||
```
|
||||
|
||||
**Field Semantics:**

- `chunk_type`: Indicates what kind of content is in the `content` field
  - `"thought"`: Agent reasoning/thinking
  - `"action"`: Tool/action being invoked
  - `"observation"`: Result of the tool execution
  - `"answer"`: Final answer to the user's question
  - `"error"`: Error message

- `content`: The actual streamed content, interpreted according to `chunk_type`

- `end_of_message`: When `true`, the current chunk type is complete
  - Example: all tokens for the current thought have been sent
  - Lets clients know when to move on to the next stage

- `end_of_dialog`: When `true`, the entire agent interaction is complete
  - This is the final message in the stream
|
||||
|
||||
#### Agent Streaming Behaviour

When `streaming=true`:

1. **Thought streaming:**
   - Multiple chunks with `chunk_type="thought"`, `end_of_message=false`
   - The final thought chunk has `end_of_message=true`
2. **Action notification:**
   - A single chunk with `chunk_type="action"`, `end_of_message=true`
3. **Observation:**
   - Chunk(s) with `chunk_type="observation"`; the last one has `end_of_message=true`
4. **Repeat** steps 1-3 as the agent iterates
5. **Final answer:**
   - `chunk_type="answer"` with the final response in `content`
   - The last chunk has `end_of_message=true`, `end_of_dialog=true`
|
||||
|
||||
**Example Stream Sequence:**
|
||||
|
||||
```
|
||||
{chunk_type: "thought", content: "I need to", end_of_message: false, end_of_dialog: false}
|
||||
{chunk_type: "thought", content: " search for...", end_of_message: true, end_of_dialog: false}
|
||||
{chunk_type: "action", content: "search", end_of_message: true, end_of_dialog: false}
|
||||
{chunk_type: "observation", content: "Found: ...", end_of_message: true, end_of_dialog: false}
|
||||
{chunk_type: "thought", content: "Based on this", end_of_message: false, end_of_dialog: false}
|
||||
{chunk_type: "thought", content: " I can answer...", end_of_message: true, end_of_dialog: false}
|
||||
{chunk_type: "answer", content: "The answer is...", end_of_message: true, end_of_dialog: true}
|
||||
```
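
A client consuming this sequence could group chunks into whole messages using the two flags. The following is a minimal sketch: `AgentChunk` mirrors the proposed `AgentResponse` fields, while the `assemble()` helper is a hypothetical name introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class AgentChunk:
    # Mirrors the proposed AgentResponse fields
    chunk_type: str
    content: str
    end_of_message: bool
    end_of_dialog: bool


def assemble(chunks):
    """Group streamed chunks into complete (chunk_type, text) messages."""
    messages = []
    buffer = []
    for chunk in chunks:
        buffer.append(chunk.content)
        if chunk.end_of_message:
            # The current thought/action/observation/answer is complete
            messages.append((chunk.chunk_type, "".join(buffer)))
            buffer = []
        if chunk.end_of_dialog:
            # Nothing further is expected after the end of the dialog
            break
    return messages


stream = [
    AgentChunk("thought", "I need to", False, False),
    AgentChunk("thought", " search for...", True, False),
    AgentChunk("action", "search", True, False),
    AgentChunk("observation", "Found: ...", True, False),
    AgentChunk("answer", "The answer is...", True, True),
]
messages = assemble(stream)
```

Because an error chunk also carries `end_of_dialog=true`, the same loop terminates cleanly on failure; the caller only has to check whether the last message has type `"error"`.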
|
||||
|
||||
When `streaming=false`:
- Current behaviour is preserved
- A single response carries the complete answer
- `end_of_message=true`, `end_of_dialog=true`

#### Gateway and Python API

- Gateway: New SSE/WebSocket endpoint for agent streaming
- Python API: New `agent_stream()` async generator method
|
||||
|
||||
---
|
||||
|
||||
## Security Considerations

- **No new attack surface**: Streaming uses the same authentication/authorization
- **Rate limiting**: Apply per-token or per-chunk rate limits if needed
- **Connection handling**: Properly terminate streams when the client disconnects
- **Timeout management**: Streaming requests need appropriate timeout handling
|
||||
|
||||
## Performance Considerations

- **Memory**: Streaming reduces peak memory usage (no full-response buffering)
- **Latency**: Time-to-first-token is significantly reduced
- **Connection overhead**: SSE/WebSocket connections carry keep-alive overhead
- **Pulsar throughput**: Many small messages versus a single large message is a tradeoff
|
||||
|
||||
## Testing Strategy

### Unit Tests
- Schema serialization/deserialization with the new fields
- Backward compatibility (missing fields use defaults)
- Chunk assembly logic

### Integration Tests
- Each LLM provider's streaming implementation
- Gateway API streaming endpoints
- Python client streaming methods

### End-to-End Tests
- CLI tool streaming output
- Full flow: Client → Gateway → Pulsar → LLM → back
- Mixed streaming/non-streaming workloads

### Backward Compatibility Tests
- Existing clients work unmodified
- Non-streaming requests behave identically
|
||||
|
||||
## Migration Plan

### Phase 1: Infrastructure
- Deploy schema changes (backward compatible)
- Deploy Gateway API updates
- Deploy Python API updates
- Release CLI tool updates

### Phase 2: VertexAI
- Deploy the VertexAI streaming implementation
- Validate with tests

### Phase 3: All Providers
- Roll out provider updates incrementally
- Monitor for issues

### Phase 4: Agent API
- Deploy the agent schema changes
- Deploy the agent streaming implementation
- Update documentation
|
||||
|
||||
## Timeline

| Phase | Description | Dependencies |
|-------|-------------|--------------|
| Phase 1 | Infrastructure | None |
| Phase 2 | VertexAI Pilot | Phase 1 |
| Phase 3 | All Providers | Phase 2 |
| Phase 4 | Agent API | Phase 3 |
|
||||
|
||||
## Design Decisions

The following questions were raised and resolved during specification:

1. **Token Counts in Streaming**: Token counts are deltas, not running totals. Clients can sum them if needed. This matches how most providers report usage and simplifies the implementation.

2. **Error Handling in Streams**: If an error occurs, the `error` field is populated and no other fields are needed. An error is always the final communication: no subsequent messages are permitted or expected after an error. For LLM/Prompt streams, `end_of_stream=true`. For Agent streams, `chunk_type="error"` with `end_of_dialog=true`.

3. **Partial Response Recovery**: The messaging protocol (Pulsar) is resilient, so message-level retry is not needed. If a client loses track of the stream or disconnects, it must retry the full request from scratch.

4. **Prompt Service Streaming**: Streaming is supported only for text (`text`) responses, not structured (`object`) responses. The prompt service knows in advance whether the output will be JSON or text, based on the prompt template. If a streaming request is made for a JSON-output prompt, the service should either:
   - Return the complete JSON in a single response with `end_of_stream=true`, or
   - Reject the streaming request with an error
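
The first two decisions can be illustrated from the client side. The sketch below sums per-chunk token deltas and treats an error as the final message of the stream. `LlmStreamMessage` and `consume()` are hypothetical names chosen for this example and are not the actual TrustGraph schema classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LlmStreamMessage:
    # Hypothetical stand-in for one streamed LLM response message
    text: str = ""
    in_token: Optional[int] = None
    out_token: Optional[int] = None
    error: Optional[str] = None
    end_of_stream: bool = False


def consume(messages):
    """Return (text, in_tokens, out_tokens, error) for one stream."""
    parts, in_total, out_total = [], 0, 0
    for msg in messages:
        if msg.error is not None:
            # An error is always the last message; stop reading
            return "".join(parts), in_total, out_total, msg.error
        parts.append(msg.text)
        # Counts are deltas, so the client sums them
        in_total += msg.in_token or 0
        out_total += msg.out_token or 0
        if msg.end_of_stream:
            break
    return "".join(parts), in_total, out_total, None


ok = consume([
    LlmStreamMessage("Hel", in_token=5, out_token=1),
    LlmStreamMessage("lo", out_token=1, end_of_stream=True),
])
failed = consume([
    LlmStreamMessage("Hel", out_token=1),
    LlmStreamMessage(error="timeout", end_of_stream=True),
])
```

On failure the partial text is still returned here, but per decision 3 a client that wants a complete answer must resend the whole request.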
|
||||
|
||||
## Open Questions

None at this time.

## References

- Current LLM schema: `trustgraph-base/trustgraph/schema/services/llm.py`
- Current prompt schema: `trustgraph-base/trustgraph/schema/services/prompt.py`
- Current agent schema: `trustgraph-base/trustgraph/schema/services/agent.py`
- LLM service base: `trustgraph-base/trustgraph/base/llm_service.py`
- VertexAI provider: `trustgraph-vertexai/trustgraph/model/text_completion/vertexai/llm.py`
- Gateway API: `trustgraph-base/trustgraph/api/`
- CLI tools: `trustgraph-cli/trustgraph/cli/`
|
||||
621
docs/tech-specs/sw/structured-data-2.sw.md
Normal file
|
|
@ -0,0 +1,621 @@
|
|||
---
|
||||
layout: default
|
||||
title: "Vipimo vya Kawaida vya Takwimu Zilizopangwa (Sehemu ya 2)"
|
||||
parent: "Swahili (Beta)"
|
||||
---
|
||||
|
||||
# Structured Data Technical Specification (Part 2)
|
||||
|
||||
> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.
|
||||
|
||||
## Overview

This specification describes the issues and gaps identified during the initial implementation of TrustGraph's structured data integration, as described in `structured-data.md`.
|
||||
|
||||
## Problems

### 1. Naming Inconsistency: "Object" vs "Row"

The current implementation uses the term "object" throughout (e.g., `ExtractedObject`, object extraction, object embeddings). This term is too generic and causes confusion:

- "Object" is an overloaded term in software (Python objects, JSON objects, etc.)
- The data being handled is tabular: rows in tables with defined schemas
- "Row" describes the data model more accurately and matches database terminology

This inconsistency shows up in module names, class names, message types, and documentation.
|
||||
|
||||
### 2. Row Store Query Limitations

The current row store implementation has significant query limitations:

**Natural Language Mismatch:** Queries struggle with real-world variation. For example:

- It is hard to find a street record containing `"CHESTNUT ST"` when searching for `"Chestnut Street"`
- Abbreviations, case differences, and formatting variations break exact-match queries
- Users expect semantic understanding, but the store only provides literal matching

**Schema Evolution Issues:** Changing schemas causes problems:

- Existing data may not conform to the updated schema
- Table structure changes can break queries and data integrity
- There is no clear migration path for schema changes
|
||||
|
||||
### 3. Row Embeddings Required

Related to problem 2, the system needs vector embeddings for row data to enable:

- Semantic search over structured data (finding "Chestnut Street" when the data contains "CHESTNUT ST")
- Similarity matching for fuzzy queries
- Hybrid search combining structured filters with semantic similarity
- Better support for natural-language queries

An embeddings service was specified but has not been implemented.
|
||||
|
||||
### 4. Row Data Ingestion Incomplete

The structured data ingestion pipeline is not fully operational:

- Diagnostic prompts exist to classify input formats (CSV, JSON, etc.)
- The ingestion service that uses these prompts is not wired into the system
- There is no end-to-end path for loading pre-structured data into the row store
|
||||
|
||||
## Goals

- **Schema Flexibility:** Allow schema evolution without breaking existing data or requiring migrations
- **Consistent Naming:** Standardize on "row" terminology throughout the codebase
- **Semantic Queryability:** Support fuzzy/semantic matching via row embeddings
- **Complete Ingestion Pipeline:** Provide an end-to-end path for loading structured data
|
||||
|
||||
## Technical Design

### Unified Row Store Schema

The previous implementation created a separate Cassandra table per schema. This caused problems when schemas evolved, because table structure changes required migrations.

The new design uses a single unified table for all row data:
|
||||
|
||||
```sql
|
||||
CREATE TABLE rows (
|
||||
collection text,
|
||||
schema_name text,
|
||||
index_name text,
|
||||
index_value frozen<list<text>>,
|
||||
data map<text, text>,
|
||||
source text,
|
||||
PRIMARY KEY ((collection, schema_name, index_name), index_value)
|
||||
)
|
||||
```
|
||||
|
||||
#### Column Definitions

| Column | Type | Description |
|--------|------|-------------|
| `collection` | `text` | Data collection/import identifier (from metadata) |
| `schema_name` | `text` | Name of the schema this row conforms to |
| `index_name` | `text` | Name(s) of the indexed field(s), comma-joined for composite indexes |
| `index_value` | `frozen<list<text>>` | Index value(s) as a list |
| `data` | `map<text, text>` | Row data as key-value pairs |
| `source` | `text` | Optional URI linking to provenance information in the knowledge graph. An empty string or NULL indicates no source. |
|
||||
|
||||
#### Index Handling

Each row is stored multiple times, once per indexed field defined in the schema. Primary key fields are treated as an index with no special marker, which leaves room for future flexibility.

**Single-field index example:**
- The schema defines `email` as indexed
- `index_name = "email"`
- `index_value = ['foo@bar.com']`

**Composite index example:**
- The schema defines a composite index on `region` and `status`
- `index_name = "region,status"` (field names sorted and comma-joined)
- `index_value = ['US', 'active']` (values in the same order as the field names)

**Primary key example:**
- The schema defines `customer_id` as the primary key
- `index_name = "customer_id"`
- `index_value = ['CUST001']`
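
To make the naming rule concrete, the sketch below derives the `(index_name, index_value)` pairs under which one row would be stored. The `index_entries()` helper and the shape of its `indexes` argument are assumptions made for illustration; only the sort-and-comma-join rule comes from the examples above.

```python
def index_entries(row: dict[str, str], indexes: list[list[str]]):
    """Return one (index_name, index_value) pair per index on the row."""
    entries = []
    for fields in indexes:
        ordered = sorted(fields)                        # field names are sorted
        index_name = ",".join(ordered)                  # and comma-joined
        index_value = [row[name] for name in ordered]   # values in the same order
        entries.append((index_name, index_value))
    return entries


row = {
    "customer_id": "CUST001",
    "email": "foo@bar.com",
    "region": "US",
    "status": "active",
}
entries = index_entries(row, [["customer_id"], ["email"], ["status", "region"]])
```

Each entry corresponds to one physical copy of the row in the `rows` table, which is where the write amplification of this design comes from.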
|
||||
|
||||
#### Query Pattern

All queries follow the same pattern regardless of which index is used:
|
||||
|
||||
```sql
|
||||
SELECT * FROM rows
|
||||
WHERE collection = 'import_2024'
|
||||
AND schema_name = 'customers'
|
||||
AND index_name = 'email'
|
||||
AND index_value = ['foo@bar.com']
|
||||
```
|
||||
|
||||
#### Design Trade-offs

**Advantages:**

- Schema changes do not require table structure changes
- Row data is opaque to Cassandra, so adding or removing fields is transparent
- A consistent query pattern for all access paths
- No Cassandra secondary indexes (which can be slow at scale)
- Native Cassandra types throughout (`map`, `frozen<list>`)

**Trade-offs:**

- Write amplification: each row insert = N inserts (one per indexed field)
- Storage overhead from duplicated row data
- Type information lives in the schema configuration; conversion happens in the application layer
|
||||
|
||||
#### Consistency Model

The design accepts certain simplifications:

1. **No row updates**: The system is append-only. This eliminates consistency concerns about updating multiple copies of the same row.

2. **Schema change tolerance**: When schemas change (e.g., indexes are added or removed), existing rows keep their original indexing. Old rows will not be discoverable through new indexes. Users can delete and recreate a schema to ensure consistency if needed.
|
||||
|
||||
### Partition Tracking and Deletion

#### The Problem

With the partition key `(collection, schema_name, index_name)`, efficient deletion requires knowing every partition key to delete. Deleting by `collection` or by `collection + schema_name` alone requires knowing all `index_name` values that hold data.

#### Partition Tracking Table

A secondary lookup table tracks which partitions exist:
|
||||
|
||||
```sql
|
||||
CREATE TABLE row_partitions (
|
||||
collection text,
|
||||
schema_name text,
|
||||
index_name text,
|
||||
PRIMARY KEY ((collection), schema_name, index_name)
|
||||
)
|
||||
```
|
||||
|
||||
This allows efficient discovery of the partitions to delete.
|
||||
|
||||
#### Row Writer Behaviour

The row writer keeps an in-memory cache of registered `(collection, schema_name)` pairs. When processing a row:

1. Check whether `(collection, schema_name)` is in the cache
2. If it is not cached (first row for this pair):
   - Look up the schema configuration to get all index names
   - Insert entries into `row_partitions` for each `(collection, schema_name, index_name)`
   - Add the pair to the cache
3. Proceed with writing the row data

The row writer also monitors schema configuration changes. When a schema changes, the relevant cache entries are cleared so that the next row triggers re-registration with the updated index names.

This approach ensures:

- Lookup-table writes happen once per `(collection, schema_name)` pair, not once per row
- The lookup table reflects the indexes that were active when the data was written
- Schema changes during an import are picked up correctly
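
The registration cache can be sketched in a few lines. Everything below is an assumption made for illustration: `PartitionRegistry`, its callbacks, and the in-memory `written` list stand in for the real schema lookup and the Cassandra insert into `row_partitions`.

```python
class PartitionRegistry:
    """In-memory cache of registered (collection, schema_name) pairs."""

    def __init__(self, lookup_indexes, insert_partition):
        self._lookup_indexes = lookup_indexes      # schema_name -> index names
        self._insert_partition = insert_partition  # writes one row_partitions entry
        self._registered = set()

    def ensure_registered(self, collection, schema_name):
        key = (collection, schema_name)
        if key in self._registered:
            return  # already registered: no lookup-table write for this row
        for index_name in self._lookup_indexes(schema_name):
            self._insert_partition(collection, schema_name, index_name)
        self._registered.add(key)

    def on_schema_change(self, schema_name):
        # Drop cached pairs for this schema so the next row re-registers
        self._registered = {k for k in self._registered if k[1] != schema_name}


schemas = {"customers": ["customer_id", "email"]}
written = []
registry = PartitionRegistry(
    lambda name: schemas[name],
    lambda *entry: written.append(entry),
)
registry.ensure_registered("import_2024", "customers")
registry.ensure_registered("import_2024", "customers")  # cache hit, no writes
schemas["customers"] = ["customer_id", "email", "region,status"]
registry.on_schema_change("customers")
registry.ensure_registered("import_2024", "customers")  # re-registers all indexes
```

Re-registration rewrites entries that already exist, which is harmless in Cassandra because inserting an existing primary key is an upsert.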
|
||||
|
||||
#### Deletion Operations

**Delete a collection:**
|
||||
```sql
|
||||
-- 1. Discover all partitions
|
||||
SELECT schema_name, index_name FROM row_partitions WHERE collection = 'X';
|
||||
|
||||
-- 2. Delete each partition from rows table
|
||||
DELETE FROM rows WHERE collection = 'X' AND schema_name = '...' AND index_name = '...';
|
||||
-- (repeat for each discovered partition)
|
||||
|
||||
-- 3. Clean up the lookup table
|
||||
DELETE FROM row_partitions WHERE collection = 'X';
|
||||
```
|
||||
|
||||
**Delete a collection and schema:**
|
||||
```sql
|
||||
-- 1. Discover partitions for this schema
|
||||
SELECT index_name FROM row_partitions WHERE collection = 'X' AND schema_name = 'Y';
|
||||
|
||||
-- 2. Delete each partition from rows table
|
||||
DELETE FROM rows WHERE collection = 'X' AND schema_name = 'Y' AND index_name = '...';
|
||||
-- (repeat for each discovered partition)
|
||||
|
||||
-- 3. Clean up the lookup table entries
|
||||
DELETE FROM row_partitions WHERE collection = 'X' AND schema_name = 'Y';
|
||||
```
|
||||
|
||||
### Row Embeddings

Row embeddings enable semantic/fuzzy matching on indexed values, solving the natural language mismatch problem (e.g., finding "CHESTNUT ST" when searching for "Chestnut Street").

#### Design Overview

Each indexed value is embedded and stored in a vector store (Qdrant). At query time, the query is embedded, similar vectors are found, and the associated metadata is used to look up the actual rows in Cassandra.
|
||||
|
||||
#### Qdrant Collection Structure

One Qdrant collection per `(user, collection, schema_name, dimension)` tuple:

- **Collection name:** `rows_{user}_{collection}_{schema_name}_{dimension}`
- Names are sanitized (non-alphanumeric characters are replaced with `_`, names are lowercased, and names starting with a digit get an `r_` prefix)
- **Rationale:** Allows clean deletion of a `(user, collection, schema_name)` instance by dropping the matching Qdrant collections; the dimension suffix lets different embedding models coexist
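
The naming scheme might be implemented along the following lines. This is a sketch under two assumptions that the specification does not state: that sanitization is applied to each name component separately, and that "non-alphanumeric" means anything outside ASCII letters and digits. The function names are hypothetical.

```python
import re


def sanitize(name: str) -> str:
    """Apply the three sanitization rules to one name component."""
    cleaned = re.sub(r"[^a-zA-Z0-9]", "_", name).lower()
    if cleaned[:1].isdigit():
        cleaned = "r_" + cleaned  # names starting with a digit get an r_ prefix
    return cleaned


def qdrant_collection_name(user: str, collection: str,
                           schema_name: str, dimension: int) -> str:
    parts = (sanitize(user), sanitize(collection), sanitize(schema_name))
    return "rows_" + "_".join(parts) + f"_{dimension}"


name = qdrant_collection_name("Alice", "import-2024", "customers", 384)
numeric = sanitize("2024-import")
```

One consequence worth noting: because `_` is both the separator and the replacement character, distinct inputs such as `a-b` and `a_b` map to the same component.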
|
||||
|
||||
#### What Gets Embedded

The text representation of the index values:

| Index Type | Example `index_value` | Text to Embed |
|------------|----------------------|---------------|
| Single field | `['foo@bar.com']` | `"foo@bar.com"` |
| Composite | `['US', 'active']` | `"US active"` (space-joined) |
|
||||
|
||||
#### Point Structure

Each Qdrant point contains:
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "<uuid>",
|
||||
"vector": [0.1, 0.2, ...],
|
||||
"payload": {
|
||||
"index_name": "street_name",
|
||||
"index_value": ["CHESTNUT ST"],
|
||||
"text": "CHESTNUT ST"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Payload Field | Description |
|---------------|-------------|
| `index_name` | The indexed field(s) this embedding represents |
| `index_value` | The original list of values (for the Cassandra lookup) |
| `text` | The text that was embedded (for debugging/display) |

Note: `user`, `collection`, and `schema_name` are implicit from the Qdrant collection name.
|
||||
|
||||
#### Query Flow

1. The user searches for "Chestnut Street" within user U, collection X, schema Y
2. Embed the query text
3. Determine the Qdrant collection name(s) matching the prefix `rows_U_X_Y_`
4. Search the matching Qdrant collection(s) for the nearest vectors
5. Get the matching points, whose payloads contain `index_name` and `index_value`
6. Query Cassandra:
|
||||
```sql
|
||||
SELECT * FROM rows
|
||||
WHERE collection = 'X'
|
||||
AND schema_name = 'Y'
|
||||
AND index_name = '<from payload>'
|
||||
AND index_value = <from payload>
|
||||
```
|
||||
7. Return the matched rows
|
||||
|
||||
#### Optional: Filtering by Index Name

Queries can optionally filter by `index_name` in Qdrant to search only specific fields:

- **"Find any field matching 'Chestnut'"** → search all vectors in the collection
- **"Find 'street_name' matching 'Chestnut'"** → filter where `payload.index_name = 'street_name'`
|
||||
|
||||
#### Architecture

Row embeddings follow the **two-stage pattern** used by GraphRAG (graph-embeddings, document-embeddings):

- **Stage 1: Embedding computation** (`trustgraph-flow/trustgraph/embeddings/row_embeddings/`) - Consumes `ExtractedObject`, computes embeddings via the embeddings service, outputs `RowEmbeddings`
- **Stage 2: Embedding storage** (`trustgraph-flow/trustgraph/storage/row_embeddings/qdrant/`) - Consumes `RowEmbeddings`, writes vectors to Qdrant

The Cassandra row writer is a separate consumer that runs in parallel:

- **Cassandra row writer** (`trustgraph-flow/trustgraph/storage/rows/cassandra`) - Consumes `ExtractedObject`, writes rows to Cassandra

All three services consume from the same flow, which keeps them decoupled. This allows:

- Independent scaling of Cassandra writes, embedding generation, and vector storage
- Embedding services to be disabled if they are not needed
- Failures in one service to leave the others unaffected
- An architecture consistent with the GraphRAG pipelines
|
||||
|
||||
#### Write Path

**Stage 1 (row-embeddings processor):** On receiving an `ExtractedObject`:

1. Look up the schema to find the indexed fields
2. For each indexed field:
   - Build the text representation of the index value
   - Compute the embedding via the embeddings service
3. Output a `RowEmbeddings` message containing all computed vectors

**Stage 2 (row-embeddings-write-qdrant):** On receiving a `RowEmbeddings`:

1. For each embedding in the message:
   - Determine the Qdrant collection from `(user, collection, schema_name, dimension)`
   - Create the collection if needed (lazy creation on first write)
   - Upsert the point with its vector and payload
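
Stage 1 can be sketched as a pure function over one row. The `embed_row()` helper, the `indexes` argument, and the `fake_embed()` stub are assumptions made for illustration; a real processor would call the embeddings service instead of the stub. The `RowIndexEmbedding` fields follow the message type defined in this specification.

```python
from dataclasses import dataclass


@dataclass
class RowIndexEmbedding:
    index_name: str             # The indexed field name(s)
    index_value: list[str]      # The field value(s)
    text: str                   # Text that was embedded
    vectors: list[list[float]]  # Computed embedding vectors


def embed_row(row, indexes, embed):
    """Build one RowIndexEmbedding per index defined for the row's schema."""
    embeddings = []
    for fields in indexes:
        ordered = sorted(fields)
        values = [row[name] for name in ordered]
        text = " ".join(values)  # composite values are space-joined
        embeddings.append(RowIndexEmbedding(
            index_name=",".join(ordered),
            index_value=values,
            text=text,
            vectors=embed(text),
        ))
    return embeddings


def fake_embed(text):
    # Stub embedder: a single 2-dimensional vector derived from the text length
    return [[float(len(text)), 0.0]]


result = embed_row(
    {"region": "US", "status": "active", "email": "foo@bar.com"},
    [["email"], ["region", "status"]],
    fake_embed,
)
```

Stage 2 would then take each vector's length as the `dimension` component of the Qdrant collection name.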
|
||||
|
||||
#### Message Types
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
class RowIndexEmbedding:
|
||||
index_name: str # The indexed field name(s)
|
||||
index_value: list[str] # The field value(s)
|
||||
text: str # Text that was embedded
|
||||
vectors: list[list[float]] # Computed embedding vectors
|
||||
|
||||
@dataclass
|
||||
class RowEmbeddings:
|
||||
metadata: Metadata
|
||||
schema_name: str
|
||||
embeddings: list[RowIndexEmbedding]
|
||||
```
|
||||
|
||||
#### Deletion Integration

Qdrant collections are discovered by prefix matching on the collection name pattern:

**Delete `(user, collection)`:**
1. List all Qdrant collections matching the prefix `rows_{user}_{collection}_`
2. Delete each matching collection
3. Delete the Cassandra row partitions (as described above)
4. Clean up the `row_partitions` entries

**Delete `(user, collection, schema_name)`:**
1. List all Qdrant collections matching the prefix `rows_{user}_{collection}_{schema_name}_`
2. Delete each matching collection (this handles multiple dimensions)
3. Delete the Cassandra row partitions
4. Clean up `row_partitions`
|
||||
|
||||
#### Module Locations

| Stage | Module | Entry Point |
|-------|--------|-------------|
| Stage 1 | `trustgraph-flow/trustgraph/embeddings/row_embeddings/` | `row-embeddings` |
| Stage 2 | `trustgraph-flow/trustgraph/storage/row_embeddings/qdrant/` | `row-embeddings-write-qdrant` |
|
||||
|
||||
### Row Embeddings Query API

The row embeddings query is a **separate API** from the GraphQL row query service:

| API | Purpose | Backend |
|-----|---------|---------|
| Row Query (GraphQL) | Exact matching on indexed fields | Cassandra |
| Row Embeddings Query | Fuzzy/semantic matching | Qdrant |

This separation keeps concerns clean:

- The GraphQL service focuses on exact, structured queries
- The embeddings API handles semantic similarity
- User workflow: fuzzy search via embeddings to find candidates, then an exact query to fetch the full row data

#### Request/Response Schema
|
||||
|
||||
```python
|
||||
from dataclasses import dataclass, field

@dataclass
class RowEmbeddingsRequest:
    vectors: list[list[float]]  # Query vectors (pre-computed embeddings)
    user: str = ""
    collection: str = ""
    schema_name: str = ""
    index_name: str = ""  # Optional: filter to specific index
    limit: int = 10  # Max results per vector

@dataclass
class RowIndexMatch:
    index_name: str = ""  # The matched index field(s)
    # Mutable defaults need default_factory; a bare [] raises ValueError
    index_value: list[str] = field(default_factory=list)  # The matched value(s)
    text: str = ""  # Original text that was embedded
    score: float = 0.0  # Similarity score

@dataclass
class RowEmbeddingsResponse:
    error: Error | None = None  # Error is the existing schema error type
    matches: list[RowIndexMatch] = field(default_factory=list)
|
||||
```
|
||||
|
||||
#### Query Processor

Module: `trustgraph-flow/trustgraph/query/row_embeddings/qdrant`

Entry point: `row-embeddings-query-qdrant`

The processor:
1. Receives a `RowEmbeddingsRequest` with query vectors
2. Finds the appropriate Qdrant collection by prefix matching
3. Searches for the nearest vectors, with an optional `index_name` filter
4. Returns a `RowEmbeddingsResponse` with the matching index information
|
||||
|
||||
#### API Gateway Integration

The gateway exposes row embeddings queries via the standard request/response pattern:

| Component | Location |
|-----------|----------|
| Dispatcher | `trustgraph-flow/trustgraph/gateway/dispatch/row_embeddings_query.py` |
| Registration | Add `"row-embeddings"` to `request_response_dispatchers` in `manager.py` |

Flow interface name: `row-embeddings`

Interface definition in the flow blueprint:
|
||||
```json
|
||||
{
|
||||
"interfaces": {
|
||||
"row-embeddings": {
|
||||
"request": "non-persistent://tg/request/row-embeddings:{id}",
|
||||
"response": "non-persistent://tg/response/row-embeddings:{id}"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
#### Python SDK Support

The SDK provides methods for row embeddings queries:
|
||||
|
||||
```python
|
||||
# Flow-scoped query (preferred)
|
||||
api = Api(url)
|
||||
flow = api.flow().id("default")
|
||||
|
||||
# Query with text (SDK computes embeddings)
|
||||
matches = flow.row_embeddings_query(
|
||||
text="Chestnut Street",
|
||||
collection="my_collection",
|
||||
schema_name="addresses",
|
||||
index_name="street_name", # Optional filter
|
||||
limit=10
|
||||
)
|
||||
|
||||
# Query with pre-computed vectors
|
||||
matches = flow.row_embeddings_query(
|
||||
vectors=[[0.1, 0.2, ...]],
|
||||
collection="my_collection",
|
||||
schema_name="addresses"
|
||||
)
|
||||
|
||||
# Each match contains:
|
||||
for match in matches:
|
||||
print(match.index_name) # e.g., "street_name"
|
||||
print(match.index_value) # e.g., ["CHESTNUT ST"]
|
||||
print(match.text) # e.g., "CHESTNUT ST"
|
||||
print(match.score) # e.g., 0.95
|
||||
```
|
||||
|
||||
#### CLI Utility

Command: `tg-invoke-row-embeddings`

```bash
# Query by text (computes embedding automatically)
tg-invoke-row-embeddings \
  --text "Chestnut Street" \
  --collection my_collection \
  --schema addresses \
  --index street_name \
  --limit 10

# Query by vector file
tg-invoke-row-embeddings \
  --vectors vectors.json \
  --collection my_collection \
  --schema addresses

# Output formats
tg-invoke-row-embeddings --text "..." --format json
tg-invoke-row-embeddings --text "..." --format table
```

#### Typical Usage Pattern

The row embeddings query is typically used as the first half of a fuzzy-to-exact lookup flow:

```python
# Step 1: Fuzzy search via embeddings
matches = flow.row_embeddings_query(
    text="chestnut street",
    collection="geo",
    schema_name="streets"
)

# Step 2: Exact lookup via GraphQL for full row data
for match in matches:
    query = f'''
    query {{
        streets(where: {{ {match.index_name}: {{ eq: "{match.index_value[0]}" }} }}) {{
            street_name
            city
            zip_code
        }}
    }}
    '''
    rows = flow.rows_query(query, collection="geo")
```

This two-step pattern enables:
- Finding "CHESTNUT ST" when the user searches for "Chestnut Street"
- Retrieving the complete row data with all fields
- Combining semantic similarity with structured data access

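Interpolating `match.index_value[0]` straight into the query text breaks if the value contains a double quote or a backslash. A minimal sketch of a safer builder follows. `build_exact_lookup` is a hypothetical helper, not part of the SDK, and it assumes the `where`/`eq` filter shape used in the example above. It relies on `json.dumps` producing a string literal whose escapes GraphQL also accepts for ordinary text.

```python
import json
import re

_NAME = re.compile(r"^[_A-Za-z][_0-9A-Za-z]*$")  # GraphQL Name grammar


def build_exact_lookup(schema_name, index_name, index_value, fields):
    """Build an exact-match query with a safely quoted string value."""
    for name in (schema_name, index_name, *fields):
        if not _NAME.match(name):
            raise ValueError(f"not a valid GraphQL name: {name!r}")
    # json.dumps escapes quotes, backslashes and control characters.
    literal = json.dumps(index_value, ensure_ascii=False)
    selection = " ".join(fields)
    return (
        f"query {{ {schema_name}(where: {{ {index_name}: {{ eq: {literal} }} }}) "
        f"{{ {selection} }} }}"
    )


query = build_exact_lookup("streets", "street_name", 'O"HARA ST', ["street_name", "city"])
```

Where the query service accepts GraphQL variables, passing the value as a variable would avoid string building altogether; this sketch only covers the inline form.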
### Row Data Ingestion

Deferred to a later phase. It will be designed alongside the other ingestion changes.

## Implementation Impact

### Current State Analysis

The existing implementation has two main components:

| Component | Location | Lines | Description |
|-----------|----------|-------|-------------|
| Query Service | `trustgraph-flow/trustgraph/query/objects/cassandra/service.py` | ~740 | Monolithic: GraphQL schema generation, filter parsing, Cassandra queries, request handling |
| Writer | `trustgraph-flow/trustgraph/storage/objects/cassandra/write.py` | ~540 | Per-schema table creation, secondary indexes, insert/delete |

**Current Query Pattern:**
```sql
SELECT * FROM {keyspace}.o_{schema_name}
WHERE collection = 'X' AND email = 'foo@bar.com'
ALLOW FILTERING
```

**New Query Pattern:**
```sql
SELECT * FROM {keyspace}.rows
WHERE collection = 'X' AND schema_name = 'customers'
  AND index_name = 'email' AND index_value = ['foo@bar.com']
```

### Key Changes

1. **Query semantics simplify**: The new schema supports only exact matches on `index_value`. The current GraphQL filters (`gt`, `lt`, `contains`, etc.) either:
   - Become post-filtering on the returned data (if still needed)
   - Are removed in favor of the embeddings API for fuzzy matching

2. **GraphQL code is tightly coupled**: The current `service.py` bundles Strawberry type generation, filter parsing, and Cassandra-specific queries. Adding another row store backend would duplicate ~400 lines of GraphQL code.

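As a rough illustration of the first option, range and substring operators can be evaluated in memory once the exact-match lookup has returned. The operator names mirror the GraphQL filters mentioned above; `post_filter` and the row shape (plain dicts) are assumptions made for this sketch.

```python
import operator

# Operators evaluated client-side after the exact-match query returns.
OPS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "lt": operator.lt,
    "contains": lambda value, needle: needle in value,
}


def post_filter(rows, filters):
    """Keep rows satisfying every {field: {op: operand}} condition."""
    def keep(row):
        return all(
            field in row and OPS[op](row[field], operand)
            for field, conditions in filters.items()
            for op, operand in conditions.items()
        )
    return [row for row in rows if keep(row)]


rows = [
    {"name": "Alice Chen", "age": 41},
    {"name": "Bob Johnson", "age": 62},
    {"name": "Jane Doe", "age": 28},
]
filtered = post_filter(rows, {"age": {"gt": 30}, "name": {"contains": "o"}})
```

Post-filtering only scales while the exact-match result set stays small, which is one reason the second option (dropping the operators) may be preferable.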
### Proposed Refactor

The refactor has two parts:

#### 1. Break Out GraphQL Code

Extract the reusable GraphQL components into a shared module:

```
trustgraph-flow/trustgraph/query/graphql/
├── __init__.py
├── types.py        # Filter types (IntFilter, StringFilter, FloatFilter)
├── schema.py       # Dynamic schema generation from RowSchema
└── filters.py      # Filter parsing utilities
```

This enables:
- Reuse across different row store backends.
- Cleaner separation of concerns.
- Easier testing of the GraphQL logic in isolation.

#### 2. Implement New Table Schema

Refactor the Cassandra-specific code to use the unified table:

**Writer** (`trustgraph-flow/trustgraph/storage/rows/cassandra/`):
- A single `rows` table instead of one table per schema.
- Write N copies of each row (one per index).
- Register partitions in the `row_partitions` table.
- Simpler table creation (one-time setup).

**Query Service** (`trustgraph-flow/trustgraph/query/rows/cassandra/`):
- Query the unified `rows` table.
- Use the extracted GraphQL module for schema generation.
- Simplified filter handling (exact match only at the database level).

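The "N copies per row" write path can be pictured as a fan-out from one row to one record per index. The sketch below is illustrative only: `fan_out` and the record layout are assumptions, with column names borrowed from the new query pattern (`collection`, `schema_name`, `index_name`, `index_value`). Representing a composite index as a tuple, and joining its field names with a comma, is also an assumption.

```python
def fan_out(collection, schema_name, row, indexes):
    """Yield one storage record per index; each carries the full row data.

    `indexes` lists the indexed fields; a tuple entry denotes a composite index.
    """
    for index in indexes:
        fields = index if isinstance(index, tuple) else (index,)
        yield {
            "collection": collection,
            "schema_name": schema_name,
            "index_name": ",".join(fields),
            "index_value": [str(row[f]) for f in fields],
            "data": {k: str(v) for k, v in row.items()},
        }


row = {"id": 7, "email": "foo@bar.com", "region": "EU"}
records = list(fan_out("X", "customers", row, ["id", "email", ("region", "email")]))
```

Each record holds a full copy of the row, so a lookup on any index is a single-partition read at the cost of N-fold storage.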
### Module Renames

As part of the naming cleanup from "object" to "row":

| Current | New |
|---------|-----|
| `storage/objects/cassandra/` | `storage/rows/cassandra/` |
| `query/objects/cassandra/` | `query/rows/cassandra/` |
| `embeddings/object_embeddings/` | `embeddings/row_embeddings/` |

### New Modules

| Module | Purpose |
|--------|---------|
| `trustgraph-flow/trustgraph/query/graphql/` | Shared GraphQL utilities. |
| `trustgraph-flow/trustgraph/query/row_embeddings/qdrant/` | Row embeddings query API. |
| `trustgraph-flow/trustgraph/embeddings/row_embeddings/` | Row embeddings computation (Stage 1). |
| `trustgraph-flow/trustgraph/storage/row_embeddings/qdrant/` | Row embeddings storage (Stage 2). |

## References

- [Structured Data Technical Specification](structured-data.md)

567
docs/tech-specs/sw/structured-data-descriptor.sw.md
Normal file
@ -0,0 +1,567 @@
---
layout: default
title: "Structured Data Descriptor"
parent: "Swahili (Beta)"
---

# Structured Data Descriptor

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Overview

The Structured Data Descriptor is a JSON-based configuration language that describes how to parse, transform, and import structured data into TrustGraph. It provides a declarative approach to data import, supporting multiple input formats and complex transformation pipelines without requiring custom code.

## Core Concepts

### 1. Format Definition
Describes the input file type and parsing options. It determines which parser to use and how to interpret the source data.

### 2. Field Mappings
Maps source paths to target fields, with transformations. It defines how data flows from the input sources to the fields of the output schema.

### 3. Transform Pipeline
A chain of data transformations that can be applied to field values, including:
- Data cleaning (trim, normalize)
- Format conversion (date parsing, type casting)
- Calculations (arithmetic, string manipulation)
- Lookups (reference tables, substitutions)

### 4. Validation Rules
Data quality checks applied to ensure data integrity:
- Type validation
- Range checks
- Pattern matching (regex)
- Required field validation
- Custom validation logic

### 5. Global Settings
Configuration that applies across the entire import process:
- Lookup tables for data enrichment
- Global variables and constants
- Output format specifications
- Error handling policies

## Implementation Strategy

The importer implementation follows this pipeline:

1. **Parse Configuration** - Load and validate the JSON descriptor
2. **Initialize Parser** - Load the appropriate parser (CSV, XML, JSON, etc.) based on `format.type`
3. **Apply Preprocessing** - Execute global filters and transformations
4. **Process Records** - For each input record:
   - Extract data using source paths (JSONPath, XPath, column names)
   - Apply field-level transforms in sequence
   - Validate results against the defined rules
   - Apply default values for missing data
5. **Apply Postprocessing** - Execute deduplication, aggregation, etc.
6. **Generate Output** - Produce data in the specified target format

## Path Expression Support

Different input formats use the path expression language appropriate to them:

- **CSV**: Column names or indices (`"column_name"` or `"[2]"`)
- **JSON**: JSONPath syntax (`"$.user.profile.email"`)
- **XML**: XPath expressions (`"//product[@id='123']/price"`)
- **Fixed-width**: Field names from the field definitions

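To make the path styles concrete, here is a minimal resolver sketch. It handles only CSV column names, `[n]` positional indices, and simple dotted JSONPath such as `$.user.profile.email`; full JSONPath and XPath would need a dedicated library. The function names are hypothetical, and since this document does not say whether `[2]` is zero-based, zero-based indexing is assumed.

```python
import re

_INDEX = re.compile(r"^\[(\d+)\]$")


def resolve_csv(path, header, record):
    """Resolve a column name or a positional "[n]" path against one CSV record."""
    m = _INDEX.match(path)
    position = int(m.group(1)) if m else header.index(path)
    return record[position]


def resolve_json(path, document):
    """Resolve a dotted JSONPath ("$.a.b"); returns None if any key is missing."""
    if not path.startswith("$"):
        raise ValueError(f"unsupported path: {path!r}")
    node = document
    for key in filter(None, path[1:].split(".")):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


header = ["id", "name", "email"]
record = ["1001", "Smith, John", "john.smith@email.com"]
by_name = resolve_csv("email", header, record)
by_position = resolve_csv("[2]", header, record)
nested = resolve_json("$.user.profile.email", {"user": {"profile": {"email": "a@b.org"}}})
```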
## Benefits

- **Single Codebase** - One importer handles multiple input formats
- **User-Friendly** - Non-technical users can create configurations
- **Reusable** - Configurations can be shared and versioned
- **Flexible** - Complex transformations without custom code
- **Robust** - Built-in validation and comprehensive error handling
- **Maintainable** - The declarative approach reduces implementation complexity

## Language Specification

The Structured Data Descriptor uses a JSON configuration format with the following top-level structure:

```json
{
  "version": "1.0",
  "metadata": {
    "name": "Configuration Name",
    "description": "Description of what this config does",
    "author": "Author Name",
    "created": "2024-01-01T00:00:00Z"
  },
  "format": { ... },
  "globals": { ... },
  "preprocessing": [ ... ],
  "mappings": [ ... ],
  "postprocessing": [ ... ],
  "output": { ... }
}
```

### Format Definition

Describes the input data format and parsing options:

```json
{
  "format": {
    "type": "csv|json|xml|fixed-width|excel|parquet",
    "encoding": "utf-8",
    "options": {
      // Format-specific options
    }
  }
}
```

#### CSV Format Options
```json
{
  "format": {
    "type": "csv",
    "options": {
      "delimiter": ",",
      "quote_char": "\"",
      "escape_char": "\\",
      "skip_rows": 1,
      "has_header": true,
      "null_values": ["", "NULL", "null", "N/A"]
    }
  }
}
```

#### JSON Format Options
```json
{
  "format": {
    "type": "json",
    "options": {
      "root_path": "$.data",
      "array_mode": "records|single",
      "flatten": false
    }
  }
}
```

#### XML Format Options
```json
{
  "format": {
    "type": "xml",
    "options": {
      "root_element": "//records/record",
      "namespaces": {
        "ns": "http://example.com/namespace"
      }
    }
  }
}
```

### Global Settings

Define lookup tables, variables, and global configuration:

```json
{
  "globals": {
    "variables": {
      "current_date": "2024-01-01",
      "batch_id": "BATCH_001",
      "default_confidence": 0.8
    },
    "lookup_tables": {
      "country_codes": {
        "US": "United States",
        "UK": "United Kingdom",
        "CA": "Canada"
      },
      "status_mapping": {
        "1": "active",
        "0": "inactive"
      }
    },
    "constants": {
      "source_system": "legacy_crm",
      "import_type": "full"
    }
  }
}
```

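The complete example later in this document references a global through `${default_confidence}`. One plausible way to resolve such references is sketched below. The document does not define substitution semantics, so the precedence (variables over constants), the type-preserving behavior, and the `resolve` helper itself are all assumptions.

```python
import re

_REF = re.compile(r"\$\{(\w+)\}")


def resolve(value, globals_cfg):
    """Replace ${name} references; variables take precedence over constants."""
    if not isinstance(value, str):
        return value
    scope = {**globals_cfg.get("constants", {}), **globals_cfg.get("variables", {})}
    whole = _REF.fullmatch(value)
    if whole:
        # A value that is exactly one reference keeps the referenced type.
        return scope[whole.group(1)]
    return _REF.sub(lambda m: str(scope[m.group(1)]), value)


globals_cfg = {
    "variables": {"batch_id": "BATCH_001", "default_confidence": 0.8},
    "constants": {"source_system": "legacy_crm"},
}
confidence = resolve("${default_confidence}", globals_cfg)
label = resolve("${source_system}/${batch_id}", globals_cfg)
```

An unknown name raises `KeyError` here, which surfaces typos in a descriptor early rather than passing the literal `${...}` text through.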
### Field Mappings

Define how source data maps to target fields, with transformations:

```json
{
  "mappings": [
    {
      "target_field": "person_name",
      "source": "$.name",
      "transforms": [
        {"type": "trim"},
        {"type": "title_case"}
      ],
      "validation": [
        {"type": "required"},
        {"type": "min_length", "value": 2},
        {"type": "max_length", "value": 100},
        {"type": "pattern", "value": "^[A-Za-z\\s]+$"}
      ]
    },
    {
      "target_field": "age",
      "source": "$.age",
      "transforms": [
        {"type": "to_int"},
        {"type": "default", "value": 0}
      ],
      "validation": [
        {"type": "range", "min": 0, "max": 150}
      ]
    },
    {
      "target_field": "country",
      "source": "$.country_code",
      "transforms": [
        {"type": "lookup", "table": "country_codes"},
        {"type": "default", "value": "Unknown"}
      ]
    }
  ]
}
```

### Transform Types

Available transformation functions:

#### String Transforms
```json
{"type": "trim"},
{"type": "upper"},
{"type": "lower"},
{"type": "title_case"},
{"type": "replace", "pattern": "old", "replacement": "new"},
{"type": "regex_replace", "pattern": "\\d+", "replacement": "XXX"},
{"type": "substring", "start": 0, "end": 10},
{"type": "pad_left", "length": 10, "char": "0"}
```

#### Type Conversions
```json
{"type": "to_string"},
{"type": "to_int"},
{"type": "to_float"},
{"type": "to_bool"},
{"type": "to_date", "format": "YYYY-MM-DD"},
{"type": "parse_json"}
```

#### Data Operations
```json
{"type": "default", "value": "default_value"},
{"type": "lookup", "table": "table_name"},
{"type": "concat", "values": ["field1", " - ", "field2"]},
{"type": "calculate", "expression": "${field1} + ${field2}"},
{"type": "conditional", "condition": "${age} > 18", "true_value": "adult", "false_value": "minor"}
```

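A transform chain of this kind is commonly implemented as a registry keyed by `type`. The sketch below covers a handful of the transforms listed above and is not the importer's actual code: `TRANSFORMS` and `apply_transforms` are hypothetical names, and the rule that missing values (`None` or `""`) skip every transform except `default` is an assumption.

```python
def _title_case(value):
    return " ".join(part.capitalize() for part in value.split())


# Each entry takes (value, transform spec, context) and returns the new value.
TRANSFORMS = {
    "trim": lambda v, spec, ctx: v.strip(),
    "upper": lambda v, spec, ctx: v.upper(),
    "lower": lambda v, spec, ctx: v.lower(),
    "title_case": lambda v, spec, ctx: _title_case(v),
    "to_int": lambda v, spec, ctx: int(v),
    "lookup": lambda v, spec, ctx: ctx["lookup_tables"][spec["table"]].get(v),
    "pad_left": lambda v, spec, ctx: v.rjust(spec["length"], spec["char"]),
}


def apply_transforms(value, transforms, ctx):
    """Apply transform specs in order; "default" substitutes for None or ""."""
    for spec in transforms:
        if spec["type"] == "default":
            if value is None or value == "":
                value = spec["value"]
        elif value not in (None, ""):
            value = TRANSFORMS[spec["type"]](value, spec, ctx)
    return value


ctx = {"lookup_tables": {"country_codes": {"US": "United States"}}}
name = apply_transforms("  doe, jane ", [{"type": "trim"}, {"type": "title_case"}], ctx)
country = apply_transforms(
    "XX",
    [{"type": "lookup", "table": "country_codes"}, {"type": "default", "value": "Unknown"}],
    ctx,
)
age = apply_transforms("", [{"type": "to_int"}, {"type": "default", "value": 0}], ctx)
padded = apply_transforms("42", [{"type": "pad_left", "length": 5, "char": "0"}], ctx)
```

Because a failed lookup yields `None`, a trailing `default` transform acts as the fallback, which matches how the `country` mapping in this document orders its transforms.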
### Validation Rules

Data quality checks with configurable error handling:

#### Basic Validations
```json
{"type": "required"},
{"type": "not_null"},
{"type": "min_length", "value": 5},
{"type": "max_length", "value": 100},
{"type": "range", "min": 0, "max": 1000},
{"type": "pattern", "value": "^[A-Z]{2,3}$"},
{"type": "in_list", "values": ["active", "inactive", "pending"]}
```

#### Custom Validations
```json
{
  "type": "custom",
  "expression": "${age} >= 18 && ${country} == 'US'",
  "message": "Must be 18+ and in US"
},
{
  "type": "cross_field",
  "fields": ["start_date", "end_date"],
  "expression": "${start_date} < ${end_date}",
  "message": "Start date must be before end date"
}
```

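The basic validations map naturally onto small predicate functions. The following sketch is illustrative: `VALIDATORS` and `validate` are hypothetical names, values are assumed to have been type-converted already (so `range` compares numbers), and the choice to skip non-presence rules for missing values is an assumption.

```python
import re

# Each predicate returns True when the value passes the rule.
VALIDATORS = {
    "required": lambda v, rule: v not in (None, ""),
    "not_null": lambda v, rule: v is not None,
    "min_length": lambda v, rule: len(v) >= rule["value"],
    "max_length": lambda v, rule: len(v) <= rule["value"],
    "range": lambda v, rule: rule["min"] <= v <= rule["max"],
    "pattern": lambda v, rule: re.search(rule["value"], v) is not None,
    "in_list": lambda v, rule: v in rule["values"],
}


def validate(value, rules):
    """Return the types of all rules the value fails (empty list means valid)."""
    failed = []
    for rule in rules:
        # Missing values fail only the presence checks; other rules are skipped.
        if value is None and rule["type"] not in ("required", "not_null"):
            continue
        if not VALIDATORS[rule["type"]](value, rule):
            failed.append(rule["type"])
    return failed


rules = [
    {"type": "required"},
    {"type": "min_length", "value": 2},
    {"type": "pattern", "value": "^[A-Z]{2,3}$"},
]
ok = validate("US", rules)
bad = validate("usa!", rules)
missing = validate(None, rules)
```

Returning every failed rule, rather than stopping at the first, gives the error-handling stage enough detail to log a complete message per record.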
### Preprocessing and Postprocessing

Global operations applied before and after field mapping:

```json
{
  "preprocessing": [
    {
      "type": "filter",
      "condition": "${status} != 'deleted'"
    },
    {
      "type": "sort",
      "field": "created_date",
      "order": "asc"
    }
  ],
  "postprocessing": [
    {
      "type": "deduplicate",
      "key_fields": ["email", "phone"]
    },
    {
      "type": "aggregate",
      "group_by": ["country"],
      "functions": {
        "total_count": {"type": "count"},
        "avg_age": {"type": "avg", "field": "age"}
      }
    }
  ]
}
```

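The two postprocessing operations in the example can be sketched as follows. These are assumptions about the intended semantics rather than the importer's code: `deduplicate` treats the key fields as one combined key and keeps the first record seen, and `aggregate` supports only the `count` and `avg` functions shown above.

```python
from collections import defaultdict


def deduplicate(records, key_fields):
    """Keep the first record seen for each combination of key field values."""
    seen, unique = set(), []
    for record in records:
        key = tuple(record.get(f) for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def aggregate(records, group_by, functions):
    """Group records and compute "count" / "avg" functions per group."""
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[f] for f in group_by)].append(record)
    results = []
    for key, members in groups.items():
        row = dict(zip(group_by, key))
        for name, fn in functions.items():
            if fn["type"] == "count":
                row[name] = len(members)
            elif fn["type"] == "avg":
                values = [m[fn["field"]] for m in members]
                row[name] = sum(values) / len(values)
        results.append(row)
    return results


records = [
    {"email": "a@x.org", "phone": "1", "country": "US", "age": 30},
    {"email": "a@x.org", "phone": "1", "country": "US", "age": 30},
    {"email": "b@x.org", "phone": "2", "country": "US", "age": 40},
    {"email": "c@x.org", "phone": "3", "country": "CA", "age": 20},
]
unique = deduplicate(records, ["email", "phone"])
summary = aggregate(
    unique,
    ["country"],
    {"total_count": {"type": "count"}, "avg_age": {"type": "avg", "field": "age"}},
)
```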
### Output Configuration

Define how the processed data should be output:

```json
{
  "output": {
    "format": "trustgraph-objects",
    "schema_name": "person",
    "options": {
      "batch_size": 1000,
      "confidence": 0.9,
      "source_span_field": "raw_text",
      "metadata": {
        "source": "crm_import",
        "version": "1.0"
      }
    },
    "error_handling": {
      "on_validation_error": "skip|fail|log",
      "on_transform_error": "skip|fail|default",
      "max_errors": 100,
      "error_output": "errors.json"
    }
  }
}
```

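The `error_handling` block implies a small policy object that decides what happens to a failing record. The sketch below is one possible reading, not a definitive one: this document does not say whether `log` keeps or drops the record (keeping it is assumed here), and `ErrorPolicy` and `TooManyErrors` are hypothetical names.

```python
import logging

log = logging.getLogger("import")


class TooManyErrors(RuntimeError):
    """Raised once the number of recorded errors exceeds max_errors."""


class ErrorPolicy:
    """Tracks errors and decides whether a failing record is kept."""

    def __init__(self, on_validation_error="skip", max_errors=100):
        self.mode = on_validation_error
        self.max_errors = max_errors
        self.errors = []  # candidates for the "error_output" file

    def handle(self, record, message):
        """Return True to keep the record, False to drop it; may raise."""
        if self.mode == "fail":
            raise ValueError(message)
        self.errors.append({"record": record, "error": message})
        if len(self.errors) > self.max_errors:
            raise TooManyErrors(f"more than {self.max_errors} errors")
        if self.mode == "log":
            log.warning("validation error: %s", message)
            return True
        return False  # "skip"


policy = ErrorPolicy(on_validation_error="skip", max_errors=1)
kept = policy.handle({"id": 1005}, "email: pattern")
```

Collecting errors in memory keeps the sketch short; a real importer would more likely stream them to the configured `error_output` path.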
## Complete Example

```json
{
  "version": "1.0",
  "metadata": {
    "name": "Customer Import from CRM CSV",
    "description": "Imports customer data from legacy CRM system",
    "author": "Data Team",
    "created": "2024-01-01T00:00:00Z"
  },
  "format": {
    "type": "csv",
    "encoding": "utf-8",
    "options": {
      "delimiter": ",",
      "has_header": true,
      "skip_rows": 1
    }
  },
  "globals": {
    "variables": {
      "import_date": "2024-01-01",
      "default_confidence": 0.85
    },
    "lookup_tables": {
      "country_codes": {
        "US": "United States",
        "CA": "Canada",
        "UK": "United Kingdom"
      }
    }
  },
  "preprocessing": [
    {
      "type": "filter",
      "condition": "${status} == 'active'"
    }
  ],
  "mappings": [
    {
      "target_field": "full_name",
      "source": "customer_name",
      "transforms": [
        {"type": "trim"},
        {"type": "title_case"}
      ],
      "validation": [
        {"type": "required"},
        {"type": "min_length", "value": 2}
      ]
    },
    {
      "target_field": "email",
      "source": "email_address",
      "transforms": [
        {"type": "trim"},
        {"type": "lower"}
      ],
      "validation": [
        {"type": "pattern", "value": "^[\\w.-]+@[\\w.-]+\\.[a-zA-Z]{2,}$"}
      ]
    },
    {
      "target_field": "age",
      "source": "age",
      "transforms": [
        {"type": "to_int"},
        {"type": "default", "value": 0}
      ],
      "validation": [
        {"type": "range", "min": 0, "max": 120}
      ]
    },
    {
      "target_field": "country",
      "source": "country_code",
      "transforms": [
        {"type": "lookup", "table": "country_codes"},
        {"type": "default", "value": "Unknown"}
      ]
    }
  ],
  "output": {
    "format": "trustgraph-objects",
    "schema_name": "customer",
    "options": {
      "confidence": "${default_confidence}",
      "batch_size": 500
    },
    "error_handling": {
      "on_validation_error": "log",
      "max_errors": 50
    }
  }
}
```

## LLM Prompt for Descriptor Generation

The following prompt can be used to have an LLM analyze sample data and generate a descriptor configuration:

```
I need you to analyze the provided data sample and create a Structured Data Descriptor configuration in JSON format.

The descriptor should follow this specification:
- version: "1.0"
- metadata: Configuration name, description, author, and creation date
- format: Input format type and parsing options
- globals: Variables, lookup tables, and constants
- preprocessing: Filters and transformations applied before mapping
- mappings: Field-by-field mapping from source to target with transformations and validations
- postprocessing: Operations like deduplication or aggregation
- output: Target format and error handling configuration

ANALYZE THE DATA:
1. Identify the format (CSV, JSON, XML, etc.)
2. Detect delimiters, encodings, and structure
3. Find data types for each field
4. Identify patterns and constraints
5. Look for fields that need cleaning or transformation
6. Find relationships between fields
7. Identify lookup opportunities (codes that map to values)
8. Detect required vs optional fields

CREATE THE DESCRIPTOR:
For each field in the sample data:
- Map it to an appropriate target field name
- Add necessary transformations (trim, case conversion, type casting)
- Include appropriate validations (required, patterns, ranges)
- Set defaults for missing values

Include preprocessing if needed:
- Filters to exclude invalid records
- Sorting requirements

Include postprocessing if beneficial:
- Deduplication on key fields
- Aggregation for summary data

Configure output for TrustGraph:
- format: "trustgraph-objects"
- schema_name: Based on the data entity type
- Appropriate error handling

DATA SAMPLE:
[Insert data sample here]

ADDITIONAL CONTEXT (optional):
- Target schema name: [if known]
- Business rules: [any specific requirements]
- Data quality issues to address: [known problems]

Generate a complete, valid Structured Data Descriptor configuration that will properly import this data into TrustGraph. Include comments explaining key decisions.
```

### Example Usage Prompt

````
I need you to analyze the provided data sample and create a Structured Data Descriptor configuration in JSON format.

[Standard instructions from above...]

DATA SAMPLE:
```csv
CustomerID,Name,Email,Age,Country,Status,JoinDate,TotalPurchases
1001,"Smith, John",john.smith@email.com,35,US,1,2023-01-15,5420.50
1002,"doe, jane",JANE.DOE@GMAIL.COM,28,CA,1,2023-03-22,3200.00
1003,"Bob Johnson",bob@,62,UK,0,2022-11-01,0
1004,"Alice Chen","alice.chen@company.org",41,US,1,2023-06-10,8900.25
1005,,invalid-email,25,XX,1,2024-01-01,100
```

ADDITIONAL CONTEXT:
- Target schema name: customer
- Business rules: Email should be valid and lowercase, names should be title case
- Data quality issues: Some emails are invalid, some names are missing, country codes need mapping
````

### Prompt for Analyzing Existing Data Without a Sample

```
I need you to help me create a Structured Data Descriptor configuration for importing [data type] data.

The source data has these characteristics:
- Format: [CSV/JSON/XML/etc]
- Fields: [list the fields]
- Data quality issues: [describe any known issues]
- Volume: [approximate number of records]

Requirements:
- [List any specific transformation needs]
- [List any validation requirements]
- [List any business rules]

Please generate a Structured Data Descriptor configuration that will:
1. Parse the input format correctly
2. Clean and standardize the data
3. Validate according to the requirements
4. Handle errors gracefully
5. Output in TrustGraph ExtractedObject format

Focus on making the configuration robust and reusable.
```
130
docs/tech-specs/sw/structured-data-schemas.sw.md
Normal file
@ -0,0 +1,130 @@
---
layout: default
title: "Structured Data Pulsar Schema Changes"
parent: "Swahili (Beta)"
---

# Structured Data Pulsar Schema Changes

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Overview

Based on the `STRUCTURED_DATA.md` specification, this document proposes the Pulsar schema additions and modifications needed to support structured data capabilities in TrustGraph.

## Required Schema Changes

### 1. Core Schema Enhancements
#### Enhanced Field Definition
The existing `Field` class in `core/primitives.py` needs additional properties:

```python
from pulsar.schema import Record, String, Integer, Boolean, Array

class Field(Record):
    name = String()
    type = String()  # int, string, long, bool, float, double, timestamp
    size = Integer()
    primary = Boolean()
    description = String()
    # NEW FIELDS:
    required = Boolean()           # Whether the field is required
    enum_values = Array(String())  # Allowed values for enum-type fields
    indexed = Boolean()            # Whether the field should be indexed
```

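How the new attributes might be consumed can be illustrated without Pulsar. In this sketch `FieldDef` is a plain-dataclass stand-in for `Field`, and `check_values` and `index_fields` are hypothetical helpers; none of them is part of TrustGraph. Treating primary-key fields as always indexed is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class FieldDef:
    """Plain-Python stand-in mirroring attributes of the Pulsar `Field` record."""
    name: str
    type: str = "string"
    primary: bool = False
    required: bool = False
    enum_values: list[str] = field(default_factory=list)
    indexed: bool = False


def check_values(fields, values):
    """Return a list of problems found in a field-name -> string-value map."""
    problems = []
    for f in fields:
        value = values.get(f.name)
        if value in (None, ""):
            if f.required:
                problems.append(f"{f.name}: required")
            continue
        if f.enum_values and value not in f.enum_values:
            problems.append(f"{f.name}: not one of {f.enum_values}")
    return problems


def index_fields(fields):
    """Names of fields to index: primary keys plus fields flagged `indexed`."""
    return [f.name for f in fields if f.primary or f.indexed]


schema = [
    FieldDef("id", "int", primary=True, required=True),
    FieldDef("status", enum_values=["active", "inactive"], indexed=True),
    FieldDef("notes"),
]
problems = check_values(schema, {"status": "deleted"})
indexed = index_fields(schema)
```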
### 2. New Knowledge Schemas

#### 2.1 Structured Data Submission
New file: `knowledge/structured.py`

```python
from pulsar.schema import Record, String, Bytes, Map
from ..core.metadata import Metadata

class StructuredDataSubmission(Record):
    metadata = Metadata()
    format = String()          # "json", "csv", "xml"
    schema_name = String()     # Reference to a schema held in config
    data = Bytes()             # Raw data to ingest
    options = Map(String())    # Format-specific options
```

### 3. New Service Schemas

#### 3.1 NLP to Structured Query Service
New file: `services/nlp_query.py`

```python
from pulsar.schema import Record, String, Array, Map, Integer, Double
from ..core.primitives import Error

class NLPToStructuredQueryRequest(Record):
    natural_language_query = String()
    max_results = Integer()
    context_hints = Map(String())  # Optional context for query generation

class NLPToStructuredQueryResponse(Record):
    error = Error()
    graphql_query = String()            # Generated GraphQL query
    variables = Map(String())           # GraphQL variables, if any
    detected_schemas = Array(String())  # Schemas the query targets
    confidence = Double()
```

#### 3.2 Structured Query Service
New file: `services/structured_query.py`

```python
from pulsar.schema import Record, String, Map, Array
from ..core.primitives import Error

class StructuredQueryRequest(Record):
    query = String()             # GraphQL query
    variables = Map(String())    # GraphQL variables
    operation_name = String()    # Optional operation name for multi-operation documents

class StructuredQueryResponse(Record):
    error = Error()
    data = String()              # JSON-encoded GraphQL response data
    errors = Array(String())     # GraphQL errors, if any
```

#### 2.2 Object Extraction Output
New file: `knowledge/object.py`

```python
from pulsar.schema import Record, String, Map, Double
from ..core.metadata import Metadata

class ExtractedObject(Record):
    metadata = Metadata()
    schema_name = String()     # Schema this object belongs to
    values = Map(String())     # Field name -> value
    confidence = Double()
    source_span = String()     # Text span where the object was found
```

### 4. Enhanced Knowledge Schemas

#### 4.1 Object Embeddings Enhancement
Update `knowledge/embeddings.py` to better support structured object embeddings:

```python
from pulsar.schema import Record, String, Array, Map, Double
from ..core.metadata import Metadata

class StructuredObjectEmbedding(Record):
    metadata = Metadata()
    vectors = Array(Array(Double()))
    schema_name = String()
    object_id = String()                     # Primary key value
    field_embeddings = Map(Array(Double()))  # Per-field embeddings
```

## Integration Points

### Flow Integration

The schemas will be used by the new flow modules:
- `trustgraph-flow/trustgraph/decoding/structured` - Uses StructuredDataSubmission
- `trustgraph-flow/trustgraph/query/nlp_query/cassandra` - Uses the NLP query schemas
- `trustgraph-flow/trustgraph/query/objects/cassandra` - Uses the structured query schemas
- `trustgraph-flow/trustgraph/extract/object/row/` - Consumes Chunk, produces ExtractedObject
- `trustgraph-flow/trustgraph/storage/objects/cassandra` - Uses the Rows schema
- `trustgraph-flow/trustgraph/embeddings/object_embeddings/qdrant` - Uses the object embedding schemas
260
docs/tech-specs/sw/structured-data.sw.md
Normal file
@ -0,0 +1,260 @@
---
layout: default
title: "Structured Data Technical Specification"
parent: "Swahili (Beta)"
---

# Structured Data Technical Specification

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Overview

This specification describes the integration of TrustGraph with structured data flows, enabling the system to work with data that can be represented as rows in tables or objects in object stores. The integration supports four primary use cases:

1. **Unstructured to Structured Extraction**: Read unstructured data sources, identify and extract object structures, and store them in a tabular format.
2. **Structured Data Ingestion**: Load data that is already in a structured format directly into the structured store, alongside extracted data.
3. **Natural Language Querying**: Convert natural language questions into structured queries to extract matching data from the store.
4. **Direct Structured Querying**: Execute structured queries directly against the data store for precise data retrieval.

## Goals

- **Unified Data Access**: Provide a single interface for accessing all data, structured and unstructured, within TrustGraph.
- **Seamless Integration**: Enable smooth interoperability between TrustGraph's graph-based knowledge representation and traditional structured data formats.
- **Flexible Extraction**: Support automatic extraction of structured data from various unstructured sources (documents, text, etc.).
- **Query Versatility**: Allow users to query data using both natural language and structured query languages.
- **Data Consistency**: Maintain data integrity and consistency across different data representations.
- **Performance Optimization**: Ensure efficient storage and retrieval of structured data at scale.
- **Schema Flexibility**: Support both schema-on-write and schema-on-read approaches to accommodate diverse data sources.
- **Backwards Compatibility**: Preserve existing TrustGraph functionality while adding structured data capabilities.

## Background

Currently, TrustGraph excels at processing unstructured data and building knowledge graphs from diverse sources. However, many enterprise use cases involve data that is inherently structured: customer records, transaction logs, inventory databases, and other tabular datasets. This structured data often needs to be analysed alongside unstructured content to provide comprehensive insights.

Current limitations include:
- No native support for ingesting pre-structured data formats (CSV, JSON arrays, database exports).
- Inability to preserve the inherent structure when extracting tabular data from documents.
- Lack of efficient querying mechanisms for structured data patterns.
- Missing bridge between SQL-like queries and TrustGraph's graph queries.

This specification addresses these gaps by introducing a structured data layer that complements TrustGraph's existing capabilities. By supporting structured data natively, TrustGraph can:
- Serve as a unified platform for both structured and unstructured data analysis.
- Enable hybrid queries that span both graph relationships and tabular data.
- Provide familiar interfaces for users accustomed to working with structured data.
- Unlock new use cases in data integration and business intelligence.

## Technical Design

### Architecture

The structured data integration requires the following technical components:

1. **NLP-to-Structured-Query Service**
   - Converts natural language questions into structured queries.
   - Supports multiple query language targets (initially SQL-like syntax).
   - Integrates with existing TrustGraph NLP capabilities.

   Module: trustgraph-flow/trustgraph/query/nlp_query/cassandra

2. **Configuration Schema Support** ✅ **[COMPLETE]**
   - Extended configuration system to store structured data schemas.
   - Support for defining table structures, field types, and relationships.
   - Schema versioning and migration capabilities.

3. **Object Extraction Module** ✅ **[COMPLETE]**
   - Enhanced knowledge extractor flow integration.
   - Identifies and extracts structured objects from unstructured sources.
   - Maintains provenance and confidence scores.
   - Registers a config handler (example: trustgraph-flow/trustgraph/prompt/template/service.py) to receive config data and decode schema information.
   - Receives objects and decodes them to ExtractedObject objects for delivery on the Pulsar queue.
   - NOTE: There is existing code at `trustgraph-flow/trustgraph/extract/object/row/`. This was a previous attempt and will need major refactoring because it does not conform to current APIs. Use it if it is useful; start from scratch if not.
   - Requires a command-line interface: `kg-extract-objects`

   Module: trustgraph-flow/trustgraph/extract/kg/objects/

4. **Structured Store Writer Module** ✅ **[COMPLETE]**
   - Receives objects in ExtractedObject format from Pulsar queues.
   - Initial implementation targets Apache Cassandra as the structured data store.
   - Handles dynamic table creation based on the schemas encountered.
   - Manages schema-to-Cassandra table mapping and data transformation.
   - Provides batch and streaming write operations for performance optimization.
   - No Pulsar outputs: this is a terminal service in the data flow.

   **Schema Handling**:
   - Monitors incoming ExtractedObject messages for schema references.
   - Should consider whether to receive schema definitions directly or rely on schema names in ExtractedObject messages.

   **Cassandra Table Mapping**:
   - The keyspace is named after the `user` field from the ExtractedObject's Metadata.
   - The table is named after the `schema_name` field from the ExtractedObject.
   - The collection from Metadata becomes part of the partition key to ensure:
     - Natural data distribution across Cassandra nodes
     - Efficient queries within a specific collection
     - Logical isolation between different data imports/sources
   - Primary key structure: `PRIMARY KEY ((collection, <schema_primary_key_fields>), <clustering_keys>)`
     - The collection is always the first component of the partition key
     - Schema-defined primary key fields follow as part of the composite partition key
     - This requires queries to specify the collection, ensuring predictable performance
   - Field definitions map to Cassandra columns with type conversions:
     - `string` → `text`
     - `integer` → `int` or `bigint`, based on size
     - `float` → `float` or `double`, based on precision needs
     - `boolean` → `boolean`
     - `timestamp` → `timestamp`
     - `enum` → `text` with application-level validation
   - Indexed fields create Cassandra secondary indexes (excluding fields already in the primary key)
   - Required fields are enforced at the application level (Cassandra does not support NOT NULL)

   **Object Storage**:
   - Extracts values from the ExtractedObject.values map
   - Performs type conversion and validation before insertion
   - Handles missing optional fields gracefully
   - Maintains metadata about object provenance (source document, confidence scores)
   - Supports idempotent writes to handle message replay scenarios

   **Implementation Notes**:
   - Existing code at `trustgraph-flow/trustgraph/storage/objects/cassandra/` is outdated and does not comply with current APIs
   - Should reference `trustgraph-flow/trustgraph/storage/triples/cassandra` as an example of a working storage processor
   - Needs evaluation of the existing code for reusable components before deciding whether to refactor or rewrite

   Module: trustgraph-flow/trustgraph/storage/objects/cassandra

5. **Structured Query Service** ✅ **[COMPLETE]**
   - Accepts structured queries in defined formats
   - Executes queries against the structured store
   - Returns objects matching the query criteria
   - Supports pagination and result filtering

   Module: trustgraph-flow/trustgraph/query/objects/cassandra

6. **Agent Tool Integration**
   - New tool class for agent frameworks
   - Enables agents to query structured data stores
   - Provides natural language and structured query interfaces
   - Integrates with existing agent decision-making processes

7. **Structured Data Ingestion Service**
   - Accepts structured data in multiple formats (JSON, CSV, XML)
   - Parses and validates incoming data against defined schemas
   - Converts data into structured data streams
   - Emits data to the appropriate message queues for processing
   - Supports bulk uploads and streaming ingestion

   Module: trustgraph-flow/trustgraph/decoding/structured

8. **Object Embedding Service**
   - Generates vector embeddings for structured data
   - Enables semantic search across structured data
   - Supports hybrid search combining structured queries with semantic similarity
   - Integrates with existing vector stores

   Module: trustgraph-flow/trustgraph/embeddings/object_embeddings/qdrant

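The Cassandra table mapping described for the Structured Store Writer can be sketched as plain string generation. This is a minimal illustration under stated assumptions, not the module's actual code: `wide_int` and `wide_float` are hypothetical per-field hints for choosing between the two candidate CQL types, clustering keys are omitted because the schema format shown in this spec does not define them, and the function names are invented for the example.

```python
# Type mapping taken from the spec. "integer" and "float" are handled
# separately because each can map to two CQL types.
CQL_TYPES = {
    "string": "text",
    "boolean": "boolean",
    "timestamp": "timestamp",
    "enum": "text",  # enum values are validated at the application level
}


def cql_type(field):
    """Pick a CQL column type for one schema field."""
    ftype = field["type"]
    if ftype == "integer":
        # wide_int is a hypothetical hint, not part of the spec
        return "bigint" if field.get("wide_int") else "int"
    if ftype == "float":
        # wide_float is a hypothetical hint, not part of the spec
        return "double" if field.get("wide_float") else "float"
    return CQL_TYPES[ftype]


def create_table_cql(keyspace, schema):
    """Build CREATE TABLE with collection as first partition key component."""
    fields = schema["fields"]
    columns = ["collection text"]
    columns += [f'{f["name"]} {cql_type(f)}' for f in fields]
    pk_fields = [f["name"] for f in fields if f.get("primary_key")]
    partition_key = ", ".join(["collection"] + pk_fields)
    return (
        f'CREATE TABLE IF NOT EXISTS {keyspace}.{schema["name"]} ('
        + ", ".join(columns)
        + f", PRIMARY KEY (({partition_key})))"
    )


def create_index_cql(keyspace, schema):
    """Build secondary index statements, skipping primary key fields."""
    pk = {f["name"] for f in schema["fields"] if f.get("primary_key")}
    return [
        f'CREATE INDEX IF NOT EXISTS ON {keyspace}.{schema["name"]} ({name})'
        for name in schema.get("indexes", [])
        if name not in pk
    ]


EXAMPLE_SCHEMA = {
    "name": "customer_records",
    "fields": [
        {"name": "customer_id", "type": "string", "primary_key": True},
        {"name": "name", "type": "string", "required": True},
    ],
    "indexes": ["name", "customer_id"],
}
```

Here the keyspace argument stands for the `user` field of the message metadata. Because `collection` is part of the partition key, every query built on such a table has to supply it.
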
### Data Models

#### Schema Storage

Schemas are stored in TrustGraph's configuration system using the following structure:

- **Type**: `schema` (fixed value for all structured data schemas)
- **Key**: The unique name/identifier of the schema (e.g., `customer_records`, `transaction_log`)
- **Value**: JSON schema definition containing the structure

Example configuration entry:
```
Type: schema
Key: customer_records
Value: {
  "name": "customer_records",
  "description": "Customer information table",
  "fields": [
    {
      "name": "customer_id",
      "type": "string",
      "primary_key": true
    },
    {
      "name": "name",
      "type": "string",
      "required": true
    },
    {
      "name": "email",
      "type": "string",
      "required": true
    },
    {
      "name": "registration_date",
      "type": "timestamp"
    },
    {
      "name": "status",
      "type": "string",
      "enum": ["active", "inactive", "suspended"]
    }
  ],
  "indexes": ["email", "registration_date"]
}
```

This approach allows:
- Dynamic schema definition without code changes
- Easy schema updates and versioning
- Consistent integration with existing TrustGraph configuration management
- Support for multiple schemas within a single deployment

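Since required fields and enum values are enforced at the application level, a writer needs some form of record check against the stored schema. The following is a minimal sketch of such a check, assuming the schema shape from the example entry above. The function name is hypothetical, and treating `primary_key` fields as implicitly required is an assumption of this example rather than something the spec states.

```python
def validate_record(schema, values):
    """Return a list of problems; an empty list means the record is valid."""
    problems = []
    known = {f["name"] for f in schema["fields"]}

    # Reject values for fields the schema does not define.
    for name in values:
        if name not in known:
            problems.append(f"unknown field: {name}")

    for field in schema["fields"]:
        name = field["name"]
        value = values.get(name)
        if value is None:
            # Assumption: primary key fields are treated as required.
            if field.get("required") or field.get("primary_key"):
                problems.append(f"missing required field: {name}")
            continue
        allowed = field.get("enum")
        if allowed is not None and value not in allowed:
            problems.append(f"invalid value for {name}: {value!r}")
    return problems


CUSTOMER_SCHEMA = {
    "name": "customer_records",
    "fields": [
        {"name": "customer_id", "type": "string", "primary_key": True},
        {"name": "name", "type": "string", "required": True},
        {"name": "email", "type": "string", "required": True},
        {"name": "status", "type": "string",
         "enum": ["active", "inactive", "suspended"]},
    ],
}
```

Returning a list of problems instead of raising on the first one lets a caller report every issue with a record in a single error message.
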
### APIs

New APIs:
- Pulsar schemas for the types above
- Pulsar interfaces in new flows
- A means is needed to specify schema types in flows, so that flows know which schema types to load
- APIs added to the gateway and rev-gateway

Changed APIs:
- Knowledge extraction endpoints: add a structured object output option
- Agent endpoints: add structured data tool support

### Implementation Details

Following existing conventions: these are just new processing modules. Everything is in the trustgraph-flow packages, except for the schema items, which are in trustgraph-base.

Some UI work is needed in the Workbench to be able to demonstrate and pilot this capability.

## Security Considerations

No extra considerations.

## Performance Considerations

Some questions remain about how Cassandra queries and indexes are used, so that queries do not slow down.

## Testing Strategy

Use the existing test strategy; unit, contract and integration tests will be built.

## Migration Plan

None.

## Timeline

Not specified.

## Open Questions

- Can this be made to work with other store types? The aim is to use interfaces that make modules which work with one store applicable to other stores.

## References

n/a.

281
docs/tech-specs/sw/structured-diag-service.sw.md
Normal file
@ -0,0 +1,281 @@
---
layout: default
title: "Structured Data Diagnostic Service Technical Specification"
parent: "Swahili (Beta)"
---

# Structured Data Diagnostic Service Technical Specification

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Overview

This specification describes a new invokable service for diagnosing and analysing structured data within TrustGraph. The service extracts functionality from the existing `tg-load-structured-data` command-line tool and exposes it as a request/response service, enabling programmatic access to data type detection and descriptor generation.

The service supports three primary operations:

1. **Data Type Detection**: Analyse a data sample to determine its format (CSV, JSON, or XML)
2. **Descriptor Generation**: Generate a TrustGraph structured data descriptor for a given data sample and type
3. **Combined Diagnosis**: Perform type detection and descriptor generation in sequence

## Goals

- **Decouple Data Analysis**: Extract data diagnosis logic from the CLI into reusable service components
- **Enable Programmatic Access**: Provide API-based access to data analysis capabilities
- **Support Multiple Data Formats**: Handle CSV, JSON, and XML data formats consistently
- **Generate Accurate Descriptors**: Produce structured data descriptors that accurately map source data to TrustGraph schemas
- **Maintain Backward Compatibility**: Ensure existing CLI functionality continues to work
- **Enable Service Composition**: Allow other services to use data diagnosis capabilities
- **Improve Testability**: Separate business logic from the CLI interface for better testing
- **Support Streaming Analysis**: Enable analysis of data samples without loading entire files

## Background

Currently, the `tg-load-structured-data` command provides comprehensive functionality for analysing structured data and generating descriptors. However, this functionality is tightly coupled to the CLI interface, which limits its reusability.

Current limitations include:
- Data diagnosis logic embedded in CLI code
- No programmatic access to type detection and descriptor generation
- Difficult to integrate diagnosis capabilities into other services
- Limited ability to compose data analysis workflows

This specification addresses these gaps by creating a dedicated service for structured data diagnosis. By exposing these capabilities as a service, TrustGraph can:
- Enable other services to analyse data programmatically
- Support more complex data processing pipelines
- Facilitate integration with external systems
- Improve maintainability through separation of concerns

## Technical Design

### Architecture

The structured data diagnostic service requires the following technical components:

1. **Diagnostic Service Processor**
   - Handles incoming diagnosis requests
   - Orchestrates type detection and descriptor generation
   - Returns structured responses with diagnosis results

   Module: `trustgraph-flow/trustgraph/diagnosis/structured_data/service.py`

2. **Data Type Detector**
   - Uses algorithmic detection to identify the data format (CSV, JSON, XML)
   - Analyses data structure, delimiters, and syntax patterns
   - Returns the detected format and confidence scores

   Module: `trustgraph-flow/trustgraph/diagnosis/structured_data/type_detector.py`

3. **Descriptor Generator**
   - Uses the prompt service to generate descriptors
   - Invokes format-specific prompts (diagnose-csv, diagnose-json, diagnose-xml)
   - Maps data fields to TrustGraph schema fields through prompt responses

   Module: `trustgraph-flow/trustgraph/diagnosis/structured_data/descriptor_generator.py`

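As an illustration of what algorithmic detection in the Data Type Detector could look like, the sketch below uses only the Python standard library. It is an assumption-laden example, not the contents of `type_detector.py`: the function name is hypothetical and the confidence values are illustrative constants, not calibrated scores.

```python
import csv
import json
import xml.etree.ElementTree as ET


def detect_type(sample):
    """Return (detected_type, confidence) for a text sample."""
    text = sample.strip()
    if not text:
        return None, 0.0

    # JSON: the sample must start like an object or array and parse fully.
    if text[0] in "{[":
        try:
            json.loads(text)
            return "json", 0.95
        except ValueError:
            pass

    # XML: the sample must parse as a single well-formed document.
    if text[0] == "<":
        try:
            ET.fromstring(text)
            return "xml", 0.95
        except ET.ParseError:
            pass

    # CSV: let the stdlib sniffer look for a consistent delimiter.
    try:
        csv.Sniffer().sniff(text, delimiters=",;\t|")
        return "csv", 0.7
    except csv.Error:
        return None, 0.0
```

One limitation of this approach is that a sample truncated in the middle of a JSON or XML document will fail to parse and fall through to the CSV check, so a real detector would probably need looser structural heuristics for partial samples.
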
### Data Models

#### StructuredDataDiagnosisRequest

Request message for structured data diagnosis operations:

```python
from typing import Any, Dict, Optional

class StructuredDataDiagnosisRequest:
    operation: str              # "detect-type", "generate-descriptor", or "diagnose"
    sample: str                 # Data sample to analyze (text content)
    type: Optional[str]         # Data type (csv, json, xml) - required for generate-descriptor
    schema_name: Optional[str]  # Target schema name for descriptor generation
    options: Dict[str, Any]     # Additional options (e.g., delimiter for CSV)
```

#### StructuredDataDiagnosisResponse

Response message containing the diagnosis results:

```python
from typing import Any, Dict, Optional

class StructuredDataDiagnosisResponse:
    operation: str                # The operation that was performed
    detected_type: Optional[str]  # Detected data type (for detect-type/diagnose)
    confidence: Optional[float]   # Confidence score for type detection
    descriptor: Optional[Dict]    # Generated descriptor (for generate-descriptor/diagnose)
    error: Optional[str]          # Error message if operation failed
    metadata: Dict[str, Any]      # Additional metadata (e.g., field count, sample records)
```

#### Descriptor Structure

The generated descriptor follows the existing structured data descriptor format:

```json
{
  "format": {
    "type": "csv",
    "encoding": "utf-8",
    "options": {
      "delimiter": ",",
      "has_header": true
    }
  },
  "mappings": [
    {
      "source_field": "customer_id",
      "target_field": "id",
      "transforms": [
        {"type": "trim"}
      ]
    }
  ],
  "output": {
    "schema_name": "customer",
    "options": {
      "batch_size": 1000,
      "confidence": 0.9
    }
  }
}
```

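To make the role of the `mappings` and `format.options` entries concrete, here is a small sketch of how a loader might apply such a descriptor to CSV text. It is only an illustration: `apply_mappings` and `load_rows` are hypothetical names, only the `trim` transform from the example above is implemented, and the reader assumes `has_header` is true.

```python
import csv
import io

# Only "trim" appears in the descriptor example; any other transform type
# would need its own entry here.
TRANSFORMS = {"trim": str.strip}


def apply_mappings(descriptor, record):
    """Map one source record (dict of strings) to target field names."""
    output = {}
    for mapping in descriptor["mappings"]:
        value = record.get(mapping["source_field"])
        if value is None:
            continue  # source column missing from this record
        for transform in mapping.get("transforms", []):
            value = TRANSFORMS[transform["type"]](value)
        output[mapping["target_field"]] = value
    return output


def load_rows(descriptor, text):
    """Yield mapped rows from CSV text, assuming a header row is present."""
    options = descriptor["format"].get("options", {})
    reader = csv.DictReader(
        io.StringIO(text), delimiter=options.get("delimiter", ",")
    )
    for record in reader:
        yield apply_mappings(descriptor, record)


DESCRIPTOR = {
    "format": {"type": "csv", "options": {"delimiter": ",", "has_header": True}},
    "mappings": [
        {
            "source_field": "customer_id",
            "target_field": "id",
            "transforms": [{"type": "trim"}],
        }
    ],
}
```

Source columns without a mapping are dropped, which matches the descriptor acting as an explicit allow-list of fields.
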
### Service Interface

The service exposes the following operations through the request/response pattern:

1. **Type Detection Operation**
   - Input: Data sample
   - Processing: Analyse the data structure using algorithmic detection
   - Output: Detected type with a confidence score

2. **Descriptor Generation Operation**
   - Input: Data sample, type, target schema name
   - Processing:
     - Call the prompt service with the format-specific prompt ID (diagnose-csv, diagnose-json, or diagnose-xml)
     - Pass the data sample and the available schemas to the prompt
     - Receive the generated descriptor from the prompt response
   - Output: Structured data descriptor

3. **Combined Diagnosis Operation**
   - Input: Data sample, optional schema name
   - Processing:
     - Use algorithmic detection to identify the type first
     - Select the format-specific prompt based on the detected type
     - Call the prompt service to generate the descriptor
   - Output: Both the detected type and the descriptor

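The three operations above can be sketched as a single dispatch function. This is an illustrative outline under stated assumptions, not the service's code: requests and responses are plain dicts instead of the message classes, `detect_type` and `call_prompt` are injected stand-ins for the type detector and the prompt service, and the prompt variable names `sample` and `schemas` are hypothetical.

```python
# Prompt IDs named in the spec, keyed by data type.
PROMPT_IDS = {
    "csv": "diagnose-csv",
    "json": "diagnose-json",
    "xml": "diagnose-xml",
}

OPERATIONS = ("detect-type", "generate-descriptor", "diagnose")


def handle_request(request, detect_type, call_prompt, schemas):
    """Dispatch one diagnosis request and return a response dict.

    detect_type(sample) must return (type, confidence);
    call_prompt(prompt_id, variables) must return a descriptor dict.
    """
    operation = request.get("operation")
    sample = request.get("sample", "")
    response = {"operation": operation, "error": None}

    if operation not in OPERATIONS:
        response["error"] = f"unknown operation: {operation}"
        return response

    # generate-descriptor takes the type from the request; the other two
    # operations detect it from the sample.
    data_type = request.get("type")
    if operation in ("detect-type", "diagnose"):
        data_type, confidence = detect_type(sample)
        response["detected_type"] = data_type
        response["confidence"] = confidence
        if operation == "detect-type":
            return response

    if data_type not in PROMPT_IDS:
        response["error"] = f"unsupported or missing type: {data_type}"
        return response

    response["descriptor"] = call_prompt(
        PROMPT_IDS[data_type], {"sample": sample, "schemas": schemas}
    )
    return response
```

Passing the detector and prompt client in as arguments keeps the dispatch logic testable without a running prompt service, which fits the testability goal stated earlier.
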
### Implementation Details

The service will follow TrustGraph service conventions:

1. **Service Registration**
   - Register as the `structured-diag` service type
   - Use dedicated request/response topics
   - Build on the FlowProcessor base class
   - Register a PromptClientSpec for interaction with the prompt service

2. **Configuration Management**
   - Retrieve schema configurations through the config service
   - Cache schemas for performance
   - Handle configuration updates gracefully

3. **Prompt Integration**
   - Use the existing prompt service infrastructure
   - Call the prompt service with format-specific prompt IDs:
     - `diagnose-csv`: For CSV data analysis
     - `diagnose-json`: For JSON data analysis
     - `diagnose-xml`: For XML data analysis
   - Prompts are defined in the prompt configuration, not hard-coded in the service
   - Pass schemas and data samples as prompt variables
   - Parse prompt responses to extract descriptors

4. **Error Handling**
   - Validate input samples
   - Provide descriptive error messages
   - Handle malformed data gracefully
   - Handle prompt service failures

5. **Data Sampling**
   - Process configurable sample sizes
   - Handle incomplete records gracefully
   - Maintain sampling consistency

### API Integration

The service will integrate with existing TrustGraph APIs:

Modified Components:
- `tg-load-structured-data` CLI: refactored to use the new service for diagnosis operations
- Flow API: extended to support structured data diagnosis requests

New Service Endpoints:
- `/api/v1/flow/{flow}/diagnose/structured-data`: WebSocket endpoint for diagnosis requests
- `/api/v1/diagnose/structured-data`: REST endpoint for synchronous diagnosis

### Message Flow

```
Client → Gateway → Structured Diag Service → Config Service (for schemas)
                            ↓
                   Type Detector (algorithmic)
                            ↓
                   Prompt Service (diagnose-csv/json/xml)
                            ↓
                   Descriptor Generator (parses prompt response)
                            ↓
Client ← Gateway ← Structured Diag Service (response)
```

## Security Considerations

- Input validation to prevent injection attacks
- Size limits on data samples to prevent denial-of-service (DoS) attacks
- Sanitization of generated descriptors
- Access control through existing TrustGraph authentication

## Performance Considerations

- Cache schema definitions to reduce calls to the config service
- Limit sample sizes to maintain responsive performance
- Use streaming processing for large data samples
- Implement timeout mechanisms for long-running analyses

## Testing Strategy

1. **Unit Tests**
   - Type detection for various data formats
   - Descriptor generation accuracy
   - Error handling scenarios

2. **Integration Tests**
   - Service request/response flow
   - Schema retrieval and caching
   - CLI integration

3. **Performance Tests**
   - Large sample processing
   - Concurrent request handling
   - Memory usage under load

## Migration Plan

1. **Phase 1**: Implement the service with core functionality
2. **Phase 2**: Refactor the CLI to use the service (maintain backward compatibility)
3. **Phase 3**: Add REST API endpoints
4. **Phase 4**: Deprecate the embedded CLI logic (with a notice period)

## Timeline

- Week 1-2: Implement the core service and type detection
- Week 3-4: Add descriptor generation and integration
- Week 5: Testing and documentation
- Week 6: CLI refactoring and migration

## Open Questions

- Should the service support additional data formats (e.g., Parquet, Avro)?
- What should the maximum sample size for analysis be?
- Should diagnosis results be cached for repeated requests?
- How should the service handle multi-schema scenarios?
- Should the prompt IDs be configurable as service parameters?

## References

- [Structured Data Descriptor Specification](structured-data-descriptor.md)
- [Structured Data Loading Documentation](structured-data.md)
- `tg-load-structured-data` implementation: `trustgraph-cli/trustgraph/cli/load_structured_data.py`

499
docs/tech-specs/sw/tool-group.sw.md
Normal file
@ -0,0 +1,499 @@
---
layout: default
title: "TrustGraph Tool Group System"
parent: "Swahili (Beta)"
---

# TrustGraph Tool Group System

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Technical Specification v1.0

### Summary

This specification defines a tool grouping system for TrustGraph agents that allows fine-grained control over which tools are available for a given request. The system introduces group-based tool filtering through configuration and request-level specification, enabling better security boundaries, resource management, and functional partitioning of agent capabilities.

### 1. Overview

#### 1.1 Problem Statement

Currently, TrustGraph agents have access to all configured tools, regardless of request context or security requirements. This creates several challenges:

- **Security Risk**: Sensitive tools (e.g., data modification) are available even for read-only queries.
- **Resource Waste**: Complex tools are loaded even when simple queries do not require them.
- **Functional Confusion**: Agents may select inappropriate tools when simpler alternatives exist.
- **Multi-tenant Isolation**: Different user groups need access to different sets of tools.

#### 1.2 Solution Overview

The tool group system introduces:

1. **Group Classification**: Tools are tagged with group memberships during configuration.
2. **Request-level Filtering**: The AgentRequest specifies which tool groups are permitted.
3. **Runtime Enforcement**: Agents only have access to tools that match the requested groups.
4. **Flexible Grouping**: Tools can belong to multiple groups for complex scenarios.

### 2. Schema Changes

#### 2.1 Tool Configuration Schema Enhancement

The existing tool configuration is extended with a `group` field:

**Before:**
```json
{
  "name": "knowledge-query",
  "type": "knowledge-query",
  "description": "Query the knowledge graph"
}
```

**After:**
```json
{
  "name": "knowledge-query",
  "type": "knowledge-query",
  "description": "Query the knowledge graph",
  "group": ["read-only", "knowledge", "basic"]
}
```

**Group Field Specification:**
- `group`: Array(String) - List of groups this tool belongs to.
- **Optional:** Tools without a group field belong to the "default" group.
- **Multi-membership:** Tools can belong to multiple groups.
- **Case-sensitive:** Group names are exact string matches.

#### 2.1.2 Tool State Transition Enhancement

Tools can optionally specify state transitions and state-based availability:

```json
{
  "name": "knowledge-query",
  "type": "knowledge-query",
  "description": "Query the knowledge graph",
  "group": ["read-only", "knowledge", "basic"],
  "state": "analysis",
  "available_in_states": ["undefined", "research"]
}
```

**State Field Specification:**
- `state`: String - **Optional** - State to transition to after tool execution.
- `available_in_states`: Array(String) - **Optional** - States in which this tool is available.
- **Default behaviour**: Tools without `available_in_states` are available in all states.
- **State transition**: Occurs only after successful tool execution.

#### 2.2 AgentRequest Schema Enhancement

The `AgentRequest` schema in `trustgraph-base/trustgraph/schema/services/agent.py` is enhanced:

**Current AgentRequest:**
- `question`: String - User query.
- `plan`: String - Execution plan (can be removed).
- `state`: String - Agent state.
- `history`: Array(AgentStep) - Execution history.

**Enhanced AgentRequest:**
- `question`: String - User query.
- `state`: String - Agent execution state (now actively used for tool filtering).
- `history`: Array(AgentStep) - Execution history.
- `group`: Array(String) - **NEW** - Tool groups permitted for this request.

**Schema Changes:**
- **Removed**: The `plan` field is no longer needed and can be removed (it was originally intended for tool specification).
- **Added**: The `group` field for tool group specification.
- **Enhanced**: The `state` field now controls tool availability during execution.

**Field Behaviours:**

**Group Field:**
- **Optional**: If not specified, defaults to ["default"].
- **Intersection**: Only tools matching at least one specified group are available.
- **Empty array**: No tools are available (the agent can only use internal reasoning).
- **Wildcard**: The special group "*" grants access to all tools.

**State Field:**
- **Optional**: If not specified, defaults to "undefined".
- **State-based filtering**: Only tools available in the current state are eligible.
- **Default state**: The "undefined" state allows all tools (subject to group filtering).
- **State transitions**: Tools can change the state after successful execution.

### 3. Custom Group Examples

Organizations can define domain-specific groups:

```json
{
  "financial-tools": ["stock-query", "portfolio-analysis"],
  "medical-tools": ["diagnosis-assist", "drug-interaction"],
  "legal-tools": ["contract-analysis", "case-search"]
}
```

### 4. Implementation Details

#### 4.1 Tool Loading and Filtering

**Configuration Phase:**
1. All tools are loaded from configuration along with their group assignments.
2. Tools without explicit group assignments are assigned to the "default" group.
3. Group membership is validated and stored in the tool registry.

**Request Processing Phase:**
1. The AgentRequest arrives with an optional group specification.
2. The agent filters the available tools based on group intersection.
3. Only matching tools are passed to the agent execution context.
4. The agent operates with the filtered tool set throughout the request lifecycle.

#### 4.2 Tool Filtering Logic

**Combined Group and State Filtering:**

```
For each configured tool:
  tool_groups = tool.group || ["default"]
  tool_states = tool.available_in_states || ["*"]  // Available in all states

For each request:
  requested_groups = request.group || ["default"]
  current_state = request.state || "undefined"

Tool is available if:
  // Group filtering
  (intersection(tool_groups, requested_groups) is not empty OR "*" in requested_groups)
  AND
  // State filtering
  (current_state in tool_states OR "*" in tool_states)
```

**State Transition Logic:**

```
After successful tool execution:
  if tool.state is defined:
    next_request.state = tool.state
  else:
    next_request.state = current_request.state  // No change
```

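The filtering and transition rules above translate fairly directly into Python. The sketch below is one possible reading, not the agent's actual code: the function names are hypothetical, and tools are assumed to be plain dicts keyed by name. It distinguishes an unspecified group (`None`, which defaults to `["default"]`) from an explicit empty list (which yields no tools), since a naive `or` default in Python would conflate the two.

```python
def filter_tools(tools, requested_groups=None, current_state=None):
    """Return the names of tools available for a request.

    tools maps a tool name to a config dict with optional "group" and
    "available_in_states" lists.
    """
    # None means "not specified"; an explicit empty list means "no tools".
    if requested_groups is None:
        requested_groups = ["default"]
    state = current_state or "undefined"

    available = []
    for name, tool in tools.items():
        tool_groups = tool.get("group") or ["default"]
        tool_states = tool.get("available_in_states") or ["*"]

        # Group filtering: wildcard request or a non-empty intersection.
        group_ok = "*" in requested_groups or bool(
            set(tool_groups) & set(requested_groups)
        )
        # State filtering: tool open to all states or to the current one.
        state_ok = "*" in tool_states or state in tool_states

        if group_ok and state_ok:
            available.append(name)
    return available


def next_state(tool, current_state):
    """Return the state that follows a successful tool execution."""
    return tool.get("state", current_state)


TOOLS = {
    "knowledge-query": {
        "group": ["read-only", "knowledge", "basic"],
        "state": "analysis",
        "available_in_states": ["undefined", "research"],
    },
    "graph-update": {
        "group": ["write", "knowledge", "admin"],
        "available_in_states": ["analysis", "modification"],
    },
    "text-completion": {
        "group": ["read-only", "text", "basic"],
        "state": "undefined",
    },
}
```

With these three tools, a request for the `read-only` group in the `undefined` state can use `knowledge-query` and `text-completion`. Once `knowledge-query` has run, the state becomes `analysis`, and a later request for the `knowledge` group sees only `graph-update`.
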
#### 4.3 Agent Integration Points

**ReAct Agent:**
- Tool filtering occurs in `agent_manager.py` during tool registry creation.
- The list of available tools is filtered by both group and state before plan generation.
- State transitions update the `AgentRequest.state` field after successful tool execution.
- The next iteration uses the updated state for tool filtering.

**Confidence-Based Agent:**
- Tool filtering occurs in `planner.py` during plan generation.
- `ExecutionStep` validation ensures that only tools eligible by group and state are used.
- The flow controller enforces tool availability at runtime.
- State transitions are managed by the flow controller between steps.

### 5. Configuration Examples

#### 5.1 Tool Configuration with Groups and States

```yaml
tool:
  knowledge-query:
    type: knowledge-query
    name: "Knowledge Graph Query"
    description: "Query the knowledge graph for entities and relationships"
    group: ["read-only", "knowledge", "basic"]
    state: "analysis"
    available_in_states: ["undefined", "research"]

  graph-update:
    type: graph-update
    name: "Graph Update"
    description: "Add or modify entities in the knowledge graph"
    group: ["write", "knowledge", "admin"]
    available_in_states: ["analysis", "modification"]

  text-completion:
    type: text-completion
    name: "Text Completion"
    description: "Generate text using language models"
    group: ["read-only", "text", "basic"]
    state: "undefined"
    # No available_in_states = available in all states

  complex-analysis:
    type: mcp-tool
    name: "Complex Analysis Tool"
    description: "Perform complex data analysis"
    group: ["advanced", "compute", "expensive"]
    state: "results"
    available_in_states: ["analysis"]
    mcp_tool_id: "analysis-server"

  reset-workflow:
    type: mcp-tool
    name: "Reset Workflow"
    description: "Reset to initial state"
    group: ["admin"]
    state: "undefined"
    available_in_states: ["analysis", "results"]
```

#### 5.2 Request Examples with State Workflows

**Initial Research Request:**
```json
{
  "question": "What entities are connected to Company X?",
  "group": ["read-only", "knowledge"],
  "state": "undefined"
}
```
*Available tools: knowledge-query, text-completion*
*After knowledge-query: state → "analysis"*

**Analysis Phase:**
```json
{
  "question": "Continue analysis based on previous results",
  "group": ["advanced", "compute", "write"],
  "state": "analysis"
}
```
*Available tools: complex-analysis, graph-update (reset-workflow is in the "admin" group only, so this request does not see it)*
*After complex-analysis: state → "results"*

**Results Phase:**
```json
{
  "question": "What should I do with these results?",
  "group": ["admin"],
  "state": "results"
}
```
*Available tools: reset-workflow only*
*After reset-workflow: state → "undefined"*

**Workflow Example - Complete Flow:**
1. **Start (undefined)**: use knowledge-query → transitions to "analysis"
2. **Analysis state**: use complex-analysis → transitions to "results"
3. **Results state**: use reset-workflow → transitions back to "undefined"
4. **Back at the start**: all initial tools are available again

### 6. Security Considerations

#### 6.1 Access Control Integration

**Gateway-Level Filtering:**
- The gateway can enforce group restrictions based on user permissions
- This prevents privilege escalation through request manipulation
- The audit trail includes both the requested and the granted tool groups

**Example Gateway Logic:**
```
user_permissions = get_user_permissions(request.user_id)
allowed_groups = user_permissions.tool_groups
requested_groups = request.group

# Validate request doesn't exceed permissions
if not is_subset(requested_groups, allowed_groups):
    reject_request("Insufficient permissions for requested tool groups")
```
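
A minimal Python sketch of the subset check, assuming the gateway already knows which groups a user may use. The name `authorize_groups` is hypothetical, and falling back to `["default"]` when the request names no group is an assumption carried over from the filtering defaults.

```python
from typing import Iterable, List, Optional


def authorize_groups(
    requested: Optional[Iterable[str]], allowed: Iterable[str]
) -> List[str]:
    """Return the requested groups if all are permitted, else raise."""
    # Assumed default: a request without groups asks for "default".
    requested_set = set(requested or ["default"])
    denied = requested_set - set(allowed)
    if denied:
        raise PermissionError(
            "Insufficient permissions for requested tool groups: "
            + ", ".join(sorted(denied))
        )
    return sorted(requested_set)
```

One consequence of a plain subset check is that a request for the wildcard group `"*"` is rejected unless `"*"` itself appears in the user's allowed groups, which keeps the wildcard from bypassing the permission model.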

#### 6.2 Audit and Monitoring

**Enhanced Audit Trail:**
- Log the requested tool groups and the initial state for every request
- Track state transitions and tool usage per group
- Monitor unauthorized group access attempts and invalid state transitions
- Alert on unusual group usage patterns or suspicious state workflows

### 7. Migration Strategy

#### 7.1 Backward Compatibility

**Phase 1: Additive Changes**
- Add an optional `group` field to tool configurations
- Add an optional `group` field to the AgentRequest schema
- Default behaviour: all existing tools belong to the "default" group
- Existing requests without a group field use the "default" group

**Existing Behaviour Preserved:**
- Tools without a group configuration keep working (default group)
- Tools without a state configuration are available in all states
- Requests that do not specify a group see every tool in the "default" group, which includes all tools that have no group configured
- Requests that do not specify a state use the "undefined" state, in which every tool without a state restriction remains available
- No breaking changes for existing deployments

### 8. Monitoring and Observability

#### 8.1 New Metrics

**Tool Group Usage:**
- `agent_tool_group_requests_total` - Counter of requests per group
- `agent_tool_group_availability` - Gauge of tools available per group
- `agent_filtered_tools_count` - Histogram of the tool count after group and state filtering

**State Workflow Metrics:**
- `agent_state_transitions_total` - Counter of state transitions per tool
- `agent_workflow_duration_seconds` - Histogram of the time spent in each state
- `agent_state_availability` - Gauge of tools available per state

**Security Metrics:**
- `agent_group_access_denied_total` - Counter of unauthorized group access attempts
- `agent_invalid_state_transition_total` - Counter of invalid state transitions
- `agent_privilege_escalation_attempts_total` - Counter of suspicious requests

#### 8.2 Logging Enhancements

**Request Logging:**
```json
{
  "request_id": "req-123",
  "requested_groups": ["read-only", "knowledge"],
  "initial_state": "undefined",
  "state_transitions": [
    {"tool": "knowledge-query", "from": "undefined", "to": "analysis", "timestamp": "2024-01-01T10:00:01Z"}
  ],
  "available_tools": ["knowledge-query", "text-completion"],
  "filtered_by_group": ["graph-update", "admin-tool"],
  "filtered_by_state": [],
  "execution_time": "1.2s"
}
```

### 9. Testing Strategy

#### 9.1 Unit Tests

**Tool Filtering Logic:**
- Test group intersection calculations
- Test state-based filtering logic
- Verify default group and state assignment
- Test wildcard group behaviour
- Validate empty group handling
- Test combined group+state filtering scenarios

**Configuration Validation:**
- Test tool loading with various group and state configurations
- Verify schema validation for invalid group and state specifications
- Test backward compatibility with existing configurations
- Validate state transition definitions and cycles

#### 9.2 Integration Tests

**Agent Behaviour:**
- Verify that agents only see tools filtered by group+state
- Test request execution with various group combinations
- Test state transitions during agent execution
- Validate error handling when no tools are available
- Test workflow progression through multiple states

**Security Testing:**
- Test privilege escalation prevention
- Verify audit trail accuracy
- Test gateway integration with user permissions

#### 9.3 End-to-End Scenarios

**Multi-tenant Usage with State Workflows:**
```
Scenario: Different users with different tool access and workflow states
Given: User A has "read-only" permissions, state "undefined"
  And: User B has "write" permissions, state "analysis"
When: Both request knowledge operations
Then: User A gets read-only tools available in "undefined" state
  And: User B gets write tools available in "analysis" state
  And: State transitions are tracked per user session
  And: All usage and transitions are properly audited
```

**Workflow State Progression:**
```
Scenario: Complete workflow execution
Given: Request with groups ["knowledge", "compute", "admin"] and state "undefined"
When: Agent executes knowledge-query tool (transitions to "analysis")
  And: Agent executes complex-analysis tool (transitions to "results")
  And: Agent executes reset-workflow tool (transitions to "undefined")
Then: Each step has correctly filtered available tools
  And: State transitions are logged with timestamps
  And: Final state allows initial workflow to repeat
```

### 10. Performance Considerations

#### 10.1 Tool Loading Impact

**Configuration Loading:**
- Group and state metadata is loaded once at startup
- Minimal memory overhead per tool (a few extra fields)
- No impact on tool initialization time

**Request Processing:**
- Combined group+state filtering happens once per request
- O(n) complexity, where n = number of configured tools
- State transitions add minimal overhead (a string assignment)
- Negligible impact for typical tool counts (< 100)

#### 10.2 Optimization Strategies

**Pre-computed Tool Sets:**
- Cache tool sets by group+state combination
- Avoid repeated filtering for common group/state patterns
- Trade memory for computation on frequently used combinations
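
One way to cache tool sets per group+state combination is memoization with `functools.lru_cache`. The sketch below is illustrative only: the `TOOLS` table and the `tools_for` function are hypothetical, and the entries simply mirror three tools from the configuration example.

```python
from functools import lru_cache
from typing import FrozenSet, Tuple

# Hypothetical in-memory tool table: name -> (groups, available_in_states).
TOOLS = {
    "knowledge-query": (
        frozenset({"read-only", "knowledge", "basic"}),
        frozenset({"undefined", "research"}),
    ),
    "graph-update": (
        frozenset({"write", "knowledge", "admin"}),
        frozenset({"analysis", "modification"}),
    ),
    "text-completion": (
        frozenset({"read-only", "text", "basic"}),
        frozenset({"*"}),  # no available_in_states: all states
    ),
}


@lru_cache(maxsize=256)
def tools_for(groups: FrozenSet[str], state: str) -> Tuple[str, ...]:
    """Return the sorted names of tools visible for this group set and state."""
    names = []
    for name, (tool_groups, tool_states) in TOOLS.items():
        group_ok = bool(tool_groups & groups) or "*" in groups
        state_ok = state in tool_states or "*" in tool_states
        if group_ok and state_ok:
            names.append(name)
    return tuple(sorted(names))
```

The groups are passed as a `frozenset` so that the arguments are hashable and independent of list order. If the tool configuration changes at runtime, the cache would need to be dropped with `tools_for.cache_clear()`.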

**Lazy Loading:**
- Load tool implementations only when they are needed
- Reduce startup time for deployments with many tools
- Register tools dynamically based on group requirements

### 11. Future Enhancements

#### 11.1 Dynamic Group Assignment

**Context-Aware Grouping:**
- Assign tools to groups based on the request context
- Time-based group availability (business hours only)
- Load-based group restrictions (expensive tools only during low usage)

#### 11.2 Group Hierarchies

**Nested Group Structure:**
```json
{
  "knowledge": {
    "read": ["knowledge-query", "entity-search"],
    "write": ["graph-update", "entity-create"]
  }
}
```

#### 11.3 Tool Recommendations

**Group-Based Suggestions:**
- Suggest optimal tool groups for request types
- Learn from usage patterns to improve recommendations
- Provide fallback groups when the preferred tools are unavailable

### 12. Open Questions

1. **Group Validation**: Should invalid group names in requests cause hard failures or warnings?

2. **Group Discovery**: Should the system provide an API to list the available groups and their tools?

3. **Dynamic Groups**: Should groups be configurable at runtime, or only at startup?

4. **Group Inheritance**: Should tools inherit groups from their parent categories or implementations?

5. **Performance Monitoring**: Which additional metrics are needed to track group-based tool usage effectively?

### 13. Conclusion

The tool group system provides:

- **Security**: Fine-grained access control over agent capabilities
- **Performance**: Reduced tool loading and selection overhead
- **Flexibility**: Multi-dimensional tool classification
- **Compatibility**: Seamless integration with existing agent architectures

This system lets TrustGraph deployments manage tool access more effectively, tighten security boundaries, and make better use of resources, while remaining fully backward compatible with existing configurations and requests.

479
docs/tech-specs/sw/tool-services.sw.md
Normal file
|
|
@ -0,0 +1,479 @@
|
|||
---
layout: default
title: "Tool Services: Dynamically Pluggable Agent Tools"
parent: "Swahili (Beta)"
---

# Tool Services: Dynamically Pluggable Agent Tools

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Status

Implemented

## Overview

This specification defines a mechanism for dynamically pluggable agent tools, called "tool services". Unlike the existing built-in tool types (`KnowledgeQueryImpl`, `McpToolImpl`, etc.), tool services allow new tools to be introduced by:

1. Deploying a new Pulsar-based service
2. Adding a configuration descriptor that tells the agent how to invoke it

This enables extensibility without modifying the core agent framework.

## Concepts

| Term | Definition |
|------|------------|
| **Built-in Tool** | Existing tool types with hardcoded implementations in `tools.py` |
| **Tool Service** | A Pulsar service that can be invoked as an agent tool, defined by a service descriptor |
| **Tool** | A configured instance that references a tool service and is exposed to the agent/LLM |

This is a two-tier model, analogous to MCP tools:
- MCP: an MCP server defines the tool interface → the tool configuration references it
- Tool Services: a tool service defines the Pulsar interface → the tool configuration references it

## Background: Existing Tools

### Built-in Tool Implementation

Tools are currently defined in `trustgraph-flow/trustgraph/agent/react/tools.py` with hardcoded implementations:

```python
class KnowledgeQueryImpl:
    async def invoke(self, question):
        client = self.context("graph-rag-request")
        return await client.rag(question, self.collection)
```

Each tool type:
- Has a hardcoded Pulsar service that it calls (e.g., `graph-rag-request`)
- Knows the exact method to call on the client (e.g., `client.rag()`)
- Has fixed arguments defined in the implementation

### Tool Registration (service.py:105-214)

Tools are loaded from configuration, with a `type` field that maps to an implementation:

```python
if impl_id == "knowledge-query":
    impl = functools.partial(KnowledgeQueryImpl, collection=data.get("collection"))
elif impl_id == "text-completion":
    impl = TextCompletionImpl
# ... etc
```

## Architecture

### Two-Tier Model

#### Tier 1: Tool Service Descriptor

A tool service defines a Pulsar service interface. It declares:
- The Pulsar queues for request and response
- The configuration parameters it requires from the tools that use it

```json
{
  "id": "custom-rag",
  "request-queue": "non-persistent://tg/request/custom-rag",
  "response-queue": "non-persistent://tg/response/custom-rag",
  "config-params": [
    {"name": "collection", "required": true}
  ]
}
```

A tool service that needs no configuration parameters:

```json
{
  "id": "calculator",
  "request-queue": "non-persistent://tg/request/calc",
  "response-queue": "non-persistent://tg/response/calc",
  "config-params": []
}
```

#### Tier 2: Tool Descriptor

A tool references a tool service and provides:
- Configuration parameter values (satisfying the service's requirements)
- Tool metadata for the agent (name, description)
- Argument definitions for the LLM

```json
{
  "type": "tool-service",
  "name": "query-customers",
  "description": "Query the customer knowledge base",
  "service": "custom-rag",
  "collection": "customers",
  "arguments": [
    {
      "name": "question",
      "type": "string",
      "description": "The question to ask about customers"
    }
  ]
}
```

Multiple tools can reference the same service with different configurations:

```json
{
  "type": "tool-service",
  "name": "query-products",
  "description": "Query the product knowledge base",
  "service": "custom-rag",
  "collection": "products",
  "arguments": [
    {
      "name": "question",
      "type": "string",
      "description": "The question to ask about products"
    }
  ]
}
```

### Request Format

When a tool is invoked, the request to the tool service includes:
- `user`: From the agent request (multi-tenancy)
- `config`: JSON-encoded config values from the tool descriptor
- `arguments`: JSON-encoded arguments from the LLM

```json
{
  "user": "alice",
  "config": "{\"collection\": \"customers\"}",
  "arguments": "{\"question\": \"What are the top customer complaints?\"}"
}
```
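
The double encoding in this envelope (JSON strings inside a JSON object) can be illustrated with a short Python sketch. The helper names `build_tool_service_request` and `decode_tool_service_request` are hypothetical and only show the shape of the data, not the actual client code.

```python
import json
from typing import Any, Dict, Tuple


def build_tool_service_request(
    user: str, config_values: Dict[str, Any], arguments: Dict[str, Any]
) -> Dict[str, str]:
    """Build the request envelope: config and arguments travel as JSON strings."""
    return {
        "user": user,
        "config": json.dumps(config_values),
        "arguments": json.dumps(arguments),
    }


def decode_tool_service_request(
    request: Dict[str, str]
) -> Tuple[str, Dict[str, Any], Dict[str, Any]]:
    """Recover the user plus the parsed config and argument dictionaries."""
    return (
        request["user"],
        json.loads(request["config"]),
        json.loads(request["arguments"]),
    )


request = build_tool_service_request(
    "alice",
    {"collection": "customers"},
    {"question": "What are the top customer complaints?"},
)
user, config, arguments = decode_tool_service_request(request)
```

Keeping `config` and `arguments` as opaque strings means the envelope schema stays fixed while each service is free to define its own parameters.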

The tool service receives these as parsed dictionaries in its `invoke` method.

### Generic Tool Service Implementation

The `ToolServiceImpl` class invokes tool services based on configuration:

```python
class ToolServiceImpl:
    def __init__(self, context, request_queue, response_queue, config_values, arguments, processor):
        self.request_queue = request_queue
        self.response_queue = response_queue
        self.config_values = config_values  # e.g., {"collection": "customers"}
        # ...

    async def invoke(self, **arguments):
        client = await self._get_or_create_client()
        # `user` comes from the agent request (lookup elided in this excerpt)
        response = await client.call(user, self.config_values, arguments)
        if isinstance(response, str):
            return response
        else:
            return json.dumps(response)
```

## Design Decisions

### Two-Tier Configuration Model

Tool services follow a two-tier model similar to MCP tools:

1. **Tool Service**: Defines the Pulsar service interface (topics, required config parameters)
2. **Tool**: References a tool service, provides config values, and defines the LLM arguments

This separation allows:
- One tool service to be used by multiple tools with different configurations
- A clear distinction between the service interface and the tool configuration
- Reuse of service definitions

### Request Mapping: Passthrough with Envelope

The request to a tool service is a structured envelope containing:
- `user`: Propagated from the agent request for multi-tenancy
- Config values: From the tool descriptor (e.g., `collection`)
- `arguments`: LLM-provided arguments, passed through as a dictionary

The agent manager parses the LLM's response into `act.arguments` as a dictionary (`agent_manager.py:117-154`). This dictionary is included in the request envelope.

### Schema Handling: Untyped

Requests and responses use untyped dictionaries. There is no schema validation at the agent level; the tool service is responsible for validating its own inputs. This gives maximum flexibility for defining new services.

### Client Interface: Direct Pulsar Topics

Tool services use direct Pulsar topics without requiring flow configuration. The tool-service descriptor specifies the full queue names:

```json
{
  "id": "joke-service",
  "request-queue": "non-persistent://tg/request/joke",
  "response-queue": "non-persistent://tg/response/joke",
  "config-params": [...]
}
```

This allows services to be hosted in any namespace.

### Error Handling: Standard Error Convention

Tool service responses follow the existing schema convention with an `error` field:

```python
from dataclasses import dataclass

@dataclass
class Error:
    type: str = ""
    message: str = ""
```

Response structure:
- Success: `error` is `None` and the response contains the result
- Error: `error` is populated with `type` and `message`

This matches the pattern used in other services (e.g., `PromptResponse`, `QueryResponse`, `AgentResponse`).

### Request/Response Correlation

Requests and responses are correlated using an `id` in the Pulsar message properties:

- The request includes `id` in its properties: `properties={"id": id}`
- The response(s) include the same `id`: `properties={"id": id}`

This follows the existing pattern in the codebase (e.g., `agent_service.py`, `llm_service.py`).

### Streaming Support

Tool services can return streaming responses:

- Multiple response messages carry the same `id` in their properties
- Each response includes an `end_of_stream: bool` field
- The final response has `end_of_stream: True`

This matches the pattern used in `AgentResponse` and other streaming services.
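
A consumer of such a stream could assemble the chunks as sketched below. This is an assumption-laden illustration: `Chunk` is a hypothetical stand-in for the real response type, its `error` is simplified to an optional string, and the chunks are assumed to have been matched to one request `id` already.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Chunk:
    """Hypothetical stand-in for one streamed tool service response."""
    response: str = ""
    end_of_stream: bool = False
    error: Optional[str] = None  # simplified: the real field is an Error object


def collect_stream(chunks: Iterable[Chunk]) -> str:
    """Concatenate chunk text until a chunk marks the end of the stream."""
    parts = []
    for chunk in chunks:
        if chunk.error is not None:
            raise RuntimeError(chunk.error)
        parts.append(chunk.response)
        if chunk.end_of_stream:
            return "".join(parts)
    # The iterable ran out before any chunk set end_of_stream=True.
    raise RuntimeError("stream ended without end_of_stream=True")
```

Treating a missing end marker as an error, rather than returning the partial text, is a design choice of this sketch; a real client might prefer a timeout instead.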

### Response Handling: String Return

All existing tools follow the same pattern: **receive arguments as a dict, return the observation as a string**.

| Tool | Response Handling |
|------|------------------|
| `KnowledgeQueryImpl` | Returns `client.rag()` directly (string) |
| `TextCompletionImpl` | Returns `client.question()` directly (string) |
| `McpToolImpl` | Returns a string, or `json.dumps(output)` if the output is not a string |
| `StructuredQueryImpl` | Formats the result as a string |
| `PromptImpl` | Returns `client.prompt()` directly (string) |

Tool services follow the same contract:
- The service returns a string response (the observation)
- If the response is not a string, it is converted via `json.dumps()`
- No extraction configuration is needed in the descriptor

This keeps the descriptor simple and makes the service responsible for returning a text response that suits the agent.

## Configuration Guide

To add a new tool service, two configuration items are required:

### 1. Tool Service Configuration

Stored under the `tool-service` config key. Defines the Pulsar queues and the available config parameters.

| Field | Required | Description |
|-------|----------|-------------|
| `id` | Yes | Unique identifier for the tool service |
| `request-queue` | Yes | Full Pulsar topic for requests (e.g., `non-persistent://tg/request/joke`) |
| `response-queue` | Yes | Full Pulsar topic for responses (e.g., `non-persistent://tg/response/joke`) |
| `config-params` | No | Array of config parameters the service accepts |

Each config parameter can specify:
- `name`: Parameter name (required)
- `required`: Whether the parameter must be provided by tools (default: false)

Example:
```json
{
  "id": "joke-service",
  "request-queue": "non-persistent://tg/request/joke",
  "response-queue": "non-persistent://tg/response/joke",
  "config-params": [
    {"name": "style", "required": false}
  ]
}
```

### 2. Tool Configuration

Stored under the `tool` config key. Defines a tool that the agent can use.

| Field | Required | Description |
|-------|----------|-------------|
| `type` | Yes | Must be `"tool-service"` |
| `name` | Yes | Tool name exposed to the LLM |
| `description` | Yes | Description of what the tool does (shown to the LLM) |
| `service` | Yes | ID of the tool service to invoke |
| `arguments` | No | Array of argument definitions for the LLM |
| *(config params)* | Varies | Any config parameters defined by the service |

Each argument can specify:
- `name`: Argument name (required)
- `type`: Data type, e.g., `"string"` (required)
- `description`: Description shown to the LLM (required)

Example:
```json
{
  "type": "tool-service",
  "name": "tell-joke",
  "description": "Tell a joke on a given topic",
  "service": "joke-service",
  "style": "pun",
  "arguments": [
    {
      "name": "topic",
      "type": "string",
      "description": "The topic for the joke (e.g., programming, animals, food)"
    }
  ]
}
```
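
How the two configuration items fit together can be sketched as follows: the service's `config-params` decide which keys of the tool descriptor become config values. The function `resolve_config_values` is hypothetical and is not taken from the TrustGraph code; it only illustrates the `required` semantics described in the tables.

```python
from typing import Any, Dict


def resolve_config_values(service: Dict[str, Any], tool: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the service's declared config params out of a tool descriptor."""
    values: Dict[str, Any] = {}
    for param in service.get("config-params", []):
        name = param["name"]
        if name in tool:
            values[name] = tool[name]
        elif param.get("required", False):  # `required` defaults to false
            raise ValueError(
                f"tool {tool.get('name')!r} is missing required config "
                f"parameter {name!r} for service {service.get('id')!r}"
            )
    return values


joke_service = {
    "id": "joke-service",
    "config-params": [{"name": "style", "required": False}],
}
tell_joke = {
    "type": "tool-service",
    "name": "tell-joke",
    "service": "joke-service",
    "style": "pun",
}
config_values = resolve_config_values(joke_service, tell_joke)
```

Here `config_values` is `{"style": "pun"}`. A tool that omitted `style` would yield an empty dictionary, because the parameter is optional.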

### Loading Configuration

Use `tg-put-config-item` to load the configurations:

```bash
# Load tool-service config
tg-put-config-item tool-service/joke-service < joke-service.json

# Load tool config
tg-put-config-item tool/tell-joke < tell-joke.json
```

The agent-manager must be restarted to pick up new configurations.

## Implementation Details

### Schema

Request and response types in `trustgraph-base/trustgraph/schema/services/tool_service.py`:

```python
@dataclass
class ToolServiceRequest:
    user: str = ""        # User context for multi-tenancy
    config: str = ""      # JSON-encoded config values from tool descriptor
    arguments: str = ""   # JSON-encoded arguments from LLM

@dataclass
class ToolServiceResponse:
    error: Error | None = None
    response: str = ""    # String response (the observation)
    end_of_stream: bool = False
```

### Server Side: DynamicToolService

Base class in `trustgraph-base/trustgraph/base/dynamic_tool_service.py`:

```python
class DynamicToolService(AsyncProcessor):
    """Base class for implementing tool services."""

    def __init__(self, **params):
        topic = params.get("topic", default_topic)
        # Constructs topics: non-persistent://tg/request/{topic}, non-persistent://tg/response/{topic}
        # Sets up Consumer and Producer

    async def invoke(self, user, config, arguments):
        """Override this method to implement the tool's logic."""
        raise NotImplementedError()
```

### Client Side: ToolServiceImpl

Implementation in `trustgraph-flow/trustgraph/agent/react/tools.py`:

```python
class ToolServiceImpl:
    def __init__(self, context, request_queue, response_queue, config_values, arguments, processor):
        # Uses the provided queue paths directly
        # Creates ToolServiceClient on first use
        ...

    async def invoke(self, **arguments):
        client = await self._get_or_create_client()
        # `user` comes from the agent request (lookup elided in this excerpt)
        response = await client.call(user, self.config_values, arguments)
        return response if isinstance(response, str) else json.dumps(response)
```

### Files

| File | Purpose |
|------|---------|
| `trustgraph-base/trustgraph/schema/services/tool_service.py` | Request/response schemas |
| `trustgraph-base/trustgraph/base/tool_service_client.py` | Client for invoking services |
| `trustgraph-base/trustgraph/base/dynamic_tool_service.py` | Base class for service implementations |
| `trustgraph-flow/trustgraph/agent/react/tools.py` | `ToolServiceImpl` class |
| `trustgraph-flow/trustgraph/agent/react/service.py` | Config loading |

### Example: Joke Service

An example service in `trustgraph-flow/trustgraph/tool_service/joke/`:

```python
class Processor(DynamicToolService):
    async def invoke(self, user, config, arguments):
        style = config.get("style", "pun")
        topic = arguments.get("topic", "")
        joke = pick_joke(topic, style)
        return f"Hey {user}! Here's a {style} for you:\n\n{joke}"
```

Tool service configuration:
```json
{
  "id": "joke-service",
  "request-queue": "non-persistent://tg/request/joke",
  "response-queue": "non-persistent://tg/response/joke",
  "config-params": [{"name": "style", "required": false}]
}
```

Tool configuration:
```json
{
  "type": "tool-service",
  "name": "tell-joke",
  "description": "Tell a joke on a given topic",
  "service": "joke-service",
  "style": "pun",
  "arguments": [
    {"name": "topic", "type": "string", "description": "The topic for the joke"}
  ]
}
```

### Backward Compatibility

- Existing built-in tool types continue to work unchanged.
- `tool-service` is a new tool type alongside the existing types (`knowledge-query`, `mcp-tool`, etc.).

## Future Considerations

### Self-Announcing Services

A future enhancement could allow services to publish their own descriptors:

- Services publish to a well-known `tool-descriptors` topic on startup
- The agent subscribes and registers tools dynamically
- This enables true plug-and-play without configuration changes

This is out of scope for the initial implementation.

## References

- Current tool implementation: `trustgraph-flow/trustgraph/agent/react/tools.py`
- Tool registration: `trustgraph-flow/trustgraph/agent/react/service.py:105-214`
- Agent schemas: `trustgraph-base/trustgraph/schema/services/agent.py`

228
docs/tech-specs/sw/universal-decoder.sw.md
Normal file
|
|
@ -0,0 +1,228 @@
|
|||
---
layout: default
title: "Universal Document Decoder"
parent: "Swahili (Beta)"
---

# Derekezi Universal ya Hati
|
||||
|
||||
> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.
|
||||
|
||||
## Headline

A universal document decoder powered by `unstructured`: ingest any supported document format through a single service, with full provenance and librarian integration, recording source positions as knowledge graph metadata for end-to-end traceability.

## Problem

TrustGraph's decoder currently targets PDF documents only. Supporting other formats (DOCX, XLSX, HTML, Markdown, plain text, PPTX, etc.) requires either writing a new decoder for each format or adopting a universal extraction library. Each format has a different structure (some are page-based, others are not), and the provenance chain must record where in the original document each piece of extracted text came from.

## Approach

### Library: `unstructured`

Use `unstructured.partition.auto.partition()`, which auto-detects the format from the MIME type or file extension and extracts structured elements (Title, NarrativeText, Table, ListItem, etc.). Each element carries metadata, including:

- `page_number` (for page-based formats such as PDF and PPTX)
- `element_id` (unique per element)
- `coordinates` (bounding box, for PDFs)
- `text` (the extracted text)
- `category` (element type: Title, NarrativeText, Table, etc.)

### Element Types

`unstructured` extracts typed elements from documents. Each element has a category and associated metadata:

**Text elements:**
- `Title`: section headings
- `NarrativeText`: body paragraphs
- `ListItem`: list items (bulleted/numbered)
- `Header`, `Footer`: page headers/footers
- `FigureCaption`: captions for figures/images
- `Formula`: mathematical expressions
- `Address`, `EmailAddress`: contact information
- `CodeSnippet`: code blocks (from the source text)

**Tables:**
- `Table`: structured tabular data. `unstructured` provides both `element.text` (plain text) and `element.metadata.text_as_html` (full HTML `<table>` with rows, columns, and headers). For formats with explicit table structure (DOCX, XLSX, HTML), extraction is highly reliable. For PDFs, table detection depends on the `hi_res` strategy with layout analysis.

**Images:**
- `Image`: embedded images detected via layout analysis (requires the `hi_res` strategy). With `extract_image_block_to_payload=True`, the image data is returned as base64 in `element.metadata.image_base64`. OCR text from the image is available in `element.text`.

### Table Handling

Tables receive special treatment. When the decoder encounters a `Table` element, it preserves the HTML structure rather than flattening it to plain text. This gives the downstream LLM extractor much better input for pulling knowledge out of tabular data.

Page/section text is assembled as follows:
- Text elements: plain text, joined with newlines
- Table elements: the HTML `<table>` markup from `text_as_html`, included so the LLM extractor can distinguish tables from narrative text.

For example, a page with a heading, a paragraph, and a table is rendered as follows:

```
Financial Overview

Revenue grew 15% over the past year, driven by enterprise adoption.

<table>
<tr><th>Quarter</th><th>Revenue</th><th>Growth</th></tr>
<tr><td>Q1</td><td>$12M</td><td>12%</td></tr>
<tr><td>Q2</td><td>$14M</td><td>17%</td></tr>
</table>
```

This preserves table structure through chunking and into the extraction pipeline, where the LLM extractor can extract relationships directly from structured cells rather than guessing at column alignment from whitespace.

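The assembly rule above can be sketched in a few lines. This is only an illustration, not the decoder's actual code: `Element` is a hypothetical stand-in for an `unstructured` element (flattening `metadata.text_as_html` onto the object), and joining with a blank line is an assumption taken from the example output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Element:
    """Hypothetical stand-in for an `unstructured` element."""
    category: str
    text: str
    text_as_html: Optional[str] = None  # only set for Table elements


def assemble_page_text(elements: List[Element]) -> str:
    """Build page/section text: plain text for prose, HTML for tables."""
    parts = []
    for el in elements:
        if el.category == "Image":
            # Images are stored separately and never enter the page text.
            continue
        if el.category == "Table" and el.text_as_html:
            # Keep the table structure instead of the flattened text.
            parts.append(el.text_as_html)
        else:
            parts.append(el.text)
    # Assumption: elements are separated by a blank line, as in the example.
    return "\n\n".join(parts)
```

Falling back to `el.text` when a table has no HTML keeps the sketch robust for formats where `text_as_html` is not populated.
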
### Image Handling

Images are extracted and stored in the librarian as child documents with `document_type="image"` and a `urn:image:{uuid}` ID. They get provenance triples with type `tg:Image`, linked to their parent page/section via `prov:wasDerivedFrom`. Image metadata (position, dimensions, `element_id`) is recorded in provenance.

**Crucially, images are NOT emitted as `TextDocument` outputs.** They are stored only, and are not sent to the chunker or any text processing pipeline. This is intentional:

1. There is no image processing pipeline yet (vision model integration is future work)
2. Feeding image data or OCR text fragments into the text extraction pipeline would produce meaningless KG (knowledge graph) triples.

Images are also excluded from the assembled page text: any `Image` elements are silently skipped when concatenating element text for a page/section. The provenance chain records that images exist and where they appeared in the document, so a future image processing pipeline can pick them up without re-ingesting the document.

#### Future Work

- Route `tg:Image` entities to a vision model for description, diagram interpretation, or chart data extraction.
- Store image descriptions as text child documents that feed into the normal chunking/extraction pipeline.
- Link extracted knowledge back to the source images via provenance.

### Section Strategies

For page-based formats (PDF, PPTX, XLSX), elements are always grouped by page/slide/sheet first. For non-page formats (DOCX, HTML, Markdown, etc.), the decoder needs a strategy for splitting the document into sections. This is configurable at runtime via `--section-strategy`.

Each strategy is a grouping function over the list of `unstructured` elements. The output is a list of element groups; the rest of the pipeline (text assembly, librarian storage, provenance, `TextDocument` emission) is identical regardless of strategy.

#### `whole-document` (default)

Emit the entire document as a single section and let the downstream chunker handle all splitting.

- Simplest approach, a good baseline
- May produce a very large `TextDocument` for big files, but the downstream chunker can handle that
- Best when you want maximum context per section

#### `heading`

Split at heading elements (`Title`). Each section is a heading plus all content that follows, up to the next heading of equal or higher level. Nested headings create nested sections.

- Produces topically coherent units
- Works well for structured documents (reports, manuals, specs)
- Gives the LLM extractor heading context alongside content
- Falls back to `whole-document` if no headings are found

#### `element-type`

Split when the element type changes significantly; specifically, start a new section at transitions between narrative text and tables. Consecutive elements of the same broad category (text, text, text or table, table) stay grouped.

- Keeps tables as standalone sections
- Good for documents with mixed content (reports with data tables)
- Tables get dedicated extraction attention

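As a rough illustration of the `element-type` rule, the sketch below groups a sequence of element category names. Reducing every non-`Table` category to a single "text" kind is an assumption made for brevity; the function name is hypothetical.

```python
from typing import List


def group_by_element_type(categories: List[str]) -> List[List[str]]:
    """Start a new group whenever the broad kind (text vs. table) changes."""
    groups: List[List[str]] = []
    prev_kind = None
    for category in categories:
        # Assumption: only two broad kinds matter, tables and everything else.
        kind = "table" if category == "Table" else "text"
        if kind != prev_kind:
            groups.append([])
            prev_kind = kind
        groups[-1].append(category)
    return groups
```

With this rule a run of adjacent tables stays together, and each text run between tables becomes its own section.
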
#### `count`

Group a fixed number of elements per section. Configurable via `--section-element-count` (default: 20).

- Simple and predictable
- Does not respect document structure
- Useful as a fallback or for experimentation

#### `size`

Accumulate elements until a character limit is reached, then start a new section. Respects element boundaries and never splits mid-element. Configurable via `--section-max-size` (default: 4000 characters).

- Produces roughly uniform section sizes
- Respects element boundaries (unlike the downstream chunker)
- A good compromise between structure and size control
- If a single element exceeds the limit, it becomes its own section

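A minimal sketch of the `size` strategy, operating on element texts only. The function name is hypothetical, and the size count ignores any separators added later during text assembly, which is a simplifying assumption.

```python
from typing import List


def group_by_size(texts: List[str], max_size: int = 4000) -> List[List[str]]:
    """Accumulate element texts until adding one would exceed max_size."""
    groups: List[List[str]] = []
    current: List[str] = []
    current_len = 0
    for text in texts:
        # Flush before adding, so an element is never split across groups.
        if current and current_len + len(text) > max_size:
            groups.append(current)
            current, current_len = [], 0
        current.append(text)
        current_len += len(text)
    if current:
        groups.append(current)
    return groups
```

Because the flush only happens when the current group is non-empty, an element longer than `max_size` ends up alone in its own group rather than being dropped or split.
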
#### Interaction with Page-Based Formats

For page-based formats, page grouping always takes priority. Section strategies can optionally be applied *within* a page if the page is very large (for example, a PDF page with an enormous table); this is controlled by `--section-within-pages` (default: false). When the option is off, each page is always one section regardless of size.

### Format Detection

The decoder needs to know the document's MIME type in order to pass it to `unstructured`'s `partition()`. There are two paths:

- **Librarian path** (`document_id` set): first fetch the document metadata from the librarian, which gives the `kind` (MIME type) recorded at upload time. Then fetch the document content. This takes two librarian calls, but the metadata fetch is lightweight.
- **Inline path** (backward compatibility, `data` set): no metadata is available on the message. Use `python-magic` to detect the format from the content bytes as a fallback.

No changes to the `Document` schema are needed: the librarian returns the MIME type.

### Architecture

A single `universal-decoder` service that:

1. Receives a `Document` message (inline or via librarian reference)
2. If librarian path: fetches the document metadata (to get the MIME type), then fetches the content. If inline path: detects the format from the content bytes.
3. Calls `partition()` to extract elements
4. Groups elements: by page for page-based formats, by the configured section strategy for non-page formats
5. For each page/section:
   - Generates a `urn:page:{uuid}` or `urn:section:{uuid}` ID
   - Assembles the page text: narrative as plain text, tables as HTML, images skipped
   - Computes character offsets for each element within the page text
   - Saves to the librarian as a child document
   - Emits provenance triples with positional metadata
   - Sends a `TextDocument` downstream for chunking
6. For each image element:
   - Generates a `urn:image:{uuid}` ID
   - Saves the image data to the librarian as a child document
   - Emits provenance triples (stored only, not sent downstream)

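The offset computation in step 5 can be illustrated as below. This is a sketch under assumptions: the function name and the blank-line separator are not taken from the implementation, and offsets are expressed as half-open `(start, end)` character ranges.

```python
from typing import List, Tuple


def assemble_with_offsets(
    element_texts: List[str], separator: str = "\n\n"
) -> Tuple[str, List[Tuple[int, int]]]:
    """Join element texts and record each element's (start, end) offsets."""
    offsets: List[Tuple[int, int]] = []
    pos = 0
    for index, text in enumerate(element_texts):
        if index > 0:
            # Account for the separator inserted before this element.
            pos += len(separator)
        offsets.append((pos, pos + len(text)))
        pos += len(text)
    return separator.join(element_texts), offsets
```

Slicing the assembled text with a recorded range returns the element's text, which is the property the positional provenance metadata relies on.
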
### Service Configuration

Command-line arguments:

```
--strategy              Partitioning strategy: auto, hi_res, fast (default: auto)
--languages             List of OCR language codes (default: eng)
--section-strategy      Section grouping: whole-document, heading, element-type,
                        count, size (default: whole-document)
--section-element-count Elements per section for the 'count' strategy (default: 20)
--section-max-size      Max characters per section for the 'size' strategy (default: 4000)
--section-within-pages  Apply the section strategy within pages too (default: false)
```

Plus the standard `FlowProcessor` and librarian queue arguments.

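For illustration, the flags listed above could be declared with `argparse` roughly as follows. The program name, the use of `store_true` for `--section-within-pages`, and treating `--languages` as a single string are assumptions; the standard `FlowProcessor` arguments are omitted.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the decoder-specific flags with the defaults from the spec."""
    parser = argparse.ArgumentParser(prog="universal-decoder")
    parser.add_argument("--strategy", default="auto",
                        choices=["auto", "hi_res", "fast"])
    # Assumption: languages are passed as one string, e.g. "eng".
    parser.add_argument("--languages", default="eng")
    parser.add_argument("--section-strategy", default="whole-document",
                        choices=["whole-document", "heading",
                                 "element-type", "count", "size"])
    parser.add_argument("--section-element-count", type=int, default=20)
    parser.add_argument("--section-max-size", type=int, default=4000)
    # Assumption: a bare flag turns the option on (default: false).
    parser.add_argument("--section-within-pages", action="store_true")
    return parser
```

Using `choices` makes an unsupported strategy name fail at startup rather than at the first document.
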
### Flow Integration

The universal decoder occupies the same position in the processing flow as the existing PDF decoder:

```
Document → [universal-decoder] → TextDocument → [chunker] → Chunk → ...
```

It registers:
- `input` consumer (Document schema)
- `output` producer (TextDocument schema)
- `triples` producer (Triples schema)
- Librarian request/response (for fetching and storing child documents)

### Packaging

- New package: `trustgraph-flow-universal-decoder`
- Dependency: `unstructured[all-docs]` (includes PDF, DOCX, PPTX, etc.)
- Can run alongside or replace the existing PDF decoder, depending on flow configuration
- The existing PDF decoder remains available for environments where the `unstructured` dependencies are too heavy.

### What Changes

| Component                    | Change                                          |
|------------------------------|-------------------------------------------------|
| `provenance/namespaces.py`   | Add `TG_SECTION_TYPE`, `TG_IMAGE_TYPE`, `TG_ELEMENT_TYPES`, `TG_TABLE_COUNT`, `TG_IMAGE_COUNT` |
| `provenance/triples.py`      | Add `mime_type`, `element_types`, `table_count`, `image_count` kwargs |
| `provenance/__init__.py`     | Export the constants                            |
| New: `decoding/universal/`   | New decoder module                              |
| `setup.cfg` / `pyproject`    | Add the `unstructured[all-docs]` dependency     |
| Docker                       | New container image                             |
| Flow definitions             | Wire in universal-decoder as the document input |

### What Doesn't Change

- Chunker (receives `TextDocument`, works as before)
- Downstream extractors (receive `Chunk`, work as before)
- `Document` schema, which requires the following details:
  - `file_name`: the file name.
  - `file_type`: the file type.

307
docs/tech-specs/sw/vector-store-lifecycle.sw.md
Normal file
|
|
@ -0,0 +1,307 @@
|
|||
---
layout: default
title: "Vector Store Lifecycle Management"
parent: "Swahili (Beta)"
---

# Vector Store Lifecycle Management

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

## Overview

This document describes how TrustGraph manages vector store collections across different backend implementations (Qdrant, Pinecone, Milvus). The design addresses the challenge of supporting embeddings with different dimensions without hardcoding dimension values.

## Problem Statement

Vector stores require the embedding dimension to be specified when a collection/index is created. However:
- Different embedding models produce different dimensions (e.g., 384, 768, 1536)
- The dimension is not known until the first embedding is generated
- A single TrustGraph collection may receive embeddings from multiple models
- Hardcoding a dimension (e.g., 384) causes failures with other embedding sizes

## Design Principles

1. **Lazy Creation:** Collections are created on first write, not during collection management operations.
2. **Dimension-Based Naming:** Collection names include the embedding dimension as a suffix.
3. **Graceful Degradation:** Queries against non-existent collections return empty results, not errors.
4. **Multi-Dimension Support:** A single logical collection can have multiple physical collections (one per dimension).

## Architecture

### Collection Naming Convention

Vector store collections use dimension suffixes to support multiple embedding sizes:

**Document Embeddings:**
- Qdrant: `d_{user}_{collection}_{dimension}`
- Pinecone: `d-{user}-{collection}-{dimension}`
- Milvus: `doc_{user}_{collection}_{dimension}`

**Graph Embeddings:**
- Qdrant: `t_{user}_{collection}_{dimension}`
- Pinecone: `t-{user}-{collection}-{dimension}`
- Milvus: `entity_{user}_{collection}_{dimension}`

Examples:
- `d_alice_papers_384`: Alice's "papers" collection with 384-dimensional embeddings
- `d_alice_papers_768`: the same logical collection with 768-dimensional embeddings
- `t_bob_knowledge_1536`: Bob's "knowledge" graph with 1536-dimensional embeddings

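The naming convention can be captured in a small lookup table. The function name and the `"document"` / `"graph"` kind labels are hypothetical; the templates themselves are the ones listed above.

```python
# Name templates per (backend, embedding kind), as listed in the spec.
_TEMPLATES = {
    ("qdrant", "document"): "d_{user}_{collection}_{dimension}",
    ("pinecone", "document"): "d-{user}-{collection}-{dimension}",
    ("milvus", "document"): "doc_{user}_{collection}_{dimension}",
    ("qdrant", "graph"): "t_{user}_{collection}_{dimension}",
    ("pinecone", "graph"): "t-{user}-{collection}-{dimension}",
    ("milvus", "graph"): "entity_{user}_{collection}_{dimension}",
}


def collection_name(backend: str, kind: str, user: str,
                    collection: str, dimension: int) -> str:
    """Build the physical collection name for one embedding dimension."""
    template = _TEMPLATES[(backend, kind)]
    return template.format(user=user, collection=collection,
                           dimension=dimension)
```
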
### Lifecycle Phases

#### 1. Collection Creation Request

**Request Flow:**
```
User/System → Librarian → Storage Management Topic → Vector Stores
```

**Behavior:**
- The librarian sends a `create-collection` request to every storage backend.
- Vector store processors acknowledge the request but **do not create physical collections**
- A success response is returned immediately.
- Actual collection creation is deferred until the first write.

**Rationale:**
- The dimension is unknown at creation time.
- Avoids creating collections with the wrong dimension.
- Simplifies collection management logic.

#### 2. Write Operations (Lazy Creation)

**Write Flow:**
```
Data → Storage Processor → Check Collection → Create if Needed → Insert
```

**Behavior:**
1. Extract the embedding dimension from the vector: `dim = len(vector)`
2. Construct the collection name with the dimension suffix
3. Check whether a collection exists for that specific dimension
4. If it does not exist:
   - Create the collection with the correct dimension
   - Log: `"Lazily creating collection {name} with dimension {dim}"`
5. Insert the embedding into the dimension-specific collection

**Example Scenario:**
```
1. User creates collection "papers"
   → No physical collections created yet

2. First document with 384-dim embedding arrives
   → Creates d_user_papers_384
   → Inserts data

3. Second document with 768-dim embedding arrives
   → Creates d_user_papers_768
   → Inserts data

Result: Two physical collections for one logical collection
```

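The write path can be sketched against an in-memory stand-in for a vector store. `InMemoryVectorStore` is hypothetical and uses the Qdrant-style document naming; a real backend would call its own existence-check and create APIs at the marked step.

```python
import logging
from typing import Dict, List

logger = logging.getLogger(__name__)


class InMemoryVectorStore:
    """Hypothetical stand-in for a backend, used to show lazy creation."""

    def __init__(self) -> None:
        # Physical collection name -> list of (vector, payload) points.
        self.collections: Dict[str, list] = {}

    def write(self, user: str, collection: str,
              vector: List[float], payload: str) -> str:
        dim = len(vector)
        name = f"d_{user}_{collection}_{dim}"
        if name not in self.collections:
            # A real backend would create the collection with size=dim here.
            logger.info("Lazily creating collection %s with dimension %d",
                        name, dim)
            self.collections[name] = []
        self.collections[name].append((vector, payload))
        return name
```

Replaying the example scenario, two writes with 384- and 768-dimensional vectors to the logical collection "papers" leave two physical collections behind.
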
#### 3. Query Operations

**Query Flow:**
```
Query Vector → Determine Dimension → Check Collection → Search or Return Empty
```

**Behavior:**
1. Extract the dimension from the query vector: `dim = len(vector)`
2. Construct the collection name with the dimension suffix
3. Check whether the collection exists
4. If it exists:
   - Perform the similarity search
   - Return the results
5. If it does not exist:
   - Log: `"Collection {name} does not exist, returning empty results"`
   - Return an empty list (no error is raised)

**Multiple Dimensions in a Single Query:**
- If a query contains vectors of different dimensions
- Each dimension searches its corresponding collection
- Results are combined
- Missing collections are skipped (not treated as errors)

**Rationale:**
- Querying a collection that has no data is a legitimate use case
- Returning empty results is semantically correct
- Avoids errors during system startup or before data ingestion

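A sketch of the query path under the same assumptions: collections are modelled as a plain dictionary of `(vector, payload)` points, names follow the Qdrant-style document convention, and cosine similarity stands in for whatever search the backend performs. All names are hypothetical.

```python
import logging
import math
from typing import Dict, List, Tuple

logger = logging.getLogger(__name__)

Point = Tuple[List[float], str]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query(collections: Dict[str, List[Point]], user: str, collection: str,
          vector: List[float], limit: int = 10) -> List[str]:
    """Search the dimension-specific collection, or return an empty list."""
    name = f"d_{user}_{collection}_{len(vector)}"
    points = collections.get(name)
    if points is None:
        logger.info("Collection %s does not exist, returning empty results",
                    name)
        return []
    ranked = sorted(points, key=lambda p: cosine(p[0], vector), reverse=True)
    return [payload for _, payload in ranked[:limit]]
```

A query whose dimension has never been written simply resolves to a name that is not present, so the missing-collection branch covers both the "never written" and "different dimension" cases.
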
#### 4. Collection Deletion

**Deletion Flow:**
```
Delete Request → List All Collections → Filter by Prefix → Delete All Matches
```

**Behavior:**
1. Construct the prefix pattern: `d_{user}_{collection}_` (note the trailing underscore)
2. List all collections in the vector store
3. Filter for collections matching the prefix
4. Delete all matching collections
5. Log each deletion: `"Deleted collection {name}"`
6. Log a summary: `"Deleted {count} collection(s) for {user}/{collection}"`

**Example:**
```
Collections in store:
- d_alice_papers_384
- d_alice_papers_768
- d_alice_reports_384
- d_bob_papers_384

Delete "papers" for alice:
→ Deletes: d_alice_papers_384, d_alice_papers_768
→ Keeps: d_alice_reports_384, d_bob_papers_384
```

**Rationale:**
- Ensures complete cleanup of all dimension variants.
- Pattern matching prevents accidental deletion of unrelated collections.
- Atomic operation from the user's perspective (all dimensions are deleted together).

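The prefix-based deletion can be sketched as a pure function over the list of existing collection names. The function name and return shape are hypothetical; a real backend would issue a delete call for each match instead of returning lists.

```python
import logging
from typing import List, Tuple

logger = logging.getLogger(__name__)


def delete_collection(existing: List[str], user: str,
                      collection: str) -> Tuple[List[str], List[str]]:
    """Split existing names into (deleted, kept) for one logical collection."""
    # The trailing underscore stops "papers" from matching "papers2_384".
    prefix = f"d_{user}_{collection}_"
    deleted = [name for name in existing if name.startswith(prefix)]
    kept = [name for name in existing if not name.startswith(prefix)]
    for name in deleted:
        logger.info("Deleted collection %s", name)
    logger.info("Deleted %d collection(s) for %s/%s",
                len(deleted), user, collection)
    return deleted, kept
```
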
## Behavioral Characteristics

### Normal Operations

**Collection Creation:**
- ✓ Returns success immediately.
- ✓ No physical storage is allocated.
- ✓ Fast operation (no backend I/O).

**First Write:**
- ✓ Creates the collection with the correct dimension.
- ✓ Slightly slower because of collection creation overhead.
- ✓ Subsequent writes to the same dimension are fast.

**Queries Before Any Writes:**
- ✓ Returns empty results.
- ✓ No errors or exceptions.
- ✓ The system remains stable.

**Mixed-Dimension Writes:**
- ✓ Automatically creates separate collections per dimension.
- ✓ Each dimension is isolated in its own collection.
- ✓ No dimension conflicts or schema errors.

**Collection Deletion:**
- ✓ Removes all dimension variants.
- ✓ Complete cleanup.
- ✓ No orphaned collections.

### Edge Cases

**Multiple Embedding Models:**
```
Scenario: User switches from model A (384-dim) to model B (768-dim)
Behavior:
- Both dimensions coexist in separate collections
- Old data (384-dim) remains queryable with 384-dim vectors
- New data (768-dim) queryable with 768-dim vectors
- Cross-dimension queries return results only for matching dimension
```

**Concurrent First Writes:**
```
Scenario: Multiple processes write to same collection simultaneously
Behavior:
- Each process checks for existence before creating
- Most vector stores handle concurrent creation gracefully
- If race condition occurs, second create is typically idempotent
- Final state: Collection exists and both writes succeed
```

**Dimension Migration:**
```
Scenario: User wants to migrate from 384-dim to 768-dim embeddings
Behavior:
- No automatic migration
- Old collection (384-dim) persists
- New collection (768-dim) created on first new write
- Both dimensions remain accessible
- Manual deletion of old dimension collections possible
```

**Empty Collection Queries:**
```
Scenario: Query a collection that has never received data
Behavior:
- Collection doesn't exist (never created)
- Query returns empty list
- No error state
- System logs: "Collection does not exist, returning empty results"
```

## Implementation Notes

### Storage Backend Specifics

**Qdrant:**
- Uses `collection_exists()` for existence checks
- Uses `get_collections()` for listing during deletion
- Collection creation requires `VectorParams(size=dim, distance=Distance.COSINE)`

**Pinecone:**
- Uses `has_index()` for existence checks
- Uses `list_indexes()` for listing during deletion
- Index creation requires waiting for "ready" status
- The serverless spec is configured with cloud/region

**Milvus:**
- Direct classes (`DocVectors`, `EntityVectors`) manage the lifecycle
- Internal cache `self.collections[(dim, user, collection)]` for performance
- Collection names are sanitized (alphanumeric characters and underscores only)
- Supports schemas with auto-incrementing IDs

### Performance Considerations

**First-Write Latency:**
- Additional overhead from collection creation
- Qdrant: ~100-500ms
- Pinecone: ~10-30 seconds (serverless provisioning)
- Milvus: ~500-2000ms (including index creation)

**Query Performance:**
- The existence check adds minimal overhead (~1-10ms)
- No performance impact once the collection exists
- Each dimension's collection is handled independently

**Storage Overhead:**
- Minimal metadata per collection
- The main overhead is per-dimension storage
- Trade-off: storage space vs. dimension flexibility

## Future Considerations

**Automatic Dimension Consolidation:**
- Could add a background process to identify and merge unused dimension variants
- Would require re-embedding or dimensionality reduction

**Dimension Discovery:**
- Could expose an API to list all dimensions in use for a collection
- Useful for administration and monitoring

**Default Dimension Preference:**
- Could track a "primary" dimension per collection
- Use it for queries when dimension context is unavailable

**Storage Quotas:**
- May need per-collection dimension limits
- Prevent proliferation of dimension variants

## Migration Notes

**From the Pre-Dimension-Suffix System:**
- Old collections: `d_{user}_{collection}` (no dimension suffix)
- New collections: `d_{user}_{collection}_{dim}` (with dimension suffix)
- No automatic migration: old collections remain accessible
- Consider a manual migration script if needed
- Both naming schemes can run side by side

## References

- Collection Management: `docs/tech-specs/collection-management.md`
- Storage Schema: `trustgraph-base/trustgraph/schema/services/storage.py`
- Librarian Service: `trustgraph-flow/trustgraph/librarian/service.py`
