---
layout: default
title: "Ontology Knowledge Extraction - Phase 2 Refactor"
parent: "Swahili (Beta)"
---

# Ontology Knowledge Extraction - Phase 2 Refactor

> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.

**Status**: Draft

**Author**: Analysis Session 2025-12-03

**Related**: `ontology.md`, `ontorag.md`

## Overview

This document identifies inconsistencies in the current ontology-based knowledge extraction system and proposes a refactor to improve LLM performance and reduce information loss.

## Current Implementation

### How It Works Now

1. **Ontology Loading** (`ontology_loader.py`)
   - Loads ontology JSON with keys like `"fo/Recipe"`, `"fo/Food"`, `"fo/produces"`
   - Class IDs include the namespace prefix in the key itself
   - Example from `food.ontology`:
     ```json
     "classes": {
       "fo/Recipe": {
         "uri": "http://purl.org/ontology/fo/Recipe",
         "rdfs:comment": "A Recipe is a combination..."
       }
     }
     ```

2. **Prompt Construction** (`extract.py:299-307`, `ontology-prompt.md`)
   - Template receives `classes`, `object_properties`, `datatype_properties` dicts
   - Template iterates: `{% for class_id, class_def in classes.items() %}`
   - LLM sees: `**fo/Recipe**: A Recipe is a combination...`
   - Example output format shows:
     ```json
     {"subject": "recipe:cornish-pasty", "predicate": "rdf:type", "object": "Recipe"}
     {"subject": "recipe:cornish-pasty", "predicate": "has_ingredient", "object": "ingredient:flour"}
     ```

3. **Response Parsing** (`extract.py:382-428`)
   - Expects a JSON array: `[{"subject": "...", "predicate": "...", "object": "..."}]`
   - Validates against the ontology subset
   - Expands URIs via `expand_uri()` (extract.py:473-521)

4. **URI Expansion** (`extract.py:473-521`)
   - Checks whether the value is a key in the `ontology_subset.classes` dict
   - If found, extracts the URI from the class definition
   - If not found, constructs a URI: `f"https://trustgraph.ai/ontology/{ontology_id}#{value}"`

### Data Flow Example

**Ontology JSON → Loader → Prompt:**
```
"fo/Recipe" → classes["fo/Recipe"] → LLM sees "**fo/Recipe**"
```

**LLM → Parser → Output:**
```
"Recipe" → not in classes["fo/Recipe"] → constructs URI → LOSES original URI
"fo/Recipe" → found in classes → uses original URI → PRESERVES URI
```

## Identified Problems

### 1. **Inconsistent Examples in Prompt**

**Problem**: The prompt template shows class IDs with prefixes (`fo/Recipe`), but the example output uses unprefixed class names (`Recipe`).

**Location**: `ontology-prompt.md:5-52`

```markdown
## Ontology Classes:
- **fo/Recipe**: A Recipe is...

## Example Output:
{"subject": "recipe:cornish-pasty", "predicate": "rdf:type", "object": "Recipe"}
```

**Impact**: The LLM receives conflicting signals about which format to use.

### 2. **Information Loss in URI Expansion**

**Problem**: When the LLM returns unprefixed class names, as the example suggests, `expand_uri()` cannot find them in the ontology dict and constructs fallback URIs, losing the original URIs.

**Location**: `extract.py:494-500`

```python
if value in ontology_subset.classes:  # Looks for "Recipe"
    class_def = ontology_subset.classes[value]  # But key is "fo/Recipe"
    if isinstance(class_def, dict) and 'uri' in class_def:
        return class_def['uri']  # Never reached!
return f"https://trustgraph.ai/ontology/{ontology_id}#{value}"  # Fallback
```

**Impact**:
- Original URI: `http://purl.org/ontology/fo/Recipe`
- Constructed URI: `https://trustgraph.ai/ontology/food#Recipe`
- The semantic meaning is lost, which breaks interoperability.

### 3. **Ambiguous Entity Instance Format**

**Problem**: There is no clear guidance on the URI format for entity instances.

**Examples in prompt**:
- `"recipe:cornish-pasty"` (namespace-like prefix)
- `"ingredient:flour"` (a different prefix)

**Actual behavior** (extract.py:517-520):
```python
# Treat as entity instance - construct unique URI
normalized = value.replace(" ", "-").lower()
return f"https://trustgraph.ai/{ontology_id}/{normalized}"
```

**Impact**: The LLM must guess the prefixing convention with no ontology context.
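
Wrapping the quoted fallback lines in a function shows what happens to the prompt's example IDs. The function name `construct_instance_uri` is hypothetical, but the body is the three lines above. Whatever prefix the LLM invents is copied verbatim into the URI path, so the same entity written two ways yields two different URIs.

```python
def construct_instance_uri(value, ontology_id):
    """The entity-instance fallback quoted above, wrapped for experimentation."""
    # Treat as entity instance - construct unique URI
    normalized = value.replace(" ", "-").lower()
    return f"https://trustgraph.ai/{ontology_id}/{normalized}"


for value in ["recipe:cornish-pasty", "Cornish Pasty"]:
    print(construct_instance_uri(value, "food"))
# -> https://trustgraph.ai/food/recipe:cornish-pasty
# -> https://trustgraph.ai/food/cornish-pasty
```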

### 4. **No Namespace Prefix Guidance**

**Problem**: The ontology JSON contains namespace definitions (lines 10-25 in food.ontology):
```json
"namespaces": {
  "fo": "http://purl.org/ontology/fo/",
  "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
  ...
}
```

However, these are never surfaced to the LLM. The LLM doesn't know:
- What "fo" means
- Which prefix to use for entities
- Which namespace applies to which elements
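
Closing this gap could be as simple as rendering the `namespaces` block into the prompt. The sketch below assumes the loader exposes that block as a plain `dict`. The helper name and the output layout are hypothetical and are not taken from the TrustGraph prompt templates.

```python
def render_namespace_section(namespaces):
    """Render an ontology's "namespaces" dict as a prompt section."""
    lines = ["## Namespace Prefixes:"]
    # Sort for a stable prompt, so identical ontologies give identical prompts
    for prefix, uri in sorted(namespaces.items()):
        lines.append(f"- **{prefix}**: {uri}")
    return "\n".join(lines)


namespaces = {
    "fo": "http://purl.org/ontology/fo/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}
print(render_namespace_section(namespaces))
```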

### 5. **Labels Unused in Prompt**

**Problem**: Every class has `rdfs:label` fields (e.g., `{"value": "Recipe", "lang": "en-gb"}`), but the prompt template doesn't use them.

**Current**: Shows only `class_id` and `comment`
```jinja
- **{{class_id}}**{% if class_def.comment %}: {{class_def.comment}}{% endif %}
```

**Available but unused**:
```python
"rdfs:label": [{"value": "Recipe", "lang": "en-gb"}]
```

**Impact**: Labels could provide human-readable names alongside the technical IDs.
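
Because `rdfs:label` is a list of language-tagged values, using it means picking one. The sketch below assumes the raw JSON shape shown above. The helper name and the language preference order are hypothetical.

```python
def pick_label(class_def, preferred_langs=("en", "en-gb")):
    """Return the best rdfs:label value for a class definition, or None.

    Prefers exact language matches in the given order, then falls back to
    the first label of any language.
    """
    labels = class_def.get("rdfs:label", [])
    for lang in preferred_langs:
        for label in labels:
            if label.get("lang") == lang:
                return label["value"]
    return labels[0]["value"] if labels else None


recipe = {"rdfs:label": [{"value": "Recipe", "lang": "en-gb"}]}
print(pick_label(recipe))  # -> Recipe
print(pick_label({}))      # -> None
```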

## Proposed Solutions

### Option A: Normalize to Unprefixed IDs

**Approach**: Strip prefixes from class IDs before showing them to the LLM.

**Changes**:
1. Modify `build_extraction_variables()` to transform the keys:
   ```python
   classes_for_prompt = {
       k.split('/')[-1]: v  # "fo/Recipe" → "Recipe"
       for k, v in ontology_subset.classes.items()
   }
   ```

2. Update the prompt example to match (it already uses unprefixed names).

3. Modify `expand_uri()` to handle both forms:
   ```python
   # Try exact match first
   if value in ontology_subset.classes:
       return ontology_subset.classes[value]['uri']

   # Try with prefix
   for prefix in ['fo/', 'rdf:', 'rdfs:']:
       prefixed = f"{prefix}{value}"
       if prefixed in ontology_subset.classes:
           return ontology_subset.classes[prefixed]['uri']
   ```

**Pros**:
- Cleaner and easier to read
- Matches the existing prompt examples
- LLMs work better with simpler tokens

**Cons**:
- Class name collisions if multiple ontologies share a class name
- Loses namespace information
- Requires fallback logic for lookups
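
The first con can be demonstrated directly. The sketch below is hypothetical and is not part of TrustGraph. It applies the same `split('/')[-1]` rule as step 1 and reports which prefixed IDs collapse onto one local name. The `cooking/Recipe` class and its `example.org` URI are illustrative only.

```python
from collections import defaultdict


def strip_prefixes(classes):
    """Strip slash-style prefixes from class IDs and report collisions."""
    by_local = defaultdict(list)
    for class_id in classes:
        by_local[class_id.split("/")[-1]].append(class_id)
    collisions = {local: ids for local, ids in by_local.items() if len(ids) > 1}
    # Same transform as step 1: on a collision, the later entry overwrites
    stripped = {class_id.split("/")[-1]: v for class_id, v in classes.items()}
    return stripped, collisions


classes = {
    "fo/Recipe": {"uri": "http://purl.org/ontology/fo/Recipe"},
    "cooking/Recipe": {"uri": "http://example.org/cooking/Recipe"},
    "fo/Food": {"uri": "http://purl.org/ontology/fo/Food"},
}
stripped, collisions = strip_prefixes(classes)
print(collisions)  # -> {'Recipe': ['fo/Recipe', 'cooking/Recipe']}
print(stripped["Recipe"]["uri"])  # the fo/Recipe URI has been silently dropped
```

The `split('/')` rule only removes slash-style prefixes such as `fo/`. Colon-style IDs like `rdfs:label` pass through unchanged, which is another detail Option A would need to settle.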

### Option B: Use Full Prefixed IDs Consistently

**Approach**: Update the examples to use prefixed IDs that match what is shown in the class list.

**Changes**:
1. Update the prompt example (ontology-prompt.md:46-52):
   ```json
   [
     {"subject": "recipe:cornish-pasty", "predicate": "rdf:type", "object": "fo/Recipe"},
     {"subject": "recipe:cornish-pasty", "predicate": "rdfs:label", "object": "Cornish Pasty"},
     {"subject": "recipe:cornish-pasty", "predicate": "fo/produces", "object": "food:cornish-pasty"},
     {"subject": "food:cornish-pasty", "predicate": "rdf:type", "object": "fo/Food"}
   ]
   ```

2. Add a namespace explanation to the prompt:
   ```markdown
   ## Namespace Prefixes:
   - **fo/**: Food Ontology (http://purl.org/ontology/fo/)
   - **rdf:**: RDF vocabulary (http://www.w3.org/1999/02/22-rdf-syntax-ns#)
   - **rdfs:**: RDF Schema (http://www.w3.org/2000/01/rdf-schema#)

   Use these prefixes exactly as shown when referencing classes and properties.
   ```

3. Keep `expand_uri()` as-is (it works correctly when matches are found).

**Pros**:
- Input and output are consistent
- No information loss
- Preserves namespace semantics
- Works with multiple ontologies

**Cons**:
- More tokens for the LLM
- Requires the LLM to track prefixes
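
The second con can be mitigated by checking the output before expansion. If the LLM drops or mangles a prefix, a guard can flag the offending IDs so they are not silently turned into constructed URIs. The sketch below is hypothetical: the function name and the choice to check only `rdf:type` objects are assumptions.

```python
def unknown_type_ids(triples, known_class_ids):
    """Return rdf:type objects that are not known prefixed class IDs."""
    return [
        t["object"]
        for t in triples
        if t.get("predicate") == "rdf:type" and t.get("object") not in known_class_ids
    ]


known = {"fo/Recipe", "fo/Food"}
llm_output = [
    {"subject": "recipe:cornish-pasty", "predicate": "rdf:type", "object": "fo/Recipe"},
    {"subject": "food:cornish-pasty", "predicate": "rdf:type", "object": "Food"},
]
print(unknown_type_ids(llm_output, known))  # -> ['Food']
```

Flagged IDs could then be logged, retried, or rejected, depending on how strict the pipeline needs to be.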

### Option C: Hybrid - Show Both Label and ID

**Approach**: Enhance the prompt to show both the human-readable label and the technical ID.

**Changes**:
1. Update the prompt template:
   ```jinja
   {% for class_id, class_def in classes.items() %}
   - **{{class_id}}** (label: "{{class_def.labels[0].value if class_def.labels else class_id}}"){% if class_def.comment %}: {{class_def.comment}}{% endif %}
   {% endfor %}
   ```

   Example output:
   ```markdown
   - **fo/Recipe** (label: "Recipe"): A Recipe is a combination...
   ```

2. Update the instructions:
   ```markdown
   When referencing classes:
   - Use the full prefixed ID (e.g., "fo/Recipe") in JSON output
   - The label (e.g., "Recipe") is for human understanding only
   ```

**Pros**:
- Clearest option for LLMs
- Preserves all information
- Explicit about what to use

**Cons**:
- Longer prompt
- More complex template
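
The expected rendering can be unit-tested without a Jinja environment. Below is a hypothetical plain-Python equivalent of the template line. Like the template, it assumes that `class_def` may carry a `labels` list of `{"value": ..., "lang": ...}` dicts and an optional `comment` string.

```python
def render_class_line(class_id, class_def):
    """Plain-Python equivalent of the Option C template line."""
    labels = class_def.get("labels") or []
    label = labels[0]["value"] if labels else class_id
    line = f'- **{class_id}** (label: "{label}")'
    if class_def.get("comment"):
        line += f": {class_def['comment']}"
    return line


recipe = {
    "labels": [{"value": "Recipe", "lang": "en-gb"}],
    "comment": "A Recipe is a combination...",
}
print(render_class_line("fo/Recipe", recipe))
# -> - **fo/Recipe** (label: "Recipe"): A Recipe is a combination...
```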

## Implemented Approach

**Simplified Entity-Relationship-Attribute Format** - completely replaces the old triple-based format.

The new approach was chosen because:

1. **No Information Loss**: Original URIs are preserved correctly
2. **Simpler Logic**: No transformation is needed, and direct dict lookups work
3. **Namespace Safety**: Handles multiple ontologies without collisions
4. **Semantic Correctness**: Preserves RDF/OWL semantics

## Implementation Complete

### What Was Built:

1. **New Prompt Template** (`prompts/ontology-extract-v2.txt`)
   - ✅ Clear sections: Entity Types, Relationships, Attributes
   - ✅ Example using full type identifiers (`fo/Recipe`, `fo/has_ingredient`)
   - ✅ Instructions to use exact identifiers from the schema
   - ✅ New JSON format with entities/relationships/attributes arrays

2. **Entity Normalization** (`entity_normalizer.py`)
   - ✅ `normalize_entity_name()` - Converts names to a URI-safe format
   - ✅ `normalize_type_identifier()` - Handles slashes in types (`fo/Recipe` → `fo-recipe`)
   - ✅ `build_entity_uri()` - Creates unique URIs from the (name, type) tuple
   - ✅ `EntityRegistry` - Tracks entities for deduplication

3. **JSON Parser** (`simplified_parser.py`)
   - ✅ Parses the new format: `{entities: [...], relationships: [...], attributes: [...]}`
   - ✅ Supports kebab-case and snake_case field names
   - ✅ Returns structured dataclasses
   - ✅ Graceful error handling with logging

4. **Triple Converter** (`triple_converter.py`)
   - ✅ `convert_entity()` - Generates type + label triples automatically
   - ✅ `convert_relationship()` - Connects entity URIs via properties
   - ✅ `convert_attribute()` - Adds literal values
   - ✅ Looks up full URIs from ontology definitions

5. **Updated Main Processor** (`extract.py`)
   - ✅ Removed the old triple-based extraction code
   - ✅ Added the `extract_with_simplified_format()` method
   - ✅ Now uses the simplified format exclusively
   - ✅ Calls the prompt with the `extract-with-ontologies-v2` ID
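
The parser's field-name tolerance can be illustrated with a short sketch. This is not the code in `simplified_parser.py`. The `Relationship` dataclass, the `_field` helper, and `parse_relationships` are hypothetical, and only the relationships array is handled; entities and attributes would follow the same pattern.

```python
import json
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class Relationship:
    subject: str
    subject_type: str
    relation: str
    object: str
    object_type: str


def _field(item, name):
    """Read a snake_case field, falling back to its kebab-case spelling."""
    if name in item:
        return item[name]
    return item[name.replace("_", "-")]  # KeyError if neither spelling exists


def parse_relationships(text):
    """Parse the "relationships" array, skipping malformed entries."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as e:
        logger.warning("Could not parse LLM response: %s", e)
        return []
    if not isinstance(data, dict):
        logger.warning("Expected a JSON object, got %s", type(data).__name__)
        return []

    result = []
    for item in data.get("relationships", []):
        try:
            result.append(Relationship(
                subject=_field(item, "subject"),
                subject_type=_field(item, "subject_type"),
                relation=_field(item, "relation"),
                object=_field(item, "object"),
                object_type=_field(item, "object_type"),
            ))
        except (KeyError, TypeError) as e:
            logger.warning("Skipping malformed relationship: %r", e)
    return result


# Mixed spellings: "subject-type" (kebab) and "object_type" (snake)
raw = """{"relationships": [{"subject": "Cornish pasty", "subject-type": "Recipe",
  "relation": "has_ingredient", "object": "beef", "object_type": "Food"}]}"""
print(parse_relationships(raw))
```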

## Testing

### Test 1: URI Preservation
```python
# Given ontology class
classes = {"fo/Recipe": {"uri": "http://purl.org/ontology/fo/Recipe", ...}}

# When LLM returns
llm_output = {"subject": "x", "predicate": "rdf:type", "object": "fo/Recipe"}

# Then expanded URI should be
assert expanded == "http://purl.org/ontology/fo/Recipe"
# Not: "https://trustgraph.ai/ontology/food#Recipe"
```

### Test 2: Multiple Ontology Collision
```python
# Given two ontologies
ont1 = {"fo/Recipe": {...}}
ont2 = {"cooking/Recipe": {...}}

# LLM should use full prefix to disambiguate
llm_output = {"object": "fo/Recipe"}  # Not just "Recipe"
```

### Test 3: Entity Instance Format
```python
# Given prompt with food ontology
# LLM should create instances like
{"subject": "recipe:cornish-pasty"}  # Namespace-style
{"subject": "food:beef"}  # Consistent prefix
```
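
Tests 1 and 2 can be made runnable against a simplified stand-in for the class lookup. `expand_class` is hypothetical: it models only the lookup-then-fallback behavior described above, not the real `expand_uri()` signature. The `cooking/Recipe` URI is illustrative.

```python
def expand_class(value, classes, ontology_id):
    """Hypothetical stand-in: exact-key lookup, else a constructed fallback URI."""
    class_def = classes.get(value)
    if isinstance(class_def, dict) and "uri" in class_def:
        return class_def["uri"]
    return f"https://trustgraph.ai/ontology/{ontology_id}#{value}"


def test_uri_preservation():
    classes = {"fo/Recipe": {"uri": "http://purl.org/ontology/fo/Recipe"}}
    llm_output = {"subject": "x", "predicate": "rdf:type", "object": "fo/Recipe"}
    expanded = expand_class(llm_output["object"], classes, "food")
    assert expanded == "http://purl.org/ontology/fo/Recipe"
    assert expanded != "https://trustgraph.ai/ontology/food#Recipe"


def test_multiple_ontologies_disambiguated_by_prefix():
    classes = {
        "fo/Recipe": {"uri": "http://purl.org/ontology/fo/Recipe"},
        "cooking/Recipe": {"uri": "http://example.org/cooking/Recipe"},
    }
    assert expand_class("fo/Recipe", classes, "food") == "http://purl.org/ontology/fo/Recipe"
    assert expand_class("cooking/Recipe", classes, "food") == "http://example.org/cooking/Recipe"
    # An unprefixed name matches neither key and falls back to a constructed URI
    assert expand_class("Recipe", classes, "food").startswith("https://trustgraph.ai/")


if __name__ == "__main__":
    test_uri_preservation()
    test_multiple_ontologies_disambiguated_by_prefix()
    print("ok")
```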

## Open Questions

1. **Should entity instances use namespace prefixes?**
   - Current: `"recipe:cornish-pasty"` (arbitrary)
   - Alternative: Use the ontology prefix, `"fo:cornish-pasty"`?
   - Alternative: No prefix, and expand `"cornish-pasty"` into a full URI?

2. **How should domain/range be shown in the prompt?**
   - Currently shows: `(Recipe → Food)`
   - Should it be: `(fo/Recipe → fo/Food)`?

3. **Should we validate domain/range constraints?**
   - TODO comment at extract.py:470
   - Would catch more errors, but is more complex

4. **What about inverse properties and equivalences?**
   - The ontology has `owl:inverseOf` and `owl:equivalentClass`
   - These are not currently used in extraction
   - Should they be?

## Success Metrics

- ✅ Zero URI information loss (100% preservation of original URIs)
- ✅ LLM output format matches input format
- ✅ No confusing examples in the prompt
- ✅ Tests pass with multiple ontologies
- ✅ Improved extraction quality (measured by the percentage of valid triples)

## Alternative Approach: Simplified Extraction Format

### Philosophy

Instead of asking the LLM to understand RDF/OWL semantics, ask it to do what it is good at: **finding entities and relationships in text**.

Let the code handle URI construction, RDF conversion, and semantic web formalities.

### Example: Entity Classification

**Input Text:**
```
Cornish pasty is a traditional British pastry filled with meat and vegetables.
```

**Ontology Schema (shown to the LLM):**
```markdown
## Entity Types:
- Recipe: A recipe is a combination of ingredients and a method
- Food: A food is something that can be eaten
- Ingredient: An ingredient combines a quantity and a food
```

**What the LLM Returns (Simple JSON):**
```json
{
  "entities": [
    {
      "entity": "Cornish pasty",
      "type": "Recipe"
    }
  ]
}
```

**What the Code Produces (RDF Triples):**
```python
# 1. Normalize entity name + type to ID (type prevents collisions)
entity_id = "recipe-cornish-pasty"  # normalize("Cornish pasty", "Recipe")
entity_uri = "https://trustgraph.ai/food/recipe-cornish-pasty"

# Note: Same name, different type = different URI
# "Cornish pasty" (Recipe) → recipe-cornish-pasty
# "Cornish pasty" (Food) → food-cornish-pasty

# 2. Generate triples
triples = [
    # Type triple
    Triple(
        s=Value(value=entity_uri, is_uri=True),
        p=Value(value="http://www.w3.org/1999/02/22-rdf-syntax-ns#type", is_uri=True),
        o=Value(value="http://purl.org/ontology/fo/Recipe", is_uri=True)
    ),
    # Label triple (automatic)
    Triple(
        s=Value(value=entity_uri, is_uri=True),
        p=Value(value="http://www.w3.org/2000/01/rdf-schema#label", is_uri=True),
        o=Value(value="Cornish pasty", is_uri=False)
    )
]
```
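
The normalization and conversion steps above can be sketched end to end. This is an illustration under stated assumptions, not the code in `entity_normalizer.py` or `triple_converter.py`:

- `Value` and `Triple` are local `NamedTuple` stand-ins for TrustGraph's schema classes.
- The slug rule in `normalize()` is a guess chosen to reproduce the IDs shown in this document: type first, lowercase, and non-alphanumerics collapsed to `-`.
- The `base` URI default is also an assumption.

```python
import re
from typing import NamedTuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


class Value(NamedTuple):
    # Local stand-in for TrustGraph's Value, for illustration only
    value: str
    is_uri: bool


class Triple(NamedTuple):
    s: Value
    p: Value
    o: Value


def normalize(name, type_name):
    """Type-qualified, URI-safe ID: ("Cornish pasty", "Recipe") -> "recipe-cornish-pasty"."""
    return re.sub(r"[^a-z0-9]+", "-", f"{type_name} {name}".lower()).strip("-")


def convert_entity(name, type_name, type_uri, base="https://trustgraph.ai/food/"):
    """Emit the type triple and the automatic label triple for one entity."""
    uri = base + normalize(name, type_name)
    return [
        Triple(Value(uri, True), Value(RDF_TYPE, True), Value(type_uri, True)),
        Triple(Value(uri, True), Value(RDFS_LABEL, True), Value(name, False)),
    ]


triples = convert_entity("Cornish pasty", "Recipe", "http://purl.org/ontology/fo/Recipe")
print(triples[0].s.value)  # -> https://trustgraph.ai/food/recipe-cornish-pasty
```

Putting the type into the slug is what keeps `("Cornish pasty", "Recipe")` and `("Cornish pasty", "Food")` apart. Under this rule, a prefixed type such as `fo/Recipe` would yield `fo-recipe-cornish-pasty`.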

### Benefits

1. **The LLM doesn't need to:**
   - Understand URI syntax
   - Invent identifier prefixes (`recipe:`, `ingredient:`)
   - Know about `rdf:type` or `rdfs:label`
   - Construct semantic web identifiers

2. **The LLM only needs to:**
   - Find entities in text
   - Map them to ontology classes
   - Extract relationships and attributes

3. **The code handles:**
   - URI normalization and construction
   - RDF triple generation
   - Automatic label assignment
   - Namespace management

### Why This Works Better

- **Simpler prompt** = less confusion = fewer errors
- **Consistent IDs** = the code controls the normalization rules
- **Auto-generated labels** = no missing rdfs:label triples
- **LLM focuses on extraction** = the task it is actually good at

### Example: Entity Relationships

**Input Text:**
```
Cornish pasty is a traditional British pastry filled with beef and potatoes.
```

**Ontology Schema (shown to the LLM):**
```markdown
## Entity Types:
- Recipe: A recipe is a combination of ingredients and a method
- Food: A food is something that can be eaten
- Ingredient: An ingredient combines a quantity and a food

## Relationships:
- has_ingredient: Relates a recipe to an ingredient it uses (Recipe → Ingredient)
- food: Relates an ingredient to the food that is required (Ingredient → Food)
```

**What the LLM Returns (Simple JSON):**
```json
{
  "entities": [
    {
      "entity": "Cornish pasty",
      "type": "Recipe"
    },
    {
      "entity": "beef",
      "type": "Food"
    },
    {
      "entity": "potatoes",
      "type": "Food"
    }
  ],
  "relationships": [
    {
      "subject": "Cornish pasty",
      "subject-type": "Recipe",
      "relation": "has_ingredient",
      "object": "beef",
      "object-type": "Food"
    },
    {
      "subject": "Cornish pasty",
      "subject-type": "Recipe",
      "relation": "has_ingredient",
      "object": "potatoes",
      "object-type": "Food"
    }
  ]
}
```

**What the Code Produces (RDF Triples):**
```python
# Constants used in the shorthand below
rdf_type = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
rdfs_label = "http://www.w3.org/2000/01/rdf-schema#label"

# Normalize (name, type) pairs to URIs
cornish_pasty_uri = "https://trustgraph.ai/food/recipe-cornish-pasty"
beef_uri = "https://trustgraph.ai/food/food-beef"
potatoes_uri = "https://trustgraph.ai/food/food-potatoes"

# Look up relation URI from ontology
has_ingredient_uri = "http://purl.org/ontology/fo/ingredients"  # from fo/has_ingredient

triples = [
    # Entity type triples (as before; Value(...) wrappers omitted for brevity)
    Triple(s=cornish_pasty_uri, p=rdf_type, o="http://purl.org/ontology/fo/Recipe"),
    Triple(s=cornish_pasty_uri, p=rdfs_label, o="Cornish pasty"),

    Triple(s=beef_uri, p=rdf_type, o="http://purl.org/ontology/fo/Food"),
    Triple(s=beef_uri, p=rdfs_label, o="beef"),

    Triple(s=potatoes_uri, p=rdf_type, o="http://purl.org/ontology/fo/Food"),
    Triple(s=potatoes_uri, p=rdfs_label, o="potatoes"),

    # Relationship triples
    Triple(
        s=Value(value=cornish_pasty_uri, is_uri=True),
        p=Value(value=has_ingredient_uri, is_uri=True),
        o=Value(value=beef_uri, is_uri=True)
    ),
    Triple(
        s=Value(value=cornish_pasty_uri, is_uri=True),
        p=Value(value=has_ingredient_uri, is_uri=True),
        o=Value(value=potatoes_uri, is_uri=True)
    )
]
```

**Key Points:**
- The LLM returns natural-language entity names: `"Cornish pasty"`, `"beef"`, `"potatoes"`
- The LLM includes types to disambiguate: `subject-type`, `object-type`
- The LLM uses the relation name from the schema: `"has_ingredient"`
- The code derives consistent IDs from (name, type): `("Cornish pasty", "Recipe")` → `recipe-cornish-pasty`
- The code looks up the relation URI from the ontology: `fo/has_ingredient` → full URI
- The same (name, type) tuple always gets the same URI (deduplication)
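
The relationship step can be sketched in a few lines. The `normalize()` slug rule is hypothetical and was chosen to reproduce `recipe-cornish-pasty`. `convert_relationship` and the `relation_uris` dict, which stands in for the ontology property lookup, are also illustrative. Triples are plain tuples here rather than TrustGraph `Triple` objects.

```python
import re


def normalize(name, type_name):
    # Hypothetical slug rule: type first, lowercase, non-alphanumerics -> "-"
    return re.sub(r"[^a-z0-9]+", "-", f"{type_name} {name}".lower()).strip("-")


def convert_relationship(rel, relation_uris, base="https://trustgraph.ai/food/"):
    """Turn one simplified relationship into an (s, p, o) tuple of URIs.

    Unknown relation names raise KeyError rather than producing invented URIs.
    """
    s = base + normalize(rel["subject"], rel["subject-type"])
    p = relation_uris[rel["relation"]]
    o = base + normalize(rel["object"], rel["object-type"])
    return (s, p, o)


relation_uris = {"has_ingredient": "http://purl.org/ontology/fo/ingredients"}
rel = {
    "subject": "Cornish pasty", "subject-type": "Recipe",
    "relation": "has_ingredient",
    "object": "beef", "object-type": "Food",
}
print(convert_relationship(rel, relation_uris))
```

Because both ends are normalized with their types, the subject URI here is identical to the one produced for the `("Cornish pasty", "Recipe")` entity, so the relationship attaches to the right node.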

### Example: Entity Name Disambiguation

**Problem:** The same name can refer to different entity types.

**Real-world case:**
```
"Cornish pasty" can be:
- A Recipe (instructions for making it)
- A Food (the dish itself)
```

**How It's Handled:**

The LLM returns both as separate entities:
```json
{
  "entities": [
    {"entity": "Cornish pasty", "type": "Recipe"},
    {"entity": "Cornish pasty", "type": "Food"}
  ],
  "relationships": [
    {
      "subject": "Cornish pasty",
      "subject-type": "Recipe",
      "relation": "produces",
      "object": "Cornish pasty",
      "object-type": "Food"
    }
  ]
}
```

**Code Resolution:**
```python
# Different types → different URIs
recipe_uri = normalize("Cornish pasty", "Recipe")
# → "https://trustgraph.ai/food/recipe-cornish-pasty"

food_uri = normalize("Cornish pasty", "Food")
# → "https://trustgraph.ai/food/food-cornish-pasty"

# Relationship connects them correctly
triple = Triple(
    s=recipe_uri,  # The Recipe
    p="http://purl.org/ontology/fo/produces",
    o=food_uri  # The Food
)
```

**Why This Works:**
- The type is included in all references (entities, relationships, attributes)
- The code uses the `(name, type)` tuple as the lookup key
- No ambiguity, no collisions
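
A registry keyed on `(name, type)` is enough to get both behaviors: distinct URIs per type, and reuse on repeated mentions. The class below is a hypothetical sketch inspired by `EntityRegistry`, not its actual code. The slug rule and base URI are assumptions chosen to match the URIs in this example.

```python
import re


class EntityRegistry:
    """Hypothetical registry: one URI per distinct (name, type) pair."""

    def __init__(self, base="https://trustgraph.ai/food/"):
        self.base = base
        self._uris = {}

    def uri_for(self, name, type_name):
        key = (name, type_name)
        if key not in self._uris:
            slug = re.sub(r"[^a-z0-9]+", "-", f"{type_name} {name}".lower()).strip("-")
            self._uris[key] = self.base + slug
        return self._uris[key]  # repeated mentions reuse the stored URI

    def __len__(self):
        return len(self._uris)


registry = EntityRegistry()
recipe_uri = registry.uri_for("Cornish pasty", "Recipe")
food_uri = registry.uri_for("Cornish pasty", "Food")
again = registry.uri_for("Cornish pasty", "Recipe")  # deduplicated
print(recipe_uri, food_uri, len(registry))
```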

### Example: Entity Attributes

**Input Text:**
```
This Cornish pasty recipe serves 4-6 people and takes 45 minutes to prepare.
```

**Ontology Schema (shown to the LLM):**
```markdown
## Entity Types:
- Recipe: A recipe is a combination of ingredients and a method

## Attributes:
- serves: Indicates what the recipe is intended to serve (Recipe → text)
- preparation_time: Time needed to prepare the recipe (Recipe → text)
```

**What the LLM Returns (Simple JSON):**
```json
{
  "entities": [
    {
      "entity": "Cornish pasty recipe",
      "type": "Recipe"
    }
  ],
  "attributes": [
    {
      "entity": "Cornish pasty recipe",
      "entity-type": "Recipe",
      "attribute": "serves",
      "value": "4-6 people"
    },
    {
      "entity": "Cornish pasty recipe",
      "entity-type": "Recipe",
      "attribute": "preparation_time",
      "value": "45 minutes"
    }
  ]
}
```

**What the Code Produces (RDF Triples):**
```python
# Constants used below
rdf_type = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
rdfs_label = "http://www.w3.org/2000/01/rdf-schema#label"

# Normalize entity name to URI
recipe_uri = "https://trustgraph.ai/food/cornish-pasty-recipe"

# Look up attribute URIs from ontology
serves_uri = "http://purl.org/ontology/fo/serves"  # from fo/serves
prep_time_uri = "http://purl.org/ontology/fo/preparation_time"  # from fo/preparation_time

triples = [
    # Entity type triple
    Triple(
        s=Value(value=recipe_uri, is_uri=True),
        p=Value(value=rdf_type, is_uri=True),
        o=Value(value="http://purl.org/ontology/fo/Recipe", is_uri=True)
    ),

    # Label triple (automatic)
    Triple(
        s=Value(value=recipe_uri, is_uri=True),
        p=Value(value=rdfs_label, is_uri=True),
        o=Value(value="Cornish pasty recipe", is_uri=False)
    ),

    # Attribute triples (objects are literals, not URIs)
    Triple(
        s=Value(value=recipe_uri, is_uri=True),
        p=Value(value=serves_uri, is_uri=True),
        o=Value(value="4-6 people", is_uri=False)  # Literal value!
    ),
    Triple(
        s=Value(value=recipe_uri, is_uri=True),
        p=Value(value=prep_time_uri, is_uri=True),
        o=Value(value="45 minutes", is_uri=False)  # Literal value!
    )
]
```

**Key Points:**
- The LLM extracts literal values: `"4-6 people"`, `"45 minutes"`
- The LLM includes the entity type for disambiguation: `entity-type`
- The LLM uses the attribute name from the schema: `"serves"`, `"preparation_time"`
- The code looks up the attribute URI from the ontology's datatype properties
- **The object is a literal** (`is_uri=False`), not a URI reference
- Values stay as natural text, so no normalization is needed

**Difference from Relationships:**
- Relationships: both subject and object are entities (URIs)
- Attributes: the subject is an entity (URI), and the object is a literal value (string/number)
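
The literal-versus-URI distinction is the only structural difference from the relationship case, and the sketch below makes it explicit. `Value` is a local `NamedTuple` stand-in for TrustGraph's class. `convert_attribute` and the `attribute_uris` dict, which stands in for the datatype-property lookup, are hypothetical, and the subject URI is an illustrative value.

```python
from typing import NamedTuple


class Value(NamedTuple):
    # Local stand-in for TrustGraph's Value, for illustration only
    value: str
    is_uri: bool


def convert_attribute(entity_uri, attribute, value, attribute_uris):
    """Build (s, p, o) for an attribute: URI subject and predicate, literal object."""
    return (
        Value(entity_uri, True),
        Value(attribute_uris[attribute], True),
        Value(value, False),  # literal, kept exactly as extracted
    )


attribute_uris = {"serves": "http://purl.org/ontology/fo/serves"}
recipe_uri = "https://trustgraph.ai/food/example-recipe"  # illustrative subject URI
s, p, o = convert_attribute(recipe_uri, "serves", "4-6 people", attribute_uris)
print(o)  # -> Value(value='4-6 people', is_uri=False)
```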

### Complete Example: Entities + Relationships + Attributes

**Input Text:**
```
Cornish pasty is a savory pastry filled with beef and potatoes.
This recipe serves 4 people.
```

**What the LLM Returns:**
```json
{
  "entities": [
    {
      "entity": "Cornish pasty",
      "type": "Recipe"
    },
    {
      "entity": "beef",
      "type": "Food"
    },
    {
      "entity": "potatoes",
      "type": "Food"
    }
  ],
  "relationships": [
    {
      "subject": "Cornish pasty",
      "subject-type": "Recipe",
      "relation": "has_ingredient",
      "object": "beef",
      "object-type": "Food"
    },
    {
      "subject": "Cornish pasty",
      "subject-type": "Recipe",
      "relation": "has_ingredient",
      "object": "potatoes",
      "object-type": "Food"
    }
  ],
  "attributes": [
    {
      "entity": "Cornish pasty",
      "entity-type": "Recipe",
      "attribute": "serves",
      "value": "4 people"
    }
  ]
}
```

**Result:** 9 RDF triples generated:
- 3 entity type triples (rdf:type)
- 3 entity label triples (rdfs:label), generated automatically
- 2 relationship triples (has_ingredient)
- 1 attribute triple (serves)

All from simple, natural-language extraction by the LLM!
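
The tally follows from the conversion rules described above. Each entity yields one `rdf:type` and one `rdfs:label` triple, and each relationship or attribute yields one triple. The `triple_counts` helper below is hypothetical and assumes entities are already deduplicated.

```python
import json

llm_response = json.loads("""
{
  "entities": [
    {"entity": "Cornish pasty", "type": "Recipe"},
    {"entity": "beef", "type": "Food"},
    {"entity": "potatoes", "type": "Food"}
  ],
  "relationships": [
    {"subject": "Cornish pasty", "subject-type": "Recipe", "relation": "has_ingredient",
     "object": "beef", "object-type": "Food"},
    {"subject": "Cornish pasty", "subject-type": "Recipe", "relation": "has_ingredient",
     "object": "potatoes", "object-type": "Food"}
  ],
  "attributes": [
    {"entity": "Cornish pasty", "entity-type": "Recipe", "attribute": "serves", "value": "4 people"}
  ]
}
""")


def triple_counts(response):
    """Count triples per kind: 2 per entity, 1 per relationship, 1 per attribute."""
    entities = response.get("entities", [])
    return {
        "rdf:type": len(entities),
        "rdfs:label": len(entities),
        "relationships": len(response.get("relationships", [])),
        "attributes": len(response.get("attributes", [])),
    }


counts = triple_counts(llm_response)
print(counts, sum(counts.values()))  # total -> 9
```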

## References

- Current implementation: `trustgraph-flow/trustgraph/extract/kg/ontology/extract.py`
- Prompt template: `ontology-prompt.md`
- Tests: `tests/unit/test_extract/test_ontology/`
- Example ontology: `e2e/test-data/food.ontology`