trustgraph/docs/tech-specs/ontology-extract-phase-2.sw.md
Alex Jenkins 8954fa3ad7 Feat: TrustGraph i18n & Documentation Translation Updates (#781)
Native CLI i18n: The TrustGraph CLI has built-in translation support
that dynamically loads language strings. You can test and use
different languages by simply passing the --lang flag (e.g., --lang
es for Spanish, --lang ru for Russian) or by configuring your
environment's LANG variable.

Automated Docs Translations: This PR introduces autonomously
translated Markdown documentation into several target languages,
including Spanish, Swahili, Portuguese, Turkish, Hindi, Hebrew,
Arabic, Simplified Chinese, and Russian.
2026-04-14 12:08:32 +01:00

769 lines
24 KiB
Markdown

---
layout: default
title: "Dondoo la Maarifa la Ontolojia - Awamu ya 2 ya Urekebishaji"
parent: "Swahili (Beta)"
---
# Dondoo la Maarifa la Ontolojia - Awamu ya 2 ya Urekebishaji
> **Beta Translation:** This document was translated via Machine Learning and as such may not be 100% accurate. All non-English languages are currently classified as Beta.
**Hali**: Rasimu
**Mwandishi**: Mkutano wa Uchambuzi wa 2025-12-03
**Inahusiana na**: `ontology.md`, `ontorag.md`
## Muhtasari
Hati hii inataja kutofautiana katika mfumo wa sasa wa dondoo la maarifa unaotegemea ontolojia na inapendekeza urekebishaji ili kuboresha utendaji wa LLM na kupunguza upotevu wa habari.
## Utendaji wa Sasa
### Inavyofanya Sasa
1. **Kupakia Ontolojia** (`ontology_loader.py`)
Inapakia JSON ya ontolojia na vitufe kama `"fo/Recipe"`, `"fo/Food"`, `"fo/produces"`
Nambari za darasa zina jalizi la nafasi katika kitufe yenyewe
Mfano kutoka `food.ontology`:
```json
"classes": {
"fo/Recipe": {
"uri": "http://purl.org/ontology/fo/Recipe",
"rdfs:comment": "A Recipe is a combination..."
}
}
```
2. **Uundaji wa Maagizo** (`extract.py:299-307`, `ontology-prompt.md`)
Kiolezo kinapokea dictionaries `classes`, `object_properties`, `datatype_properties`
Kiolezo huchanganua: `{% for class_id, class_def in classes.items() %}`
LLM inaona: `**fo/Recipe**: A Recipe is a combination...`
Muundo wa mfano wa matokeo unaonyesha:
```json
{"subject": "recipe:cornish-pasty", "predicate": "rdf:type", "object": "Recipe"}
{"subject": "recipe:cornish-pasty", "predicate": "has_ingredient", "object": "ingredient:flour"}
```
3. **Uchambuzi wa Majibu** (`extract.py:382-428`)
Inatarajia safu ya JSON: `[{"subject": "...", "predicate": "...", "object": "..."}]`
Inathibitisha dhidi ya sehemu ya ontolojia
Inapanua URI kupitia `expand_uri()` (extract.py:473-521)
4. **Upanuzi wa URI** (`extract.py:473-521`)
Inangalia ikiwa thamani iko katika kamusi `ontology_subset.classes`
Ikiwa imepatikana, inatoa URI kutoka kwenye ufafanuzi wa darasa
Ikiwa haijapatikana, inaunda URI: `f"https://trustgraph.ai/ontology/{ontology_id}#{value}"`
### Mfano wa Mtiririko wa Data
**Ontolojia ya JSON → Mpakuzi → Ombi:**
```
"fo/Recipe" → classes["fo/Recipe"] → LLM sees "**fo/Recipe**"
```
**LLM → Mfumo wa Uchambuzi → Matokeo:**
```
"Recipe" → not in classes["fo/Recipe"] → constructs URI → LOSES original URI
"fo/Recipe" → found in classes → uses original URI → PRESERVES URI
```
## Matatizo Yaliyobainika
### 1. **Mfano Usiofuata Kanuni katika Maagizo**
**Tatizo**: Kiolezo cha maagizo huonyesha vitambulisho vya darasa na mabainisha (`fo/Recipe`) lakini matokeo ya mfano hutumia majina ya darasa yasiyo na mabainisha (`Recipe`).
**Mahali**: `ontology-prompt.md:5-52`
```markdown
## Ontology Classes:
- **fo/Recipe**: A Recipe is...
## Example Output:
{"subject": "recipe:cornish-pasty", "predicate": "rdf:type", "object": "Recipe"}
```
**Athari**: Mfumo wa lugha (LLM) hupokea ishara tofauti kuhusu muundo ambao unapaswa kutumika.
### 2. **Upatanishi wa Habari katika Upanuzi wa URI**
**Tatizo**: Wakati LLM hurudisha majina ya darasa ambayo hayana alama ya mbele, kama ilivyoelezwa katika mfano, `expand_uri()` hayawezi kuyakuta katika kamusi ya ontolojia na huunda URI za dharura, na kusababisha kupoteza URI za asili.
**Mahali**: `extract.py:494-500`
```python
if value in ontology_subset.classes: # Looks for "Recipe"
class_def = ontology_subset.classes[value] # But key is "fo/Recipe"
if isinstance(class_def, dict) and 'uri' in class_def:
return class_def['uri'] # Never reached!
return f"https://trustgraph.ai/ontology/{ontology_id}#{value}" # Fallback
```
**Athari**:
URI asili: `http://purl.org/ontology/fo/Recipe`
URI iliyoundwa: `https://trustgraph.ai/ontology/food#Recipe`
Maana ya kielelezo yamepotea, husababisha kutofanya kazi kwa pamoja.
### 3. **Muundo Usio Wazi wa Eneo la Kitu**
**Tatizo**: Hakuna mwongozo wazi kuhusu muundo wa URI ya eneo la kitu.
**Mfano katika maagizo**:
`"recipe:cornish-pasty"` (kielezi kama kielezi)
`"ingredient:flour"` (kielezi tofauti)
**Tabia halisi** (extract.py:517-520):
```python
# Treat as entity instance - construct unique URI
normalized = value.replace(" ", "-").lower()
return f"https://trustgraph.ai/{ontology_id}/{normalized}"
```
**Athari**: Mfumo wa lugha (LLM) lazima ajue mbinu ya kuweka alama (prefixing) bila kuwa na msingi wa elimu (ontology).
### 4. **Hakuna Maelekezo ya Mbele ya Nafasi (Namespace)**
**Tatizo**: Faili ya JSON ya elimu ina maelezo ya nafasi (namespace) (kwa mstari wa 10-25 katika food.ontology):
```json
"namespaces": {
"fo": "http://purl.org/ontology/fo/",
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
...
}
```
Lakini mistari hii haionyeshwi kwa mfumo wa lugha (LLM). MFUMO WA LUGHA (LLM) haujua:
Maana ya "fo"
Njia gani ya kutumia kwa vitu
Nafasi gani inayotumika kwa vipengele
### 5. **Lebo Ambazo Hazitumiki katika Swali**
**Tatizo**: Kila darasa lina sehemu za `rdfs:label` (k.m., `{"value": "Recipe", "lang": "en-gb"}`), lakini kigezo cha swali haziitumii.
**Hali ya sasa**: Inaonyesha tu `class_id` na `comment`
```jinja
- **{{class_id}}**{% if class_def.comment %}: {{class_def.comment}}{% endif %}
```
**Inapatikana lakini haitumiki:**
```python
"rdfs:label": [{"value": "Recipe", "lang": "en-gb"}]
```
**Athari**: Inaweza kutoa majina ambayo yanaweza kueleweka kwa binadamu pamoja na vitambulisho vya kiufundi.
## Suluhisho Zilizopendekezwa
### Chaguo A: Kuweka Vipengele sawa na Vitambulisho visivyo na Mbele
**Mbinu**: Ondoa mbele kutoka kwa vitambulisho vya darasa kabla ya kuviwasha kwa mfumo wa akili bandia (LLM).
**Mabadiliko**:
1. Badilisha `build_extraction_variables()` ili kubadilisha funguo:
```python
classes_for_prompt = {
k.split('/')[-1]: v # "fo/Recipe" → "Recipe"
for k, v in ontology_subset.classes.items()
}
```
2. Sasisha mfano wa maagizo ili ufanane (tayari hutumia majina yasiyo na alama).
3. Badilisha `expand_uri()` ili iweze kushughulikia aina zote mbili:
```python
# Try exact match first
if value in ontology_subset.classes:
return ontology_subset.classes[value]['uri']
# Try with prefix
for prefix in ['fo/', 'rdf:', 'rdfs:']:
prefixed = f"{prefix}{value}"
if prefixed in ontology_subset.classes:
return ontology_subset.classes[prefixed]['uri']
```
**Faida:**
Safi zaidi, rahisi zaidi kusoma na kuelewa.
Inafanana na mifano iliyopo ya maagizo.
Mifumo ya lugha kubwa (LLMs) hufanya kazi vizuri zaidi na alama (tokens) rahisi.
**Hasara:**
Migongano ya majina ya madarasa ikiwa ontolojia nyingi zina jina sawa la darasa.
Inapoteza habari ya nafasi (namespace).
Inahitaji mantiki ya dharura kwa utafutaji.
### Chaguo B: Tumia Kitambulisho Kamili Chenye Alama (Prefix) kwa Ufanisi
**Mbinu:** Sasisha mifano ili kutumia kitambulisho chenye alama kinacholingana na kile kinachoonyeshwa katika orodha ya madarasa.
**Mabadiliko:**
1. Sasisha mfano wa agizo (ontology-prompt.md:46-52):
```json
[
{"subject": "recipe:cornish-pasty", "predicate": "rdf:type", "object": "fo/Recipe"},
{"subject": "recipe:cornish-pasty", "predicate": "rdfs:label", "object": "Cornish Pasty"},
{"subject": "recipe:cornish-pasty", "predicate": "fo/produces", "object": "food:cornish-pasty"},
{"subject": "food:cornish-pasty", "predicate": "rdf:type", "object": "fo/Food"}
]
```
2. Ongeza maelezo ya nafasi ya kazi kwenye swali:
```markdown
## Namespace Prefixes:
- **fo/**: Food Ontology (http://purl.org/ontology/fo/)
- **rdf:**: RDF Schema
- **rdfs:**: RDF Schema
Use these prefixes exactly as shown when referencing classes and properties.
```
3. Acha `expand_uri()` kama ilivyo (hufanya kazi vizuri wakati mechi zinapopatikana).
**Faida:**
Ulinganisho kati ya ingizo na pato.
Hakuna upotevu wa habari.
Inahifadhi maana ya nafasi (namespace).
Inafanya kazi na ontolojia nyingi.
**Hasara:**
Alama (tokens) zaidi kwa LLM.
Inahitaji LLM kufuatilia alama za mbele (prefixes).
### Chaguo C: Mchanganyiko - Onyesha Lebo na Kitambulisho (ID)
**Mbinu:** Ongeza maagizo katika swali ili kuonyesha lebo zinazoweza kusomwa na binadamu na kitambulisho (ID) cha kiufundi.
**Mabadiliko:**
1. Sasisha mfumo wa swali:
```jinja
{% for class_id, class_def in classes.items() %}
- **{{class_id}}** (label: "{{class_def.labels[0].value if class_def.labels else class_id}}"){% if class_def.comment %}: {{class_def.comment}}{% endif %}
{% endfor %}
```
Matokeo ya mfano:
```markdown
- **fo/Recipe** (label: "Recipe"): A Recipe is a combination...
```
2. Maelekezo ya sasisho:
```markdown
When referencing classes:
- Use the full prefixed ID (e.g., "fo/Recipe") in JSON output
- The label (e.g., "Recipe") is for human understanding only
```
**Faida:**
Inafaa zaidi kwa mifumo ya lugha kubwa (LLM).
Inahifadhi habari yote.
Inaeleza wazi ni nini kinachotakiwa kutumika.
**Hasara:**
Ombi refu zaidi.
Mfumo mgumu zaidi.
## Njia Iliyotekelezwa
**Muundo Ulioboreshwa wa Muhusiano wa Vitu na Sifa** - unaibadilisha kabisa mfumo wa zamani unaotegemea triplet.
Njia mpya ilichaguliwa kwa sababu:
1. **Hakuna Upotevu wa Habari:** Anwani za mtandaoni (URIs) za awali zinaendelea kuhifadhiwa kwa usahihi.
2. **Mantiki Rahisi:** Hakuna mabadiliko yanayohitajika, utafutaji wa moja kwa moja wa kamusi unafanya kazi.
3. **Usalama wa Nafasi:** Inashughulikia ontolojia nyingi bila migongano.
4. **Ukweli wa Kisia:** Inahifadhi maana ya RDF/OWL.
## Utendaji Uliofanyika
### Kilichojengwa:
1. **Mfumo Mpya wa Ombi** (`prompts/ontology-extract-v2.txt`)
✅ Sehemu zilizoelezwa wazi: Aina za Vitu, Mahusiano, Sifa.
✅ Mfano unaotumia kitambulisho kamili cha aina (`fo/Recipe`, `fo/has_ingredient`).
✅ Maelekezo ya kutumia kitambulisho halisi kutoka kwa schema.
✅ Muundo mpya wa JSON na safu za vitu/mahusiano/sifa.
2. **Urekebishaji wa Vitu** (`entity_normalizer.py`)
✅ `normalize_entity_name()` - Inabadilisha majina kuwa muundo salama wa URI.
✅ `normalize_type_identifier()` - Inashughulikia alama za upande katika aina (`fo/Recipe` → `fo-recipe`).
✅ `build_entity_uri()` - Inaunda anwani za kipekee (URIs) kwa kutumia jozi (jina, aina).
✅ `EntityRegistry` - Inafuatilia vitu ili kuepuka marudia.
3. **Mchangamizi wa JSON** (`simplified_parser.py`)
✅ Inachanganua muundo mpya: `{entities: [...], relationships: [...], attributes: [...]}`.
✅ Inasaidia majina ya sehemu katika muundo wa kebab na muundo wa nyoka.
✅ Inarudisha madarasa ya data iliyopangwa.
✅ Usimamizi wa makosa kwa njia nzuri pamoja na uandishi wa matukio.
4. **Mabadilishaji wa Triplet** (`triple_converter.py`)
✅ `convert_entity()` - Inaunda triplet za aina + lebo moja kwa moja.
✅ `convert_relationship()` - Inaunganisha anwani za vitu (URIs) kupitia sifa.
✅ `convert_attribute()` - Inaongeza maadili ya moja kwa moja.
✅ Inatafuta anwani kamili kutoka kwa maelezo ya ontolojia.
5. **Mchakato Mkuu Uliosasishwa** (`extract.py`)
✅ Imeondoa msimbo wa zamani wa uondoaji wa triplet.
✅ Imeongeza `extract_with_simplified_format()`.
✅ Sasa inatumia tu muundo uliorahisishwa.
✅ Inaitisha ombi na kitambulisho `extract-with-ontologies-v2`.
## Majaribio
### Jaribio la 1: Uhifadhi wa URI
```python
# Given ontology class
classes = {"fo/Recipe": {"uri": "http://purl.org/ontology/fo/Recipe", ...}}
# When LLM returns
llm_output = {"subject": "x", "predicate": "rdf:type", "object": "fo/Recipe"}
# Then expanded URI should be
assert expanded == "http://purl.org/ontology/fo/Recipe"
# Not: "https://trustgraph.ai/ontology/food#Recipe"
```
### Mtihani wa 2: Mzozo wa Ontolojia Nyingi
```python
# Given two ontologies
ont1 = {"fo/Recipe": {...}}
ont2 = {"cooking/Recipe": {...}}
# LLM should use full prefix to disambiguate
llm_output = {"object": "fo/Recipe"} # Not just "Recipe"
```
### Mtihani wa 3: Muundo wa Eneo la Mfano
```python
# Given prompt with food ontology
# LLM should create instances like
{"subject": "recipe:cornish-pasty"} # Namespace-style
{"subject": "food:beef"} # Consistent prefix
```
## Maswali ya Kufungua
1. **Je, vipozi vya mifano ya vitu vinapaswa kutumia mbele za nafasi?**
Sasa: `"recipe:cornish-pasty"` (ya hiari)
Mbadala: Je, kutumia mbele ya ontolojia `"fo:cornish-pasty"`?
Mbadala: Hakuna mbele, kupanua katika URI `"cornish-pasty"` → URI kamili?
2. **Jinsi ya kushughulikia uwanja/jukumu katika swali?**
Kwa sasa inaonyesha: `(Recipe → Food)`
Je, inapaswa kuwa: `(fo/Recipe → fo/Food)`?
3. **Je, tunapaswa kuthibitisha vikwazo vya uwanja/jukumu?**
TODO maoni katika extract.py:470
Itakamata makosa zaidi lakini ni ngumu zaidi
4. **Hebu kuhusu sifa za kinyume na usawa?**
Ontolojia ina `owl:inverseOf`, `owl:equivalentClass`
Hasa haitumiki katika uondoaji
Je, inapaswa kutumika?
## Viashiria vya Mafanikio
✅ Hakuna upotevu wa habari ya URI (uhifadhi wa 100% wa URI za awali)
✅ Muundo wa pato la LLM unalingana na muundo wa ingizo
✅ Hakuna mifano ya kusumbua katika swali
✅ Vipimo hufanikiwa na ontolojia nyingi
✅ Ubora wa uondoaji ulioboreshwa (uliofanywa na asilimia ya triple halali)
## Mbinu Mbadala: Muundo Ulioboreshwa wa Uondoaji
### Falsafa
Badala ya kuuliza LLM kuelewa maana ya RDF/OWL, waulize ifanye kile ambacho ni nzuri: **kutafuta vitu na uhusiano katika maandishi**.
Acha msimbo kushughulikia uundaji wa URI, ubadilishaji wa RDF, na mambo rasmi ya wavuti ya kiakili.
### Mfano: Uainishaji wa Vitu
**Maandishi ya Ingizo:**
```
Cornish pasty is a traditional British pastry filled with meat and vegetables.
```
**Muundo wa Ontolojia (unaonyeshwa kwa mfumo wa lugha kubwa):**
```markdown
## Entity Types:
- Recipe: A recipe is a combination of ingredients and a method
- Food: A food is something that can be eaten
- Ingredient: An ingredient combines a quantity and a food
```
**Kinachorudishwa na Mfumo wa Lugha Kubwa (JSON Rahisi):**
```json
{
"entities": [
{
"entity": "Cornish pasty",
"type": "Recipe"
}
]
}
```
**Ni Nini Ambo Inazalisha (Triple za RDF):**
```python
# 1. Normalize entity name + type to ID (type prevents collisions)
entity_id = "recipe-cornish-pasty" # normalize("Cornish pasty", "Recipe")
entity_uri = "https://trustgraph.ai/food/recipe-cornish-pasty"
# Note: Same name, different type = different URI
# "Cornish pasty" (Recipe) → recipe-cornish-pasty
# "Cornish pasty" (Food) → food-cornish-pasty
# 2. Generate triples
triples = [
# Type triple
Triple(
s=Value(value=entity_uri, is_uri=True),
p=Value(value="http://www.w3.org/1999/02/22-rdf-syntax-ns#type", is_uri=True),
o=Value(value="http://purl.org/ontology/fo/Recipe", is_uri=True)
),
# Label triple (automatic)
Triple(
s=Value(value=entity_uri, is_uri=True),
p=Value(value="http://www.w3.org/2000/01/rdf-schema#label", is_uri=True),
o=Value(value="Cornish pasty", is_uri=False)
)
]
```
### Faida
1. **LLM haihitaji:**
Kuelewa sintaksia ya URI
Kuunda mbele za kitambulisho (`recipe:`, `ingredient:`)
Kujua kuhusu `rdf:type` au `rdfs:label`
Kuunda kitambulisho cha mtandao wa maana
2. **LLM inahitaji tu:**
Kupata vitu katika maandishi
Kuviweka katika madarasa ya ontolojia
Kuchukua uhusiano na sifa
3. **Msimbo hushughulikia:**
Usanifu na uundaji wa URI
Uzalishaji wa triple za RDF
Uwekaji wa kiotomatiki wa lebo
Usimamizi wa nafasi
### Kwa Nini Hii Inafanya Vyema
**Swali rahisi** = uchanganyifu mdogo = makosa machache
**Kitambulisho thabiti** = msimbo udhibiti sheria za usanifu
**Lebo zilizozalishwa kiotomatiki** = hakuna triple za rdfs:label zilizopotea
**LLM inazingatia uondoaji** = ambayo ni jambo ambalo inafaa
### Mfano: Uhusiano wa Vitu
**Maandishi ya Ingizo:**
```
Cornish pasty is a traditional British pastry filled with beef and potatoes.
```
**Muundo wa Ontolojia (unaonyeshwa kwa LLM):**
```markdown
## Entity Types:
- Recipe: A recipe is a combination of ingredients and a method
- Food: A food is something that can be eaten
- Ingredient: An ingredient combines a quantity and a food
## Relationships:
- has_ingredient: Relates a recipe to an ingredient it uses (Recipe → Ingredient)
- food: Relates an ingredient to the food that is required (Ingredient → Food)
```
**Kinachorudishwa na Mfumo wa Lugha Kubwa (JSON Rahisi):**
```json
{
"entities": [
{
"entity": "Cornish pasty",
"type": "Recipe"
},
{
"entity": "beef",
"type": "Food"
},
{
"entity": "potatoes",
"type": "Food"
}
],
"relationships": [
{
"subject": "Cornish pasty",
"subject-type": "Recipe",
"relation": "has_ingredient",
"object": "beef",
"object-type": "Food"
},
{
"subject": "Cornish pasty",
"subject-type": "Recipe",
"relation": "has_ingredient",
"object": "potatoes",
"object-type": "Food"
}
]
}
```
**Ni Nini Ambo Inazalisha (Triple za RDF):**
```python
# Normalize entity names to URIs
cornish_pasty_uri = "https://trustgraph.ai/food/cornish-pasty"
beef_uri = "https://trustgraph.ai/food/beef"
potatoes_uri = "https://trustgraph.ai/food/potatoes"
# Look up relation URI from ontology
has_ingredient_uri = "http://purl.org/ontology/fo/ingredients" # from fo/has_ingredient
triples = [
# Entity type triples (as before)
Triple(s=cornish_pasty_uri, p=rdf_type, o="http://purl.org/ontology/fo/Recipe"),
Triple(s=cornish_pasty_uri, p=rdfs_label, o="Cornish pasty"),
Triple(s=beef_uri, p=rdf_type, o="http://purl.org/ontology/fo/Food"),
Triple(s=beef_uri, p=rdfs_label, o="beef"),
Triple(s=potatoes_uri, p=rdf_type, o="http://purl.org/ontology/fo/Food"),
Triple(s=potatoes_uri, p=rdfs_label, o="potatoes"),
# Relationship triples
Triple(
s=Value(value=cornish_pasty_uri, is_uri=True),
p=Value(value=has_ingredient_uri, is_uri=True),
o=Value(value=beef_uri, is_uri=True)
),
Triple(
s=Value(value=cornish_pasty_uri, is_uri=True),
p=Value(value=has_ingredient_uri, is_uri=True),
o=Value(value=potatoes_uri, is_uri=True)
)
]
```
**Pointi Muhimu:**
LLM hurudia majina ya vitu katika lugha ya asili: `"Cornish pasty"`, `"beef"`, `"potatoes"`
LLM hujumuisha aina ili kufafanua: `subject-type`, `object-type`
LLM hutumia jina la uhusiano kutoka kwa schema: `"has_ingredient"`
Msimbo hutengeneza vitambulisho vinavyolingana kwa kutumia (jina, aina): `("Cornish pasty", "Recipe")` → `recipe-cornish-pasty`
Msimbo hutafuta URI ya uhusiano kutoka kwa ontolojia: `fo/has_ingredient` → URI kamili
Jozi sawa (jina, aina) daima hupata URI sawa (kuondoa marudia)
### Mfano: Utambuzi wa Jina la Kitu
**Tatizo:** Jina lile lile linaweza kurejelea aina tofauti za vitu.
**Mfano halisi:**
```
"Cornish pasty" can be:
- A Recipe (instructions for making it)
- A Food (the dish itself)
```
**Jinsi Inavyoshughuliwa:**
Mfumo wa lugha kubwa (LLM) hurudisha yote kama vitu tofauti:
```json
{
"entities": [
{"entity": "Cornish pasty", "type": "Recipe"},
{"entity": "Cornish pasty", "type": "Food"}
],
"relationships": [
{
"subject": "Cornish pasty",
"subject-type": "Recipe",
"relation": "produces",
"object": "Cornish pasty",
"object-type": "Food"
}
]
}
```
**Suluhisho la Msimbo:**
```python
# Different types → different URIs
recipe_uri = normalize("Cornish pasty", "Recipe")
# → "https://trustgraph.ai/food/recipe-cornish-pasty"
food_uri = normalize("Cornish pasty", "Food")
# → "https://trustgraph.ai/food/food-cornish-pasty"
# Relationship connects them correctly
triple = Triple(
s=recipe_uri, # The Recipe
p="http://purl.org/ontology/fo/produces",
o=food_uri # The Food
)
```
**Kwa Nini Hifanya Kazi:**
Aina (type) imejumuishwa katika marejeleo yote (vitu, uhusiano, sifa).
Msimbo hutumia `(name, type)` kama ufunguo wa utafutaji.
Hakuna ukosefu wa uwazi, hakuna migongano.
### Mifano: Sifa za Vitu
**Nakala ya Ingizo:**
```
This Cornish pasty recipe serves 4-6 people and takes 45 minutes to prepare.
```
**Muundo wa Ontolojia (unaonyeshwa kwa LLM):**
```markdown
## Entity Types:
- Recipe: A recipe is a combination of ingredients and a method
## Attributes:
- serves: Indicates what the recipe is intended to serve (Recipe → text)
- preparation_time: Time needed to prepare the recipe (Recipe → text)
```
**Kinachorudishwa na Mfumo wa Lugha Kubwa (JSON Rahisi):**
```json
{
"entities": [
{
"entity": "Cornish pasty recipe",
"type": "Recipe"
}
],
"attributes": [
{
"entity": "Cornish pasty recipe",
"entity-type": "Recipe",
"attribute": "serves",
"value": "4-6 people"
},
{
"entity": "Cornish pasty recipe",
"entity-type": "Recipe",
"attribute": "preparation_time",
"value": "45 minutes"
}
]
}
```
**Ni Nini Ambo Inazalisha (Triple za RDF):**
```python
# Normalize entity name to URI
recipe_uri = "https://trustgraph.ai/food/cornish-pasty-recipe"
# Look up attribute URIs from ontology
serves_uri = "http://purl.org/ontology/fo/serves" # from fo/serves
prep_time_uri = "http://purl.org/ontology/fo/preparation_time" # from fo/preparation_time
triples = [
# Entity type triple
Triple(
s=Value(value=recipe_uri, is_uri=True),
p=Value(value=rdf_type, is_uri=True),
o=Value(value="http://purl.org/ontology/fo/Recipe", is_uri=True)
),
# Label triple (automatic)
Triple(
s=Value(value=recipe_uri, is_uri=True),
p=Value(value=rdfs_label, is_uri=True),
o=Value(value="Cornish pasty recipe", is_uri=False)
),
# Attribute triples (objects are literals, not URIs)
Triple(
s=Value(value=recipe_uri, is_uri=True),
p=Value(value=serves_uri, is_uri=True),
o=Value(value="4-6 people", is_uri=False) # Literal value!
),
Triple(
s=Value(value=recipe_uri, is_uri=True),
p=Value(value=prep_time_uri, is_uri=True),
o=Value(value="45 minutes", is_uri=False) # Literal value!
)
]
```
**Pointi Muhimu:**
LLM huchukua maadili halisi: `"4-6 people"`, `"45 minutes"`
LLM hujumuisha aina ya kitu ili kuepusha utofauti: `entity-type`
LLM hutumia jina la sifa kutoka kwa schema: `"serves"`, `"preparation_time"`
Msimbo hutafuta URI ya sifa kutoka kwa sifa za aina ya ontology
**Kitu ni halali** (`is_uri=False`), si rejea la URI
Maadili husalia kama maandishi ya asili, hakuna haja ya urekebishaji
**Tofauti na Mahusiano:**
Mahusiano: kitu cha kwanza na cha pili ni vitu (URIs)
Sifa: kitu cha kwanza ni kitu (URI), kitu cha pili ni thamani halali (mstari/nambari)
### Mfano Kamili: Vitu + Mahusiano + Sifa
**Maandishi ya Ingizo:**
```
Cornish pasty is a savory pastry filled with beef and potatoes.
This recipe serves 4 people.
```
**Hili Ni Lile Ambalo Mfumo wa Lugha Kubwa Hurudisha:**
```json
{
"entities": [
{
"entity": "Cornish pasty",
"type": "Recipe"
},
{
"entity": "beef",
"type": "Food"
},
{
"entity": "potatoes",
"type": "Food"
}
],
"relationships": [
{
"subject": "Cornish pasty",
"subject-type": "Recipe",
"relation": "has_ingredient",
"object": "beef",
"object-type": "Food"
},
{
"subject": "Cornish pasty",
"subject-type": "Recipe",
"relation": "has_ingredient",
"object": "potatoes",
"object-type": "Food"
}
],
"attributes": [
{
"entity": "Cornish pasty",
"entity-type": "Recipe",
"attribute": "serves",
"value": "4 people"
}
]
}
```
**Matokeo:** Triple 11 za RDF zilizoundwa:
Triple 3 za aina ya kitu (rdf:type)
Triple 3 za lebo ya kitu (rdfs:label) - moja kwa moja
Triple 2 za uhusiano (ina_viungo)
Triple 1 ya sifa (inafaa)
Yote kutoka kwa uundaji rahisi, wa lugha ya asili na mfumo wa akili bandia (LLM)!
## Marejeleo
Utaratibu wa sasa: `trustgraph-flow/trustgraph/extract/kg/ontology/extract.py`
Mfumo wa swali: `ontology-prompt.md`
Majaribio: `tests/unit/test_extract/test_ontology/`
Ontolojia ya mfano: `e2e/test-data/food.ontology`