mirror of https://github.com/trustgraph-ai/trustgraph.git
synced 2026-04-25 00:16:23 +02:00

Feature/improve ontology extract (#576)

* Tech spec to change ontology extraction
* Ontology extract refactoring

parent 517434c075
commit b957004db9

6 changed files with 1496 additions and 19 deletions

docs/tech-specs/ontology-extract-phase-2.md (new file, 761 lines)

# Ontology Knowledge Extraction - Phase 2 Refactor

**Status**: Draft

**Author**: Analysis Session 2025-12-03

**Related**: `ontology.md`, `ontorag.md`

## Overview

This document identifies inconsistencies in the current ontology-based knowledge extraction system and proposes a refactor to improve LLM performance and reduce information loss.

## Current Implementation

### How It Works Now

1. **Ontology Loading** (`ontology_loader.py`)
   - Loads ontology JSON with keys like `"fo/Recipe"`, `"fo/Food"`, `"fo/produces"`
   - Class IDs include the namespace prefix in the key itself
   - Example from `food.ontology`:

   ```json
   "classes": {
     "fo/Recipe": {
       "uri": "http://purl.org/ontology/fo/Recipe",
       "rdfs:comment": "A Recipe is a combination..."
     }
   }
   ```

2. **Prompt Construction** (`extract.py:299-307`, `ontology-prompt.md`)
   - Template receives `classes`, `object_properties`, `datatype_properties` dicts
   - Template iterates: `{% for class_id, class_def in classes.items() %}`
   - LLM sees: `**fo/Recipe**: A Recipe is a combination...`
   - Example output format shows:

   ```json
   {"subject": "recipe:cornish-pasty", "predicate": "rdf:type", "object": "Recipe"}
   {"subject": "recipe:cornish-pasty", "predicate": "has_ingredient", "object": "ingredient:flour"}
   ```

3. **Response Parsing** (`extract.py:382-428`)
   - Expects a JSON array: `[{"subject": "...", "predicate": "...", "object": "..."}]`
   - Validates against the ontology subset
   - Expands URIs via `expand_uri()` (`extract.py:473-521`)

4. **URI Expansion** (`extract.py:473-521`)
   - Checks whether the value is a key in the `ontology_subset.classes` dict
   - If found, extracts the URI from the class definition
   - If not found, constructs a fallback URI: `f"https://trustgraph.ai/ontology/{ontology_id}#{value}"`

### Data Flow Example

**Ontology JSON → Loader → Prompt:**
```
"fo/Recipe" → classes["fo/Recipe"] → LLM sees "**fo/Recipe**"
```

**LLM → Parser → Output:**
```
"Recipe" → not in classes["fo/Recipe"] → constructs URI → LOSES original URI
"fo/Recipe" → found in classes → uses original URI → PRESERVES URI
```

## Problems Identified

### 1. **Inconsistent Examples in Prompt**

**Issue**: The prompt template shows class IDs with prefixes (`fo/Recipe`), but the example output uses unprefixed class names (`Recipe`).

**Location**: `ontology-prompt.md:5-52`

```markdown
## Ontology Classes:
- **fo/Recipe**: A Recipe is...

## Example Output:
{"subject": "recipe:cornish-pasty", "predicate": "rdf:type", "object": "Recipe"}
```

**Impact**: The LLM receives conflicting signals about which format to use.

### 2. **Information Loss in URI Expansion**

**Issue**: When the LLM returns unprefixed class names, following the example, `expand_uri()` can't find them in the ontology dict and constructs fallback URIs, losing the original proper URIs.

**Location**: `extract.py:494-500`

```python
if value in ontology_subset.classes:              # Looks for "Recipe"
    class_def = ontology_subset.classes[value]    # But the key is "fo/Recipe"
    if isinstance(class_def, dict) and 'uri' in class_def:
        return class_def['uri']                   # Never reached!
return f"https://trustgraph.ai/ontology/{ontology_id}#{value}"  # Fallback
```

**Impact**:
- Original URI: `http://purl.org/ontology/fo/Recipe`
- Constructed URI: `https://trustgraph.ai/ontology/food#Recipe`
- Semantic meaning is lost, which breaks interoperability
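The miss can be reproduced in a few lines. This is a simplified stand-in for `expand_uri()` with the lookup table inlined, not the real implementation:

```python
# Simplified stand-in for expand_uri(): an exact-match dict lookup plus the
# fallback URI construction described above.
ontology_id = "food"
ontology_classes = {
    "fo/Recipe": {"uri": "http://purl.org/ontology/fo/Recipe"},
}

def expand_uri(value: str) -> str:
    if value in ontology_classes:
        return ontology_classes[value]["uri"]  # original URI preserved
    # Fallback: constructs a new URI, discarding the ontology's original one
    return f"https://trustgraph.ai/ontology/{ontology_id}#{value}"

preserved = expand_uri("fo/Recipe")  # prefixed key matches → original URI
lost = expand_uri("Recipe")          # unprefixed, as the example shows → fallback URI
```

With the prefixed key the original `purl.org` URI comes back; with the unprefixed name the exact-match lookup fails and a `trustgraph.ai` fallback URI is minted instead.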

### 3. **Ambiguous Entity Instance Format**

**Issue**: No clear guidance on the entity instance URI format.

**Examples in prompt**:
- `"recipe:cornish-pasty"` (namespace-like prefix)
- `"ingredient:flour"` (different prefix)

**Actual behavior** (`extract.py:517-520`):
```python
# Treat as entity instance - construct unique URI
normalized = value.replace(" ", "-").lower()
return f"https://trustgraph.ai/{ontology_id}/{normalized}"
```

**Impact**: The LLM must guess the prefixing convention with no ontology context.

### 4. **No Namespace Prefix Guidance**

**Issue**: The ontology JSON contains namespace definitions (lines 10-25 in `food.ontology`):
```json
"namespaces": {
  "fo": "http://purl.org/ontology/fo/",
  "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
  ...
}
```

But these are never surfaced to the LLM. The LLM doesn't know:
- What "fo" means
- What prefix to use for entities
- Which namespace applies to which elements

### 5. **Labels Not Used in Prompt**

**Issue**: Every class has `rdfs:label` fields (e.g., `{"value": "Recipe", "lang": "en-gb"}`), but the prompt template doesn't use them.

**Current**: shows only `class_id` and `comment`:
```jinja
- **{{class_id}}**{% if class_def.comment %}: {{class_def.comment}}{% endif %}
```

**Available but unused**:
```python
"rdfs:label": [{"value": "Recipe", "lang": "en-gb"}]
```

**Impact**: Labels could provide human-readable names alongside the technical IDs.

## Proposed Solutions

### Option A: Normalize to Unprefixed IDs

**Approach**: Strip prefixes from class IDs before showing them to the LLM.

**Changes**:
1. Modify `build_extraction_variables()` to transform keys:
   ```python
   classes_for_prompt = {
       k.split('/')[-1]: v  # "fo/Recipe" → "Recipe"
       for k, v in ontology_subset.classes.items()
   }
   ```

2. Update the prompt example to match (it already uses unprefixed names)

3. Modify `expand_uri()` to handle both formats:
   ```python
   # Try exact match first
   if value in ontology_subset.classes:
       return ontology_subset.classes[value]['uri']

   # Try with prefix
   for prefix in ['fo/', 'rdf:', 'rdfs:']:
       prefixed = f"{prefix}{value}"
       if prefixed in ontology_subset.classes:
           return ontology_subset.classes[prefixed]['uri']
   ```

**Pros**:
- Cleaner, more human-readable
- Matches the existing prompt examples
- LLMs work better with simpler tokens

**Cons**:
- Class name collisions if multiple ontologies define the same class name
- Loses namespace information
- Requires fallback logic for lookups
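The collision in the first con is easy to demonstrate. In this sketch the `cooking/` ontology and its URI are hypothetical, invented for illustration:

```python
# Two ontologies that both define a "Recipe" class: stripping the prefixes
# collapses the two keys into one, and the later entry silently overwrites
# the earlier one.
classes = {
    "fo/Recipe": {"uri": "http://purl.org/ontology/fo/Recipe"},
    "cooking/Recipe": {"uri": "http://example.org/cooking/Recipe"},  # hypothetical
}

classes_for_prompt = {k.split('/')[-1]: v for k, v in classes.items()}

remaining = len(classes_for_prompt)  # only one "Recipe" entry survives
```

Since plain dict comprehension keeps only the last value per key, one class definition is lost before the prompt is even built.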

### Option B: Use Full Prefixed IDs Consistently

**Approach**: Update the examples to use prefixed IDs matching what is shown in the class list.

**Changes**:
1. Update the prompt example (`ontology-prompt.md:46-52`):
   ```json
   [
     {"subject": "recipe:cornish-pasty", "predicate": "rdf:type", "object": "fo/Recipe"},
     {"subject": "recipe:cornish-pasty", "predicate": "rdfs:label", "object": "Cornish Pasty"},
     {"subject": "recipe:cornish-pasty", "predicate": "fo/produces", "object": "food:cornish-pasty"},
     {"subject": "food:cornish-pasty", "predicate": "rdf:type", "object": "fo/Food"}
   ]
   ```

2. Add a namespace explanation to the prompt:
   ```markdown
   ## Namespace Prefixes:
   - **fo/**: Food Ontology (http://purl.org/ontology/fo/)
   - **rdf:**: RDF syntax namespace
   - **rdfs:**: RDF Schema

   Use these prefixes exactly as shown when referencing classes and properties.
   ```

3. Keep `expand_uri()` as-is (it works correctly when matches are found)

**Pros**:
- Input = output consistency
- No information loss
- Preserves namespace semantics
- Works with multiple ontologies

**Cons**:
- More verbose tokens for the LLM
- Requires the LLM to track prefixes

### Option C: Hybrid - Show Both Label and ID

**Approach**: Enhance the prompt to show both human-readable labels and technical IDs.

**Changes**:
1. Update the prompt template:
   ```jinja
   {% for class_id, class_def in classes.items() %}
   - **{{class_id}}** (label: "{{class_def.labels[0].value if class_def.labels else class_id}}"){% if class_def.comment %}: {{class_def.comment}}{% endif %}
   {% endfor %}
   ```

   Example output:
   ```markdown
   - **fo/Recipe** (label: "Recipe"): A Recipe is a combination...
   ```

2. Update the instructions:
   ```markdown
   When referencing classes:
   - Use the full prefixed ID (e.g., "fo/Recipe") in JSON output
   - The label (e.g., "Recipe") is for human understanding only
   ```

**Pros**:
- Clearest for the LLM
- Preserves all information
- Explicit about what to use

**Cons**:
- Longer prompt
- More complex template

## Implemented Approach

**Simplified Entity-Relationship-Attribute Format** - completely replaces the old triple-based format.

The new approach was chosen because:

1. **No Information Loss**: Original URIs are preserved correctly
2. **Simpler Logic**: No transformation needed; direct dict lookups work
3. **Namespace Safety**: Handles multiple ontologies without collisions
4. **Semantic Correctness**: Maintains RDF/OWL semantics

## Implementation Complete

### What Was Built:

1. **New Prompt Template** (`prompts/ontology-extract-v2.txt`)
   - ✅ Clear sections: Entity Types, Relationships, Attributes
   - ✅ Example using full type identifiers (`fo/Recipe`, `fo/has_ingredient`)
   - ✅ Instructions to use the exact identifiers from the schema
   - ✅ New JSON format with entities/relationships/attributes arrays

2. **Entity Normalization** (`entity_normalizer.py`)
   - ✅ `normalize_entity_name()` - converts names to a URI-safe format
   - ✅ `normalize_type_identifier()` - handles slashes in types (`fo/Recipe` → `fo-recipe`)
   - ✅ `build_entity_uri()` - creates unique URIs from the (name, type) tuple
   - ✅ `EntityRegistry` - tracks entities for deduplication

3. **JSON Parser** (`simplified_parser.py`)
   - ✅ Parses the new format: `{entities: [...], relationships: [...], attributes: [...]}`
   - ✅ Supports kebab-case and snake_case field names
   - ✅ Returns structured dataclasses
   - ✅ Graceful error handling with logging

4. **Triple Converter** (`triple_converter.py`)
   - ✅ `convert_entity()` - generates type + label triples automatically
   - ✅ `convert_relationship()` - connects entity URIs via properties
   - ✅ `convert_attribute()` - adds literal values
   - ✅ Looks up full URIs from the ontology definitions

5. **Updated Main Processor** (`extract.py`)
   - ✅ Removed the old triple-based extraction code
   - ✅ Added the `extract_with_simplified_format()` method
   - ✅ Now exclusively uses the new simplified format
   - ✅ Calls the prompt with the `extract-with-ontologies-v2` ID
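Taken together, the components above can be sketched end to end. This is a hedged, condensed illustration, not the real module code: `Triple` and `Value` are simplified stand-ins, the type identifiers are unprefixed for brevity, and the URI layout mirrors the examples later in this spec:

```python
import json
import re
from dataclasses import dataclass

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

@dataclass
class Value:
    value: str
    is_uri: bool

@dataclass
class Triple:
    s: Value
    p: Value
    o: Value

def normalize_entity_name(name: str) -> str:
    # Condensed entity_normalizer.py: lowercase, hyphenate whitespace and
    # underscores, drop URI-unsafe characters, collapse and trim hyphens.
    slug = re.sub(r'[\s_]+', '-', name.lower())
    slug = re.sub(r'[^a-z0-9\-.]', '', slug)
    return re.sub(r'-+', '-', slug).strip('-')

def build_entity_uri(name: str, type_id: str, ontology_id: str) -> str:
    # The (name, type) pair keeps same-named entities of different types distinct.
    return (f"https://trustgraph.ai/{ontology_id}/"
            f"{normalize_entity_name(type_id)}-{normalize_entity_name(name)}")

def parse_response(raw: str) -> dict:
    # Condensed simplified_parser.py: accepts kebab-case and snake_case keys.
    data = json.loads(raw)
    fix = lambda d: {k.replace('-', '_'): v for k, v in d.items()}
    return {section: [fix(item) for item in data.get(section, [])]
            for section in ("entities", "relationships", "attributes")}

def convert_entity(entity: dict, class_uris: dict, ontology_id: str) -> list:
    # Condensed triple_converter.py: each entity yields a type triple plus an
    # automatic label triple, with the class URI looked up from the ontology.
    uri = build_entity_uri(entity["entity"], entity["type"], ontology_id)
    return [
        Triple(Value(uri, True), Value(RDF_TYPE, True),
               Value(class_uris[entity["type"]], True)),
        Triple(Value(uri, True), Value(RDFS_LABEL, True),
               Value(entity["entity"], False)),
    ]

extraction = parse_response(
    '{"entities": [{"entity": "Cornish pasty", "type": "Recipe"}]}'
)
triples = convert_entity(
    extraction["entities"][0],
    {"Recipe": "http://purl.org/ontology/fo/Recipe"},
    "food",
)
```

One extracted entity becomes two triples, with the original ontology URI preserved by lookup rather than reconstructed.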

## Test Cases

### Test 1: URI Preservation
```python
# Given an ontology class
classes = {"fo/Recipe": {"uri": "http://purl.org/ontology/fo/Recipe", ...}}

# When the LLM returns
llm_output = {"subject": "x", "predicate": "rdf:type", "object": "fo/Recipe"}

# Then the expanded URI should be
assert expanded == "http://purl.org/ontology/fo/Recipe"
# Not: "https://trustgraph.ai/ontology/food#Recipe"
```

### Test 2: Multi-Ontology Collision
```python
# Given two ontologies
ont1 = {"fo/Recipe": {...}}
ont2 = {"cooking/Recipe": {...}}

# The LLM should use the full prefix to disambiguate
llm_output = {"object": "fo/Recipe"}  # Not just "Recipe"
```

### Test 3: Entity Instance Format
```python
# Given a prompt with the food ontology
# the LLM should create instances like
{"subject": "recipe:cornish-pasty"}  # Namespace-style
{"subject": "food:beef"}             # Consistent prefix
```

## Open Questions

1. **Should entity instances use namespace prefixes?**
   - Current: `"recipe:cornish-pasty"` (arbitrary)
   - Alternative: use the ontology prefix, `"fo:cornish-pasty"`?
   - Alternative: no prefix; expand `"cornish-pasty"` to a full URI?

2. **How to handle domain/range in the prompt?**
   - Currently shows: `(Recipe → Food)`
   - Should it be: `(fo/Recipe → fo/Food)`?

3. **Should we validate domain/range constraints?**
   - TODO comment at `extract.py:470`
   - Would catch more errors but adds complexity

4. **What about inverse properties and equivalences?**
   - The ontology has `owl:inverseOf` and `owl:equivalentClass`
   - Not currently used in extraction
   - Should they be?

## Success Metrics

- ✅ Zero URI information loss (100% preservation of original URIs)
- ✅ LLM output format matches the input format
- ✅ No ambiguous examples in the prompt
- ✅ Tests pass with multiple ontologies
- ✅ Improved extraction quality (measured by valid-triple percentage)

## Alternative Approach: Simplified Extraction Format

### Philosophy

Instead of asking the LLM to understand RDF/OWL semantics, ask it to do what it's good at: **find entities and relationships in text**.

Let the code handle URI construction, RDF conversion, and the semantic web formalities.

### Example: Entity Classification

**Input Text:**
```
Cornish pasty is a traditional British pastry filled with meat and vegetables.
```

**Ontology Schema (shown to LLM):**
```markdown
## Entity Types:
- Recipe: A recipe is a combination of ingredients and a method
- Food: A food is something that can be eaten
- Ingredient: An ingredient combines a quantity and a food
```

**What LLM Returns (Simple JSON):**
```json
{
  "entities": [
    {"entity": "Cornish pasty", "type": "Recipe"}
  ]
}
```

**What Code Produces (RDF Triples):**
```python
# 1. Normalize entity name + type to an ID (the type prevents collisions)
entity_id = "recipe-cornish-pasty"  # normalize("Cornish pasty", "Recipe")
entity_uri = "https://trustgraph.ai/food/recipe-cornish-pasty"

# Note: Same name, different type = different URI
# "Cornish pasty" (Recipe) → recipe-cornish-pasty
# "Cornish pasty" (Food)   → food-cornish-pasty

# 2. Generate triples
triples = [
    # Type triple
    Triple(
        s=Value(value=entity_uri, is_uri=True),
        p=Value(value="http://www.w3.org/1999/02/22-rdf-syntax-ns#type", is_uri=True),
        o=Value(value="http://purl.org/ontology/fo/Recipe", is_uri=True)
    ),
    # Label triple (automatic)
    Triple(
        s=Value(value=entity_uri, is_uri=True),
        p=Value(value="http://www.w3.org/2000/01/rdf-schema#label", is_uri=True),
        o=Value(value="Cornish pasty", is_uri=False)
    )
]
```

### Benefits

1. **The LLM doesn't need to:**
   - Understand URI syntax
   - Invent identifier prefixes (`recipe:`, `ingredient:`)
   - Know about `rdf:type` or `rdfs:label`
   - Construct semantic web identifiers

2. **The LLM just needs to:**
   - Find entities in the text
   - Map them to ontology classes
   - Extract relationships and attributes

3. **The code handles:**
   - URI normalization and construction
   - RDF triple generation
   - Automatic label assignment
   - Namespace management

### Why This Works Better

- **Simpler prompt** = less confusion = fewer errors
- **Consistent IDs** = code controls the normalization rules
- **Auto-generated labels** = no missing `rdfs:label` triples
- **LLM focuses on extraction** = what it's actually good at

### Example: Entity Relationships

**Input Text:**
```
Cornish pasty is a traditional British pastry filled with beef and potatoes.
```

**Ontology Schema (shown to LLM):**
```markdown
## Entity Types:
- Recipe: A recipe is a combination of ingredients and a method
- Food: A food is something that can be eaten
- Ingredient: An ingredient combines a quantity and a food

## Relationships:
- has_ingredient: Relates a recipe to an ingredient it uses (Recipe → Ingredient)
- food: Relates an ingredient to the food that is required (Ingredient → Food)
```

**What LLM Returns (Simple JSON):**
```json
{
  "entities": [
    {"entity": "Cornish pasty", "type": "Recipe"},
    {"entity": "beef", "type": "Food"},
    {"entity": "potatoes", "type": "Food"}
  ],
  "relationships": [
    {
      "subject": "Cornish pasty",
      "subject-type": "Recipe",
      "relation": "has_ingredient",
      "object": "beef",
      "object-type": "Food"
    },
    {
      "subject": "Cornish pasty",
      "subject-type": "Recipe",
      "relation": "has_ingredient",
      "object": "potatoes",
      "object-type": "Food"
    }
  ]
}
```

**What Code Produces (RDF Triples):**
```python
# Normalize entity names to URIs
cornish_pasty_uri = "https://trustgraph.ai/food/cornish-pasty"
beef_uri = "https://trustgraph.ai/food/beef"
potatoes_uri = "https://trustgraph.ai/food/potatoes"

# Look up the relation URI from the ontology
has_ingredient_uri = "http://purl.org/ontology/fo/ingredients"  # from fo/has_ingredient

triples = [
    # Entity type triples (as before)
    Triple(s=cornish_pasty_uri, p=rdf_type, o="http://purl.org/ontology/fo/Recipe"),
    Triple(s=cornish_pasty_uri, p=rdfs_label, o="Cornish pasty"),

    Triple(s=beef_uri, p=rdf_type, o="http://purl.org/ontology/fo/Food"),
    Triple(s=beef_uri, p=rdfs_label, o="beef"),

    Triple(s=potatoes_uri, p=rdf_type, o="http://purl.org/ontology/fo/Food"),
    Triple(s=potatoes_uri, p=rdfs_label, o="potatoes"),

    # Relationship triples
    Triple(
        s=Value(value=cornish_pasty_uri, is_uri=True),
        p=Value(value=has_ingredient_uri, is_uri=True),
        o=Value(value=beef_uri, is_uri=True)
    ),
    Triple(
        s=Value(value=cornish_pasty_uri, is_uri=True),
        p=Value(value=has_ingredient_uri, is_uri=True),
        o=Value(value=potatoes_uri, is_uri=True)
    )
]
```

**Key Points:**
- The LLM returns natural-language entity names: `"Cornish pasty"`, `"beef"`, `"potatoes"`
- The LLM includes types to disambiguate: `subject-type`, `object-type`
- The LLM uses the relation name from the schema: `"has_ingredient"`
- The code derives consistent IDs using (name, type): `("Cornish pasty", "Recipe")` → `recipe-cornish-pasty`
- The code looks up the relation URI from the ontology: `fo/has_ingredient` → full URI
- The same (name, type) tuple always gets the same URI (deduplication)
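The deduplication guarantee in the last point can be sketched with a tiny registry. This is a hypothetical shape for illustration; the real `EntityRegistry` lives in `entity_normalizer.py` and uses the full normalization rules:

```python
# Hypothetical sketch of the EntityRegistry idea: the (name, type) tuple is
# the lookup key, so repeated mentions resolve to the same URI.
class EntityRegistry:
    def __init__(self, ontology_id: str):
        self.ontology_id = ontology_id
        self._uris = {}

    def get_or_create(self, name: str, type_id: str) -> str:
        key = (name, type_id)
        if key not in self._uris:
            slug = f"{type_id}-{name}".lower().replace(" ", "-")
            self._uris[key] = f"https://trustgraph.ai/{self.ontology_id}/{slug}"
        return self._uris[key]

reg = EntityRegistry("food")
a = reg.get_or_create("Cornish pasty", "Recipe")
b = reg.get_or_create("Cornish pasty", "Recipe")  # same key → same URI
c = reg.get_or_create("Cornish pasty", "Food")    # different type → new URI
```

Repeated mentions of the same (name, type) pair deduplicate to one URI, while the same name under a different type gets its own.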

### Example: Entity Name Disambiguation

**Problem:** The same name can refer to different entity types.

**Real-world case:**
```
"Cornish pasty" can be:
- A Recipe (instructions for making it)
- A Food (the dish itself)
```

**How It's Handled:**

The LLM returns both as separate entities:
```json
{
  "entities": [
    {"entity": "Cornish pasty", "type": "Recipe"},
    {"entity": "Cornish pasty", "type": "Food"}
  ],
  "relationships": [
    {
      "subject": "Cornish pasty",
      "subject-type": "Recipe",
      "relation": "produces",
      "object": "Cornish pasty",
      "object-type": "Food"
    }
  ]
}
```

**Code Resolution:**
```python
# Different types → different URIs
recipe_uri = normalize("Cornish pasty", "Recipe")
# → "https://trustgraph.ai/food/recipe-cornish-pasty"

food_uri = normalize("Cornish pasty", "Food")
# → "https://trustgraph.ai/food/food-cornish-pasty"

# The relationship connects them correctly
triple = Triple(
    s=recipe_uri,  # The Recipe
    p="http://purl.org/ontology/fo/produces",
    o=food_uri     # The Food
)
```

**Why This Works:**
- The type is included in ALL references (entities, relationships, attributes)
- The code uses the `(name, type)` tuple as the lookup key
- No ambiguity, no collisions

### Example: Entity Attributes

**Input Text:**
```
This Cornish pasty recipe serves 4-6 people and takes 45 minutes to prepare.
```

**Ontology Schema (shown to LLM):**
```markdown
## Entity Types:
- Recipe: A recipe is a combination of ingredients and a method

## Attributes:
- serves: Indicates what the recipe is intended to serve (Recipe → text)
- preparation_time: Time needed to prepare the recipe (Recipe → text)
```

**What LLM Returns (Simple JSON):**
```json
{
  "entities": [
    {"entity": "Cornish pasty recipe", "type": "Recipe"}
  ],
  "attributes": [
    {
      "entity": "Cornish pasty recipe",
      "entity-type": "Recipe",
      "attribute": "serves",
      "value": "4-6 people"
    },
    {
      "entity": "Cornish pasty recipe",
      "entity-type": "Recipe",
      "attribute": "preparation_time",
      "value": "45 minutes"
    }
  ]
}
```

**What Code Produces (RDF Triples):**
```python
# Normalize the entity name to a URI
recipe_uri = "https://trustgraph.ai/food/cornish-pasty-recipe"

# Look up attribute URIs from the ontology
serves_uri = "http://purl.org/ontology/fo/serves"               # from fo/serves
prep_time_uri = "http://purl.org/ontology/fo/preparation_time"  # from fo/preparation_time

triples = [
    # Entity type triple
    Triple(
        s=Value(value=recipe_uri, is_uri=True),
        p=Value(value=rdf_type, is_uri=True),
        o=Value(value="http://purl.org/ontology/fo/Recipe", is_uri=True)
    ),

    # Label triple (automatic)
    Triple(
        s=Value(value=recipe_uri, is_uri=True),
        p=Value(value=rdfs_label, is_uri=True),
        o=Value(value="Cornish pasty recipe", is_uri=False)
    ),

    # Attribute triples (objects are literals, not URIs)
    Triple(
        s=Value(value=recipe_uri, is_uri=True),
        p=Value(value=serves_uri, is_uri=True),
        o=Value(value="4-6 people", is_uri=False)  # Literal value!
    ),
    Triple(
        s=Value(value=recipe_uri, is_uri=True),
        p=Value(value=prep_time_uri, is_uri=True),
        o=Value(value="45 minutes", is_uri=False)  # Literal value!
    )
]
```

**Key Points:**
- The LLM extracts literal values: `"4-6 people"`, `"45 minutes"`
- The LLM includes the entity type for disambiguation: `entity-type`
- The LLM uses the attribute name from the schema: `"serves"`, `"preparation_time"`
- The code looks up the attribute URI from the ontology's datatype properties
- **The object is a literal** (`is_uri=False`), not a URI reference
- Values stay as natural text; no normalization needed

**Difference from Relationships:**
- Relationships: both subject and object are entities (URIs)
- Attributes: the subject is an entity (URI), the object is a literal value (string/number)

### Complete Example: Entities + Relationships + Attributes

**Input Text:**
```
Cornish pasty is a savory pastry filled with beef and potatoes.
This recipe serves 4 people.
```

**What LLM Returns:**
```json
{
  "entities": [
    {"entity": "Cornish pasty", "type": "Recipe"},
    {"entity": "beef", "type": "Food"},
    {"entity": "potatoes", "type": "Food"}
  ],
  "relationships": [
    {
      "subject": "Cornish pasty",
      "subject-type": "Recipe",
      "relation": "has_ingredient",
      "object": "beef",
      "object-type": "Food"
    },
    {
      "subject": "Cornish pasty",
      "subject-type": "Recipe",
      "relation": "has_ingredient",
      "object": "potatoes",
      "object-type": "Food"
    }
  ],
  "attributes": [
    {
      "entity": "Cornish pasty",
      "entity-type": "Recipe",
      "attribute": "serves",
      "value": "4 people"
    }
  ]
}
```

**Result:** 9 RDF triples generated:
- 3 entity type triples (`rdf:type`)
- 3 entity label triples (`rdfs:label`) - automatic
- 2 relationship triples (`has_ingredient`)
- 1 attribute triple (`serves`)

All from simple, natural-language extractions by the LLM.

## References

- Current implementation: `trustgraph-flow/trustgraph/extract/kg/ontology/extract.py`
- Prompt template: `ontology-prompt.md`
- Test cases: `tests/unit/test_extract/test_ontology/`
- Example ontology: `e2e/test-data/food.ontology`

ontology-prompt.md (new file, 54 lines)

You are a knowledge extraction expert. Extract structured triples from text using ONLY the provided ontology elements.

## Ontology Classes:

{% for class_id, class_def in classes.items() %}
- **{{class_id}}**{% if class_def.subclass_of %} (subclass of {{class_def.subclass_of}}){% endif %}{% if class_def.comment %}: {{class_def.comment}}{% endif %}
{% endfor %}

## Object Properties (connect entities):

{% for prop_id, prop_def in object_properties.items() %}
- **{{prop_id}}**{% if prop_def.domain and prop_def.range %} ({{prop_def.domain}} → {{prop_def.range}}){% endif %}{% if prop_def.comment %}: {{prop_def.comment}}{% endif %}
{% endfor %}

## Datatype Properties (entity attributes):

{% for prop_id, prop_def in datatype_properties.items() %}
- **{{prop_id}}**{% if prop_def.domain and prop_def.range %} ({{prop_def.domain}} → {{prop_def.range}}){% endif %}{% if prop_def.comment %}: {{prop_def.comment}}{% endif %}
{% endfor %}

## Text to Analyze:

{{text}}

## Extraction Rules:

1. Only use classes defined above for entity types
2. Only use properties defined above for relationships and attributes
3. Respect domain and range constraints where specified
4. For class instances, use `rdf:type` as the predicate
5. Include `rdfs:label` for new entities to provide human-readable names
6. Extract all relevant triples that can be inferred from the text
7. Use entity URIs or meaningful identifiers as subjects/objects

## Output Format:

Return ONLY a valid JSON array (no markdown, no code blocks) containing objects with these fields:
- "subject": the subject entity (URI or identifier)
- "predicate": the property (from the ontology, or rdf:type/rdfs:label)
- "object": the object entity or literal value

Important: Return raw JSON only, with no markdown formatting, no code blocks, and no backticks.
|
||||||
|
|
||||||
|
## Example Output:
|
||||||
|
|
||||||
|
[
|
||||||
|
{"subject": "recipe:cornish-pasty", "predicate": "rdf:type", "object": "Recipe"},
|
||||||
|
{"subject": "recipe:cornish-pasty", "predicate": "rdfs:label", "object": "Cornish Pasty"},
|
||||||
|
{"subject": "recipe:cornish-pasty", "predicate": "has_ingredient", "object": "ingredient:flour"},
|
||||||
|
{"subject": "ingredient:flour", "predicate": "rdf:type", "object": "Ingredient"},
|
||||||
|
{"subject": "ingredient:flour", "predicate": "rdfs:label", "object": "plain flour"}
|
||||||
|
]
|
||||||
|
|
||||||
|
Now extract triples from the text above.
|
||||||
|
|
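The template's class loop can be mimicked in plain Python to see exactly what the LLM is shown per class. `format_class_lines` is a hypothetical helper for illustration, not part of the codebase:

```python
def format_class_lines(classes: dict) -> list:
    """Mimic the Jinja2 class loop above in plain Python."""
    lines = []
    for class_id, class_def in classes.items():
        line = f"- **{class_id}**"
        if class_def.get("subclass_of"):
            line += f" (subclass of {class_def['subclass_of']})"
        if class_def.get("comment"):
            line += f": {class_def['comment']}"
        lines.append(line)
    return lines

classes = {"fo/Recipe": {"comment": "A Recipe is a combination of ingredients."}}
print(format_class_lines(classes))
# → ['- **fo/Recipe**: A Recipe is a combination of ingredients.']
```

Note the LLM sees the raw `fo/Recipe` key, namespace prefix included, which is why the example output's bare `Recipe` is inconsistent with the class list shown.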
@@ -0,0 +1,164 @@
"""
Entity URI normalization for ontology-based knowledge extraction.

Converts entity names and types into consistent, collision-free URIs.
"""

import re
from typing import Tuple


def normalize_entity_name(entity_name: str) -> str:
    """Normalize entity name to URI-safe identifier.

    Args:
        entity_name: Natural language entity name (e.g., "Cornish pasty")

    Returns:
        Normalized identifier (e.g., "cornish-pasty")
    """
    # Convert to lowercase
    normalized = entity_name.lower()

    # Replace spaces and underscores with hyphens
    normalized = re.sub(r'[\s_]+', '-', normalized)

    # Remove any characters that aren't alphanumeric, hyphens, or periods
    normalized = re.sub(r'[^a-z0-9\-.]', '', normalized)

    # Remove leading/trailing hyphens
    normalized = normalized.strip('-')

    # Collapse multiple hyphens
    normalized = re.sub(r'-+', '-', normalized)

    return normalized


def normalize_type_identifier(type_id: str) -> str:
    """Normalize ontology type identifier to URI-safe format.

    Handles prefixed types like "fo/Recipe" by converting to "fo-recipe".

    Args:
        type_id: Ontology type identifier (e.g., "fo/Recipe", "Food")

    Returns:
        Normalized type identifier (e.g., "fo-recipe", "food")
    """
    # Convert to lowercase
    normalized = type_id.lower()

    # Replace slashes, colons, and spaces with hyphens
    normalized = re.sub(r'[/:.\s_]+', '-', normalized)

    # Remove any remaining non-alphanumeric characters except hyphens
    normalized = re.sub(r'[^a-z0-9\-]', '', normalized)

    # Remove leading/trailing hyphens
    normalized = normalized.strip('-')

    # Collapse multiple hyphens
    normalized = re.sub(r'-+', '-', normalized)

    return normalized


def build_entity_uri(entity_name: str, entity_type: str, ontology_id: str,
                     base_uri: str = "https://trustgraph.ai") -> str:
    """Build a unique URI for an entity based on its name and type.

    The type is included in the URI to prevent collisions when the same
    name refers to different entity types (e.g., "Cornish pasty" as both
    Recipe and Food).

    Args:
        entity_name: Natural language entity name (e.g., "Cornish pasty")
        entity_type: Ontology type (e.g., "fo/Recipe")
        ontology_id: Ontology identifier (e.g., "food")
        base_uri: Base URI for entity URIs (default: "https://trustgraph.ai")

    Returns:
        Full entity URI (e.g., "https://trustgraph.ai/food/fo-recipe-cornish-pasty")

    Examples:
        >>> build_entity_uri("Cornish pasty", "fo/Recipe", "food")
        'https://trustgraph.ai/food/fo-recipe-cornish-pasty'

        >>> build_entity_uri("Cornish pasty", "fo/Food", "food")
        'https://trustgraph.ai/food/fo-food-cornish-pasty'

        >>> build_entity_uri("beef", "fo/Food", "food")
        'https://trustgraph.ai/food/fo-food-beef'
    """
    type_part = normalize_type_identifier(entity_type)
    name_part = normalize_entity_name(entity_name)

    # Combine type and name to ensure uniqueness
    entity_id = f"{type_part}-{name_part}"

    # Build full URI
    return f"{base_uri}/{ontology_id}/{entity_id}"


class EntityRegistry:
    """Registry to track entity name/type tuples and their assigned URIs.

    Ensures that the same (entity_name, entity_type) tuple always maps
    to the same URI, enabling deduplication across the extraction process.
    """

    def __init__(self, ontology_id: str, base_uri: str = "https://trustgraph.ai"):
        """Initialize the entity registry.

        Args:
            ontology_id: Ontology identifier (e.g., "food")
            base_uri: Base URI for entity URIs
        """
        self.ontology_id = ontology_id
        self.base_uri = base_uri
        self._registry = {}  # (entity_name, entity_type) -> uri

    def get_or_create_uri(self, entity_name: str, entity_type: str) -> str:
        """Get existing URI or create new one for entity.

        Args:
            entity_name: Natural language entity name
            entity_type: Ontology type identifier

        Returns:
            URI for this entity (same URI for same name/type tuple)
        """
        key = (entity_name, entity_type)

        if key not in self._registry:
            uri = build_entity_uri(
                entity_name,
                entity_type,
                self.ontology_id,
                self.base_uri
            )
            self._registry[key] = uri

        return self._registry[key]

    def lookup(self, entity_name: str, entity_type: str) -> str:
        """Look up URI for entity (returns None if not registered).

        Args:
            entity_name: Natural language entity name
            entity_type: Ontology type identifier

        Returns:
            URI for this entity, or None if not found
        """
        key = (entity_name, entity_type)
        return self._registry.get(key)

    def clear(self):
        """Clear all registered entities."""
        self._registry.clear()

    def size(self) -> int:
        """Get number of registered entities."""
        return len(self._registry)
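The normalization pipeline boils down to a compact, self-contained sketch. `_norm` and `build_uri` below are simplified, hypothetical stand-ins for `normalize_type_identifier`, `normalize_entity_name`, and `build_entity_uri`, reproducing the doctest results above:

```python
import re

def _norm(s: str) -> str:
    # Simplified combination of the normalizers above, for illustration.
    s = re.sub(r'[/:.\s_]+', '-', s.lower())   # separators -> hyphens
    s = re.sub(r'[^a-z0-9\-]', '', s)          # drop anything else
    return re.sub(r'-+', '-', s).strip('-')    # collapse and trim hyphens

def build_uri(name: str, type_: str, ontology_id: str,
              base: str = "https://trustgraph.ai") -> str:
    # Type prefix keeps "Cornish pasty" as Recipe distinct from it as Food.
    return f"{base}/{ontology_id}/{_norm(type_)}-{_norm(name)}"

print(build_uri("Cornish pasty", "fo/Recipe", "food"))
# → https://trustgraph.ai/food/fo-recipe-cornish-pasty
```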
@@ -20,6 +20,8 @@ from .ontology_embedder import OntologyEmbedder
 from .vector_store import InMemoryVectorStore
 from .text_processor import TextProcessor
 from .ontology_selector import OntologySelector, OntologySubset
+from .simplified_parser import parse_extraction_response
+from .triple_converter import TripleConverter

 logger = logging.getLogger(__name__)

@@ -298,25 +300,10 @@ class Processor(FlowProcessor):
         # Build extraction prompt variables
         prompt_variables = self.build_extraction_variables(chunk, ontology_subset)

-        # Call prompt service for extraction
-        try:
-            # Use prompt() method with extract-with-ontologies prompt ID
-            triples_response = await flow("prompt-request").prompt(
-                id="extract-with-ontologies",
-                variables=prompt_variables
-            )
-            logger.debug(f"Extraction response: {triples_response}")
-
-            if not isinstance(triples_response, list):
-                logger.error("Expected list of triples from prompt service")
-                triples_response = []
-
-        except Exception as e:
-            logger.error(f"Prompt service error: {e}", exc_info=True)
-            triples_response = []
-
-        # Parse and validate triples
-        triples = self.parse_and_validate_triples(triples_response, ontology_subset)
+        # Extract using simplified entity-relationship-attribute format
+        triples = await self.extract_with_simplified_format(
+            flow, chunk, ontology_subset, prompt_variables
+        )

         # Add metadata triples
         for t in v.metadata.metadata:

@@ -362,6 +349,55 @@ class Processor(FlowProcessor):
             []
         )

+    async def extract_with_simplified_format(
+        self,
+        flow,
+        chunk: str,
+        ontology_subset: OntologySubset,
+        prompt_variables: Dict[str, Any]
+    ) -> List[Triple]:
+        """Extract triples using simplified entity-relationship-attribute format.
+
+        Args:
+            flow: Flow object for accessing services
+            chunk: Text chunk to extract from
+            ontology_subset: Selected ontology subset
+            prompt_variables: Variables for prompt template
+
+        Returns:
+            List of Triple objects
+        """
+        try:
+            # Call prompt service with simplified format prompt
+            extraction_response = await flow("prompt-request").prompt(
+                id="extract-with-ontologies",
+                variables=prompt_variables
+            )
+            logger.debug(f"Simplified extraction response: {extraction_response}")
+
+            # Parse response into structured format
+            extraction_result = parse_extraction_response(extraction_response)
+
+            if not extraction_result:
+                logger.warning("Failed to parse extraction response")
+                return []
+
+            logger.info(f"Parsed {len(extraction_result.entities)} entities, "
+                        f"{len(extraction_result.relationships)} relationships, "
+                        f"{len(extraction_result.attributes)} attributes")
+
+            # Convert to RDF triples
+            converter = TripleConverter(ontology_subset, ontology_subset.ontology_id)
+            triples = converter.convert_all(extraction_result)
+
+            logger.info(f"Generated {len(triples)} RDF triples from simplified extraction")
+
+            return triples
+
+        except Exception as e:
+            logger.error(f"Simplified extraction error: {e}", exc_info=True)
+            return []
+
     def build_extraction_variables(self, chunk: str, ontology_subset: OntologySubset) -> Dict[str, Any]:
         """Build variables for ontology-based extraction prompt template.
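The new call shape (an async prompt-service call returning the simplified JSON structure) can be sketched with a stand-in client. `FakePromptClient` is hypothetical and only mirrors the `prompt(id=..., variables=...)` signature used in the diff above:

```python
import asyncio

class FakePromptClient:
    """Stand-in for flow("prompt-request"); returns an empty simplified result."""
    async def prompt(self, id, variables):
        return {"entities": [], "relationships": [], "attributes": []}

async def main():
    client = FakePromptClient()
    response = await client.prompt(
        id="extract-with-ontologies",
        variables={"text": "Cornish pasty contains beef."},
    )
    # The processor then hands this dict to parse_extraction_response().
    print(sorted(response.keys()))

asyncio.run(main())
# → ['attributes', 'entities', 'relationships']
```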
@@ -0,0 +1,234 @@
"""
Parser for simplified ontology extraction JSON format.

Parses the new entity-relationship-attribute format from LLM responses.
"""

import json
import logging
from typing import List, Dict, Any, Optional
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class Entity:
    """Represents an extracted entity."""
    entity: str
    type: str


@dataclass
class Relationship:
    """Represents an extracted relationship."""
    subject: str
    subject_type: str
    relation: str
    object: str
    object_type: str


@dataclass
class Attribute:
    """Represents an extracted attribute."""
    entity: str
    entity_type: str
    attribute: str
    value: str


@dataclass
class ExtractionResult:
    """Complete extraction result."""
    entities: List[Entity]
    relationships: List[Relationship]
    attributes: List[Attribute]


def parse_extraction_response(response: Any) -> Optional[ExtractionResult]:
    """Parse LLM extraction response into structured format.

    Args:
        response: LLM response (string JSON or already parsed dict)

    Returns:
        ExtractionResult with parsed entities/relationships/attributes,
        or None if parsing fails
    """
    # Handle string response (parse JSON)
    if isinstance(response, str):
        try:
            data = json.loads(response)
        except json.JSONDecodeError as e:
            logger.error(f"Failed to parse JSON response: {e}")
            logger.debug(f"Response was: {response[:500]}")
            return None
    elif isinstance(response, dict):
        data = response
    else:
        logger.error(f"Unexpected response type: {type(response)}")
        return None

    # Validate structure
    if not isinstance(data, dict):
        logger.error(f"Expected dict, got {type(data)}")
        return None

    # Parse entities
    entities = []
    entities_data = data.get('entities', [])
    if not isinstance(entities_data, list):
        logger.warning(f"'entities' is not a list: {type(entities_data)}")
        entities_data = []

    for entity_data in entities_data:
        try:
            entity = parse_entity(entity_data)
            if entity:
                entities.append(entity)
        except Exception as e:
            logger.warning(f"Failed to parse entity {entity_data}: {e}")

    # Parse relationships
    relationships = []
    relationships_data = data.get('relationships', [])
    if not isinstance(relationships_data, list):
        logger.warning(f"'relationships' is not a list: {type(relationships_data)}")
        relationships_data = []

    for rel_data in relationships_data:
        try:
            relationship = parse_relationship(rel_data)
            if relationship:
                relationships.append(relationship)
        except Exception as e:
            logger.warning(f"Failed to parse relationship {rel_data}: {e}")

    # Parse attributes
    attributes = []
    attributes_data = data.get('attributes', [])
    if not isinstance(attributes_data, list):
        logger.warning(f"'attributes' is not a list: {type(attributes_data)}")
        attributes_data = []

    for attr_data in attributes_data:
        try:
            attribute = parse_attribute(attr_data)
            if attribute:
                attributes.append(attribute)
        except Exception as e:
            logger.warning(f"Failed to parse attribute {attr_data}: {e}")

    return ExtractionResult(
        entities=entities,
        relationships=relationships,
        attributes=attributes
    )


def parse_entity(data: Dict[str, Any]) -> Optional[Entity]:
    """Parse entity from dict.

    Args:
        data: Entity dict with 'entity' and 'type' fields

    Returns:
        Entity object or None if invalid
    """
    if not isinstance(data, dict):
        logger.warning(f"Entity data is not a dict: {type(data)}")
        return None

    entity = data.get('entity')
    entity_type = data.get('type')

    if not entity or not entity_type:
        logger.warning(f"Missing required fields in entity: {data}")
        return None

    if not isinstance(entity, str) or not isinstance(entity_type, str):
        logger.warning(f"Entity fields must be strings: {data}")
        return None

    return Entity(entity=entity, type=entity_type)


def parse_relationship(data: Dict[str, Any]) -> Optional[Relationship]:
    """Parse relationship from dict.

    Supports both kebab-case and snake_case field names for compatibility.

    Args:
        data: Relationship dict with subject, subject-type, relation, object, object-type

    Returns:
        Relationship object or None if invalid
    """
    if not isinstance(data, dict):
        logger.warning(f"Relationship data is not a dict: {type(data)}")
        return None

    subject = data.get('subject')
    subject_type = data.get('subject-type') or data.get('subject_type')
    relation = data.get('relation')
    obj = data.get('object')
    object_type = data.get('object-type') or data.get('object_type')

    if not all([subject, subject_type, relation, obj, object_type]):
        logger.warning(f"Missing required fields in relationship: {data}")
        return None

    if not all(isinstance(v, str) for v in [subject, subject_type, relation, obj, object_type]):
        logger.warning(f"Relationship fields must be strings: {data}")
        return None

    return Relationship(
        subject=subject,
        subject_type=subject_type,
        relation=relation,
        object=obj,
        object_type=object_type
    )


def parse_attribute(data: Dict[str, Any]) -> Optional[Attribute]:
    """Parse attribute from dict.

    Supports both kebab-case and snake_case field names for compatibility.

    Args:
        data: Attribute dict with entity, entity-type, attribute, value

    Returns:
        Attribute object or None if invalid
    """
    if not isinstance(data, dict):
        logger.warning(f"Attribute data is not a dict: {type(data)}")
        return None

    entity = data.get('entity')
    entity_type = data.get('entity-type') or data.get('entity_type')
    attribute = data.get('attribute')
    value = data.get('value')

    if not all([entity, entity_type, attribute, value is not None]):
        logger.warning(f"Missing required fields in attribute: {data}")
        return None

    if not all(isinstance(v, str) for v in [entity, entity_type, attribute]):
        logger.warning(f"Attribute fields must be strings: {data}")
        return None

    # Value can be string, number, bool - convert to string
    if not isinstance(value, str):
        value = str(value)

    return Attribute(
        entity=entity,
        entity_type=entity_type,
        attribute=attribute,
        value=value
    )
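A concrete example of the JSON shape `parse_extraction_response` expects, using the field names the parsers above look for (the specific entities and values are made up for illustration):

```python
import json

# Hypothetical LLM response in the simplified format.
response = json.dumps({
    "entities": [
        {"entity": "Cornish pasty", "type": "fo/Recipe"},
        {"entity": "beef", "type": "fo/Food"},
    ],
    "relationships": [
        {"subject": "Cornish pasty", "subject-type": "fo/Recipe",
         "relation": "fo/has_ingredient",
         "object": "beef", "object-type": "fo/Food"},
    ],
    "attributes": [
        # Non-string values are accepted and coerced to strings by the parser.
        {"entity": "Cornish pasty", "entity-type": "fo/Recipe",
         "attribute": "fo/serves", "value": 4},
    ],
})

data = json.loads(response)
print(len(data["entities"]), len(data["relationships"]), len(data["attributes"]))
# → 2 1 1
```

Note the kebab-case keys (`subject-type`, `entity-type`); the parsers also accept the snake_case spellings.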
@@ -0,0 +1,228 @@
"""
Converts simplified extraction format to RDF triples.

Transforms entities, relationships, and attributes into proper RDF triples
with full URIs and correct is_uri flags.
"""

import logging
from typing import List, Optional

from .... schema import Triple, Value
from .... rdf import RDF_TYPE, RDF_LABEL

from .simplified_parser import Entity, Relationship, Attribute, ExtractionResult
from .entity_normalizer import EntityRegistry
from .ontology_selector import OntologySubset

logger = logging.getLogger(__name__)


class TripleConverter:
    """Converts extraction results to RDF triples."""

    def __init__(self, ontology_subset: OntologySubset, ontology_id: str):
        """Initialize converter.

        Args:
            ontology_subset: Ontology subset with classes and properties
            ontology_id: Ontology identifier for URI generation
        """
        self.ontology_subset = ontology_subset
        self.ontology_id = ontology_id
        self.entity_registry = EntityRegistry(ontology_id)

    def convert_all(self, extraction: ExtractionResult) -> List[Triple]:
        """Convert complete extraction result to RDF triples.

        Args:
            extraction: Parsed extraction with entities/relationships/attributes

        Returns:
            List of RDF Triple objects
        """
        triples = []

        # Convert entities (generates type + label triples)
        for entity in extraction.entities:
            entity_triples = self.convert_entity(entity)
            triples.extend(entity_triples)

        # Convert relationships
        for relationship in extraction.relationships:
            rel_triple = self.convert_relationship(relationship)
            if rel_triple:
                triples.append(rel_triple)

        # Convert attributes
        for attribute in extraction.attributes:
            attr_triple = self.convert_attribute(attribute)
            if attr_triple:
                triples.append(attr_triple)

        return triples

    def convert_entity(self, entity: Entity) -> List[Triple]:
        """Convert entity to RDF triples (type + label).

        Args:
            entity: Entity object with name and type

        Returns:
            List containing type triple and label triple
        """
        triples = []

        # Get or create URI for this entity
        entity_uri = self.entity_registry.get_or_create_uri(
            entity.entity,
            entity.type
        )

        # Look up class URI from ontology
        class_uri = self._get_class_uri(entity.type)
        if not class_uri:
            logger.warning(f"Unknown entity type '{entity.type}', skipping entity '{entity.entity}'")
            return triples

        # Generate type triple: entity rdf:type ClassURI
        type_triple = Triple(
            s=Value(value=entity_uri, is_uri=True),
            p=Value(value=RDF_TYPE, is_uri=True),
            o=Value(value=class_uri, is_uri=True)
        )
        triples.append(type_triple)

        # Generate label triple: entity rdfs:label "entity name"
        label_triple = Triple(
            s=Value(value=entity_uri, is_uri=True),
            p=Value(value=RDF_LABEL, is_uri=True),
            o=Value(value=entity.entity, is_uri=False)  # Literal!
        )
        triples.append(label_triple)

        return triples

    def convert_relationship(self, relationship: Relationship) -> Optional[Triple]:
        """Convert relationship to RDF triple.

        Args:
            relationship: Relationship with subject/object entities and relation

        Returns:
            Triple connecting two entity URIs via property URI, or None if invalid
        """
        # Get URIs for subject and object entities
        subject_uri = self.entity_registry.get_or_create_uri(
            relationship.subject,
            relationship.subject_type
        )

        object_uri = self.entity_registry.get_or_create_uri(
            relationship.object,
            relationship.object_type
        )

        # Look up property URI from ontology
        property_uri = self._get_object_property_uri(relationship.relation)
        if not property_uri:
            logger.warning(f"Unknown relationship '{relationship.relation}', skipping")
            return None

        # Generate triple: subject property object
        return Triple(
            s=Value(value=subject_uri, is_uri=True),
            p=Value(value=property_uri, is_uri=True),
            o=Value(value=object_uri, is_uri=True)
        )

    def convert_attribute(self, attribute: Attribute) -> Optional[Triple]:
        """Convert attribute to RDF triple.

        Args:
            attribute: Attribute with entity, attribute name, and literal value

        Returns:
            Triple with entity URI, property URI, and literal value, or None if invalid
        """
        # Get URI for entity
        entity_uri = self.entity_registry.get_or_create_uri(
            attribute.entity,
            attribute.entity_type
        )

        # Look up property URI from ontology
        property_uri = self._get_datatype_property_uri(attribute.attribute)
        if not property_uri:
            logger.warning(f"Unknown attribute '{attribute.attribute}', skipping")
            return None

        # Generate triple: entity property "literal value"
        return Triple(
            s=Value(value=entity_uri, is_uri=True),
            p=Value(value=property_uri, is_uri=True),
            o=Value(value=attribute.value, is_uri=False)  # Literal!
        )

    def _get_class_uri(self, class_id: str) -> Optional[str]:
        """Get full URI for ontology class.

        Args:
            class_id: Class identifier (e.g., "fo/Recipe")

        Returns:
            Full class URI or None if not found
        """
        if class_id not in self.ontology_subset.classes:
            return None

        class_def = self.ontology_subset.classes[class_id]

        # Extract URI from class definition
        if isinstance(class_def, dict) and 'uri' in class_def:
            return class_def['uri']

        # Fallback: construct URI
        return f"https://trustgraph.ai/ontology/{self.ontology_id}#{class_id}"

    def _get_object_property_uri(self, property_id: str) -> Optional[str]:
        """Get full URI for object property.

        Args:
            property_id: Property identifier (e.g., "fo/has_ingredient")

        Returns:
            Full property URI or None if not found
        """
        if property_id not in self.ontology_subset.object_properties:
            return None

        prop_def = self.ontology_subset.object_properties[property_id]

        # Extract URI from property definition
        if isinstance(prop_def, dict) and 'uri' in prop_def:
            return prop_def['uri']

        # Fallback: construct URI
        return f"https://trustgraph.ai/ontology/{self.ontology_id}#{property_id}"

    def _get_datatype_property_uri(self, property_id: str) -> Optional[str]:
        """Get full URI for datatype property.

        Args:
            property_id: Property identifier (e.g., "fo/serves")

        Returns:
            Full property URI or None if not found
        """
        if property_id not in self.ontology_subset.datatype_properties:
            return None

        prop_def = self.ontology_subset.datatype_properties[property_id]

        # Extract URI from property definition
        if isinstance(prop_def, dict) and 'uri' in prop_def:
            return prop_def['uri']

        # Fallback: construct URI
        return f"https://trustgraph.ai/ontology/{self.ontology_id}#{property_id}"
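A self-contained sketch of what `convert_entity` emits for a single entity, with dataclass stand-ins replacing the trustgraph `Triple`/`Value` schema types (the RDF_TYPE/RDF_LABEL constants are the standard `rdf:type`/`rdfs:label` URIs; the entity and class URIs echo the food.ontology example):

```python
from dataclasses import dataclass

@dataclass
class Value:
    value: str
    is_uri: bool

@dataclass
class Triple:
    s: Value
    p: Value
    o: Value

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

entity_uri = "https://trustgraph.ai/food/fo-recipe-cornish-pasty"
class_uri = "http://purl.org/ontology/fo/Recipe"

# convert_entity yields exactly two triples per known entity:
triples = [
    Triple(Value(entity_uri, True), Value(RDF_TYPE, True), Value(class_uri, True)),
    Triple(Value(entity_uri, True), Value(RDF_LABEL, True),
           Value("Cornish pasty", False)),  # label is a literal, not a URI
]
print(triples[1].o.value, triples[1].o.is_uri)
# → Cornish pasty False
```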