Mirror of https://github.com/trustgraph-ai/trustgraph.git (synced 2026-04-26 00:46:22 +02:00)

Ontology extraction tests (#560)

parent 2baf21c5e1
commit 6c85038c75

10 changed files with 3041 additions and 0 deletions
148
tests/unit/test_extract/test_ontology/README.md
Normal file
@@ -0,0 +1,148 @@
# Ontology Extractor Unit Tests

Comprehensive unit tests for the OntoRAG ontology extraction system.

## Test Coverage

### 1. `test_ontology_selector.py` - Auto-Include Properties Feature

Tests the critical dependency resolution that automatically includes all properties related to the selected classes.

**Key Tests:**
- `test_auto_include_properties_for_recipe_class` - Verifies the Recipe class auto-includes `ingredients`, `method`, `produces`, `serves`
- `test_auto_include_properties_for_ingredient_class` - Verifies the Ingredient class auto-includes the `food` property
- `test_auto_include_properties_for_range_class` - Tests that properties are included when the class appears in their range
- `test_auto_include_adds_domain_and_range_classes` - Ensures related classes are added too
- `test_multiple_classes_get_all_related_properties` - Tests combining multiple class selections
- `test_no_duplicate_properties_added` - Ensures properties aren't duplicated
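The dependency resolution exercised by these tests can be sketched roughly as follows. This is an illustrative stand-in, not the actual `OntologySelector` API; the function name and the dict shapes are assumptions:

```python
def auto_include_properties(selected, properties):
    """Sketch: given selected class names, include every property whose
    domain or range mentions one of them, and pull in the classes those
    properties connect to (not the real OntologySelector interface)."""
    included_props = {}
    related_classes = set(selected)
    for name, prop in properties.items():
        if prop.get("domain") in selected or prop.get("range") in selected:
            included_props[name] = prop  # keyed by name, so no duplicates
            related_classes.add(prop.get("domain"))
            related_classes.add(prop.get("range"))
    related_classes.discard(None)
    return included_props, related_classes

# Hypothetical ontology fragment for illustration
props = {
    "ingredients": {"domain": "Recipe", "range": "IngredientList"},
    "food": {"domain": "Ingredient", "range": "Food"},
}
included, classes = auto_include_properties({"Recipe"}, props)
```

Selecting only `Recipe` pulls in `ingredients` (its domain matches) together with `IngredientList`, while `food` stays excluded — the behaviour the first and third tests check.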

### 2. `test_uri_expansion.py` - URI Expansion

Tests that URIs are properly expanded using ontology definitions instead of constructed fallback URIs.

**Key Tests:**
- `test_expand_class_uri_from_ontology` - Class names expand to ontology URIs
- `test_expand_object_property_uri_from_ontology` - Object properties use ontology URIs
- `test_expand_datatype_property_uri_from_ontology` - Datatype properties use ontology URIs
- `test_expand_rdf_prefix` - Standard RDF prefixes expand correctly
- `test_expand_rdfs_prefix`, `test_expand_owl_prefix`, `test_expand_xsd_prefix` - Other standard prefixes
- `test_fallback_uri_for_instance` - Entity instances get constructed URIs
- `test_already_full_uri_unchanged` - Full URIs pass through
- `test_dict_access_not_object_attribute` - **Critical test** verifying dict access works (not object attributes)
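The expansion order these tests pin down — ontology definition first, then standard prefixes, then a constructed fallback — can be sketched like this. The function name, the dict shapes, and the fallback base are illustrative assumptions, not the Processor's real interface:

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}

def expand_uri(name, ontology_uris, fallback_base="https://example.com/entity/"):
    """Sketch of the expansion priority: full URIs pass through,
    ontology definitions win, standard prefixes expand, and anything
    else gets a constructed fallback URI."""
    if name.startswith(("http://", "https://")):
        return name  # already a full URI: unchanged
    if name in ontology_uris:
        return ontology_uris[name]  # dict access, not object attributes
    if ":" in name:
        prefix, local = name.split(":", 1)
        if prefix in PREFIXES:
            return PREFIXES[prefix] + local
    return fallback_base + name.lower().replace(" ", "-")

# Hypothetical lookup table built from the ontology definition
uris = {"Recipe": "http://purl.org/ontology/fo/Recipe"}
```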

### 3. `test_ontology_triples.py` - Ontology Triple Generation

Tests that ontology elements (classes and properties) are properly converted to RDF triples with labels, comments, domains, and ranges.

**Key Tests:**
- `test_generates_class_type_triples` - Classes get `rdf:type owl:Class` triples
- `test_generates_class_labels` - Classes get `rdfs:label` triples
- `test_generates_class_comments` - Classes get `rdfs:comment` triples
- `test_generates_object_property_type_triples` - Object properties get proper type triples
- `test_generates_object_property_labels` - Object properties get labels
- `test_generates_object_property_domain` - Object properties get `rdfs:domain` triples
- `test_generates_object_property_range` - Object properties get `rdfs:range` triples
- `test_generates_datatype_property_type_triples` - Datatype properties get proper type triples
- `test_generates_datatype_property_range` - Datatype properties get XSD type ranges
- `test_uses_dict_field_names_not_rdf_names` - **Critical test** verifying dict field names work
- `test_total_triple_count_is_reasonable` - Validates the expected number of triples
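The triple shapes asserted above can be illustrated with a minimal generator. This is a sketch of the expected output shape only, not the module's actual implementation:

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

def class_triples(uri, label, comment):
    """Sketch: emit the (subject, predicate, object) tuples declaring
    an owl:Class with its rdfs:label and rdfs:comment."""
    triples = [(uri, RDF + "type", OWL + "Class")]
    if label:
        triples.append((uri, RDFS + "label", label))
    if comment:
        triples.append((uri, RDFS + "comment", comment))
    return triples

t = class_triples(
    "http://purl.org/ontology/fo/Recipe",
    "Recipe",
    "A Recipe is a combination of ingredients and a method.",
)
```

A property generator would follow the same pattern with `owl:ObjectProperty`/`owl:DatatypeProperty` types plus `rdfs:domain` and `rdfs:range` triples.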

### 4. `test_text_processing.py` - Text Processing and Segmentation

Tests that text is properly split into sentences for ontology matching, including NLTK tokenization and TextSegment creation.

**Key Tests:**
- `test_segment_single_sentence` - A single sentence produces one segment
- `test_segment_multiple_sentences` - Multiple sentences split correctly
- `test_segment_positions` - Segment start/end positions are correct
- `test_segment_complex_punctuation` - Handles abbreviations (Dr., U.S.A., Mr.)
- `test_segment_question_and_exclamation` - Different sentence terminators
- `test_segment_preserves_original_text` - Segments can reconstruct the original
- `test_text_segment_non_overlapping` - Segments don't overlap
- `test_nltk_punkt_availability` - The NLTK tokenizer is available
- `test_unicode_text` - Handles Unicode characters
- `test_quoted_text` - Handles quoted text correctly

### 5. `test_prompt_and_extraction.py` - LLM Prompt Construction and Triple Extraction

Tests that the system correctly constructs prompts with ontology constraints and extracts and validates triples from LLM responses.

**Key Tests:**
- `test_build_extraction_variables_includes_text` - Prompt includes the input text
- `test_build_extraction_variables_includes_classes` - Prompt includes ontology classes
- `test_build_extraction_variables_includes_properties` - Prompt includes properties
- `test_validates_rdf_type_triple_with_valid_class` - Validates rdf:type against the ontology
- `test_rejects_rdf_type_triple_with_invalid_class` - Rejects invalid classes
- `test_validates_object_property_triple` - Validates object properties
- `test_rejects_unknown_property` - Rejects properties not in the ontology
- `test_parse_simple_triple_dict` - Parses a triple from dict format
- `test_filters_invalid_triples` - Filters out invalid triples
- `test_expands_uris_in_parsed_triples` - Expands URIs using the ontology
- `test_creates_proper_triple_objects` - Creates Triple objects with Value subjects/predicates/objects
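The validation rules these tests encode can be sketched as follows; the prefix-form predicates and the function name are assumptions for illustration, not the extractor's real API:

```python
def validate_triple(triple, classes, properties):
    """Sketch: accept a candidate (s, p, o) only if its predicate is
    known to the ontology. rdf:type objects must name an ontology
    class; any other predicate must be an ontology property."""
    s, p, o = triple
    if p == "rdf:type":
        return o in classes
    return p in properties

# Hypothetical ontology subset for illustration
classes = {"Recipe", "Ingredient"}
properties = {"ingredients", "serves"}

extracted = [
    ("pasty", "rdf:type", "Recipe"),     # valid class
    ("pasty", "rdf:type", "Car"),        # unknown class: rejected
    ("pasty", "ingredients", "list1"),   # known property
    ("pasty", "made_in", "Cornwall"),    # unknown property: rejected
]
kept = [t for t in extracted if validate_triple(t, classes, properties)]
```

Filtering (`test_filters_invalid_triples`) is then just this predicate applied over every parsed triple before URI expansion.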

### 6. `test_embedding_and_similarity.py` - Ontology Embedding and Similarity Matching

Tests that ontology elements are properly embedded and matched against input text using vector similarity.

**Key Tests:**
- `test_create_text_from_class_with_id` - Text representation includes the class ID
- `test_create_text_from_class_with_labels` - Includes labels in the text
- `test_create_text_from_class_with_comment` - Includes comments in the text
- `test_create_text_from_property_with_domain_range` - Includes domain/range in property text
- `test_normalizes_id_with_underscores` - Normalizes IDs (underscores to spaces)
- `test_includes_subclass_info_for_classes` - Includes subclass relationships
- `test_vector_store_api_structure` - Vector store has the expected API
- `test_selector_handles_text_segments` - Selector processes text segments
- `test_merge_subsets_combines_elements` - Merging combines ontology elements
- `test_ontology_element_metadata_structure` - Metadata structure is correct
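A minimal sketch of the text-representation behaviour these tests assert: normalised ID, label values, comment, `domain:`/`range:` markers, and `subclass of` info joined into one embeddable string. The function name and signature are assumptions; only the assertion strings (`domain: Recipe`, `subclass of ParentClass`, underscore/hyphen normalisation) come from the tests themselves:

```python
def text_representation(elem_id, labels=None, comment=None,
                        domain=None, range_=None, subclass_of=None):
    """Sketch: build the natural-language string embedded for an
    ontology element, as implied by the assertions in the tests."""
    parts = [elem_id.replace("_", " ").replace("-", " ")]
    for label in labels or []:
        parts.append(label["value"])
    if comment:
        parts.append(comment)
    if domain:
        parts.append(f"domain: {domain}")
    if range_:
        parts.append(f"range: {range_}")
    if subclass_of:
        parts.append(f"subclass of {subclass_of}")
    return " ".join(parts)
```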

## Running the Tests

### Run all ontology extractor tests:
```bash
# run from the repository root
pytest tests/unit/test_extract/test_ontology/ -v
```

### Run a specific test file:
```bash
pytest tests/unit/test_extract/test_ontology/test_ontology_selector.py -v
pytest tests/unit/test_extract/test_ontology/test_uri_expansion.py -v
pytest tests/unit/test_extract/test_ontology/test_ontology_triples.py -v
pytest tests/unit/test_extract/test_ontology/test_text_processing.py -v
pytest tests/unit/test_extract/test_ontology/test_prompt_and_extraction.py -v
pytest tests/unit/test_extract/test_ontology/test_embedding_and_similarity.py -v
```

### Run a specific test:
```bash
pytest tests/unit/test_extract/test_ontology/test_ontology_selector.py::TestOntologySelector::test_auto_include_properties_for_recipe_class -v
```

### Run with coverage:
```bash
pytest tests/unit/test_extract/test_ontology/ --cov=trustgraph.extract.kg.ontology --cov-report=html
```

## Test Fixtures

- `sample_ontology` - Complete Food Ontology with Recipe, Ingredient, Food, Method classes
- `ontology_loader_with_sample` - Mock OntologyLoader with the sample ontology
- `ontology_embedder` - Mock embedder for testing
- `mock_embedding_service` - Mock service for generating deterministic embeddings
- `vector_store` - InMemoryVectorStore for testing
- `extractor` - Processor instance for URI expansion tests
- `ontology_subset_with_uris` - OntologySubset with proper URIs defined
- `sample_ontology_subset` - OntologySubset for testing triple generation
- `text_processor` - TextProcessor instance for text segmentation tests
- `sample_ontology_class` - Sample OntologyClass for testing
- `sample_ontology_property` - Sample OntologyProperty for testing

## Implementation Notes

These tests verify the fixes made to address:
1. **Disconnected graph problem** - The auto-include properties feature ensures all relevant relationships are available
2. **Wrong URIs problem** - URI expansion uses ontology definitions instead of constructed fallbacks
3. **Dict vs object attribute problem** - URI expansion works with dicts (from `cls.__dict__`), not object attributes
4. **Ontology visibility in KG** - Ontology elements themselves appear in the knowledge graph with proper metadata
5. **Text segmentation** - Proper sentence splitting for ontology matching using NLTK
1
tests/unit/test_extract/test_ontology/__init__.py
Normal file
@@ -0,0 +1 @@
"""Tests for ontology-based extraction."""
423
tests/unit/test_extract/test_ontology/test_embedding_and_similarity.py
Normal file
@@ -0,0 +1,423 @@
"""
Unit tests for ontology embedding and similarity matching.

Tests that ontology elements are properly embedded and matched against
input text using vector similarity.
"""

import pytest
import numpy as np
from unittest.mock import AsyncMock, MagicMock
from trustgraph.extract.kg.ontology.ontology_embedder import (
    OntologyEmbedder,
    OntologyElementMetadata
)
from trustgraph.extract.kg.ontology.ontology_loader import (
    Ontology,
    OntologyClass,
    OntologyProperty
)
from trustgraph.extract.kg.ontology.vector_store import InMemoryVectorStore, SearchResult
from trustgraph.extract.kg.ontology.text_processor import TextSegment
from trustgraph.extract.kg.ontology.ontology_selector import OntologySelector, OntologySubset


@pytest.fixture
def mock_embedding_service():
    """Create a mock embedding service."""
    service = AsyncMock()

    # Return deterministic embeddings for testing
    async def mock_embed(text):
        # Simple hash-based embedding for deterministic tests
        hash_val = hash(text) % 1000
        return np.array([hash_val / 1000.0, (1000 - hash_val) / 1000.0])

    service.embed = mock_embed
    return service


@pytest.fixture
def vector_store():
    """Create an empty vector store."""
    return InMemoryVectorStore()


@pytest.fixture
def ontology_embedder(mock_embedding_service, vector_store):
    """Create an ontology embedder with mock service."""
    return OntologyEmbedder(
        embedding_service=mock_embedding_service,
        vector_store=vector_store
    )


@pytest.fixture
def sample_ontology_class():
    """Create a sample ontology class."""
    return OntologyClass(
        uri="http://purl.org/ontology/fo/Recipe",
        type="owl:Class",
        labels=[{"value": "Recipe", "lang": "en-gb"}],
        comment="A Recipe is a combination of ingredients and a method.",
        subclass_of=None
    )


@pytest.fixture
def sample_ontology_property():
    """Create a sample ontology property."""
    return OntologyProperty(
        uri="http://purl.org/ontology/fo/ingredients",
        type="owl:ObjectProperty",
        labels=[{"value": "ingredients", "lang": "en-gb"}],
        comment="The ingredients property relates a recipe to an ingredient list.",
        domain="Recipe",
        range="IngredientList"
    )


class TestTextRepresentation:
    """Test suite for creating text representations of ontology elements."""

    def test_create_text_from_class_with_id(self, ontology_embedder, sample_ontology_class):
        """Test that class ID is included in text representation."""
        text = ontology_embedder._create_text_representation(
            "Recipe",
            sample_ontology_class,
            "class"
        )

        assert "Recipe" in text, "Should include class ID"

    def test_create_text_from_class_with_labels(self, ontology_embedder, sample_ontology_class):
        """Test that class labels are included in text representation."""
        text = ontology_embedder._create_text_representation(
            "Recipe",
            sample_ontology_class,
            "class"
        )

        assert "Recipe" in text, "Should include label value"

    def test_create_text_from_class_with_comment(self, ontology_embedder, sample_ontology_class):
        """Test that class comments are included in text representation."""
        text = ontology_embedder._create_text_representation(
            "Recipe",
            sample_ontology_class,
            "class"
        )

        assert "combination of ingredients" in text, "Should include comment"

    def test_create_text_from_property_with_domain_range(self, ontology_embedder, sample_ontology_property):
        """Test that property domain and range are included in text."""
        text = ontology_embedder._create_text_representation(
            "ingredients",
            sample_ontology_property,
            "objectProperty"
        )

        assert "domain: Recipe" in text, "Should include domain"
        assert "range: IngredientList" in text, "Should include range"

    def test_normalizes_id_with_underscores(self, ontology_embedder):
        """Test that IDs with underscores are normalized."""
        mock_element = MagicMock()
        mock_element.labels = []
        mock_element.comment = None

        text = ontology_embedder._create_text_representation(
            "some_property_name",
            mock_element,
            "objectProperty"
        )

        assert "some property name" in text, "Should replace underscores with spaces"

    def test_normalizes_id_with_hyphens(self, ontology_embedder):
        """Test that IDs with hyphens are normalized."""
        mock_element = MagicMock()
        mock_element.labels = []
        mock_element.comment = None

        text = ontology_embedder._create_text_representation(
            "some-property-name",
            mock_element,
            "objectProperty"
        )

        assert "some property name" in text, "Should replace hyphens with spaces"

    def test_handles_element_without_labels(self, ontology_embedder):
        """Test handling of elements without labels."""
        mock_element = MagicMock()
        mock_element.labels = None
        mock_element.comment = "Test comment"

        text = ontology_embedder._create_text_representation(
            "TestElement",
            mock_element,
            "class"
        )

        assert "TestElement" in text, "Should still include ID"
        assert "Test comment" in text, "Should include comment"

    def test_includes_subclass_info_for_classes(self, ontology_embedder):
        """Test that subclass information is included for classes."""
        mock_class = MagicMock()
        mock_class.labels = []
        mock_class.comment = None
        mock_class.subclass_of = "ParentClass"

        text = ontology_embedder._create_text_representation(
            "ChildClass",
            mock_class,
            "class"
        )

        assert "subclass of ParentClass" in text, "Should include subclass relationship"


class TestVectorStoreOperations:
    """Test suite for vector store operations."""

    def test_vector_store_starts_empty(self, vector_store):
        """Test that vector store initializes empty."""
        assert vector_store.size() == 0, "New vector store should be empty"

    def test_vector_store_api_structure(self, vector_store):
        """Test that vector store has expected API methods."""
        assert hasattr(vector_store, 'add'), "Should have add method"
        assert hasattr(vector_store, 'add_batch'), "Should have add_batch method"
        assert hasattr(vector_store, 'search'), "Should have search method"
        assert hasattr(vector_store, 'size'), "Should have size method"

    def test_search_result_class_structure(self):
        """Test that SearchResult has expected structure."""
        # Create a sample SearchResult
        result = SearchResult(id="test-1", score=0.95, metadata={"element": "Test"})

        assert hasattr(result, 'id'), "Should have id attribute"
        assert hasattr(result, 'score'), "Should have score attribute"
        assert hasattr(result, 'metadata'), "Should have metadata attribute"
        assert result.id == "test-1"
        assert result.score == 0.95
        assert result.metadata["element"] == "Test"


class TestOntologySelectorIntegration:
    """Test suite for ontology selector with embeddings."""

    @pytest.fixture
    def sample_ontology(self):
        """Create a sample ontology for testing."""
        return Ontology(
            id="food",
            classes={
                "Recipe": OntologyClass(
                    uri="http://purl.org/ontology/fo/Recipe",
                    type="owl:Class",
                    labels=[{"value": "Recipe", "lang": "en-gb"}],
                    comment="A Recipe is a combination of ingredients and a method."
                ),
                "Ingredient": OntologyClass(
                    uri="http://purl.org/ontology/fo/Ingredient",
                    type="owl:Class",
                    labels=[{"value": "Ingredient", "lang": "en-gb"}],
                    comment="An Ingredient combines a quantity and a food."
                )
            },
            object_properties={
                "ingredients": OntologyProperty(
                    uri="http://purl.org/ontology/fo/ingredients",
                    type="owl:ObjectProperty",
                    labels=[{"value": "ingredients", "lang": "en-gb"}],
                    comment="Relates a recipe to its ingredients.",
                    domain="Recipe",
                    range="IngredientList"
                )
            },
            datatype_properties={},
            metadata={"name": "Food Ontology"}
        )

    @pytest.fixture
    def ontology_loader_mock(self, sample_ontology):
        """Create a mock ontology loader."""
        loader = MagicMock()
        loader.get_ontology.return_value = sample_ontology
        loader.get_all_ontology_ids.return_value = ["food"]
        return loader

    async def test_selector_handles_text_segments(
        self, ontology_embedder, ontology_loader_mock
    ):
        """Test that selector can process text segments."""
        # Create selector
        selector = OntologySelector(
            ontology_embedder=ontology_embedder,
            ontology_loader=ontology_loader_mock,
            top_k=5,
            similarity_threshold=0.3
        )

        # Create text segments
        segments = [
            TextSegment(text="Recipe for cornish pasty", type="sentence", position=0),
            TextSegment(text="ingredients needed", type="sentence", position=1)
        ]

        # Select ontology subset (will be empty since we haven't embedded anything)
        subsets = await selector.select_ontology_subset(segments)

        # Should return a list (even if empty)
        assert isinstance(subsets, list), "Should return a list of subsets"

    async def test_selector_with_no_embedding_service(self, vector_store, ontology_loader_mock):
        """Test that selector handles missing embedding service gracefully."""
        embedder = OntologyEmbedder(embedding_service=None, vector_store=vector_store)

        selector = OntologySelector(
            ontology_embedder=embedder,
            ontology_loader=ontology_loader_mock,
            top_k=5,
            similarity_threshold=0.7
        )

        segments = [
            TextSegment(text="Test text", type="sentence", position=0)
        ]

        # Should return empty results without crashing
        subsets = await selector.select_ontology_subset(segments)
        assert isinstance(subsets, list), "Should return a list even without embeddings"

    def test_merge_subsets_combines_elements(self, ontology_loader_mock, ontology_embedder):
        """Test that merging subsets combines all elements."""
        selector = OntologySelector(
            ontology_embedder=ontology_embedder,
            ontology_loader=ontology_loader_mock,
            top_k=5,
            similarity_threshold=0.7
        )

        # Create two subsets from the same ontology
        subset1 = OntologySubset(
            ontology_id="food",
            classes={"Recipe": {"uri": "http://example.com/Recipe"}},
            object_properties={},
            datatype_properties={},
            metadata={},
            relevance_score=0.8
        )

        subset2 = OntologySubset(
            ontology_id="food",
            classes={"Ingredient": {"uri": "http://example.com/Ingredient"}},
            object_properties={"ingredients": {"uri": "http://example.com/ingredients"}},
            datatype_properties={},
            metadata={},
            relevance_score=0.9
        )

        merged = selector.merge_subsets([subset1, subset2])

        assert len(merged.classes) == 2, "Should combine classes"
        # Keys may be prefixed with ontology id
        assert any("Recipe" in key for key in merged.classes.keys())
        assert any("Ingredient" in key for key in merged.classes.keys())
        assert len(merged.object_properties) == 1, "Should include properties"


class TestEmbeddingEdgeCases:
    """Test suite for edge cases in embedding."""

    async def test_embed_element_with_no_labels(self, ontology_embedder):
        """Test embedding element without labels."""
        mock_element = MagicMock()
        mock_element.labels = None
        mock_element.comment = "Test element"

        text = ontology_embedder._create_text_representation(
            "TestElement",
            mock_element,
            "class"
        )

        # Should not crash and should include ID and comment
        assert "TestElement" in text
        assert "Test element" in text

    async def test_embed_element_with_empty_comment(self, ontology_embedder):
        """Test embedding element with empty comment."""
        mock_element = MagicMock()
        mock_element.labels = [{"value": "Label"}]
        mock_element.comment = None

        text = ontology_embedder._create_text_representation(
            "TestElement",
            mock_element,
            "class"
        )

        # Should not crash
        assert "Label" in text

    def test_ontology_element_metadata_structure(self):
        """Test OntologyElementMetadata structure."""
        metadata = OntologyElementMetadata(
            type="class",
            ontology="food",
            element="Recipe",
            definition={"uri": "http://example.com/Recipe"},
            text="Recipe A combination of ingredients"
        )

        assert metadata.type == "class"
        assert metadata.ontology == "food"
        assert metadata.element == "Recipe"
        assert "uri" in metadata.definition

    def test_vector_store_search_on_empty_store(self):
        """Test searching empty vector store."""
        # Need a non-empty store for faiss to work
        # This test verifies the store can be created but searching requires dimension
        store = InMemoryVectorStore()
        assert store.size() == 0, "Empty store should have size 0"


class TestOntologySubsetStructure:
    """Test suite for OntologySubset structure."""

    def test_ontology_subset_creation(self):
        """Test creating an OntologySubset."""
        subset = OntologySubset(
            ontology_id="test",
            classes={"Recipe": {}},
            object_properties={"produces": {}},
            datatype_properties={"serves": {}},
            metadata={"name": "Test"},
            relevance_score=0.85
        )

        assert subset.ontology_id == "test"
        assert len(subset.classes) == 1
        assert len(subset.object_properties) == 1
        assert len(subset.datatype_properties) == 1
        assert subset.relevance_score == 0.85

    def test_ontology_subset_default_score(self):
        """Test that OntologySubset has default score."""
        subset = OntologySubset(
            ontology_id="test",
            classes={},
            object_properties={},
            datatype_properties={},
            metadata={}
        )

        assert subset.relevance_score == 0.0, "Should have default score of 0.0"


if __name__ == "__main__":
    pytest.main([__file__, "-v"])
353
tests/unit/test_extract/test_ontology/test_entity_contexts.py
Normal file
@@ -0,0 +1,353 @@
"""
Unit tests for entity context building.

Tests that entity contexts are properly created from extracted triples,
collecting labels and definitions for entity embedding and retrieval.
"""

import pytest
from trustgraph.extract.kg.ontology.extract import Processor
from trustgraph.schema.core.primitives import Triple, Value
from trustgraph.schema.knowledge.graph import EntityContext


@pytest.fixture
def processor():
    """Create a Processor instance for testing."""
    processor = object.__new__(Processor)
    return processor


class TestEntityContextBuilding:
    """Test suite for entity context building from triples."""

    def test_builds_context_from_label(self, processor):
        """Test that entity context is built from rdfs:label."""
        triples = [
            Triple(
                s=Value(value="https://example.com/entity/cornish-pasty", is_uri=True),
                p=Value(value="http://www.w3.org/2000/01/rdf-schema#label", is_uri=True),
                o=Value(value="Cornish Pasty", is_uri=False)
            )
        ]

        contexts = processor.build_entity_contexts(triples)

        assert len(contexts) == 1, "Should create one entity context"
        assert isinstance(contexts[0], EntityContext)
        assert contexts[0].entity.value == "https://example.com/entity/cornish-pasty"
        assert "Label: Cornish Pasty" in contexts[0].context

    def test_builds_context_from_definition(self, processor):
        """Test that entity context includes definitions."""
        triples = [
            Triple(
                s=Value(value="https://example.com/entity/pasty", is_uri=True),
                p=Value(value="http://www.w3.org/2004/02/skos/core#definition", is_uri=True),
                o=Value(value="A baked pastry filled with savory ingredients", is_uri=False)
            )
        ]

        contexts = processor.build_entity_contexts(triples)

        assert len(contexts) == 1
        assert "A baked pastry filled with savory ingredients" in contexts[0].context

    def test_combines_label_and_definition(self, processor):
        """Test that label and definition are combined in context."""
        triples = [
            Triple(
                s=Value(value="https://example.com/entity/recipe1", is_uri=True),
                p=Value(value="http://www.w3.org/2000/01/rdf-schema#label", is_uri=True),
                o=Value(value="Pasty Recipe", is_uri=False)
            ),
            Triple(
                s=Value(value="https://example.com/entity/recipe1", is_uri=True),
                p=Value(value="http://www.w3.org/2004/02/skos/core#definition", is_uri=True),
                o=Value(value="Traditional Cornish pastry recipe", is_uri=False)
            )
        ]

        contexts = processor.build_entity_contexts(triples)

        assert len(contexts) == 1
        context_text = contexts[0].context
        assert "Label: Pasty Recipe" in context_text
        assert "Traditional Cornish pastry recipe" in context_text
        assert ". " in context_text, "Should join parts with period and space"

    def test_uses_first_label_only(self, processor):
        """Test that only the first label is used in context."""
        triples = [
            Triple(
                s=Value(value="https://example.com/entity/food1", is_uri=True),
                p=Value(value="http://www.w3.org/2000/01/rdf-schema#label", is_uri=True),
                o=Value(value="First Label", is_uri=False)
            ),
            Triple(
                s=Value(value="https://example.com/entity/food1", is_uri=True),
                p=Value(value="http://www.w3.org/2000/01/rdf-schema#label", is_uri=True),
                o=Value(value="Second Label", is_uri=False)
            )
        ]

        contexts = processor.build_entity_contexts(triples)

        assert len(contexts) == 1
        assert "Label: First Label" in contexts[0].context
        assert "Second Label" not in contexts[0].context

    def test_includes_all_definitions(self, processor):
        """Test that all definitions are included in context."""
        triples = [
            Triple(
                s=Value(value="https://example.com/entity/food1", is_uri=True),
                p=Value(value="http://www.w3.org/2004/02/skos/core#definition", is_uri=True),
                o=Value(value="First definition", is_uri=False)
            ),
            Triple(
                s=Value(value="https://example.com/entity/food1", is_uri=True),
                p=Value(value="http://www.w3.org/2004/02/skos/core#definition", is_uri=True),
                o=Value(value="Second definition", is_uri=False)
            )
        ]

        contexts = processor.build_entity_contexts(triples)

        assert len(contexts) == 1
        context_text = contexts[0].context
        assert "First definition" in context_text
        assert "Second definition" in context_text

    def test_supports_schema_org_description(self, processor):
        """Test that schema.org description is treated as a definition."""
        triples = [
            Triple(
                s=Value(value="https://example.com/entity/food1", is_uri=True),
                p=Value(value="https://schema.org/description", is_uri=True),
                o=Value(value="A delicious food item", is_uri=False)
            )
        ]

        contexts = processor.build_entity_contexts(triples)

        assert len(contexts) == 1
        assert "A delicious food item" in contexts[0].context

    def test_handles_multiple_entities(self, processor):
        """Test that contexts are created for multiple entities."""
        triples = [
            Triple(
                s=Value(value="https://example.com/entity/entity1", is_uri=True),
                p=Value(value="http://www.w3.org/2000/01/rdf-schema#label", is_uri=True),
                o=Value(value="Entity One", is_uri=False)
            ),
            Triple(
                s=Value(value="https://example.com/entity/entity2", is_uri=True),
                p=Value(value="http://www.w3.org/2000/01/rdf-schema#label", is_uri=True),
                o=Value(value="Entity Two", is_uri=False)
            ),
            Triple(
                s=Value(value="https://example.com/entity/entity3", is_uri=True),
                p=Value(value="http://www.w3.org/2000/01/rdf-schema#label", is_uri=True),
                o=Value(value="Entity Three", is_uri=False)
            )
        ]

        contexts = processor.build_entity_contexts(triples)

        assert len(contexts) == 3, "Should create context for each entity"
        entity_uris = [ctx.entity.value for ctx in contexts]
        assert "https://example.com/entity/entity1" in entity_uris
        assert "https://example.com/entity/entity2" in entity_uris
        assert "https://example.com/entity/entity3" in entity_uris

    def test_ignores_uri_literals(self, processor):
        """Test that URI objects are ignored (only literal labels/definitions)."""
        triples = [
            Triple(
                s=Value(value="https://example.com/entity/food1", is_uri=True),
                p=Value(value="http://www.w3.org/2000/01/rdf-schema#label", is_uri=True),
                o=Value(value="https://example.com/some/uri", is_uri=True)  # URI, not literal
            )
        ]

        contexts = processor.build_entity_contexts(triples)

        # Should not create a context since the label is a URI
        assert len(contexts) == 0, "Should not create context for URI labels"

    def test_ignores_non_label_non_definition_triples(self, processor):
        """Test that other predicates are ignored."""
        triples = [
            Triple(
                s=Value(value="https://example.com/entity/food1", is_uri=True),
                p=Value(value="http://www.w3.org/1999/02/22-rdf-syntax-ns#type", is_uri=True),
                o=Value(value="http://example.com/Food", is_uri=True)
            ),
            Triple(
                s=Value(value="https://example.com/entity/food1", is_uri=True),
                p=Value(value="http://example.com/produces", is_uri=True),
                o=Value(value="https://example.com/entity/food2", is_uri=True)
            )
        ]

        contexts = processor.build_entity_contexts(triples)

        # Should not create a context since there are no labels or definitions
        assert len(contexts) == 0, "Should not create context without labels/definitions"

    def test_handles_empty_triple_list(self, processor):
        """Test handling of an empty triple list."""
        triples = []

        contexts = processor.build_entity_contexts(triples)

        assert len(contexts) == 0, "Empty triple list should return empty contexts"

    def test_entity_context_has_value_object(self, processor):
        """Test that EntityContext.entity is a Value object."""
        triples = [
            Triple(
                s=Value(value="https://example.com/entity/test", is_uri=True),
|
||||
p=Value(value="http://www.w3.org/2000/01/rdf-schema#label", is_uri=True),
|
||||
o=Value(value="Test Entity", is_uri=False)
|
||||
)
|
||||
]
|
||||
|
||||
contexts = processor.build_entity_contexts(triples)
|
||||
|
||||
assert len(contexts) == 1
|
||||
assert isinstance(contexts[0].entity, Value), "Entity should be Value object"
|
||||
assert contexts[0].entity.is_uri, "Entity should be marked as URI"
|
||||
|
||||
def test_entity_context_text_is_string(self, processor):
|
||||
"""Test that EntityContext.context is a string."""
|
||||
triples = [
|
||||
Triple(
|
||||
s=Value(value="https://example.com/entity/test", is_uri=True),
|
||||
p=Value(value="http://www.w3.org/2000/01/rdf-schema#label", is_uri=True),
|
||||
o=Value(value="Test Entity", is_uri=False)
|
||||
)
|
||||
]
|
||||
|
||||
contexts = processor.build_entity_contexts(triples)
|
||||
|
||||
assert len(contexts) == 1
|
||||
assert isinstance(contexts[0].context, str), "Context should be string"
|
||||
|
||||
def test_only_creates_contexts_with_meaningful_info(self, processor):
|
||||
"""Test that contexts are only created when there's meaningful information."""
|
||||
triples = [
|
||||
# Entity with label - should create context
|
||||
Triple(
|
||||
s=Value(value="https://example.com/entity/entity1", is_uri=True),
|
||||
p=Value(value="http://www.w3.org/2000/01/rdf-schema#label", is_uri=True),
|
||||
o=Value(value="Entity One", is_uri=False)
|
||||
),
|
||||
# Entity with only rdf:type - should NOT create context
|
||||
Triple(
|
||||
s=Value(value="https://example.com/entity/entity2", is_uri=True),
|
||||
p=Value(value="http://www.w3.org/1999/02/22-rdf-syntax-ns#type", is_uri=True),
|
||||
o=Value(value="http://example.com/Food", is_uri=True)
|
||||
)
|
||||
]
|
||||
|
||||
contexts = processor.build_entity_contexts(triples)
|
||||
|
||||
assert len(contexts) == 1, "Should only create context for entity with label/definition"
|
||||
assert contexts[0].entity.value == "https://example.com/entity/entity1"
|
||||
|
||||
|
||||
class TestEntityContextEdgeCases:
|
||||
"""Test suite for edge cases in entity context building."""
|
||||
|
||||
def test_handles_unicode_in_labels(self, processor):
|
||||
"""Test handling of unicode characters in labels."""
|
||||
triples = [
|
||||
Triple(
|
||||
s=Value(value="https://example.com/entity/café", is_uri=True),
|
||||
p=Value(value="http://www.w3.org/2000/01/rdf-schema#label", is_uri=True),
|
||||
o=Value(value="Café Spécial", is_uri=False)
|
||||
)
|
||||
]
|
||||
|
||||
contexts = processor.build_entity_contexts(triples)
|
||||
|
||||
assert len(contexts) == 1
|
||||
assert "Café Spécial" in contexts[0].context
|
||||
|
||||
def test_handles_long_definitions(self, processor):
|
||||
"""Test handling of very long definitions."""
|
||||
long_def = "This is a very long definition " * 50
|
||||
triples = [
|
||||
Triple(
|
||||
s=Value(value="https://example.com/entity/test", is_uri=True),
|
||||
p=Value(value="http://www.w3.org/2004/02/skos/core#definition", is_uri=True),
|
||||
o=Value(value=long_def, is_uri=False)
|
||||
)
|
||||
]
|
||||
|
||||
contexts = processor.build_entity_contexts(triples)
|
||||
|
||||
assert len(contexts) == 1
|
||||
assert long_def in contexts[0].context
|
||||
|
||||
def test_handles_special_characters_in_context(self, processor):
|
||||
"""Test handling of special characters in context text."""
|
||||
triples = [
|
||||
Triple(
|
||||
s=Value(value="https://example.com/entity/test", is_uri=True),
|
||||
p=Value(value="http://www.w3.org/2000/01/rdf-schema#label", is_uri=True),
|
||||
o=Value(value="Test & Entity <with> \"quotes\"", is_uri=False)
|
||||
)
|
||||
]
|
||||
|
||||
contexts = processor.build_entity_contexts(triples)
|
||||
|
||||
assert len(contexts) == 1
|
||||
assert "Test & Entity <with> \"quotes\"" in contexts[0].context
|
||||
|
||||
def test_mixed_relevant_and_irrelevant_triples(self, processor):
|
||||
"""Test extracting contexts from mixed triple types."""
|
||||
triples = [
|
||||
# Label - relevant
|
||||
Triple(
|
||||
s=Value(value="https://example.com/entity/recipe1", is_uri=True),
|
||||
p=Value(value="http://www.w3.org/2000/01/rdf-schema#label", is_uri=True),
|
||||
o=Value(value="Cornish Pasty Recipe", is_uri=False)
|
||||
),
|
||||
# Type - irrelevant
|
||||
Triple(
|
||||
s=Value(value="https://example.com/entity/recipe1", is_uri=True),
|
||||
p=Value(value="http://www.w3.org/1999/02/22-rdf-syntax-ns#type", is_uri=True),
|
||||
o=Value(value="http://example.com/Recipe", is_uri=True)
|
||||
),
|
||||
# Property - irrelevant
|
||||
Triple(
|
||||
s=Value(value="https://example.com/entity/recipe1", is_uri=True),
|
||||
p=Value(value="http://example.com/produces", is_uri=True),
|
||||
o=Value(value="https://example.com/entity/pasty", is_uri=True)
|
||||
),
|
||||
# Definition - relevant
|
||||
Triple(
|
||||
s=Value(value="https://example.com/entity/recipe1", is_uri=True),
|
||||
p=Value(value="http://www.w3.org/2004/02/skos/core#definition", is_uri=True),
|
||||
o=Value(value="Traditional British pastry recipe", is_uri=False)
|
||||
)
|
||||
]
|
||||
|
||||
contexts = processor.build_entity_contexts(triples)
|
||||
|
||||
assert len(contexts) == 1
|
||||
context_text = contexts[0].context
|
||||
# Should include label and definition
|
||||
assert "Label: Cornish Pasty Recipe" in context_text
|
||||
assert "Traditional British pastry recipe" in context_text
|
||||
# Should not include type or property info
|
||||
assert "Recipe" not in context_text or "Cornish Pasty Recipe" in context_text # Only in label
|
||||
assert "produces" not in context_text
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
pytest.main([__file__, "-v"])
|
||||
518 tests/unit/test_extract/test_ontology/test_ontology_loading.py Normal file

@@ -0,0 +1,518 @@
"""
|
||||
Unit tests for ontology loading and configuration.
|
||||
|
||||
Tests that ontologies are properly loaded from configuration,
|
||||
parsed, validated, and managed by the OntologyLoader.
|
||||
"""
|
||||
|
||||
import pytest
|
||||
from trustgraph.extract.kg.ontology.ontology_loader import (
|
||||
OntologyLoader,
|
||||
Ontology,
|
||||
OntologyClass,
|
||||
OntologyProperty
|
||||
)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def ontology_loader():
|
||||
"""Create an OntologyLoader instance."""
|
||||
return OntologyLoader()
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def sample_ontology_config():
|
||||
"""Create a sample ontology configuration."""
|
||||
return {
|
||||
"food": {
|
||||
"metadata": {
|
||||
"name": "Food Ontology",
|
||||
"namespace": "http://purl.org/ontology/fo/"
|
||||
},
|
||||
"classes": {
|
||||
"Recipe": {
|
||||
"uri": "http://purl.org/ontology/fo/Recipe",
|
||||
"type": "owl:Class",
|
||||
"rdfs:label": [{"value": "Recipe", "lang": "en-gb"}],
|
||||
"rdfs:comment": "A Recipe is a combination of ingredients and a method."
|
||||
},
|
||||
"Ingredient": {
|
||||
"uri": "http://purl.org/ontology/fo/Ingredient",
|
||||
"type": "owl:Class",
|
||||
"rdfs:label": [{"value": "Ingredient", "lang": "en-gb"}],
|
||||
"rdfs:comment": "An Ingredient combines a quantity and a food."
|
||||
},
|
||||
"Food": {
|
||||
"uri": "http://purl.org/ontology/fo/Food",
|
||||
"type": "owl:Class",
|
||||
"rdfs:label": [{"value": "Food", "lang": "en-gb"}],
|
||||
"rdfs:comment": "A Food is something that can be eaten.",
|
||||
"rdfs:subClassOf": "EdibleThing"
|
||||
}
|
||||
},
|
||||
"objectProperties": {
|
||||
"ingredients": {
|
||||
"uri": "http://purl.org/ontology/fo/ingredients",
|
||||
"type": "owl:ObjectProperty",
|
||||
"rdfs:label": [{"value": "ingredients", "lang": "en-gb"}],
|
||||
"rdfs:domain": "Recipe",
|
||||
"rdfs:range": "IngredientList"
|
||||
},
|
||||
"produces": {
|
||||
"uri": "http://purl.org/ontology/fo/produces",
|
||||
"type": "owl:ObjectProperty",
|
||||
"rdfs:label": [{"value": "produces", "lang": "en-gb"}],
|
||||
"rdfs:domain": "Recipe",
|
||||
"rdfs:range": "Food"
|
||||
}
|
||||
},
|
||||
"datatypeProperties": {
|
||||
"serves": {
|
||||
"uri": "http://purl.org/ontology/fo/serves",
|
||||
"type": "owl:DatatypeProperty",
|
||||
"rdfs:label": [{"value": "serves", "lang": "en-gb"}],
|
||||
"rdfs:domain": "Recipe",
|
||||
"rdfs:range": "xsd:string"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
class TestOntologyLoaderInitialization:
|
||||
"""Test suite for OntologyLoader initialization."""
|
||||
|
||||
def test_loader_starts_empty(self, ontology_loader):
|
||||
"""Test that loader initializes with no ontologies."""
|
||||
assert len(ontology_loader.get_all_ontologies()) == 0
|
||||
|
||||
def test_loader_get_nonexistent_ontology(self, ontology_loader):
|
||||
"""Test getting non-existent ontology returns None."""
|
||||
result = ontology_loader.get_ontology("nonexistent")
|
||||
assert result is None
|
||||
|
||||
|
||||
class TestOntologyLoading:
|
||||
"""Test suite for loading ontologies from configuration."""
|
||||
|
||||
def test_loads_single_ontology(self, ontology_loader, sample_ontology_config):
|
||||
"""Test loading a single ontology."""
|
||||
ontology_loader.update_ontologies(sample_ontology_config)
|
||||
|
||||
ontologies = ontology_loader.get_all_ontologies()
|
||||
assert len(ontologies) == 1
|
||||
assert "food" in ontologies
|
||||
|
||||
def test_loaded_ontology_has_correct_id(self, ontology_loader, sample_ontology_config):
|
||||
"""Test that loaded ontology has correct ID."""
|
||||
ontology_loader.update_ontologies(sample_ontology_config)
|
||||
|
||||
ontology = ontology_loader.get_ontology("food")
|
||||
assert ontology is not None
|
||||
assert ontology.id == "food"
|
||||
|
||||
def test_loaded_ontology_has_metadata(self, ontology_loader, sample_ontology_config):
|
||||
"""Test that loaded ontology includes metadata."""
|
||||
ontology_loader.update_ontologies(sample_ontology_config)
|
||||
|
||||
ontology = ontology_loader.get_ontology("food")
|
||||
assert ontology.metadata["name"] == "Food Ontology"
|
||||
assert ontology.metadata["namespace"] == "http://purl.org/ontology/fo/"
|
||||
|
||||
def test_loaded_ontology_has_classes(self, ontology_loader, sample_ontology_config):
|
||||
"""Test that loaded ontology includes classes."""
|
||||
ontology_loader.update_ontologies(sample_ontology_config)
|
||||
|
||||
ontology = ontology_loader.get_ontology("food")
|
||||
assert len(ontology.classes) == 3
|
||||
assert "Recipe" in ontology.classes
|
||||
assert "Ingredient" in ontology.classes
|
||||
assert "Food" in ontology.classes
|
||||
|
||||
def test_loaded_classes_have_correct_properties(self, ontology_loader, sample_ontology_config):
|
||||
"""Test that loaded classes have correct properties."""
|
||||
ontology_loader.update_ontologies(sample_ontology_config)
|
||||
|
||||
ontology = ontology_loader.get_ontology("food")
|
||||
recipe = ontology.get_class("Recipe")
|
||||
|
||||
assert isinstance(recipe, OntologyClass)
|
||||
assert recipe.uri == "http://purl.org/ontology/fo/Recipe"
|
||||
assert recipe.type == "owl:Class"
|
||||
assert len(recipe.labels) == 1
|
||||
assert recipe.labels[0]["value"] == "Recipe"
|
||||
assert "combination of ingredients" in recipe.comment
|
||||
|
||||
def test_loaded_ontology_has_object_properties(self, ontology_loader, sample_ontology_config):
|
||||
"""Test that loaded ontology includes object properties."""
|
||||
ontology_loader.update_ontologies(sample_ontology_config)
|
||||
|
||||
ontology = ontology_loader.get_ontology("food")
|
||||
assert len(ontology.object_properties) == 2
|
||||
assert "ingredients" in ontology.object_properties
|
||||
assert "produces" in ontology.object_properties
|
||||
|
||||
def test_loaded_properties_have_domain_and_range(self, ontology_loader, sample_ontology_config):
|
||||
"""Test that loaded properties have domain and range."""
|
||||
ontology_loader.update_ontologies(sample_ontology_config)
|
||||
|
||||
ontology = ontology_loader.get_ontology("food")
|
||||
produces = ontology.get_property("produces")
|
||||
|
||||
assert isinstance(produces, OntologyProperty)
|
||||
assert produces.domain == "Recipe"
|
||||
assert produces.range == "Food"
|
||||
|
||||
def test_loaded_ontology_has_datatype_properties(self, ontology_loader, sample_ontology_config):
|
||||
"""Test that loaded ontology includes datatype properties."""
|
||||
ontology_loader.update_ontologies(sample_ontology_config)
|
||||
|
||||
ontology = ontology_loader.get_ontology("food")
|
||||
assert len(ontology.datatype_properties) == 1
|
||||
assert "serves" in ontology.datatype_properties
|
||||
|
||||
def test_loads_multiple_ontologies(self, ontology_loader):
|
||||
"""Test loading multiple ontologies."""
|
||||
config = {
|
||||
"food": {
|
||||
"metadata": {"name": "Food Ontology"},
|
||||
"classes": {"Recipe": {"uri": "http://example.com/Recipe"}},
|
||||
"objectProperties": {},
|
||||
"datatypeProperties": {}
|
||||
},
|
||||
"music": {
|
||||
"metadata": {"name": "Music Ontology"},
|
||||
"classes": {"Song": {"uri": "http://example.com/Song"}},
|
||||
"objectProperties": {},
|
||||
"datatypeProperties": {}
|
||||
}
|
||||
}
|
||||
|
||||
ontology_loader.update_ontologies(config)
|
||||
|
||||
ontologies = ontology_loader.get_all_ontologies()
|
||||
assert len(ontologies) == 2
|
||||
assert "food" in ontologies
|
||||
assert "music" in ontologies
|
||||
|
||||
def test_update_replaces_existing_ontologies(self, ontology_loader, sample_ontology_config):
|
||||
"""Test that update replaces existing ontologies."""
|
||||
# Load initial ontologies
|
||||
ontology_loader.update_ontologies(sample_ontology_config)
|
||||
assert len(ontology_loader.get_all_ontologies()) == 1
|
||||
|
||||
# Update with different config
|
||||
new_config = {
|
||||
"music": {
|
||||
"metadata": {"name": "Music Ontology"},
|
||||
"classes": {},
|
||||
"objectProperties": {},
|
||||
"datatypeProperties": {}
|
||||
}
|
||||
}
|
||||
ontology_loader.update_ontologies(new_config)
|
||||
|
||||
# Old ontologies should be replaced
|
||||
ontologies = ontology_loader.get_all_ontologies()
|
||||
assert len(ontologies) == 1
|
||||
assert "music" in ontologies
|
||||
assert "food" not in ontologies
|
||||
|
||||
|
||||
class TestOntologyRetrieval:
|
||||
"""Test suite for retrieving ontologies."""
|
||||
|
||||
def test_get_ontology_by_id(self, ontology_loader, sample_ontology_config):
|
||||
"""Test retrieving ontology by ID."""
|
||||
ontology_loader.update_ontologies(sample_ontology_config)
|
||||
|
||||
ontology = ontology_loader.get_ontology("food")
|
||||
assert ontology is not None
|
||||
assert isinstance(ontology, Ontology)
|
||||
|
||||
def test_get_all_ontologies(self, ontology_loader):
|
||||
"""Test retrieving all ontologies."""
|
||||
config = {
|
||||
"food": {
|
||||
"metadata": {},
|
||||
"classes": {},
|
||||
"objectProperties": {},
|
||||
"datatypeProperties": {}
|
||||
},
|
||||
"music": {
|
||||
"metadata": {},
|
||||
"classes": {},
|
||||
"objectProperties": {},
|
||||
"datatypeProperties": {}
|
||||
}
|
||||
}
|
||||
ontology_loader.update_ontologies(config)
|
||||
|
||||
ontologies = ontology_loader.get_all_ontologies()
|
||||
assert isinstance(ontologies, dict)
|
||||
assert len(ontologies) == 2
|
||||
|
||||
def test_get_all_ontology_ids(self, ontology_loader):
|
||||
"""Test retrieving all ontology IDs."""
|
||||
config = {
|
||||
"food": {
|
||||
"metadata": {},
|
||||
"classes": {},
|
||||
"objectProperties": {},
|
||||
"datatypeProperties": {}
|
||||
},
|
||||
"music": {
|
||||
"metadata": {},
|
||||
"classes": {},
|
||||
"objectProperties": {},
|
||||
"datatypeProperties": {}
|
||||
}
|
||||
}
|
||||
ontology_loader.update_ontologies(config)
|
||||
|
||||
ontologies = ontology_loader.get_all_ontologies()
|
||||
ids = list(ontologies.keys())
|
||||
assert len(ids) == 2
|
||||
assert "food" in ids
|
||||
assert "music" in ids
|
||||
|
||||
|
||||
class TestOntologyClassMethods:
|
||||
"""Test suite for Ontology helper methods."""
|
||||
|
||||
def test_get_class(self, ontology_loader, sample_ontology_config):
|
||||
"""Test getting a class from ontology."""
|
||||
ontology_loader.update_ontologies(sample_ontology_config)
|
||||
ontology = ontology_loader.get_ontology("food")
|
||||
|
||||
recipe = ontology.get_class("Recipe")
|
||||
assert recipe is not None
|
||||
assert recipe.uri == "http://purl.org/ontology/fo/Recipe"
|
||||
|
||||
def test_get_nonexistent_class(self, ontology_loader, sample_ontology_config):
|
||||
"""Test getting non-existent class returns None."""
|
||||
ontology_loader.update_ontologies(sample_ontology_config)
|
||||
ontology = ontology_loader.get_ontology("food")
|
||||
|
||||
result = ontology.get_class("NonExistent")
|
||||
assert result is None
|
||||
|
||||
def test_get_property(self, ontology_loader, sample_ontology_config):
|
||||
"""Test getting a property from ontology."""
|
||||
ontology_loader.update_ontologies(sample_ontology_config)
|
||||
ontology = ontology_loader.get_ontology("food")
|
||||
|
||||
produces = ontology.get_property("produces")
|
||||
assert produces is not None
|
||||
assert produces.domain == "Recipe"
|
||||
|
||||
def test_get_property_checks_both_types(self, ontology_loader, sample_ontology_config):
|
||||
"""Test that get_property checks both object and datatype properties."""
|
||||
ontology_loader.update_ontologies(sample_ontology_config)
|
||||
ontology = ontology_loader.get_ontology("food")
|
||||
|
||||
# Object property
|
||||
produces = ontology.get_property("produces")
|
||||
assert produces is not None
|
||||
|
||||
# Datatype property
|
||||
serves = ontology.get_property("serves")
|
||||
assert serves is not None
|
||||
|
||||
def test_get_parent_classes(self, ontology_loader, sample_ontology_config):
|
||||
"""Test getting parent classes following subClassOf."""
|
||||
ontology_loader.update_ontologies(sample_ontology_config)
|
||||
ontology = ontology_loader.get_ontology("food")
|
||||
|
||||
parents = ontology.get_parent_classes("Food")
|
||||
assert "EdibleThing" in parents
|
||||
|
||||
def test_get_parent_classes_empty_for_root(self, ontology_loader, sample_ontology_config):
|
||||
"""Test that root classes have no parents."""
|
||||
ontology_loader.update_ontologies(sample_ontology_config)
|
||||
ontology = ontology_loader.get_ontology("food")
|
||||
|
||||
parents = ontology.get_parent_classes("Recipe")
|
||||
assert len(parents) == 0
|
||||
|
||||
|
||||
class TestOntologyValidation:
|
||||
"""Test suite for ontology validation."""
|
||||
|
||||
def test_validates_property_domain_exists(self, ontology_loader):
|
||||
"""Test validation of property domain."""
|
||||
config = {
|
||||
"test": {
|
||||
"metadata": {},
|
||||
"classes": {
|
||||
"Recipe": {"uri": "http://example.com/Recipe"}
|
||||
},
|
||||
"objectProperties": {
|
||||
"produces": {
|
||||
"uri": "http://example.com/produces",
|
||||
"type": "owl:ObjectProperty",
|
||||
"rdfs:domain": "NonExistentClass", # Invalid
|
||||
"rdfs:range": "Food"
|
||||
}
|
||||
},
|
||||
"datatypeProperties": {}
|
||||
}
|
||||
}
|
||||
|
||||
ontology_loader.update_ontologies(config)
|
||||
ontology = ontology_loader.get_ontology("test")
|
||||
|
||||
issues = ontology.validate_structure()
|
||||
assert len(issues) > 0
|
||||
assert any("unknown domain" in issue.lower() for issue in issues)
|
||||
|
||||
def test_validates_object_property_range_exists(self, ontology_loader):
|
||||
"""Test validation of object property range."""
|
||||
config = {
|
||||
"test": {
|
||||
"metadata": {},
|
||||
"classes": {
|
||||
"Recipe": {"uri": "http://example.com/Recipe"}
|
||||
},
|
||||
"objectProperties": {
|
||||
"produces": {
|
||||
"uri": "http://example.com/produces",
|
||||
"type": "owl:ObjectProperty",
|
||||
"rdfs:domain": "Recipe",
|
||||
"rdfs:range": "NonExistentClass" # Invalid
|
||||
}
|
||||
},
|
||||
"datatypeProperties": {}
|
||||
}
|
||||
}
|
||||
|
||||
ontology_loader.update_ontologies(config)
|
||||
ontology = ontology_loader.get_ontology("test")
|
||||
|
||||
issues = ontology.validate_structure()
|
||||
assert len(issues) > 0
|
||||
assert any("unknown range" in issue.lower() for issue in issues)
|
||||
|
||||
def test_detects_circular_inheritance(self, ontology_loader):
|
||||
"""Test detection of circular inheritance."""
|
||||
config = {
|
||||
"test": {
|
||||
"metadata": {},
|
||||
"classes": {
|
||||
"A": {
|
||||
"uri": "http://example.com/A",
|
||||
"rdfs:subClassOf": "B"
|
||||
},
|
||||
"B": {
|
||||
"uri": "http://example.com/B",
|
||||
"rdfs:subClassOf": "C"
|
||||
},
|
||||
"C": {
|
||||
"uri": "http://example.com/C",
|
||||
"rdfs:subClassOf": "A" # Circular!
|
||||
}
|
||||
},
|
||||
"objectProperties": {},
|
||||
"datatypeProperties": {}
|
||||
}
|
||||
}
|
||||
|
||||
ontology_loader.update_ontologies(config)
|
||||
ontology = ontology_loader.get_ontology("test")
|
||||
|
||||
issues = ontology.validate_structure()
|
||||
assert len(issues) > 0
|
||||
assert any("circular" in issue.lower() for issue in issues)
|
||||
|
||||
def test_valid_ontology_has_no_issues(self, ontology_loader, sample_ontology_config):
|
||||
"""Test that valid ontology passes validation."""
|
||||
# Modify config to have valid references
|
||||
config = sample_ontology_config.copy()
|
||||
config["food"]["classes"]["EdibleThing"] = {
|
||||
"uri": "http://purl.org/ontology/fo/EdibleThing"
|
||||
}
|
||||
config["food"]["classes"]["IngredientList"] = {
|
||||
"uri": "http://purl.org/ontology/fo/IngredientList"
|
||||
}
|
||||
|
||||
ontology_loader.update_ontologies(config)
|
||||
ontology = ontology_loader.get_ontology("food")
|
||||
|
||||
issues = ontology.validate_structure()
|
||||
# Should have minimal or no issues for valid ontology
|
||||
assert isinstance(issues, list)
|
||||
|
||||
|
||||
class TestEdgeCases:
|
||||
"""Test suite for edge cases in ontology loading."""
|
||||
|
||||
def test_handles_empty_config(self, ontology_loader):
|
||||
"""Test handling of empty configuration."""
|
||||
ontology_loader.update_ontologies({})
|
||||
|
||||
ontologies = ontology_loader.get_all_ontologies()
|
||||
assert len(ontologies) == 0
|
||||
|
||||
def test_handles_ontology_without_classes(self, ontology_loader):
|
||||
"""Test handling of ontology with no classes."""
|
||||
config = {
|
||||
"minimal": {
|
||||
"metadata": {"name": "Minimal"},
|
||||
"classes": {},
|
||||
"objectProperties": {},
|
||||
"datatypeProperties": {}
|
||||
}
|
||||
}
|
||||
|
||||
ontology_loader.update_ontologies(config)
|
||||
ontology = ontology_loader.get_ontology("minimal")
|
||||
|
||||
assert ontology is not None
|
||||
assert len(ontology.classes) == 0
|
||||
|
||||
def test_handles_ontology_without_properties(self, ontology_loader):
|
||||
"""Test handling of ontology with no properties."""
|
||||
config = {
|
||||
"test": {
|
||||
"metadata": {},
|
||||
"classes": {
|
||||
"Recipe": {"uri": "http://example.com/Recipe"}
|
||||
},
|
||||
"objectProperties": {},
|
||||
"datatypeProperties": {}
|
||||
}
|
||||
}
|
||||
|
||||
ontology_loader.update_ontologies(config)
|
||||
ontology = ontology_loader.get_ontology("test")
|
||||
|
||||
assert ontology is not None
|
||||
assert len(ontology.object_properties) == 0
|
||||
assert len(ontology.datatype_properties) == 0
|
||||
|
||||
def test_handles_missing_optional_fields(self, ontology_loader):
|
||||
"""Test handling of missing optional fields."""
|
||||
config = {
|
||||
"test": {
|
||||
"metadata": {},
|
||||
"classes": {
|
||||
"Simple": {
|
||||
"uri": "http://example.com/Simple"
|
||||
# No labels, comments, subclass, etc.
|
||||
}
|
||||
},
|
||||
"objectProperties": {},
|
||||
"datatypeProperties": {}
|
||||
}
|
||||
}
|
||||
|
||||
ontology_loader.update_ontologies(config)
|
||||
ontology = ontology_loader.get_ontology("test")
|
||||
|
||||
simple = ontology.get_class("Simple")
|
||||
assert simple is not None
|
||||
assert simple.uri == "http://example.com/Simple"
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
pytest.main([__file__, "-v"])
|
||||
336 tests/unit/test_extract/test_ontology/test_ontology_selector.py Normal file

@@ -0,0 +1,336 @@
"""
|
||||
Unit tests for OntologySelector component.
|
||||
|
||||
Tests the critical auto-include properties feature that automatically pulls in
|
||||
all properties related to selected classes.
|
||||
"""
|
||||
|
||||
import pytest
|
||||
from unittest.mock import Mock, AsyncMock
|
||||
from trustgraph.extract.kg.ontology.ontology_selector import (
|
||||
OntologySelector,
|
||||
OntologySubset
|
||||
)
|
||||
from trustgraph.extract.kg.ontology.ontology_loader import (
|
||||
Ontology,
|
||||
OntologyClass,
|
||||
OntologyProperty
|
||||
)
|
||||
from trustgraph.extract.kg.ontology.text_processor import TextSegment
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def sample_ontology():
|
||||
"""Create a sample food ontology for testing."""
|
||||
# Create classes
|
||||
recipe_class = OntologyClass(
|
||||
uri="http://purl.org/ontology/fo/Recipe",
|
||||
type="owl:Class",
|
||||
labels=[{"value": "Recipe", "lang": "en-gb"}],
|
||||
comment="A Recipe is a combination of ingredients and a method."
|
||||
)
|
||||
|
||||
ingredient_class = OntologyClass(
|
||||
uri="http://purl.org/ontology/fo/Ingredient",
|
||||
type="owl:Class",
|
||||
labels=[{"value": "Ingredient", "lang": "en-gb"}],
|
||||
comment="An Ingredient is a combination of a quantity and a food."
|
||||
)
|
||||
|
||||
food_class = OntologyClass(
|
||||
uri="http://purl.org/ontology/fo/Food",
|
||||
type="owl:Class",
|
||||
labels=[{"value": "Food", "lang": "en-gb"}],
|
||||
comment="A Food is something that can be eaten."
|
||||
)
|
||||
|
||||
method_class = OntologyClass(
|
||||
uri="http://purl.org/ontology/fo/Method",
|
||||
type="owl:Class",
|
||||
labels=[{"value": "Method", "lang": "en-gb"}],
|
||||
comment="A Method is the way in which ingredients are combined."
|
||||
)
|
||||
|
||||
# Create object properties
|
||||
ingredients_prop = OntologyProperty(
|
||||
uri="http://purl.org/ontology/fo/ingredients",
|
||||
type="owl:ObjectProperty",
|
||||
labels=[{"value": "ingredients", "lang": "en-gb"}],
|
||||
comment="The ingredients property relates a recipe to an ingredient list.",
|
||||
domain="Recipe",
|
||||
range="IngredientList"
|
||||
)
|
||||
|
||||
food_prop = OntologyProperty(
|
||||
uri="http://purl.org/ontology/fo/food",
|
||||
type="owl:ObjectProperty",
|
||||
labels=[{"value": "food", "lang": "en-gb"}],
|
||||
comment="The food property relates an ingredient to the food that is required.",
|
||||
domain="Ingredient",
|
||||
range="Food"
|
||||
)
|
||||
|
||||
method_prop = OntologyProperty(
|
||||
uri="http://purl.org/ontology/fo/method",
|
||||
type="owl:ObjectProperty",
|
||||
labels=[{"value": "method", "lang": "en-gb"}],
|
||||
comment="The method property relates a recipe to the method used.",
|
||||
domain="Recipe",
|
||||
range="Method"
|
||||
)
|
||||
|
||||
produces_prop = OntologyProperty(
|
||||
uri="http://purl.org/ontology/fo/produces",
|
||||
type="owl:ObjectProperty",
|
||||
labels=[{"value": "produces", "lang": "en-gb"}],
|
||||
comment="The produces property relates a recipe to the food it produces.",
|
||||
domain="Recipe",
|
||||
range="Food"
|
||||
)
|
||||
|
||||
# Create datatype properties
|
||||
serves_prop = OntologyProperty(
|
||||
uri="http://purl.org/ontology/fo/serves",
|
||||
type="owl:DatatypeProperty",
|
||||
labels=[{"value": "serves", "lang": "en-gb"}],
|
||||
comment="The serves property indicates what the recipe is intended to serve.",
|
||||
domain="Recipe",
|
||||
range="xsd:string"
|
||||
)
|
||||
|
||||
# Build ontology
|
||||
ontology = Ontology(
|
||||
id="food",
|
||||
metadata={
|
||||
"name": "Food Ontology",
|
||||
"namespace": "http://purl.org/ontology/fo/"
|
||||
},
|
||||
classes={
|
||||
"Recipe": recipe_class,
|
||||
"Ingredient": ingredient_class,
|
||||
"Food": food_class,
|
||||
"Method": method_class
|
||||
},
|
||||
object_properties={
|
||||
"ingredients": ingredients_prop,
|
||||
"food": food_prop,
|
||||
"method": method_prop,
|
||||
"produces": produces_prop
|
||||
},
|
||||
datatype_properties={
|
||||
"serves": serves_prop
|
||||
}
|
||||
)
|
||||
|
||||
return ontology
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def ontology_loader_with_sample(sample_ontology):
|
||||
"""Create an OntologyLoader with the sample ontology."""
|
||||
loader = Mock()
|
||||
loader.get_ontology = Mock(return_value=sample_ontology)
|
||||
loader.ontologies = {"food": sample_ontology}
|
||||
return loader
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def ontology_embedder():
|
||||
"""Create a mock OntologyEmbedder."""
|
||||
embedder = Mock()
|
||||
embedder.embed_text = AsyncMock(return_value=[0.1, 0.2, 0.3]) # Mock embedding
|
||||
|
||||
# Mock vector store with search results
|
||||
vector_store = Mock()
|
||||
embedder.get_vector_store = Mock(return_value=vector_store)
|
||||
|
||||
return embedder
|
||||
|
||||
|
||||
class TestOntologySelector:
    """Test suite for OntologySelector."""

    def test_auto_include_properties_for_recipe_class(
        self, ontology_loader_with_sample, ontology_embedder, sample_ontology
    ):
        """Test that selecting the Recipe class automatically includes all related properties."""
        selector = OntologySelector(
            ontology_embedder=ontology_embedder,
            ontology_loader=ontology_loader_with_sample,
            top_k=10,
            similarity_threshold=0.3
        )

        # Create a subset with only the Recipe class initially selected
        subset = OntologySubset(
            ontology_id="food",
            classes={"Recipe": sample_ontology.classes["Recipe"].__dict__},
            object_properties={},
            datatype_properties={},
            metadata=sample_ontology.metadata,
            relevance_score=0.8
        )

        # Resolve dependencies (this is where auto-include happens)
        selector._resolve_dependencies(subset)

        # Assert that properties with Recipe in their domain are included
        assert "ingredients" in subset.object_properties, \
            "ingredients property should be auto-included (Recipe in domain)"
        assert "method" in subset.object_properties, \
            "method property should be auto-included (Recipe in domain)"
        assert "produces" in subset.object_properties, \
            "produces property should be auto-included (Recipe in domain)"
        assert "serves" in subset.datatype_properties, \
            "serves property should be auto-included (Recipe in domain)"

        # Assert that the unrelated property is NOT included
        assert "food" not in subset.object_properties, \
            "food property should NOT be included (Recipe not in domain/range)"

    def test_auto_include_properties_for_ingredient_class(
        self, ontology_loader_with_sample, ontology_embedder, sample_ontology
    ):
        """Test that selecting the Ingredient class includes properties with Ingredient in their domain."""
        selector = OntologySelector(
            ontology_embedder=ontology_embedder,
            ontology_loader=ontology_loader_with_sample
        )

        subset = OntologySubset(
            ontology_id="food",
            classes={"Ingredient": sample_ontology.classes["Ingredient"].__dict__},
            object_properties={},
            datatype_properties={},
            metadata=sample_ontology.metadata
        )

        selector._resolve_dependencies(subset)

        # Ingredient has the 'food' property in its domain
        assert "food" in subset.object_properties, \
            "food property should be auto-included (Ingredient in domain)"

        # Recipe-related properties should NOT be included
        assert "ingredients" not in subset.object_properties
        assert "method" not in subset.object_properties

    def test_auto_include_properties_for_range_class(
        self, ontology_loader_with_sample, ontology_embedder, sample_ontology
    ):
        """Test that selecting a class includes properties with that class in their range."""
        selector = OntologySelector(
            ontology_embedder=ontology_embedder,
            ontology_loader=ontology_loader_with_sample
        )

        subset = OntologySubset(
            ontology_id="food",
            classes={"Food": sample_ontology.classes["Food"].__dict__},
            object_properties={},
            datatype_properties={},
            metadata=sample_ontology.metadata
        )

        selector._resolve_dependencies(subset)

        # Food appears in the range of the 'food' and 'produces' properties
        assert "food" in subset.object_properties, \
            "food property should be auto-included (Food in range)"
        assert "produces" in subset.object_properties, \
            "produces property should be auto-included (Food in range)"

    def test_auto_include_adds_domain_and_range_classes(
        self, ontology_loader_with_sample, ontology_embedder, sample_ontology
    ):
        """Test that auto-included properties also add their domain/range classes."""
        selector = OntologySelector(
            ontology_embedder=ontology_embedder,
            ontology_loader=ontology_loader_with_sample
        )

        # Start with only the Recipe class
        subset = OntologySubset(
            ontology_id="food",
            classes={"Recipe": sample_ontology.classes["Recipe"].__dict__},
            object_properties={},
            datatype_properties={},
            metadata=sample_ontology.metadata
        )

        selector._resolve_dependencies(subset)

        # Should auto-include the 'produces' property (Recipe → Food)
        assert "produces" in subset.object_properties

        # Should also add the Food class (range of produces)
        assert "Food" in subset.classes, \
            "Food class should be added (range of auto-included produces property)"

        # Should also add the Method class (range of the method property)
        assert "Method" in subset.classes, \
            "Method class should be added (range of auto-included method property)"

    def test_multiple_classes_get_all_related_properties(
        self, ontology_loader_with_sample, ontology_embedder, sample_ontology
    ):
        """Test that selecting multiple classes includes all their related properties."""
        selector = OntologySelector(
            ontology_embedder=ontology_embedder,
            ontology_loader=ontology_loader_with_sample
        )

        # Select both the Recipe and Ingredient classes
        subset = OntologySubset(
            ontology_id="food",
            classes={
                "Recipe": sample_ontology.classes["Recipe"].__dict__,
                "Ingredient": sample_ontology.classes["Ingredient"].__dict__
            },
            object_properties={},
            datatype_properties={},
            metadata=sample_ontology.metadata
        )

        selector._resolve_dependencies(subset)

        # Should include Recipe-related properties
        assert "ingredients" in subset.object_properties
        assert "method" in subset.object_properties
        assert "produces" in subset.object_properties
        assert "serves" in subset.datatype_properties

        # Should also include Ingredient-related properties
        assert "food" in subset.object_properties

    def test_no_duplicate_properties_added(
        self, ontology_loader_with_sample, ontology_embedder, sample_ontology
    ):
        """Test that properties aren't added multiple times."""
        selector = OntologySelector(
            ontology_embedder=ontology_embedder,
            ontology_loader=ontology_loader_with_sample
        )

        # Start with Recipe and Food (both related to 'produces')
        subset = OntologySubset(
            ontology_id="food",
            classes={
                "Recipe": sample_ontology.classes["Recipe"].__dict__,
                "Food": sample_ontology.classes["Food"].__dict__
            },
            object_properties={},
            datatype_properties={},
            metadata=sample_ontology.metadata
        )

        selector._resolve_dependencies(subset)

        # 'produces' should be included exactly once. Dict keys are unique,
        # so duplication is impossible by construction; the assertion
        # documents the expected behaviour.
        assert "produces" in subset.object_properties


if __name__ == "__main__":
    pytest.main([__file__, "-v"])
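The auto-include behaviour these tests pin down can be illustrated outside the Processor. This is a minimal sketch, assuming the dict shapes used by the fixtures above (`domain`/`range` keys naming classes); the real `OntologySelector._resolve_dependencies` operates on an `OntologySubset` and may differ in detail. Note it matches properties against the *initially* selected classes only, which is why selecting Recipe pulls in `produces` (and hence the Food class) but not the `food` property:

```python
def resolve_dependencies(subset_classes, subset_props, all_props, all_classes):
    """Auto-include every property whose domain or range names an initially
    selected class, then add that property's domain/range classes too."""
    selected = set(subset_classes)  # freeze the initially selected classes
    for name, prop in all_props.items():
        if prop.get("domain") in selected or prop.get("range") in selected:
            subset_props.setdefault(name, prop)
            # Pull in the classes the property connects, when they are known.
            for side in ("domain", "range"):
                cls = prop.get(side)
                if cls in all_classes:
                    subset_classes.setdefault(cls, all_classes[cls])
    return subset_classes, subset_props
```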
300
tests/unit/test_extract/test_ontology/test_ontology_triples.py
Normal file
@ -0,0 +1,300 @@
"""
|
||||
Unit tests for ontology triple generation.
|
||||
|
||||
Tests that ontology elements (classes and properties) are properly converted
|
||||
to RDF triples with labels, comments, domains, and ranges so they appear in
|
||||
the knowledge graph.
|
||||
"""
|
||||
|
||||
import pytest
|
||||
from trustgraph.extract.kg.ontology.extract import Processor
|
||||
from trustgraph.extract.kg.ontology.ontology_selector import OntologySubset
|
||||
from trustgraph.schema.core.primitives import Triple, Value
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def extractor():
|
||||
"""Create a Processor instance for testing."""
|
||||
extractor = object.__new__(Processor)
|
||||
return extractor
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def sample_ontology_subset():
|
||||
"""Create a sample ontology subset with classes and properties."""
|
||||
return OntologySubset(
|
||||
ontology_id="food",
|
||||
classes={
|
||||
"Recipe": {
|
||||
"uri": "http://purl.org/ontology/fo/Recipe",
|
||||
"type": "owl:Class",
|
||||
"labels": [{"value": "Recipe", "lang": "en-gb"}],
|
||||
"comment": "A Recipe is a combination of ingredients and a method.",
|
||||
"subclass_of": None
|
||||
},
|
||||
"Ingredient": {
|
||||
"uri": "http://purl.org/ontology/fo/Ingredient",
|
||||
"type": "owl:Class",
|
||||
"labels": [{"value": "Ingredient", "lang": "en-gb"}],
|
||||
"comment": "An Ingredient combines a quantity and a food.",
|
||||
"subclass_of": None
|
||||
},
|
||||
"Food": {
|
||||
"uri": "http://purl.org/ontology/fo/Food",
|
||||
"type": "owl:Class",
|
||||
"labels": [{"value": "Food", "lang": "en-gb"}],
|
||||
"comment": "A Food is something that can be eaten.",
|
||||
"subclass_of": None
|
||||
}
|
||||
},
|
||||
object_properties={
|
||||
"ingredients": {
|
||||
"uri": "http://purl.org/ontology/fo/ingredients",
|
||||
"type": "owl:ObjectProperty",
|
||||
"labels": [{"value": "ingredients", "lang": "en-gb"}],
|
||||
"comment": "The ingredients property relates a recipe to an ingredient list.",
|
||||
"domain": "Recipe",
|
||||
"range": "IngredientList"
|
||||
},
|
||||
"produces": {
|
||||
"uri": "http://purl.org/ontology/fo/produces",
|
||||
"type": "owl:ObjectProperty",
|
||||
"labels": [{"value": "produces", "lang": "en-gb"}],
|
||||
"comment": "The produces property relates a recipe to the food it produces.",
|
||||
"domain": "Recipe",
|
||||
"range": "Food"
|
||||
}
|
||||
},
|
||||
datatype_properties={
|
||||
"serves": {
|
||||
"uri": "http://purl.org/ontology/fo/serves",
|
||||
"type": "owl:DatatypeProperty",
|
||||
"labels": [{"value": "serves", "lang": "en-gb"}],
|
||||
"comment": "The serves property indicates serving size.",
|
||||
"domain": "Recipe",
|
||||
"rdfs:range": "xsd:string"
|
||||
}
|
||||
},
|
||||
metadata={
|
||||
"name": "Food Ontology",
|
||||
"namespace": "http://purl.org/ontology/fo/"
|
||||
}
|
||||
)
|
||||
|
||||
|
||||
class TestOntologyTripleGeneration:
|
||||
"""Test suite for ontology triple generation."""
|
||||
|
||||
def test_generates_class_type_triples(self, extractor, sample_ontology_subset):
|
||||
"""Test that classes get rdf:type owl:Class triples."""
|
||||
triples = extractor.build_ontology_triples(sample_ontology_subset)
|
||||
|
||||
# Find type triples for Recipe class
|
||||
recipe_type_triples = [
|
||||
t for t in triples
|
||||
if t.s.value == "http://purl.org/ontology/fo/Recipe"
|
||||
and t.p.value == "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
|
||||
]
|
||||
|
||||
assert len(recipe_type_triples) == 1, "Should generate exactly one type triple per class"
|
||||
assert recipe_type_triples[0].o.value == "http://www.w3.org/2002/07/owl#Class", \
|
||||
"Class type should be owl:Class"
|
||||
|
||||
def test_generates_class_labels(self, extractor, sample_ontology_subset):
|
||||
"""Test that classes get rdfs:label triples."""
|
||||
triples = extractor.build_ontology_triples(sample_ontology_subset)
|
||||
|
||||
# Find label triples for Recipe class
|
||||
recipe_label_triples = [
|
||||
t for t in triples
|
||||
if t.s.value == "http://purl.org/ontology/fo/Recipe"
|
||||
and t.p.value == "http://www.w3.org/2000/01/rdf-schema#label"
|
||||
]
|
||||
|
||||
assert len(recipe_label_triples) == 1, "Should generate label triple for class"
|
||||
assert recipe_label_triples[0].o.value == "Recipe", \
|
||||
"Label should match class label from ontology"
|
||||
assert not recipe_label_triples[0].o.is_uri, \
|
||||
"Label should be a literal, not URI"
|
||||
|
||||
def test_generates_class_comments(self, extractor, sample_ontology_subset):
|
||||
"""Test that classes get rdfs:comment triples."""
|
||||
triples = extractor.build_ontology_triples(sample_ontology_subset)
|
||||
|
||||
# Find comment triples for Recipe class
|
||||
recipe_comment_triples = [
|
||||
t for t in triples
|
||||
if t.s.value == "http://purl.org/ontology/fo/Recipe"
|
||||
and t.p.value == "http://www.w3.org/2000/01/rdf-schema#comment"
|
||||
]
|
||||
|
||||
assert len(recipe_comment_triples) == 1, "Should generate comment triple for class"
|
||||
assert "combination of ingredients and a method" in recipe_comment_triples[0].o.value, \
|
||||
"Comment should match class description from ontology"
|
||||
|
||||
def test_generates_object_property_type_triples(self, extractor, sample_ontology_subset):
|
||||
"""Test that object properties get rdf:type owl:ObjectProperty triples."""
|
||||
triples = extractor.build_ontology_triples(sample_ontology_subset)
|
||||
|
||||
# Find type triples for ingredients property
|
||||
ingredients_type_triples = [
|
||||
t for t in triples
|
||||
if t.s.value == "http://purl.org/ontology/fo/ingredients"
|
||||
and t.p.value == "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
|
||||
]
|
||||
|
||||
assert len(ingredients_type_triples) == 1, \
|
||||
"Should generate exactly one type triple per object property"
|
||||
assert ingredients_type_triples[0].o.value == "http://www.w3.org/2002/07/owl#ObjectProperty", \
|
||||
"Object property type should be owl:ObjectProperty"
|
||||
|
||||
def test_generates_object_property_labels(self, extractor, sample_ontology_subset):
|
||||
"""Test that object properties get rdfs:label triples."""
|
||||
triples = extractor.build_ontology_triples(sample_ontology_subset)
|
||||
|
||||
# Find label triples for ingredients property
|
||||
ingredients_label_triples = [
|
||||
t for t in triples
|
||||
if t.s.value == "http://purl.org/ontology/fo/ingredients"
|
||||
and t.p.value == "http://www.w3.org/2000/01/rdf-schema#label"
|
||||
]
|
||||
|
||||
assert len(ingredients_label_triples) == 1, \
|
||||
"Should generate label triple for object property"
|
||||
assert ingredients_label_triples[0].o.value == "ingredients", \
|
||||
"Label should match property label from ontology"
|
||||
|
||||
def test_generates_object_property_domain(self, extractor, sample_ontology_subset):
|
||||
"""Test that object properties get rdfs:domain triples."""
|
||||
triples = extractor.build_ontology_triples(sample_ontology_subset)
|
||||
|
||||
# Find domain triples for ingredients property
|
||||
ingredients_domain_triples = [
|
||||
t for t in triples
|
||||
if t.s.value == "http://purl.org/ontology/fo/ingredients"
|
||||
and t.p.value == "http://www.w3.org/2000/01/rdf-schema#domain"
|
||||
]
|
||||
|
||||
assert len(ingredients_domain_triples) == 1, \
|
||||
"Should generate domain triple for object property"
|
||||
assert ingredients_domain_triples[0].o.value == "http://purl.org/ontology/fo/Recipe", \
|
||||
"Domain should be Recipe class URI"
|
||||
assert ingredients_domain_triples[0].o.is_uri, \
|
||||
"Domain should be a URI reference"
|
||||
|
||||
def test_generates_object_property_range(self, extractor, sample_ontology_subset):
|
||||
"""Test that object properties get rdfs:range triples."""
|
||||
triples = extractor.build_ontology_triples(sample_ontology_subset)
|
||||
|
||||
# Find range triples for produces property
|
||||
produces_range_triples = [
|
||||
t for t in triples
|
||||
if t.s.value == "http://purl.org/ontology/fo/produces"
|
||||
and t.p.value == "http://www.w3.org/2000/01/rdf-schema#range"
|
||||
]
|
||||
|
||||
assert len(produces_range_triples) == 1, \
|
||||
"Should generate range triple for object property"
|
||||
assert produces_range_triples[0].o.value == "http://purl.org/ontology/fo/Food", \
|
||||
"Range should be Food class URI"
|
||||
|
||||
def test_generates_datatype_property_type_triples(self, extractor, sample_ontology_subset):
|
||||
"""Test that datatype properties get rdf:type owl:DatatypeProperty triples."""
|
||||
triples = extractor.build_ontology_triples(sample_ontology_subset)
|
||||
|
||||
# Find type triples for serves property
|
||||
serves_type_triples = [
|
||||
t for t in triples
|
||||
if t.s.value == "http://purl.org/ontology/fo/serves"
|
||||
and t.p.value == "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
|
||||
]
|
||||
|
||||
assert len(serves_type_triples) == 1, \
|
||||
"Should generate exactly one type triple per datatype property"
|
||||
assert serves_type_triples[0].o.value == "http://www.w3.org/2002/07/owl#DatatypeProperty", \
|
||||
"Datatype property type should be owl:DatatypeProperty"
|
||||
|
||||
def test_generates_datatype_property_range(self, extractor, sample_ontology_subset):
|
||||
"""Test that datatype properties get rdfs:range triples with XSD types."""
|
||||
triples = extractor.build_ontology_triples(sample_ontology_subset)
|
||||
|
||||
# Find range triples for serves property
|
||||
serves_range_triples = [
|
||||
t for t in triples
|
||||
if t.s.value == "http://purl.org/ontology/fo/serves"
|
||||
and t.p.value == "http://www.w3.org/2000/01/rdf-schema#range"
|
||||
]
|
||||
|
||||
assert len(serves_range_triples) == 1, \
|
||||
"Should generate range triple for datatype property"
|
||||
assert serves_range_triples[0].o.value == "http://www.w3.org/2001/XMLSchema#string", \
|
||||
"Range should be XSD type URI (xsd:string expanded)"
|
||||
|
||||
def test_generates_triples_for_all_classes(self, extractor, sample_ontology_subset):
|
||||
"""Test that triples are generated for all classes in the subset."""
|
||||
triples = extractor.build_ontology_triples(sample_ontology_subset)
|
||||
|
||||
# Count unique class subjects
|
||||
class_subjects = set(
|
||||
t.s.value for t in triples
|
||||
if t.p.value == "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
|
||||
and t.o.value == "http://www.w3.org/2002/07/owl#Class"
|
||||
)
|
||||
|
||||
assert len(class_subjects) == 3, \
|
||||
"Should generate triples for all 3 classes (Recipe, Ingredient, Food)"
|
||||
|
||||
def test_generates_triples_for_all_properties(self, extractor, sample_ontology_subset):
|
||||
"""Test that triples are generated for all properties in the subset."""
|
||||
triples = extractor.build_ontology_triples(sample_ontology_subset)
|
||||
|
||||
# Count unique property subjects (object + datatype properties)
|
||||
property_subjects = set(
|
||||
t.s.value for t in triples
|
||||
if t.p.value == "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
|
||||
and ("ObjectProperty" in t.o.value or "DatatypeProperty" in t.o.value)
|
||||
)
|
||||
|
||||
assert len(property_subjects) == 3, \
|
||||
"Should generate triples for all 3 properties (ingredients, produces, serves)"
|
||||
|
||||
def test_uses_dict_field_names_not_rdf_names(self, extractor, sample_ontology_subset):
|
||||
"""Test that triple generation works with dict field names (labels, comment, domain, range).
|
||||
|
||||
This is critical - the ontology subset has dicts with Python field names,
|
||||
not RDF property names.
|
||||
"""
|
||||
# Verify the subset uses dict field names
|
||||
recipe_def = sample_ontology_subset.classes["Recipe"]
|
||||
assert isinstance(recipe_def, dict), "Class definitions should be dicts"
|
||||
assert "labels" in recipe_def, "Should use 'labels' not 'rdfs:label'"
|
||||
assert "comment" in recipe_def, "Should use 'comment' not 'rdfs:comment'"
|
||||
|
||||
# Now verify triple generation works
|
||||
triples = extractor.build_ontology_triples(sample_ontology_subset)
|
||||
|
||||
# Should still generate proper RDF triples despite dict field names
|
||||
label_triples = [
|
||||
t for t in triples
|
||||
if t.p.value == "http://www.w3.org/2000/01/rdf-schema#label"
|
||||
]
|
||||
assert len(label_triples) > 0, \
|
||||
"Should generate rdfs:label triples from dict 'labels' field"
|
||||
|
||||
def test_total_triple_count_is_reasonable(self, extractor, sample_ontology_subset):
|
||||
"""Test that we generate a reasonable number of triples."""
|
||||
triples = extractor.build_ontology_triples(sample_ontology_subset)
|
||||
|
||||
# Each class gets: type, label, comment (3 triples)
|
||||
# Each object property gets: type, label, comment, domain, range (5 triples)
|
||||
# Each datatype property gets: type, label, comment, domain, range (5 triples)
|
||||
# Expected: 3 classes * 3 + 2 object props * 5 + 1 datatype prop * 5 = 9 + 10 + 5 = 24
|
||||
|
||||
assert len(triples) >= 20, \
|
||||
"Should generate substantial number of triples for ontology elements"
|
||||
assert len(triples) < 50, \
|
||||
"Should not generate excessive duplicate triples"
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
pytest.main([__file__, "-v"])
|
||||
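The class-triple pattern checked above can be sketched with plain `(s, p, o)` tuples; the real `Processor.build_ontology_triples` emits `Triple`/`Value` objects and also covers object and datatype properties. The dict field names (`labels`, `comment`) follow the fixtures; this is an illustration, not the shipped implementation:

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"

def build_class_triples(classes):
    """One rdf:type triple per class, plus label/comment triples read from
    the dict field names ('labels', 'comment') used by the fixtures."""
    triples = []
    for cls in classes.values():
        uri = cls["uri"]
        triples.append((uri, RDF_TYPE, OWL_CLASS))
        for label in cls.get("labels", []):
            triples.append((uri, RDFS_LABEL, label["value"]))
        if cls.get("comment"):
            triples.append((uri, RDFS_COMMENT, cls["comment"]))
    return triples
```

This yields exactly the three triples per class that `test_total_triple_count_is_reasonable` budgets for.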
@ -0,0 +1,414 @@
"""
|
||||
Unit tests for LLM prompt construction and triple extraction.
|
||||
|
||||
Tests that the system correctly constructs prompts with ontology constraints
|
||||
and extracts/validates triples from LLM responses.
|
||||
"""
|
||||
|
||||
import pytest
|
||||
from trustgraph.extract.kg.ontology.extract import Processor
|
||||
from trustgraph.extract.kg.ontology.ontology_selector import OntologySubset
|
||||
from trustgraph.schema.core.primitives import Triple, Value
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def extractor():
|
||||
"""Create a Processor instance for testing."""
|
||||
extractor = object.__new__(Processor)
|
||||
extractor.URI_PREFIXES = {
|
||||
"rdf:": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
|
||||
"rdfs:": "http://www.w3.org/2000/01/rdf-schema#",
|
||||
"owl:": "http://www.w3.org/2002/07/owl#",
|
||||
"xsd:": "http://www.w3.org/2001/XMLSchema#",
|
||||
}
|
||||
return extractor
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def sample_ontology_subset():
|
||||
"""Create a sample ontology subset for extraction testing."""
|
||||
return OntologySubset(
|
||||
ontology_id="food",
|
||||
classes={
|
||||
"Recipe": {
|
||||
"uri": "http://purl.org/ontology/fo/Recipe",
|
||||
"type": "owl:Class",
|
||||
"labels": [{"value": "Recipe", "lang": "en-gb"}],
|
||||
"comment": "A Recipe is a combination of ingredients and a method."
|
||||
},
|
||||
"Ingredient": {
|
||||
"uri": "http://purl.org/ontology/fo/Ingredient",
|
||||
"type": "owl:Class",
|
||||
"labels": [{"value": "Ingredient", "lang": "en-gb"}],
|
||||
"comment": "An Ingredient combines a quantity and a food."
|
||||
},
|
||||
"Food": {
|
||||
"uri": "http://purl.org/ontology/fo/Food",
|
||||
"type": "owl:Class",
|
||||
"labels": [{"value": "Food", "lang": "en-gb"}],
|
||||
"comment": "A Food is something that can be eaten."
|
||||
}
|
||||
},
|
||||
object_properties={
|
||||
"ingredients": {
|
||||
"uri": "http://purl.org/ontology/fo/ingredients",
|
||||
"type": "owl:ObjectProperty",
|
||||
"labels": [{"value": "ingredients", "lang": "en-gb"}],
|
||||
"comment": "The ingredients property relates a recipe to an ingredient list.",
|
||||
"domain": "Recipe",
|
||||
"range": "IngredientList"
|
||||
},
|
||||
"food": {
|
||||
"uri": "http://purl.org/ontology/fo/food",
|
||||
"type": "owl:ObjectProperty",
|
||||
"labels": [{"value": "food", "lang": "en-gb"}],
|
||||
"comment": "The food property relates an ingredient to food.",
|
||||
"domain": "Ingredient",
|
||||
"range": "Food"
|
||||
},
|
||||
"produces": {
|
||||
"uri": "http://purl.org/ontology/fo/produces",
|
||||
"type": "owl:ObjectProperty",
|
||||
"labels": [{"value": "produces", "lang": "en-gb"}],
|
||||
"comment": "The produces property relates a recipe to the food it produces.",
|
||||
"domain": "Recipe",
|
||||
"range": "Food"
|
||||
}
|
||||
},
|
||||
datatype_properties={
|
||||
"serves": {
|
||||
"uri": "http://purl.org/ontology/fo/serves",
|
||||
"type": "owl:DatatypeProperty",
|
||||
"labels": [{"value": "serves", "lang": "en-gb"}],
|
||||
"comment": "The serves property indicates serving size.",
|
||||
"domain": "Recipe",
|
||||
"rdfs:range": "xsd:string"
|
||||
}
|
||||
},
|
||||
metadata={
|
||||
"name": "Food Ontology",
|
||||
"namespace": "http://purl.org/ontology/fo/"
|
||||
}
|
||||
)
|
||||
|
||||
|
||||
class TestPromptConstruction:
|
||||
"""Test suite for LLM prompt construction."""
|
||||
|
||||
def test_build_extraction_variables_includes_text(self, extractor, sample_ontology_subset):
|
||||
"""Test that extraction variables include the input text."""
|
||||
chunk = "Cornish pasty is a traditional British pastry filled with meat and vegetables."
|
||||
|
||||
variables = extractor.build_extraction_variables(chunk, sample_ontology_subset)
|
||||
|
||||
assert "text" in variables, "Should include text key"
|
||||
assert variables["text"] == chunk, "Text should match input chunk"
|
||||
|
||||
def test_build_extraction_variables_includes_classes(self, extractor, sample_ontology_subset):
|
||||
"""Test that extraction variables include ontology classes."""
|
||||
chunk = "Test text"
|
||||
|
||||
variables = extractor.build_extraction_variables(chunk, sample_ontology_subset)
|
||||
|
||||
assert "classes" in variables, "Should include classes key"
|
||||
assert len(variables["classes"]) == 3, "Should include all classes from subset"
|
||||
assert "Recipe" in variables["classes"]
|
||||
assert "Ingredient" in variables["classes"]
|
||||
assert "Food" in variables["classes"]
|
||||
|
||||
def test_build_extraction_variables_includes_properties(self, extractor, sample_ontology_subset):
|
||||
"""Test that extraction variables include ontology properties."""
|
||||
chunk = "Test text"
|
||||
|
||||
variables = extractor.build_extraction_variables(chunk, sample_ontology_subset)
|
||||
|
||||
assert "object_properties" in variables, "Should include object_properties key"
|
||||
assert "datatype_properties" in variables, "Should include datatype_properties key"
|
||||
assert len(variables["object_properties"]) == 3
|
||||
assert len(variables["datatype_properties"]) == 1
|
||||
|
||||
def test_build_extraction_variables_structure(self, extractor, sample_ontology_subset):
|
||||
"""Test the overall structure of extraction variables."""
|
||||
chunk = "Test text"
|
||||
|
||||
variables = extractor.build_extraction_variables(chunk, sample_ontology_subset)
|
||||
|
||||
# Should have exactly 4 keys
|
||||
assert set(variables.keys()) == {"text", "classes", "object_properties", "datatype_properties"}
|
||||
|
||||
def test_build_extraction_variables_with_empty_subset(self, extractor):
|
||||
"""Test building variables with minimal ontology subset."""
|
||||
minimal_subset = OntologySubset(
|
||||
ontology_id="minimal",
|
||||
classes={},
|
||||
object_properties={},
|
||||
datatype_properties={},
|
||||
metadata={}
|
||||
)
|
||||
chunk = "Test text"
|
||||
|
||||
variables = extractor.build_extraction_variables(chunk, minimal_subset)
|
||||
|
||||
assert variables["text"] == chunk
|
||||
assert len(variables["classes"]) == 0
|
||||
assert len(variables["object_properties"]) == 0
|
||||
assert len(variables["datatype_properties"]) == 0
|
||||
|
||||
|
||||
class TestTripleValidation:
|
||||
"""Test suite for triple validation against ontology."""
|
||||
|
||||
def test_validates_rdf_type_triple_with_valid_class(self, extractor, sample_ontology_subset):
|
||||
"""Test that rdf:type triples are validated against ontology classes."""
|
||||
subject = "cornish-pasty"
|
||||
predicate = "rdf:type"
|
||||
object_val = "Recipe"
|
||||
|
||||
is_valid = extractor.is_valid_triple(subject, predicate, object_val, sample_ontology_subset)
|
||||
|
||||
assert is_valid, "rdf:type with valid class should be valid"
|
||||
|
||||
def test_rejects_rdf_type_triple_with_invalid_class(self, extractor, sample_ontology_subset):
|
||||
"""Test that rdf:type triples with non-existent classes are rejected."""
|
||||
subject = "cornish-pasty"
|
||||
predicate = "rdf:type"
|
||||
object_val = "NonExistentClass"
|
||||
|
||||
is_valid = extractor.is_valid_triple(subject, predicate, object_val, sample_ontology_subset)
|
||||
|
||||
assert not is_valid, "rdf:type with invalid class should be rejected"
|
||||
|
||||
def test_validates_rdfs_label_triple(self, extractor, sample_ontology_subset):
|
||||
"""Test that rdfs:label triples are always valid."""
|
||||
subject = "cornish-pasty"
|
||||
predicate = "rdfs:label"
|
||||
object_val = "Cornish Pasty"
|
||||
|
||||
is_valid = extractor.is_valid_triple(subject, predicate, object_val, sample_ontology_subset)
|
||||
|
||||
assert is_valid, "rdfs:label should always be valid"
|
||||
|
||||
def test_validates_object_property_triple(self, extractor, sample_ontology_subset):
|
||||
"""Test that object property triples are validated."""
|
||||
subject = "cornish-pasty-recipe"
|
||||
predicate = "produces"
|
||||
object_val = "cornish-pasty"
|
||||
|
||||
is_valid = extractor.is_valid_triple(subject, predicate, object_val, sample_ontology_subset)
|
||||
|
||||
assert is_valid, "Valid object property should be accepted"
|
||||
|
||||
def test_validates_datatype_property_triple(self, extractor, sample_ontology_subset):
|
||||
"""Test that datatype property triples are validated."""
|
||||
subject = "cornish-pasty-recipe"
|
||||
predicate = "serves"
|
||||
object_val = "4-6 people"
|
||||
|
||||
is_valid = extractor.is_valid_triple(subject, predicate, object_val, sample_ontology_subset)
|
||||
|
||||
assert is_valid, "Valid datatype property should be accepted"
|
||||
|
||||
def test_rejects_unknown_property(self, extractor, sample_ontology_subset):
|
||||
"""Test that unknown properties are rejected."""
|
||||
subject = "cornish-pasty"
|
||||
predicate = "unknownProperty"
|
||||
object_val = "some value"
|
||||
|
||||
is_valid = extractor.is_valid_triple(subject, predicate, object_val, sample_ontology_subset)
|
||||
|
||||
assert not is_valid, "Unknown property should be rejected"
|
||||
|
||||
def test_validates_multiple_valid_properties(self, extractor, sample_ontology_subset):
|
||||
"""Test validation of different property types."""
|
||||
test_cases = [
|
||||
("recipe1", "produces", "food1", True),
|
||||
("ingredient1", "food", "food1", True),
|
||||
("recipe1", "serves", "4", True),
|
||||
("recipe1", "invalidProp", "value", False),
|
||||
]
|
||||
|
||||
for subject, predicate, object_val, expected in test_cases:
|
||||
is_valid = extractor.is_valid_triple(subject, predicate, object_val, sample_ontology_subset)
|
||||
assert is_valid == expected, f"Validation of {predicate} should be {expected}"
|
||||
|
||||
|
||||
class TestTripleParsing:
|
||||
"""Test suite for parsing triples from LLM responses."""
|
||||
|
||||
def test_parse_simple_triple_dict(self, extractor, sample_ontology_subset):
|
||||
"""Test parsing a simple triple from dict format."""
|
||||
triples_response = [
|
||||
{
|
||||
"subject": "cornish-pasty",
|
||||
"predicate": "rdf:type",
|
||||
"object": "Recipe"
|
||||
}
|
||||
]
|
||||
        validated = extractor.parse_and_validate_triples(triples_response, sample_ontology_subset)

        assert len(validated) == 1, "Should parse one valid triple"
        assert validated[0].s.value == "https://trustgraph.ai/food/cornish-pasty"
        assert validated[0].p.value == "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
        assert validated[0].o.value == "http://purl.org/ontology/fo/Recipe"

    def test_parse_multiple_triples(self, extractor, sample_ontology_subset):
        """Test parsing multiple triples."""
        triples_response = [
            {"subject": "cornish-pasty", "predicate": "rdf:type", "object": "Recipe"},
            {"subject": "cornish-pasty", "predicate": "rdfs:label", "object": "Cornish Pasty"},
            {"subject": "cornish-pasty", "predicate": "serves", "object": "1-2 people"}
        ]

        validated = extractor.parse_and_validate_triples(triples_response, sample_ontology_subset)

        assert len(validated) == 3, "Should parse all valid triples"

    def test_filters_invalid_triples(self, extractor, sample_ontology_subset):
        """Test that invalid triples are filtered out."""
        triples_response = [
            {"subject": "cornish-pasty", "predicate": "rdf:type", "object": "Recipe"},  # Valid
            {"subject": "cornish-pasty", "predicate": "invalidProp", "object": "value"},  # Invalid
            {"subject": "cornish-pasty", "predicate": "produces", "object": "food1"}  # Valid
        ]

        validated = extractor.parse_and_validate_triples(triples_response, sample_ontology_subset)

        assert len(validated) == 2, "Should filter out invalid triple"

    def test_handles_missing_fields(self, extractor, sample_ontology_subset):
        """Test that triples with missing fields are skipped."""
        triples_response = [
            {"subject": "cornish-pasty", "predicate": "rdf:type"},  # Missing object
            {"subject": "cornish-pasty", "object": "Recipe"},  # Missing predicate
            {"predicate": "rdf:type", "object": "Recipe"},  # Missing subject
            {"subject": "cornish-pasty", "predicate": "rdf:type", "object": "Recipe"}  # Valid
        ]

        validated = extractor.parse_and_validate_triples(triples_response, sample_ontology_subset)

        assert len(validated) == 1, "Should skip triples with missing fields"

    def test_handles_empty_response(self, extractor, sample_ontology_subset):
        """Test handling of empty LLM response."""
        triples_response = []

        validated = extractor.parse_and_validate_triples(triples_response, sample_ontology_subset)

        assert len(validated) == 0, "Empty response should return no triples"

    def test_expands_uris_in_parsed_triples(self, extractor, sample_ontology_subset):
        """Test that URIs are properly expanded in parsed triples."""
        triples_response = [
            {"subject": "recipe1", "predicate": "produces", "object": "Food"}
        ]

        validated = extractor.parse_and_validate_triples(triples_response, sample_ontology_subset)

        assert len(validated) == 1
        # Subject should be expanded to entity URI
        assert validated[0].s.value.startswith("https://trustgraph.ai/food/")
        # Predicate should be expanded to ontology URI
        assert validated[0].p.value == "http://purl.org/ontology/fo/produces"
        # Object should be expanded to class URI
        assert validated[0].o.value == "http://purl.org/ontology/fo/Food"

    def test_creates_proper_triple_objects(self, extractor, sample_ontology_subset):
        """Test that Triple objects are properly created."""
        triples_response = [
            {"subject": "cornish-pasty", "predicate": "rdfs:label", "object": "Cornish Pasty"}
        ]

        validated = extractor.parse_and_validate_triples(triples_response, sample_ontology_subset)

        assert len(validated) == 1
        triple = validated[0]
        assert isinstance(triple, Triple), "Should create Triple objects"
        assert isinstance(triple.s, Value), "Subject should be Value object"
        assert isinstance(triple.p, Value), "Predicate should be Value object"
        assert isinstance(triple.o, Value), "Object should be Value object"
        assert triple.s.is_uri, "Subject should be marked as URI"
        assert triple.p.is_uri, "Predicate should be marked as URI"
        assert not triple.o.is_uri, "Object literal should not be marked as URI"


class TestURIExpansionInExtraction:
    """Test suite for URI expansion during triple extraction."""

    def test_expands_class_names_in_objects(self, extractor, sample_ontology_subset):
        """Test that class names in object position are expanded."""
        triples_response = [
            {"subject": "entity1", "predicate": "rdf:type", "object": "Recipe"}
        ]

        validated = extractor.parse_and_validate_triples(triples_response, sample_ontology_subset)

        assert validated[0].o.value == "http://purl.org/ontology/fo/Recipe"
        assert validated[0].o.is_uri, "Class reference should be URI"

    def test_expands_property_names(self, extractor, sample_ontology_subset):
        """Test that property names are expanded to full URIs."""
        triples_response = [
            {"subject": "recipe1", "predicate": "produces", "object": "food1"}
        ]

        validated = extractor.parse_and_validate_triples(triples_response, sample_ontology_subset)

        assert validated[0].p.value == "http://purl.org/ontology/fo/produces"

    def test_expands_entity_instances(self, extractor, sample_ontology_subset):
        """Test that entity instances get constructed URIs."""
        triples_response = [
            {"subject": "my-special-recipe", "predicate": "rdf:type", "object": "Recipe"}
        ]

        validated = extractor.parse_and_validate_triples(triples_response, sample_ontology_subset)

        assert validated[0].s.value.startswith("https://trustgraph.ai/food/")
        assert "my-special-recipe" in validated[0].s.value


class TestEdgeCases:
    """Test suite for edge cases in extraction."""

    def test_handles_non_dict_response_items(self, extractor, sample_ontology_subset):
        """Test that non-dict items in response are skipped."""
        triples_response = [
            {"subject": "entity1", "predicate": "rdf:type", "object": "Recipe"},  # Valid
            "invalid string item",  # Invalid
            None,  # Invalid
            {"subject": "entity2", "predicate": "rdf:type", "object": "Food"}  # Valid
        ]

        validated = extractor.parse_and_validate_triples(triples_response, sample_ontology_subset)

        # Should skip non-dict items gracefully
        assert len(validated) >= 0, "Should handle non-dict items without crashing"

    def test_handles_empty_string_values(self, extractor, sample_ontology_subset):
        """Test that empty string values are skipped."""
        triples_response = [
            {"subject": "", "predicate": "rdf:type", "object": "Recipe"},
            {"subject": "entity1", "predicate": "", "object": "Recipe"},
            {"subject": "entity1", "predicate": "rdf:type", "object": ""},
            {"subject": "entity1", "predicate": "rdf:type", "object": "Recipe"}  # Valid
        ]

        validated = extractor.parse_and_validate_triples(triples_response, sample_ontology_subset)

        assert len(validated) == 1, "Should skip triples with empty strings"

    def test_handles_unicode_in_literals(self, extractor, sample_ontology_subset):
        """Test handling of unicode characters in literal values."""
        triples_response = [
            {"subject": "café-recipe", "predicate": "rdfs:label", "object": "Café Spécial"}
        ]

        validated = extractor.parse_and_validate_triples(triples_response, sample_ontology_subset)

        assert len(validated) == 1
        assert "Café Spécial" in validated[0].o.value


if __name__ == "__main__":
    pytest.main([__file__, "-v"])
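The filtering behaviour the edge-case tests above pin down (skip non-dict items, missing fields, and empty strings) can be sketched as a small stand-alone helper. This is an illustrative sketch only; `filter_triples` is a hypothetical name, not the actual `parse_and_validate_triples`, which additionally validates predicates against the ontology and expands URIs.

```python
def filter_triples(items):
    """Keep only dicts carrying non-empty subject, predicate and object."""
    valid = []
    for item in items:
        if not isinstance(item, dict):
            continue  # skip strings, None, and other non-dict items
        s = item.get("subject")
        p = item.get("predicate")
        o = item.get("object")
        if not s or not p or not o:
            continue  # skip missing keys and empty strings alike
        valid.append(item)
    return valid
```

Treating missing keys and empty strings with a single truthiness check keeps the helper short while matching both `test_handles_missing_fields` and `test_handles_empty_string_values`.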
290
tests/unit/test_extract/test_ontology/test_text_processing.py
Normal file
"""
|
||||
Unit tests for text processing and segmentation.
|
||||
|
||||
Tests that text is properly split into sentences for ontology matching,
|
||||
including NLTK tokenization and TextSegment creation.
|
||||
"""
|
||||
|
||||
import pytest
|
||||
from trustgraph.extract.kg.ontology.text_processor import TextProcessor, TextSegment
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def text_processor():
|
||||
"""Create a TextProcessor instance for testing."""
|
||||
return TextProcessor()
|
||||
|
||||
|
||||
class TestTextSegmentation:
|
||||
"""Test suite for text segmentation functionality."""
|
||||
|
||||
def test_segment_single_sentence(self, text_processor):
|
||||
"""Test segmentation of a single sentence."""
|
||||
text = "This is a simple sentence."
|
||||
|
||||
segments = text_processor.process_chunk(text, extract_phrases=False)
|
||||
|
||||
# Filter to only sentences
|
||||
sentences = [s for s in segments if s.type == 'sentence']
|
||||
assert len(sentences) == 1, "Single sentence should produce one sentence segment"
|
||||
assert text in sentences[0].text, "Segment text should contain input"
|
||||
|
||||
def test_segment_multiple_sentences(self, text_processor):
|
||||
"""Test segmentation of multiple sentences."""
|
||||
text = "First sentence. Second sentence. Third sentence."
|
||||
|
||||
segments = text_processor.process_chunk(text, extract_phrases=False)
|
||||
|
||||
# Filter to only sentences
|
||||
sentences = [s for s in segments if s.type == 'sentence']
|
||||
assert len(sentences) == 3, "Should create three sentence segments for three sentences"
|
||||
assert "First sentence" in sentences[0].text
|
||||
assert "Second sentence" in sentences[1].text
|
||||
assert "Third sentence" in sentences[2].text
|
||||
|
||||
def test_segment_positions(self, text_processor):
|
||||
"""Test that segment positions are tracked."""
|
||||
text = "First sentence. Second sentence."
|
||||
|
||||
segments = text_processor.process_chunk(text, extract_phrases=False)
|
||||
|
||||
# Filter to only sentences
|
||||
sentences = [s for s in segments if s.type == 'sentence']
|
||||
assert len(sentences) == 2
|
||||
assert sentences[0].position == 0
|
||||
assert sentences[1].position > 0
|
||||
|
||||
def test_segment_empty_text(self, text_processor):
|
||||
"""Test handling of empty text."""
|
||||
text = ""
|
||||
|
||||
segments = text_processor.process_chunk(text, extract_phrases=False)
|
||||
|
||||
assert len(segments) == 0, "Empty text should produce no segments"
|
||||
|
||||
def test_segment_whitespace_only(self, text_processor):
|
||||
"""Test handling of whitespace-only text."""
|
||||
text = " \n\t "
|
||||
|
||||
segments = text_processor.process_chunk(text, extract_phrases=False)
|
||||
|
||||
# May produce empty segments or no segments depending on implementation
|
||||
assert len(segments) <= 1, "Whitespace-only text should produce minimal segments"
|
||||
|
||||
def test_segment_with_newlines(self, text_processor):
|
||||
"""Test segmentation of text with newlines."""
|
||||
text = "First sentence.\nSecond sentence."
|
||||
|
||||
segments = text_processor.process_chunk(text, extract_phrases=False)
|
||||
|
||||
# Filter to only sentences
|
||||
sentences = [s for s in segments if s.type == 'sentence']
|
||||
assert len(sentences) == 2
|
||||
assert "First sentence" in sentences[0].text
|
||||
assert "Second sentence" in sentences[1].text
|
||||
|
||||
def test_segment_complex_punctuation(self, text_processor):
|
||||
"""Test segmentation with complex punctuation."""
|
||||
text = "Dr. Smith went to the U.S.A. yesterday. He met Mr. Jones."
|
||||
|
||||
segments = text_processor.process_chunk(text, extract_phrases=False)
|
||||
|
||||
# Filter to only sentences
|
||||
sentences = [s for s in segments if s.type == 'sentence']
|
||||
# NLTK should handle abbreviations correctly
|
||||
assert len(sentences) == 2, "Should recognize abbreviations and not split on them"
|
||||
assert "Dr. Smith" in sentences[0].text
|
||||
assert "Mr. Jones" in sentences[1].text
|
||||
|
||||
def test_segment_question_and_exclamation(self, text_processor):
|
||||
"""Test segmentation with different sentence terminators."""
|
||||
text = "Is this working? Yes, it is! Great news."
|
||||
|
||||
segments = text_processor.process_chunk(text, extract_phrases=False)
|
||||
|
||||
# Filter to only sentences
|
||||
sentences = [s for s in segments if s.type == 'sentence']
|
||||
assert len(sentences) == 3
|
||||
assert "Is this working?" in sentences[0].text
|
||||
assert "Yes, it is!" in sentences[1].text
|
||||
assert "Great news" in sentences[2].text
|
||||
|
||||
def test_segment_long_paragraph(self, text_processor):
|
||||
"""Test segmentation of a longer paragraph."""
|
||||
text = (
|
||||
"The recipe requires several ingredients. "
|
||||
"First, gather flour and sugar. "
|
||||
"Then, add eggs and milk. "
|
||||
"Finally, mix everything together."
|
||||
)
|
||||
|
||||
segments = text_processor.process_chunk(text, extract_phrases=False)
|
||||
|
||||
# Filter to only sentences
|
||||
sentences = [s for s in segments if s.type == 'sentence']
|
||||
assert len(sentences) == 4, "Should split paragraph into individual sentences"
|
||||
assert all(isinstance(seg, TextSegment) for seg in sentences)
|
||||
|
||||
def test_extract_phrases_option(self, text_processor):
|
||||
"""Test that phrase extraction can be enabled."""
|
||||
text = "The recipe requires several ingredients."
|
||||
|
||||
# With phrases
|
||||
segments_with_phrases = text_processor.process_chunk(text, extract_phrases=True)
|
||||
# Without phrases
|
||||
segments_without_phrases = text_processor.process_chunk(text, extract_phrases=False)
|
||||
|
||||
# With phrases should have more segments (sentences + phrases)
|
||||
assert len(segments_with_phrases) >= len(segments_without_phrases)
|
||||
|
||||
|
||||
class TestTextSegmentCreation:
|
||||
"""Test suite for TextSegment object creation."""
|
||||
|
||||
def test_text_segment_attributes(self, text_processor):
|
||||
"""Test that TextSegment objects have correct attributes."""
|
||||
text = "This is a test sentence."
|
||||
|
||||
segments = text_processor.process_chunk(text, extract_phrases=False)
|
||||
|
||||
assert len(segments) >= 1
|
||||
segment = segments[0]
|
||||
|
||||
assert hasattr(segment, 'text'), "Segment should have text attribute"
|
||||
assert hasattr(segment, 'type'), "Segment should have type attribute"
|
||||
assert hasattr(segment, 'position'), "Segment should have position attribute"
|
||||
assert segment.type in ['sentence', 'phrase', 'noun_phrase', 'verb_phrase']
|
||||
|
||||
def test_text_segment_types(self, text_processor):
|
||||
"""Test that different segment types are created correctly."""
|
||||
text = "The recipe requires several ingredients."
|
||||
|
||||
# Without phrases
|
||||
segments = text_processor.process_chunk(text, extract_phrases=False)
|
||||
types = set(s.type for s in segments)
|
||||
assert 'sentence' in types, "Should create sentence segments"
|
||||
|
||||
# With phrases
|
||||
segments = text_processor.process_chunk(text, extract_phrases=True)
|
||||
types = set(s.type for s in segments)
|
||||
assert 'sentence' in types, "Should create sentence segments"
|
||||
# May also have phrase types
|
||||
|
||||
def test_text_segment_sentence_tracking(self, text_processor):
|
||||
"""Test that segments track their parent sentence."""
|
||||
text = "This is a test sentence."
|
||||
|
||||
segments = text_processor.process_chunk(text, extract_phrases=True)
|
||||
|
||||
# Phrases should reference their parent sentence
|
||||
phrases = [s for s in segments if s.type != 'sentence']
|
||||
if phrases:
|
||||
for phrase in phrases:
|
||||
# parent_sentence may be set for phrases
|
||||
assert hasattr(phrase, 'parent_sentence')
|
||||
|
||||
|
||||
class TestNLTKCompatibility:
|
||||
"""Test suite for NLTK version compatibility."""
|
||||
|
||||
def test_nltk_punkt_availability(self, text_processor):
|
||||
"""Test that NLTK punkt tokenizer is available."""
|
||||
# This test verifies the text_processor can use NLTK
|
||||
# If punkt/punkt_tab is not available, this will fail during setup
|
||||
import nltk
|
||||
|
||||
# Try to use sentence tokenizer
|
||||
text = "Test sentence. Another sentence."
|
||||
|
||||
try:
|
||||
from nltk.tokenize import sent_tokenize
|
||||
result = sent_tokenize(text)
|
||||
assert len(result) > 0, "NLTK sentence tokenizer should work"
|
||||
except LookupError:
|
||||
pytest.fail("NLTK punkt tokenizer not available")
|
||||
|
||||
def test_text_processor_uses_nltk(self, text_processor):
|
||||
"""Test that TextProcessor successfully uses NLTK for segmentation."""
|
||||
# This verifies the integration works
|
||||
text = "First sentence. Second sentence."
|
||||
|
||||
segments = text_processor.process_chunk(text, extract_phrases=False)
|
||||
|
||||
# Should successfully segment using NLTK
|
||||
sentences = [s for s in segments if s.type == 'sentence']
|
||||
assert len(sentences) >= 1, "Should successfully segment text using NLTK"
|
||||
|
||||
|
||||
class TestEdgeCases:
|
||||
"""Test suite for edge cases in text processing."""
|
||||
|
||||
def test_sentence_with_only_punctuation(self, text_processor):
|
||||
"""Test handling of unusual punctuation patterns."""
|
||||
text = "...!?!"
|
||||
|
||||
segments = text_processor.process_chunk(text, extract_phrases=False)
|
||||
|
||||
# Should handle gracefully (NLTK may split this oddly, that's ok)
|
||||
assert len(segments) <= 3, "Should handle punctuation-only text gracefully"
|
||||
|
||||
def test_very_long_sentence(self, text_processor):
|
||||
"""Test handling of very long sentences."""
|
||||
# Create a long sentence with many clauses
|
||||
text = (
|
||||
"This is a very long sentence with many clauses, "
|
||||
"including subordinate clauses, coordinate clauses, "
|
||||
"and various other grammatical structures that make it "
|
||||
"quite lengthy but still technically a single sentence."
|
||||
)
|
||||
|
||||
segments = text_processor.process_chunk(text, extract_phrases=False)
|
||||
|
||||
sentences = [s for s in segments if s.type == 'sentence']
|
||||
assert len(sentences) == 1, "Long sentence should still be one sentence segment"
|
||||
assert len(sentences[0].text) > 100
|
||||
|
||||
def test_unicode_text(self, text_processor):
|
||||
"""Test handling of unicode characters."""
|
||||
text = "Café serves crêpes. The recipe is français."
|
||||
|
||||
segments = text_processor.process_chunk(text, extract_phrases=False)
|
||||
|
||||
sentences = [s for s in segments if s.type == 'sentence']
|
||||
assert len(sentences) == 2
|
||||
assert "Café" in sentences[0].text
|
||||
assert "français" in sentences[1].text
|
||||
|
||||
def test_numbers_and_dates(self, text_processor):
|
||||
"""Test handling of numbers and dates in text."""
|
||||
text = "The recipe was created on Jan. 1, 2024. It serves 4-6 people."
|
||||
|
||||
segments = text_processor.process_chunk(text, extract_phrases=False)
|
||||
|
||||
sentences = [s for s in segments if s.type == 'sentence']
|
||||
assert len(sentences) == 2
|
||||
assert "2024" in sentences[0].text
|
||||
assert "4-6" in sentences[1].text
|
||||
|
||||
def test_ellipsis_handling(self, text_processor):
|
||||
"""Test handling of ellipsis in text."""
|
||||
text = "First sentence... Second sentence."
|
||||
|
||||
segments = text_processor.process_chunk(text, extract_phrases=False)
|
||||
|
||||
# NLTK may handle ellipsis differently
|
||||
assert len(segments) >= 1, "Should produce at least one segment"
|
||||
# The exact behavior depends on NLTK version
|
||||
|
||||
def test_quoted_text(self, text_processor):
|
||||
"""Test handling of quoted text."""
|
||||
text = 'He said "Hello world." Then he left.'
|
||||
|
||||
segments = text_processor.process_chunk(text, extract_phrases=False)
|
||||
|
||||
sentences = [s for s in segments if s.type == 'sentence']
|
||||
assert len(sentences) == 2
|
||||
assert '"Hello world."' in sentences[0].text or "Hello world" in sentences[0].text
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
pytest.main([__file__, "-v"])
|
||||
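The sentence segmentation these tests exercise can be approximated with a tiny stdlib-only sketch. This is a naive stand-in for NLTK's punkt tokenizer, not the real TextProcessor: `split_sentences` is a hypothetical helper, and unlike punkt it will mis-split after abbreviations such as "Dr." when a capitalized word follows.

```python
import re

def split_sentences(text: str) -> list[str]:
    # Split after '.', '!' or '?' when followed by whitespace and a
    # capital letter. Punkt does far better on abbreviations; this is
    # only meant to illustrate the segment boundaries the tests expect.
    parts = re.split(r'(?<=[.!?])\s+(?=[A-Z])', text.strip())
    return [p for p in parts if p]
```

For example, `split_sentences("Is this working? Yes, it is! Great news.")` yields three sentences, matching the terminator handling asserted in `test_segment_question_and_exclamation`.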
258
tests/unit/test_extract/test_ontology/test_uri_expansion.py
Normal file
"""
|
||||
Unit tests for URI expansion functionality.
|
||||
|
||||
Tests that URIs are properly expanded using ontology definitions instead of
|
||||
constructed fallback URIs.
|
||||
"""
|
||||
|
||||
import pytest
|
||||
from trustgraph.extract.kg.ontology.extract import Processor
|
||||
from trustgraph.extract.kg.ontology.ontology_selector import OntologySubset
|
||||
|
||||
|
||||
class MockParams:
|
||||
"""Mock parameters for Processor."""
|
||||
def get(self, key, default=None):
|
||||
return default
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def extractor():
|
||||
"""Create a Processor instance for testing."""
|
||||
params = MockParams()
|
||||
# We only need the expand_uri method, so minimal initialization
|
||||
extractor = object.__new__(Processor)
|
||||
extractor.URI_PREFIXES = {
|
||||
"rdf:": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
|
||||
"rdfs:": "http://www.w3.org/2000/01/rdf-schema#",
|
||||
"owl:": "http://www.w3.org/2002/07/owl#",
|
||||
"xsd:": "http://www.w3.org/2001/XMLSchema#",
|
||||
}
|
||||
return extractor
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def ontology_subset_with_uris():
|
||||
"""Create an ontology subset with proper URIs defined."""
|
||||
return OntologySubset(
|
||||
ontology_id="food",
|
||||
classes={
|
||||
"Recipe": {
|
||||
"uri": "http://purl.org/ontology/fo/Recipe",
|
||||
"type": "owl:Class",
|
||||
"labels": [{"value": "Recipe", "lang": "en-gb"}],
|
||||
"comment": "A Recipe is a combination of ingredients and a method."
|
||||
},
|
||||
"Ingredient": {
|
||||
"uri": "http://purl.org/ontology/fo/Ingredient",
|
||||
"type": "owl:Class",
|
||||
"labels": [{"value": "Ingredient", "lang": "en-gb"}],
|
||||
"comment": "An Ingredient combines a quantity and a food."
|
||||
},
|
||||
"Food": {
|
||||
"uri": "http://purl.org/ontology/fo/Food",
|
||||
"type": "owl:Class",
|
||||
"labels": [{"value": "Food", "lang": "en-gb"}],
|
||||
"comment": "A Food is something that can be eaten."
|
||||
}
|
||||
},
|
||||
object_properties={
|
||||
"ingredients": {
|
||||
"uri": "http://purl.org/ontology/fo/ingredients",
|
||||
"type": "owl:ObjectProperty",
|
||||
"labels": [{"value": "ingredients", "lang": "en-gb"}],
|
||||
"domain": "Recipe",
|
||||
"range": "IngredientList"
|
||||
},
|
||||
"food": {
|
||||
"uri": "http://purl.org/ontology/fo/food",
|
||||
"type": "owl:ObjectProperty",
|
||||
"labels": [{"value": "food", "lang": "en-gb"}],
|
||||
"domain": "Ingredient",
|
||||
"range": "Food"
|
||||
},
|
||||
"produces": {
|
||||
"uri": "http://purl.org/ontology/fo/produces",
|
||||
"type": "owl:ObjectProperty",
|
||||
"labels": [{"value": "produces", "lang": "en-gb"}],
|
||||
"domain": "Recipe",
|
||||
"range": "Food"
|
||||
}
|
||||
},
|
||||
datatype_properties={
|
||||
"serves": {
|
||||
"uri": "http://purl.org/ontology/fo/serves",
|
||||
"type": "owl:DatatypeProperty",
|
||||
"labels": [{"value": "serves", "lang": "en-gb"}],
|
||||
"domain": "Recipe",
|
||||
"range": "xsd:string"
|
||||
}
|
||||
},
|
||||
metadata={
|
||||
"name": "Food Ontology",
|
||||
"namespace": "http://purl.org/ontology/fo/"
|
||||
}
|
||||
)
|
||||
|
||||
|
||||
class TestURIExpansion:
|
||||
"""Test suite for URI expansion functionality."""
|
||||
|
||||
def test_expand_class_uri_from_ontology(self, extractor, ontology_subset_with_uris):
|
||||
"""Test that class names are expanded to their ontology URIs."""
|
||||
result = extractor.expand_uri("Recipe", ontology_subset_with_uris, "food")
|
||||
|
||||
assert result == "http://purl.org/ontology/fo/Recipe", \
|
||||
"Recipe should expand to its ontology URI"
|
||||
|
||||
def test_expand_object_property_uri_from_ontology(self, extractor, ontology_subset_with_uris):
|
||||
"""Test that object properties are expanded to their ontology URIs."""
|
||||
result = extractor.expand_uri("ingredients", ontology_subset_with_uris, "food")
|
||||
|
||||
assert result == "http://purl.org/ontology/fo/ingredients", \
|
||||
"ingredients property should expand to its ontology URI"
|
||||
|
||||
def test_expand_datatype_property_uri_from_ontology(self, extractor, ontology_subset_with_uris):
|
||||
"""Test that datatype properties are expanded to their ontology URIs."""
|
||||
result = extractor.expand_uri("serves", ontology_subset_with_uris, "food")
|
||||
|
||||
assert result == "http://purl.org/ontology/fo/serves", \
|
||||
"serves property should expand to its ontology URI"
|
||||
|
||||
def test_expand_multiple_classes(self, extractor, ontology_subset_with_uris):
|
||||
"""Test expansion of multiple different classes."""
|
||||
recipe_uri = extractor.expand_uri("Recipe", ontology_subset_with_uris, "food")
|
||||
ingredient_uri = extractor.expand_uri("Ingredient", ontology_subset_with_uris, "food")
|
||||
food_uri = extractor.expand_uri("Food", ontology_subset_with_uris, "food")
|
||||
|
||||
assert recipe_uri == "http://purl.org/ontology/fo/Recipe"
|
||||
assert ingredient_uri == "http://purl.org/ontology/fo/Ingredient"
|
||||
assert food_uri == "http://purl.org/ontology/fo/Food"
|
||||
|
||||
def test_expand_rdf_prefix(self, extractor, ontology_subset_with_uris):
|
||||
"""Test that standard RDF prefixes are expanded."""
|
||||
result = extractor.expand_uri("rdf:type", ontology_subset_with_uris, "food")
|
||||
|
||||
assert result == "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", \
|
||||
"rdf:type should expand to full RDF namespace URI"
|
||||
|
||||
def test_expand_rdfs_prefix(self, extractor, ontology_subset_with_uris):
|
||||
"""Test that RDFS prefixes are expanded."""
|
||||
result = extractor.expand_uri("rdfs:label", ontology_subset_with_uris, "food")
|
||||
|
||||
assert result == "http://www.w3.org/2000/01/rdf-schema#label", \
|
||||
"rdfs:label should expand to full RDFS namespace URI"
|
||||
|
||||
def test_expand_owl_prefix(self, extractor, ontology_subset_with_uris):
|
||||
"""Test that OWL prefixes are expanded."""
|
||||
result = extractor.expand_uri("owl:Class", ontology_subset_with_uris, "food")
|
||||
|
||||
assert result == "http://www.w3.org/2002/07/owl#Class", \
|
||||
"owl:Class should expand to full OWL namespace URI"
|
||||
|
||||
def test_expand_xsd_prefix(self, extractor, ontology_subset_with_uris):
|
||||
"""Test that XSD prefixes are expanded."""
|
||||
result = extractor.expand_uri("xsd:string", ontology_subset_with_uris, "food")
|
||||
|
||||
assert result == "http://www.w3.org/2001/XMLSchema#string", \
|
||||
"xsd:string should expand to full XSD namespace URI"
|
||||
|
||||
def test_fallback_uri_for_instance(self, extractor, ontology_subset_with_uris):
|
||||
"""Test that entity instances get constructed URIs when not in ontology."""
|
||||
result = extractor.expand_uri("recipe:cornish-pasty", ontology_subset_with_uris, "food")
|
||||
|
||||
# Should construct a URI for the instance
|
||||
assert result.startswith("https://trustgraph.ai/food/"), \
|
||||
"Entity instance should get constructed URI under trustgraph.ai domain"
|
||||
assert "cornish-pasty" in result.lower(), \
|
||||
"Instance URI should include normalized entity name"
|
||||
|
||||
def test_already_full_uri_unchanged(self, extractor, ontology_subset_with_uris):
|
||||
"""Test that full URIs are returned unchanged."""
|
||||
full_uri = "http://example.com/custom/entity"
|
||||
result = extractor.expand_uri(full_uri, ontology_subset_with_uris, "food")
|
||||
|
||||
assert result == full_uri, \
|
||||
"Full URIs should be returned unchanged"
|
||||
|
||||
def test_https_uri_unchanged(self, extractor, ontology_subset_with_uris):
|
||||
"""Test that HTTPS URIs are returned unchanged."""
|
||||
full_uri = "https://example.com/custom/entity"
|
||||
result = extractor.expand_uri(full_uri, ontology_subset_with_uris, "food")
|
||||
|
||||
assert result == full_uri, \
|
||||
"HTTPS URIs should be returned unchanged"
|
||||
|
||||
def test_class_without_uri_gets_fallback(self, extractor):
|
||||
"""Test that classes without URI definitions get constructed fallback URIs."""
|
||||
# Create subset with class that has no URI
|
||||
subset_no_uri = OntologySubset(
|
||||
ontology_id="test",
|
||||
classes={
|
||||
"SomeClass": {
|
||||
"type": "owl:Class",
|
||||
"labels": [{"value": "Some Class"}],
|
||||
# No 'uri' field
|
||||
}
|
||||
},
|
||||
object_properties={},
|
||||
datatype_properties={},
|
||||
metadata={}
|
||||
)
|
||||
|
||||
result = extractor.expand_uri("SomeClass", subset_no_uri, "test")
|
||||
|
||||
assert result == "https://trustgraph.ai/ontology/test#SomeClass", \
|
||||
"Class without URI should get fallback constructed URI"
|
||||
|
||||
def test_property_without_uri_gets_fallback(self, extractor):
|
||||
"""Test that properties without URI definitions get constructed fallback URIs."""
|
||||
subset_no_uri = OntologySubset(
|
||||
ontology_id="test",
|
||||
classes={},
|
||||
object_properties={
|
||||
"someProperty": {
|
||||
"type": "owl:ObjectProperty",
|
||||
# No 'uri' field
|
||||
}
|
||||
},
|
||||
datatype_properties={},
|
||||
metadata={}
|
||||
)
|
||||
|
||||
result = extractor.expand_uri("someProperty", subset_no_uri, "test")
|
||||
|
||||
assert result == "https://trustgraph.ai/ontology/test#someProperty", \
|
||||
"Property without URI should get fallback constructed URI"
|
||||
|
||||
def test_entity_normalization_in_constructed_uri(self, extractor, ontology_subset_with_uris):
|
||||
"""Test that entity names are normalized when constructing URIs."""
|
||||
# Entity with spaces and mixed case
|
||||
result = extractor.expand_uri("Cornish Pasty Recipe", ontology_subset_with_uris, "food")
|
||||
|
||||
# Should be normalized: lowercase, spaces to hyphens
|
||||
assert result == "https://trustgraph.ai/food/cornish-pasty-recipe", \
|
||||
"Entity names should be normalized (lowercase, spaces to hyphens)"
|
||||
|
||||
def test_dict_access_not_object_attribute(self, extractor, ontology_subset_with_uris):
|
||||
"""Test that URI expansion works with dict access (not object attributes).
|
||||
|
||||
This is the key fix - ontology_selector stores cls.__dict__ which means
|
||||
we get dicts, not objects, so we must use dict key access.
|
||||
"""
|
||||
# The ontology_subset_with_uris uses dicts (with 'uri' key)
|
||||
# This test verifies we can access it correctly
|
||||
class_def = ontology_subset_with_uris.classes["Recipe"]
|
||||
|
||||
# Verify it's a dict
|
||||
assert isinstance(class_def, dict), "Class definitions should be dicts"
|
||||
assert "uri" in class_def, "Dict should have 'uri' key"
|
||||
|
||||
# Now test expansion works
|
||||
result = extractor.expand_uri("Recipe", ontology_subset_with_uris, "food")
|
||||
assert result == class_def["uri"], \
|
||||
"URI expansion must work with dict access (not object attributes)"
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
pytest.main([__file__, "-v"])
|
||||
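The prefix handling these tests assert can be sketched as a minimal stand-alone function. This is an illustrative re-implementation under the same prefix table, not the actual `Processor.expand_uri` (which also consults the ontology subset's class and property URI definitions before falling back); `expand_prefixed` is a hypothetical name.

```python
URI_PREFIXES = {
    "rdf:": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs:": "http://www.w3.org/2000/01/rdf-schema#",
    "owl:": "http://www.w3.org/2002/07/owl#",
    "xsd:": "http://www.w3.org/2001/XMLSchema#",
}

def expand_prefixed(name: str, ontology_id: str = "test") -> str:
    # Full URIs pass through unchanged
    if name.startswith(("http://", "https://")):
        return name
    # Known prefixes expand against the table
    for prefix, namespace in URI_PREFIXES.items():
        if name.startswith(prefix):
            return namespace + name[len(prefix):]
    # Fallback: construct a URI under the ontology namespace
    return f"https://trustgraph.ai/ontology/{ontology_id}#{name}"
```

Checking full URIs first mirrors `test_already_full_uri_unchanged` and `test_https_uri_unchanged`; the final branch mirrors the fallback shape asserted in `test_class_without_uri_gets_fallback`.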