# OntoRAG: Ontology-Based Knowledge Extraction and Query Technical Specification
## Overview
OntoRAG is an ontology-driven knowledge extraction and query system that enforces strict semantic consistency both when extracting knowledge triples from unstructured text and when querying the resulting knowledge graph. Similar to GraphRAG, but with formal ontology constraints, OntoRAG ensures that all extracted triples conform to predefined ontological structures and provides semantically aware querying capabilities.
The system uses vector similarity matching to dynamically select relevant ontology subsets for both extraction and query operations, enabling focused and contextually appropriate processing while maintaining semantic validity.
**Service Name**: `kg-extract-ontology`
## Goals
- **Ontology-Conformant Extraction**: Ensure all extracted triples strictly conform to loaded ontologies
- **Dynamic Context Selection**: Use embeddings to select relevant ontology subsets for each chunk
- **Semantic Consistency**: Maintain class hierarchies, property domains/ranges, and constraints
- **Efficient Processing**: Use in-memory vector stores for fast ontology element matching
- **Scalable Architecture**: Support multiple concurrent ontologies with different domains
## Background
Current knowledge extraction services (`kg-extract-definitions`, `kg-extract-relationships`) operate without formal constraints, potentially producing inconsistent or incompatible triples. OntoRAG addresses this by:
1. Loading formal ontologies that define valid classes and properties
2. Using embeddings to match text content with relevant ontology elements
3. Constraining extraction to only produce ontology-conformant triples
4. Providing semantic validation of extracted knowledge
This approach combines the flexibility of neural extraction with the rigor of formal knowledge representation.
## Technical Design
### Architecture
The OntoRAG system consists of the following components:
The Ontology Loader uses TrustGraph's ConfigPush queue to receive event-driven ontology configuration updates. When a configuration element of type "ontology" is added or modified, the loader receives the update via the config-update queue and parses the JSON structure containing metadata, classes, object properties, and datatype properties. These parsed ontologies are stored in memory as structured objects that can be efficiently accessed during the extraction process.
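The parsed in-memory representation described above can be sketched as follows. The field names (`metadata`, `classes`, `subclass_of`, and so on) are illustrative assumptions, not the actual TrustGraph configuration schema:

```python
import json
from dataclasses import dataclass, field

# Hypothetical shapes for a parsed ontology; the real config-update
# payload schema may differ -- field names here are illustrative.
@dataclass
class OntologyClass:
    id: str
    label: str = ""
    comment: str = ""
    parents: list = field(default_factory=list)

@dataclass
class Ontology:
    name: str
    classes: dict = field(default_factory=dict)
    object_properties: dict = field(default_factory=dict)
    datatype_properties: dict = field(default_factory=dict)

def parse_ontology_config(raw: str) -> Ontology:
    """Parse a config-update payload of type 'ontology' into memory."""
    doc = json.loads(raw)
    onto = Ontology(name=doc.get("metadata", {}).get("name", ""))
    for cid, c in doc.get("classes", {}).items():
        onto.classes[cid] = OntologyClass(
            id=cid,
            label=c.get("label", ""),
            comment=c.get("comment", ""),
            parents=c.get("subclass_of", []),
        )
    onto.object_properties = doc.get("object_properties", {})
    onto.datatype_properties = doc.get("datatype_properties", {})
    return onto
```

Keeping the parsed structure as plain objects (rather than raw JSON) lets the embedder and selector access classes and properties by identifier without repeated parsing.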
The Ontology Embedder processes each element in the loaded ontologies (classes, object properties, and datatype properties) and generates vector embeddings using the EmbeddingsClientSpec service. For each element, it combines the element's identifier, labels, and description (comment) to create a text representation. This text is then converted to a high-dimensional vector embedding that captures its semantic meaning. These embeddings are stored in a per-flow in-memory FAISS vector store along with metadata about the element type, source ontology, and full definition. The embedder automatically detects the embedding dimension from the first embedding response.
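The store's behaviour can be sketched in pure Python. This is a stand-in for the FAISS index, not the production implementation: it keeps one normalised vector per ontology element and searches by cosine similarity (inner product of unit vectors), auto-detecting the dimension from the first embedding as described above:

```python
import math

class InMemoryVectorStore:
    """Pure-Python stand-in for the per-flow FAISS index."""

    def __init__(self):
        self.dim = None       # detected from the first embedding added
        self.vectors = []
        self.metadata = []

    @staticmethod
    def _normalise(v):
        n = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / n for x in v]

    def add(self, embedding, meta):
        if self.dim is None:
            self.dim = len(embedding)   # dimension auto-detection
        self.vectors.append(self._normalise(embedding))
        self.metadata.append(meta)

    def search(self, query, threshold=0.3):
        """Return (score, metadata) pairs above the similarity threshold."""
        q = self._normalise(query)
        hits = []
        for vec, meta in zip(self.vectors, self.metadata):
            score = sum(a * b for a, b in zip(q, vec))
            if score >= threshold:
                hits.append((score, meta))
        return sorted(hits, reverse=True, key=lambda h: h[0])
```

In production, FAISS's inner-product index over normalised vectors gives the same cosine-similarity semantics with sub-linear search time for large ontologies.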
The Text Processor uses NLTK for sentence tokenization and POS tagging to break down incoming text chunks into sentences. It handles NLTK version compatibility by attempting to download `punkt_tab` and `averaged_perceptron_tagger_eng` with fallbacks to older versions if needed. Each text chunk is split into individual sentences that can be independently matched against ontology elements.
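The sentence-splitting step can be illustrated with a simplified stand-in. The real service uses NLTK's tokenizer with the `punkt_tab` download and fallbacks; the regex below is illustration only and not equivalent to the Punkt model:

```python
import re

# Simplified stand-in for nltk.sent_tokenize: split after sentence-final
# punctuation when the next sentence starts with a capital letter.
_SENT_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

def split_sentences(chunk: str):
    """Split a text chunk into sentences for ontology matching."""
    return [s.strip() for s in _SENT_END.split(chunk.strip()) if s.strip()]
```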
The Ontology Selector performs semantic matching between text segments and ontology elements using FAISS vector similarity search. For each sentence from the text chunk, it generates an embedding and searches the vector store for the most similar ontology elements using cosine similarity with a configurable threshold (default 0.3). After collecting all relevant elements, it performs comprehensive dependency resolution: if a class is selected, its parent classes are included; if a property is selected, its domain and range classes are added. Additionally, for each selected class, it automatically includes **all properties that reference that class** in their domain or range. This ensures the extraction has access to all relevant relationship properties.
The extraction service uses a Jinja2 template loaded from `ontology-prompt.md` which formats the ontology subset and text for LLM extraction. The template dynamically iterates over classes, object properties, and datatype properties using Jinja2 syntax, presenting each with their descriptions, domains, ranges, and hierarchical relationships. The prompt includes strict rules about using only the provided ontology elements and requests JSON output format for consistent parsing.
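The contents of `ontology-prompt.md` are not reproduced in this specification; a hypothetical fragment showing the kind of Jinja2 iteration described above might look like:

```jinja
## Classes
{% for c in classes %}
- {{ c.id }}: {{ c.comment }}{% if c.parents %} (subclass of {{ c.parents | join(", ") }}){% endif %}
{% endfor %}

## Object Properties
{% for p in object_properties %}
- {{ p.id }} (domain: {{ p.domain }}, range: {{ p.range }}): {{ p.comment }}
{% endfor %}

Extract triples from the text below using ONLY the classes and
properties listed above. Respond with JSON:
{"triples": [{"subject": "...", "predicate": "...", "object": "..."}]}

Text: {{ text }}
```

The variable names (`classes`, `object_properties`, `text`) are assumptions for illustration; the real template's context structure is defined by the extraction service.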
The Main Extractor Service (KgExtractOntology) is the orchestration layer that manages the complete extraction workflow. It uses TrustGraph's FlowProcessor pattern with per-flow component initialization. When an ontology configuration update arrives, it initializes or updates the flow-specific components (ontology loader, embedder, text processor, selector). When a text chunk arrives for processing, it coordinates the pipeline: splitting the text into segments, finding relevant ontology elements through vector search, constructing a constrained prompt, calling the prompt service, parsing and validating the response, generating ontology definition triples, and emitting both content triples and entity contexts.
For example, if the selector matches the classes `dog` and `cat` together with the `chase` property:
- Dependencies: animal (parent of dog and cat), lifeform (parent of animal)
- Final subset: the complete mini-ontology with the animal hierarchy and the chase relationship
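The dependency-resolution rules above can be sketched as follows. The data shapes (`classes` mapping to parent lists, `properties` mapping to domain/range) are illustrative, and multiple inheritance is collapsed to a single parent for brevity:

```python
def resolve_dependencies(selected, classes, properties):
    """Expand matched elements per the selector's rules: parents of
    selected classes, domain/range of selected properties, and every
    property whose domain or range references a selected class."""
    subset_classes = set()
    subset_props = set()

    def add_class(cid):
        # walk up the hierarchy, including all ancestors
        while cid and cid not in subset_classes:
            subset_classes.add(cid)
            parents = classes.get(cid, [])
            cid = parents[0] if parents else None  # single parent, for brevity

    for el in selected:
        if el in classes:
            add_class(el)
        elif el in properties:
            subset_props.add(el)
            add_class(properties[el]["domain"])
            add_class(properties[el]["range"])

    # pull in every property that references a selected class
    for pid, p in properties.items():
        if p["domain"] in subset_classes or p["range"] in subset_classes:
            subset_props.add(pid)
    return subset_classes, subset_props
```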
### Triple Validation
**Purpose**: Ensures all extracted triples strictly conform to ontology constraints.
**Validation Algorithm**:
1. **Class Validation**:
- Verify that subjects are instances of classes defined in the ontology subset
- For object properties, verify that objects are also valid class instances
- Check class names against the ontology's class dictionary
- Handle class hierarchies - instances of subclasses are valid for parent class constraints
2. **Property Validation**:
- Confirm predicates correspond to properties in the ontology subset
- Distinguish between object properties (entity-to-entity) and datatype properties (entity-to-literal)
- Verify property names match exactly (considering namespace if present)
3. **Domain/Range Checking**:
- For each property used as predicate, retrieve its domain and range
- Verify the subject's type matches or inherits from the property's domain
- Verify the object's type matches or inherits from the property's range
- For datatype properties, verify the object is a literal of the correct XSD type
4. **Cardinality Validation**:
- Track property usage counts per subject
- Check minimum cardinality - ensure required properties are present
- Check maximum cardinality - ensure property isn't used too many times
- For functional properties, ensure at most one value per subject
5. **Datatype Validation**:
- Parse literal values according to their declared XSD types
- Validate integers are valid numbers, dates are properly formatted, etc.
- Check string patterns if regex constraints are defined
- Ensure URIs are well-formed for xsd:anyURI types
**Validation Example**:
Triple: ("Buddy", "has-owner", "John")
- Check "Buddy" is typed as a class that can have "has-owner" property
- Check "has-owner" exists in the ontology
- Verify domain constraint: subject must be of type "Pet" or subclass
- Verify range constraint: object must be of type "Person" or subclass
- If valid, add to output; if invalid, log violation and skip
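The domain/range portion of this check (steps 2 and 3) can be sketched as below; cardinality and datatype validation are omitted for brevity, and the flat parent-pointer hierarchy is an illustrative simplification:

```python
def is_subclass_of(cls, ancestor, hierarchy):
    """True if cls equals ancestor or inherits from it.
    `hierarchy` maps class id -> parent id (or None at the root)."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = hierarchy.get(cls)
    return False

def validate_triple(subject_type, predicate, object_type,
                    properties, hierarchy):
    """Domain/range check for one extracted triple; returns
    (valid, reason) so violations can be logged and skipped."""
    prop = properties.get(predicate)
    if prop is None:
        return False, f"unknown property {predicate!r}"
    if not is_subclass_of(subject_type, prop["domain"], hierarchy):
        return False, f"domain violation: {subject_type} is not a {prop['domain']}"
    if not is_subclass_of(object_type, prop["range"], hierarchy):
        return False, f"range violation: {object_type} is not a {prop['range']}"
    return True, "ok"
```

With a hierarchy where `dog` is a subclass of `Pet`, the triple ("Buddy" typed as `dog`, "has-owner", "John" typed as `person`) passes because subclass instances satisfy parent-class domain constraints.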
## Performance Considerations
### Optimisation Strategies
1. **Embedding Caching**: Cache embeddings for frequently used text segments
2. **Batch Processing**: Process multiple segments in parallel
3. **Vector Store Indexing**: Use approximate nearest neighbor algorithms for large ontologies
4. **Prompt Optimisation**: Minimise prompt size by including only essential ontology elements
5. **Result Caching**: Cache extraction results for identical chunks
### Scalability
- **Horizontal Scaling**: Multiple extractor instances with shared ontology cache
- **Ontology Partitioning**: Split large ontologies by domain
- **Streaming Processing**: Process chunks as they arrive without batching
- **Memory Management**: Periodic cleanup of unused embeddings
## Error Handling
### Failure Scenarios
1. **Missing Ontologies**: Fallback to unconstrained extraction
2. **Embedding Service Failure**: Use cached embeddings or skip semantic matching
3. **Prompt Service Timeout**: Retry with exponential backoff
4. **Invalid Triple Format**: Log and skip malformed triples
5. **Ontology Inconsistencies**: Report conflicts and use most specific valid elements
### Monitoring
Key metrics to track:
- Ontology load time and memory usage
- Embedding generation latency
- Vector search performance
- Prompt service response time
- Triple extraction accuracy
- Ontology conformance rate
## Migration Path
### From Existing Extractors
1. **Parallel Operation**: Run alongside existing extractors initially
2. **Gradual Rollout**: Start with specific document types
3. **Quality Comparison**: Compare output quality with existing extractors
4. **Full Migration**: Replace existing extractors once quality is verified
### Ontology Development
1. **Bootstrap from Existing**: Generate initial ontologies from existing knowledge
2. **Iterative Refinement**: Refine based on extraction patterns
3. **Domain Expert Review**: Validate with subject matter experts
4. **Continuous Improvement**: Update based on extraction feedback
## Ontology-Sensitive Query Service
### Overview
The ontology-sensitive query service provides multiple query paths to support different backend graph stores. It leverages ontology knowledge for precise, semantically aware question answering across both Cassandra (via SPARQL) and Cypher-based graph stores (Neo4j, Memgraph, FalkorDB).
**Service Components**:
- `onto-query-sparql`: Converts natural language to SPARQL for Cassandra
- `sparql-cassandra`: SPARQL query layer for Cassandra using rdflib
- `onto-query-cypher`: Converts natural language to Cypher for graph databases
- `cypher-executor`: Cypher query execution for Neo4j/Memgraph/FalkorDB
#### 1. Question Analyser
**Purpose**: Decomposes user questions into semantic components for ontology matching.
**Algorithm Description**:
The Question Analyser takes the incoming natural language question and breaks it down into meaningful segments using the same sentence splitting approach as the extraction pipeline. It identifies key entities, relationships, and constraints mentioned in the question. Each segment is analysed for question type (factual, aggregation, comparison, etc.) and the expected answer format. This decomposition helps identify which parts of the ontology are most relevant for answering the question.
**Key Operations**:
- Split question into sentences and phrases
- Identify question type and intent
- Extract mentioned entities and relationships
- Detect constraints and filters in the question
- Determine expected answer format
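The question-type step above could be prototyped with a simple keyword heuristic. This is illustration only; the production analyser may well use an LLM or a trained classifier rather than cue words:

```python
# Cue lists are illustrative assumptions, not the service's actual rules.
AGGREGATION_CUES = ("how many", "count", "average", "total")
COMPARISON_CUES = ("more than", "less than", "compare", "versus")

def classify_question(question: str) -> str:
    """Classify a question as aggregation, comparison, or factual."""
    q = question.lower()
    if any(cue in q for cue in AGGREGATION_CUES):
        return "aggregation"
    if any(cue in q for cue in COMPARISON_CUES):
        return "comparison"
    return "factual"
```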
#### 2. Ontology Matcher for Queries
**Purpose**: Identifies the relevant ontology subset needed to answer the question.
**Algorithm Description**:
Similar to the extraction pipeline's Ontology Selector, but optimised for question answering. The matcher generates embeddings for question segments and searches the vector store for relevant ontology elements. However, it focuses on finding concepts that would be useful for query construction rather than extraction. It expands the selection to include related properties that might be traversed during graph exploration, even if not explicitly mentioned in the question. For example, if asked about "employees," it might include properties like "works-for," "manages," and "reports-to" that could be relevant for finding employee information.
**Matching Strategy**:
- Embed question segments
- Find directly mentioned ontology concepts
- Include properties that connect mentioned classes
- Add inverse and related properties for traversal
- Include parent/child classes for hierarchical queries
- Build query-focused ontology partition
#### 3. Backend Router
**Purpose**: Routes queries to the appropriate backend-specific query path based on configuration.
**Algorithm Description**:
The Backend Router examines the system configuration to determine which graph backend is active (Cassandra or Cypher-based). It routes the question and ontology partition to the appropriate query generation service. The router can also support load balancing across multiple backends or fallback mechanisms if the primary backend is unavailable.
**Routing Logic**:
- Check configured backend type from system settings
- Route to `onto-query-sparql` for Cassandra backends
- Route to `onto-query-cypher` for Neo4j/Memgraph/FalkorDB
- Support multi-backend configurations with query distribution
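A minimal sketch of this routing decision, using the service names from the component list above (the configuration key that carries the backend type is not specified here):

```python
CYPHER_BACKENDS = {"neo4j", "memgraph", "falkordb"}

def route_query(backend: str) -> str:
    """Pick the query-generation service for the configured backend."""
    b = backend.lower()
    if b == "cassandra":
        return "onto-query-sparql"
    if b in CYPHER_BACKENDS:
        return "onto-query-cypher"
    raise ValueError(f"unsupported graph backend: {backend}")
```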
#### 4. SPARQL Query Generator (`onto-query-sparql`)
**Purpose**: Converts natural language questions to SPARQL queries for Cassandra execution.
**Algorithm Description**:
The SPARQL query generator takes the question and ontology partition and constructs a SPARQL query optimised for execution against the Cassandra backend. It uses the prompt service with a SPARQL-specific template that includes RDF/OWL semantics. The generator understands SPARQL patterns like property paths, optional clauses, and filters that can efficiently translate to Cassandra operations.
**SPARQL Generation Prompt Template**:
```
Generate a SPARQL query for the following question using the provided ontology.
```
#### 5. Cypher Query Generator (`onto-query-cypher`)
**Purpose**: Converts natural language questions to Cypher queries for graph databases.
**Algorithm Description**:
The Cypher query generator creates native Cypher queries optimised for Neo4j, Memgraph, and FalkorDB. It maps ontology classes to node labels and properties to relationships, using Cypher's pattern matching syntax. The generator includes Cypher-specific optimisations like relationship direction hints, index usage, and query planning hints.
**Cypher Generation Prompt Template**:
```
Generate a Cypher query for the following question using the provided ontology.
```
#### 6. SPARQL-Cassandra Engine (`sparql-cassandra`)
**Purpose**: Executes SPARQL queries against Cassandra using Python rdflib.
**Algorithm Description**:
The SPARQL-Cassandra engine implements a SPARQL processor using Python's rdflib library with a custom Cassandra backend store. It translates SPARQL graph patterns into appropriate Cassandra CQL queries, handling joins, filters, and aggregations. The engine maintains an RDF-to-Cassandra mapping that preserves the semantic structure while optimising for Cassandra's column-family storage model.
**Implementation Features**:
- rdflib Store interface implementation for Cassandra
- SPARQL 1.1 query support with common patterns
- Efficient translation of triple patterns to CQL
- Support for property paths and hierarchical queries
- Result streaming for large datasets
- Connection pooling and query caching
**Example Translation**:
```sparql
SELECT ?animal WHERE {
  ?animal rdf:type :Animal .
  ?animal :hasOwner "John" .
}
```
Translates to optimised Cassandra queries leveraging indexes and partition keys.
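One triple pattern of that translation can be sketched as below. The flat `triples(s, p, o)` table and the `ALLOW FILTERING` fallback are illustrative assumptions only; the real store's column-family layout is designed around partition keys and indexes rather than filtering:

```python
def pattern_to_cql(s=None, p=None, o=None, table="triples"):
    """Translate one SPARQL triple pattern into a parameterised CQL
    SELECT. Unbound terms (None) become free variables; bound terms
    become equality conditions. Hypothetical flat table layout."""
    conditions, params = [], []
    for col, term in (("s", s), ("p", p), ("o", o)):
        if term is not None:
            conditions.append(f"{col} = %s")
            params.append(term)
    where = " WHERE " + " AND ".join(conditions) if conditions else ""
    return f"SELECT s, p, o FROM {table}{where} ALLOW FILTERING", params
```

For the example above, the pattern `?animal rdf:type :Animal` binds only the predicate and object, leaving the subject as the result variable.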
#### 7. Cypher Query Executor (`cypher-executor`)
**Purpose**: Executes Cypher queries against Neo4j, Memgraph, and FalkorDB.
**Algorithm Description**:
The Cypher executor provides a unified interface for executing Cypher queries across different graph databases. It handles database-specific connection protocols, query optimisation hints, and result format normalisation. The executor includes retry logic, connection pooling, and transaction management appropriate for each database type.
**Multi-Database Support**:
- **Neo4j**: Bolt protocol, transaction functions, index hints
#### 8. Answer Generator
**Purpose**: Synthesises a natural language answer from query results.
**Algorithm Description**:
The Answer Generator takes the structured query results and the original question, then uses the prompt service to generate a comprehensive answer. Unlike simple template-based responses, it uses an LLM to interpret the graph data in the context of the question, handling complex relationships, aggregations, and inferences. The generator can explain its reasoning by referencing the ontology structure and the specific triples retrieved from the graph.
**Answer Generation Process**:
- Format query results into structured context
- Include relevant ontology definitions for clarity
- Construct prompt with question and results
- Generate natural language answer via LLM
- Validate answer against query intent
- Add citations to specific graph entities if needed
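The prompt-assembly steps above can be sketched as a plain string builder. The layout is an assumption for illustration; the production template lives in the prompt service:

```python
def build_answer_prompt(question, rows, definitions):
    """Assemble an LLM prompt from query results and the ontology
    definitions relevant to the question (illustrative structure)."""
    lines = ["Answer the question using only the graph results below.", ""]
    lines.append("Ontology definitions:")
    lines += [f"- {term}: {desc}" for term, desc in definitions.items()]
    lines.append("")
    lines.append("Query results:")
    lines += [f"- {row}" for row in rows]
    lines += ["", f"Question: {question}", "Answer:"]
    return "\n".join(lines)
```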
### Configuration
```yaml
shared_with_extractor: true   # Reuse kg-extract-ontology's store
query_builder:
  model: "gpt-4"
  temperature: 0.1
  max_query_length: 1000
graph_executor:
  timeout: 30000   # ms
  max_results: 1000
answer_generator:
  model: "gpt-4"
  temperature: 0.3
  max_tokens: 500
```
### Performance Optimisations
#### Query Optimisation
- **Ontology Pruning**: Only include necessary ontology elements in prompts
- **Query Caching**: Cache frequently asked questions and their queries
- **Result Caching**: Store results for identical queries within time window
- **Batch Processing**: Handle multiple related questions in single graph traversal
#### Scalability Considerations
- **Distributed Execution**: Parallelise subqueries across graph partitions
- **Incremental Results**: Stream results for large datasets
- **Load Balancing**: Distribute query load across multiple service instances
- **Resource Pools**: Manage connection pools to graph databases
### Error Handling
#### Failure Scenarios
1. **Invalid Query Generation**: Fallback to GraphRAG or simple keyword search
2. **Ontology Mismatch**: Expand search to broader ontology subset
3. **Query Timeout**: Simplify query or increase timeout
4. **Empty Results**: Suggest query reformulation or related questions
5. **LLM Service Failure**: Use cached queries or template-based responses
### Monitoring Metrics
- Question complexity distribution
- Ontology partition sizes
- Query generation success rate
- Graph query execution time
- Answer quality scores
- Cache hit rates
- Error frequencies by type
## Future Enhancements
1. **Ontology Learning**: Automatically extend ontologies based on extraction patterns
2. **Confidence Scoring**: Assign confidence scores to extracted triples
3. **Explanation Generation**: Provide reasoning for triple extraction
4. **Active Learning**: Request human validation for uncertain extractions
## Security Considerations
1. **Prompt Injection Prevention**: Sanitise chunk text before prompt construction
2. **Resource Limits**: Cap memory usage for vector store
3. **Rate Limiting**: Limit extraction requests per client
4. **Audit Logging**: Track all extraction requests and results
## Testing Strategy
### Unit Testing
- Ontology loader with various formats
- Embedding generation and storage
- Sentence splitting algorithms
- Vector similarity calculations
- Triple parsing and validation
### Integration Testing
- End-to-end extraction pipeline
- Configuration service integration
- Prompt service interaction
- Concurrent extraction handling
### Performance Testing
- Large ontology handling (1000+ classes)
- High-volume chunk processing
- Memory usage under load
- Latency benchmarks
## Delivery Plan
### Overview
The OntoRAG system will be delivered in four major phases, with each phase providing incremental value while building toward the complete system. The plan focuses on establishing core extraction capabilities first, then adding query functionality, followed by optimisations and advanced features.
### Phase 1: Foundation and Core Extraction
**Goal**: Establish the basic ontology-driven extraction pipeline with simple vector matching.