mirror of https://github.com/trustgraph-ai/trustgraph.git synced 2026-04-25 00:16:23 +02:00

OntoRAG: Ontology-Based Knowledge Extraction and Query Technical Specification (#523)

* Onto-rag tech spec

* New processor kg-extract-ontology, use 'ontology' objects from config to guide triple extraction

* Also entity contexts

* Integrate with ontology extractor from workbench

This is the first phase: extraction is tested and working, and GraphRAG over the extracted knowledge also works.

2025-11-12 20:38:08 +00:00


OntoRAG: Ontology-Based Knowledge Extraction and Query Technical Specification

Overview

OntoRAG is an ontology-driven knowledge extraction and query system that enforces strict semantic consistency during both the extraction of knowledge triples from unstructured text and the querying of the resulting knowledge graph. Similar to GraphRAG but with formal ontology constraints, OntoRAG ensures all extracted triples conform to predefined ontological structures and provides semantically-aware querying capabilities.

The system uses vector similarity matching to dynamically select relevant ontology subsets for both extraction and query operations, enabling focused and contextually appropriate processing while maintaining semantic validity.

Service Name: kg-extract-ontology

Goals

Ontology-Conformant Extraction: Ensure all extracted triples strictly conform to loaded ontologies
Dynamic Context Selection: Use embeddings to select relevant ontology subsets for each chunk
Semantic Consistency: Maintain class hierarchies, property domains/ranges, and constraints
Efficient Processing: Use in-memory vector stores for fast ontology element matching
Scalable Architecture: Support multiple concurrent ontologies with different domains

Background

Current knowledge extraction services (kg-extract-definitions, kg-extract-relationships) operate without formal constraints, potentially producing inconsistent or incompatible triples. OntoRAG addresses this by:

Loading formal ontologies that define valid classes and properties
Using embeddings to match text content with relevant ontology elements
Constraining extraction to only produce ontology-conformant triples
Providing semantic validation of extracted knowledge

This approach combines the flexibility of neural extraction with the rigor of formal knowledge representation.

Technical Design

Architecture

The OntoRAG system consists of the following components:

┌─────────────────┐
│  Configuration  │
│    Service      │
└────────┬────────┘
         │ Ontologies
         ▼
┌─────────────────┐     ┌──────────────┐
│ kg-extract-     │────▶│  Embedding   │
│   ontology      │     │   Service    │
└────────┬────────┘     └──────────────┘
         │                      │
         ▼                      ▼
┌─────────────────┐     ┌──────────────┐
│   In-Memory     │◀────│   Ontology   │
│  Vector Store   │     │   Embedder   │
└────────┬────────┘     └──────────────┘
         │
         ▼
┌─────────────────┐     ┌──────────────┐
│    Sentence     │────▶│   Chunker    │
│    Splitter     │     │   Service    │
└────────┬────────┘     └──────────────┘
         │
         ▼
┌─────────────────┐     ┌──────────────┐
│    Ontology     │────▶│   Vector     │
│    Selector     │     │   Search     │
└────────┬────────┘     └──────────────┘
         │
         ▼
┌─────────────────┐     ┌──────────────┐
│    Prompt       │────▶│   Prompt     │
│   Constructor   │     │   Service    │
└────────┬────────┘     └──────────────┘
         │
         ▼
┌─────────────────┐
│  Triple Output  │
└─────────────────┘

Component Details

1. Ontology Loader

Purpose: Retrieves and parses ontology configurations from the configuration service at service startup.

Algorithm Description: The Ontology Loader connects to the configuration service and requests all configuration items of type "ontology". For each ontology configuration found, it parses the JSON structure containing metadata, classes, object properties, and datatype properties. These parsed ontologies are stored in memory as structured objects that can be efficiently accessed during the extraction process. The loader runs once during service initialisation and can optionally refresh ontologies at configured intervals to pick up updates.

Key Operations:

Query configuration service for all ontology-type configurations
Parse JSON ontology structures into internal object models
Validate ontology structure and consistency
Cache parsed ontologies in memory for fast access

Loads ontologies from the configuration service during initialisation:

class OntologyLoader:
    def __init__(self, config_service):
        self.config_service = config_service
        self.ontologies = {}

    async def load_ontologies(self):
        # Fetch all ontology configurations
        configs = await self.config_service.get_configs(type="ontology")

        for config_id, ontology_data in configs:
            self.ontologies[config_id] = Ontology(
                metadata=ontology_data['metadata'],
                classes=ontology_data['classes'],
                object_properties=ontology_data['objectProperties'],
                datatype_properties=ontology_data['datatypeProperties']
            )

        return self.ontologies
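
The "Validate ontology structure and consistency" operation is not shown in the loader above. The following is a minimal sketch of what such a check might look like, assuming (as the formatting code later in this spec suggests) that `rdfs:subClassOf`, `rdfs:domain` and `rdfs:range` each hold a single class ID string. The function name `validate_ontology` is hypothetical and not part of the spec.

```python
def validate_ontology(ontology_data):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []

    # The loader expects these four top-level keys.
    for key in ("metadata", "classes", "objectProperties", "datatypeProperties"):
        if key not in ontology_data:
            problems.append(f"missing top-level key: {key}")

    classes = ontology_data.get("classes", {})

    # Every declared parent class must itself be defined.
    for class_id, class_def in classes.items():
        parent = class_def.get("rdfs:subClassOf")
        if parent is not None and parent not in classes:
            problems.append(f"class {class_id}: unknown parent {parent}")

    # Object property domains and ranges must reference defined classes.
    for prop_id, prop_def in ontology_data.get("objectProperties", {}).items():
        for field in ("rdfs:domain", "rdfs:range"):
            target = prop_def.get(field)
            if target is not None and target not in classes:
                problems.append(f"property {prop_id}: unknown {field} {target}")

    return problems
```

A loader could call this before constructing each `Ontology` object and skip or log any configuration that returns a non-empty list.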

2. Ontology Embedder

Purpose: Creates vector embeddings for all ontology elements to enable semantic similarity matching.

Algorithm Description: The Ontology Embedder processes each element in the loaded ontologies (classes, object properties, and datatype properties) and generates vector embeddings using an embedding service. For each element, it combines the element's identifier with its description (from rdfs:comment) to create a text representation. This text is then converted to a high-dimensional vector embedding that captures its semantic meaning. These embeddings are stored in an in-memory vector store along with metadata about the element type, source ontology, and full definition. This preprocessing step happens once at startup, creating a searchable index of all ontology concepts.

Key Operations:

Concatenate element IDs with their descriptions for rich semantic representation
Generate embeddings via external embedding service (e.g., text-embedding-3-small)
Store embeddings with comprehensive metadata in vector store
Index by ontology, element type, and element ID for efficient retrieval

Generates embeddings for ontology elements and stores them in an in-memory vector store:

class OntologyEmbedder:
    def __init__(self, embedding_service, vector_store):
        self.embedding_service = embedding_service
        self.vector_store = vector_store

    async def embed_ontologies(self, ontologies):
        for onto_id, ontology in ontologies.items():
            # Embed classes
            for class_id, class_def in ontology.classes.items():
                text = f"{class_id} {class_def.get('rdfs:comment', '')}"
                embedding = await self.embedding_service.embed(text)

                self.vector_store.add(
                    id=f"{onto_id}:class:{class_id}",
                    embedding=embedding,
                    metadata={
                        'type': 'class',
                        'ontology': onto_id,
                        'element': class_id,
                        'definition': class_def
                    }
                )

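            # Embed properties (object and datatype). The Ontology object
            # exposes them as snake_case attributes (see OntologyLoader),
            # while the metadata keeps the camelCase type names.
            for prop_type, props in [
                ('objectProperties', ontology.object_properties),
                ('datatypeProperties', ontology.datatype_properties),
            ]:
                for prop_id, prop_def in props.items():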
                    text = f"{prop_id} {prop_def.get('rdfs:comment', '')}"
                    embedding = await self.embedding_service.embed(text)

                    self.vector_store.add(
                        id=f"{onto_id}:{prop_type}:{prop_id}",
                        embedding=embedding,
                        metadata={
                            'type': prop_type,
                            'ontology': onto_id,
                            'element': prop_id,
                            'definition': prop_def
                        }
                    )

3. Sentence Splitter

Purpose: Decomposes text chunks into fine-grained segments for precise ontology matching.

Algorithm Description: The Sentence Splitter takes incoming text chunks and breaks them down into smaller, more manageable units. First, it uses natural language processing techniques (via NLTK or spaCy) to identify sentence boundaries, handling edge cases like abbreviations and decimal points. Then, for each sentence, it extracts meaningful phrases including noun phrases (e.g., "the red car"), verb phrases (e.g., "quickly ran"), and named entities. This multi-level segmentation ensures that both complete thoughts (sentences) and specific concepts (phrases) can be matched against ontology elements. Each segment is tagged with its type and position information to maintain context.

Key Operations:

Split text into sentences using NLP sentence detection
Extract noun phrases and verb phrases from each sentence
Identify named entities and key terms
Maintain hierarchical relationship between sentences and their phrases
Preserve positional information for context reconstruction

Breaks incoming chunks into smaller sentences and phrases for granular matching:

class SentenceSplitter:
    def __init__(self):
        # Use NLTK or spaCy for sophisticated splitting
        self.sentence_detector = SentenceDetector()
        self.phrase_extractor = PhraseExtractor()

    def split_chunk(self, chunk_text):
        sentences = self.sentence_detector.split(chunk_text)

        segments = []
        for sentence in sentences:
            # Add full sentence
            segments.append({
                'text': sentence,
                'type': 'sentence',
                'position': len(segments)
            })

            # Extract noun phrases and verb phrases
            phrases = self.phrase_extractor.extract(sentence)
            for phrase in phrases:
                segments.append({
                    'text': phrase,
                    'type': 'phrase',
                    'parent_sentence': sentence,
                    'position': len(segments)
                })

        return segments
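
`SentenceDetector` is left undefined above. As a stand-in for local experiments only, a regex-based detector could look like the sketch below. `RegexSentenceDetector` is a hypothetical name, and the approach is deliberately naive: it does not handle abbreviations such as "Dr. Smith", which is the reason the spec calls for NLTK or spaCy.

```python
import re

# Naive boundary: '.', '!' or '?' followed by whitespace and an uppercase letter.
# Decimal points such as "3.5" are not split because no whitespace follows the dot.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


class RegexSentenceDetector:
    """Hypothetical drop-in for SentenceDetector with the same split() method."""

    def split(self, text):
        parts = _SENTENCE_BOUNDARY.split(text.strip())
        return [part for part in parts if part]
```

Because it exposes the same `split()` method, it can be swapped for a real NLP-backed detector without changing `SentenceSplitter`.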

4. Ontology Selector

Purpose: Identifies the most relevant subset of ontology elements for the current text chunk.

Algorithm Description: The Ontology Selector performs semantic matching between text segments and ontology elements using vector similarity search. For each sentence and phrase from the text chunk, it generates an embedding and searches the vector store for the most similar ontology elements. The search uses cosine similarity with a configurable threshold (e.g., 0.7) to find semantically related concepts. After collecting all relevant elements, it performs dependency resolution to ensure completeness - if a class is selected, its parent classes are included; if a property is selected, its domain and range classes are added. This creates a minimal but complete ontology subset that contains all necessary elements for valid triple extraction while avoiding irrelevant concepts that could confuse the extraction process.

Key Operations:

Generate embeddings for each text segment (sentences and phrases)
Perform k-nearest neighbor search in the vector store
Apply similarity threshold to filter weak matches
Resolve dependencies (parent classes, domains, ranges)
Construct coherent ontology subset with all required relationships
Deduplicate elements appearing multiple times

Uses vector similarity to find relevant ontology elements for each text segment:

class OntologySelector:
    def __init__(self, embedding_service, vector_store):
        self.embedding_service = embedding_service
        self.vector_store = vector_store

    async def select_ontology_subset(self, segments, top_k=10):
        # Keyed by (ontology, type, element) so repeated matches are
        # deduplicated; definitions are dicts and cannot be stored in a set
        relevant_elements = {}

        for segment in segments:
            # Get embedding for segment
            embedding = await self.embedding_service.embed(segment['text'])

            # Search for similar ontology elements
            results = self.vector_store.search(
                embedding=embedding,
                top_k=top_k,
                threshold=0.7  # Similarity threshold
            )

            for result in results:
                meta = result['metadata']
                key = (meta['ontology'], meta['type'], meta['element'])
                relevant_elements[key] = meta['definition']

        # Build ontology subset
        return self._build_subset(
            [(*key, definition) for key, definition in relevant_elements.items()]
        )

    def _build_subset(self, elements):
        # Include selected elements and their dependencies
        # (parent classes, domain/range references, etc.)
        subset = {
            'classes': {},
            'objectProperties': {},
            'datatypeProperties': {}
        }

        for onto_id, elem_type, elem_id, definition in elements:
            if elem_type == 'class':
                subset['classes'][elem_id] = definition
                # Include parent classes
                if 'rdfs:subClassOf' in definition:
                    parent = definition['rdfs:subClassOf']
                    # Recursively add parent from full ontology
            elif elem_type == 'objectProperties':
                subset['objectProperties'][elem_id] = definition
                # Include domain and range classes
            elif elem_type == 'datatypeProperties':
                subset['datatypeProperties'][elem_id] = definition

        return subset
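
The dependency-resolution steps in `_build_subset` are only indicated by comments. One way they might be filled in is sketched below, assuming the selector has access to the full class dictionary of the ontology and that `rdfs:subClassOf`, `rdfs:domain` and `rdfs:range` each hold a single class ID. The helper name `resolve_dependencies` is hypothetical.

```python
def resolve_dependencies(selected_class_ids, selected_properties, all_classes):
    """
    Close a selection over parent classes and property domains/ranges.

    selected_class_ids: iterable of class IDs matched by vector search
    selected_properties: dict of property ID -> definition
    all_classes: dict of class ID -> definition from the full ontology
    Returns a dict of class ID -> definition.
    """
    pending = list(selected_class_ids)

    # Domain and range classes of every selected property are required.
    for prop_def in selected_properties.values():
        for field in ("rdfs:domain", "rdfs:range"):
            target = prop_def.get(field)
            if target in all_classes:
                pending.append(target)

    resolved = {}
    while pending:
        class_id = pending.pop()
        # Skip classes already handled (also guards against cycles)
        # and IDs that the full ontology does not define.
        if class_id in resolved or class_id not in all_classes:
            continue
        resolved[class_id] = all_classes[class_id]
        parent = all_classes[class_id].get("rdfs:subClassOf")
        if parent is not None:
            pending.append(parent)

    return resolved
```

An iterative worklist is used instead of recursion so that deep hierarchies or accidental cycles in the ontology cannot exhaust the call stack.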

5. Prompt Constructor

Purpose: Creates structured prompts that guide the LLM to extract only ontology-conformant triples.

Algorithm Description: The Prompt Constructor assembles a carefully formatted prompt that constrains the LLM's extraction to the selected ontology subset. It takes the relevant classes and properties identified by the Ontology Selector and formats them into clear instructions. Classes are presented with their hierarchical relationships and descriptions. Properties are shown with their domain and range constraints, making explicit what types of entities they can connect. The prompt includes strict rules about using only the provided ontology elements and respecting all constraints. The original text chunk is then appended, and the LLM is instructed to extract triples in the format (subject, predicate, object). This structured approach ensures the LLM understands both what to look for and what constraints to respect.

Key Operations:

Format classes with parent relationships and descriptions
Format properties with domain/range constraints
Include explicit extraction rules and constraints
Specify output format for consistent parsing
Balance prompt size with completeness of ontology information

Builds prompts for the extraction service with ontology constraints:

class PromptConstructor:
    def __init__(self):
        self.template = """
Extract knowledge triples from the following text using ONLY the provided ontology elements.

ONTOLOGY CLASSES:
{classes}

OBJECT PROPERTIES (connect entities):
{object_properties}

DATATYPE PROPERTIES (entity attributes):
{datatype_properties}

RULES:
1. Only use classes defined above for entity types
2. Only use properties defined above for relationships and attributes
3. Respect domain and range constraints
4. Output format: (subject, predicate, object)

TEXT:
{text}

TRIPLES:
"""

    def build_prompt(self, chunk_text, ontology_subset):
        classes_str = self._format_classes(ontology_subset['classes'])
        obj_props_str = self._format_properties(
            ontology_subset['objectProperties'],
            'object'
        )
        dt_props_str = self._format_properties(
            ontology_subset['datatypeProperties'],
            'datatype'
        )

        return self.template.format(
            classes=classes_str,
            object_properties=obj_props_str,
            datatype_properties=dt_props_str,
            text=chunk_text
        )

    def _format_classes(self, classes):
        lines = []
        for class_id, definition in classes.items():
            comment = definition.get('rdfs:comment', '')
            parent = definition.get('rdfs:subClassOf', 'Thing')
            lines.append(f"- {class_id} (subclass of {parent}): {comment}")
        return '\n'.join(lines)

    def _format_properties(self, properties, prop_type):
        lines = []
        for prop_id, definition in properties.items():
            comment = definition.get('rdfs:comment', '')
            domain = definition.get('rdfs:domain', 'Any')
            range_val = definition.get('rdfs:range', 'Any')
            lines.append(f"- {prop_id} ({domain} -> {range_val}): {comment}")
        return '\n'.join(lines)

6. Main Extractor Service

Purpose: Coordinates all components to perform end-to-end ontology-based triple extraction.

Algorithm Description: The Main Extractor Service is the orchestration layer that manages the complete extraction workflow. During initialisation, it loads all ontologies and pre-computes their embeddings, creating the searchable vector index. When a text chunk arrives for processing, it coordinates the pipeline: first splitting the text into segments, then finding relevant ontology elements through vector search, constructing a constrained prompt, calling the LLM service, and finally parsing and validating the response. The service ensures that each extracted triple conforms to the ontology by validating that subjects and objects are valid class instances, predicates are valid properties, and all domain/range constraints are satisfied. Only validated triples that fully conform to the ontology are returned.

Extraction Pipeline:

Receive text chunk for processing
Split into sentences and phrases for granular analysis
Search vector store to find relevant ontology concepts
Build ontology subset including dependencies
Construct prompt with ontology constraints and text
Call LLM service for triple extraction
Parse response into structured triples
Validate each triple against ontology rules
Return only valid, ontology-conformant triples

Orchestrates the complete extraction pipeline:

class KgExtractOntology:
    def __init__(self, config):
        self.loader = OntologyLoader(config['config_service'])
        self.embedder = OntologyEmbedder(
            config['embedding_service'],
            InMemoryVectorStore()
        )
        self.splitter = SentenceSplitter()
        self.selector = OntologySelector(
            config['embedding_service'],
            self.embedder.vector_store
        )
        self.prompt_builder = PromptConstructor()
        self.prompt_service = config['prompt_service']

    async def initialize(self):
        # Load and embed ontologies at startup
        ontologies = await self.loader.load_ontologies()
        await self.embedder.embed_ontologies(ontologies)

    async def extract(self, chunk):
        # Split chunk into segments
        segments = self.splitter.split_chunk(chunk['text'])

        # Select relevant ontology subset
        ontology_subset = await self.selector.select_ontology_subset(segments)

        # Build extraction prompt
        prompt = self.prompt_builder.build_prompt(
            chunk['text'],
            ontology_subset
        )

        # Call prompt service
        response = await self.prompt_service.generate(prompt)

        # Parse and validate triples
        triples = self.parse_triples(response)
        validated_triples = self.validate_triples(triples, ontology_subset)

        return validated_triples

    def parse_triples(self, response):
        # Parse LLM response into structured triples
        triples = []
        for line in response.split('\n'):
            if line.strip().startswith('(') and line.strip().endswith(')'):
                # Parse (subject, predicate, object); maxsplit=2 keeps
                # commas inside the object value intact
                parts = line.strip()[1:-1].split(',', 2)
                if len(parts) == 3:
                    triples.append({
                        'subject': parts[0].strip(),
                        'predicate': parts[1].strip(),
                        'object': parts[2].strip()
                    })
        return triples

    def validate_triples(self, triples, ontology_subset):
        # Validate against ontology constraints
        validated = []
        for triple in triples:
            if self._is_valid(triple, ontology_subset):
                validated.append(triple)
        return validated

Configuration

The service loads configuration on startup:

kg-extract-ontology:
  embedding_model: "text-embedding-3-small"
  vector_store:
    type: "in-memory"
    similarity_threshold: 0.7
    top_k: 10
  sentence_splitter:
    model: "nltk"
    max_sentence_length: 512
  prompt_service:
    endpoint: "http://prompt-service:8080"
    model: "gpt-4"
    temperature: 0.1
  ontology_refresh_interval: 300  # seconds

Data Flow

Initialisation Phase:
- Load ontologies from configuration service
- Generate embeddings for all ontology elements
- Store embeddings in in-memory vector store
Extraction Phase (per chunk):
- Split chunk into sentences and phrases
- Compute embeddings for each segment
- Search vector store for relevant ontology elements
- Build ontology subset with selected elements
- Construct prompt with chunk text and ontology subset
- Call prompt service for extraction
- Parse and validate returned triples
- Output conformant triples

In-Memory Vector Store

Purpose: Provides fast, memory-based similarity search for ontology element matching.

Recommended Implementation: FAISS

The system should use FAISS (Facebook AI Similarity Search) as the primary vector store implementation for the following reasons:

Performance: Optimised for low-latency similarity search, which matters for real-time query processing
Memory Efficiency: Multiple index types (Flat, IVF, HNSW) allow memory/speed tradeoffs based on ontology size
Scalability: Efficiently handles hundreds to tens of thousands of ontology elements
Production Ready: Battle-tested in production environments with excellent stability
Python Integration: Native Python bindings with numpy compatibility for seamless integration

FAISS Implementation:

import faiss
import numpy as np

class FAISSVectorStore:
    def __init__(self, dimension=1536, index_type='flat'):
        """
        Initialize FAISS vector store.

        Args:
            dimension: Embedding dimension (1536 for text-embedding-3-small)
            index_type: 'flat' for exact search, 'ivf' for larger datasets
        """
        self.dimension = dimension
        self.metadata = []
        self.ids = []

        if index_type == 'flat':
            # Exact search - best for ontologies with <10k elements
            self.index = faiss.IndexFlatIP(dimension)
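        else:
            # Approximate search - for larger ontologies.
            # The metric must be set explicitly: IndexIVFFlat defaults to L2,
            # which would break the cosine-similarity threshold in search().
            quantizer = faiss.IndexFlatIP(dimension)
            self.index = faiss.IndexIVFFlat(
                quantizer, dimension, 100, faiss.METRIC_INNER_PRODUCT
            )
            # Placeholder training data; in practice, train on the real
            # (normalised) ontology embeddings before adding vectors
            self.index.train(np.random.randn(1000, dimension).astype('float32'))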

    def add(self, id, embedding, metadata):
        """Add single embedding with metadata."""
        # Normalize for cosine similarity (accepts lists or arrays)
        embedding = np.asarray(embedding, dtype=np.float32)
        embedding = embedding / np.linalg.norm(embedding)
        self.index.add(np.array([embedding], dtype=np.float32))
        self.metadata.append(metadata)
        self.ids.append(id)

    def add_batch(self, ids, embeddings, metadata_list):
        """Batch add for initial ontology loading."""
        # Normalize all embeddings
        norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
        normalized = embeddings / norms
        self.index.add(normalized.astype(np.float32))
        self.metadata.extend(metadata_list)
        self.ids.extend(ids)

    def search(self, embedding, top_k=10, threshold=0.0):
        """Search for similar vectors."""
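        # An empty index has nothing to return (and k would be 0)
        if self.index.ntotal == 0:
            return []

        # Normalize query (accepts lists or arrays)
        embedding = np.asarray(embedding, dtype=np.float32)
        embedding = embedding / np.linalg.norm(embedding)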

        # Search
        scores, indices = self.index.search(
            np.array([embedding], dtype=np.float32),
            min(top_k, self.index.ntotal)
        )

        # Filter by threshold and format results
        results = []
        for score, idx in zip(scores[0], indices[0]):
            if idx >= 0 and score >= threshold:  # FAISS returns -1 for empty slots
                results.append({
                    'id': self.ids[idx],
                    'score': float(score),
                    'metadata': self.metadata[idx]
                })

        return results

    def clear(self):
        """Reset the store."""
        self.index.reset()
        self.metadata = []
        self.ids = []

    def size(self):
        """Return number of stored vectors."""
        return self.index.ntotal

Fallback Implementation (NumPy):

For development or small-scale deployments, a simple NumPy implementation can be used:

import numpy as np

class SimpleVectorStore:
    """Fallback implementation using NumPy - suitable for <1000 elements."""
    def __init__(self):
        self.embeddings = []
        self.metadata = []
        self.ids = []

    def add(self, id, embedding, metadata):
        self.embeddings.append(embedding / np.linalg.norm(embedding))
        self.metadata.append(metadata)
        self.ids.append(id)

    def search(self, embedding, top_k=10, threshold=0.0):
        if not self.embeddings:
            return []

        # Normalize and compute similarities
        embedding = embedding / np.linalg.norm(embedding)
        similarities = np.dot(self.embeddings, embedding)

        # Get top-k indices
        top_indices = np.argsort(similarities)[::-1][:top_k]

        # Build results
        results = []
        for idx in top_indices:
            if similarities[idx] >= threshold:
                results.append({
                    'id': self.ids[idx],
                    'score': float(similarities[idx]),
                    'metadata': self.metadata[idx]
                })

        return results

Ontology Subset Selection Algorithm

Purpose: Dynamically selects the minimal relevant portion of the ontology for each text chunk.

Detailed Algorithm Steps:

Text Segmentation:
- Split the input chunk into sentences using NLP sentence detection
- Extract noun phrases, verb phrases, and named entities from each sentence
- Create a hierarchical structure of segments preserving context
Embedding Generation:
- Generate vector embeddings for each text segment (sentences and phrases)
- Use the same embedding model as used for ontology elements
- Cache embeddings for repeated segments to improve performance
Similarity Search:
- For each text segment embedding, search the vector store
- Retrieve top-k (e.g., 10) most similar ontology elements
- Apply similarity threshold (e.g., 0.7) to filter weak matches
- Aggregate results across all segments, tracking match frequencies
Dependency Resolution:
- For each selected class, recursively include all parent classes up to root
- For each selected property, include its domain and range classes
- For inverse properties, ensure both directions are included
- Add equivalent classes if they exist in the ontology
Subset Construction:
- Deduplicate collected elements while preserving relationships
- Organise into classes, object properties, and datatype properties
- Ensure all constraints and relationships are preserved
- Create a self-contained mini-ontology that is valid and complete

Example Walkthrough: Given text: "The brown dog chased the white cat up the tree."

Segments: ["brown dog", "white cat", "tree", "chased"]
Matched elements: [dog (class), cat (class), chases (property)]
Dependencies: [animal (parent of dog and cat), lifeform (parent of animal)]
Final subset: Complete mini-ontology with animal hierarchy and chase relationship

Triple Validation

Purpose: Ensures all extracted triples strictly conform to ontology constraints.

Validation Algorithm:

Class Validation:
- Verify that subjects are instances of classes defined in the ontology subset
- For object properties, verify that objects are also valid class instances
- Check class names against the ontology's class dictionary
- Handle class hierarchies - instances of subclasses are valid for parent class constraints
Property Validation:
- Confirm predicates correspond to properties in the ontology subset
- Distinguish between object properties (entity-to-entity) and datatype properties (entity-to-literal)
- Verify property names match exactly (considering namespace if present)
Domain/Range Checking:
- For each property used as predicate, retrieve its domain and range
- Verify the subject's type matches or inherits from the property's domain
- Verify the object's type matches or inherits from the property's range
- For datatype properties, verify the object is a literal of the correct XSD type
Cardinality Validation:
- Track property usage counts per subject
- Check minimum cardinality - ensure required properties are present
- Check maximum cardinality - ensure property isn't used too many times
- For functional properties, ensure at most one value per subject
Datatype Validation:
- Parse literal values according to their declared XSD types
- Validate integers are valid numbers, dates are properly formatted, etc.
- Check string patterns if regex constraints are defined
- Ensure URIs are well-formed for xsd:anyURI types
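
The datatype checks above could be sketched as follows. This is an illustrative subset only: it covers a handful of XSD types, ignores timezone suffixes on dates, and accepts only absolute URIs with a scheme and host, which is stricter than `xsd:anyURI` itself. The function name `is_valid_literal` and the `xsd:`-prefixed type strings are assumptions, not part of the spec.

```python
import re
from datetime import date
from urllib.parse import urlparse


def is_valid_literal(value, xsd_type):
    """Check a string literal against a small, illustrative subset of XSD types."""
    if not isinstance(value, str):
        return False
    if xsd_type == "xsd:string":
        return True
    if xsd_type == "xsd:integer":
        # Optional sign followed by ASCII digits only
        return re.fullmatch(r"[+-]?[0-9]+", value) is not None
    if xsd_type == "xsd:boolean":
        return value in ("true", "false", "1", "0")
    if xsd_type == "xsd:date":
        # Shape check first, then calendar validity (rejects month 13, day 40, ...)
        if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value) is None:
            return False
        try:
            date.fromisoformat(value)
        except ValueError:
            return False
        return True
    if xsd_type == "xsd:anyURI":
        try:
            parsed = urlparse(value)
        except ValueError:
            return False
        return bool(parsed.scheme) and bool(parsed.netloc)
    # Unknown types are rejected rather than silently accepted
    return False
```

Rejecting unknown types keeps the validator conservative; a deployment that prefers recall over strictness could log and accept them instead.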

Validation Example: Triple: ("Buddy", "has-owner", "John")

Check "Buddy" is typed as a class that can have "has-owner" property
Check "has-owner" exists in the ontology
Verify domain constraint: subject must be of type "Pet" or subclass
Verify range constraint: object must be of type "Person" or subclass
If valid, add to output; if invalid, log violation and skip
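
The domain/range check from this example could be sketched as below. The spec does not say how entity types are tracked, so the `entity_types` mapping (entity name to class ID) is an assumption, as are the function names. Single-valued `rdfs:subClassOf`, `rdfs:domain` and `rdfs:range` are assumed as well.

```python
def is_subclass_of(class_id, ancestor_id, classes):
    """True if class_id equals ancestor_id or inherits from it via rdfs:subClassOf."""
    seen = set()
    current = class_id
    while current is not None and current not in seen:
        if current == ancestor_id:
            return True
        seen.add(current)  # guards against cyclic hierarchies
        current = classes.get(current, {}).get("rdfs:subClassOf")
    return False


def check_domain_range(triple, entity_types, ontology_subset):
    """Return a list of violations for one object-property triple (empty if valid)."""
    prop = ontology_subset["objectProperties"].get(triple["predicate"])
    if prop is None:
        return [f"unknown property: {triple['predicate']}"]

    classes = ontology_subset["classes"]
    violations = []
    for role, field in (("subject", "rdfs:domain"), ("object", "rdfs:range")):
        required = prop.get(field)
        if required is None:
            continue  # property is unconstrained on this side
        actual = entity_types.get(triple[role])
        if actual is None or not is_subclass_of(actual, required, classes):
            violations.append(f"{role} {triple[role]!r} is not a {required}")
    return violations


subset = {
    "classes": {"Pet": {}, "Dog": {"rdfs:subClassOf": "Pet"}, "Person": {}},
    "objectProperties": {
        "has-owner": {"rdfs:domain": "Pet", "rdfs:range": "Person"}
    },
}
types = {"Buddy": "Dog", "John": "Person"}
triple = {"subject": "Buddy", "predicate": "has-owner", "object": "John"}
print(check_domain_range(triple, types, subset))  # prints []
```

Returning the list of violations, not just a boolean, makes the "log violation and skip" step straightforward.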

Performance Considerations

Optimisation Strategies

Embedding Caching: Cache embeddings for frequently used text segments
Batch Processing: Process multiple segments in parallel
Vector Store Indexing: Use approximate nearest neighbor algorithms for large ontologies
Prompt Optimisation: Minimise prompt size by including only essential ontology elements
Result Caching: Cache extraction results for identical chunks
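
Embedding caching can be illustrated with a thin wrapper around the embedding service. This is a sketch under stated assumptions: the wrapped service exposes an async `embed(text)` method (as used elsewhere in this spec), and the `CachingEmbedder` name, the `max_entries` parameter and the first-in-first-out eviction policy are illustrative choices.

```python
class CachingEmbedder:
    """Wraps an embedding service and memoises results by exact text."""

    def __init__(self, embedding_service, max_entries=10000):
        self._service = embedding_service
        self._cache = {}
        self._max_entries = max_entries
        self.hits = 0
        self.misses = 0

    async def embed(self, text):
        if text in self._cache:
            self.hits += 1
            return self._cache[text]

        self.misses += 1
        embedding = await self._service.embed(text)

        if len(self._cache) >= self._max_entries:
            # Evict the oldest entry; dicts preserve insertion order
            self._cache.pop(next(iter(self._cache)))
        self._cache[text] = embedding
        return embedding
```

Because the wrapper exposes the same `embed` method, it could be passed to `OntologySelector` in place of the raw embedding service. The `hits` and `misses` counters give a simple cache hit rate for monitoring.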

Scalability

Horizontal Scaling: Multiple extractor instances with shared ontology cache
Ontology Partitioning: Split large ontologies by domain
Streaming Processing: Process chunks as they arrive without batching
Memory Management: Periodic cleanup of unused embeddings

Error Handling

Failure Scenarios

Missing Ontologies: Fallback to unconstrained extraction
Embedding Service Failure: Use cached embeddings or skip semantic matching
Prompt Service Timeout: Retry with exponential backoff
Invalid Triple Format: Log and skip malformed triples
Ontology Inconsistencies: Report conflicts and use most specific valid elements
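
The "retry with exponential backoff" strategy for prompt service timeouts might be sketched as follows. The function name, parameter names and default values are assumptions, and random jitter is an addition not mentioned in the spec, included to spread retries from concurrent extractor instances.

```python
import asyncio
import random


async def call_with_backoff(operation, max_attempts=4, base_delay=0.5, max_delay=8.0):
    """
    Await operation() and return its result.

    On asyncio.TimeoutError, retry up to max_attempts times in total,
    doubling the delay ceiling after each failure (capped at max_delay).
    The final failure is re-raised to the caller.
    """
    for attempt in range(max_attempts):
        try:
            return await operation()
        except asyncio.TimeoutError:
            if attempt == max_attempts - 1:
                raise
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Sleep for a random duration between 0 and the current ceiling
            await asyncio.sleep(random.uniform(0, ceiling))
```

In the extractor, the call to the prompt service could be wrapped as `await call_with_backoff(lambda: self.prompt_service.generate(prompt))`, assuming the service raises `asyncio.TimeoutError` on timeout.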

Monitoring

Key metrics to track:

Ontology load time and memory usage
Embedding generation latency
Vector search performance
Prompt service response time
Triple extraction accuracy
Ontology conformance rate

Migration Path

From Existing Extractors

Parallel Operation: Run alongside existing extractors initially
Gradual Rollout: Start with specific document types
Quality Comparison: Compare output quality with existing extractors
Full Migration: Replace existing extractors once quality verified

Ontology Development

Bootstrap from Existing: Generate initial ontologies from existing knowledge
Iterative Refinement: Refine based on extraction patterns
Domain Expert Review: Validate with subject matter experts
Continuous Improvement: Update based on extraction feedback

Ontology-Sensitive Query Service

Overview

The ontology-sensitive query service provides multiple query paths to support different backend graph stores. It leverages ontology knowledge for precise, semantically-aware question answering across both Cassandra (via SPARQL) and Cypher-based graph stores (Neo4j, Memgraph, FalkorDB).

Service Components:

onto-query-sparql: Converts natural language to SPARQL for Cassandra
sparql-cassandra: SPARQL query layer for Cassandra using rdflib
onto-query-cypher: Converts natural language to Cypher for graph databases
cypher-executor: Cypher query execution for Neo4j/Memgraph/FalkorDB

Architecture

                    ┌─────────────────┐
                    │   User Query    │
                    └────────┬────────┘
                             │
                             ▼
                    ┌─────────────────┐     ┌──────────────┐
                    │   Question      │────▶│   Sentence   │
                    │   Analyser      │     │   Splitter   │
                    └────────┬────────┘     └──────────────┘
                             │
                             ▼
                    ┌─────────────────┐     ┌──────────────┐
                    │   Ontology      │────▶│   Vector     │
                    │   Matcher       │     │    Store     │
                    └────────┬────────┘     └──────────────┘
                             │
                             ▼
                    ┌─────────────────┐
                    │ Backend Router  │
                    └────────┬────────┘
                             │
                 ┌───────────┴───────────┐
                 │                       │
                 ▼                       ▼
    ┌─────────────────┐          ┌─────────────────┐
    │ onto-query-     │          │ onto-query-     │
    │    sparql       │          │    cypher       │
    └────────┬────────┘          └────────┬────────┘
             │                            │
             ▼                            ▼
    ┌─────────────────┐          ┌─────────────────┐
    │   SPARQL        │          │   Cypher        │
    │  Generator      │          │  Generator      │
    └────────┬────────┘          └────────┬────────┘
             │                            │
             ▼                            ▼
    ┌─────────────────┐          ┌─────────────────┐
    │ sparql-         │          │ cypher-         │
    │ cassandra       │          │ executor        │
    └────────┬────────┘          └────────┬────────┘
             │                            │
             ▼                            ▼
    ┌─────────────────┐          ┌─────────────────┐
    │   Cassandra     │          │ Neo4j/Memgraph/ │
    │                 │          │   FalkorDB      │
    └────────┬────────┘          └────────┬────────┘
             │                            │
             └────────────┬───────────────┘
                          │
                          ▼
                 ┌─────────────────┐     ┌──────────────┐
                 │   Answer        │────▶│   Prompt     │
                 │  Generator      │     │   Service    │
                 └────────┬────────┘     └──────────────┘
                          │
                          ▼
                 ┌─────────────────┐
                 │  Final Answer   │
                 └─────────────────┘

Query Processing Pipeline

1. Question Analyser

Purpose: Decomposes user questions into semantic components for ontology matching.

Algorithm Description: The Question Analyser takes the incoming natural language question and breaks it down into meaningful segments using the same sentence splitting approach as the extraction pipeline. It identifies key entities, relationships, and constraints mentioned in the question. Each segment is analysed for question type (factual, aggregation, comparison, etc.) and the expected answer format. This decomposition helps identify which parts of the ontology are most relevant for answering the question.

Key Operations:

Split question into sentences and phrases
Identify question type and intent
Extract mentioned entities and relationships
Detect constraints and filters in the question
Determine expected answer format
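
As a rough illustration of the operations above, the sketch below splits a question into sentences and assigns a question type from cue phrases. The cue lists, the `AnalysedQuestion` type, and `analyse_question` are hypothetical; the specification does not prescribe how question types are detected, and a real analyser may well use a model rather than regular expressions.

```python
import re
from dataclasses import dataclass

# Hypothetical cue phrases, checked in order; the first match wins.
QUESTION_TYPE_CUES = [
    ("aggregation", re.compile(r"\b(how many|count|average|total)\b", re.I)),
    ("comparison", re.compile(r"\b(compare|versus|more than|less than)\b", re.I)),
    ("boolean", re.compile(r"^(is|are|does|do|did|was|were|can|has|have)\b", re.I)),
]


@dataclass
class AnalysedQuestion:
    text: str
    question_type: str
    segments: list


def analyse_question(question: str) -> AnalysedQuestion:
    text = question.strip()
    # Naive split on sentence-ending punctuation followed by whitespace.
    segments = [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]
    question_type = "factual"  # default when no cue matches
    for name, pattern in QUESTION_TYPE_CUES:
        if pattern.search(text):
            question_type = name
            break
    return AnalysedQuestion(text, question_type, segments)
```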

2. Ontology Matcher for Queries

Purpose: Identifies the relevant ontology subset needed to answer the question.

Algorithm Description: Similar to the extraction pipeline's Ontology Selector, but optimised for question answering. The matcher generates embeddings for question segments and searches the vector store for relevant ontology elements. However, it focuses on finding concepts that would be useful for query construction rather than extraction. It expands the selection to include related properties that might be traversed during graph exploration, even if not explicitly mentioned in the question. For example, if asked about "employees," it might include properties like "works-for," "manages," and "reports-to" that could be relevant for finding employee information.

Matching Strategy:

Embed question segments
Find directly mentioned ontology concepts
Include properties that connect mentioned classes
Add inverse and related properties for traversal
Include parent/child classes for hierarchical queries
Build query-focused ontology partition
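
The expansion steps of this strategy (after the embedding lookup has produced a set of directly matched classes) might look like the following sketch. The data shapes are assumptions made for illustration: classes are given as a name-to-parent mapping and properties as a name-to-(domain, range) mapping. Only parent classes are followed here; child classes and inverse properties would be added in the same way.

```python
def expand_partition(matched_classes, classes, properties):
    """Expand directly matched classes into a query-focused partition.

    classes:    {class_name: parent_name or None}       (assumed shape)
    properties: {property_name: (domain_class, range_class)}  (assumed shape)
    Returns (selected_classes, selected_properties) as two sets.
    """
    selected = set(matched_classes)
    # Walk up the hierarchy so hierarchical queries can use parent classes.
    for name in matched_classes:
        parent = classes.get(name)
        while parent is not None and parent not in selected:
            selected.add(parent)
            parent = classes.get(parent)
    # Keep properties whose domain or range touches a selected class, so
    # traversals out of (or into) those classes remain expressible.
    selected_props = {
        prop for prop, (domain, range_) in properties.items()
        if domain in selected or range_ in selected
    }
    return selected, selected_props
```

With the "employees" example from the description, matching only `employee` would pull in `works-for` and `manages` even though the question never mentions them.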

3. Backend Router

Purpose: Routes queries to the appropriate backend-specific query path based on configuration.

Algorithm Description: The Backend Router examines the system configuration to determine which graph backend is active (Cassandra or Cypher-based). It routes the question and ontology partition to the appropriate query generation service. The router can also support load balancing across multiple backends or fallback mechanisms if the primary backend is unavailable.

Routing Logic:

Check configured backend type from system settings
Route to onto-query-sparql for Cassandra backends
Route to onto-query-cypher for Neo4j/Memgraph/FalkorDB
Support multi-backend configurations with query distribution
Handle failover and load balancing scenarios
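
A minimal sketch of the routing logic follows. The service names come from this specification; the backend identifier strings, the function names, and the `is_available` callback are assumptions for illustration.

```python
SPARQL_BACKENDS = {"cassandra"}
CYPHER_BACKENDS = {"neo4j", "memgraph", "falkordb"}


def route_query(backend_type: str) -> str:
    """Return the name of the query-generation service for a backend type."""
    backend = backend_type.strip().lower()
    if backend in SPARQL_BACKENDS:
        return "onto-query-sparql"
    if backend in CYPHER_BACKENDS:
        return "onto-query-cypher"
    raise ValueError(f"Unsupported graph backend: {backend_type!r}")


def route_with_failover(backends, is_available):
    """Route to the first available backend, in configured priority order.

    is_available is a hypothetical health-check callback taking a backend name.
    """
    for backend in backends:
        if is_available(backend):
            return route_query(backend)
    raise RuntimeError("No configured graph backend is available")
```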

4. SPARQL Query Generation (`onto-query-sparql`)

Purpose: Converts natural language questions to SPARQL queries for Cassandra execution.

Algorithm Description: The SPARQL query generator takes the question and ontology partition and constructs a SPARQL query optimised for execution against the Cassandra backend. It uses the prompt service with a SPARQL-specific template that includes RDF/OWL semantics. The generator understands SPARQL patterns like property paths, optional clauses, and filters that can efficiently translate to Cassandra operations.

SPARQL Generation Prompt Template:

Generate a SPARQL query for the following question using the provided ontology.

ONTOLOGY CLASSES:
{classes}

ONTOLOGY PROPERTIES:
{properties}

RULES:
- Use proper RDF/OWL semantics
- Include relevant prefixes
- Use property paths for hierarchical queries
- Add FILTER clauses for constraints
- Optimise for Cassandra backend

QUESTION: {question}

SPARQL QUERY:
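
Filling the `{classes}`, `{properties}` and `{question}` placeholders of a template like the one above could be done as in this sketch. The helper names and the one-line-per-element rendering are assumptions; the actual prompt service may format ontology elements differently.

```python
def format_elements(elements):
    """Render {name: description} as one '- name: description' line each."""
    return "\n".join(
        f"- {name}: {desc}" for name, desc in sorted(elements.items())
    )


def render_prompt(template, classes, properties, question):
    """Substitute the ontology partition and question into a prompt template.

    Only the template is parsed for placeholders, so braces inside the
    question or the descriptions are passed through unchanged.
    """
    return template.format(
        classes=format_elements(classes),
        properties=format_elements(properties),
        question=question.strip(),
    )
```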

5. Cypher Query Generation (`onto-query-cypher`)

Purpose: Converts natural language questions to Cypher queries for graph databases.

Algorithm Description: The Cypher query generator creates native Cypher queries optimised for Neo4j, Memgraph, and FalkorDB. It maps ontology classes to node labels and properties to relationships, using Cypher's pattern matching syntax. The generator includes Cypher-specific optimisations like relationship direction hints, index usage, and query planning hints.

Cypher Generation Prompt Template:

Generate a Cypher query for the following question using the provided ontology.

NODE LABELS (from classes):
{classes}

RELATIONSHIP TYPES (from properties):
{properties}

RULES:
- Use MATCH patterns for graph traversal
- Include WHERE clauses for filters
- Use aggregation functions when needed
- Optimise for graph database performance
- Consider index hints for large datasets

QUESTION: {question}

CYPHER QUERY:
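
When ontology names are mapped to node labels and relationship types, names containing characters such as hyphens need backtick quoting to be valid Cypher identifiers. The sketch below shows one way to do that mapping; the function names are hypothetical and the rule for "simple" identifiers is deliberately conservative.

```python
import re

# Names made only of letters, digits and underscores (not starting with a
# digit) are emitted as-is; everything else is backtick-quoted.
_SIMPLE_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def cypher_identifier(name: str) -> str:
    """Return name as a Cypher label or relationship type, quoted if needed."""
    if _SIMPLE_IDENTIFIER.fullmatch(name):
        return name
    # A literal backtick inside a quoted identifier is escaped by doubling it.
    return "`" + name.replace("`", "``") + "`"


def relationship_match(source_class, relationship, target_class):
    """Build a MATCH clause linking two ontology classes via a property."""
    return (
        f"MATCH (s:{cypher_identifier(source_class)})"
        f"-[:{cypher_identifier(relationship)}]->"
        f"(t:{cypher_identifier(target_class)})"
    )
```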

6. SPARQL-Cassandra Query Engine (`sparql-cassandra`)

Purpose: Executes SPARQL queries against Cassandra using Python rdflib.

Algorithm Description: The SPARQL-Cassandra engine implements a SPARQL processor using Python's rdflib library with a custom Cassandra backend store. It translates SPARQL graph patterns into appropriate Cassandra CQL queries, handling joins, filters, and aggregations. The engine maintains an RDF-to-Cassandra mapping that preserves the semantic structure while optimising for Cassandra's column-family storage model.

Implementation Features:

rdflib Store interface implementation for Cassandra
SPARQL 1.1 query support with common patterns
Efficient translation of triple patterns to CQL
Support for property paths and hierarchical queries
Result streaming for large datasets
Connection pooling and query caching

Example Translation:

SELECT ?animal WHERE {
  ?animal rdf:type :Animal .
  ?animal :hasOwner "John" .
}

The engine translates this query into Cassandra queries that make use of indexes and partition keys.
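
Whatever CQL is generated, the result has to equal the join of the two triple patterns on `?animal`. The pure-Python sketch below evaluates such a basic graph pattern over an in-memory list of triples to make that semantics concrete. It is not the rdflib Store implementation and says nothing about the Cassandra schema; all names are illustrative.

```python
def is_var(term):
    """SPARQL-style variables are written with a leading '?'."""
    return isinstance(term, str) and term.startswith("?")


def match_triple(pattern, triple, binding):
    """Extend binding so that pattern matches triple; None on conflict."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check it against its existing value.
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def evaluate_bgp(patterns, triples):
    """Evaluate a basic graph pattern by joining patterns left to right."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_triple(pattern, triple, binding)) is not None
        ]
    return bindings
```

A real engine would push each pattern down to the store rather than scan every triple; the nested loop here is only meant to show what must be computed.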

7. Cypher Query Executor (`cypher-executor`)

Purpose: Executes Cypher queries against Neo4j, Memgraph, and FalkorDB.

Algorithm Description: The Cypher executor provides a unified interface for executing Cypher queries across different graph databases. It handles database-specific connection protocols, query optimisation hints, and result format normalisation. The executor includes retry logic, connection pooling, and transaction management appropriate for each database type.

Multi-Database Support:

Neo4j: Bolt protocol, transaction functions, index hints
Memgraph: Bolt protocol, streaming results, analytical queries
FalkorDB: Redis protocol adaptation, in-memory optimisations

Execution Features:

Database-agnostic connection management
Query validation and syntax checking
Timeout and resource limit enforcement
Result pagination and streaming
Performance monitoring per database type
Automatic failover between database instances

8. Answer Generator

Purpose: Synthesises a natural language answer from query results.

Algorithm Description: The Answer Generator takes the structured query results and the original question, then uses the prompt service to generate a comprehensive answer. Unlike simple template-based responses, it uses an LLM to interpret the graph data in the context of the question, handling complex relationships, aggregations, and inferences. The generator can explain its reasoning by referencing the ontology structure and the specific triples retrieved from the graph.

Answer Generation Process:

Format query results into structured context
Include relevant ontology definitions for clarity
Construct prompt with question and results
Generate natural language answer via LLM
Validate answer against query intent
Add citations to specific graph entities if needed
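
The first three steps of this process (formatting results, adding ontology definitions, constructing the prompt) could be sketched as follows. The prompt wording, the row-numbering scheme, and the `build_answer_prompt` name are assumptions; numbering the rows is one simple way to let the generated answer cite specific results.

```python
def build_answer_prompt(question, rows, definitions, max_rows=20):
    """Assemble an answer-synthesis prompt from query results.

    rows:        list of dicts, one per result row (assumed shape)
    definitions: {ontology_element: definition text} (assumed shape)
    """
    lines = ["Answer the question using only the query results below.", ""]
    if definitions:
        lines.append("ONTOLOGY DEFINITIONS:")
        lines += [f"- {name}: {text}" for name, text in definitions.items()]
        lines.append("")
    lines.append("QUERY RESULTS:")
    # Number each row so the answer can refer back to it, e.g. "[2]".
    for i, row in enumerate(rows[:max_rows], start=1):
        fields = ", ".join(f"{key}={value}" for key, value in row.items())
        lines.append(f"[{i}] {fields}")
    if len(rows) > max_rows:
        lines.append(f"... {len(rows) - max_rows} more rows omitted")
    lines += ["", f"QUESTION: {question}", "", "ANSWER:"]
    return "\n".join(lines)
```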

Integration with Existing Services

Relationship with GraphRAG

Complementary: onto-query provides semantic precision while GraphRAG provides broad coverage
Shared Infrastructure: Both use the same knowledge graph and prompt services
Query Routing: System can route queries to most appropriate service based on question type
Hybrid Mode: Can combine both approaches for comprehensive answers

Relationship with OntoRAG Extraction

Shared Ontologies: Uses same ontology configurations loaded by kg-extract-ontology
Shared Vector Store: Reuses the in-memory embeddings from extraction service
Consistent Semantics: Queries operate on graphs built with same ontological constraints

Query Examples

Example 1: Simple Entity Query

Question: "What animals are mammals?"

Ontology Match: [animal, mammal, subClassOf]

Generated Query:

MATCH (a:animal)-[:subClassOf*]->(m:mammal)
RETURN a.name

Example 2: Relationship Query

Question: "Which documents were authored by John Smith?"

Ontology Match: [document, person, has-author]

Generated Query:

MATCH (d:document)-[:`has-author`]->(p:person {name: "John Smith"})
RETURN d.title, d.date

Example 3: Aggregation Query

Question: "How many legs do cats have?"

Ontology Match: [cat, number-of-legs (datatype property)]

Generated Query:

MATCH (c:cat)
RETURN c.name, c.number_of_legs

Configuration

onto-query:
  embedding_model: "text-embedding-3-small"
  vector_store:
    shared_with_extractor: true  # Reuse kg-extract-ontology's store
  query_builder:
    model: "gpt-4"
    temperature: 0.1
    max_query_length: 1000
  graph_executor:
    timeout: 30000  # ms
    max_results: 1000
  answer_generator:
    model: "gpt-4"
    temperature: 0.3
    max_tokens: 500

Performance Optimisations

Query Optimisation

Ontology Pruning: Only include necessary ontology elements in prompts
Query Caching: Cache frequently asked questions and their queries
Result Caching: Store results for identical queries within time window
Batch Processing: Handle multiple related questions in single graph traversal
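
Result caching "within a time window" amounts to a cache with per-entry expiry. A minimal sketch is shown below; the class name, the injectable clock, and the lazy eviction on read are illustrative choices rather than anything the specification mandates.

```python
import time


class TTLCache:
    """Cache whose entries expire ttl_seconds after they were stored."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries = {}   # key -> (stored_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]  # expired: evict lazily on read
            return None
        return value

    def put(self, key, value):
        self._entries[key] = (self._clock(), value)
```

Keying the cache on the generated query text (rather than the natural language question) would let differently phrased questions that produce the same query share an entry.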

Scalability Considerations

Distributed Execution: Parallelise subqueries across graph partitions
Incremental Results: Stream results for large datasets
Load Balancing: Distribute query load across multiple service instances
Resource Pools: Manage connection pools to graph databases

Error Handling

Failure Scenarios

Invalid Query Generation: Fallback to GraphRAG or simple keyword search
Ontology Mismatch: Expand search to broader ontology subset
Query Timeout: Simplify query or increase timeout
Empty Results: Suggest query reformulation or related questions
LLM Service Failure: Use cached queries or template-based responses

Monitoring Metrics

Question complexity distribution
Ontology partition sizes
Query generation success rate
Graph query execution time
Answer quality scores
Cache hit rates
Error frequencies by type

Future Enhancements

Ontology Learning: Automatically extend ontologies based on extraction patterns
Confidence Scoring: Assign confidence scores to extracted triples
Explanation Generation: Provide reasoning for triple extraction
Active Learning: Request human validation for uncertain extractions

Security Considerations

Prompt Injection Prevention: Sanitise chunk text before prompt construction
Resource Limits: Cap memory usage for vector store
Rate Limiting: Limit extraction requests per client
Audit Logging: Track all extraction requests and results

Testing Strategy

Unit Testing

Ontology loader with various formats
Embedding generation and storage
Sentence splitting algorithms
Vector similarity calculations
Triple parsing and validation

Integration Testing

End-to-end extraction pipeline
Configuration service integration
Prompt service interaction
Concurrent extraction handling

Performance Testing

Large ontology handling (1000+ classes)
High-volume chunk processing
Memory usage under load
Latency benchmarks

Delivery Plan

Overview

The OntoRAG system will be delivered in four major phases, with each phase providing incremental value while building toward the complete system. The plan focuses on establishing core extraction capabilities first, then adding query functionality, followed by optimizations and advanced features.

Phase 1: Foundation and Core Extraction

Goal: Establish the basic ontology-driven extraction pipeline with simple vector matching.

Step 1.1: Ontology Management Foundation

Implement ontology configuration loader (OntologyLoader)
Parse and validate ontology JSON structures
Create in-memory ontology storage and access patterns
Implement ontology refresh mechanism

Success Criteria:

Successfully load and parse ontology configurations
Validate ontology structure and consistency
Handle multiple concurrent ontologies

Step 1.2: Vector Store Implementation

Implement simple NumPy-based vector store as initial prototype
Add FAISS vector store implementation
Create vector store interface abstraction
Implement similarity search with configurable thresholds

Success Criteria:

Store and retrieve embeddings efficiently
Perform similarity search with <100ms latency
Support both NumPy and FAISS backends
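
To show the shape of the interface this step describes, here is a plain-Python stand-in for the prototype store. It uses only the standard library so that it runs anywhere; the planned implementation uses NumPy and FAISS, and the class and method names below are assumptions.

```python
import math


class SimpleVectorStore:
    """Brute-force cosine-similarity store (illustrative stand-in)."""

    def __init__(self):
        self._items = []  # list of (unit_vector, metadata)

    @staticmethod
    def _normalise(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0:
            raise ValueError("Cannot normalise a zero vector")
        return [x / norm for x in vector]

    def add(self, vector, metadata):
        # Store unit vectors so a dot product equals cosine similarity.
        self._items.append((self._normalise(vector), metadata))

    def search(self, query, top_k=5, threshold=0.0):
        """Return up to top_k (score, metadata) pairs with score >= threshold."""
        q = self._normalise(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), meta) for v, meta in self._items
        ]
        hits = [(score, meta) for score, meta in scored if score >= threshold]
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return hits[:top_k]
```

Keeping `add` and `search` as the only public operations is what would allow a FAISS-backed class to be swapped in behind the same interface.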

Step 1.3: Ontology Embedding Pipeline

Integrate with embedding service
Implement OntologyEmbedder component
Generate embeddings for all ontology elements
Store embeddings with metadata in vector store

Success Criteria:

Generate embeddings for classes and properties
Store embeddings with proper metadata
Rebuild embeddings on ontology updates

Step 1.4: Text Processing Components

Implement sentence splitter using NLTK/spaCy
Extract phrases and named entities
Create text segment hierarchy
Generate embeddings for text segments

Success Criteria:

Accurately split text into sentences
Extract meaningful phrases
Maintain context relationships

Step 1.5: Ontology Selection Algorithm

Implement similarity matching between text and ontology
Build dependency resolution for ontology elements
Create minimal coherent ontology subsets
Optimize subset generation performance

Success Criteria:

Select relevant ontology elements with >80% precision
Include all necessary dependencies
Generate subsets in <500ms

Step 1.6: Basic Extraction Service

Implement prompt construction for extraction
Integrate with prompt service
Parse and validate triple responses
Create kg-extract-ontology service endpoint

Success Criteria:

Extract ontology-conformant triples
Validate all triples against ontology
Handle extraction errors gracefully

Phase 2: Query System Implementation

Goal: Add ontology-aware query capabilities with support for multiple backends.

Step 2.1: Query Foundation Components

Implement question analyzer
Create ontology matcher for queries
Adapt vector search for query context
Build backend router component

Success Criteria:

Analyze questions into semantic components
Match questions to relevant ontology elements
Route queries to appropriate backend

Step 2.2: SPARQL Path Implementation

Implement onto-query-sparql service
Create SPARQL query generator using LLM
Develop prompt templates for SPARQL generation
Validate generated SPARQL syntax

Success Criteria:

Generate valid SPARQL queries
Use appropriate SPARQL patterns
Handle complex query types

Step 2.3: SPARQL-Cassandra Engine

Implement rdflib Store interface for Cassandra
Create CQL query translator
Optimize triple pattern matching
Handle SPARQL result formatting

Success Criteria:

Execute SPARQL queries on Cassandra
Support common SPARQL patterns
Return results in standard format

Step 2.4: Cypher Path Implementation

Implement onto-query-cypher service
Create Cypher query generator using LLM
Develop prompt templates for Cypher generation
Validate generated Cypher syntax

Success Criteria:

Generate valid Cypher queries
Use appropriate graph patterns
Support Neo4j, Memgraph, FalkorDB

Step 2.5: Cypher Executor

Implement multi-database Cypher executor
Support Bolt protocol (Neo4j/Memgraph)
Support Redis protocol (FalkorDB)
Handle result normalization

Success Criteria:

Execute Cypher on all target databases
Handle database-specific differences
Maintain connection pools efficiently

Step 2.6: Answer Generation

Implement answer generator component
Create prompts for answer synthesis
Format query results for LLM consumption
Generate natural language answers

Success Criteria:

Generate accurate answers from query results
Maintain context from original question
Provide clear, concise responses

Phase 3: Optimization and Robustness

Goal: Optimize performance, add caching, improve error handling, and enhance reliability.

Step 3.1: Performance Optimization

Implement embedding caching
Add query result caching
Optimize vector search with FAISS IVF indexes
Implement batch processing for embeddings

Success Criteria:

Reduce average query latency by 50%
Support 10x more concurrent requests
Maintain sub-second response times

Step 3.2: Advanced Error Handling

Implement comprehensive error recovery
Add fallback mechanisms between query paths
Create retry logic with exponential backoff
Improve error logging and diagnostics

Success Criteria:

Gracefully handle all failure scenarios
Automatic failover between backends
Detailed error reporting for debugging

Step 3.3: Monitoring and Observability

Add performance metrics collection
Implement query tracing
Create health check endpoints
Add resource usage monitoring

Success Criteria:

Track all key performance indicators
Identify bottlenecks quickly
Monitor system health in real-time

Step 3.4: Configuration Management

Implement dynamic configuration updates
Add configuration validation
Create configuration templates
Support environment-specific settings

Success Criteria:

Update configuration without restart
Validate all configuration changes
Support multiple deployment environments

Phase 4: Advanced Features

Goal: Add sophisticated capabilities for production deployment and enhanced functionality.

Step 4.1: Multi-Ontology Support

Implement ontology selection logic
Support cross-ontology queries
Handle ontology versioning
Create ontology merge capabilities

Success Criteria:

Query across multiple ontologies
Handle ontology conflicts
Support ontology evolution

Step 4.2: Intelligent Query Routing

Implement performance-based routing
Add query complexity analysis
Create adaptive routing algorithms
Support A/B testing for paths

Success Criteria:

Route queries optimally
Learn from query performance
Improve routing over time

Step 4.3: Advanced Extraction Features

Add confidence scoring for triples
Implement explanation generation
Create feedback loops for improvement
Support incremental learning

Success Criteria:

Provide confidence scores
Explain extraction decisions
Continuously improve accuracy

Step 4.4: Production Hardening

Add rate limiting
Implement authentication/authorization
Create deployment automation
Add backup and recovery

Success Criteria:

Production-ready security
Automated deployment pipeline
Disaster recovery capability

Delivery Milestones

Milestone 1 (End of Phase 1): Basic ontology-driven extraction operational
Milestone 2 (End of Phase 2): Full query system with both SPARQL and Cypher paths
Milestone 3 (End of Phase 3): Optimized, robust system ready for staging
Milestone 4 (End of Phase 4): Production-ready system with advanced features
