---
layout: default
title: "GraphRAG Performance Optimisation Technical Specification"
parent: "Tech Specs"
---

# GraphRAG Performance Optimisation Technical Specification

## Overview

This specification describes comprehensive performance optimisations for the GraphRAG (Graph Retrieval-Augmented Generation) algorithm in TrustGraph. The current implementation suffers from significant performance bottlenecks that limit scalability and response times. This specification addresses four primary optimisation areas:

1. **Graph Traversal Optimisation**: Eliminate inefficient recursive database queries and implement batched graph exploration
2. **Label Resolution Optimisation**: Replace sequential label fetching with parallel/batched operations
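3. **Caching Strategy Enhancement**: Replace the unbounded label dictionary with a bounded LRU cache that has a short TTL
4. **Query Optimisation**: Coordinate batched queries, enforce timeouts, and collect performance metrics; embedding and query-result caching were considered and rejected (see Technical Design)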

## Goals

- **Reduce Database Query Volume**: Achieve 50-80% reduction in total database queries through batching and caching
- **Improve Response Times**: Target 3-5x faster subgraph construction and 2-3x faster label resolution
- **Enhance Scalability**: Support larger knowledge graphs with better memory management
- **Maintain Accuracy**: Preserve existing GraphRAG functionality and result quality
- **Enable Concurrency**: Improve parallel processing capabilities for multiple concurrent requests
- **Reduce Memory Footprint**: Implement efficient data structures and memory management
- **Add Observability**: Include performance metrics and monitoring capabilities
- **Ensure Reliability**: Add proper error handling and timeout mechanisms

## Background

The current GraphRAG implementation in `trustgraph-flow/trustgraph/retrieval/graph_rag/graph_rag.py` exhibits several critical performance issues that severely impact system scalability:

### Current Performance Problems

**1. Inefficient Graph Traversal (`follow_edges` function, lines 79-127)**
- Makes 3 separate database queries per entity per depth level
- Query pattern: subject-based, predicate-based, and object-based queries for each entity
- No batching: Each query processes only one entity at a time
- No cycle detection: Can revisit the same nodes multiple times
- Recursive implementation without memoisation re-queries entities it has already expanded, so the work grows rapidly with path length
- Time complexity: O(entities × max_path_length × triple_limit³)

**2. Sequential Label Resolution (`get_labelgraph` function, lines 144-171)**
- Processes each triple component (subject, predicate, object) sequentially
- Each `maybe_label` call potentially triggers a database query
- No parallel execution or batching of label queries
- Results in up to 3 × subgraph_size individual database calls

**3. Primitive Caching Strategy (`maybe_label` function, lines 62-77)**
- Simple dictionary cache without size limits or TTL
- No cache eviction policy leads to unbounded memory growth
- Cache misses trigger individual database queries
- No prefetching or intelligent cache warming

**4. Suboptimal Query Patterns**
- Entity vector similarity queries not cached between similar requests
- No result memoisation for repeated query patterns
- Missing query optimisation for common access patterns

**5. Critical Object Lifetime Issues (`rag.py:96-102`)**
- **GraphRag object recreated per request**: Fresh instance created for every query, losing all cache benefits
- **Query object extremely short-lived**: Created and destroyed within single query execution (lines 201-207)
- **Label cache reset per request**: Cache warming and accumulated knowledge lost between requests
- **Client recreation overhead**: Database clients potentially re-established for each request
- **No cross-request optimisation**: Cannot benefit from query patterns or result sharing

### Performance Impact Analysis

Current worst-case scenario for a typical query:
- **Entity Retrieval**: 1 vector similarity query
- **Graph Traversal**: entities × max_path_length × 3 × triple_limit queries
- **Label Resolution**: subgraph_size × 3 individual label queries

For default parameters (50 entities, path length 2, 30 triple limit, 150 subgraph size):
- **Estimated queries**: 1 + (50 × 2 × 3 × 30) + (150 × 3) = **9,451 database queries**
- **Response time**: 15-30 seconds for moderate-sized graphs
- **Memory usage**: Unbounded cache growth over time
- **Cache effectiveness**: 0% - caches reset on every request
- **Object creation overhead**: GraphRag + Query objects created/destroyed per request
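
The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch of the formula as stated in this section; the function name `estimate_query_count` is hypothetical and is not part of the codebase.

```python
def estimate_query_count(entities: int, max_path_length: int,
                         triple_limit: int, subgraph_size: int) -> dict:
    """Apply the query-count estimate from this section."""
    entity_retrieval = 1  # one vector similarity query
    graph_traversal = entities * max_path_length * 3 * triple_limit
    label_resolution = subgraph_size * 3  # subject, predicate, object
    return {
        "entity_retrieval": entity_retrieval,
        "graph_traversal": graph_traversal,
        "label_resolution": label_resolution,
        "total": entity_retrieval + graph_traversal + label_resolution,
    }


# Default parameters quoted above
estimate = estimate_query_count(
    entities=50, max_path_length=2, triple_limit=30, subgraph_size=150
)
print(estimate["total"])  # 9451
```

Graph traversal accounts for 9,000 of the 9,451 queries, which is why batching the traversal is treated as the primary optimisation.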

This specification addresses these problems with batched queries, a bounded label cache, and parallel processing. By optimising query patterns and data access, TrustGraph aims to:
- Support enterprise-scale knowledge graphs with millions of entities
- Move toward sub-second response times for typical queries
- Handle hundreds of concurrent GraphRAG requests
- Scale efficiently with graph size and complexity

## Technical Design

### Architecture

The GraphRAG performance optimisation requires the following technical components:

#### 1. **Object Lifetime Architectural Refactor**
   - **Make GraphRag long-lived**: Move GraphRag instance to Processor level for persistence across requests
   - **Preserve caches**: Maintain the label cache between requests (embedding and query-result caches are deliberately excluded; see component 4)
   - **Optimize Query object**: Refactor Query as lightweight execution context, not data container
   - **Connection persistence**: Maintain database client connections across requests

   Module: `trustgraph-flow/trustgraph/retrieval/graph_rag/rag.py` (modified)

#### 2. **Optimized Graph Traversal Engine**
   - Replace recursive `follow_edges` with iterative breadth-first search
   - Implement batched entity processing at each traversal level
   - Add cycle detection using visited node tracking
   - Include early termination when limits are reached

   Module: `trustgraph-flow/trustgraph/retrieval/graph_rag/optimized_traversal.py`

#### 3. **Parallel Label Resolution System**
   - Batch label queries for multiple entities simultaneously
   - Implement async/await patterns for concurrent database access
   - Add intelligent prefetching for common label patterns
   - Include label cache warming strategies

   Module: `trustgraph-flow/trustgraph/retrieval/graph_rag/label_resolver.py`

#### 4. **Conservative Label Caching Layer**
   - LRU cache with a short TTL (5 minutes) for labels only, balancing performance against consistency
   - Cache metrics and hit ratio monitoring
   - **No embedding caching**: Already cached per-query, no cross-query benefit
   - **No query result caching**: Due to graph mutation consistency concerns

   Module: `trustgraph-flow/trustgraph/retrieval/graph_rag/cache_manager.py`

#### 5. **Query Optimisation Framework**
   - Query pattern analysis and optimisation suggestions
   - Batch query coordinator for database access
   - Connection pooling and query timeout management
   - Performance monitoring and metrics collection

   Module: `trustgraph-flow/trustgraph/retrieval/graph_rag/query_optimizer.py`

### Data Models

#### Optimized Graph Traversal State

The traversal engine maintains state to avoid redundant operations:

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

@dataclass
class TraversalState:
    visited_entities: Set[str]
    current_level_entities: Set[str]
    next_level_entities: Set[str]
    subgraph: Set[Tuple[str, str, str]]
    depth: int
    query_batch: List[TripleQuery]
```

This approach allows:
- Efficient cycle detection through visited entity tracking
- Batched query preparation at each traversal level
- Memory-efficient state management
- Early termination when size limits are reached

#### Enhanced Cache Structure

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class CacheEntry:
    value: Any
    timestamp: float
    access_count: int
    ttl: Optional[float]

class CacheManager:
    # Labels only: embeddings and query results are deliberately not cached
    label_cache: LRUCache[str, CacheEntry]
    cache_stats: CacheStatistics
```

#### Batch Query Structures

```python
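from dataclasses import dataclass
from enum import Enum
from typing import List

class QueryType(Enum):
    SUBJECT = "subject"
    PREDICATE = "predicate"
    OBJECT = "object"

@dataclass
class BatchTripleQuery:
    entities: List[str]
    query_type: QueryType  # Which triple position the entities are matched against
    limit_per_entity: int

@dataclass
class BatchLabelQuery:
    entities: List[str]
    predicate: str = LABEL  # Label predicate constant, assumed to be defined elsewhere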
```

### APIs

#### New APIs:

**GraphTraversal API**
```python
async def optimized_follow_edges_batch(
    entities: List[str],
    max_depth: int,
    triple_limit: int,
    max_subgraph_size: int
) -> Set[Tuple[str, str, str]]: ...
```

**Batch Label Resolution API**
```python
async def resolve_labels_batch(
    entities: List[str],
    cache_manager: CacheManager
) -> Dict[str, str]: ...
```

**Cache Management API**
```python
class CacheManager:
    # Labels only, in line with the conservative caching decision above
    async def get_or_fetch_label(self, entity: str) -> str: ...
    def get_cache_statistics(self) -> CacheStatistics: ...
```

#### Modified APIs:

**GraphRag.query()** - Enhanced with performance optimisations:
- Add cache_manager parameter for cache control
- Include performance_metrics return value
- Add query_timeout parameter for reliability

**Query class** - Refactored for batch processing:
- Replace individual entity processing with batch operations
- Add async context managers for resource cleanup
- Include progress callbacks for long-running operations

### Implementation Details

#### Phase 0: Critical Architectural Lifetime Refactor

**Current Problematic Implementation:**
```python
# INEFFICIENT: GraphRag recreated every request
class Processor(FlowProcessor):
    async def on_request(self, msg, consumer, flow):
        # PROBLEM: New GraphRag instance per request!
        self.rag = GraphRag(
            embeddings_client = flow("embeddings-request"),
            graph_embeddings_client = flow("graph-embeddings-request"),
            triples_client = flow("triples-request"),
            prompt_client = flow("prompt-request"),
            verbose=True,
        )
        # Cache starts empty every time - no benefit from previous requests
        response = await self.rag.query(...)

# VERY SHORT-LIVED: Query object created/destroyed per request
class GraphRag:
    async def query(self, query, user="trustgraph", collection="default", ...):
        q = Query(rag=self, user=user, collection=collection, ...)  # Created
        kg = await q.get_labelgraph(query)  # Used briefly
        # q automatically destroyed when function exits
```

**Optimized Long-Lived Architecture:**
```python
class Processor(FlowProcessor):
    def __init__(self, **params):
        super().__init__(**params)
        self.rag_instance = None  # Initialized lazily on the first request
        self.client_connections = {}

    async def initialize_rag(self, flow):
        """Initialize GraphRag once, reuse for all requests"""
        # Assumes the clients returned by flow(...) stay valid across requests;
        # if they are flow-specific, keep one instance per flow instead
        if self.rag_instance is None:
            self.rag_instance = LongLivedGraphRag(
                embeddings_client=flow("embeddings-request"),
                graph_embeddings_client=flow("graph-embeddings-request"),
                triples_client=flow("triples-request"),
                prompt_client=flow("prompt-request"),
                verbose=True,
            )
        return self.rag_instance

    async def on_request(self, msg, consumer, flow):
        v = msg.value()  # Request payload

        # REUSE the same GraphRag instance - caches persist!
        rag = await self.initialize_rag(flow)

        # Query object becomes lightweight execution context
        response = await rag.query_with_context(
            query=v.query,
            context=QueryContext(
                user=v.user,
                collection=v.collection,
                entity_limit=entity_limit,  # Resolved from the request or processor defaults
                # ... other params
            )
        )

class LongLivedGraphRag:
    def __init__(self, embeddings_client, graph_embeddings_client,
                 triples_client, prompt_client, verbose=False):
        self.embeddings_client = embeddings_client
        self.graph_embeddings_client = graph_embeddings_client
        self.triples_client = triples_client
        self.prompt_client = prompt_client
        self.verbose = verbose

        # CONSERVATIVE caches - balance performance vs consistency
        self.label_cache = LRUCacheWithTTL(max_size=5000, default_ttl=300)  # 5min TTL for freshness
        # Note: No embedding cache - already cached per-query, no cross-query benefit
        # Note: No query result cache due to consistency concerns
        self.performance_metrics = PerformanceTracker()

    async def query_with_context(self, query: str, context: QueryContext):
        # Use lightweight QueryExecutor instead of heavyweight Query object
        executor = QueryExecutor(self, context)  # Minimal object
        return await executor.execute(query)

@dataclass
class QueryContext:
    """Lightweight execution context - no heavy operations"""
    user: str
    collection: str
    entity_limit: int
    triple_limit: int
    max_subgraph_size: int
    max_path_length: int

class QueryExecutor:
    """Lightweight execution context - replaces old Query class"""
    def __init__(self, rag: LongLivedGraphRag, context: QueryContext):
        self.rag = rag
        self.context = context
        # No heavy initialization - just references

    async def execute(self, query: str):
        # All heavy lifting uses persistent rag caches
        return await self.rag.execute_optimized_query(query, self.context)
```

This architectural change provides:
- **10-20% database query reduction** for graphs with common relationships (vs 0% currently)
- **Eliminated object creation overhead** for every request
- **Persistent connection pooling** and client reuse
- **Cross-request optimization** within cache TTL windows

**Important Cache Consistency Limitation:**
Long-term caching introduces staleness risk when entities/labels are deleted or modified in the underlying graph. The LRU cache with TTL provides a balance between performance gains and data freshness, but cannot detect real-time graph changes.

#### Phase 1: Graph Traversal Optimisation

**Current Implementation Problems:**
```python
# INEFFICIENT: 3 queries per entity per level
async def follow_edges(self, ent, subgraph, path_length):
    # Query 1: s=ent, p=None, o=None
    res = await self.rag.triples_client.query(s=ent, p=None, o=None, limit=self.triple_limit)
    # Query 2: s=None, p=ent, o=None
    res = await self.rag.triples_client.query(s=None, p=ent, o=None, limit=self.triple_limit)
    # Query 3: s=None, p=None, o=ent
    res = await self.rag.triples_client.query(s=None, p=None, o=ent, limit=self.triple_limit)
```

**Optimized Implementation:**
```python
async def optimized_traversal(self, entities: List[str], max_depth: int) -> Set[Triple]:
    visited = set()
    current_level = set(entities)
    subgraph = set()

    for depth in range(max_depth):
        if not current_level or len(subgraph) >= self.max_subgraph_size:
            break

        # Batch all queries for the current level. current_level never contains
        # visited entities (see the set difference below), so no extra check is needed.
        batch_queries = []
        for entity in current_level:
            # limit is an assumed TripleQuery field, mirroring the per-query
            # limit used by the current follow_edges implementation
            batch_queries.extend([
                TripleQuery(s=entity, p=None, o=None, limit=self.triple_limit),
                TripleQuery(s=None, p=entity, o=None, limit=self.triple_limit),
                TripleQuery(s=None, p=None, o=entity, limit=self.triple_limit)
            ])

        # Execute all queries concurrently
        results = await self.execute_batch_queries(batch_queries)

        # Process results and prepare next level, never exceeding the size cap
        next_level = set()
        for result in results:
            for triple in result.triples:
                if len(subgraph) >= self.max_subgraph_size:
                    break
                subgraph.add(triple)
            next_level.update(result.new_entities)

        visited.update(current_level)
        current_level = next_level - visited

    return subgraph
```
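
The traversal above relies on an `execute_batch_queries` helper that this specification does not define. The following is a minimal sketch of one way it could work, assuming the triples client exposes an async `query(s, p, o, limit)` call as in the current implementation. `TripleQuery`, `QueryResult`, `InMemoryTriplesClient`, and the `max_concurrency` parameter are hypothetical names introduced for illustration. Note that this runs the individual queries concurrently rather than issuing one true batched query to the database.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class TripleQuery:
    # Hypothetical query description; None means "match anything"
    s: Optional[str] = None
    p: Optional[str] = None
    o: Optional[str] = None
    limit: int = 30


@dataclass
class QueryResult:
    triples: Set[Triple]
    new_entities: Set[str]


class InMemoryTriplesClient:
    """Stand-in for the real triples client, for illustration only."""

    def __init__(self, triples: List[Triple]):
        self.triples = triples

    async def query(self, s=None, p=None, o=None, limit=30) -> List[Triple]:
        matches = [
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        ]
        return matches[:limit]


async def execute_batch_queries(client, queries: List[TripleQuery],
                                max_concurrency: int = 16) -> List[QueryResult]:
    # Bound the number of in-flight queries so a large level cannot
    # overwhelm the database
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(q: TripleQuery) -> QueryResult:
        async with semaphore:
            rows = await client.query(s=q.s, p=q.p, o=q.o, limit=q.limit)
        triples = set(rows)
        # Subjects and objects of the matched triples are candidates
        # for the next traversal level
        entities = {t[0] for t in triples} | {t[2] for t in triples}
        return QueryResult(triples=triples, new_entities=entities)

    # gather preserves the order of the input queries
    return await asyncio.gather(*(run_one(q) for q in queries))


client = InMemoryTriplesClient([
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("c", "knows", "a"),
])
results = asyncio.run(execute_batch_queries(
    client, [TripleQuery(s="a"), TripleQuery(o="a")]
))
print(results[0].triples)  # {('a', 'knows', 'b')}
```

A backend that supports genuine multi-entity queries could replace the `gather` call with a single request; the semaphore-based version is the fallback that works with any client.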

#### Phase 2: Parallel Label Resolution

**Current Sequential Implementation:**
```python
# INEFFICIENT: Sequential processing
for edge in subgraph:
    s = await self.maybe_label(edge[0])  # Individual query
    p = await self.maybe_label(edge[1])  # Individual query
    o = await self.maybe_label(edge[2])  # Individual query
```

**Optimized Parallel Implementation:**
```python
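async def resolve_labels_parallel(self, subgraph: List[Triple]) -> List[Triple]:
    # Collect all unique entities needing labels
    entities_to_resolve = set()
    for s, p, o in subgraph:
        entities_to_resolve.update([s, p, o])

    # Look up cached labels through the cache's async get(); collect the misses
    labels = {}
    uncached_entities = []
    for entity in entities_to_resolve:
        cached = await self.label_cache.get(entity)
        if cached is None:
            uncached_entities.append(entity)
        else:
            labels[entity] = cached

    # Batch query for all uncached labels
    if uncached_entities:
        label_results = await self.batch_label_query(uncached_entities)
        for entity in uncached_entities:
            # Entities without a label fall back to their identifier, and the
            # fallback is cached too so they are not re-queried every request
            labels[entity] = label_results.get(entity, entity)
            await self.label_cache.put(entity, labels[entity])

    # Apply labels to subgraph
    return [(labels[s], labels[p], labels[o]) for s, p, o in subgraph]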
```
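
To make the saving concrete, the sketch below runs the same deduplicate-then-batch idea against an in-memory stand-in and counts the backend calls. `CountingLabelStore` and the free function `resolve_labels` are hypothetical and use a plain dictionary as the cache; they illustrate the technique rather than the final implementation.

```python
import asyncio
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


class CountingLabelStore:
    """Hypothetical label backend that records how it is called."""

    def __init__(self, labels: Dict[str, str]):
        self.labels = labels
        self.batch_calls = 0
        self.entities_requested = 0

    async def batch_label_query(self, entities: List[str]) -> Dict[str, str]:
        self.batch_calls += 1
        self.entities_requested += len(entities)
        return {e: self.labels[e] for e in entities if e in self.labels}


async def resolve_labels(subgraph: List[Triple], store: CountingLabelStore,
                         cache: Dict[str, str]) -> List[Triple]:
    # Deduplicate: each identifier is looked up at most once
    unique = {term for triple in subgraph for term in triple}
    missing = sorted(e for e in unique if e not in cache)
    if missing:
        found = await store.batch_label_query(missing)
        for entity in missing:
            # Fall back to the identifier itself when no label exists
            cache[entity] = found.get(entity, entity)
    return [(cache[s], cache[p], cache[o]) for s, p, o in subgraph]


store = CountingLabelStore({"e1": "Alice", "e2": "Bob", "p1": "knows"})
cache: Dict[str, str] = {}
subgraph = [("e1", "p1", "e2"), ("e2", "p1", "e1"), ("e1", "p1", "e3")]

labelled = asyncio.run(resolve_labels(subgraph, store, cache))
print(labelled)
print(store.batch_calls, store.entities_requested)
```

Three triples would cost nine sequential lookups in the current implementation. Here the four distinct identifiers are fetched in one batch call, and a second request over the same subgraph would hit the cache for all of them.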

#### Phase 3: Advanced Caching Strategy

**LRU Cache with TTL:**
```python
import time
from collections import OrderedDict
from typing import Any, Optional

class LRUCacheWithTTL:
    def __init__(self, max_size: int, default_ttl: int = 3600):
        self.cache = OrderedDict()
        self.max_size = max_size
        self.default_ttl = default_ttl
        self.stored_at = {}  # key -> time the value was last written

    async def get(self, key: str) -> Optional[Any]:
        if key in self.cache:
            # Check TTL expiration, measured from the last write (not the last read)
            if time.monotonic() - self.stored_at[key] > self.default_ttl:
                del self.cache[key]
                del self.stored_at[key]
                return None

            # Move to end (most recently used)
            self.cache.move_to_end(key)
            return self.cache[key]
        return None

    async def put(self, key: str, value: Any):
        if key in self.cache:
            self.cache.move_to_end(key)
        else:
            if len(self.cache) >= self.max_size:
                # Remove least recently used
                oldest_key = next(iter(self.cache))
                del self.cache[oldest_key]
                del self.stored_at[oldest_key]

        self.cache[key] = value
        # Monotonic clock: immune to wall-clock adjustments
        self.stored_at[key] = time.monotonic()
```

#### Phase 4: Query Optimisation and Monitoring

**Performance Metrics Collection:**
```python
@dataclass
class PerformanceMetrics:
    total_queries: int
    cache_hits: int
    cache_misses: int
    avg_response_time: float
    subgraph_construction_time: float
    label_resolution_time: float
    total_entities_processed: int
    memory_usage_mb: float
```
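
The `CacheStatistics` type referenced by `CacheManager` is not defined in this specification. A minimal sketch of how hit-ratio tracking might look is shown below; the field names and the `record` method are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CacheStatistics:
    # Hypothetical counters; the real type may track more
    hits: int = 0
    misses: int = 0
    evictions: int = 0

    def record(self, hit: bool) -> None:
        """Count one cache lookup."""
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_ratio(self) -> float:
        """Fraction of lookups served from the cache (0.0 when unused)."""
        lookups = self.hits + self.misses
        return self.hits / lookups if lookups else 0.0


stats = CacheStatistics()
for hit in (True, True, False, True):
    stats.record(hit)
print(f"{stats.hit_ratio:.2f}")  # 0.75
```

The `cache_hits` and `cache_misses` fields of `PerformanceMetrics` could be populated from counters like these at the end of each request.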

**Query Timeout and Circuit Breaker:**
```python
import asyncio
import logging
logger = logging.getLogger(__name__)

class GraphRagTimeoutError(Exception): ...

async def execute_with_timeout(self, query_func, timeout: int = 30):
    try:
        return await asyncio.wait_for(query_func(), timeout=timeout)
    except asyncio.TimeoutError:
        logger.error("Query timeout after %ss", timeout)
        raise GraphRagTimeoutError(f"Query exceeded timeout of {timeout}s") from None
```
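
The block above covers the timeout only. A circuit breaker, which stops sending queries to a backend that keeps failing, could be sketched as follows. The class, its thresholds, and the injectable `clock` parameter are all hypothetical; this specification does not prescribe a particular design.

```python
import asyncio
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock  # injectable so tests need not sleep
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # After the cool-down, calls are allowed again (half-open);
        # a further failure reopens the breaker immediately
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()

    async def call(self, query_func):
        if not self.allow():
            raise CircuitOpenError("circuit open; query rejected")
        try:
            result = await query_func()
        except Exception:
            self.record_failure()
            raise
        self.record_success()
        return result


now = [0.0]  # fake clock, advanced manually
breaker = CircuitBreaker(failure_threshold=2, reset_after=10.0,
                         clock=lambda: now[0])


async def failing_query():
    raise TimeoutError("backend too slow")


async def demo():
    outcomes = []
    for _ in range(3):
        try:
            await breaker.call(failing_query)
        except CircuitOpenError:
            outcomes.append("rejected")
        except TimeoutError:
            outcomes.append("failed")
    return outcomes


outcomes = asyncio.run(demo())
print(outcomes)  # ['failed', 'failed', 'rejected']
```

In this sketch the third call never reaches the backend. Wrapping `execute_with_timeout` in `breaker.call` would let repeated timeouts trip the breaker.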

## Cache Consistency Considerations

**Data Staleness Trade-offs:**
- **Label cache (5min TTL)**: Risk of serving deleted/renamed entity labels
- **No embedding caching**: Not needed - embeddings already cached per-query
- **No result caching**: Prevents stale subgraph results from deleted entities/relationships

**Mitigation Strategies:**
- **Conservative TTL values**: Balance performance gains (10-20%) with data freshness
- **Cache invalidation hooks**: Optional integration with graph mutation events
- **Monitoring dashboards**: Track cache hit rates vs staleness incidents
- **Configurable cache policies**: Allow per-deployment tuning based on mutation frequency

**Recommended Cache Configuration by Graph Mutation Rate:**
- **High mutation (>100 changes/hour)**: TTL=60s, smaller cache sizes
- **Medium mutation (10-100 changes/hour)**: TTL=300s (default)
- **Low mutation (<10 changes/hour)**: TTL=600s, larger cache sizes
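
The recommendation above maps directly onto a small configuration helper. This is a sketch only: the function name `label_cache_ttl` is hypothetical, and the boundaries follow the three tiers listed above.

```python
def label_cache_ttl(changes_per_hour: float) -> int:
    """Return the recommended label cache TTL in seconds."""
    if changes_per_hour > 100:   # high mutation
        return 60
    if changes_per_hour >= 10:   # medium mutation (default)
        return 300
    return 600                   # low mutation


for rate in (250, 40, 2):
    print(rate, label_cache_ttl(rate))
```

A deployment could call such a helper at start-up with an observed or configured mutation rate and pass the result as the cache's TTL.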

## Security Considerations

**Query Injection Prevention:**
- Validate all entity identifiers and query parameters
- Use parameterized queries for all database interactions
- Implement query complexity limits to prevent DoS attacks

**Resource Protection:**
- Enforce maximum subgraph size limits
- Implement query timeouts to prevent resource exhaustion
- Add memory usage monitoring and limits

**Access Control:**
- Maintain existing user and collection isolation
- Add audit logging for performance-impacting operations
- Implement rate limiting for expensive operations

## Performance Considerations

### Expected Performance Improvements

**Query Reduction:**
- Current: ~9,000+ queries for a typical request
- Optimized: ~50-100 batched queries (a reduction of roughly 99% in round trips)

**Response Time Improvements:**
- Graph traversal: 15-20s → 3-5s (4-5x faster)
- Label resolution: 8-12s → 2-4s (3x faster)
- Overall query: 25-35s → 6-10s (3-4x improvement)

**Memory Efficiency:**
- Bounded cache sizes prevent memory leaks
- Efficient data structures reduce memory footprint by ~40%
- Better garbage collection through proper resource cleanup

**Realistic Performance Expectations:**
- **Label cache**: 10-20% query reduction for graphs with common relationships
- **Batching optimization**: 50-80% query reduction (primary optimization)
- **Object lifetime optimization**: Eliminate per-request creation overhead
- **Overall improvement**: 3-4x response time improvement primarily from batching

**Scalability Improvements:**
- Support for 3-5x larger knowledge graphs (limited by cache consistency needs)
- 3-5x higher concurrent request capacity
- Better resource utilization through connection reuse

### Performance Monitoring

**Real-time Metrics:**
- Query execution times by operation type
- Cache hit ratios and effectiveness
- Database connection pool utilisation
- Memory usage and garbage collection impact

**Performance Benchmarking:**
- Automated performance regression testing
- Load testing with realistic data volumes
- Comparison benchmarks against current implementation

## Testing Strategy

### Unit Testing
- Individual component testing for traversal, caching, and label resolution
- Mock database interactions for performance testing
- Cache eviction and TTL expiration testing
- Error handling and timeout scenarios
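
TTL expiry can be tested deterministically by patching the clock instead of sleeping. The sketch below uses a compact, synchronous stand-in named `TinyTTLCache` (hypothetical, not the production cache) and `unittest.mock.patch` to control `time.monotonic`.

```python
import time
from collections import OrderedDict
from unittest import mock


class TinyTTLCache:
    """Compact stand-in for the label cache, just enough to test against."""

    def __init__(self, max_size: int, ttl: float):
        self.max_size = max_size
        self.ttl = ttl
        self.entries = OrderedDict()  # key -> (value, stored_at)

    def get(self, key):
        item = self.entries.get(key)
        if item is None:
            return None
        value, stored_at = item
        if time.monotonic() - stored_at > self.ttl:
            del self.entries[key]  # expired
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        if key not in self.entries and len(self.entries) >= self.max_size:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[key] = (value, time.monotonic())
        self.entries.move_to_end(key)


def test_ttl_expiry():
    with mock.patch("time.monotonic", return_value=1000.0) as clock:
        cache = TinyTTLCache(max_size=2, ttl=300)
        cache.put("e1", "Alice")
        assert cache.get("e1") == "Alice"
        clock.return_value = 1301.0  # just past the 5 minute TTL
        assert cache.get("e1") is None


def test_lru_eviction():
    cache = TinyTTLCache(max_size=2, ttl=300)
    cache.put("e1", "Alice")
    cache.put("e2", "Bob")
    cache.get("e1")           # e1 becomes most recently used
    cache.put("e3", "Carol")  # evicts e2, the least recently used
    assert cache.get("e2") is None
    assert cache.get("e1") == "Alice"


test_ttl_expiry()
test_lru_eviction()
```

The same pattern applies to the async cache described in Phase 3, with the test functions made `async` and run under an event loop.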

### Integration Testing
- End-to-end GraphRAG query testing with optimisations
- Database interaction testing with real data
- Concurrent request handling and resource management
- Memory leak detection and resource cleanup verification

### Performance Testing
- Benchmark testing against current implementation
- Load testing with varying graph sizes and complexities
- Stress testing for memory and connection limits
- Regression testing for performance improvements

### Compatibility Testing
- Verify existing GraphRAG API compatibility
- Test with various graph database backends
- Validate result accuracy compared to current implementation

## Implementation Plan

### Direct Implementation Approach
Since APIs are allowed to change, implement optimizations directly without migration complexity:

1. **Replace `follow_edges` method**: Rewrite with iterative batched traversal
2. **Optimize `get_labelgraph`**: Implement parallel label resolution
3. **Add long-lived GraphRag**: Modify Processor to maintain persistent instance
4. **Implement label caching**: Add LRU cache with TTL to GraphRag class

### Scope of Changes
- **Query class**: Replace ~50 lines in `follow_edges`, add ~30 lines batch handling
- **GraphRag class**: Add caching layer (~40 lines)
- **Processor class**: Modify to use persistent GraphRag instance (~20 lines)
- **Total**: ~140 lines of focused changes, mostly within existing classes

## Timeline

**Week 1: Core Implementation**
- Replace `follow_edges` with batched iterative traversal
- Implement parallel label resolution in `get_labelgraph`
- Add long-lived GraphRag instance to Processor
- Implement label caching layer

**Week 2: Testing and Integration**
- Unit tests for new traversal and caching logic
- Performance benchmarking against current implementation
- Integration testing with real graph data
- Code review and optimization

**Week 3: Deployment**
- Deploy optimized implementation
- Monitor performance improvements
- Fine-tune cache TTL and batch sizes based on real usage

## Open Questions

- **Database Connection Pooling**: Should we implement custom connection pooling or rely on existing database client pooling?
- **Cache Persistence**: Should the label cache persist across service restarts?
- **Distributed Caching**: For multi-instance deployments, should we implement distributed caching with Redis/Memcached?
- **Query Result Format**: Should we optimize the internal triple representation for better memory efficiency?
- **Monitoring Integration**: Which metrics should be exposed to existing monitoring systems (Prometheus, etc.)?

## References

- [GraphRAG Original Implementation](trustgraph-flow/trustgraph/retrieval/graph_rag/graph_rag.py)
- [TrustGraph Architecture Principles](architecture-principles.md)
- [Collection Management Specification](collection-management.md)
-												Feat: TrustGraph i18n & Documentation Translation Updates (#781)

Native CLI i18n: The TrustGraph CLI has built-in translation support
that dynamically loads language strings. You can test and use
different languages by simply passing the --lang flag (e.g., --lang
es for Spanish, --lang ru for Russian) or by configuring your
environment's LANG variable.

Automated Docs Translations: This PR introduces autonomously
translated Markdown documentation into several target languages,
including Spanish, Swahili, Portuguese, Turkish, Hindi, Hebrew,
Arabic, Simplified Chinese, and Russian.
											
										
										
											2026-04-14 07:07:58 -04:00
+								---
 								layout: default
 								title: "GraphRAG Performance Optimisation Technical Specification"
 								parent: "Tech Specs"
 								---
-												release/v1.4 -> master (#548)


											
										
										
											2025-10-06 17:54:26 +01:00
+								# GraphRAG Performance Optimisation Technical Specification
 								## Overview
 								This specification describes comprehensive performance optimisations for the GraphRAG (Graph Retrieval-Augmented Generation) algorithm in TrustGraph. The current implementation suffers from significant performance bottlenecks that limit scalability and response times. This specification addresses four primary optimisation areas:
 . **Graph Traversal Optimisation**: Eliminate inefficient recursive database queries and implement batched graph exploration
 . **Label Resolution Optimisation**: Replace sequential label fetching with parallel/batched operations
 . **Caching Strategy Enhancement**: Implement intelligent caching with LRU eviction and prefetching
 . **Query Optimisation**: Add result memoisation and embedding caching for improved response times
 								## Goals
 								- **Reduce Database Query Volume**: Achieve 50-80% reduction in total database queries through batching and caching
 								- **Improve Response Times**: Target 3-5x faster subgraph construction and 2-3x faster label resolution
 								- **Enhance Scalability**: Support larger knowledge graphs with better memory management
 								- **Maintain Accuracy**: Preserve existing GraphRAG functionality and result quality
 								- **Enable Concurrency**: Improve parallel processing capabilities for multiple concurrent requests
 								- **Reduce Memory Footprint**: Implement efficient data structures and memory management
 								- **Add Observability**: Include performance metrics and monitoring capabilities
 								- **Ensure Reliability**: Add proper error handling and timeout mechanisms
 								## Background
 								The current GraphRAG implementation in `trustgraph-flow/trustgraph/retrieval/graph_rag/graph_rag.py` exhibits several critical performance issues that severely impact system scalability:
 								### Current Performance Problems
 								**1. Inefficient Graph Traversal (`follow_edges` function, lines 79-127)**
 								- Makes 3 separate database queries per entity per depth level
 								- Query pattern: subject-based, predicate-based, and object-based queries for each entity
 								- No batching: Each query processes only one entity at a time
 								- No cycle detection: Can revisit the same nodes multiple times
 								- Recursive implementation without memoisation leads to exponential complexity
 								- Time complexity: O(entities × max_path_length × triple_limit³)
 								**2. Sequential Label Resolution (`get_labelgraph` function, lines 144-171)**
 								- Processes each triple component (subject, predicate, object) sequentially
 								- Each `maybe_label` call potentially triggers a database query
 								- No parallel execution or batching of label queries
 								- Results in up to 3 × subgraph_size individual database calls
 								**3. Primitive Caching Strategy (`maybe_label` function, lines 62-77)**
 								- Simple dictionary cache without size limits or TTL
 								- No cache eviction policy leads to unbounded memory growth
 								- Cache misses trigger individual database queries
 								- No prefetching or intelligent cache warming
 								**4. Suboptimal Query Patterns**
 								- Entity vector similarity queries not cached between similar requests
 								- No result memoisation for repeated query patterns
 								- Missing query optimisation for common access patterns
 								**5. Critical Object Lifetime Issues (`rag.py:96-102`)**
 								- **GraphRag object recreated per request**: Fresh instance created for every query, losing all cache benefits
 								- **Query object extremely short-lived**: Created and destroyed within single query execution (lines 201-207)
 								- **Label cache reset per request**: Cache warming and accumulated knowledge lost between requests
 								- **Client recreation overhead**: Database clients potentially re-established for each request
 								- **No cross-request optimisation**: Cannot benefit from query patterns or result sharing
 								### Performance Impact Analysis
 								Current worst-case scenario for a typical query:
 								- **Entity Retrieval**: 1 vector similarity query
 								- **Graph Traversal**: entities × max_path_length × 3 × triple_limit queries
 								- **Label Resolution**: subgraph_size × 3 individual label queries
 								For default parameters (50 entities, path length 2, 30 triple limit, 150 subgraph size):
 								- **Minimum queries**: 1 + (50 × 2 × 3 × 30) + (150 × 3) = **9,451 database queries**
 								- **Response time**: 15-30 seconds for moderate-sized graphs
 								- **Memory usage**: Unbounded cache growth over time
 								- **Cache effectiveness**: 0% - caches reset on every request
 								- **Object creation overhead**: GraphRag + Query objects created/destroyed per request
 								This specification addresses these gaps by implementing batched queries, intelligent caching, and parallel processing. By optimizing query patterns and data access, TrustGraph can:
 								- Support enterprise-scale knowledge graphs with millions of entities
 								- Provide sub-second response times for typical queries
 								- Handle hundreds of concurrent GraphRAG requests
 								- Scale efficiently with graph size and complexity
 								## Technical Design
 								### Architecture
 								The GraphRAG performance optimisation requires the following technical components:
 								#### 1. **Object Lifetime Architectural Refactor**
 								   - **Make GraphRag long-lived**: Move GraphRag instance to Processor level for persistence across requests
 								   - **Preserve caches**: Maintain label cache, embedding cache, and query result cache between requests
 								   - **Optimize Query object**: Refactor Query as lightweight execution context, not data container
 								   - **Connection persistence**: Maintain database client connections across requests
 								   Module: `trustgraph-flow/trustgraph/retrieval/graph_rag/rag.py` (modified)
 								#### 2. **Optimized Graph Traversal Engine**
 								   - Replace recursive `follow_edges` with iterative breadth-first search
 								   - Implement batched entity processing at each traversal level
 								   - Add cycle detection using visited node tracking
 								   - Include early termination when limits are reached
 								   Module: `trustgraph-flow/trustgraph/retrieval/graph_rag/optimized_traversal.py`
 								#### 3. **Parallel Label Resolution System**
 								   - Batch label queries for multiple entities simultaneously
 								   - Implement async/await patterns for concurrent database access
 								   - Add intelligent prefetching for common label patterns
 								   - Include label cache warming strategies
 								   Module: `trustgraph-flow/trustgraph/retrieval/graph_rag/label_resolver.py`
 								#### 4. **Conservative Label Caching Layer**
   - LRU cache with a short TTL (5 minutes), for labels only, to balance performance against consistency
 								   - Cache metrics and hit ratio monitoring
 								   - **No embedding caching**: Already cached per-query, no cross-query benefit
 								   - **No query result caching**: Due to graph mutation consistency concerns
 								   Module: `trustgraph-flow/trustgraph/retrieval/graph_rag/cache_manager.py`
 								#### 5. **Query Optimisation Framework**
 								   - Query pattern analysis and optimisation suggestions
 								   - Batch query coordinator for database access
 								   - Connection pooling and query timeout management
 								   - Performance monitoring and metrics collection
 								   Module: `trustgraph-flow/trustgraph/retrieval/graph_rag/query_optimizer.py`
 								### Data Models
 								#### Optimized Graph Traversal State
 								The traversal engine maintains state to avoid redundant operations:
 								```python
 								@dataclass
 								class TraversalState:
 								    visited_entities: Set[str]
 								    current_level_entities: Set[str]
 								    next_level_entities: Set[str]
 								    subgraph: Set[Tuple[str, str, str]]
 								    depth: int
 								    query_batch: List[TripleQuery]
 								```
 								This approach allows:
 								- Efficient cycle detection through visited entity tracking
 								- Batched query preparation at each traversal level
 								- Memory-efficient state management
 								- Early termination when size limits are reached
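One way the state could drive a level-by-level traversal is sketched below. It is an illustration under assumptions, not the planned implementation: the `record` and `advance` helpers are hypothetical, default values are added for convenience, the `query_batch` field is omitted, and predicates are treated as traversable nodes because the current traversal also queries by predicate.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class TraversalState:
    current_level_entities: Set[str]
    visited_entities: Set[str] = field(default_factory=set)
    next_level_entities: Set[str] = field(default_factory=set)
    subgraph: Set[Triple] = field(default_factory=set)
    depth: int = 0

    def record(self, triples: Iterable[Triple], max_subgraph_size: int) -> bool:
        """Add triples; return False once the size limit has been reached."""
        for s, p, o in triples:
            if len(self.subgraph) >= max_subgraph_size:
                return False
            self.subgraph.add((s, p, o))
            # Any node not yet expanded is a candidate for the next level.
            self.next_level_entities.update(
                e for e in (s, p, o) if e not in self.visited_entities
            )
        return len(self.subgraph) < max_subgraph_size

    def advance(self) -> None:
        """Move to the next level, skipping entities already expanded."""
        self.visited_entities |= self.current_level_entities
        self.current_level_entities = (
            self.next_level_entities - self.visited_entities
        )
        self.next_level_entities = set()
        self.depth += 1


state = TraversalState(current_level_entities={"a"})
state.record([("a", "knows", "b"), ("b", "knows", "a")], max_subgraph_size=10)
state.advance()
print(sorted(state.current_level_entities))  # ['b', 'knows']
```

The cycle `a -> b -> a` does not cause `a` to be expanded again, because `advance` subtracts the visited set before building the next level.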
 								#### Enhanced Cache Structure
```python
@dataclass
class CacheEntry:
    value: Any
    timestamp: float
    access_count: int
    ttl: Optional[float]

class CacheManager:
    # Labels only: embeddings and query results are not cached across
    # requests (see "Conservative Label Caching Layer" above)
    label_cache: LRUCache[str, CacheEntry]
    cache_stats: CacheStatistics
```
 								#### Batch Query Structures
 								```python
 								@dataclass
 								class BatchTripleQuery:
 								    entities: List[str]
 								    query_type: QueryType  # SUBJECT, PREDICATE, OBJECT
 								    limit_per_entity: int
 								@dataclass
 								class BatchLabelQuery:
 								    entities: List[str]
 								    predicate: str = LABEL
 								```
 								### APIs
 								#### New APIs:
 								**GraphTraversal API**
 								```python
 								async def optimized_follow_edges_batch(
 								    entities: List[str],
 								    max_depth: int,
 								    triple_limit: int,
 								    max_subgraph_size: int
 								) -> Set[Tuple[str, str, str]]
 								```
 								**Batch Label Resolution API**
 								```python
 								async def resolve_labels_batch(
 								    entities: List[str],
 								    cache_manager: CacheManager
 								) -> Dict[str, str]
 								```
 								**Cache Management API**
```python
class CacheManager:
    # Label lookups only, in line with the conservative caching design
    async def get_or_fetch_label(self, entity: str) -> str: ...
    def get_cache_statistics(self) -> CacheStatistics: ...
```
 								#### Modified APIs:
 								**GraphRag.query()** - Enhanced with performance optimisations:
 								- Add cache_manager parameter for cache control
 								- Include performance_metrics return value
 								- Add query_timeout parameter for reliability
 								**Query class** - Refactored for batch processing:
 								- Replace individual entity processing with batch operations
 								- Add async context managers for resource cleanup
 								- Include progress callbacks for long-running operations
 								### Implementation Details
 								#### Phase 0: Critical Architectural Lifetime Refactor
 								**Current Problematic Implementation:**
 								```python
 								# INEFFICIENT: GraphRag recreated every request
 								class Processor(FlowProcessor):
 								    async def on_request(self, msg, consumer, flow):
 								        # PROBLEM: New GraphRag instance per request!
 								        self.rag = GraphRag(
 								            embeddings_client = flow("embeddings-request"),
 								            graph_embeddings_client = flow("graph-embeddings-request"),
 								            triples_client = flow("triples-request"),
 								            prompt_client = flow("prompt-request"),
 								            verbose=True,
 								        )
 								        # Cache starts empty every time - no benefit from previous requests
 								        response = await self.rag.query(...)
 								# VERY SHORT-LIVED: Query object created/destroyed per request
 								class GraphRag:
 								    async def query(self, query, user="trustgraph", collection="default", ...):
 								        q = Query(rag=self, user=user, collection=collection, ...)  # Created
 								        kg = await q.get_labelgraph(query)  # Used briefly
 								        # q automatically destroyed when function exits
 								```
 								**Optimized Long-Lived Architecture:**
 								```python
 								class Processor(FlowProcessor):
 								    def __init__(self, **params):
 								        super().__init__(**params)
 								        self.rag_instance = None  # Will be initialized once
 								        self.client_connections = {}
 								    async def initialize_rag(self, flow):
 								        """Initialize GraphRag once, reuse for all requests"""
 								        if self.rag_instance is None:
 								            self.rag_instance = LongLivedGraphRag(
 								                embeddings_client=flow("embeddings-request"),
 								                graph_embeddings_client=flow("graph-embeddings-request"),
 								                triples_client=flow("triples-request"),
 								                prompt_client=flow("prompt-request"),
 								                verbose=True,
 								            )
 								        return self.rag_instance
 								    async def on_request(self, msg, consumer, flow):
 								        # REUSE the same GraphRag instance - caches persist!
 								        rag = await self.initialize_rag(flow)
 								        # Query object becomes lightweight execution context
 								        response = await rag.query_with_context(
 								            query=v.query,
 								            execution_context=QueryContext(
 								                user=v.user,
 								                collection=v.collection,
 								                entity_limit=entity_limit,
 								                # ... other params
 								            )
 								        )
 								class LongLivedGraphRag:
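    def __init__(self, embeddings_client, graph_embeddings_client,
                 triples_client, prompt_client, verbose=False):
        self.embeddings_client = embeddings_client
        self.graph_embeddings_client = graph_embeddings_client
        self.triples_client = triples_client
        self.prompt_client = prompt_client
        self.verbose = verbose
        # CONSERVATIVE caches - balance performance vs consistency
        self.label_cache = LRUCacheWithTTL(max_size=5000, default_ttl=300)  # 5min TTL for freshness
        # Note: No embedding cache - already cached per-query, no cross-query benefit
        # Note: No query result cache due to consistency concerns
        self.performance_metrics = PerformanceTracker()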
 								    async def query_with_context(self, query: str, context: QueryContext):
 								        # Use lightweight QueryExecutor instead of heavyweight Query object
 								        executor = QueryExecutor(self, context)  # Minimal object
 								        return await executor.execute(query)
 								@dataclass
 								class QueryContext:
 								    """Lightweight execution context - no heavy operations"""
 								    user: str
 								    collection: str
 								    entity_limit: int
 								    triple_limit: int
 								    max_subgraph_size: int
 								    max_path_length: int
 								class QueryExecutor:
 								    """Lightweight execution context - replaces old Query class"""
 								    def __init__(self, rag: LongLivedGraphRag, context: QueryContext):
 								        self.rag = rag
 								        self.context = context
 								        # No heavy initialization - just references
 								    async def execute(self, query: str):
 								        # All heavy lifting uses persistent rag caches
 								        return await self.rag.execute_optimized_query(query, self.context)
 								```
 								This architectural change provides:
 								- **10-20% database query reduction** for graphs with common relationships (vs 0% currently)
 								- **Eliminated object creation overhead** for every request
 								- **Persistent connection pooling** and client reuse
 								- **Cross-request optimization** within cache TTL windows
 								**Important Cache Consistency Limitation:**
 								Long-term caching introduces staleness risk when entities/labels are deleted or modified in the underlying graph. The LRU cache with TTL provides a balance between performance gains and data freshness, but cannot detect real-time graph changes.
 								#### Phase 1: Graph Traversal Optimisation
 								**Current Implementation Problems:**
 								```python
 								# INEFFICIENT: 3 queries per entity per level
 								async def follow_edges(self, ent, subgraph, path_length):
 								    # Query 1: s=ent, p=None, o=None
 								    res = await self.rag.triples_client.query(s=ent, p=None, o=None, limit=self.triple_limit)
 								    # Query 2: s=None, p=ent, o=None
 								    res = await self.rag.triples_client.query(s=None, p=ent, o=None, limit=self.triple_limit)
 								    # Query 3: s=None, p=None, o=ent
 								    res = await self.rag.triples_client.query(s=None, p=None, o=ent, limit=self.triple_limit)
 								```
 								**Optimized Implementation:**
 								```python
 								async def optimized_traversal(self, entities: List[str], max_depth: int) -> Set[Triple]:
 								    visited = set()
 								    current_level = set(entities)
 								    subgraph = set()
 								    for depth in range(max_depth):
 								        if not current_level or len(subgraph) >= self.max_subgraph_size:
 								            break
 								        # Batch all queries for current level
 								        batch_queries = []
 								        for entity in current_level:
 								            if entity not in visited:
 								                batch_queries.extend([
 								                    TripleQuery(s=entity, p=None, o=None),
 								                    TripleQuery(s=None, p=entity, o=None),
 								                    TripleQuery(s=None, p=None, o=entity)
 								                ])
 								        # Execute all queries concurrently
 								        results = await self.execute_batch_queries(batch_queries)
 								        # Process results and prepare next level
 								        next_level = set()
 								        for result in results:
 								            subgraph.update(result.triples)
 								            next_level.update(result.new_entities)
 								        visited.update(current_level)
 								        current_level = next_level - visited
 								    return subgraph
 								```
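The traversal above calls `execute_batch_queries` without defining it. A minimal sketch of one possible shape follows, using `asyncio.gather` with a semaphore to bound the number of in-flight queries. The `TripleQuery` fields, the `run_query` callable and the `max_concurrency` parameter are assumptions made for illustration, and this is client-side concurrency rather than a single server-side batch request.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class TripleQuery:
    # Hypothetical query shape: None means "match anything".
    s: Optional[str] = None
    p: Optional[str] = None
    o: Optional[str] = None


async def execute_batch_queries(
    queries: Sequence[TripleQuery],
    run_query: Callable[[TripleQuery], Awaitable[List[Triple]]],
    max_concurrency: int = 16,
) -> List[List[Triple]]:
    """Run all queries concurrently, at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(query: TripleQuery) -> List[Triple]:
        async with semaphore:
            return await run_query(query)

    # gather preserves input order, so results[i] belongs to queries[i].
    return await asyncio.gather(*(run_one(q) for q in queries))


async def _demo() -> List[List[Triple]]:
    store: List[Triple] = [("a", "knows", "b"), ("b", "knows", "c")]

    async def fake_query(q: TripleQuery) -> List[Triple]:
        await asyncio.sleep(0)  # yield control, as a real client call would
        return [
            (s, p, o) for (s, p, o) in store
            if q.s in (None, s) and q.p in (None, p) and q.o in (None, o)
        ]

    return await execute_batch_queries(
        [TripleQuery(s="a"), TripleQuery(o="c")], fake_query
    )


results = asyncio.run(_demo())
print(results)
```

Bounding concurrency keeps a wide traversal level from opening hundreds of simultaneous requests against the triple store; the right limit would need to be tuned against the actual backend.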
 								#### Phase 2: Parallel Label Resolution
 								**Current Sequential Implementation:**
 								```python
 								# INEFFICIENT: Sequential processing
 								for edge in subgraph:
 								    s = await self.maybe_label(edge[0])  # Individual query
 								    p = await self.maybe_label(edge[1])  # Individual query
 								    o = await self.maybe_label(edge[2])  # Individual query
 								```
 								**Optimized Parallel Implementation:**
 								```python
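async def resolve_labels_parallel(self, subgraph: List[Triple]) -> List[Triple]:
    # Collect all unique entities needing labels
    entities_to_resolve = set()
    for s, p, o in subgraph:
        entities_to_resolve.update([s, p, o])
    # Look up each entity in the persistent cache; get() returns None on a
    # miss or an expired entry
    labels = {}
    uncached_entities = []
    for entity in entities_to_resolve:
        cached = await self.label_cache.get(entity)
        if cached is None:
            uncached_entities.append(entity)
        else:
            labels[entity] = cached
    # Batch query for all uncached labels
    if uncached_entities:
        label_results = await self.batch_label_query(uncached_entities)
        for entity in uncached_entities:
            # Entities without a label fall back to their own identifier and
            # are cached as well, so they are not re-queried on every request
            label = label_results.get(entity, entity)
            labels[entity] = label
            await self.label_cache.put(entity, label)
    # Apply labels to subgraph
    return [(labels[s], labels[p], labels[o]) for s, p, o in subgraph]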
 								```
 								#### Phase 3: Advanced Caching Strategy
 								**LRU Cache with TTL:**
 								```python
import time
from collections import OrderedDict
from typing import Any, Optional

class LRUCacheWithTTL:
    def __init__(self, max_size: int, default_ttl: int = 3600):
        self.cache = OrderedDict()  # key -> value, least recently used first
        self.max_size = max_size
        self.default_ttl = default_ttl
        self.stored_at = {}  # key -> time of last write; TTL is measured from here

    async def get(self, key: str) -> Optional[Any]:
        if key in self.cache:
            # Check TTL expiration (reads do not extend an entry's lifetime)
            if time.time() - self.stored_at[key] > self.default_ttl:
                del self.cache[key]
                del self.stored_at[key]
                return None
            # Move to end (most recently used)
            self.cache.move_to_end(key)
            return self.cache[key]
        return None

    async def put(self, key: str, value: Any):
        if key in self.cache:
            self.cache.move_to_end(key)
        elif len(self.cache) >= self.max_size:
            # Remove least recently used
            oldest_key, _ = self.cache.popitem(last=False)
            del self.stored_at[oldest_key]
        self.cache[key] = value
        self.stored_at[key] = time.time()
 								```
 								#### Phase 4: Query Optimisation and Monitoring
 								**Performance Metrics Collection:**
 								```python
 								@dataclass
 								class PerformanceMetrics:
 								    total_queries: int
 								    cache_hits: int
 								    cache_misses: int
 								    avg_response_time: float
 								    subgraph_construction_time: float
 								    label_resolution_time: float
 								    total_entities_processed: int
 								    memory_usage_mb: float
 								```
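Derived values such as the cache hit ratio and per-phase timings could be computed from these counters roughly as follows. This is a sketch only: `CacheCounters` and `timed` are hypothetical helpers, not part of the existing code.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Dict, Iterator


@dataclass
class CacheCounters:
    cache_hits: int = 0
    cache_misses: int = 0

    @property
    def hit_ratio(self) -> float:
        """Fraction of lookups served from cache; 0.0 before any lookup."""
        lookups = self.cache_hits + self.cache_misses
        return self.cache_hits / lookups if lookups else 0.0


@contextmanager
def timed(timings: Dict[str, float], name: str) -> Iterator[None]:
    """Accumulate the wall-clock duration of the enclosed block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[name] = timings.get(name, 0.0) + elapsed


timings: Dict[str, float] = {}
with timed(timings, "label_resolution_time"):
    sum(range(1000))  # stand-in for the real work

print(CacheCounters(cache_hits=3, cache_misses=1).hit_ratio)  # 0.75
```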
 								**Query Timeout and Circuit Breaker:**
 								```python
 								async def execute_with_timeout(self, query_func, timeout: int = 30):
 								    try:
 								        return await asyncio.wait_for(query_func(), timeout=timeout)
 								    except asyncio.TimeoutError:
 								        logger.error(f"Query timeout after {timeout}s")
 								        raise GraphRagTimeoutError(f"Query exceeded timeout of {timeout}s")
 								```
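The block above covers only the timeout half of this heading. A circuit breaker could be layered on top along the following lines. This is a minimal sketch under assumptions: the class, its thresholds and the `call_with_breaker` helper are all hypothetical names, and the half-open behaviour (one trial call after the cool-down) is one common design, not a stated requirement.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when calls are being rejected without reaching the database."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def before_call(self) -> None:
        """Raise if the circuit is open and the cool-down has not elapsed."""
        if self._opened_at is None:
            return
        if self._clock() - self._opened_at < self.reset_after:
            raise CircuitOpenError("circuit open; skipping database call")
        # Cool-down elapsed: allow a trial call; one more failure reopens.
        self._opened_at = None
        self._failures = self.failure_threshold - 1

    def record_success(self) -> None:
        self._failures = 0

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


async def call_with_breaker(
    breaker: CircuitBreaker,
    query_func: Callable[[], Awaitable[Any]],
    timeout: float = 30.0,
) -> Any:
    breaker.before_call()
    try:
        result = await asyncio.wait_for(query_func(), timeout=timeout)
    except Exception:
        # Timeouts and query errors both count towards opening the circuit.
        breaker.record_failure()
        raise
    breaker.record_success()
    return result
```

Rejecting calls while the circuit is open gives a struggling triple store time to recover instead of queueing further 30-second timeouts behind it.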
 								## Cache Consistency Considerations
 								**Data Staleness Trade-offs:**
 								- **Label cache (5min TTL)**: Risk of serving deleted/renamed entity labels
 								- **No embedding caching**: Not needed - embeddings already cached per-query
 								- **No result caching**: Prevents stale subgraph results from deleted entities/relationships
 								**Mitigation Strategies:**
 								- **Conservative TTL values**: Balance performance gains (10-20%) with data freshness
 								- **Cache invalidation hooks**: Optional integration with graph mutation events
 								- **Monitoring dashboards**: Track cache hit rates vs staleness incidents
 								- **Configurable cache policies**: Allow per-deployment tuning based on mutation frequency
 								**Recommended Cache Configuration by Graph Mutation Rate:**
 								- **High mutation (>100 changes/hour)**: TTL=60s, smaller cache sizes
 								- **Medium mutation (10-100 changes/hour)**: TTL=300s (default)
 								- **Low mutation (<10 changes/hour)**: TTL=600s, larger cache sizes
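The recommendations above could be encoded as a small policy function. This is a sketch; `CachePolicy` and `recommend_cache_policy` are hypothetical names, and the text does not pin down concrete cache sizes, so only the TTL is returned as a number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachePolicy:
    ttl_seconds: int
    sizing_hint: str


def recommend_cache_policy(changes_per_hour: float) -> CachePolicy:
    """Map an observed graph mutation rate to the recommended label-cache TTL."""
    if changes_per_hour > 100:
        return CachePolicy(60, "smaller cache sizes")
    if changes_per_hour >= 10:
        return CachePolicy(300, "default cache sizes")
    return CachePolicy(600, "larger cache sizes")


print(recommend_cache_policy(250).ttl_seconds)  # 60
print(recommend_cache_policy(5).ttl_seconds)    # 600
```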
 								## Security Considerations
 								**Query Injection Prevention:**
 								- Validate all entity identifiers and query parameters
 								- Use parameterized queries for all database interactions
 								- Implement query complexity limits to prevent DoS attacks
 								**Resource Protection:**
 								- Enforce maximum subgraph size limits
 								- Implement query timeouts to prevent resource exhaustion
 								- Add memory usage monitoring and limits
 								**Access Control:**
 								- Maintain existing user and collection isolation
 								- Add audit logging for performance-impacting operations
 								- Implement rate limiting for expensive operations
 								## Performance Considerations
 								### Expected Performance Improvements
 								**Query Reduction:**
 								- Current: ~9,000+ queries for typical request
- Optimized: ~50-100 batched queries (about a 99% reduction in database round trips)
 								**Response Time Improvements:**
 								- Graph traversal: 15-20s → 3-5s (4-5x faster)
- Label resolution: 8-12s → 2-4s (3-4x faster)
 								- Overall query: 25-35s → 6-10s (3-4x improvement)
 								**Memory Efficiency:**
 								- Bounded cache sizes prevent memory leaks
 								- Efficient data structures reduce memory footprint by ~40%
 								- Better garbage collection through proper resource cleanup
 								**Realistic Performance Expectations:**
 								- **Label cache**: 10-20% query reduction for graphs with common relationships
 								- **Batching optimization**: 50-80% query reduction (primary optimization)
 								- **Object lifetime optimization**: Eliminate per-request creation overhead
 								- **Overall improvement**: 3-4x response time improvement primarily from batching
 								**Scalability Improvements:**
 								- Support for 3-5x larger knowledge graphs (limited by cache consistency needs)
 								- 3-5x higher concurrent request capacity
 								- Better resource utilization through connection reuse
 								### Performance Monitoring
 								**Real-time Metrics:**
 								- Query execution times by operation type
 								- Cache hit ratios and effectiveness
 								- Database connection pool utilisation
 								- Memory usage and garbage collection impact
 								**Performance Benchmarking:**
 								- Automated performance regression testing
 								- Load testing with realistic data volumes
 								- Comparison benchmarks against current implementation
 								## Testing Strategy
 								### Unit Testing
 								- Individual component testing for traversal, caching, and label resolution
 								- Mock database interactions for performance testing
 								- Cache eviction and TTL expiration testing
 								- Error handling and timeout scenarios
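Cache eviction and TTL expiry are easier to test deterministically when the cache takes an injectable clock instead of reading the system time directly. The sketch below illustrates the idea with a simplified synchronous cache; `TTLCache` and `FakeClock` are hypothetical test helpers, not the production `LRUCacheWithTTL`.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class TTLCache:
    """Minimal LRU + TTL cache with an injectable clock (illustrative only)."""

    def __init__(
        self,
        max_size: int,
        ttl: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._entries: OrderedDict = OrderedDict()  # key -> (value, stored_at)
        self._max_size = max_size
        self._ttl = ttl
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]  # expired
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = (value, self._clock())
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict least recently used


class FakeClock:
    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


def test_ttl_expiry() -> None:
    clock = FakeClock()
    cache = TTLCache(max_size=2, ttl=300, clock=clock)
    cache.put("e1", "Entity One")
    clock.now = 299
    assert cache.get("e1") == "Entity One"
    clock.now = 301
    assert cache.get("e1") is None


def test_lru_eviction() -> None:
    cache = TTLCache(max_size=2, ttl=300, clock=FakeClock())
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")     # "a" becomes most recently used
    cache.put("c", 3)  # evicts "b"
    assert cache.get("b") is None
    assert cache.get("a") == 1


test_ttl_expiry()
test_lru_eviction()
```

With a fake clock the tests never sleep, so a 5-minute TTL can be exercised in microseconds.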
 								### Integration Testing
 								- End-to-end GraphRAG query testing with optimisations
 								- Database interaction testing with real data
 								- Concurrent request handling and resource management
 								- Memory leak detection and resource cleanup verification
 								### Performance Testing
 								- Benchmark testing against current implementation
 								- Load testing with varying graph sizes and complexities
 								- Stress testing for memory and connection limits
 								- Regression testing for performance improvements
 								### Compatibility Testing
 								- Verify existing GraphRAG API compatibility
 								- Test with various graph database backends
 								- Validate result accuracy compared to current implementation
 								## Implementation Plan
 								### Direct Implementation Approach
 								Since APIs are allowed to change, implement optimizations directly without migration complexity:
1. **Replace `follow_edges` method**: Rewrite with iterative batched traversal
2. **Optimize `get_labelgraph`**: Implement parallel label resolution
3. **Add long-lived GraphRag**: Modify Processor to maintain persistent instance
4. **Implement label caching**: Add LRU cache with TTL to GraphRag class
 								### Scope of Changes
 								- **Query class**: Replace ~50 lines in `follow_edges`, add ~30 lines batch handling
 								- **GraphRag class**: Add caching layer (~40 lines)
 								- **Processor class**: Modify to use persistent GraphRag instance (~20 lines)
 								- **Total**: ~140 lines of focused changes, mostly within existing classes
 								## Timeline
 								**Week 1: Core Implementation**
 								- Replace `follow_edges` with batched iterative traversal
 								- Implement parallel label resolution in `get_labelgraph`
 								- Add long-lived GraphRag instance to Processor
 								- Implement label caching layer
 								**Week 2: Testing and Integration**
 								- Unit tests for new traversal and caching logic
 								- Performance benchmarking against current implementation
 								- Integration testing with real graph data
 								- Code review and optimization
 								**Week 3: Deployment**
 								- Deploy optimized implementation
 								- Monitor performance improvements
 								- Fine-tune cache TTL and batch sizes based on real usage
 								## Open Questions
 								- **Database Connection Pooling**: Should we implement custom connection pooling or rely on existing database client pooling?
- **Cache Persistence**: Should the label cache persist across service restarts?
 								- **Distributed Caching**: For multi-instance deployments, should we implement distributed caching with Redis/Memcached?
 								- **Query Result Format**: Should we optimize the internal triple representation for better memory efficiency?
 								- **Monitoring Integration**: Which metrics should be exposed to existing monitoring systems (Prometheus, etc.)?
 								## References
 								- [GraphRAG Original Implementation](trustgraph-flow/trustgraph/retrieval/graph_rag/graph_rag.py)
 								- [TrustGraph Architecture Principles](architecture-principles.md)
 								- [Collection Management Specification](collection-management.md)