Graph rag optimisations (#527)

* Tech spec for GraphRAG optimisation
* Implement GraphRAG optimisation and update tests
2026-06-14 17:25:14 +02:00 · 2025-09-23 21:05:51 +01:00 · 2025-09-23 21:05:51 +01:00 · 45a14b5958
commit 45a14b5958
parent fcd15d1833
4 changed files with 881 additions and 104 deletions
--- a/docs/tech-specs/graphrag-performance-optimization.md
+++ b/docs/tech-specs/graphrag-performance-optimization.md
@@ -0,0 +1,629 @@
+# GraphRAG Performance Optimisation Technical Specification
+
+## Overview
+
+This specification describes comprehensive performance optimisations for the GraphRAG (Graph Retrieval-Augmented Generation) algorithm in TrustGraph. The current implementation suffers from significant performance bottlenecks that limit scalability and response times. This specification addresses four primary optimisation areas:
+
+1. **Graph Traversal Optimisation**: Eliminate inefficient recursive database queries and implement batched graph exploration
+2. **Label Resolution Optimisation**: Replace sequential label fetching with parallel/batched operations
+3. **Caching Strategy Enhancement**: Implement intelligent caching with LRU eviction and prefetching
+4. **Query Optimisation**: Add result memoisation and embedding caching for improved response times
+
+## Goals
+
+- **Reduce Database Query Volume**: Achieve 50-80% reduction in total database queries through batching and caching
+- **Improve Response Times**: Target 3-5x faster subgraph construction and 2-3x faster label resolution
+- **Enhance Scalability**: Support larger knowledge graphs with better memory management
+- **Maintain Accuracy**: Preserve existing GraphRAG functionality and result quality
+- **Enable Concurrency**: Improve parallel processing capabilities for multiple concurrent requests
+- **Reduce Memory Footprint**: Implement efficient data structures and memory management
+- **Add Observability**: Include performance metrics and monitoring capabilities
+- **Ensure Reliability**: Add proper error handling and timeout mechanisms
+
+## Background
+
+The current GraphRAG implementation in `trustgraph-flow/trustgraph/retrieval/graph_rag/graph_rag.py` exhibits several critical performance issues that severely impact system scalability:
+
+### Current Performance Problems
+
+**1. Inefficient Graph Traversal (`follow_edges` function, lines 79-127)**
+- Makes 3 separate database queries per entity per depth level
+- Query pattern: subject-based, predicate-based, and object-based queries for each entity
+- No batching: Each query processes only one entity at a time
+- No cycle detection: Can revisit the same nodes multiple times
+- Recursive implementation without memoisation leads to exponential complexity
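+- Time complexity: exponential in path length, on the order of entities × 3 × (c × triple_limit)^(max_path_length - 1) queries in the worst case, where c is a small constant: each visited entity issues 3 queries, and the triples they return supply the next depth's entities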
+
+**2. Sequential Label Resolution (`get_labelgraph` function, lines 144-171)**
+- Processes each triple component (subject, predicate, object) sequentially
+- Each `maybe_label` call potentially triggers a database query
+- No parallel execution or batching of label queries
+- Results in up to 3 × subgraph_size individual database calls
+
+**3. Primitive Caching Strategy (`maybe_label` function, lines 62-77)**
+- Simple dictionary cache without size limits or TTL
+- No cache eviction policy leads to unbounded memory growth
+- Cache misses trigger individual database queries
+- No prefetching or intelligent cache warming
+
+**4. Suboptimal Query Patterns**
+- Entity vector similarity queries not cached between similar requests
+- No result memoisation for repeated query patterns
+- Missing query optimisation for common access patterns
+
+**5. Critical Object Lifetime Issues (`rag.py:96-102`)**
+- **GraphRag object recreated per request**: Fresh instance created for every query, losing all cache benefits
+- **Query object extremely short-lived**: Created and destroyed within a single query execution (lines 201-207)
+- **Label cache reset per request**: Cache warming and accumulated knowledge lost between requests
+- **Client recreation overhead**: Database clients potentially re-established for each request
+- **No cross-request optimisation**: Cannot benefit from query patterns or result sharing
+
+### Performance Impact Analysis
+
+Current worst-case scenario for a typical query:
+- **Entity Retrieval**: 1 vector similarity query
+- **Graph Traversal**: entities × max_path_length × 3 × triple_limit queries
+- **Label Resolution**: subgraph_size × 3 individual label queries
+
+For default parameters (50 entities, path length 2, 30 triple limit, 150 subgraph size):
+- **Worst-case estimate**: 1 + (50 × 2 × 3 × 30) + (150 × 3) = **9,451 database queries**
+- **Response time**: 15-30 seconds for moderate-sized graphs
+- **Memory usage**: Unbounded cache growth over time
+- **Cache effectiveness**: 0% - caches reset on every request
+- **Object creation overhead**: GraphRag + Query objects created/destroyed per request
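+
+The worst-case figure above follows from the simplified per-request model in this section, and the arithmetic can be reproduced directly. This sketch covers that arithmetic only. The function name is illustrative and not part of the codebase, and the keyword defaults are the default parameters quoted above.
+
+```python
+def estimate_worst_case_queries(
+    entities: int = 50,
+    max_path_length: int = 2,
+    triple_limit: int = 30,
+    subgraph_size: int = 150,
+) -> int:
+    """Simplified worst-case database query count for one GraphRAG request."""
+    vector_queries = 1  # entity retrieval via vector similarity
+    # Traversal term as modelled above: entities × max_path_length × 3 × triple_limit
+    traversal_queries = entities * max_path_length * 3 * triple_limit
+    # One label lookup per subject, predicate and object in the subgraph
+    label_queries = subgraph_size * 3
+    return vector_queries + traversal_queries + label_queries
+
+print(estimate_worst_case_queries())  # 9451
+```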
+
+This specification addresses these gaps with batched queries, intelligent caching, and parallel processing. By optimising query patterns and data access, the aim is for TrustGraph to:
+- Support enterprise-scale knowledge graphs with millions of entities
+- Provide sub-second response times for typical queries
+- Handle hundreds of concurrent GraphRAG requests
+- Scale efficiently with graph size and complexity
+
+## Technical Design
+
+### Architecture
+
+The GraphRAG performance optimisation requires the following technical components:
+
+#### 1. **Object Lifetime Architectural Refactor**
+   - **Make GraphRag long-lived**: Move GraphRag instance to Processor level for persistence across requests
+   - **Preserve caches**: Maintain label cache, embedding cache, and query result cache between requests
+   - **Optimize Query object**: Refactor Query as lightweight execution context, not data container
+   - **Connection persistence**: Maintain database client connections across requests
+
+   Module: `trustgraph-flow/trustgraph/retrieval/graph_rag/rag.py` (modified)
+
+#### 2. **Optimized Graph Traversal Engine**
+   - Replace recursive `follow_edges` with iterative breadth-first search
+   - Implement batched entity processing at each traversal level
+   - Add cycle detection using visited node tracking
+   - Include early termination when limits are reached
+
+   Module: `trustgraph-flow/trustgraph/retrieval/graph_rag/optimized_traversal.py`
+
+#### 3. **Parallel Label Resolution System**
+   - Batch label queries for multiple entities simultaneously
+   - Implement async/await patterns for concurrent database access
+   - Add intelligent prefetching for common label patterns
+   - Include label cache warming strategies
+
+   Module: `trustgraph-flow/trustgraph/retrieval/graph_rag/label_resolver.py`
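+
+One way to meet the async/await requirement without flooding the triples store is to fan label lookups out with `asyncio.gather` and cap in-flight requests with an `asyncio.Semaphore`. This is a minimal sketch built on assumptions. `fetch_label` stands in for a single-entity label lookup on the triples client. The function name and the `max_concurrency` default are hypothetical.
+
+```python
+import asyncio
+from typing import Awaitable, Callable, Dict, Iterable, Optional
+
+async def resolve_labels_concurrently(
+    entities: Iterable[str],
+    fetch_label: Callable[[str], Awaitable[Optional[str]]],  # hypothetical per-entity lookup
+    max_concurrency: int = 16,  # assumed cap on in-flight requests
+) -> Dict[str, str]:
+    """Fetch labels for unique entities concurrently; fall back to the entity itself."""
+    semaphore = asyncio.Semaphore(max_concurrency)
+    unique = list(dict.fromkeys(entities))  # de-duplicate, preserving order
+
+    async def fetch_one(entity: str) -> str:
+        async with semaphore:  # at most max_concurrency lookups run at once
+            label = await fetch_label(entity)
+        return label if label is not None else entity
+
+    labels = await asyncio.gather(*(fetch_one(e) for e in unique))
+    return dict(zip(unique, labels))
+
+async def _demo() -> Dict[str, str]:
+    store = {"urn:a": "Alice", "urn:b": "Bob"}
+
+    async def fake_fetch(entity: str) -> Optional[str]:
+        await asyncio.sleep(0)  # simulate an I/O round trip
+        return store.get(entity)
+
+    return await resolve_labels_concurrently(["urn:a", "urn:b", "urn:a", "urn:c"], fake_fetch)
+
+print(asyncio.run(_demo()))  # {'urn:a': 'Alice', 'urn:b': 'Bob', 'urn:c': 'urn:c'}
+```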
+
+#### 4. **Conservative Label Caching Layer**
+   - LRU cache with short TTL for labels only (5min) to balance performance vs consistency
+   - Cache metrics and hit ratio monitoring
+   - **No embedding caching**: Already cached per-query, no cross-query benefit
+   - **No query result caching**: Due to graph mutation consistency concerns
+
+   Module: `trustgraph-flow/trustgraph/retrieval/graph_rag/cache_manager.py`
+
+#### 5. **Query Optimisation Framework**
+   - Query pattern analysis and optimisation suggestions
+   - Batch query coordinator for database access
+   - Connection pooling and query timeout management
+   - Performance monitoring and metrics collection
+
+   Module: `trustgraph-flow/trustgraph/retrieval/graph_rag/query_optimizer.py`
+
+### Data Models
+
+#### Optimized Graph Traversal State
+
+The traversal engine maintains state to avoid redundant operations:
+
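+```python
+from __future__ import annotations  # TripleQuery is defined elsewhere
+
+from dataclasses import dataclass
+from typing import List, Set, Tuple
+
+@dataclass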
+class TraversalState:
+    visited_entities: Set[str]
+    current_level_entities: Set[str]
+    next_level_entities: Set[str]
+    subgraph: Set[Tuple[str, str, str]]
+    depth: int
+    query_batch: List[TripleQuery]
+```
+
+This approach allows:
+- Efficient cycle detection through visited entity tracking
+- Batched query preparation at each traversal level
+- Memory-efficient state management
+- Early termination when size limits are reached
+
+#### Enhanced Cache Structure
+
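+```python
+from __future__ import annotations  # LRUCache and CacheStatistics are defined elsewhere
+
+from dataclasses import dataclass
+from typing import Any, Optional
+
+@dataclass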
+class CacheEntry:
+    value: Any
+    timestamp: float
+    access_count: int
+    ttl: Optional[float]
+
+class CacheManager:
+    label_cache: LRUCache[str, CacheEntry]
+    embedding_cache: LRUCache[str, CacheEntry]
+    query_result_cache: LRUCache[str, CacheEntry]
+    cache_stats: CacheStatistics
+```
+
+#### Batch Query Structures
+
+```python
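+from dataclasses import dataclass
+from enum import Enum
+from typing import List
+
+# rdfs:label; assumed to match the LABEL constant used by the current graph_rag.py
+LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
+
+class QueryType(Enum):  # hypothetical encoding of the matched triple position
+    SUBJECT = "s"
+    PREDICATE = "p"
+    OBJECT = "o"
+
+@dataclass
+class BatchTripleQuery:
+    entities: List[str]
+    query_type: QueryType  # SUBJECT, PREDICATE, OBJECT
+    limit_per_entity: int
+
+@dataclass
+class BatchLabelQuery:
+    entities: List[str]
+    predicate: str = LABEL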
+```
+
+### APIs
+
+#### New APIs:
+
+**GraphTraversal API**
+```python
+async def optimized_follow_edges_batch(
+    entities: List[str],
+    max_depth: int,
+    triple_limit: int,
+    max_subgraph_size: int
+) -> Set[Tuple[str, str, str]]: ...
+```
+
+**Batch Label Resolution API**
+```python
+async def resolve_labels_batch(
+    entities: List[str],
+    cache_manager: CacheManager
+) -> Dict[str, str]: ...
+```
+
+**Cache Management API**
+```python
+class CacheManager:
+    async def get_or_fetch_label(self, entity: str) -> str: ...
+    async def get_or_fetch_embeddings(self, query: str) -> List[float]: ...
+    async def cache_query_result(self, query_hash: str, result: Any, ttl: int) -> None: ...
+    def get_cache_statistics(self) -> CacheStatistics: ...
+```
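+
+Here is one possible shape for `get_or_fetch_label`, sketched under assumptions. A plain dict stands in for the LRU label cache, and `fetch` is a hypothetical single-entity lookup. Concurrent misses for the same entity share one in-flight lookup. That request-coalescing ("single-flight") technique is this sketch's own choice, not something the specification prescribes. It stops concurrent GraphRAG requests from repeating the same label query.
+
+```python
+import asyncio
+from typing import Awaitable, Callable, Dict, Optional
+
+class LabelCacheSketch:
+    """Hypothetical get-or-fetch label cache with in-flight request coalescing."""
+
+    def __init__(self, fetch: Callable[[str], Awaitable[Optional[str]]]):
+        self._fetch = fetch  # assumed single-entity label lookup
+        self._labels: Dict[str, str] = {}  # stand-in for the LRU/TTL label cache
+        self._in_flight: Dict[str, "asyncio.Task[str]"] = {}
+        self.fetch_count = 0  # real lookups performed
+
+    async def _load(self, entity: str) -> str:
+        self.fetch_count += 1
+        label = await self._fetch(entity)
+        result = label if label is not None else entity  # fall back to the raw URI
+        self._labels[entity] = result
+        return result
+
+    async def get_or_fetch_label(self, entity: str) -> str:
+        if entity in self._labels:
+            return self._labels[entity]
+        task = self._in_flight.get(entity)
+        if task is None:
+            # First miss for this entity: start one lookup that later callers share
+            task = asyncio.ensure_future(self._load(entity))
+            self._in_flight[entity] = task
+            task.add_done_callback(lambda _t: self._in_flight.pop(entity, None))
+        # shield() stops one caller's cancellation from cancelling the shared lookup
+        return await asyncio.shield(task)
+
+async def _demo():
+    async def slow_fetch(entity: str) -> Optional[str]:
+        await asyncio.sleep(0.01)  # simulate a database round trip
+        return {"urn:a": "Alice"}.get(entity)
+
+    cache = LabelCacheSketch(slow_fetch)
+    labels = await asyncio.gather(*(cache.get_or_fetch_label("urn:a") for _ in range(5)))
+    return labels, cache.fetch_count
+
+print(asyncio.run(_demo()))  # (['Alice', 'Alice', 'Alice', 'Alice', 'Alice'], 1)
+```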
+
+#### Modified APIs:
+
+**GraphRag.query()** - Enhanced with performance optimisations:
+- Add cache_manager parameter for cache control
+- Include performance_metrics return value
+- Add query_timeout parameter for reliability
+
+**Query class** - Refactored for batch processing:
+- Replace individual entity processing with batch operations
+- Add async context managers for resource cleanup
+- Include progress callbacks for long-running operations
+
+### Implementation Details
+
+#### Phase 0: Critical Architectural Lifetime Refactor
+
+**Current Problematic Implementation:**
+```python
+# INEFFICIENT: GraphRag recreated every request
+class Processor(FlowProcessor):
+    async def on_request(self, msg, consumer, flow):
+        # PROBLEM: New GraphRag instance per request!
+        self.rag = GraphRag(
+            embeddings_client = flow("embeddings-request"),
+            graph_embeddings_client = flow("graph-embeddings-request"),
+            triples_client = flow("triples-request"),
+            prompt_client = flow("prompt-request"),
+            verbose=True,
+        )
+        # Cache starts empty every time - no benefit from previous requests
+        response = await self.rag.query(...)
+
+# VERY SHORT-LIVED: Query object created/destroyed per request
+class GraphRag:
+    async def query(self, query, user="trustgraph", collection="default", ...):
+        q = Query(rag=self, user=user, collection=collection, ...)  # Created
+        kg = await q.get_labelgraph(query)  # Used briefly
+        # q automatically destroyed when function exits
+```
+
+**Optimized Long-Lived Architecture:**
+```python
+class Processor(FlowProcessor):
+    def __init__(self, **params):
+        super().__init__(**params)
+        self.rag_instance = None  # Will be initialized once
+        self.client_connections = {}
+
+    async def initialize_rag(self, flow):
+        """Initialize GraphRag once, reuse for all requests"""
+        if self.rag_instance is None:
+            self.rag_instance = LongLivedGraphRag(
+                embeddings_client=flow("embeddings-request"),
+                graph_embeddings_client=flow("graph-embeddings-request"),
+                triples_client=flow("triples-request"),
+                prompt_client=flow("prompt-request"),
+                verbose=True,
+            )
+        return self.rag_instance
+
+    async def on_request(self, msg, consumer, flow):
+        v = msg.value()  # decoded request message
+
+        # REUSE the same GraphRag instance - caches persist!
+        rag = await self.initialize_rag(flow)
+
+        # Query object becomes lightweight execution context
+        response = await rag.query_with_context(
+            query=v.query,
+            context=QueryContext(
+                user=v.user,
+                collection=v.collection,
+                entity_limit=entity_limit,
+                # ... other params
+            )
+        )
+
+class LongLivedGraphRag:
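+    def __init__(self, embeddings_client, graph_embeddings_client,
+                 triples_client, prompt_client, verbose=False):
+        self.embeddings_client = embeddings_client
+        self.graph_embeddings_client = graph_embeddings_client
+        self.triples_client = triples_client
+        self.prompt_client = prompt_client
+        self.verbose = verbose
+        # CONSERVATIVE caches - balance performance vs consistency
+        self.label_cache = LRUCacheWithTTL(max_size=5000, default_ttl=300)  # 5min TTL for freshness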
+        # Note: No embedding cache - already cached per-query, no cross-query benefit
+        # Note: No query result cache due to consistency concerns
+        self.performance_metrics = PerformanceTracker()
+
+    async def query_with_context(self, query: str, context: QueryContext):
+        # Use lightweight QueryExecutor instead of heavyweight Query object
+        executor = QueryExecutor(self, context)  # Minimal object
+        return await executor.execute(query)
+
+@dataclass
+class QueryContext:
+    """Lightweight execution context - no heavy operations"""
+    user: str
+    collection: str
+    entity_limit: int
+    triple_limit: int
+    max_subgraph_size: int
+    max_path_length: int
+
+class QueryExecutor:
+    """Lightweight execution context - replaces old Query class"""
+    def __init__(self, rag: LongLivedGraphRag, context: QueryContext):
+        self.rag = rag
+        self.context = context
+        # No heavy initialization - just references
+
+    async def execute(self, query: str):
+        # All heavy lifting uses persistent rag caches
+        return await self.rag.execute_optimized_query(query, self.context)
+```
+
+This architectural change provides:
+- **10-20% database query reduction** for graphs with common relationships (vs 0% currently)
+- **Eliminated object creation overhead** for every request
+- **Persistent connection pooling** and client reuse
+- **Cross-request optimization** within cache TTL windows
+
+**Important Cache Consistency Limitation:**
+Long-term caching introduces staleness risk when entities/labels are deleted or modified in the underlying graph. The LRU cache with TTL provides a balance between performance gains and data freshness, but cannot detect real-time graph changes.
+
+#### Phase 1: Graph Traversal Optimisation
+
+**Current Implementation Problems:**
+```python
+# INEFFICIENT: 3 queries per entity per level
+async def follow_edges(self, ent, subgraph, path_length):
+    # Query 1: s=ent, p=None, o=None
+    res = await self.rag.triples_client.query(s=ent, p=None, o=None, limit=self.triple_limit)
+    # Query 2: s=None, p=ent, o=None
+    res = await self.rag.triples_client.query(s=None, p=ent, o=None, limit=self.triple_limit)
+    # Query 3: s=None, p=None, o=ent
+    res = await self.rag.triples_client.query(s=None, p=None, o=ent, limit=self.triple_limit)
+```
+
+**Optimized Implementation:**
+```python
+async def optimized_traversal(self, entities: List[str], max_depth: int) -> Set[Triple]:
+    visited = set()
+    current_level = set(entities)
+    subgraph = set()
+
+    for depth in range(max_depth):
+        if not current_level or len(subgraph) >= self.max_subgraph_size:
+            break
+
+        # Batch all queries for current level
+        batch_queries = []
+        for entity in current_level:
+            if entity not in visited:
+                batch_queries.extend([
+                    TripleQuery(s=entity, p=None, o=None),
+                    TripleQuery(s=None, p=entity, o=None),
+                    TripleQuery(s=None, p=None, o=entity)
+                ])
+
+        # Execute all queries concurrently
+        results = await self.execute_batch_queries(batch_queries)
+
+        # Process results and prepare next level
+        next_level = set()
+        for result in results:
+            subgraph.update(result.triples)
+            next_level.update(result.new_entities)
+
+        visited.update(current_level)
+        current_level = next_level - visited
+
+    return subgraph
+```
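+
+The traversal above relies on helpers that the spec does not define: `TripleQuery` and `execute_batch_queries`. The following self-contained sketch fills them in with assumptions. An in-memory list of `(s, p, o)` tuples stands in for the triples store, and `query_triples` is a hypothetical single-pattern lookup. Each level's queries are issued together with `asyncio.gather`. The sketch shows visited-set cycle detection and truncation at `max_subgraph_size`.
+
+```python
+import asyncio
+from typing import List, Optional, Set, Tuple
+
+Triple = Tuple[str, str, str]
+
+# Hypothetical in-memory store standing in for the triples client
+TRIPLES: List[Triple] = [
+    ("urn:a", "urn:knows", "urn:b"),
+    ("urn:b", "urn:knows", "urn:c"),
+    ("urn:c", "urn:knows", "urn:a"),  # cycle back to urn:a
+    ("urn:c", "urn:likes", "urn:d"),
+]
+
+async def query_triples(s: Optional[str], p: Optional[str], o: Optional[str],
+                        limit: int) -> List[Triple]:
+    """Hypothetical single-pattern lookup; None acts as a wildcard."""
+    await asyncio.sleep(0)  # simulate a database round trip
+    matches = [t for t in TRIPLES
+               if (s is None or t[0] == s) and (p is None or t[1] == p)
+               and (o is None or t[2] == o)]
+    return matches[:limit]
+
+async def batched_traversal(entities: List[str], max_depth: int,
+                            triple_limit: int, max_subgraph_size: int) -> Set[Triple]:
+    visited: Set[str] = set()
+    current_level = set(entities)
+    subgraph: Set[Triple] = set()
+
+    for _ in range(max_depth):
+        if not current_level or len(subgraph) >= max_subgraph_size:
+            break
+        # Subject, predicate and object patterns for every frontier entity, issued together
+        level = sorted(current_level)
+        patterns = [pat for e in level
+                    for pat in ((e, None, None), (None, e, None), (None, None, e))]
+        results = await asyncio.gather(
+            *(query_triples(s, p, o, triple_limit) for s, p, o in patterns))
+
+        visited.update(current_level)
+        next_level: Set[str] = set()
+        for triples in results:
+            for triple in triples:
+                if len(subgraph) >= max_subgraph_size:
+                    break  # stop adding once the size cap is reached
+                subgraph.add(triple)
+                next_level.update(triple)  # s, p and o all become candidates
+        current_level = next_level - visited  # cycle detection
+
+    return subgraph
+
+print(len(asyncio.run(batched_traversal(["urn:a"], 2, 30, 150))))  # 4
+```
+
+Sending a level's patterns through one `gather` makes the round trips concurrent. On its own, it does not reduce the number of queries unless the backend accepts multi-entity patterns, which is presumably what `BatchTripleQuery` is for.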
+
+#### Phase 2: Parallel Label Resolution
+
+**Current Sequential Implementation:**
+```python
+# INEFFICIENT: Sequential processing
+for edge in subgraph:
+    s = await self.maybe_label(edge[0])  # Individual query
+    p = await self.maybe_label(edge[1])  # Individual query
+    o = await self.maybe_label(edge[2])  # Individual query
+```
+
+**Optimized Parallel Implementation:**
+```python
+async def resolve_labels_parallel(self, subgraph: List[Triple]) -> List[Triple]:
+    # Collect all unique entities needing labels
+    entities_to_resolve = set()
+    for s, p, o in subgraph:
+        entities_to_resolve.update([s, p, o])
+
+    # Check the TTL cache (async get/put, as in LRUCacheWithTTL); None means missing or expired
+    labels = {}
+    uncached_entities = []
+    for entity in entities_to_resolve:
+        cached = await self.label_cache.get(entity)
+        if cached is None:
+            uncached_entities.append(entity)
+        else:
+            labels[entity] = cached
+
+    # Batch query for all uncached labels, then populate the cache
+    if uncached_entities:
+        label_results = await self.batch_label_query(uncached_entities)
+        for entity, label in label_results.items():
+            await self.label_cache.put(entity, label)
+        labels.update(label_results)
+
+    # Apply labels to subgraph, falling back to the raw identifier
+    return [
+        (labels.get(s, s), labels.get(p, p), labels.get(o, o))
+        for s, p, o in subgraph
+    ]
+```
+
+#### Phase 3: Advanced Caching Strategy
+
+**LRU Cache with TTL:**
+```python
+import time
+from collections import OrderedDict
+from typing import Any, Optional
+
+class LRUCacheWithTTL:
+    def __init__(self, max_size: int, default_ttl: int = 3600):
+        self.cache = OrderedDict()
+        self.max_size = max_size
+        self.default_ttl = default_ttl
+        self.write_times = {}  # key -> when it was stored; the TTL counts from here
+
+    async def get(self, key: str) -> Optional[Any]:
+        if key in self.cache:
+            # Check TTL expiration
+            if time.monotonic() - self.write_times[key] > self.default_ttl:
+                del self.cache[key]
+                del self.write_times[key]
+                return None
+
+            # Move to end (most recently used)
+            self.cache.move_to_end(key)
+            return self.cache[key]
+        return None
+
+    async def put(self, key: str, value: Any):
+        if key in self.cache:
+            self.cache.move_to_end(key)
+        elif len(self.cache) >= self.max_size:
+            # Evict the least recently used entry (front of the OrderedDict)
+            oldest_key, _ = self.cache.popitem(last=False)
+            del self.write_times[oldest_key]
+
+        self.cache[key] = value
+        self.write_times[key] = time.monotonic()  # monotonic clock is immune to wall-clock changes
+```
+
+#### Phase 4: Query Optimisation and Monitoring
+
+**Performance Metrics Collection:**
+```python
+from dataclasses import dataclass
+
+@dataclass
+class PerformanceMetrics:
+    total_queries: int
+    cache_hits: int
+    cache_misses: int
+    avg_response_time: float
+    subgraph_construction_time: float
+    label_resolution_time: float
+    total_entities_processed: int
+    memory_usage_mb: float
+```
+
+**Query Timeout and Circuit Breaker:**
+```python
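+import asyncio
+import logging
+
+logger = logging.getLogger(__name__)
+
+class GraphRagTimeoutError(Exception):
+    """Raised when a GraphRAG query exceeds its time budget."""
+
+async def execute_with_timeout(self, query_func, timeout: int = 30):
+    try:
+        return await asyncio.wait_for(query_func(), timeout=timeout)
+    except asyncio.TimeoutError:
+        logger.error("Query timeout after %ss", timeout)
+        raise GraphRagTimeoutError(f"Query exceeded timeout of {timeout}s")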
+```
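+
+The timeout wrapper covers only half of this heading. A circuit breaker could sit alongside it, so that after repeated failures requests fail fast instead of piling onto a struggling triples store. This is a minimal sketch, not a design the specification fixes. The names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold` and `reset_timeout`, and their defaults, are hypothetical.
+
+```python
+import asyncio
+import time
+from typing import Awaitable, Callable, Optional, TypeVar
+
+T = TypeVar("T")
+
+class CircuitOpenError(Exception):
+    """Raised when calls are rejected because the breaker is open."""
+
+class CircuitBreaker:
+    """Hypothetical breaker: opens after repeated failures, retries after a cool-down."""
+
+    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
+        self.failure_threshold = failure_threshold  # consecutive failures before opening
+        self.reset_timeout = reset_timeout          # seconds to stay open
+        self.failures = 0
+        self.opened_at: Optional[float] = None
+
+    def _is_open(self) -> bool:
+        if self.opened_at is None:
+            return False
+        # After the cool-down the breaker is half-open and lets calls through
+        return time.monotonic() - self.opened_at < self.reset_timeout
+
+    async def call(self, func: Callable[[], Awaitable[T]]) -> T:
+        if self._is_open():
+            raise CircuitOpenError("GraphRAG backend circuit is open")
+        try:
+            result = await func()
+        except Exception:
+            self.failures += 1
+            if self.failures >= self.failure_threshold:
+                self.opened_at = time.monotonic()  # open, or re-open after a failed trial
+            raise
+        self.failures = 0
+        self.opened_at = None  # a success closes the breaker
+        return result
+
+async def _backend_down() -> int:
+    raise RuntimeError("backend unavailable")
+
+async def _demo() -> str:
+    breaker = CircuitBreaker(failure_threshold=2, reset_timeout=60)
+    for _ in range(2):
+        try:
+            await breaker.call(_backend_down)
+        except RuntimeError:
+            pass  # each failure counts towards the threshold
+    try:
+        await breaker.call(_backend_down)
+    except CircuitOpenError:
+        return "rejected without calling the backend"
+    return "called"
+
+print(asyncio.run(_demo()))  # rejected without calling the backend
+```
+
+In the half-open state, this sketch lets every caller through rather than a single probe. A stricter breaker would admit exactly one trial call.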
+
+## Cache Consistency Considerations
+
+**Data Staleness Trade-offs:**
+- **Label cache (5min TTL)**: Risk of serving deleted/renamed entity labels
+- **No embedding caching**: Not needed - embeddings already cached per-query
+- **No result caching**: Prevents stale subgraph results from deleted entities/relationships
+
+**Mitigation Strategies:**
+- **Conservative TTL values**: Balance performance gains (10-20%) with data freshness
+- **Cache invalidation hooks**: Optional integration with graph mutation events
+- **Monitoring dashboards**: Track cache hit rates vs staleness incidents
+- **Configurable cache policies**: Allow per-deployment tuning based on mutation frequency
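+
+The optional invalidation hook could simply evict every term touched by a mutation, so the next request refetches it rather than waiting for the TTL to expire. This sketch assumes that mutation events arrive as `(s, p, o)` triples and that the cache supports dict-style `pop`. The function name and the event shape are hypothetical. The `LRUCacheWithTTL` sketch in Phase 3 would need an equivalent delete method.
+
+```python
+from typing import Iterable, MutableMapping, Tuple
+
+Triple = Tuple[str, str, str]
+
+def invalidate_labels(cache: MutableMapping[str, str],
+                      mutated_triples: Iterable[Triple]) -> int:
+    """Evict cached labels for every term in the mutated triples; return the count."""
+    removed = 0
+    for triple in mutated_triples:
+        for term in triple:  # subject, predicate and object may all have cached labels
+            if cache.pop(term, None) is not None:
+                removed += 1
+    return removed
+
+labels = {"urn:a": "Alice", "urn:b": "Bob"}
+rename = ("urn:a", "http://www.w3.org/2000/01/rdf-schema#label", "Alicia")
+print(invalidate_labels(labels, [rename]), labels)  # 1 {'urn:b': 'Bob'}
+```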
+
+**Recommended Cache Configuration by Graph Mutation Rate:**
+- **High mutation (>100 changes/hour)**: TTL=60s, smaller cache sizes
+- **Medium mutation (10-100 changes/hour)**: TTL=300s (default)
+- **Low mutation (<10 changes/hour)**: TTL=600s, larger cache sizes
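+
+The tiers above translate into a small helper. In this sketch the helper returns only the TTL, because no concrete cache sizes are given, and its name is hypothetical. Exactly 10 or 100 changes/hour falls in the medium tier, matching the "10-100" wording.
+
+```python
+def recommended_label_ttl(changes_per_hour: float) -> int:
+    """Label-cache TTL in seconds for an observed graph mutation rate."""
+    if changes_per_hour > 100:   # high mutation
+        return 60
+    if changes_per_hour >= 10:   # medium mutation: the 300 s default
+        return 300
+    return 600                   # low mutation
+
+print([recommended_label_ttl(rate) for rate in (500, 50, 2)])  # [60, 300, 600]
+```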
+
+## Security Considerations
+
+**Query Injection Prevention:**
+- Validate all entity identifiers and query parameters
+- Use parameterized queries for all database interactions
+- Implement query complexity limits to prevent DoS attacks
+
+**Resource Protection:**
+- Enforce maximum subgraph size limits
+- Implement query timeouts to prevent resource exhaustion
+- Add memory usage monitoring and limits
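+
+A first step towards enforcing the resource limits above is to clamp request parameters before any traversal begins. In this sketch the defaults mirror the default parameters quoted in the Performance Impact Analysis. The ceiling values and all names are illustrative assumptions.
+
+```python
+from dataclasses import dataclass
+
+# Defaults from the Performance Impact Analysis; ceilings are illustrative assumptions
+DEFAULTS = {"entity_limit": 50, "triple_limit": 30,
+            "max_subgraph_size": 150, "max_path_length": 2}
+CEILINGS = {"entity_limit": 100, "triple_limit": 100,
+            "max_subgraph_size": 1000, "max_path_length": 3}
+
+@dataclass(frozen=True)
+class TraversalLimits:
+    entity_limit: int
+    triple_limit: int
+    max_subgraph_size: int
+    max_path_length: int
+
+def enforce_limits(**requested: int) -> TraversalLimits:
+    """Apply defaults, reject unknown or non-positive values, clamp to ceilings."""
+    unknown = set(requested) - set(DEFAULTS)
+    if unknown:
+        raise ValueError(f"unknown limits: {sorted(unknown)}")
+    values = {}
+    for name, default in DEFAULTS.items():
+        value = requested.get(name, default)
+        if value < 1:
+            raise ValueError(f"{name} must be positive, got {value}")
+        values[name] = min(value, CEILINGS[name])
+    return TraversalLimits(**values)
+
+print(enforce_limits(max_subgraph_size=5000))
+```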
+
+**Access Control:**
+- Maintain existing user and collection isolation
+- Add audit logging for performance-impacting operations
+- Implement rate limiting for expensive operations
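+
+The spec does not prescribe a rate-limiting algorithm. A token bucket is one common choice, and it is assumed here. Each caller key (for example a user) gets `capacity` tokens that refill at `rate` per second, and each GraphRAG request either spends one token or is rejected. The class name and the injectable `clock` parameter are hypothetical.
+
+```python
+import time
+from typing import Callable, Dict, Tuple
+
+class TokenBucketLimiter:
+    """Hypothetical per-key token bucket for expensive GraphRAG requests."""
+
+    def __init__(self, rate: float, capacity: float,
+                 clock: Callable[[], float] = time.monotonic):
+        self.rate = rate          # tokens added per second
+        self.capacity = capacity  # maximum burst size
+        self.clock = clock        # injectable for testing
+        self._buckets: Dict[str, Tuple[float, float]] = {}  # key -> (tokens, last refill)
+
+    def allow(self, key: str, cost: float = 1.0) -> bool:
+        now = self.clock()
+        tokens, last = self._buckets.get(key, (self.capacity, now))
+        tokens = min(self.capacity, tokens + (now - last) * self.rate)  # refill
+        if tokens >= cost:
+            self._buckets[key] = (tokens - cost, now)
+            return True
+        self._buckets[key] = (tokens, now)
+        return False
+
+fake_now = [0.0]
+limiter = TokenBucketLimiter(rate=1.0, capacity=2, clock=lambda: fake_now[0])
+print([limiter.allow("alice") for _ in range(3)])  # [True, True, False]
+fake_now[0] = 1.0  # one second later, one token has been refilled
+print(limiter.allow("alice"))  # True
+```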
+
+## Performance Considerations
+
+### Expected Performance Improvements
+
+**Query Reduction:**
+- Current: ~9,000+ queries for typical request
+- Optimized: ~50-100 batched queries (98% reduction)
+
+**Response Time Improvements:**
+- Graph traversal: 15-20s → 3-5s (4-5x faster)
+- Label resolution: 8-12s → 2-4s (3-4x faster)
+- Overall query: 25-35s → 6-10s (3-4x improvement)
+
+**Memory Efficiency:**
+- Bounded cache sizes prevent memory leaks
+- Efficient data structures reduce memory footprint by ~40%
+- Better garbage collection through proper resource cleanup
+
+**Realistic Performance Expectations:**
+- **Label cache**: 10-20% query reduction for graphs with common relationships
+- **Batching optimization**: 50-80% query reduction (primary optimization)
+- **Object lifetime optimization**: Eliminate per-request creation overhead
+- **Overall improvement**: 3-4x response time improvement primarily from batching
+
+**Scalability Improvements:**
+- Support for 3-5x larger knowledge graphs (limited by cache consistency needs)
+- 3-5x higher concurrent request capacity
+- Better resource utilization through connection reuse
+
+### Performance Monitoring
+
+**Real-time Metrics:**
+- Query execution times by operation type
+- Cache hit ratios and effectiveness
+- Database connection pool utilisation
+- Memory usage and garbage collection impact
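+
+`PerformanceTracker` is instantiated in the Phase 0 sketch but never defined. Below is one possible shape, assuming purely in-process aggregation; exporting to Prometheus or similar remains an open question below. Per-operation timings are collected by a context manager, and cache hit and miss counters give the hit ratio. All names are hypothetical.
+
+```python
+import time
+from collections import defaultdict
+from contextlib import contextmanager
+from typing import Dict, Iterator, List
+
+class PerformanceTracker:
+    """Hypothetical in-process metrics for GraphRAG operations."""
+
+    def __init__(self) -> None:
+        self.timings: Dict[str, List[float]] = defaultdict(list)  # op -> durations (s)
+        self.cache_hits = 0
+        self.cache_misses = 0
+
+    @contextmanager
+    def timed(self, operation: str) -> Iterator[None]:
+        start = time.perf_counter()
+        try:
+            yield
+        finally:
+            self.timings[operation].append(time.perf_counter() - start)
+
+    def record_cache(self, hit: bool) -> None:
+        if hit:
+            self.cache_hits += 1
+        else:
+            self.cache_misses += 1
+
+    @property
+    def cache_hit_ratio(self) -> float:
+        total = self.cache_hits + self.cache_misses
+        return self.cache_hits / total if total else 0.0
+
+    def average(self, operation: str) -> float:
+        samples = self.timings.get(operation, [])
+        return sum(samples) / len(samples) if samples else 0.0
+
+tracker = PerformanceTracker()
+with tracker.timed("label_resolution"):
+    sum(range(10_000))  # stand-in for real work
+for hit in (True, True, False, True):
+    tracker.record_cache(hit)
+print(tracker.cache_hit_ratio)  # 0.75
+```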
+
+**Performance Benchmarking:**
+- Automated performance regression testing
+- Load testing with realistic data volumes
+- Comparison benchmarks against current implementation
+
+## Testing Strategy
+
+### Unit Testing
+- Individual component testing for traversal, caching, and label resolution
+- Mock database interactions for performance testing
+- Cache eviction and TTL expiration testing
+- Error handling and timeout scenarios
+
+### Integration Testing
+- End-to-end GraphRAG query testing with optimisations
+- Database interaction testing with real data
+- Concurrent request handling and resource management
+- Memory leak detection and resource cleanup verification
+
+### Performance Testing
+- Benchmark testing against current implementation
+- Load testing with varying graph sizes and complexities
+- Stress testing for memory and connection limits
+- Regression testing for performance improvements
+
+### Compatibility Testing
+- Verify existing GraphRAG API compatibility
+- Test with various graph database backends
+- Validate result accuracy compared to current implementation
+
+## Implementation Plan
+
+### Direct Implementation Approach
+Since APIs are allowed to change, implement optimizations directly without migration complexity:
+
+1. **Replace `follow_edges` method**: Rewrite with iterative batched traversal
+2. **Optimize `get_labelgraph`**: Implement parallel label resolution
+3. **Add long-lived GraphRag**: Modify Processor to maintain persistent instance
+4. **Implement label caching**: Add LRU cache with TTL to GraphRag class
+
+### Scope of Changes
+- **Query class**: Replace ~50 lines in `follow_edges`, add ~30 lines of batch handling
+- **GraphRag class**: Add caching layer (~40 lines)
+- **Processor class**: Modify to use persistent GraphRag instance (~20 lines)
+- **Total**: ~140 lines of focused changes, mostly within existing classes
+
+## Timeline
+
+**Week 1: Core Implementation**
+- Replace `follow_edges` with batched iterative traversal
+- Implement parallel label resolution in `get_labelgraph`
+- Add long-lived GraphRag instance to Processor
+- Implement label caching layer
+
+**Week 2: Testing and Integration**
+- Unit tests for new traversal and caching logic
+- Performance benchmarking against current implementation
+- Integration testing with real graph data
+- Code review and optimization
+
+**Week 3: Deployment**
+- Deploy optimized implementation
+- Monitor performance improvements
+- Fine-tune cache TTL and batch sizes based on real usage
+
+## Open Questions
+
+- **Database Connection Pooling**: Should we implement custom connection pooling or rely on existing database client pooling?
+- **Cache Persistence**: Should label and embedding caches persist across service restarts?
+- **Distributed Caching**: For multi-instance deployments, should we implement distributed caching with Redis/Memcached?
+- **Query Result Format**: Should we optimize the internal triple representation for better memory efficiency?
+- **Monitoring Integration**: Which metrics should be exposed to existing monitoring systems (Prometheus, etc.)?
+
+## References
+
+- [GraphRAG Original Implementation](trustgraph-flow/trustgraph/retrieval/graph_rag/graph_rag.py)
+- [TrustGraph Architecture Principles](architecture-principles.md)
+- [Collection Management Specification](collection-management.md)