mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-04-25 08:26:21 +02:00
Native CLI i18n: The TrustGraph CLI has built-in translation support that dynamically loads language strings. You can test and use different languages by simply passing the --lang flag (e.g., --lang es for Spanish, --lang ru for Russian) or by configuring your environment's LANG variable. Automated Docs Translations: This PR introduces autonomously translated Markdown documentation into several target languages, including Spanish, Swahili, Portuguese, Turkish, Hindi, Hebrew, Arabic, Simplified Chinese, and Russian.
635 lines
24 KiB
Markdown
635 lines
24 KiB
Markdown
---
|
||
layout: default
|
||
title: "GraphRAG Performance Optimisation Technical Specification"
|
||
parent: "Tech Specs"
|
||
---
|
||
|
||
# GraphRAG Performance Optimisation Technical Specification
|
||
|
||
## Overview
|
||
|
||
This specification describes comprehensive performance optimisations for the GraphRAG (Graph Retrieval-Augmented Generation) algorithm in TrustGraph. The current implementation suffers from significant performance bottlenecks that limit scalability and response times. This specification addresses four primary optimisation areas:
|
||
|
||
1. **Graph Traversal Optimisation**: Eliminate inefficient recursive database queries and implement batched graph exploration
|
||
2. **Label Resolution Optimisation**: Replace sequential label fetching with parallel/batched operations
|
||
3. **Caching Strategy Enhancement**: Implement intelligent caching with LRU eviction and prefetching
|
||
4. **Query Optimisation**: Add result memoisation and embedding caching for improved response times
|
||
|
||
## Goals
|
||
|
||
- **Reduce Database Query Volume**: Achieve 50-80% reduction in total database queries through batching and caching
|
||
- **Improve Response Times**: Target 3-5x faster subgraph construction and 2-3x faster label resolution
|
||
- **Enhance Scalability**: Support larger knowledge graphs with better memory management
|
||
- **Maintain Accuracy**: Preserve existing GraphRAG functionality and result quality
|
||
- **Enable Concurrency**: Improve parallel processing capabilities for multiple concurrent requests
|
||
- **Reduce Memory Footprint**: Implement efficient data structures and memory management
|
||
- **Add Observability**: Include performance metrics and monitoring capabilities
|
||
- **Ensure Reliability**: Add proper error handling and timeout mechanisms
|
||
|
||
## Background
|
||
|
||
The current GraphRAG implementation in `trustgraph-flow/trustgraph/retrieval/graph_rag/graph_rag.py` exhibits several critical performance issues that severely impact system scalability:
|
||
|
||
### Current Performance Problems
|
||
|
||
**1. Inefficient Graph Traversal (`follow_edges` function, lines 79-127)**
|
||
- Makes 3 separate database queries per entity per depth level
|
||
- Query pattern: subject-based, predicate-based, and object-based queries for each entity
|
||
- No batching: Each query processes only one entity at a time
|
||
- No cycle detection: Can revisit the same nodes multiple times
|
||
- Recursive implementation without memoisation leads to exponential complexity
|
||
- Time complexity: O(entities × max_path_length × triple_limit³)
|
||
|
||
**2. Sequential Label Resolution (`get_labelgraph` function, lines 144-171)**
|
||
- Processes each triple component (subject, predicate, object) sequentially
|
||
- Each `maybe_label` call potentially triggers a database query
|
||
- No parallel execution or batching of label queries
|
||
- Results in up to 3 × subgraph_size individual database calls
|
||
|
||
**3. Primitive Caching Strategy (`maybe_label` function, lines 62-77)**
|
||
- Simple dictionary cache without size limits or TTL
|
||
- No cache eviction policy leads to unbounded memory growth
|
||
- Cache misses trigger individual database queries
|
||
- No prefetching or intelligent cache warming
|
||
|
||
**4. Suboptimal Query Patterns**
|
||
- Entity vector similarity queries not cached between similar requests
|
||
- No result memoisation for repeated query patterns
|
||
- Missing query optimisation for common access patterns
|
||
|
||
**5. Critical Object Lifetime Issues (`rag.py:96-102`)**
|
||
- **GraphRag object recreated per request**: Fresh instance created for every query, losing all cache benefits
|
||
- **Query object extremely short-lived**: Created and destroyed within single query execution (lines 201-207)
|
||
- **Label cache reset per request**: Cache warming and accumulated knowledge lost between requests
|
||
- **Client recreation overhead**: Database clients potentially re-established for each request
|
||
- **No cross-request optimisation**: Cannot benefit from query patterns or result sharing
|
||
|
||
### Performance Impact Analysis
|
||
|
||
Current worst-case scenario for a typical query:
|
||
- **Entity Retrieval**: 1 vector similarity query
|
||
- **Graph Traversal**: entities × max_path_length × 3 × triple_limit queries
|
||
- **Label Resolution**: subgraph_size × 3 individual label queries
|
||
|
||
For default parameters (50 entities, path length 2, 30 triple limit, 150 subgraph size):
|
||
- **Minimum queries**: 1 + (50 × 2 × 3 × 30) + (150 × 3) = **9,451 database queries**
|
||
- **Response time**: 15-30 seconds for moderate-sized graphs
|
||
- **Memory usage**: Unbounded cache growth over time
|
||
- **Cache effectiveness**: 0% - caches reset on every request
|
||
- **Object creation overhead**: GraphRag + Query objects created/destroyed per request
|
||
|
||
This specification addresses these gaps by implementing batched queries, intelligent caching, and parallel processing. By optimizing query patterns and data access, TrustGraph can:
|
||
- Support enterprise-scale knowledge graphs with millions of entities
|
||
- Provide sub-second response times for typical queries
|
||
- Handle hundreds of concurrent GraphRAG requests
|
||
- Scale efficiently with graph size and complexity
|
||
|
||
## Technical Design
|
||
|
||
### Architecture
|
||
|
||
The GraphRAG performance optimisation requires the following technical components:
|
||
|
||
#### 1. **Object Lifetime Architectural Refactor**
|
||
- **Make GraphRag long-lived**: Move GraphRag instance to Processor level for persistence across requests
|
||
- **Preserve caches**: Maintain label cache, embedding cache, and query result cache between requests
|
||
- **Optimize Query object**: Refactor Query as lightweight execution context, not data container
|
||
- **Connection persistence**: Maintain database client connections across requests
|
||
|
||
Module: `trustgraph-flow/trustgraph/retrieval/graph_rag/rag.py` (modified)
|
||
|
||
#### 2. **Optimized Graph Traversal Engine**
|
||
- Replace recursive `follow_edges` with iterative breadth-first search
|
||
- Implement batched entity processing at each traversal level
|
||
- Add cycle detection using visited node tracking
|
||
- Include early termination when limits are reached
|
||
|
||
Module: `trustgraph-flow/trustgraph/retrieval/graph_rag/optimized_traversal.py`
|
||
|
||
#### 3. **Parallel Label Resolution System**
|
||
- Batch label queries for multiple entities simultaneously
|
||
- Implement async/await patterns for concurrent database access
|
||
- Add intelligent prefetching for common label patterns
|
||
- Include label cache warming strategies
|
||
|
||
Module: `trustgraph-flow/trustgraph/retrieval/graph_rag/label_resolver.py`
|
||
|
||
#### 4. **Conservative Label Caching Layer**
|
||
- LRU cache with short TTL for labels only (5min) to balance performance vs consistency
|
||
- Cache metrics and hit ratio monitoring
|
||
- **No embedding caching**: Already cached per-query, no cross-query benefit
|
||
- **No query result caching**: Due to graph mutation consistency concerns
|
||
|
||
Module: `trustgraph-flow/trustgraph/retrieval/graph_rag/cache_manager.py`
|
||
|
||
#### 5. **Query Optimisation Framework**
|
||
- Query pattern analysis and optimisation suggestions
|
||
- Batch query coordinator for database access
|
||
- Connection pooling and query timeout management
|
||
- Performance monitoring and metrics collection
|
||
|
||
Module: `trustgraph-flow/trustgraph/retrieval/graph_rag/query_optimizer.py`
|
||
|
||
### Data Models
|
||
|
||
#### Optimized Graph Traversal State
|
||
|
||
The traversal engine maintains state to avoid redundant operations:
|
||
|
||
```python
|
||
@dataclass
|
||
class TraversalState:
|
||
visited_entities: Set[str]
|
||
current_level_entities: Set[str]
|
||
next_level_entities: Set[str]
|
||
subgraph: Set[Tuple[str, str, str]]
|
||
depth: int
|
||
query_batch: List[TripleQuery]
|
||
```
|
||
|
||
This approach allows:
|
||
- Efficient cycle detection through visited entity tracking
|
||
- Batched query preparation at each traversal level
|
||
- Memory-efficient state management
|
||
- Early termination when size limits are reached
|
||
|
||
#### Enhanced Cache Structure
|
||
|
||
```python
|
||
@dataclass
|
||
class CacheEntry:
|
||
value: Any
|
||
timestamp: float
|
||
access_count: int
|
||
ttl: Optional[float]
|
||
|
||
class CacheManager:
|
||
label_cache: LRUCache[str, CacheEntry]
|
||
embedding_cache: LRUCache[str, CacheEntry]
|
||
query_result_cache: LRUCache[str, CacheEntry]
|
||
cache_stats: CacheStatistics
|
||
```
|
||
|
||
#### Batch Query Structures
|
||
|
||
```python
|
||
@dataclass
|
||
class BatchTripleQuery:
|
||
entities: List[str]
|
||
query_type: QueryType # SUBJECT, PREDICATE, OBJECT
|
||
limit_per_entity: int
|
||
|
||
@dataclass
|
||
class BatchLabelQuery:
|
||
entities: List[str]
|
||
predicate: str = LABEL
|
||
```
|
||
|
||
### APIs
|
||
|
||
#### New APIs:
|
||
|
||
**GraphTraversal API**
|
||
```python
|
||
async def optimized_follow_edges_batch(
|
||
entities: List[str],
|
||
max_depth: int,
|
||
triple_limit: int,
|
||
max_subgraph_size: int
|
||
) -> Set[Tuple[str, str, str]]
|
||
```
|
||
|
||
**Batch Label Resolution API**
|
||
```python
|
||
async def resolve_labels_batch(
|
||
entities: List[str],
|
||
cache_manager: CacheManager
|
||
) -> Dict[str, str]
|
||
```
|
||
|
||
**Cache Management API**
|
||
```python
|
||
class CacheManager:
|
||
async def get_or_fetch_label(self, entity: str) -> str
|
||
async def get_or_fetch_embeddings(self, query: str) -> List[float]
|
||
async def cache_query_result(self, query_hash: str, result: Any, ttl: int)
|
||
def get_cache_statistics(self) -> CacheStatistics
|
||
```
|
||
|
||
#### Modified APIs:
|
||
|
||
**GraphRag.query()** - Enhanced with performance optimisations:
|
||
- Add cache_manager parameter for cache control
|
||
- Include performance_metrics return value
|
||
- Add query_timeout parameter for reliability
|
||
|
||
**Query class** - Refactored for batch processing:
|
||
- Replace individual entity processing with batch operations
|
||
- Add async context managers for resource cleanup
|
||
- Include progress callbacks for long-running operations
|
||
|
||
### Implementation Details
|
||
|
||
#### Phase 0: Critical Architectural Lifetime Refactor
|
||
|
||
**Current Problematic Implementation:**
|
||
```python
|
||
# INEFFICIENT: GraphRag recreated every request
|
||
class Processor(FlowProcessor):
|
||
async def on_request(self, msg, consumer, flow):
|
||
# PROBLEM: New GraphRag instance per request!
|
||
self.rag = GraphRag(
|
||
embeddings_client = flow("embeddings-request"),
|
||
graph_embeddings_client = flow("graph-embeddings-request"),
|
||
triples_client = flow("triples-request"),
|
||
prompt_client = flow("prompt-request"),
|
||
verbose=True,
|
||
)
|
||
# Cache starts empty every time - no benefit from previous requests
|
||
response = await self.rag.query(...)
|
||
|
||
# VERY SHORT-LIVED: Query object created/destroyed per request
|
||
class GraphRag:
|
||
async def query(self, query, user="trustgraph", collection="default", ...):
|
||
q = Query(rag=self, user=user, collection=collection, ...) # Created
|
||
kg = await q.get_labelgraph(query) # Used briefly
|
||
# q automatically destroyed when function exits
|
||
```
|
||
|
||
**Optimized Long-Lived Architecture:**
|
||
```python
|
||
class Processor(FlowProcessor):
|
||
def __init__(self, **params):
|
||
super().__init__(**params)
|
||
self.rag_instance = None # Will be initialized once
|
||
self.client_connections = {}
|
||
|
||
async def initialize_rag(self, flow):
|
||
"""Initialize GraphRag once, reuse for all requests"""
|
||
if self.rag_instance is None:
|
||
self.rag_instance = LongLivedGraphRag(
|
||
embeddings_client=flow("embeddings-request"),
|
||
graph_embeddings_client=flow("graph-embeddings-request"),
|
||
triples_client=flow("triples-request"),
|
||
prompt_client=flow("prompt-request"),
|
||
verbose=True,
|
||
)
|
||
return self.rag_instance
|
||
|
||
async def on_request(self, msg, consumer, flow):
|
||
# REUSE the same GraphRag instance - caches persist!
|
||
rag = await self.initialize_rag(flow)
|
||
|
||
# Query object becomes lightweight execution context
|
||
response = await rag.query_with_context(
|
||
query=v.query,
|
||
execution_context=QueryContext(
|
||
user=v.user,
|
||
collection=v.collection,
|
||
entity_limit=entity_limit,
|
||
# ... other params
|
||
)
|
||
)
|
||
|
||
class LongLivedGraphRag:
|
||
def __init__(self, ...):
|
||
# CONSERVATIVE caches - balance performance vs consistency
|
||
self.label_cache = LRUCacheWithTTL(max_size=5000, ttl=300) # 5min TTL for freshness
|
||
# Note: No embedding cache - already cached per-query, no cross-query benefit
|
||
# Note: No query result cache due to consistency concerns
|
||
self.performance_metrics = PerformanceTracker()
|
||
|
||
async def query_with_context(self, query: str, context: QueryContext):
|
||
# Use lightweight QueryExecutor instead of heavyweight Query object
|
||
executor = QueryExecutor(self, context) # Minimal object
|
||
return await executor.execute(query)
|
||
|
||
@dataclass
|
||
class QueryContext:
|
||
"""Lightweight execution context - no heavy operations"""
|
||
user: str
|
||
collection: str
|
||
entity_limit: int
|
||
triple_limit: int
|
||
max_subgraph_size: int
|
||
max_path_length: int
|
||
|
||
class QueryExecutor:
|
||
"""Lightweight execution context - replaces old Query class"""
|
||
def __init__(self, rag: LongLivedGraphRag, context: QueryContext):
|
||
self.rag = rag
|
||
self.context = context
|
||
# No heavy initialization - just references
|
||
|
||
async def execute(self, query: str):
|
||
# All heavy lifting uses persistent rag caches
|
||
return await self.rag.execute_optimized_query(query, self.context)
|
||
```
|
||
|
||
This architectural change provides:
|
||
- **10-20% database query reduction** for graphs with common relationships (vs 0% currently)
|
||
- **Eliminated object creation overhead** for every request
|
||
- **Persistent connection pooling** and client reuse
|
||
- **Cross-request optimization** within cache TTL windows
|
||
|
||
**Important Cache Consistency Limitation:**
|
||
Long-term caching introduces staleness risk when entities/labels are deleted or modified in the underlying graph. The LRU cache with TTL provides a balance between performance gains and data freshness, but cannot detect real-time graph changes.
|
||
|
||
#### Phase 1: Graph Traversal Optimisation
|
||
|
||
**Current Implementation Problems:**
|
||
```python
|
||
# INEFFICIENT: 3 queries per entity per level
|
||
async def follow_edges(self, ent, subgraph, path_length):
|
||
# Query 1: s=ent, p=None, o=None
|
||
res = await self.rag.triples_client.query(s=ent, p=None, o=None, limit=self.triple_limit)
|
||
# Query 2: s=None, p=ent, o=None
|
||
res = await self.rag.triples_client.query(s=None, p=ent, o=None, limit=self.triple_limit)
|
||
# Query 3: s=None, p=None, o=ent
|
||
res = await self.rag.triples_client.query(s=None, p=None, o=ent, limit=self.triple_limit)
|
||
```
|
||
|
||
**Optimized Implementation:**
|
||
```python
|
||
async def optimized_traversal(self, entities: List[str], max_depth: int) -> Set[Triple]:
|
||
visited = set()
|
||
current_level = set(entities)
|
||
subgraph = set()
|
||
|
||
for depth in range(max_depth):
|
||
if not current_level or len(subgraph) >= self.max_subgraph_size:
|
||
break
|
||
|
||
# Batch all queries for current level
|
||
batch_queries = []
|
||
for entity in current_level:
|
||
if entity not in visited:
|
||
batch_queries.extend([
|
||
TripleQuery(s=entity, p=None, o=None),
|
||
TripleQuery(s=None, p=entity, o=None),
|
||
TripleQuery(s=None, p=None, o=entity)
|
||
])
|
||
|
||
# Execute all queries concurrently
|
||
results = await self.execute_batch_queries(batch_queries)
|
||
|
||
# Process results and prepare next level
|
||
next_level = set()
|
||
for result in results:
|
||
subgraph.update(result.triples)
|
||
next_level.update(result.new_entities)
|
||
|
||
visited.update(current_level)
|
||
current_level = next_level - visited
|
||
|
||
return subgraph
|
||
```
|
||
|
||
#### Phase 2: Parallel Label Resolution
|
||
|
||
**Current Sequential Implementation:**
|
||
```python
|
||
# INEFFICIENT: Sequential processing
|
||
for edge in subgraph:
|
||
s = await self.maybe_label(edge[0]) # Individual query
|
||
p = await self.maybe_label(edge[1]) # Individual query
|
||
o = await self.maybe_label(edge[2]) # Individual query
|
||
```
|
||
|
||
**Optimized Parallel Implementation:**
|
||
```python
|
||
async def resolve_labels_parallel(self, subgraph: List[Triple]) -> List[Triple]:
|
||
# Collect all unique entities needing labels
|
||
entities_to_resolve = set()
|
||
for s, p, o in subgraph:
|
||
entities_to_resolve.update([s, p, o])
|
||
|
||
# Remove already cached entities
|
||
uncached_entities = [e for e in entities_to_resolve if e not in self.label_cache]
|
||
|
||
# Batch query for all uncached labels
|
||
if uncached_entities:
|
||
label_results = await self.batch_label_query(uncached_entities)
|
||
self.label_cache.update(label_results)
|
||
|
||
# Apply labels to subgraph
|
||
return [
|
||
(self.label_cache.get(s, s), self.label_cache.get(p, p), self.label_cache.get(o, o))
|
||
for s, p, o in subgraph
|
||
]
|
||
```
|
||
|
||
#### Phase 3: Advanced Caching Strategy
|
||
|
||
**LRU Cache with TTL:**
|
||
```python
|
||
class LRUCacheWithTTL:
|
||
def __init__(self, max_size: int, default_ttl: int = 3600):
|
||
self.cache = OrderedDict()
|
||
self.max_size = max_size
|
||
self.default_ttl = default_ttl
|
||
self.access_times = {}
|
||
|
||
async def get(self, key: str) -> Optional[Any]:
|
||
if key in self.cache:
|
||
# Check TTL expiration
|
||
if time.time() - self.access_times[key] > self.default_ttl:
|
||
del self.cache[key]
|
||
del self.access_times[key]
|
||
return None
|
||
|
||
# Move to end (most recently used)
|
||
self.cache.move_to_end(key)
|
||
return self.cache[key]
|
||
return None
|
||
|
||
async def put(self, key: str, value: Any):
|
||
if key in self.cache:
|
||
self.cache.move_to_end(key)
|
||
else:
|
||
if len(self.cache) >= self.max_size:
|
||
# Remove least recently used
|
||
oldest_key = next(iter(self.cache))
|
||
del self.cache[oldest_key]
|
||
del self.access_times[oldest_key]
|
||
|
||
self.cache[key] = value
|
||
self.access_times[key] = time.time()
|
||
```
|
||
|
||
#### Phase 4: Query Optimisation and Monitoring
|
||
|
||
**Performance Metrics Collection:**
|
||
```python
|
||
@dataclass
|
||
class PerformanceMetrics:
|
||
total_queries: int
|
||
cache_hits: int
|
||
cache_misses: int
|
||
avg_response_time: float
|
||
subgraph_construction_time: float
|
||
label_resolution_time: float
|
||
total_entities_processed: int
|
||
memory_usage_mb: float
|
||
```
|
||
|
||
**Query Timeout and Circuit Breaker:**
|
||
```python
|
||
async def execute_with_timeout(self, query_func, timeout: int = 30):
|
||
try:
|
||
return await asyncio.wait_for(query_func(), timeout=timeout)
|
||
except asyncio.TimeoutError:
|
||
logger.error(f"Query timeout after {timeout}s")
|
||
raise GraphRagTimeoutError(f"Query exceeded timeout of {timeout}s")
|
||
```
|
||
|
||
## Cache Consistency Considerations
|
||
|
||
**Data Staleness Trade-offs:**
|
||
- **Label cache (5min TTL)**: Risk of serving deleted/renamed entity labels
|
||
- **No embedding caching**: Not needed - embeddings already cached per-query
|
||
- **No result caching**: Prevents stale subgraph results from deleted entities/relationships
|
||
|
||
**Mitigation Strategies:**
|
||
- **Conservative TTL values**: Balance performance gains (10-20%) with data freshness
|
||
- **Cache invalidation hooks**: Optional integration with graph mutation events
|
||
- **Monitoring dashboards**: Track cache hit rates vs staleness incidents
|
||
- **Configurable cache policies**: Allow per-deployment tuning based on mutation frequency
|
||
|
||
**Recommended Cache Configuration by Graph Mutation Rate:**
|
||
- **High mutation (>100 changes/hour)**: TTL=60s, smaller cache sizes
|
||
- **Medium mutation (10-100 changes/hour)**: TTL=300s (default)
|
||
- **Low mutation (<10 changes/hour)**: TTL=600s, larger cache sizes
|
||
|
||
## Security Considerations
|
||
|
||
**Query Injection Prevention:**
|
||
- Validate all entity identifiers and query parameters
|
||
- Use parameterized queries for all database interactions
|
||
- Implement query complexity limits to prevent DoS attacks
|
||
|
||
**Resource Protection:**
|
||
- Enforce maximum subgraph size limits
|
||
- Implement query timeouts to prevent resource exhaustion
|
||
- Add memory usage monitoring and limits
|
||
|
||
**Access Control:**
|
||
- Maintain existing user and collection isolation
|
||
- Add audit logging for performance-impacting operations
|
||
- Implement rate limiting for expensive operations
|
||
|
||
## Performance Considerations
|
||
|
||
### Expected Performance Improvements
|
||
|
||
**Query Reduction:**
|
||
- Current: ~9,000+ queries for typical request
|
||
- Optimized: ~50-100 batched queries (98% reduction)
|
||
|
||
**Response Time Improvements:**
|
||
- Graph traversal: 15-20s → 3-5s (4-5x faster)
|
||
- Label resolution: 8-12s → 2-4s (3x faster)
|
||
- Overall query: 25-35s → 6-10s (3-4x improvement)
|
||
|
||
**Memory Efficiency:**
|
||
- Bounded cache sizes prevent memory leaks
|
||
- Efficient data structures reduce memory footprint by ~40%
|
||
- Better garbage collection through proper resource cleanup
|
||
|
||
**Realistic Performance Expectations:**
|
||
- **Label cache**: 10-20% query reduction for graphs with common relationships
|
||
- **Batching optimization**: 50-80% query reduction (primary optimization)
|
||
- **Object lifetime optimization**: Eliminate per-request creation overhead
|
||
- **Overall improvement**: 3-4x response time improvement primarily from batching
|
||
|
||
**Scalability Improvements:**
|
||
- Support for 3-5x larger knowledge graphs (limited by cache consistency needs)
|
||
- 3-5x higher concurrent request capacity
|
||
- Better resource utilization through connection reuse
|
||
|
||
### Performance Monitoring
|
||
|
||
**Real-time Metrics:**
|
||
- Query execution times by operation type
|
||
- Cache hit ratios and effectiveness
|
||
- Database connection pool utilisation
|
||
- Memory usage and garbage collection impact
|
||
|
||
**Performance Benchmarking:**
|
||
- Automated performance regression testing
|
||
- Load testing with realistic data volumes
|
||
- Comparison benchmarks against current implementation
|
||
|
||
## Testing Strategy
|
||
|
||
### Unit Testing
|
||
- Individual component testing for traversal, caching, and label resolution
|
||
- Mock database interactions for performance testing
|
||
- Cache eviction and TTL expiration testing
|
||
- Error handling and timeout scenarios
|
||
|
||
### Integration Testing
|
||
- End-to-end GraphRAG query testing with optimisations
|
||
- Database interaction testing with real data
|
||
- Concurrent request handling and resource management
|
||
- Memory leak detection and resource cleanup verification
|
||
|
||
### Performance Testing
|
||
- Benchmark testing against current implementation
|
||
- Load testing with varying graph sizes and complexities
|
||
- Stress testing for memory and connection limits
|
||
- Regression testing for performance improvements
|
||
|
||
### Compatibility Testing
|
||
- Verify existing GraphRAG API compatibility
|
||
- Test with various graph database backends
|
||
- Validate result accuracy compared to current implementation
|
||
|
||
## Implementation Plan
|
||
|
||
### Direct Implementation Approach
|
||
Since APIs are allowed to change, implement optimizations directly without migration complexity:
|
||
|
||
1. **Replace `follow_edges` method**: Rewrite with iterative batched traversal
|
||
2. **Optimize `get_labelgraph`**: Implement parallel label resolution
|
||
3. **Add long-lived GraphRag**: Modify Processor to maintain persistent instance
|
||
4. **Implement label caching**: Add LRU cache with TTL to GraphRag class
|
||
|
||
### Scope of Changes
|
||
- **Query class**: Replace ~50 lines in `follow_edges`, add ~30 lines batch handling
|
||
- **GraphRag class**: Add caching layer (~40 lines)
|
||
- **Processor class**: Modify to use persistent GraphRag instance (~20 lines)
|
||
- **Total**: ~140 lines of focused changes, mostly within existing classes
|
||
|
||
## Timeline
|
||
|
||
**Week 1: Core Implementation**
|
||
- Replace `follow_edges` with batched iterative traversal
|
||
- Implement parallel label resolution in `get_labelgraph`
|
||
- Add long-lived GraphRag instance to Processor
|
||
- Implement label caching layer
|
||
|
||
**Week 2: Testing and Integration**
|
||
- Unit tests for new traversal and caching logic
|
||
- Performance benchmarking against current implementation
|
||
- Integration testing with real graph data
|
||
- Code review and optimization
|
||
|
||
**Week 3: Deployment**
|
||
- Deploy optimized implementation
|
||
- Monitor performance improvements
|
||
- Fine-tune cache TTL and batch sizes based on real usage
|
||
|
||
## Open Questions
|
||
|
||
- **Database Connection Pooling**: Should we implement custom connection pooling or rely on existing database client pooling?
|
||
- **Cache Persistence**: Should label and embedding caches persist across service restarts?
|
||
- **Distributed Caching**: For multi-instance deployments, should we implement distributed caching with Redis/Memcached?
|
||
- **Query Result Format**: Should we optimize the internal triple representation for better memory efficiency?
|
||
- **Monitoring Integration**: Which metrics should be exposed to existing monitoring systems (Prometheus, etc.)?
|
||
|
||
## References
|
||
|
||
- [GraphRAG Original Implementation](trustgraph-flow/trustgraph/retrieval/graph_rag/graph_rag.py)
|
||
- [TrustGraph Architecture Principles](architecture-principles.md)
|
||
- [Collection Management Specification](collection-management.md)
|