Update tech spec

2026-04-28 01:46:22 +02:00 · 2025-09-03 11:52:14 +01:00 · 3353d4305b
commit 3353d4305b
parent 66116eb875
1 changed file with 183 additions and 31 deletions
--- a/docs/tech-specs/confidence-based-agents.md
+++ b/docs/tech-specs/confidence-based-agents.md
@@ -30,13 +30,13 @@ This document specifies a new agent architecture for TrustGraph that introduces
 │                                                                  │
 │  ┌──────────────┐   ┌─────────────────┐   ┌────────────────┐     │
 │  │   Planner    │   │ Flow Controller │   │   Confidence   │     │
-│  │   Module     │─▶│   Module        │─▶│    Evaluator   │     │
+│  │   Module     │─▶│      Module     │─▶│   Evaluator    │     │
 │  └──────────────┘   └─────────────────┘   └────────────────┘     │
 │         │                  │                    │                │
 │         ▼                  ▼                    ▼                │
 │  ┌──────────────┐   ┌───────────────┐     ┌────────────────┐     │
-│  │ Execution    │   │    Memory     │     │     Audit      │     │
-│  │   Engine     │◄──│    Manager    │     │     Logger     │     │
+│  │   Execution  │   │    Memory     │     │     Audit      │     │
+│  │    Engine    │◄──│    Manager    │     │     Logger     │     │
 │  └──────────────┘   └───────────────┘     └────────────────┘     │
 └──────────────────────────────────────────────────────────────────┘
                             │
@@ -233,52 +233,204 @@ The confidence agent reuses existing tool implementations:

 No changes required to existing tools.

-### 5. Execution Flow
+### 5. End-to-End Execution Flow

-#### 5.1 Request Processing
+#### 5.1 Module Interaction Overview
+
+When an `AgentRequest` arrives, the confidence agent orchestrates the following flow:
+
+1. **Service Entry**: The main service receives the `AgentRequest` via Pulsar
+2. **Planning Phase**: Service invokes Planner Module to generate an `ExecutionPlan`
+3. **Execution Loop**: Service passes plan to Flow Controller, which:
+   - Resolves step dependencies
+   - For each step, calls Executor with context from Memory Manager
+   - Receives the Evaluator's confidence assessment after each execution
+   - Triggers retry logic if confidence is below the step's threshold
+4. **Response Stream**: Service sends `AgentResponse` messages at key points
+5. **Audit Trail**: Logger records all decisions and confidence scores
+
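The "resolves step dependencies" part of the execution loop can be sketched as a topological sort over the plan. This is a minimal illustration under stated assumptions, not the project's implementation: the `id` field and the `resolve_order` helper are hypothetical, the other `ExecutionStep` fields mirror the examples in this spec, and only the standard-library `graphlib` module is used.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class ExecutionStep:
    # "id" is an assumed field; the rest follow the spec's examples.
    id: str
    function: str
    confidence_threshold: float
    dependencies: list[str] = field(default_factory=list)


def resolve_order(steps):
    """Return the steps ordered so each one follows its dependencies."""
    by_id = {step.id: step for step in steps}
    # TopologicalSorter expects a mapping of node -> predecessors.
    graph = {step.id: set(step.dependencies) for step in steps}
    # static_order() raises graphlib.CycleError on circular dependencies.
    return [by_id[step_id] for step_id in TopologicalSorter(graph).static_order()]


# Plan deliberately listed out of order to show the reordering.
plan = [
    ExecutionStep("summarize", "TextCompletion", 0.8, ["topics"]),
    ExecutionStep("topics", "TextCompletion", 0.7, ["docs"]),
    ExecutionStep("docs", "GraphQuery", 0.75),
]
ordered_ids = [step.id for step in resolve_order(plan)]
```

With this plan, `ordered_ids` comes out as `["docs", "topics", "summarize"]`, so each step runs only after the results it depends on are available from memory.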
+#### 5.2 Detailed Message Flow

 ```mermaid
 sequenceDiagram
    participant Client
-    participant Gateway
-    participant ConfidenceAgent
+    participant Service as ConfidenceAgent<br/>Service
    participant Planner
-    participant FlowController
+    participant FlowCtrl as Flow<br/>Controller
+    participant Memory
    participant Executor
+    participant Evaluator
    participant Tools
    
-    Client->>Gateway: Request
-    Gateway->>ConfidenceAgent: ConfidenceAgentRequest
-    ConfidenceAgent->>Planner: Generate plan
-    Planner->>ConfidenceAgent: ExecutionPlan
+    Client->>Service: AgentRequest
+    Service->>Service: Parse request,<br/>extract config
    
-    loop For each step
-        ConfidenceAgent->>FlowController: Execute step
-        FlowController->>Executor: Run tool
-        Executor->>Tools: Tool invocation
-        Tools->>Executor: Result
-        Executor->>FlowController: StepResult + Confidence
+    %% Planning Phase
+    Service->>Planner: generate_plan(request)
+    Planner->>Tools: Query available tools
+    Planner->>Planner: LLM generates<br/>ExecutionPlan
+    Planner-->>Service: ExecutionPlan
+    Service->>Client: AgentResponse<br/>(planning thought)
+    
+    %% Execution Phase
+    Service->>FlowCtrl: execute_plan(plan)
+    
+    loop For each ExecutionStep
+        FlowCtrl->>Memory: get_context(step)
+        Memory-->>FlowCtrl: context + dependencies
        
-        alt Confidence >= Threshold
-            FlowController->>ConfidenceAgent: Continue
-        else Confidence < Threshold
-            FlowController->>FlowController: Retry logic
+        FlowCtrl->>Executor: execute_step(step, context)
+        Executor->>Tools: invoke_tool(name, args)
+        Tools-->>Executor: raw_result
+        
+        Executor->>Evaluator: evaluate(result)
+        Evaluator-->>Executor: ConfidenceMetrics
+        
+        alt Confidence >= threshold
+            Executor-->>FlowCtrl: StepResult (success)
+            FlowCtrl->>Memory: store_result(step, result)
+            FlowCtrl->>Service: Send progress
+            Service->>Client: AgentResponse<br/>(step observation)
+        else Confidence < threshold
+            FlowCtrl->>FlowCtrl: Retry with backoff
+            Note over FlowCtrl: Max 3 retries by default
+            alt After max retries
+                FlowCtrl->>Service: Request override
+                Service->>Client: AgentResponse<br/>(override request)
+            end
        end
    end
    
-    ConfidenceAgent->>Gateway: ConfidenceAgentResponse
-    Gateway->>Client: Response
+    FlowCtrl-->>Service: All StepResults
+    Service->>Service: Generate final answer
+    Service->>Client: AgentResponse<br/>(final answer)
 ```

-#### 5.2 Confidence-Based Control Flow
+#### 5.3 Confidence Decision Points

-The control flow implements a retry loop with exponential backoff:
+The confidence mechanism affects execution at three critical points:

-1. Execute step and evaluate confidence
-2. If confidence meets threshold, proceed to next step
-3. If below threshold, retry with backoff delay
-4. After max retries, either request user override or fail
-5. Log all attempts and decisions for audit trail
+**1. Planning Confidence**
+- Planner assigns confidence thresholds to each step based on:
+  - Operation criticality (graph mutations = higher threshold)
+  - Tool reliability history
+  - Query complexity
+- Default thresholds: GraphQuery (0.8), TextCompletion (0.7), McpTool (0.6)
+
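One way to picture the threshold assignment is a per-function lookup with an adjustment for critical operations. The default values come from the list above. The fallback value, the size of the mutation adjustment, and the `threshold_for` helper are assumptions made for illustration only.

```python
# Defaults as listed in this spec.
DEFAULT_THRESHOLDS = {"GraphQuery": 0.8, "TextCompletion": 0.7, "McpTool": 0.6}

# Assumptions for this sketch, not values taken from the spec.
FALLBACK_THRESHOLD = 0.7
MUTATION_BUMP = 0.1


def threshold_for(function, mutates_graph=False):
    """Pick a confidence threshold for a step, capped at 1.0."""
    base = DEFAULT_THRESHOLDS.get(function, FALLBACK_THRESHOLD)
    if mutates_graph:
        # Graph mutations are more critical, so demand more confidence.
        base += MUTATION_BUMP
    return round(min(base, 1.0), 2)
```

The planner may still override these per step, as the multi-step example in 5.5 does with a 0.75 threshold for its GraphQuery step.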
+**2. Execution Confidence**
+- After each tool execution, Evaluator calculates confidence based on:
+  - Output completeness and structure
+  - Consistency with expected schemas
+  - Semantic coherence (for text outputs)
+  - Result size and validity (for graph queries)
+
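As a rough sketch of the "result size and validity" and "consistency with expected schemas" signals for graph queries, the evaluator could score a result set as follows. The 0.3 score for an empty result matches the example in 5.4. The 0.5 base, the linear weighting, the `ConfidenceMetrics` fields, and the `evaluate_graph_result` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConfidenceMetrics:
    # Field names are assumed; the spec only shows a score and a reasoning string.
    score: float
    reasoning: str


def evaluate_graph_result(rows, expected_keys):
    """Score a list of result rows (dicts) against a set of expected keys."""
    if not rows:
        # Empty results are suspicious: missing entities or an overly strict query.
        return ConfidenceMetrics(0.3, "Empty result set")
    # A row is valid when it carries every expected key.
    valid = sum(1 for row in rows if expected_keys.issubset(row))
    score = round(0.5 + 0.5 * valid / len(rows), 2)
    return ConfidenceMetrics(score, f"{valid}/{len(rows)} rows match the expected schema")
```

A real evaluator would likely combine several such signals, including semantic checks for text outputs, rather than rely on one ratio.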
+**3. Retry Decision**
+- If confidence < threshold:
+  - First retry: Same parameters with backoff
+  - Second retry: Adjusted parameters (e.g., broader query)
+  - Third retry: Simplified approach
+  - After max retries: User override or graceful failure
+
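The retry ladder above can be sketched as a loop that changes strategy on each attempt. The strategy names, the `run_with_retries` helper, and the doubling delay are assumptions: the spec only says "backoff" and does not fix the delay schedule. The `sleep` parameter is injectable so the sketch can run without waiting.

```python
import time

# One strategy per retry, in the order described above (names are hypothetical).
STRATEGIES = ["same_parameters", "adjusted_parameters", "simplified_approach"]


def run_with_retries(attempt, threshold, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call attempt(strategy) -> confidence until the threshold is met.

    Returns (outcome, history), where history lists (strategy, score) pairs.
    """
    score = attempt("initial")
    history = [("initial", score)]
    for n in range(max_retries):
        if score >= threshold:
            break
        sleep(base_delay * 2 ** n)  # assumed exponential backoff: 1s, 2s, 4s
        strategy = STRATEGIES[min(n, len(STRATEGIES) - 1)]
        score = attempt(strategy)
        history.append((strategy, score))
    outcome = "success" if score >= threshold else "override_or_fail"
    return outcome, history


# Demo with canned confidence scores and a recording stand-in for sleep.
scores = iter([0.3, 0.5, 0.85])
delays = []
outcome, history = run_with_retries(lambda strategy: next(scores), 0.8, sleep=delays.append)
```

In the demo the second retry reaches 0.85, so the loop stops before the simplified approach is tried and `outcome` is `"success"`. When every retry stays below the threshold, the caller would request a user override or fail gracefully.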
+#### 5.4 Example: Graph Query with Low Confidence
+
+**Scenario**: User asks "What are the connections between Entity X and Entity Y?"
+
+**Step 1: Planning**
+```
+AgentRequest arrives:
+  question: "What are the connections between Entity X and Entity Y?"
+  
+Planner generates ExecutionPlan:
+  Step 1: GraphQuery
+    function: "GraphQuery"
+    arguments: {"query": "MATCH path=(x:Entity {name:'X'})-[*..3]-(y:Entity {name:'Y'}) RETURN path"}
+    confidence_threshold: 0.8
+```
+
+**Step 2: First Execution**
+```
+Executor runs GraphQuery:
+  Result: Empty result set []
+  
+Evaluator assesses confidence:
+  Score: 0.3 (low - empty results suspicious)
+  Reasoning: "Empty result may indicate entities don't exist or query too restrictive"
+  
+Flow Controller decides:
+  0.3 < 0.8 threshold → RETRY
+```
+
+**Step 3: Retry with Adjusted Query**
+```
+Flow Controller adjusts parameters:
+  New query: "MATCH (x:Entity), (y:Entity) WHERE x.name CONTAINS 'X' AND y.name CONTAINS 'Y' RETURN x, y"
+  
+Executor runs adjusted query:
+  Result: Found 2 entities but no connections
+  
+Evaluator assesses confidence:
+  Score: 0.85
+  Reasoning: "Entities exist but genuinely unconnected"
+  
+Flow Controller decides:
+  0.85 >= 0.8 threshold → SUCCESS
+```
+
+**Step 4: Response Stream**
+```
+AgentResponse 1 (planning):
+  thought: "Planning graph traversal query to find connections"
+  observation: "Generated query with 3-hop path search"
+
+AgentResponse 2 (retry):
+  thought: "Initial query returned empty, adjusting search parameters"
+  observation: "Retrying with broader entity matching"
+
+AgentResponse 3 (final):
+  answer: "Entity X and Entity Y exist in the graph but have no direct or indirect connections within 3 hops"
+  thought: "Query successful with high confidence after parameter adjustment"
+  observation: "Confidence: 0.85 - Entities verified to exist but unconnected"
+```
+
+#### 5.5 Example: Multi-Step Plan with Dependencies
+
+**Scenario**: "Summarize the main topics discussed about AI regulation"
+
+**ExecutionPlan Generated**:
+```
+Step 1: GraphQuery - Find documents about AI regulation
+  confidence_threshold: 0.75
+  
+Step 2: TextCompletion - Extract key topics from documents
+  dependencies: [Step 1]
+  confidence_threshold: 0.7
+  
+Step 3: TextCompletion - Generate summary
+  dependencies: [Step 2]
+  confidence_threshold: 0.8
+```
+
+**Execution Flow**:
+1. **Step 1 Success** (confidence: 0.9)
+   - Found 15 relevant documents
+   - Memory Manager stores document list
+   
+2. **Step 2 Initial Failure** (confidence: 0.5)
+   - Topic extraction unclear
+   - Retry with more specific prompt
+   - **Retry Success** (confidence: 0.75)
+   - Memory Manager stores topics list
+   
+3. **Step 3 Success** (confidence: 0.85)
+   - Uses topics from memory
+   - Generates coherent summary
+   
+**Total AgentResponses sent**: 7
+- 1 for planning
+- 2 for Step 1 (execution + success)
+- 2 for Step 2 (failure + retry success)  
+- 1 for Step 3
+- 1 final response

 ### 6. Monitoring and Observability