Agentic memory spec

2026-05-18 20:05:13 +02:00 · 2025-12-03 16:29:47 +00:00 · 2025-12-03 16:29:47 +00:00 · 5d67e9f9a1
commit 5d67e9f9a1
parent 6279a11a16
1 changed files with 594 additions and 112 deletions
--- a/docs/tech-specs/agentic-memory.md
+++ b/docs/tech-specs/agentic-memory.md
@@ -1,127 +1,609 @@
+# Agentic Memory Technical Specification

-This is some sketches about how to add an agentic manager with memory.
+## Overview

-The existing 'react' manager is going to stay.  The new one will be called
-'react_mem'.  The 'react_mem' module is to going to be a drop-in replacement
-for 'react' so that the end user can decide which to deploy.
+This specification describes the implementation of an agent manager with multi-layered memory capabilities for TrustGraph. The new `react_mem` module extends the existing ReAct pattern with persistent memory across invocations and conversations, enabling agents to learn from past interactions and maintain context over time.

-## Interaction start
+The implementation supports the following use cases:

-1. RETRIEVE CONTEXT
-   │
-   ├── Long-term facts
-   │   └── Embed user query → retrieve relevant subgraph
-   │
-   ├── Episodic memory
-   │   └── Embed user query → find similar past episodes
-   │
-   └── Conversation memory
-       └── Pull summary/key points from prior invocations in this conversation
-   
-2. INITIALIZE WORKING MEMORY
-   └── Empty (or seed with "Goal: <user query>")
-   
-   
-## Core loop
+1. **Long-term Knowledge Retention**: Store and retrieve factual information learned during agent interactions
+2. **Episodic Memory**: Remember past problem-solving approaches and their outcomes
+3. **Conversation Continuity**: Maintain context across multiple invocations within a conversation
+4. **Working Memory Management**: Handle extended reasoning chains with automatic compression
+5. **Experience-Based Reasoning**: Leverage similar past experiences when addressing new queries

-┌─────────────────────────────────────────────────────────────────┐
-│                         ASSEMBLE CONTEXT                        │
-│                                                                 │
-│  • System prompt / agent persona                                │
-│  • Long-term facts (retrieved at invocation start)              │
-│  • Relevant episodes (retrieved at invocation start)            │
-│  • Conversation memory (what's happened this conversation)      │
-│  • Working memory (steps so far this invocation)                │
-│  • Current user query                                           │
-└─────────────────────────────────────────────────────────────────┘
-                              │
-                              ▼
-┌─────────────────────────────────────────────────────────────────┐
-│                           REASON                                │
-│                                                                 │
-│  LLM generates:                                                 │
-│  • Thought (reasoning trace)                                    │
-│  • Decision: which tool + parameters, OR final answer           │
-└─────────────────────────────────────────────────────────────────┘
-                              │
-                              ▼
-                    ┌─────────────────────┐
-                    │   Final answer?     │
-                    └─────────────────────┘
-                         │           │
-                        YES          NO
-                         │           │
-                         ▼           ▼
-              ┌──────────────┐  ┌─────────────────────────────────┐
-              │   COMPLETE   │  │              ACT                │
-              │  (exit loop) │  │                                 │
-              └──────────────┘  │  Execute tool, get observation  │
-                                └─────────────────────────────────┘
-                                                 │
-                                                 ▼
-                                ┌─────────────────────────────────┐
-                                │      UPDATE WORKING MEMORY      │
-                                │                                 │
-                                │  Append: thought, action, result│
-                                │                                 │
-                                │  If working memory too large:   │
-                                │  → Compress/summarize older     │
-                                │    steps, keep recent raw       │
-                                └─────────────────────────────────┘
-                                                 │
-                                                 ▼
-                                          (loop back to
-                                          ASSEMBLE CONTEXT)
-                                          
-                                          
-## Invocation end
+## Goals

-1. EXTRACT & STORE
-   │
-   ├── Conversation memory
-   │   └── Summarize this invocation: "User asked X, agent did Y, result was Z"
-   │       Append to conversation memory
-   │
-   ├── Facts (optional)
-   │   └── Did the agent learn anything worth persisting?
-   │       If yes → StoreFact (could be automatic extraction or agent-driven)
-   │
-   └── Episode (if task-like)
-       └── RecordEpisode: goal, key steps, outcome, lessons
-           Tag with embeddings for future retrieval
+- **Drop-in Compatibility**: `react_mem` implements the same AgentService interface as `react`, allowing users to choose between implementations via configuration
+- **Memory Persistence**: Enable agents to retain and retrieve information across invocations and conversations
+- **Scalable Context Management**: Handle long-running conversations without unbounded memory growth
+- **Retrieval Efficiency**: Use embedding-based retrieval to surface relevant memory efficiently
+- **Incremental Adoption**: Deploy alongside existing `react` module without breaking changes
+- **Transparent Operation**: Memory operations should be invisible to clients and should not noticeably increase response latency (memory extraction runs after the final response is sent)
+- **Configurable Behavior**: Allow tuning of memory extraction, compression, and retrieval strategies

-2. DISCARD WORKING MEMORY
-   └── No longer needed (it's been summarized into conversation memory)
-   
-   
-## Conversation end
+## Background

-1. PROMOTE TO LONG-TERM
-   │
-   └── Scan conversation memory for facts worth persisting
-       → Store to knowledge graph
-   
-2. RECORD CONVERSATION-LEVEL EPISODE (optional)
-   │
-   └── If the conversation had an overarching goal/theme,
-       capture as a higher-level episode
+### Current Architecture

-## Refinements to consider
+The existing `react` agent manager (module: `trustgraph-flow/trustgraph/agent/react/`) implements a stateless ReAct (Reasoning and Acting) loop:

-Refinements to consider:
-Mid-invocation retrieval
-The initial retrieval might not be enough. Agent could have tools to pull in more:
+1. Receives `AgentRequest` with question and history
+2. Reasons about the problem using LLM
+3. Takes action by invoking tools
+4. Returns `AgentResponse` with thought, observation, or final answer
+5. Maintains working memory for the current invocation only, via the `history` field

-RetrieveFacts(query) — dig deeper if initial context insufficient
-RecallEpisodes(query) — "have I seen this specific problem before?"
+Current limitations include:

-Working memory compression trigger
+- **No Long-term Memory**: Each invocation starts fresh with no access to prior conversations
+- **Limited Context Window**: History is bounded by max iterations and token limits
+- **No Learning**: Agent cannot benefit from past problem-solving experiences
+- **Conversation Fragmentation**: No mechanism to connect related invocations across a conversation
+- **Context Loss**: Important information from early steps may be lost as history grows

-Option A: Every N steps, auto-compress
-Option B: When token count exceeds threshold
-Option C: Agent explicitly decides ("I should summarize before continuing")
+This specification addresses these gaps by introducing a four-tier memory architecture. By maintaining long-term facts, episodic memories, conversation summaries, and managed working memory, TrustGraph agents can:

-Fact/episode extraction
+- Build and leverage a persistent knowledge base
+- Learn from past successes and failures
+- Maintain coherent context across extended conversations
+- Handle complex multi-step reasoning without context overflow

-Option A: Agent explicitly calls StoreFact / RecordEpisode as part of its reasoning
-Option B: Post-invocation LLM pass: "Given this invocation, extract any facts/episodes worth storing"
+## Technical Design
+
+### Architecture
+
+The agentic memory system requires the following technical components:
+
+1. **ReactMemAgentManager**
+   - Extends AgentManager with memory-aware reasoning loop
+   - Orchestrates memory retrieval at invocation start
+   - Manages working memory compression during execution
+   - Triggers memory extraction at invocation end
+
+   Module: `trustgraph-flow/trustgraph/agent/react_mem/agent_manager.py`
+
+2. **ReactMemService**
+   - Implements AgentService interface (drop-in replacement for react.Processor)
+   - Manages conversation-level memory lifecycle
+   - Handles configuration of memory services and policies
+   - Coordinates memory promotion at conversation end
+
+   Module: `trustgraph-flow/trustgraph/agent/react_mem/service.py`
+
+3. **Memory Services** (Implementation Details: TBD)
+   - Long-term Facts Service
+   - Episodic Memory Service
+   - Conversation Memory Service
+   - Working Memory Management Service
+
+   Module: `trustgraph-flow/trustgraph/agent/react_mem/memory/` (TBD)
+
+4. **Memory Schema Definitions**
+   - Data structures for facts, episodes, conversation records
+   - Request/response schemas for memory operations
+
+   Module: `trustgraph-base/trustgraph/schema/agent_memory.py`
+
+### Memory Architecture
+
+#### Four-Tier Memory System
+
+**Long-term Facts**
+- **Purpose**: Persistent knowledge graph of facts learned across all conversations
+- **Scope**: Shared across all conversations, scoped per user/collection
+- **Retrieval**: Embedding-based semantic search on user query
+- **Lifetime**: Permanent until explicitly deleted
+- **Storage**: Graph database (existing TrustGraph knowledge store)
+
+**Episodic Memory**
+- **Purpose**: Records of past problem-solving episodes with outcomes
+- **Scope**: Shared across all conversations, scoped per user/collection
+- **Retrieval**: Embedding-based similarity search on current goal/context
+- **Lifetime**: Permanent with optional TTL or relevance-based pruning
+- **Storage**: Vector store with structured metadata
+
+**Conversation Memory**
+- **Purpose**: Summaries and key points from prior invocations in current conversation
+- **Scope**: Current conversation only
+- **Retrieval**: Sequential access (not searched)
+- **Lifetime**: Duration of conversation
+- **Storage**: In-memory with optional persistence for long conversations
+
+**Working Memory**
+- **Purpose**: Reasoning trace for current invocation (thoughts, actions, observations)
+- **Scope**: Current invocation only
+- **Retrieval**: Directly included in LLM context
+- **Lifetime**: Current invocation (discarded after summarization)
+- **Storage**: In-memory list
+
+### Data Models
+
+#### Memory Records
+
+**Fact Record**
+```
+Fact:
+  - id: string (unique identifier)
+  - content: string (fact statement)
+  - source: string (conversation_id where learned)
+  - timestamp: datetime
+  - embedding: vector
+  - metadata: dict
+```
+
+**Episode Record**
+```
+Episode:
+  - id: string (unique identifier)
+  - goal: string (what was being attempted)
+  - steps: list[string] (key actions taken)
+  - outcome: string (result achieved)
+  - lessons: string (insights/learnings)
+  - timestamp: datetime
+  - embedding: vector
+  - metadata: dict
+```
+
+**Conversation Record**
+```
+ConversationMemory:
+  - conversation_id: string
+  - invocations: list[InvocationSummary]
+  - metadata: dict
+
+InvocationSummary:
+  - query: string
+  - summary: string (what agent did)
+  - result: string (outcome)
+  - timestamp: datetime
+```
+
+**Working Memory Item**
+```
+WorkingMemoryItem:
+  - type: enum[thought, action, observation]
+  - content: string
+  - timestamp: datetime
+  - metadata: dict (e.g., token_count)
+```
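+
+The record layouts above could be expressed as Python dataclasses roughly as follows. This is a sketch only. The field names follow the pseudocode, but several details are illustrative assumptions:
+
+- embeddings are represented as `list[float]`;
+- timestamps default to the current UTC time;
+- the enum is named `WorkingMemoryType`.
+
+The actual schema module (`trustgraph-base/trustgraph/schema/agent_memory.py`) may use a different record system.
+
+```python
+from dataclasses import dataclass, field
+from datetime import datetime, timezone
+from enum import Enum
+
+
+def _now() -> datetime:
+    # Timestamps default to "now" in UTC (an assumption of this sketch)
+    return datetime.now(timezone.utc)
+
+
+@dataclass
+class Fact:
+    id: str
+    content: str            # fact statement
+    source: str             # conversation_id where the fact was learned
+    embedding: list[float] = field(default_factory=list)
+    timestamp: datetime = field(default_factory=_now)
+    metadata: dict = field(default_factory=dict)
+
+
+@dataclass
+class Episode:
+    id: str
+    goal: str               # what was being attempted
+    steps: list[str]        # key actions taken
+    outcome: str            # result achieved
+    lessons: str            # insights/learnings
+    embedding: list[float] = field(default_factory=list)
+    timestamp: datetime = field(default_factory=_now)
+    metadata: dict = field(default_factory=dict)
+
+
+@dataclass
+class InvocationSummary:
+    query: str
+    summary: str            # what the agent did
+    result: str             # outcome
+    timestamp: datetime = field(default_factory=_now)
+
+
+@dataclass
+class ConversationMemory:
+    conversation_id: str
+    invocations: list[InvocationSummary] = field(default_factory=list)
+    metadata: dict = field(default_factory=dict)
+
+
+class WorkingMemoryType(Enum):
+    THOUGHT = "thought"
+    ACTION = "action"
+    OBSERVATION = "observation"
+
+
+@dataclass
+class WorkingMemoryItem:
+    type: WorkingMemoryType
+    content: str
+    timestamp: datetime = field(default_factory=_now)
+    metadata: dict = field(default_factory=dict)  # e.g. {"token_count": 42}
+```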
+
+### Memory Operations Lifecycle
+
+#### Invocation Start: Context Assembly
+
+When `ReactMemService` receives an `AgentRequest`:
+
+1. **Retrieve Long-term Facts**
+   - Embed user query
+   - Query facts store with embedding
+   - Retrieve top-k relevant facts (configurable k)
+
+2. **Retrieve Episodic Memory**
+   - Embed user query + current state
+   - Query episodes store with embedding
+   - Retrieve top-k similar past episodes
+
+3. **Load Conversation Memory**
+   - Fetch conversation record by conversation_id
+   - Load all prior invocation summaries for this conversation
+
+4. **Initialize Working Memory**
+   - Create empty working memory buffer
+   - Optionally seed with high-level goal
+
+These retrieved memories are assembled into the context passed to `ReactMemAgentManager.reason()`.
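+
+The fact search, the episode search, and the conversation lookup are independent of one another, so they can run concurrently. The sketch below is a minimal version of that idea. Its assumptions are:
+
+- hypothetical async clients expose `embed`, `retrieve_facts`, `retrieve_episodes` and `get_conversation` (these names are not from the existing codebase);
+- one query embedding is reused for both searches, which simplifies the "query + current state" embedding described for episodes.
+
+```python
+import asyncio
+from dataclasses import dataclass, field
+
+
+@dataclass
+class MemoryContext:
+    facts: list
+    episodes: list
+    conversation: list
+    working_memory: list = field(default_factory=list)
+
+
+async def assemble_memory_context(query, conversation_id, user, collection,
+                                  embedder, facts_client, episodes_client,
+                                  conversation_client, fact_top_k=5,
+                                  episode_top_k=3, seed_goal=True):
+    # Embed the query once and reuse the vector for both similarity searches
+    vector = await embedder.embed(query)
+
+    # The three lookups are independent, so issue them concurrently
+    facts, episodes, conversation = await asyncio.gather(
+        facts_client.retrieve_facts(vector, fact_top_k, user, collection),
+        episodes_client.retrieve_episodes(vector, episode_top_k, user,
+                                          collection),
+        conversation_client.get_conversation(conversation_id, user),
+    )
+
+    # Working memory starts empty, optionally seeded with the goal
+    working = [f"Goal: {query}"] if seed_goal else []
+    return MemoryContext(facts, episodes, conversation, working)
+```
+
+With concurrent lookups, start-up latency is roughly one embedding call plus the slowest of the three lookups, rather than their sum.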
+
+#### Core Loop: Memory-Aware Reasoning
+
+The reasoning loop proceeds similarly to standard ReAct, but with augmented context:
+
+1. **Assemble Context**
+   - System prompt / agent persona
+   - Long-term facts (from retrieval)
+   - Relevant episodes (from retrieval)
+   - Conversation memory (loaded summaries)
+   - Working memory (current invocation trace)
+   - Current user query
+
+2. **Reason**
+   - LLM generates thought and decision
+   - Returns Action (tool call) or Final (answer)
+
+3. **Act** (if Action)
+   - Execute tool
+   - Get observation result
+
+4. **Update Working Memory**
+   - Append thought, action, observation to working memory
+   - Check working memory size
+   - If it exceeds the threshold: compress older entries, preserve recent ones
+
+5. **Loop** until Final answer or max iterations
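+
+The loop can be sketched as follows. Every name in this sketch is hypothetical:
+
+- `Action` and `Final` are simplified stand-ins, not the types from the existing `react` module;
+- `reason` wraps the LLM call;
+- `build_prompt` assembles the memory tiers plus working memory into a prompt;
+- `compress` is the hook for working-memory compression;
+- the default limit of 10 iterations is a placeholder.
+
+```python
+from dataclasses import dataclass
+from typing import Callable, Union
+
+
+@dataclass
+class Action:
+    thought: str
+    name: str            # tool to invoke
+    arguments: dict      # keyword arguments for the tool
+
+
+@dataclass
+class Final:
+    thought: str
+    answer: str
+
+
+def run_memory_react(reason: Callable[[str], Union[Action, Final]],
+                     tools: dict[str, Callable[..., str]],
+                     build_prompt: Callable[[list[str]], str],
+                     compress: Callable[[list[str]], list[str]] = lambda wm: wm,
+                     max_iterations: int = 10) -> str:
+    working: list[str] = []
+    for _ in range(max_iterations):
+        # 1-2. Assemble context (memory tiers + working memory) and reason
+        step = reason(build_prompt(working))
+        if isinstance(step, Final):
+            return step.answer
+        # 3. Act: execute the selected tool
+        observation = tools[step.name](**step.arguments)
+        # 4. Record the step, then compress if the buffer grew too large
+        working = compress(working + [
+            f"Thought: {step.thought}",
+            f"Action: {step.name} {step.arguments}",
+            f"Observation: {observation}",
+        ])
+    # 5. No Final answer within the iteration budget
+    raise RuntimeError("max iterations reached without a final answer")
+```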
+
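+**Working Memory Compression Trigger**: When the token count of working memory exceeds the configured threshold (`working-memory-threshold`, default 2000 tokens; a deployment might instead derive it from the context window, e.g. 50% of it), invoke summarization: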
+- Keep most recent N steps verbatim
+- Summarize older steps into condensed form
+- Preserve critical information (tool results, key decisions)
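+
+A minimal sketch of this trigger follows. The `threshold` default matches `working-memory-threshold`, while `keep_recent=3` (the "N" above) and the function names are hypothetical. Two further simplifications apply:
+
+- Token counting is a crude whitespace approximation.
+- The `summarize` callable (for example, an LLM prompt) is responsible for preserving tool results and key decisions.
+
+```python
+from typing import Callable
+
+
+def count_tokens(text: str) -> int:
+    # Crude whitespace approximation; a deployment would use the target
+    # model's tokenizer instead
+    return len(text.split())
+
+
+def compress_working_memory(items: list[str],
+                            summarize: Callable[[list[str]], str],
+                            threshold: int = 2000,
+                            keep_recent: int = 3) -> list[str]:
+    total = sum(count_tokens(item) for item in items)
+    if total <= threshold or len(items) <= keep_recent:
+        return items  # within budget, or nothing old enough to condense
+    split = len(items) - keep_recent
+    older, recent = items[:split], items[split:]
+    # One condensed entry replaces all older steps; recent steps stay verbatim
+    return [f"Summary of earlier steps: {summarize(older)}"] + recent
+```
+
+Passing the summarizer in as a function leaves the "summarization vs. truncation" question (see Open Questions) undecided. A rule-based truncation strategy is simply a `summarize` that returns a fixed marker.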
+
+#### Invocation End: Memory Extraction
+
+After the final response has been sent:
+
+1. **Conversation Memory Update**
+   - Generate invocation summary: "User asked X, agent did Y, result was Z"
+   - Append summary to conversation memory
+
+2. **Fact Extraction** (Optional/Conditional)
+   - LLM pass: "Did the agent learn anything worth persisting?"
+   - If yes: Extract fact statements, store to long-term facts
+   - Alternative: Agent explicitly calls StoreFact tool during reasoning
+
+3. **Episode Recording** (If Task-Like)
+   - If invocation resembled a task (multi-step problem solving):
+     - Extract: goal, key steps, outcome, lessons learned
+     - Generate embedding for retrieval
+     - Store to episodic memory
+
+4. **Discard Working Memory**
+   - Working memory cleared (already summarized)
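+
+The extraction steps above could be wired together roughly as follows. This is a sketch under stated assumptions:
+
+- The stores are plain lists standing in for the TBD memory services.
+- The three LLM passes (`summarize`, `extract_facts`, `reflect`) are injected callables.
+- `min_task_actions=2` is a hypothetical heuristic for "task-like".
+- The two boolean defaults mirror `enable-auto-fact-extraction` and `enable-episode-recording`.
+
+```python
+from dataclasses import dataclass
+from typing import Callable
+
+
+@dataclass
+class InvocationOutcome:
+    query: str
+    actions: list[str]      # key actions taken during the invocation
+    answer: str
+
+
+def finish_invocation(outcome: InvocationOutcome,
+                      conversation: list[dict],
+                      facts: list[str],
+                      episodes: list[dict],
+                      summarize: Callable[[InvocationOutcome], str],
+                      extract_facts: Callable[[InvocationOutcome], list[str]],
+                      reflect: Callable[[InvocationOutcome], str],
+                      auto_fact_extraction: bool = True,
+                      episode_recording: bool = True,
+                      min_task_actions: int = 2) -> None:
+    # 1. Conversation memory: "User asked X, agent did Y, result was Z"
+    conversation.append({"query": outcome.query,
+                         "summary": summarize(outcome),
+                         "result": outcome.answer})
+    # 2. Fact extraction: an LLM pass may return zero or more statements
+    if auto_fact_extraction:
+        facts.extend(extract_facts(outcome))
+    # 3. Episode recording, only for multi-step (task-like) invocations
+    if episode_recording and len(outcome.actions) >= min_task_actions:
+        episodes.append({"goal": outcome.query,
+                         "steps": list(outcome.actions),
+                         "outcome": outcome.answer,
+                         "lessons": reflect(outcome)})
+    # 4. Working memory is not stored anywhere; the caller simply drops it
+```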
+
+#### Conversation End: Memory Promotion
+
+When the conversation concludes (explicit signal or timeout):
+
+1. **Promote Facts to Long-term**
+   - Scan conversation memory for high-value facts
+   - Store to persistent knowledge graph
+   - Update graph embeddings for retrieval
+
+2. **Record Conversation Episode** (Optional)
+   - If conversation had overarching theme/goal:
+     - Summarize entire conversation as high-level episode
+     - Store to episodic memory
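+
+How to detect the end of a conversation is still an open question. The sketch below assumes an idle timeout combined with an explicit end signal. `ConversationTracker` and `promote_ended` are hypothetical names, and the clock is injectable only so the behavior can be tested.
+
+```python
+import time
+from typing import Callable
+
+
+class ConversationTracker:
+    """Tracks activity per conversation to detect when one has ended."""
+
+    def __init__(self, idle_timeout: float,
+                 clock: Callable[[], float] = time.monotonic):
+        self.idle_timeout = idle_timeout  # seconds of inactivity
+        self.clock = clock
+        self.last_seen: dict[str, float] = {}
+
+    def touch(self, conversation_id: str) -> None:
+        # Call on every AgentRequest that carries this conversation_id
+        self.last_seen[conversation_id] = self.clock()
+
+    def end(self, conversation_id: str) -> bool:
+        # Explicit end signal; False if the conversation was not being tracked
+        return self.last_seen.pop(conversation_id, None) is not None
+
+    def expire_idle(self) -> list[str]:
+        # Conversations idle longer than the timeout are treated as ended
+        now = self.clock()
+        idle = [cid for cid, seen in self.last_seen.items()
+                if now - seen > self.idle_timeout]
+        for cid in idle:
+            del self.last_seen[cid]
+        return idle
+
+
+def promote_ended(tracker: ConversationTracker,
+                  promote: Callable[[str], None]) -> list[str]:
+    ended = tracker.expire_idle()
+    for cid in ended:
+        # e.g. scan summaries for facts, optionally record an episode
+        promote(cid)
+    return ended
+```
+
+A caller that receives an explicit end signal would call `end()` and then run the same promotion directly.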
+
+### APIs
+
+#### New Memory Service APIs
+
+**Fact Retrieval**
+```
+RetrieveFactsRequest:
+  - query: string (search query)
+  - embedding: vector (pre-computed)
+  - top_k: int (default: 5)
+  - user: string
+  - collection: string
+
+RetrieveFactsResponse:
+  - facts: list[Fact]
+  - error: Error
+```
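+
+A brute-force reference for the ranking behind `RetrieveFactsRequest` is sketched below. It first filters to the caller's `user` and `collection`, so other tenants' facts can never be returned. It then ranks the remaining facts by cosine similarity to `embedding` and keeps `top_k`. The list of dicts stands in for the TBD backend. A production store would use an approximate nearest-neighbor index rather than a linear scan.
+
+```python
+import math
+
+
+def cosine(a: list[float], b: list[float]) -> float:
+    dot = sum(x * y for x, y in zip(a, b))
+    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
+    return dot / norm if norm else 0.0  # treat zero vectors as unrelated
+
+
+def retrieve_facts(store: list[dict], embedding: list[float],
+                   user: str, collection: str, top_k: int = 5) -> list[dict]:
+    # Scope first: isolation is enforced before any ranking happens
+    scoped = [f for f in store
+              if f["user"] == user and f["collection"] == collection]
+    scoped.sort(key=lambda f: cosine(embedding, f["embedding"]), reverse=True)
+    return scoped[:top_k]
+```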
+
+**Episode Retrieval**
+```
+RetrieveEpisodesRequest:
+  - query: string
+  - embedding: vector
+  - top_k: int (default: 3)
+  - user: string
+  - collection: string
+
+RetrieveEpisodesResponse:
+  - episodes: list[Episode]
+  - error: Error
+```
+
+**Fact Storage**
+```
+StoreFactRequest:
+  - content: string
+  - source: string (conversation_id)
+  - user: string
+  - collection: string
+  - metadata: dict
+
+StoreFactResponse:
+  - fact_id: string
+  - error: Error
+```
+
+**Episode Storage**
+```
+StoreEpisodeRequest:
+  - goal: string
+  - steps: list[string]
+  - outcome: string
+  - lessons: string
+  - user: string
+  - collection: string
+  - metadata: dict
+
+StoreEpisodeResponse:
+  - episode_id: string
+  - error: Error
+```
+
+**Conversation Memory Management**
+```
+GetConversationMemoryRequest:
+  - conversation_id: string
+  - user: string
+
+GetConversationMemoryResponse:
+  - conversation: ConversationMemory
+  - error: Error
+
+UpdateConversationMemoryRequest:
+  - conversation_id: string
+  - invocation_summary: InvocationSummary
+  - user: string
+
+UpdateConversationMemoryResponse:
+  - success: boolean
+  - error: Error
+```
+
+#### Modified Agent APIs
+
+**AgentRequest** (Extended)
+```
+AgentRequest:
+  # Existing fields
+  - question: string
+  - history: list[AgentStep]
+  - user: string
+  - streaming: boolean
+  - state: string
+  - group: list[string]
+
+  # New fields for memory
+  - conversation_id: string (identifies conversation for memory retrieval)
+  - enable_memory: boolean (default: false, set true for react_mem)
+```
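+
+The sketch below shows why the new fields are backward compatible: because `conversation_id` and `enable_memory` have defaults, requests built by existing clients remain valid. Two caveats apply:
+
+- The dataclass is illustrative. The real schema lives in `trustgraph-base/trustgraph/schema/services/agent.py` and may use a different record system.
+- The `memory_scope` helper is a hypothetical policy, not something this spec mandates. It consults conversation memory only when a `conversation_id` is supplied.
+
+```python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class AgentRequest:
+    # Existing fields (defaults here are illustrative)
+    question: str = ""
+    history: list = field(default_factory=list)
+    user: str = ""
+    streaming: bool = False
+    state: str = ""
+    group: list[str] = field(default_factory=list)
+    # New memory fields; the defaults leave existing clients unaffected
+    conversation_id: str = ""
+    enable_memory: bool = False
+
+
+def memory_scope(request: AgentRequest) -> list[str]:
+    # Hypothetical policy: which memory tiers to consult for this request
+    if not request.enable_memory:
+        return []
+    tiers = ["facts", "episodes"]
+    if request.conversation_id:
+        tiers.append("conversation")
+    return tiers
+```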
+
+**AgentResponse** (No changes required)
+- Existing schema supports memory-enabled agents
+- Memory operations are transparent to client
+
+### Implementation Details
+
+#### Service Configuration
+
+The `react_mem` service will be configured similarly to `react`, with additional memory-related parameters:
+
+Configuration key: `agent-mem` (distinct from `agent` used by react)
+
+Configuration parameters:
+- `max-iterations`: Maximum ReAct loop iterations
+- `working-memory-threshold`: Token count trigger for compression (default: 2000)
+- `fact-retrieval-top-k`: Number of facts to retrieve (default: 5)
+- `episode-retrieval-top-k`: Number of episodes to retrieve (default: 3)
+- `enable-auto-fact-extraction`: Auto-extract facts at invocation end (default: true)
+- `enable-episode-recording`: Auto-record episodes (default: true)
+- `additional-context`: Additional system context (same as react)
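+
+These parameters might be exposed as command-line flags. That is an assumption: they could equally be read from the `agent-mem` configuration key. If they are flags, they could be declared with the standard-library `argparse` module as sketched below (`add_memory_args` is a hypothetical name). Notes on the sketch:
+
+- The defaults mirror the list above, except for `max-iterations`, whose default of 10 is a placeholder because the spec does not fix one.
+- `argparse.BooleanOptionalAction` (Python 3.9+) adds the matching `--no-...` forms for the two boolean switches.
+
+```python
+import argparse
+
+
+def add_memory_args(parser: argparse.ArgumentParser) -> None:
+    parser.add_argument("--max-iterations", type=int, default=10)  # placeholder
+    parser.add_argument("--working-memory-threshold", type=int, default=2000)
+    parser.add_argument("--fact-retrieval-top-k", type=int, default=5)
+    parser.add_argument("--episode-retrieval-top-k", type=int, default=3)
+    # Each boolean switch also gets a --no-... form to turn it off
+    parser.add_argument("--enable-auto-fact-extraction",
+                        action=argparse.BooleanOptionalAction, default=True)
+    parser.add_argument("--enable-episode-recording",
+                        action=argparse.BooleanOptionalAction, default=True)
+    parser.add_argument("--additional-context", default=None)
+```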
+
+#### Memory Service Specifications
+
+**Memory service implementation details are TBD and will be defined in a separate technical specification.**
+
+The memory services must implement the following interfaces:
+
+1. **Long-term Facts Service**
+   - Interface: Fact storage, retrieval, deletion
+   - Storage backend: TBD (likely existing graph store + vector index)
+   - Embedding model: TBD (should be consistent with existing embeddings)
+
+2. **Episodic Memory Service**
+   - Interface: Episode storage, retrieval, search
+   - Storage backend: TBD (vector store + structured metadata store)
+   - Retention policy: TBD
+
+3. **Conversation Memory Service**
+   - Interface: CRUD operations on conversation records
+   - Storage backend: TBD (in-memory with optional persistence)
+   - TTL/cleanup policy: TBD
+
+4. **Working Memory Manager**
+   - Interface: Append, compress, retrieve working memory
+   - Compression strategy: TBD (summarization vs. truncation)
+   - Implementation: In-memory only
+
+These services will be registered as client specifications similar to existing services (GraphRagClientSpec, ToolClientSpec, etc.).
+
+#### Prompt Engineering
+
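+Memory-aware prompts need to present each kind of retrieved context as a distinct, labeled section, so that the LLM can tell long-term facts, past episodes, conversation history, and the current reasoning trace apart: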
+
+**Prompt structure** (conceptual):
+```
+System: You are an agent with access to persistent memory...
+
+Long-term Facts:
+- [Retrieved fact 1]
+- [Retrieved fact 2]
+...
+
+Relevant Past Episodes:
+- Episode 1: [summary]
+- Episode 2: [summary]
+...
+
+Conversation History:
+- Previous invocation 1: [summary]
+- Previous invocation 2: [summary]
+...
+
+Current Task:
+Working Memory:
+- [Current steps taken]
+
+Question: [User query]
+
+Think step by step and decide on next action or provide final answer.
+```
+
+Prompt templates will be defined in configuration, similar to existing agent prompts.
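+
+The sketch below renders the conceptual structure in plain Python, which could serve as a reference when unit-testing prompt assembly. Real templates will come from configuration. The function name, the default system line, and the choice to omit empty sections are all assumptions.
+
+```python
+def render_prompt(question: str, facts: list[str], episodes: list[str],
+                  invocations: list[str], working: list[str],
+                  system: str = "You are an agent with access to persistent memory.") -> str:
+    def section(title: str, lines: list[str]) -> list[str]:
+        # Empty tiers are omitted rather than rendered as bare headings
+        if not lines:
+            return []
+        return [f"{title}:"] + [f"- {line}" for line in lines] + [""]
+
+    out = [f"System: {system}", ""]
+    out += section("Long-term Facts", facts)
+    out += section("Relevant Past Episodes",
+                   [f"Episode {i}: {e}" for i, e in enumerate(episodes, 1)])
+    out += section("Conversation History",
+                   [f"Previous invocation {i}: {s}"
+                    for i, s in enumerate(invocations, 1)])
+    out += section("Working Memory", working)
+    out += [f"Question: {question}", "",
+            "Think step by step and decide on next action or provide final answer."]
+    return "\n".join(out)
+```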
+
+#### Tool Extensions (Optional Refinement)
+
+To support mid-invocation memory access (optional future enhancement), the following tools could be exposed to the agent:
+
+**RetrieveFacts Tool**
+- Description: "Retrieve additional facts from knowledge base"
+- Arguments: query (string)
+- Implementation: Calls fact retrieval service
+
+**RecallEpisodes Tool**
+- Description: "Search for similar past problem-solving episodes"
+- Arguments: query (string)
+- Implementation: Calls episode retrieval service
+
+**StoreFact Tool**
+- Description: "Explicitly store a fact for future reference"
+- Arguments: fact_content (string)
+- Implementation: Calls fact storage service
+
+These tools would be optional and configured per deployment.
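+
+Per-deployment configuration of these tools could be as simple as filtering a registry by an enabled set, as sketched below. `MemoryTool`, `memory_tools`, and the `enabled` parameter are hypothetical. The handlers would call the corresponding memory services.
+
+```python
+from dataclasses import dataclass
+from typing import Callable
+
+
+@dataclass
+class MemoryTool:
+    name: str
+    description: str
+    argument: str                   # name of the single string argument
+    handler: Callable[[str], str]
+
+
+def memory_tools(retrieve_facts: Callable[[str], str],
+                 recall_episodes: Callable[[str], str],
+                 store_fact: Callable[[str], str],
+                 enabled: set[str]) -> dict[str, MemoryTool]:
+    tools = [
+        MemoryTool("RetrieveFacts",
+                   "Retrieve additional facts from knowledge base",
+                   "query", retrieve_facts),
+        MemoryTool("RecallEpisodes",
+                   "Search for similar past problem-solving episodes",
+                   "query", recall_episodes),
+        MemoryTool("StoreFact",
+                   "Explicitly store a fact for future reference",
+                   "fact_content", store_fact),
+    ]
+    # Only tools enabled for this deployment are offered to the agent
+    return {t.name: t for t in tools if t.name in enabled}
+```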
+
+### Client Specifications
+
+The react_mem service will register the following client specifications:
+
+**Existing (inherited from react):**
+- TextCompletionClientSpec
+- GraphRagClientSpec
+- PromptClientSpec
+- ToolClientSpec
+- StructuredQueryClientSpec
+
+**New (memory-specific):**
+- FactRetrievalClientSpec
+- EpisodeRetrievalClientSpec
+- FactStorageClientSpec
+- EpisodeStorageClientSpec
+- ConversationMemoryClientSpec
+
+## Security Considerations
+
+**Memory Isolation**
+- All memory operations must be scoped to user and collection
+- Cross-user memory leakage must be prevented
+- Fact and episode retrieval must respect access controls
+
+**PII and Sensitive Information**
+- Fact extraction should avoid storing sensitive personal information
+- Conversation memory may contain PII and must be handled accordingly
+- Memory retention policies should align with privacy requirements
+- Provide an option to disable memory, or to purge it, for specific users/conversations
+
+**Embedding Security**
+- Embeddings may leak information through similarity searches
+- Consider privacy-preserving embedding techniques for sensitive deployments
+
+**Resource Limits**
+- Prevent unbounded memory growth through:
+  - Per-user/collection memory quotas
+  - Time-based expiration of old memories
+  - Relevance-based pruning of unused memories
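+
+The sketch below applies two of these limits per user/collection. The record layout, the function names, and the parameters are illustrative.
+
+- **Time-based expiry**: uses a hypothetical `last_used` timestamp, so memories that are never retrieved age out. This is a rough proxy for relevance-based pruning.
+- **Per-scope quota**: keeps only the most recently used records in each user/collection.
+
+```python
+from datetime import datetime, timedelta
+
+
+def prune_memories(records: list[dict], now: datetime,
+                   max_age: timedelta, quota: int) -> list[dict]:
+    # Time-based expiration: drop records not used within max_age
+    live = [r for r in records if now - r["last_used"] <= max_age]
+    # Quota: keep only the most recently used records
+    live.sort(key=lambda r: r["last_used"], reverse=True)
+    return live[:quota]
+
+
+def prune_per_scope(records: list[dict], now: datetime,
+                    max_age: timedelta, quota: int) -> list[dict]:
+    # Group by (user, collection) so one tenant's quota never affects another
+    scopes: dict[tuple[str, str], list[dict]] = {}
+    for r in records:
+        scopes.setdefault((r["user"], r["collection"]), []).append(r)
+    kept: list[dict] = []
+    for group in scopes.values():
+        kept += prune_memories(group, now, max_age, quota)
+    return kept
+```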
+
+## Performance Considerations
+
+**Retrieval Latency**
+- Embedding generation adds latency at invocation start
+- Vector searches for facts and episodes may add 50-200ms per retrieval
+- Mitigation: Parallel retrieval of facts and episodes, caching embeddings
+
+**Context Window Usage**
+- Retrieved memories consume context window
+- Must balance memory breadth (how many items are retrieved, e.g. top-k) against depth (how much detail each item carries)
+- Working memory compression necessary to avoid overflow
+
+**Storage Overhead**
+- Each conversation generates conversation memory records
+- Each invocation may create fact and episode records
+- Mitigation: Background cleanup jobs, retention policies
+
+**Scaling**
+- Memory stores must scale with user base
+- Vector search performance critical for large fact/episode databases
+- Consider sharding strategies for multi-tenant deployments
+
+## Testing Strategy
+
+**Unit Tests**
+- Memory service interfaces (mocked backends)
+- Working memory compression logic
+- Prompt assembly with memory context
+- Fact/episode extraction logic
+
+**Integration Tests**
+- End-to-end agent invocation with memory enabled
+- Memory retrieval and storage flows
+- Conversation continuity across invocations
+- Working memory compression triggers
+
+**Performance Tests**
+- Latency impact of memory retrieval
+- Context window utilization with varying memory sizes
+- Large-scale memory retrieval (1000s of facts/episodes)
+
+**Correctness Tests**
+- Verify facts are correctly stored and retrieved
+- Verify episodes improve problem-solving on similar tasks
+- Verify conversation memory maintains coherence
+
+## Migration Plan
+
+**Phase 1: Module Creation**
+- Create `react_mem` directory structure
+- Implement ReactMemService and ReactMemAgentManager shells
+- No memory operations yet (functionally equivalent to react)
+- Deploy and verify drop-in compatibility
+
+**Phase 2: Memory Services**
+- Implement memory service interfaces and backends
+- Add memory retrieval at invocation start (read-only)
+- Deploy and verify retrieval performance
+
+**Phase 3: Memory Writing**
+- Enable conversation memory updates
+- Enable fact and episode extraction
+- Deploy with memory writing enabled
+
+**Phase 4: Conversation Lifecycle**
+- Implement conversation-end memory promotion
+- Add cleanup and retention policies
+- Full agentic memory capabilities enabled
+
+**Rollout Strategy**
+- Deploy react_mem alongside react (not as replacement)
+- Users opt-in by changing configuration to use react_mem
+- Monitor performance and memory growth
+- Gradually migrate users based on feedback
+
+## Timeline
+
+- **Phase 1 (Module Creation)**: 1-2 weeks
+- **Phase 2 (Memory Services)**: 3-4 weeks (dependent on memory service spec)
+- **Phase 3 (Memory Writing)**: 2-3 weeks
+- **Phase 4 (Conversation Lifecycle)**: 1-2 weeks
+- **Total**: 7-11 weeks
+
+## Open Questions
+
+- **Embedding Model**: Should we use the same embedding model as existing graph RAG, or introduce a dedicated memory embedding model?
+- **Fact Extraction Trigger**: Auto-extract via LLM pass, or require explicit StoreFact tool usage?
+- **Conversation End Detection**: How do we reliably detect conversation end (timeout, explicit signal, heuristic)?
+- **Memory Cleanup**: What retention policies for facts, episodes, and conversation memory?
+- **Cross-conversation Learning**: Should episodic memory span conversations, or be scoped per conversation?
+- **Working Memory Compression**: Summarization (LLM-based) or truncation (rule-based)?
+- **Memory Search UX**: Should memory retrieval be visible to users, or completely transparent?
+- **Backward Compatibility**: Should react service gain `conversation_id` field for potential future memory support?
+
+## References
+
+- Existing react agent implementation: `trustgraph-flow/trustgraph/agent/react/`
+- AgentService base class: `trustgraph-base/trustgraph/base/agent_service.py`
+- Agent schemas: `trustgraph-base/trustgraph/schema/services/agent.py`
+- Similar memory-augmented agent architectures: Reflexion, MemGPT, ChatGPT memory features