diff --git a/docs/tech-specs/agentic-memory.md b/docs/tech-specs/agentic-memory.md
index 12edff19..3297caff 100644
--- a/docs/tech-specs/agentic-memory.md
+++ b/docs/tech-specs/agentic-memory.md
@@ -1,127 +1,609 @@
+# Agentic Memory Technical Specification

-This is some sketches about how to add an agentic manager with memory.
+## Overview

-The existing 'react' manager is going to stay. The new one will be called
-'react_mem'. The 'react_mem' module is to going to be a drop-in replacement
-for 'react' so that the end user can decide which to deploy.
+This specification describes the implementation of an agent manager with multi-layered memory capabilities for TrustGraph. The new `react_mem` module extends the existing ReAct pattern with persistent memory across invocations and conversations, enabling agents to learn from past interactions and maintain context over time.

-## Interaction start
+The implementation supports the following use cases:

-1. RETRIEVE CONTEXT
-   │
-   ├── Long-term facts
-   │   └── Embed user query → retrieve relevant subgraph
-   │
-   ├── Episodic memory
-   │   └── Embed user query → find similar past episodes
-   │
-   └── Conversation memory
-       └── Pull summary/key points from prior invocations in this conversation
-
-2. INITIALIZE WORKING MEMORY
-   └── Empty (or seed with "Goal: ")
-
-
-## Core loop
+1. **Long-term Knowledge Retention**: Store and retrieve factual information learned during agent interactions
+2. **Episodic Memory**: Remember past problem-solving approaches and their outcomes
+3. **Conversation Continuity**: Maintain context across multiple invocations within a conversation
+4. **Working Memory Management**: Handle extended reasoning chains with automatic compression
+5. **Experience-Based Reasoning**: Leverage similar past experiences when addressing new queries

-┌─────────────────────────────────────────────────────────────────┐
-│  ASSEMBLE CONTEXT                                               │
-│                                                                 │
-│  • System prompt / agent persona                                │
-│  • Long-term facts (retrieved at invocation start)              │
-│  • Relevant episodes (retrieved at invocation start)            │
-│  • Conversation memory (what's happened this conversation)      │
-│  • Working memory (steps so far this invocation)                │
-│  • Current user query                                           │
-└─────────────────────────────────────────────────────────────────┘
-                                 │
-                                 ▼
-┌─────────────────────────────────────────────────────────────────┐
-│  REASON                                                         │
-│                                                                 │
-│  LLM generates:                                                 │
-│  • Thought (reasoning trace)                                    │
-│  • Decision: which tool + parameters, OR final answer           │
-└─────────────────────────────────────────────────────────────────┘
-                                 │
-                                 ▼
-                      ┌─────────────────────┐
-                      │    Final answer?    │
-                      └─────────────────────┘
-                           │            │
-                          YES           NO
-                           │            │
-                           ▼            ▼
-              ┌──────────────┐  ┌─────────────────────────────────┐
-              │  COMPLETE    │  │  ACT                            │
-              │  (exit loop) │  │                                 │
-              └──────────────┘  │  Execute tool, get observation  │
-                                └─────────────────────────────────┘
-                                                 │
-                                                 ▼
-                                ┌─────────────────────────────────┐
-                                │  UPDATE WORKING MEMORY          │
-                                │                                 │
-                                │  Append: thought, action, result│
-                                │                                 │
-                                │  If working memory too large:   │
-                                │    → Compress/summarize older   │
-                                │      steps, keep recent raw     │
-                                └─────────────────────────────────┘
-                                                 │
-                                                 ▼
-                                           (loop back to
-                                            ASSEMBLE CONTEXT)
-
-
-## Invocation end
+## Goals

-1. EXTRACT & STORE
-   │
-   ├── Conversation memory
-   │   └── Summarize this invocation: "User asked X, agent did Y, result was Z"
-   │       Append to conversation memory
-   │
-   ├── Facts (optional)
-   │   └── Did the agent learn anything worth persisting?
-   │       If yes → StoreFact (could be automatic extraction or agent-driven)
-   │
-   └── Episode (if task-like)
-       └── RecordEpisode: goal, key steps, outcome, lessons
-           Tag with embeddings for future retrieval
+- **Drop-in Compatibility**: `react_mem` implements the same AgentService interface as `react`, allowing users to choose between implementations via configuration
+- **Memory Persistence**: Enable agents to retain and retrieve information across invocations and conversations
+- **Scalable Context Management**: Handle long-running conversations without unbounded memory growth
+- **Retrieval Efficiency**: Use embedding-based retrieval to surface relevant memory efficiently
+- **Incremental Adoption**: Deploy alongside existing `react` module without breaking changes
+- **Transparent Operation**: Memory operations should not significantly impact response latency
+- **Configurable Behavior**: Allow tuning of memory extraction, compression, and retrieval strategies

-2. DISCARD WORKING MEMORY
-   └── No longer needed (it's been summarized into conversation memory)
-
-
-## Conversation end
+## Background

-1. PROMOTE TO LONG-TERM
-   │
-   └── Scan conversation memory for facts worth persisting
-       → Store to knowledge graph
-
-2. RECORD CONVERSATION-LEVEL EPISODE (optional)
-   │
-   └── If the conversation had an overarching goal/theme,
-       capture as a higher-level episode
+### Current Architecture

-## Refinements to consider
+The existing `react` agent manager (module: `trustgraph-flow/trustgraph/agent/react/`) implements a stateless ReAct (Reasoning and Acting) loop:

-Refinements to consider:
-Mid-invocation retrieval
-The initial retrieval might not be enough. Agent could have tools to pull in more:
+1. Receives `AgentRequest` with question and history
+2. Reasons about the problem using LLM
+3. Takes action by invoking tools
+4. Returns `AgentResponse` with thought, observation, or final answer
+5. Maintains working memory only for the current invocation via `history` field

-RetrieveFacts(query) — dig deeper if initial context insufficient
-RecallEpisodes(query) — "have I seen this specific problem before?"
+Current limitations include:

-Working memory compression trigger
+- **No Long-term Memory**: Each invocation starts fresh with no access to prior conversations
+- **Limited Context Window**: History is bounded by max iterations and token limits
+- **No Learning**: Agent cannot benefit from past problem-solving experiences
+- **Conversation Fragmentation**: No mechanism to connect related invocations across a conversation
+- **Context Loss**: Important information from early steps may be lost as history grows

-Option A: Every N steps, auto-compress
-Option B: When token count exceeds threshold
-Option C: Agent explicitly decides ("I should summarize before continuing")
+This specification addresses these gaps by introducing a four-tier memory architecture. By maintaining long-term facts, episodic memories, conversation summaries, and managed working memory, TrustGraph agents can:

-Fact/episode extraction
+- Build and leverage a persistent knowledge base
+- Learn from past successes and failures
+- Maintain coherent context across extended conversations
+- Handle complex multi-step reasoning without context overflow

-Option A: Agent explicitly calls StoreFact / RecordEpisode as part of its reasoning
-Option B: Post-invocation LLM pass: "Given this invocation, extract any facts/episodes worth storing"
+## Technical Design
+
+### Architecture
+
+The agentic memory system requires the following technical components:
+
+1. **ReactMemAgentManager**
+   - Extends AgentManager with memory-aware reasoning loop
+   - Orchestrates memory retrieval at invocation start
+   - Manages working memory compression during execution
+   - Triggers memory extraction at invocation end
+
+   Module: `trustgraph-flow/trustgraph/agent/react_mem/agent_manager.py`
+
+2. **ReactMemService**
+   - Implements AgentService interface (drop-in replacement for react.Processor)
+   - Manages conversation-level memory lifecycle
+   - Handles configuration of memory services and policies
+   - Coordinates memory promotion at conversation end
+
+   Module: `trustgraph-flow/trustgraph/agent/react_mem/service.py`
+
+3. **Memory Services** (Implementation Details: TBD)
+   - Long-term Facts Service
+   - Episodic Memory Service
+   - Conversation Memory Service
+   - Working Memory Management Service
+
+   Module: `trustgraph-flow/trustgraph/agent/react_mem/memory/` (TBD)
+
+4. **Memory Schema Definitions**
+   - Data structures for facts, episodes, conversation records
+   - Request/response schemas for memory operations
+
+   Module: `trustgraph-base/trustgraph/schema/agent_memory.py`
+
+### Memory Architecture
+
+#### Four-Tier Memory System
+
+**Long-term Facts**
+- **Purpose**: Persistent knowledge graph of facts learned across all conversations
+- **Scope**: Global per user/collection
+- **Retrieval**: Embedding-based semantic search on user query
+- **Lifetime**: Permanent until explicitly deleted
+- **Storage**: Graph database (existing TrustGraph knowledge store)
+
+**Episodic Memory**
+- **Purpose**: Records of past problem-solving episodes with outcomes
+- **Scope**: Global per user/collection
+- **Retrieval**: Embedding-based similarity search on current goal/context
+- **Lifetime**: Permanent with optional TTL or relevance-based pruning
+- **Storage**: Vector store with structured metadata
+
+**Conversation Memory**
+- **Purpose**: Summaries and key points from prior invocations in current conversation
+- **Scope**: Current conversation only
+- **Retrieval**: Sequential access (not searched)
+- **Lifetime**: Duration of conversation
+- **Storage**: In-memory with optional persistence for long conversations
+
+**Working Memory**
+- **Purpose**: Reasoning trace for current invocation (thoughts, actions, observations)
+- **Scope**: Current invocation only
+- **Retrieval**: Directly included in LLM context
+- **Lifetime**: Current invocation (discarded after summarization)
+- **Storage**: In-memory list
+
+### Data Models
+
+#### Memory Records
+
+**Fact Record**
+```
+Fact:
+  - id: string (unique identifier)
+  - content: string (fact statement)
+  - source: string (conversation_id where learned)
+  - timestamp: datetime
+  - embedding: vector
+  - metadata: dict
+```
+
+**Episode Record**
+```
+Episode:
+  - id: string (unique identifier)
+  - goal: string (what was being attempted)
+  - steps: list[string] (key actions taken)
+  - outcome: string (result achieved)
+  - lessons: string (insights/learnings)
+  - timestamp: datetime
+  - embedding: vector
+  - metadata: dict
+```
+
+**Conversation Record**
+```
+ConversationMemory:
+  - conversation_id: string
+  - invocations: list[InvocationSummary]
+  - metadata: dict
+
+InvocationSummary:
+  - query: string
+  - summary: string (what agent did)
+  - result: string (outcome)
+  - timestamp: datetime
+```
+
+**Working Memory Item**
+```
+WorkingMemoryItem:
+  - type: enum[thought, action, observation]
+  - content: string
+  - timestamp: datetime
+  - metadata: dict (e.g., token_count)
+```
+
+### Memory Operations Lifecycle
+
+#### Invocation Start: Context Assembly
+
+When `ReactMemService` receives an `AgentRequest`:
+
+1. **Retrieve Long-term Facts**
+   - Embed user query
+   - Query facts store with embedding
+   - Retrieve top-k relevant facts (configurable k)
+
+2. **Retrieve Episodic Memory**
+   - Embed user query + current state
+   - Query episodes store with embedding
+   - Retrieve top-k similar past episodes
+
+3. **Load Conversation Memory**
+   - Fetch conversation record by conversation_id
+   - Load all prior invocation summaries for this conversation
+
+4. **Initialize Working Memory**
+   - Create empty working memory buffer
+   - Optionally seed with high-level goal
+
+These retrieved memories are assembled into the context passed to `ReactMemAgentManager.reason()`.
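The invocation-start steps above could be sketched roughly as follows. This is an illustrative sketch only: `StubMemoryClient`, `MemoryContext`, and `assemble_context` are hypothetical names rather than part of the spec, since the memory service clients are still TBD. For simplicity it reuses a single query embedding for both facts and episodes, whereas the spec embeds "user query + current state" for episodes. The default top-k values mirror the configuration defaults in this spec.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class MemoryContext:
    """Everything handed to the reasoning loop at invocation start."""
    facts: list[str]
    episodes: list[str]
    conversation: list[str]
    working_memory: list[str] = field(default_factory=list)


class StubMemoryClient:
    """Hypothetical stand-in for the TBD memory service clients."""

    def __init__(self, facts, episodes, conversations):
        self.facts = facts
        self.episodes = episodes
        self.conversations = conversations

    async def embed(self, text: str) -> list[float]:
        # Placeholder embedding; a real deployment would call an embeddings service.
        return [float(len(text))]

    async def retrieve_facts(self, embedding, top_k: int) -> list[str]:
        # The stub ignores the embedding and returns the first top_k entries.
        return self.facts[:top_k]

    async def retrieve_episodes(self, embedding, top_k: int) -> list[str]:
        return self.episodes[:top_k]

    async def get_conversation(self, conversation_id: str) -> list[str]:
        return self.conversations.get(conversation_id, [])


async def assemble_context(client, question, conversation_id,
                           fact_top_k=5, episode_top_k=3, goal=None):
    """Retrieve the three persistent tiers and initialise working memory."""
    embedding = await client.embed(question)
    # The three lookups are independent of each other, so run them concurrently.
    facts, episodes, conversation = await asyncio.gather(
        client.retrieve_facts(embedding, fact_top_k),
        client.retrieve_episodes(embedding, episode_top_k),
        client.get_conversation(conversation_id),
    )
    # Working memory starts empty, optionally seeded with a high-level goal.
    working = [f"Goal: {goal}"] if goal else []
    return MemoryContext(facts, episodes, conversation, working)
```

Running the lookups through `asyncio.gather` is one way to apply the parallel-retrieval mitigation this spec lists under performance considerations. Whether the real clients are async is an assumption here.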
+
+#### Core Loop: Memory-Aware Reasoning
+
+The reasoning loop proceeds similarly to standard ReAct, but with augmented context:
+
+1. **Assemble Context**
+   - System prompt / agent persona
+   - Long-term facts (from retrieval)
+   - Relevant episodes (from retrieval)
+   - Conversation memory (loaded summaries)
+   - Working memory (current invocation trace)
+   - Current user query
+
+2. **Reason**
+   - LLM generates thought and decision
+   - Returns Action (tool call) or Final (answer)
+
+3. **Act** (if Action)
+   - Execute tool
+   - Get observation result
+
+4. **Update Working Memory**
+   - Append thought, action, observation to working memory
+   - Check working memory size
+   - If exceeds threshold: compress older entries, preserve recent ones
+
+5. **Loop** until Final answer or max iterations
+
+**Working Memory Compression Trigger**: When token count of working memory exceeds threshold (e.g., 50% of context window), invoke summarization:
+- Keep most recent N steps verbatim
+- Summarize older steps into condensed form
+- Preserve critical information (tool results, key decisions)
+
+#### Invocation End: Memory Extraction
+
+After sending final response:
+
+1. **Conversation Memory Update**
+   - Generate invocation summary: "User asked X, agent did Y, result was Z"
+   - Append summary to conversation memory
+
+2. **Fact Extraction** (Optional/Conditional)
+   - LLM pass: "Did the agent learn anything worth persisting?"
+   - If yes: Extract fact statements, store to long-term facts
+   - Alternative: Agent explicitly calls StoreFact tool during reasoning
+
+3. **Episode Recording** (If Task-Like)
+   - If invocation resembled a task (multi-step problem solving):
+     - Extract: goal, key steps, outcome, lessons learned
+     - Generate embedding for retrieval
+     - Store to episodic memory
+
+4. **Discard Working Memory**
+   - Working memory cleared (already summarized)
+
+#### Conversation End: Memory Promotion
+
+When conversation concludes (explicit signal or timeout):
+
+1. **Promote Facts to Long-term**
+   - Scan conversation memory for high-value facts
+   - Store to persistent knowledge graph
+   - Update graph embeddings for retrieval
+
+2. **Record Conversation Episode** (Optional)
+   - If conversation had overarching theme/goal:
+     - Summarize entire conversation as high-level episode
+     - Store to episodic memory
+
+### APIs
+
+#### New Memory Service APIs
+
+**Fact Retrieval**
+```
+RetrieveFactsRequest:
+  - query: string (search query)
+  - embedding: vector (pre-computed)
+  - top_k: int (default: 5)
+  - user: string
+  - collection: string
+
+RetrieveFactsResponse:
+  - facts: list[Fact]
+  - error: Error
+```
+
+**Episode Retrieval**
+```
+RetrieveEpisodesRequest:
+  - query: string
+  - embedding: vector
+  - top_k: int (default: 3)
+  - user: string
+  - collection: string
+
+RetrieveEpisodesResponse:
+  - episodes: list[Episode]
+  - error: Error
+```
+
+**Fact Storage**
+```
+StoreFactRequest:
+  - content: string
+  - source: string (conversation_id)
+  - user: string
+  - collection: string
+  - metadata: dict
+
+StoreFactResponse:
+  - fact_id: string
+  - error: Error
+```
+
+**Episode Storage**
+```
+StoreEpisodeRequest:
+  - goal: string
+  - steps: list[string]
+  - outcome: string
+  - lessons: string
+  - user: string
+  - collection: string
+  - metadata: dict
+
+StoreEpisodeResponse:
+  - episode_id: string
+  - error: Error
+```
+
+**Conversation Memory Management**
+```
+GetConversationMemoryRequest:
+  - conversation_id: string
+  - user: string
+
+GetConversationMemoryResponse:
+  - conversation: ConversationMemory
+  - error: Error
+
+UpdateConversationMemoryRequest:
+  - conversation_id: string
+  - invocation_summary: InvocationSummary
+  - user: string
+
+UpdateConversationMemoryResponse:
+  - success: boolean
+  - error: Error
+```
+
+#### Modified Agent APIs
+
+**AgentRequest** (Extended)
+```
+AgentRequest:
+  # Existing fields
+  - question: string
+  - history: list[AgentStep]
+  - user: string
+  - streaming: boolean
+  - state: string
+  - group: list[string]
+
+  # New fields for memory
+  - conversation_id: string (identifies conversation for memory retrieval)
+  - enable_memory: boolean (default: false, set true for react_mem)
+```
+
+**AgentResponse** (No changes required)
+- Existing schema supports memory-enabled agents
+- Memory operations are transparent to client
+
+### Implementation Details
+
+#### Service Configuration
+
+The `react_mem` service will be configured similarly to `react`, with additional memory-related parameters:
+
+Configuration key: `agent-mem` (distinct from `agent` used by react)
+
+Configuration parameters:
+- `max-iterations`: Maximum ReAct loop iterations
+- `working-memory-threshold`: Token count trigger for compression (default: 2000)
+- `fact-retrieval-top-k`: Number of facts to retrieve (default: 5)
+- `episode-retrieval-top-k`: Number of episodes to retrieve (default: 3)
+- `enable-auto-fact-extraction`: Auto-extract facts at invocation end (default: true)
+- `enable-episode-recording`: Auto-record episodes (default: true)
+- `additional-context`: Additional system context (same as react)
+
+#### Memory Service Specifications
+
+**Memory service implementation details are TBD and will be defined in a separate technical specification.**
+
+The memory services must implement the following interfaces:
+
+1. **Long-term Facts Service**
+   - Interface: Fact storage, retrieval, deletion
+   - Storage backend: TBD (likely existing graph store + vector index)
+   - Embedding model: TBD (consistency with existing embeddings)
+
+2. **Episodic Memory Service**
+   - Interface: Episode storage, retrieval, search
+   - Storage backend: TBD (vector store + structured metadata store)
+   - Retention policy: TBD
+
+3. **Conversation Memory Service**
+   - Interface: CRUD operations on conversation records
+   - Storage backend: TBD (in-memory with optional persistence)
+   - TTL/cleanup policy: TBD
+
+4. **Working Memory Manager**
+   - Interface: Append, compress, retrieve working memory
+   - Compression strategy: TBD (summarization vs. truncation)
+   - Implementation: In-memory only
+
+These services will be registered as client specifications similar to existing services (GraphRagClientSpec, ToolClientSpec, etc.).
+
+#### Prompt Engineering
+
+The memory-aware prompts will need to incorporate retrieved context effectively:
+
+**Prompt structure** (conceptual):
+```
+System: You are an agent with access to persistent memory...
+
+Long-term Facts:
+- [Retrieved fact 1]
+- [Retrieved fact 2]
+...
+
+Relevant Past Episodes:
+- Episode 1: [summary]
+- Episode 2: [summary]
+...
+
+Conversation History:
+- Previous invocation 1: [summary]
+- Previous invocation 2: [summary]
+...
+
+Current Task:
+Working Memory:
+- [Current steps taken]
+
+Question: [User query]
+
+Think step by step and decide on next action or provide final answer.
+```
+
+Prompt templates will be defined in configuration, similar to existing agent prompts.
+
+#### Tool Extensions (Optional Refinement)
+
+To support mid-invocation retrieval (optional future enhancement):
+
+**RetrieveFacts Tool**
+- Description: "Retrieve additional facts from knowledge base"
+- Arguments: query (string)
+- Implementation: Calls fact retrieval service
+
+**RecallEpisodes Tool**
+- Description: "Search for similar past problem-solving episodes"
+- Arguments: query (string)
+- Implementation: Calls episode retrieval service
+
+**StoreFact Tool**
+- Description: "Explicitly store a fact for future reference"
+- Arguments: fact_content (string)
+- Implementation: Calls fact storage service
+
+These tools would be optional and configured per deployment.
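One way the optional memory tools above might be wired up is sketched below. `MemoryTool`, `InMemoryFactStore`, and `build_memory_tools` are hypothetical names introduced only for illustration, and the store uses naive keyword matching in place of the embedding-based retrieval the real fact services would provide. `RecallEpisodes` is omitted for brevity and would presumably follow the same shape against an episode store.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MemoryTool:
    """Descriptor for an optional memory tool exposed to the agent."""
    name: str
    description: str
    arguments: tuple[str, ...]
    invoke: Callable[..., str]


class InMemoryFactStore:
    """Hypothetical stand-in for the TBD fact services (keyword match, no embeddings)."""

    def __init__(self):
        self.facts: list[str] = []

    def store(self, fact_content: str) -> str:
        self.facts.append(fact_content)
        return f"stored fact {len(self.facts)}"

    def retrieve(self, query: str) -> str:
        # Return every stored fact that contains at least one query word.
        words = query.lower().split()
        hits = [f for f in self.facts if any(w in f.lower() for w in words)]
        return "\n".join(hits) if hits else "no matching facts"


def build_memory_tools(store, enabled):
    """Return only the tools a deployment has switched on, keyed by name."""
    catalogue = {
        "RetrieveFacts": MemoryTool(
            name="RetrieveFacts",
            description="Retrieve additional facts from knowledge base",
            arguments=("query",),
            invoke=store.retrieve,
        ),
        "StoreFact": MemoryTool(
            name="StoreFact",
            description="Explicitly store a fact for future reference",
            arguments=("fact_content",),
            invoke=store.store,
        ),
    }
    # Unknown names are ignored so a deployment config cannot enable a missing tool.
    return {name: catalogue[name] for name in enabled if name in catalogue}
```

Passing the enabled tool names in from configuration keeps the tools opt-in per deployment, as the section describes. The exact configuration mechanism is left open here.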
+
+### Client Specifications
+
+The react_mem service will register the following client specifications:
+
+**Existing (inherited from react):**
+- TextCompletionClientSpec
+- GraphRagClientSpec
+- PromptClientSpec
+- ToolClientSpec
+- StructuredQueryClientSpec
+
+**New (memory-specific):**
+- FactRetrievalClientSpec
+- EpisodeRetrievalClientSpec
+- FactStorageClientSpec
+- EpisodeStorageClientSpec
+- ConversationMemoryClientSpec
+
+## Security Considerations
+
+**Memory Isolation**
+- All memory operations must be scoped to user and collection
+- Cross-user memory leakage must be prevented
+- Fact and episode retrieval must respect access controls
+
+**PII and Sensitive Information**
+- Fact extraction should avoid storing sensitive personal information
+- Conversation memory may contain PII and must be handled accordingly
+- Memory retention policies should align with privacy requirements
+- Option to disable memory or purge memory for specific users/conversations
+
+**Embedding Security**
+- Embeddings may leak information through similarity searches
+- Consider privacy-preserving embedding techniques for sensitive deployments
+
+**Resource Limits**
+- Prevent unbounded memory growth through:
+  - Per-user/collection memory quotas
+  - Time-based expiration of old memories
+  - Relevance-based pruning of unused memories
+
+## Performance Considerations
+
+**Retrieval Latency**
+- Embedding generation adds latency at invocation start
+- Vector searches for facts and episodes may add 50-200ms per retrieval
+- Mitigation: Parallel retrieval of facts and episodes, caching embeddings
+
+**Context Window Usage**
+- Retrieved memories consume context window
+- Must balance memory breadth vs. depth
+- Working memory compression necessary to avoid overflow
+
+**Storage Overhead**
+- Each conversation generates conversation memory records
+- Each invocation may create fact and episode records
+- Mitigation: Background cleanup jobs, retention policies
+
+**Scaling**
+- Memory stores must scale with user base
+- Vector search performance critical for large fact/episode databases
+- Consider sharding strategies for multi-tenant deployments
+
+## Testing Strategy
+
+**Unit Tests**
+- Memory service interfaces (mocked backends)
+- Working memory compression logic
+- Prompt assembly with memory context
+- Fact/episode extraction logic
+
+**Integration Tests**
+- End-to-end agent invocation with memory enabled
+- Memory retrieval and storage flows
+- Conversation continuity across invocations
+- Working memory compression triggers
+
+**Performance Tests**
+- Latency impact of memory retrieval
+- Context window utilization with varying memory sizes
+- Large-scale memory retrieval (1000s of facts/episodes)
+
+**Correctness Tests**
+- Verify facts are correctly stored and retrieved
+- Verify episodes improve problem-solving on similar tasks
+- Verify conversation memory maintains coherence
+
+## Migration Plan
+
+**Phase 1: Module Creation**
+- Create `react_mem` directory structure
+- Implement ReactMemService and ReactMemAgentManager shells
+- No memory operations yet (functionally equivalent to react)
+- Deploy and verify drop-in compatibility
+
+**Phase 2: Memory Services**
+- Implement memory service interfaces and backends
+- Add memory retrieval at invocation start (read-only)
+- Deploy and verify retrieval performance
+
+**Phase 3: Memory Writing**
+- Enable conversation memory updates
+- Enable fact and episode extraction
+- Deploy with memory writing enabled
+
+**Phase 4: Conversation Lifecycle**
+- Implement conversation-end memory promotion
+- Add cleanup and retention policies
+- Full agentic memory capabilities enabled
+
+**Rollout Strategy**
+- Deploy react_mem alongside react (not as replacement)
+- Users opt-in by changing configuration to use react_mem
+- Monitor performance and memory growth
+- Gradually migrate users based on feedback
+
+## Timeline
+
+- **Phase 1 (Module Creation)**: 1-2 weeks
+- **Phase 2 (Memory Services)**: 3-4 weeks (dependent on memory service spec)
+- **Phase 3 (Memory Writing)**: 2-3 weeks
+- **Phase 4 (Conversation Lifecycle)**: 1-2 weeks
+- **Total**: 7-11 weeks
+
+## Open Questions
+
+- **Embedding Model**: Should we use the same embedding model as existing graph RAG, or introduce a dedicated memory embedding model?
+- **Fact Extraction Trigger**: Auto-extract via LLM pass, or require explicit StoreFact tool usage?
+- **Conversation End Detection**: How do we reliably detect conversation end (timeout, explicit signal, heuristic)?
+- **Memory Cleanup**: What retention policies for facts, episodes, and conversation memory?
+- **Cross-conversation Learning**: Should episodic memory span conversations, or be scoped per conversation?
+- **Working Memory Compression**: Summarization (LLM-based) or truncation (rule-based)?
+- **Memory Search UX**: Should memory retrieval be visible to users, or completely transparent?
+- **Backward Compatibility**: Should react service gain `conversation_id` field for potential future memory support?
+
+## References
+
+- Existing react agent implementation: `trustgraph-flow/trustgraph/agent/react/`
+- AgentService base class: `trustgraph-base/trustgraph/base/agent_service.py`
+- Agent schemas: `trustgraph-base/trustgraph/schema/services/agent.py`
+- Similar memory-augmented agent architectures: Reflexion, MemGPT, ChatGPT memory features