20 KiB
Agentic Memory Technical Specification
Overview
This specification describes the implementation of an agent manager with multi-layered memory capabilities for TrustGraph. The new react_mem module extends the existing ReAct pattern with persistent memory across invocations and conversations, enabling agents to learn from past interactions and maintain context over time.
The implementation supports the following use cases:
- Long-term Knowledge Retention: Store and retrieve factual information learned during agent interactions
- Episodic Memory: Remember past problem-solving approaches and their outcomes
- Conversation Continuity: Maintain context across multiple invocations within a conversation
- Working Memory Management: Handle extended reasoning chains with automatic compression
- Experience-Based Reasoning: Leverage similar past experiences when addressing new queries
Goals
- Drop-in Compatibility:
react_memimplements the same AgentService interface asreact, allowing users to choose between implementations via configuration - Memory Persistence: Enable agents to retain and retrieve information across invocations and conversations
- Scalable Context Management: Handle long-running conversations without unbounded memory growth
- Retrieval Efficiency: Use embedding-based retrieval to surface relevant memory efficiently
- Incremental Adoption: Deploy alongside existing
reactmodule without breaking changes - Transparent Operation: Memory operations should not significantly impact response latency
- Configurable Behavior: Allow tuning of memory extraction, compression, and retrieval strategies
Background
Current Architecture
The existing react agent manager (module: trustgraph-flow/trustgraph/agent/react/) implements a stateless ReAct (Reasoning and Acting) loop:
- Receives
AgentRequestwith question and history - Reasons about the problem using LLM
- Takes action by invoking tools
- Returns
AgentResponsewith thought, observation, or final answer - Maintains working memory only for the current invocation via
historyfield
Current limitations include:
- No Long-term Memory: Each invocation starts fresh with no access to prior conversations
- Limited Context Window: History is bounded by max iterations and token limits
- No Learning: Agent cannot benefit from past problem-solving experiences
- Conversation Fragmentation: No mechanism to connect related invocations across a conversation
- Context Loss: Important information from early steps may be lost as history grows
This specification addresses these gaps by introducing a four-tier memory architecture. By maintaining long-term facts, episodic memories, conversation summaries, and managed working memory, TrustGraph agents can:
- Build and leverage a persistent knowledge base
- Learn from past successes and failures
- Maintain coherent context across extended conversations
- Handle complex multi-step reasoning without context overflow
Technical Design
Architecture
The agentic memory system requires the following technical components:
-
ReactMemAgentManager
- Extends AgentManager with memory-aware reasoning loop
- Orchestrates memory retrieval at invocation start
- Manages working memory compression during execution
- Triggers memory extraction at invocation end
Module:
trustgraph-flow/trustgraph/agent/react_mem/agent_manager.py -
ReactMemService
- Implements AgentService interface (drop-in replacement for react.Processor)
- Manages conversation-level memory lifecycle
- Handles configuration of memory services and policies
- Coordinates memory promotion at conversation end
Module:
trustgraph-flow/trustgraph/agent/react_mem/service.py -
Memory Services (Implementation Details: TBD)
- Long-term Facts Service
- Episodic Memory Service
- Conversation Memory Service
- Working Memory Management Service
Module:
trustgraph-flow/trustgraph/agent/react_mem/memory/(TBD) -
Memory Schema Definitions
- Data structures for facts, episodes, conversation records
- Request/response schemas for memory operations
Module:
trustgraph-base/trustgraph/schema/agent_memory.py
Memory Architecture
Four-Tier Memory System
Long-term Facts
- Purpose: Persistent knowledge graph of facts learned across all conversations
- Scope: Global per user/collection
- Retrieval: Embedding-based semantic search on user query
- Lifetime: Permanent until explicitly deleted
- Storage: Graph database (existing TrustGraph knowledge store)
Episodic Memory
- Purpose: Records of past problem-solving episodes with outcomes
- Scope: Global per user/collection
- Retrieval: Embedding-based similarity search on current goal/context
- Lifetime: Permanent with optional TTL or relevance-based pruning
- Storage: Vector store with structured metadata
Conversation Memory
- Purpose: Summaries and key points from prior invocations in current conversation
- Scope: Current conversation only
- Retrieval: Sequential access (not searched)
- Lifetime: Duration of conversation
- Storage: In-memory with optional persistence for long conversations
Working Memory
- Purpose: Reasoning trace for current invocation (thoughts, actions, observations)
- Scope: Current invocation only
- Retrieval: Directly included in LLM context
- Lifetime: Current invocation (discarded after summarization)
- Storage: In-memory list
Data Models
Memory Records
Fact Record
Fact:
- id: string (unique identifier)
- content: string (fact statement)
- source: string (conversation_id where learned)
- timestamp: datetime
- embedding: vector
- metadata: dict
Episode Record
Episode:
- id: string (unique identifier)
- goal: string (what was being attempted)
- steps: list[string] (key actions taken)
- outcome: string (result achieved)
- lessons: string (insights/learnings)
- timestamp: datetime
- embedding: vector
- metadata: dict
Conversation Record
ConversationMemory:
- conversation_id: string
- invocations: list[InvocationSummary]
- metadata: dict
InvocationSummary:
- query: string
- summary: string (what agent did)
- result: string (outcome)
- timestamp: datetime
Working Memory Item
WorkingMemoryItem:
- type: enum[thought, action, observation]
- content: string
- timestamp: datetime
- metadata: dict (e.g., token_count)
Memory Operations Lifecycle
Invocation Start: Context Assembly
When ReactMemService receives an AgentRequest:
-
Retrieve Long-term Facts
- Embed user query
- Query facts store with embedding
- Retrieve top-k relevant facts (configurable k)
-
Retrieve Episodic Memory
- Embed user query + current state
- Query episodes store with embedding
- Retrieve top-k similar past episodes
-
Load Conversation Memory
- Fetch conversation record by conversation_id
- Load all prior invocation summaries for this conversation
-
Initialize Working Memory
- Create empty working memory buffer
- Optionally seed with high-level goal
These retrieved memories are assembled into the context passed to ReactMemAgentManager.reason().
Core Loop: Memory-Aware Reasoning
The reasoning loop proceeds similarly to standard ReAct, but with augmented context:
-
Assemble Context
- System prompt / agent persona
- Long-term facts (from retrieval)
- Relevant episodes (from retrieval)
- Conversation memory (loaded summaries)
- Working memory (current invocation trace)
- Current user query
-
Reason
- LLM generates thought and decision
- Returns Action (tool call) or Final (answer)
-
Act (if Action)
- Execute tool
- Get observation result
-
Update Working Memory
- Append thought, action, observation to working memory
- Check working memory size
- If exceeds threshold: compress older entries, preserve recent ones
-
Loop until Final answer or max iterations
Working Memory Compression Trigger: When token count of working memory exceeds threshold (e.g., 50% of context window), invoke summarization:
- Keep most recent N steps verbatim
- Summarize older steps into condensed form
- Preserve critical information (tool results, key decisions)
Invocation End: Memory Extraction
After sending final response:
-
Conversation Memory Update
- Generate invocation summary: "User asked X, agent did Y, result was Z"
- Append summary to conversation memory
-
Fact Extraction (Optional/Conditional)
- LLM pass: "Did the agent learn anything worth persisting?"
- If yes: Extract fact statements, store to long-term facts
- Alternative: Agent explicitly calls StoreFact tool during reasoning
-
Episode Recording (If Task-Like)
- If invocation resembled a task (multi-step problem solving):
- Extract: goal, key steps, outcome, lessons learned
- Generate embedding for retrieval
- Store to episodic memory
- If invocation resembled a task (multi-step problem solving):
-
Discard Working Memory
- Working memory cleared (already summarized)
Conversation End: Memory Promotion
When conversation concludes (explicit signal or timeout):
-
Promote Facts to Long-term
- Scan conversation memory for high-value facts
- Store to persistent knowledge graph
- Update graph embeddings for retrieval
-
Record Conversation Episode (Optional)
- If conversation had overarching theme/goal:
- Summarize entire conversation as high-level episode
- Store to episodic memory
- If conversation had overarching theme/goal:
APIs
New Memory Service APIs
Fact Retrieval
RetrieveFactsRequest:
- query: string (search query)
- embedding: vector (pre-computed)
- top_k: int (default: 5)
- user: string
- collection: string
RetrieveFactsResponse:
- facts: list[Fact]
- error: Error
Episode Retrieval
RetrieveEpisodesRequest:
- query: string
- embedding: vector
- top_k: int (default: 3)
- user: string
- collection: string
RetrieveEpisodesResponse:
- episodes: list[Episode]
- error: Error
Fact Storage
StoreFactRequest:
- content: string
- source: string (conversation_id)
- user: string
- collection: string
- metadata: dict
StoreFactResponse:
- fact_id: string
- error: Error
Episode Storage
StoreEpisodeRequest:
- goal: string
- steps: list[string]
- outcome: string
- lessons: string
- user: string
- collection: string
- metadata: dict
StoreEpisodeResponse:
- episode_id: string
- error: Error
Conversation Memory Management
GetConversationMemoryRequest:
- conversation_id: string
- user: string
GetConversationMemoryResponse:
- conversation: ConversationMemory
- error: Error
UpdateConversationMemoryRequest:
- conversation_id: string
- invocation_summary: InvocationSummary
- user: string
UpdateConversationMemoryResponse:
- success: boolean
- error: Error
Modified Agent APIs
AgentRequest (Extended)
AgentRequest:
# Existing fields
- question: string
- history: list[AgentStep]
- user: string
- streaming: boolean
- state: string
- group: list[string]
# New fields for memory
- conversation_id: string (identifies conversation for memory retrieval)
- enable_memory: boolean (default: false, set true for react_mem)
AgentResponse (No changes required)
- Existing schema supports memory-enabled agents
- Memory operations are transparent to client
Implementation Details
Service Configuration
The react_mem service will be configured similarly to react, with additional memory-related parameters:
Configuration key: agent-mem (distinct from agent used by react)
Configuration parameters:
max-iterations: Maximum ReAct loop iterationsworking-memory-threshold: Token count trigger for compression (default: 2000)fact-retrieval-top-k: Number of facts to retrieve (default: 5)episode-retrieval-top-k: Number of episodes to retrieve (default: 3)enable-auto-fact-extraction: Auto-extract facts at invocation end (default: true)enable-episode-recording: Auto-record episodes (default: true)additional-context: Additional system context (same as react)
Memory Service Specifications
Memory service implementation details are TBD and will be defined in a separate technical specification.
The memory services must implement the following interfaces:
-
Long-term Facts Service
- Interface: Fact storage, retrieval, deletion
- Storage backend: TBD (likely existing graph store + vector index)
- Embedding model: TBD (consistency with existing embeddings)
-
Episodic Memory Service
- Interface: Episode storage, retrieval, search
- Storage backend: TBD (vector store + structured metadata store)
- Retention policy: TBD
-
Conversation Memory Service
- Interface: CRUD operations on conversation records
- Storage backend: TBD (in-memory with optional persistence)
- TTL/cleanup policy: TBD
-
Working Memory Manager
- Interface: Append, compress, retrieve working memory
- Compression strategy: TBD (summarization vs. truncation)
- Implementation: In-memory only
These services will be registered as client specifications similar to existing services (GraphRagClientSpec, ToolClientSpec, etc.).
Prompt Engineering
The memory-aware prompts will need to incorporate retrieved context effectively:
Prompt structure (conceptual):
System: You are an agent with access to persistent memory...
Long-term Facts:
- [Retrieved fact 1]
- [Retrieved fact 2]
...
Relevant Past Episodes:
- Episode 1: [summary]
- Episode 2: [summary]
...
Conversation History:
- Previous invocation 1: [summary]
- Previous invocation 2: [summary]
...
Current Task:
Working Memory:
- [Current steps taken]
Question: [User query]
Think step by step and decide on next action or provide final answer.
Prompt templates will be defined in configuration, similar to existing agent prompts.
Tool Extensions (Optional Refinement)
To support mid-invocation retrieval (optional future enhancement):
RetrieveFacts Tool
- Description: "Retrieve additional facts from knowledge base"
- Arguments: query (string)
- Implementation: Calls fact retrieval service
RecallEpisodes Tool
- Description: "Search for similar past problem-solving episodes"
- Arguments: query (string)
- Implementation: Calls episode retrieval service
StoreFact Tool
- Description: "Explicitly store a fact for future reference"
- Arguments: fact_content (string)
- Implementation: Calls fact storage service
These tools would be optional and configured per deployment.
Client Specifications
The react_mem service will register the following client specifications:
Existing (inherited from react):
- TextCompletionClientSpec
- GraphRagClientSpec
- PromptClientSpec
- ToolClientSpec
- StructuredQueryClientSpec
New (memory-specific):
- FactRetrievalClientSpec
- EpisodeRetrievalClientSpec
- FactStorageClientSpec
- EpisodeStorageClientSpec
- ConversationMemoryClientSpec
Security Considerations
Memory Isolation
- All memory operations must be scoped to user and collection
- Cross-user memory leakage must be prevented
- Fact and episode retrieval must respect access controls
PII and Sensitive Information
- Fact extraction should avoid storing sensitive personal information
- Conversation memory may contain PII and must be handled accordingly
- Memory retention policies should align with privacy requirements
- Option to disable memory or purge memory for specific users/conversations
Embedding Security
- Embeddings may leak information through similarity searches
- Consider privacy-preserving embedding techniques for sensitive deployments
Resource Limits
- Prevent unbounded memory growth through:
- Per-user/collection memory quotas
- Time-based expiration of old memories
- Relevance-based pruning of unused memories
Performance Considerations
Retrieval Latency
- Embedding generation adds latency at invocation start
- Vector searches for facts and episodes may add 50-200ms per retrieval
- Mitigation: Parallel retrieval of facts and episodes, caching embeddings
Context Window Usage
- Retrieved memories consume context window
- Must balance memory breadth vs. depth
- Working memory compression necessary to avoid overflow
Storage Overhead
- Each conversation generates conversation memory records
- Each invocation may create fact and episode records
- Mitigation: Background cleanup jobs, retention policies
Scaling
- Memory stores must scale with user base
- Vector search performance critical for large fact/episode databases
- Consider sharding strategies for multi-tenant deployments
Testing Strategy
Unit Tests
- Memory service interfaces (mocked backends)
- Working memory compression logic
- Prompt assembly with memory context
- Fact/episode extraction logic
Integration Tests
- End-to-end agent invocation with memory enabled
- Memory retrieval and storage flows
- Conversation continuity across invocations
- Working memory compression triggers
Performance Tests
- Latency impact of memory retrieval
- Context window utilization with varying memory sizes
- Large-scale memory retrieval (1000s of facts/episodes)
Correctness Tests
- Verify facts are correctly stored and retrieved
- Verify episodes improve problem-solving on similar tasks
- Verify conversation memory maintains coherence
Migration Plan
Phase 1: Module Creation
- Create
react_memdirectory structure - Implement ReactMemService and ReactMemAgentManager shells
- No memory operations yet (functionally equivalent to react)
- Deploy and verify drop-in compatibility
Phase 2: Memory Services
- Implement memory service interfaces and backends
- Add memory retrieval at invocation start (read-only)
- Deploy and verify retrieval performance
Phase 3: Memory Writing
- Enable conversation memory updates
- Enable fact and episode extraction
- Deploy with memory writing enabled
Phase 4: Conversation Lifecycle
- Implement conversation-end memory promotion
- Add cleanup and retention policies
- Full agentic memory capabilities enabled
Rollout Strategy
- Deploy react_mem alongside react (not as replacement)
- Users opt-in by changing configuration to use react_mem
- Monitor performance and memory growth
- Gradually migrate users based on feedback
Timeline
- Phase 1 (Module Creation): 1-2 weeks
- Phase 2 (Memory Services): 3-4 weeks (dependent on memory service spec)
- Phase 3 (Memory Writing): 2-3 weeks
- Phase 4 (Conversation Lifecycle): 1-2 weeks
- Total: 7-11 weeks
Open Questions
- Embedding Model: Should we use the same embedding model as existing graph RAG, or introduce a dedicated memory embedding model?
- Fact Extraction Trigger: Auto-extract via LLM pass, or require explicit StoreFact tool usage?
- Conversation End Detection: How do we reliably detect conversation end (timeout, explicit signal, heuristic)?
- Memory Cleanup: What retention policies for facts, episodes, and conversation memory?
- Cross-conversation Learning: Should episodic memory span conversations, or be scoped per conversation?
- Working Memory Compression: Summarization (LLM-based) or truncation (rule-based)?
- Memory Search UX: Should memory retrieval be visible to users, or completely transparent?
- Backward Compatibility: Should react service gain
conversation_idfield for potential future memory support?
References
- Existing react agent implementation:
trustgraph-flow/trustgraph/agent/react/ - AgentService base class:
trustgraph-base/trustgraph/base/agent_service.py - Agent schemas:
trustgraph-base/trustgraph/schema/services/agent.py - Similar memory-augmented agent architectures: Reflexion, MemGPT, ChatGPT memory features