mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-05-18 20:05:13 +02:00
# Agentic Memory Technical Specification

## Overview

This specification describes the implementation of an agent manager with multi-layered memory capabilities for TrustGraph. The new `react_mem` module extends the existing ReAct pattern with persistent memory across invocations and conversations, enabling agents to learn from past interactions and maintain context over time.

The implementation supports the following use cases:

1. **Long-term Knowledge Retention**: Store and retrieve factual information learned during agent interactions
2. **Episodic Memory**: Remember past problem-solving approaches and their outcomes
3. **Conversation Continuity**: Maintain context across multiple invocations within a conversation
4. **Working Memory Management**: Handle extended reasoning chains with automatic compression
5. **Experience-Based Reasoning**: Leverage similar past experiences when addressing new queries

## Goals

- **Drop-in Compatibility**: `react_mem` implements the same AgentService interface as `react`, allowing users to choose between implementations via configuration
- **Memory Persistence**: Enable agents to retain and retrieve information across invocations and conversations
- **Scalable Context Management**: Handle long-running conversations without unbounded memory growth
- **Retrieval Efficiency**: Use embedding-based retrieval to surface relevant memories efficiently
- **Incremental Adoption**: Deploy alongside the existing `react` module without breaking changes
- **Transparent Operation**: Memory operations should not significantly increase response latency
- **Configurable Behavior**: Allow tuning of memory extraction, compression, and retrieval strategies

## Background

### Current Architecture

The existing `react` agent manager (module: `trustgraph-flow/trustgraph/agent/react/`) implements a stateless ReAct (Reasoning and Acting) loop:

1. Receives an `AgentRequest` with a question and history
2. Reasons about the problem using an LLM
3. Takes action by invoking tools
4. Returns an `AgentResponse` with a thought, observation, or final answer
5. Maintains working memory only for the current invocation, via the `history` field

Current limitations include:

- **No Long-term Memory**: Each invocation starts fresh, with no access to prior conversations
- **Limited Context Window**: History is bounded by max iterations and token limits
- **No Learning**: The agent cannot benefit from past problem-solving experiences
- **Conversation Fragmentation**: There is no mechanism to connect related invocations across a conversation
- **Context Loss**: Important information from early steps may be lost as history grows

This specification addresses these gaps by introducing a four-tier memory architecture. By maintaining long-term facts, episodic memories, conversation summaries, and managed working memory, TrustGraph agents can:

- Build and leverage a persistent knowledge base
- Learn from past successes and failures
- Maintain coherent context across extended conversations
- Handle complex multi-step reasoning without context overflow

## Technical Design

### Architecture

The agentic memory system requires the following technical components:

1. **ReactMemAgentManager**
   - Extends AgentManager with a memory-aware reasoning loop
   - Orchestrates memory retrieval at invocation start
   - Manages working memory compression during execution
   - Triggers memory extraction at invocation end

   Module: `trustgraph-flow/trustgraph/agent/react_mem/agent_manager.py`

2. **ReactMemService**
   - Implements the AgentService interface (drop-in replacement for `react.Processor`)
   - Manages the conversation-level memory lifecycle
   - Handles configuration of memory services and policies
   - Coordinates memory promotion at conversation end

   Module: `trustgraph-flow/trustgraph/agent/react_mem/service.py`

3. **Memory Services** (implementation details: TBD)
   - Long-term Facts Service
   - Episodic Memory Service
   - Conversation Memory Service
   - Working Memory Management Service

   Module: `trustgraph-flow/trustgraph/agent/react_mem/memory/` (TBD)

4. **Memory Schema Definitions**
   - Data structures for facts, episodes, and conversation records
   - Request/response schemas for memory operations

   Module: `trustgraph-base/trustgraph/schema/agent_memory.py`

### Memory Architecture

#### Four-Tier Memory System

**Long-term Facts**

- **Purpose**: Persistent knowledge graph of facts learned across all conversations
- **Scope**: Global per user/collection
- **Retrieval**: Embedding-based semantic search on the user query
- **Lifetime**: Permanent until explicitly deleted
- **Storage**: Graph database (existing TrustGraph knowledge store)

**Episodic Memory**

- **Purpose**: Records of past problem-solving episodes and their outcomes
- **Scope**: Global per user/collection
- **Retrieval**: Embedding-based similarity search on the current goal/context
- **Lifetime**: Permanent, with optional TTL or relevance-based pruning
- **Storage**: Vector store with structured metadata

**Conversation Memory**

- **Purpose**: Summaries and key points from prior invocations in the current conversation
- **Scope**: Current conversation only
- **Retrieval**: Sequential access (not searched)
- **Lifetime**: Duration of the conversation
- **Storage**: In-memory, with optional persistence for long conversations

**Working Memory**

- **Purpose**: Reasoning trace for the current invocation (thoughts, actions, observations)
- **Scope**: Current invocation only
- **Retrieval**: Included directly in the LLM context
- **Lifetime**: Current invocation (discarded after summarization)
- **Storage**: In-memory list

### Data Models

#### Memory Records

**Fact Record**

```
Fact:
- id: string (unique identifier)
- content: string (fact statement)
- source: string (conversation_id where learned)
- timestamp: datetime
- embedding: vector
- metadata: dict
```

**Episode Record**

```
Episode:
- id: string (unique identifier)
- goal: string (what was being attempted)
- steps: list[string] (key actions taken)
- outcome: string (result achieved)
- lessons: string (insights/learnings)
- timestamp: datetime
- embedding: vector
- metadata: dict
```

**Conversation Record**

```
ConversationMemory:
- conversation_id: string
- invocations: list[InvocationSummary]
- metadata: dict

InvocationSummary:
- query: string
- summary: string (what agent did)
- result: string (outcome)
- timestamp: datetime
```

**Working Memory Item**

```
WorkingMemoryItem:
- type: enum[thought, action, observation]
- content: string
- timestamp: datetime
- metadata: dict (e.g., token_count)
```
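
For illustration, the record layouts above map naturally onto Python dataclasses. This is a minimal sketch that assumes plain `list[float]` embeddings and timezone-aware `datetime` timestamps. The real definitions in `trustgraph-base/trustgraph/schema/agent_memory.py` would presumably use TrustGraph's own schema types, which are not reproduced here. The `WorkingMemoryType` enum name is hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any


def _now() -> datetime:
    # Timezone-aware UTC timestamp used as the default for new records
    return datetime.now(timezone.utc)


@dataclass
class Fact:
    id: str
    content: str              # fact statement
    source: str               # conversation_id where the fact was learned
    timestamp: datetime = field(default_factory=_now)
    embedding: list[float] = field(default_factory=list)
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class Episode:
    id: str
    goal: str                 # what was being attempted
    steps: list[str]          # key actions taken
    outcome: str              # result achieved
    lessons: str              # insights / learnings
    timestamp: datetime = field(default_factory=_now)
    embedding: list[float] = field(default_factory=list)
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class InvocationSummary:
    query: str
    summary: str              # what the agent did
    result: str               # outcome
    timestamp: datetime = field(default_factory=_now)


@dataclass
class ConversationMemory:
    conversation_id: str
    invocations: list[InvocationSummary] = field(default_factory=list)
    metadata: dict[str, Any] = field(default_factory=dict)


class WorkingMemoryType(str, Enum):  # hypothetical name for enum[thought, action, observation]
    THOUGHT = "thought"
    ACTION = "action"
    OBSERVATION = "observation"


@dataclass
class WorkingMemoryItem:
    type: WorkingMemoryType
    content: str
    timestamp: datetime = field(default_factory=_now)
    metadata: dict[str, Any] = field(default_factory=dict)  # e.g. token_count
```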

### Memory Operations Lifecycle

#### Invocation Start: Context Assembly

When `ReactMemService` receives an `AgentRequest`:

1. **Retrieve Long-term Facts**
   - Embed the user query
   - Query the facts store with the embedding
   - Retrieve the top-k relevant facts (k is configurable)

2. **Retrieve Episodic Memory**
   - Embed the user query plus the current state
   - Query the episodes store with the embedding
   - Retrieve the top-k most similar past episodes

3. **Load Conversation Memory**
   - Fetch the conversation record by `conversation_id`
   - Load all prior invocation summaries for this conversation

4. **Initialize Working Memory**
   - Create an empty working memory buffer
   - Optionally seed it with the high-level goal

These retrieved memories are assembled into the context passed to `ReactMemAgentManager.reason()`.
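
The four assembly steps are largely independent, so they can run concurrently. This is also the mitigation suggested under Performance Considerations. The asyncio sketch below uses hypothetical async callables (`embed`, `retrieve_facts`, `retrieve_episodes`, `load_conversation`) as stand-ins for the memory service clients, which this specification leaves TBD. For brevity it embeds only the query and reuses that vector for both searches; step 2 instead embeds the query plus the current state.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable


@dataclass
class MemoryContext:
    facts: list[Any]
    episodes: list[Any]
    conversation: list[Any]
    working_memory: list[Any] = field(default_factory=list)  # starts empty


async def assemble_context(
    query: str,
    conversation_id: str,
    embed: Callable[[str], Awaitable[list[float]]],
    retrieve_facts: Callable[[list[float], int], Awaitable[list[Any]]],
    retrieve_episodes: Callable[[list[float], int], Awaitable[list[Any]]],
    load_conversation: Callable[[str], Awaitable[list[Any]]],
    fact_top_k: int = 5,       # mirrors the fact-retrieval-top-k default
    episode_top_k: int = 3,    # mirrors the episode-retrieval-top-k default
) -> MemoryContext:
    # Both vector searches need the query embedding, so compute it first
    vector = await embed(query)
    # Facts, episodes and conversation memory are independent: fetch concurrently
    facts, episodes, conversation = await asyncio.gather(
        retrieve_facts(vector, fact_top_k),
        retrieve_episodes(vector, episode_top_k),
        load_conversation(conversation_id),
    )
    return MemoryContext(facts=facts, episodes=episodes, conversation=conversation)
```

With this shape, total retrieval latency is roughly the embedding time plus the slowest of the three fetches, not their sum.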

#### Core Loop: Memory-Aware Reasoning

The reasoning loop proceeds as in standard ReAct, but with augmented context:

1. **Assemble Context**
   - System prompt / agent persona
   - Long-term facts (from retrieval)
   - Relevant episodes (from retrieval)
   - Conversation memory (loaded summaries)
   - Working memory (current invocation trace)
   - Current user query

2. **Reason**
   - The LLM generates a thought and a decision
   - Returns an Action (tool call) or a Final (answer)

3. **Act** (if Action)
   - Execute the tool
   - Capture the observation result

4. **Update Working Memory**
   - Append the thought, action, and observation to working memory
   - Check the working memory size
   - If it exceeds the threshold: compress older entries and preserve recent ones

5. **Loop** until a Final answer is produced or max iterations is reached
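
A simplified, synchronous sketch of this loop follows. `Action` and `Final` are stand-ins for the agent's decision types. `build_prompt`, `reason`, `act`, and `compress` are hypothetical callables for context assembly, the LLM call, tool execution, and working memory compression. The default of 10 iterations is an arbitrary placeholder for `max-iterations`.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass
class Action:
    thought: str
    tool: str
    arguments: dict


@dataclass
class Final:
    thought: str
    answer: str


Decision = Union[Action, Final]


def run_loop(
    build_prompt: Callable[[list[str]], str],
    reason: Callable[[str], Decision],
    act: Callable[[Action], str],
    compress: Callable[[list[str]], list[str]],
    max_iterations: int = 10,  # placeholder; the spec does not state a default
) -> Optional[str]:
    working_memory: list[str] = []
    for _ in range(max_iterations):
        # 1. Assemble context (memories are captured inside build_prompt)
        prompt = build_prompt(working_memory)
        # 2. Reason: the LLM returns either an Action or a Final answer
        decision = reason(prompt)
        if isinstance(decision, Final):
            return decision.answer
        # 3. Act: execute the tool and capture the observation
        observation = act(decision)
        # 4. Update working memory, then let the compressor enforce the size limit
        working_memory += [
            f"Thought: {decision.thought}",
            f"Action: {decision.tool}({decision.arguments})",
            f"Observation: {observation}",
        ]
        working_memory = compress(working_memory)
    # 5. Max iterations reached without a Final answer
    return None
```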

**Working Memory Compression Trigger**: When the token count of working memory exceeds a threshold (set by the `working-memory-threshold` configuration parameter; for example, 50% of the context window), invoke summarization:

- Keep the most recent N steps verbatim
- Summarize older steps into a condensed form
- Preserve critical information (tool results, key decisions)
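
The trigger and compression policy might be sketched as below. The token estimate is a crude whitespace word count; a real implementation should use the model's tokenizer. Summarization is a pluggable `summarize` callable, because summarization versus truncation is still an open question. Preserving critical information is left to the summarizer rather than enforced by this function. The function names are hypothetical.

```python
from typing import Callable


def estimate_tokens(text: str) -> int:
    # Rough approximation: whitespace-separated words; swap in a real tokenizer
    return len(text.split())


def compress_working_memory(
    entries: list[str],
    threshold: int,
    keep_recent: int,
    summarize: Callable[[list[str]], str],
) -> list[str]:
    total = sum(estimate_tokens(e) for e in entries)
    # Compress only when over threshold and there is something older than the recent window
    if total <= threshold or len(entries) <= keep_recent:
        return entries
    split = len(entries) - keep_recent
    older, recent = entries[:split], entries[split:]
    # Condense older steps into one summary entry; keep the most recent N verbatim
    return [f"Summary of earlier steps: {summarize(older)}"] + recent
```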

#### Invocation End: Memory Extraction

After sending the final response:

1. **Conversation Memory Update**
   - Generate an invocation summary: "User asked X, agent did Y, result was Z"
   - Append the summary to conversation memory

2. **Fact Extraction** (optional/conditional)
   - LLM pass: "Did the agent learn anything worth persisting?"
   - If yes: extract fact statements and store them as long-term facts
   - Alternative: the agent explicitly calls the StoreFact tool during reasoning

3. **Episode Recording** (if task-like)
   - If the invocation resembled a task (multi-step problem solving):
     - Extract the goal, key steps, outcome, and lessons learned
     - Generate an embedding for retrieval
     - Store it to episodic memory

4. **Discard Working Memory**
   - Working memory is cleared (it has already been summarized)
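
The extraction steps could be organized as in this sketch. `extract_facts` and `extract_episode` are hypothetical callables for the LLM passes described above. The two flags mirror `enable-auto-fact-extraction` and `enable-episode-recording`. The task-like test (at least `min_task_actions` tool calls) is an assumed heuristic; the specification does not define one.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class InvocationOutcome:
    query: str
    actions: list[str]   # tool calls made during the invocation
    answer: str


@dataclass
class ExtractionResult:
    summary: str
    facts: list[str] = field(default_factory=list)
    episode: Optional[dict] = None


def extract_memories(
    outcome: InvocationOutcome,
    extract_facts: Callable[[InvocationOutcome], list[str]],
    extract_episode: Callable[[InvocationOutcome], dict],
    auto_fact_extraction: bool = True,
    episode_recording: bool = True,
    min_task_actions: int = 2,  # hypothetical "task-like" threshold
) -> ExtractionResult:
    # 1. Conversation memory: "User asked X, agent did Y, result was Z"
    did = ", ".join(outcome.actions) if outcome.actions else "answered directly"
    result = ExtractionResult(
        summary=f"User asked {outcome.query!r}, agent did {did}, result was {outcome.answer!r}"
    )
    # 2. Optional fact extraction pass
    if auto_fact_extraction:
        result.facts = extract_facts(outcome)
    # 3. Record an episode only for multi-step, task-like invocations
    if episode_recording and len(outcome.actions) >= min_task_actions:
        result.episode = extract_episode(outcome)
    # 4. The caller discards working memory after this returns
    return result
```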

#### Conversation End: Memory Promotion

When a conversation concludes (explicit signal or timeout):

1. **Promote Facts to Long-term Memory**
   - Scan conversation memory for high-value facts
   - Store them in the persistent knowledge graph
   - Update graph embeddings for retrieval

2. **Record Conversation Episode** (optional)
   - If the conversation had an overarching theme or goal:
     - Summarize the entire conversation as a high-level episode
     - Store it to episodic memory
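
Detecting conversation end is listed under Open Questions, so the following is only one possible approach. A tracker ends a conversation on an explicit signal, or after `idle_timeout` seconds without an invocation. The class and its methods are hypothetical. Timestamps are passed in explicitly so the behavior is deterministic and testable.

```python
class ConversationTracker:
    def __init__(self, idle_timeout: float):
        self.idle_timeout = idle_timeout
        self._last_seen: dict[str, float] = {}

    def touch(self, conversation_id: str, now: float) -> None:
        # Record activity for a conversation on every invocation
        self._last_seen[conversation_id] = now

    def end(self, conversation_id: str) -> bool:
        # Explicit end signal; returns True if the conversation was being tracked
        return self._last_seen.pop(conversation_id, None) is not None

    def expire(self, now: float) -> list[str]:
        # Collect conversations idle longer than the timeout and stop tracking them
        expired = [
            cid for cid, seen in self._last_seen.items()
            if now - seen > self.idle_timeout
        ]
        for cid in expired:
            del self._last_seen[cid]
        return expired
```

Conversation IDs returned by `expire` (or confirmed by `end`) would then be handed to the promotion steps above.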

### APIs

#### New Memory Service APIs

**Fact Retrieval**

```
RetrieveFactsRequest:
- query: string (search query)
- embedding: vector (pre-computed)
- top_k: int (default: 5)
- user: string
- collection: string

RetrieveFactsResponse:
- facts: list[Fact]
- error: Error
```

**Episode Retrieval**

```
RetrieveEpisodesRequest:
- query: string
- embedding: vector
- top_k: int (default: 3)
- user: string
- collection: string

RetrieveEpisodesResponse:
- episodes: list[Episode]
- error: Error
```

**Fact Storage**

```
StoreFactRequest:
- content: string
- source: string (conversation_id)
- user: string
- collection: string
- metadata: dict

StoreFactResponse:
- fact_id: string
- error: Error
```

**Episode Storage**

```
StoreEpisodeRequest:
- goal: string
- steps: list[string]
- outcome: string
- lessons: string
- user: string
- collection: string
- metadata: dict

StoreEpisodeResponse:
- episode_id: string
- error: Error
```

**Conversation Memory Management**

```
GetConversationMemoryRequest:
- conversation_id: string
- user: string

GetConversationMemoryResponse:
- conversation: ConversationMemory
- error: Error

UpdateConversationMemoryRequest:
- conversation_id: string
- invocation_summary: InvocationSummary
- user: string

UpdateConversationMemoryResponse:
- success: boolean
- error: Error
```
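
To make the fact retrieval and storage contracts concrete, here is an in-memory sketch that could serve as a mocked backend in unit tests. It assumes callers supply embeddings, mirroring the pre-computed `embedding` field, and it ignores the `query` text field. Records are partitioned by `(user, collection)` in line with the memory isolation requirement under Security Considerations, and results are ranked by cosine similarity. Class and method names are illustrative, not the memory service API.

```python
import math
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StoredFact:
    id: str
    content: str
    source: str
    embedding: list[float]
    metadata: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryFactStore:
    def __init__(self) -> None:
        # Facts are partitioned by (user, collection) so searches never cross scopes
        self._facts: dict[tuple[str, str], list[StoredFact]] = {}

    def store(self, content: str, source: str, user: str, collection: str,
              embedding: list[float], metadata: Optional[dict] = None) -> str:
        fact = StoredFact(str(uuid.uuid4()), content, source, embedding, metadata or {})
        self._facts.setdefault((user, collection), []).append(fact)
        return fact.id  # corresponds to StoreFactResponse.fact_id

    def retrieve(self, embedding: list[float], user: str, collection: str,
                 top_k: int = 5) -> list[StoredFact]:
        # Only the caller's own scope is searched
        candidates = self._facts.get((user, collection), [])
        ranked = sorted(candidates, key=lambda f: cosine(embedding, f.embedding), reverse=True)
        return ranked[:top_k]
```

A linear scan like this is only suitable for tests. A production backend would delegate ranking to a vector index.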

#### Modified Agent APIs

**AgentRequest** (Extended)

```
AgentRequest:
# Existing fields
- question: string
- history: list[AgentStep]
- user: string
- streaming: boolean
- state: string
- group: list[string]

# New fields for memory
- conversation_id: string (identifies conversation for memory retrieval)
- enable_memory: boolean (default: false, set true for react_mem)
```

**AgentResponse** (No changes required)

- The existing schema supports memory-enabled agents
- Memory operations are transparent to the client

### Implementation Details

#### Service Configuration

The `react_mem` service will be configured similarly to `react`, with additional memory-related parameters.

Configuration key: `agent-mem` (distinct from `agent`, which is used by `react`)

Configuration parameters:

- `max-iterations`: Maximum ReAct loop iterations
- `working-memory-threshold`: Token count that triggers compression (default: 2000)
- `fact-retrieval-top-k`: Number of facts to retrieve (default: 5)
- `episode-retrieval-top-k`: Number of episodes to retrieve (default: 3)
- `enable-auto-fact-extraction`: Auto-extract facts at invocation end (default: true)
- `enable-episode-recording`: Auto-record episodes (default: true)
- `additional-context`: Additional system context (same as `react`)
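
The parameters above might be loaded into a typed configuration object as sketched below. Defaults are taken from the list. No default is stated for `max-iterations`, so the sketch leaves it as `None`. The `ReactMemConfig` class and `from_params` method are hypothetical, and values are passed through without type coercion.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class ReactMemConfig:
    max_iterations: Optional[int] = None   # no default stated in this spec
    working_memory_threshold: int = 2000
    fact_retrieval_top_k: int = 5
    episode_retrieval_top_k: int = 3
    enable_auto_fact_extraction: bool = True
    enable_episode_recording: bool = True
    additional_context: Optional[str] = None

    @classmethod
    def from_params(cls, params: dict[str, Any]) -> "ReactMemConfig":
        # Map hyphenated configuration keys (e.g. "fact-retrieval-top-k") to field names
        known = {f.name for f in fields(cls)}
        kwargs = {}
        for key, value in params.items():
            name = key.replace("-", "_")
            if name not in known:
                raise ValueError(f"unknown agent-mem parameter: {key}")
            kwargs[name] = value
        return cls(**kwargs)
```

Rejecting unknown keys surfaces typos in the `agent-mem` configuration early, instead of letting them silently fall back to defaults.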

#### Memory Service Specifications

**Memory service implementation details are TBD and will be defined in a separate technical specification.**

The memory services must implement the following interfaces:

1. **Long-term Facts Service**
   - Interface: fact storage, retrieval, deletion
   - Storage backend: TBD (likely the existing graph store plus a vector index)
   - Embedding model: TBD (consistent with existing embeddings)

2. **Episodic Memory Service**
   - Interface: episode storage, retrieval, search
   - Storage backend: TBD (vector store plus a structured metadata store)
   - Retention policy: TBD

3. **Conversation Memory Service**
   - Interface: CRUD operations on conversation records
   - Storage backend: TBD (in-memory with optional persistence)
   - TTL/cleanup policy: TBD

4. **Working Memory Manager**
   - Interface: append, compress, retrieve working memory
   - Compression strategy: TBD (summarization vs. truncation)
   - Implementation: in-memory only

These services will be registered as client specifications, similar to existing services (GraphRagClientSpec, ToolClientSpec, etc.).

#### Prompt Engineering

The memory-aware prompts will need to incorporate retrieved context effectively.

**Prompt structure** (conceptual):

```
System: You are an agent with access to persistent memory...

Long-term Facts:
- [Retrieved fact 1]
- [Retrieved fact 2]
...

Relevant Past Episodes:
- Episode 1: [summary]
- Episode 2: [summary]
...

Conversation History:
- Previous invocation 1: [summary]
- Previous invocation 2: [summary]
...

Current Task:
Working Memory:
- [Current steps taken]

Question: [User query]

Think step by step and decide on next action or provide final answer.
```

Prompt templates will be defined in configuration, similar to existing agent prompts.
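
Here is a minimal sketch of rendering the conceptual structure above from retrieved memory. The function names and plain-string inputs are assumptions; in practice the text would come from configured prompt templates. Omitting empty sections saves context window, but it is a design choice of this sketch rather than something the specification requires.

```python
def _bullets(items: list[str], label: str = "") -> str:
    # "- item", or with a label: "- Episode 1: item"
    if label:
        return "\n".join(f"- {label} {i}: {item}" for i, item in enumerate(items, 1))
    return "\n".join(f"- {item}" for item in items)


def build_memory_prompt(
    system: str,
    facts: list[str],
    episodes: list[str],
    conversation: list[str],
    working_memory: list[str],
    question: str,
) -> str:
    parts = [f"System: {system}"]
    # Omit empty tiers so they do not consume context window
    if facts:
        parts.append("Long-term Facts:\n" + _bullets(facts))
    if episodes:
        parts.append("Relevant Past Episodes:\n" + _bullets(episodes, "Episode"))
    if conversation:
        parts.append("Conversation History:\n" + _bullets(conversation, "Previous invocation"))
    if working_memory:
        parts.append("Current Task:\nWorking Memory:\n" + _bullets(working_memory))
    parts.append(f"Question: {question}")
    parts.append("Think step by step and decide on next action or provide final answer.")
    return "\n\n".join(parts)
```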

#### Tool Extensions (Optional Refinement)

To support mid-invocation retrieval (optional future enhancement):

**RetrieveFacts Tool**

- Description: "Retrieve additional facts from knowledge base"
- Arguments: query (string)
- Implementation: Calls fact retrieval service

**RecallEpisodes Tool**

- Description: "Search for similar past problem-solving episodes"
- Arguments: query (string)
- Implementation: Calls episode retrieval service

**StoreFact Tool**

- Description: "Explicitly store a fact for future reference"
- Arguments: fact_content (string)
- Implementation: Calls fact storage service

These tools would be optional and configured per deployment.
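
One way to wire up the optional tools is a name-to-handler registry filtered by deployment configuration. The registry shape, the handler signature (a dict of arguments in, an observation string out), and the `enabled` set are all hypothetical. Each handler simply forwards to a memory service client passed in as a callable.

```python
from typing import Callable


def make_memory_tools(
    retrieve_facts: Callable[[str], list[str]],
    recall_episodes: Callable[[str], list[str]],
    store_fact: Callable[[str], str],
    enabled: set[str],
) -> dict[str, dict]:
    tools = {
        "RetrieveFacts": {
            "description": "Retrieve additional facts from knowledge base",
            "arguments": ["query"],
            # Observation is the retrieved facts, one per line
            "handler": lambda args: "\n".join(retrieve_facts(args["query"])) or "No facts found",
        },
        "RecallEpisodes": {
            "description": "Search for similar past problem-solving episodes",
            "arguments": ["query"],
            "handler": lambda args: "\n".join(recall_episodes(args["query"])) or "No similar episodes found",
        },
        "StoreFact": {
            "description": "Explicitly store a fact for future reference",
            "arguments": ["fact_content"],
            # Observation reports the fact_id returned by the storage service
            "handler": lambda args: f"Stored fact {store_fact(args['fact_content'])}",
        },
    }
    # Expose only the tools enabled for this deployment
    return {name: spec for name, spec in tools.items() if name in enabled}
```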

### Client Specifications

The `react_mem` service will register the following client specifications:

**Existing (inherited from react):**

- TextCompletionClientSpec
- GraphRagClientSpec
- PromptClientSpec
- ToolClientSpec
- StructuredQueryClientSpec

**New (memory-specific):**

- FactRetrievalClientSpec
- EpisodeRetrievalClientSpec
- FactStorageClientSpec
- EpisodeStorageClientSpec
- ConversationMemoryClientSpec

## Security Considerations

**Memory Isolation**

- All memory operations must be scoped to user and collection
- Cross-user memory leakage must be prevented
- Fact and episode retrieval must respect access controls

**PII and Sensitive Information**

- Fact extraction should avoid storing sensitive personal information
- Conversation memory may contain PII and must be handled accordingly
- Memory retention policies should align with privacy requirements
- Provide an option to disable memory, or to purge memory for specific users/conversations

**Embedding Security**

- Embeddings may leak information through similarity searches
- Consider privacy-preserving embedding techniques for sensitive deployments

**Resource Limits**

- Prevent unbounded memory growth through:
  - Per-user/collection memory quotas
  - Time-based expiration of old memories
  - Relevance-based pruning of unused memories
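
The three limits can be combined into a single pruning pass over one user/collection's memories, as sketched below. The record fields (`created`, `last_used`, `hits`), `max_age`, and `quota` are hypothetical. Ranking by retrieval count, then recency, is only one possible relevance measure; the actual retention policies remain an open question.

```python
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    id: str
    created: float    # seconds since epoch
    last_used: float  # last time retrieval surfaced this record
    hits: int = 0     # how often retrieval surfaced this record


def prune(records: list[MemoryRecord], now: float, max_age: float, quota: int) -> list[MemoryRecord]:
    # Apply to a single user/collection scope so quotas are never shared across tenants
    # Time-based expiration: drop anything older than max_age
    alive = [r for r in records if now - r.created <= max_age]
    # Relevance-based pruning under the quota: keep most-used, then most-recently-used
    alive.sort(key=lambda r: (r.hits, r.last_used), reverse=True)
    return alive[:quota]
```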

## Performance Considerations

**Retrieval Latency**

- Embedding generation adds latency at invocation start
- Vector searches for facts and episodes may add 50-200ms per retrieval
- Mitigation: parallel retrieval of facts and episodes; caching of embeddings

**Context Window Usage**

- Retrieved memories consume context window
- Memory breadth must be balanced against depth
- Working memory compression is necessary to avoid overflow

**Storage Overhead**

- Each conversation generates conversation memory records
- Each invocation may create fact and episode records
- Mitigation: background cleanup jobs, retention policies

**Scaling**

- Memory stores must scale with the user base
- Vector search performance is critical for large fact/episode databases
- Consider sharding strategies for multi-tenant deployments

## Testing Strategy

**Unit Tests**

- Memory service interfaces (mocked backends)
- Working memory compression logic
- Prompt assembly with memory context
- Fact/episode extraction logic

**Integration Tests**

- End-to-end agent invocation with memory enabled
- Memory retrieval and storage flows
- Conversation continuity across invocations
- Working memory compression triggers

**Performance Tests**

- Latency impact of memory retrieval
- Context window utilization with varying memory sizes
- Large-scale memory retrieval (thousands of facts/episodes)

**Correctness Tests**

- Verify facts are correctly stored and retrieved
- Verify episodes improve problem-solving on similar tasks
- Verify conversation memory maintains coherence

## Migration Plan

**Phase 1: Module Creation**

- Create the `react_mem` directory structure
- Implement ReactMemService and ReactMemAgentManager shells
- No memory operations yet (functionally equivalent to `react`)
- Deploy and verify drop-in compatibility

**Phase 2: Memory Services**

- Implement memory service interfaces and backends
- Add memory retrieval at invocation start (read-only)
- Deploy and verify retrieval performance

**Phase 3: Memory Writing**

- Enable conversation memory updates
- Enable fact and episode extraction
- Deploy with memory writing enabled

**Phase 4: Conversation Lifecycle**

- Implement conversation-end memory promotion
- Add cleanup and retention policies
- Full agentic memory capabilities enabled

**Rollout Strategy**

- Deploy `react_mem` alongside `react` (not as a replacement)
- Users opt in by changing configuration to use `react_mem`
- Monitor performance and memory growth
- Gradually migrate users based on feedback

## Timeline

- **Phase 1 (Module Creation)**: 1-2 weeks
- **Phase 2 (Memory Services)**: 3-4 weeks (dependent on the memory service spec)
- **Phase 3 (Memory Writing)**: 2-3 weeks
- **Phase 4 (Conversation Lifecycle)**: 1-2 weeks
- **Total**: 7-11 weeks

## Open Questions

- **Embedding Model**: Should we use the same embedding model as existing graph RAG, or introduce a dedicated memory embedding model?
- **Fact Extraction Trigger**: Should facts be auto-extracted via an LLM pass, or should explicit StoreFact tool usage be required?
- **Conversation End Detection**: How do we reliably detect conversation end (timeout, explicit signal, heuristic)?
- **Memory Cleanup**: What retention policies should apply to facts, episodes, and conversation memory?
- **Cross-conversation Learning**: Should episodic memory span conversations, or be scoped per conversation?
- **Working Memory Compression**: Summarization (LLM-based) or truncation (rule-based)?
- **Memory Search UX**: Should memory retrieval be visible to users, or completely transparent?
- **Backward Compatibility**: Should the `react` service gain the `conversation_id` field for potential future memory support?

## References

- Existing react agent implementation: `trustgraph-flow/trustgraph/agent/react/`
- AgentService base class: `trustgraph-base/trustgraph/base/agent_service.py`
- Agent schemas: `trustgraph-base/trustgraph/schema/services/agent.py`
- Similar memory-augmented agent architectures: Reflexion, MemGPT, ChatGPT memory features