mirror of https://github.com/trustgraph-ai/trustgraph.git synced 2026-05-18 11:55:12 +02:00

Cyber MacGeddon 5d67e9f9a1 Agentic memory spec

2025-12-03 16:29:47 +00:00

20 KiB


Agentic Memory Technical Specification

Overview

This specification describes the implementation of an agent manager with multi-layered memory capabilities for TrustGraph. The new react_mem module extends the existing ReAct pattern with persistent memory across invocations and conversations, enabling agents to learn from past interactions and maintain context over time.

The implementation supports the following use cases:

Long-term Knowledge Retention: Store and retrieve factual information learned during agent interactions
Episodic Memory: Remember past problem-solving approaches and their outcomes
Conversation Continuity: Maintain context across multiple invocations within a conversation
Working Memory Management: Handle extended reasoning chains with automatic compression
Experience-Based Reasoning: Leverage similar past experiences when addressing new queries

Goals

Drop-in Compatibility: react_mem implements the same AgentService interface as react, allowing users to choose between implementations via configuration
Memory Persistence: Enable agents to retain and retrieve information across invocations and conversations
Scalable Context Management: Handle long-running conversations without unbounded memory growth
Retrieval Efficiency: Use embedding-based retrieval to surface relevant memory efficiently
Incremental Adoption: Deploy alongside existing react module without breaking changes
Transparent Operation: Memory operations should not significantly impact response latency
Configurable Behavior: Allow tuning of memory extraction, compression, and retrieval strategies

Background

Current Architecture

The existing react agent manager (module: trustgraph-flow/trustgraph/agent/react/) implements a stateless ReAct (Reasoning and Acting) loop:

Receives AgentRequest with question and history
Reasons about the problem using an LLM
Takes action by invoking tools
Returns AgentResponse with thought, observation, or final answer
Maintains working memory only for the current invocation, via the history field

Current limitations include:

No Long-term Memory: Each invocation starts fresh with no access to prior conversations
Limited Context Window: History is bounded by max iterations and token limits
No Learning: Agent cannot benefit from past problem-solving experiences
Conversation Fragmentation: No mechanism to connect related invocations across a conversation
Context Loss: Important information from early steps may be lost as history grows

This specification addresses these gaps by introducing a four-tier memory architecture. By maintaining long-term facts, episodic memories, conversation summaries, and managed working memory, TrustGraph agents can:

Build and leverage a persistent knowledge base
Learn from past successes and failures
Maintain coherent context across extended conversations
Handle complex multi-step reasoning without context overflow

Technical Design

Architecture

The agentic memory system requires the following technical components:

ReactMemAgentManager
- Extends AgentManager with memory-aware reasoning loop
- Orchestrates memory retrieval at invocation start
- Manages working memory compression during execution
- Triggers memory extraction at invocation end
Module: trustgraph-flow/trustgraph/agent/react_mem/agent_manager.py
ReactMemService
- Implements AgentService interface (drop-in replacement for react.Processor)
- Manages conversation-level memory lifecycle
- Handles configuration of memory services and policies
- Coordinates memory promotion at conversation end
Module: trustgraph-flow/trustgraph/agent/react_mem/service.py
Memory Services (Implementation Details: TBD)
- Long-term Facts Service
- Episodic Memory Service
- Conversation Memory Service
- Working Memory Management Service
Module: trustgraph-flow/trustgraph/agent/react_mem/memory/ (TBD)
Memory Schema Definitions
- Data structures for facts, episodes, conversation records
- Request/response schemas for memory operations
Module: trustgraph-base/trustgraph/schema/agent_memory.py

Memory Architecture

Four-Tier Memory System

Long-term Facts

Purpose: Persistent knowledge graph of facts learned across all conversations
Scope: Global per user/collection
Retrieval: Embedding-based semantic search on user query
Lifetime: Permanent until explicitly deleted
Storage: Graph database (existing TrustGraph knowledge store)

Episodic Memory

Purpose: Records of past problem-solving episodes with outcomes
Scope: Global per user/collection
Retrieval: Embedding-based similarity search on current goal/context
Lifetime: Permanent with optional TTL or relevance-based pruning
Storage: Vector store with structured metadata

Conversation Memory

Purpose: Summaries and key points from prior invocations in current conversation
Scope: Current conversation only
Retrieval: Sequential access (not searched)
Lifetime: Duration of conversation
Storage: In-memory with optional persistence for long conversations

Working Memory

Purpose: Reasoning trace for current invocation (thoughts, actions, observations)
Scope: Current invocation only
Retrieval: Directly included in LLM context
Lifetime: Current invocation (discarded after summarization)
Storage: In-memory list

Data Models

Memory Records

Fact Record

Fact:
  - id: string (unique identifier)
  - content: string (fact statement)
  - source: string (conversation_id where learned)
  - timestamp: datetime
  - embedding: vector
  - metadata: dict

Episode Record

Episode:
  - id: string (unique identifier)
  - goal: string (what was being attempted)
  - steps: list[string] (key actions taken)
  - outcome: string (result achieved)
  - lessons: string (insights/learnings)
  - timestamp: datetime
  - embedding: vector
  - metadata: dict

Conversation Record

ConversationMemory:
  - conversation_id: string
  - invocations: list[InvocationSummary]
  - metadata: dict

InvocationSummary:
  - query: string
  - summary: string (what agent did)
  - result: string (outcome)
  - timestamp: datetime

Working Memory Item

WorkingMemoryItem:
  - type: enum[thought, action, observation]
  - content: string
  - timestamp: datetime
  - metadata: dict (e.g., token_count)
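As a rough illustration, the record layouts above could map onto Python dataclasses along these lines. This is a sketch only: the choice of dataclasses, the `new_id` and `utc_now` helpers, the `ItemType` enum name, and all default values are assumptions made for the example, and the actual schema module may use a different mechanism. It assumes Python 3.9 or later.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
import uuid


def new_id() -> str:
    # Hypothetical helper: any unique-identifier scheme would do.
    return str(uuid.uuid4())


def utc_now() -> datetime:
    # Hypothetical helper: timezone-aware creation timestamp.
    return datetime.now(timezone.utc)


@dataclass
class Fact:
    content: str                      # fact statement
    source: str                       # conversation_id where learned
    id: str = field(default_factory=new_id)
    timestamp: datetime = field(default_factory=utc_now)
    embedding: list[float] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


@dataclass
class Episode:
    goal: str                         # what was being attempted
    steps: list[str]                  # key actions taken
    outcome: str                      # result achieved
    lessons: str                      # insights/learnings
    id: str = field(default_factory=new_id)
    timestamp: datetime = field(default_factory=utc_now)
    embedding: list[float] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


@dataclass
class InvocationSummary:
    query: str
    summary: str                      # what the agent did
    result: str                       # outcome
    timestamp: datetime = field(default_factory=utc_now)


@dataclass
class ConversationMemory:
    conversation_id: str
    invocations: list[InvocationSummary] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


class ItemType(Enum):
    THOUGHT = "thought"
    ACTION = "action"
    OBSERVATION = "observation"


@dataclass
class WorkingMemoryItem:
    type: ItemType
    content: str
    timestamp: datetime = field(default_factory=utc_now)
    metadata: dict = field(default_factory=dict)  # e.g., token_count
```

Using `default_factory` for the list and dict fields avoids sharing one mutable default between records.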

Memory Operations Lifecycle

Invocation Start: Context Assembly

When ReactMemService receives an AgentRequest:

Retrieve Long-term Facts
- Embed user query
- Query facts store with embedding
- Retrieve top-k relevant facts (configurable k)
Retrieve Episodic Memory
- Embed user query + current state
- Query episodes store with embedding
- Retrieve top-k similar past episodes
Load Conversation Memory
- Fetch conversation record by conversation_id
- Load all prior invocation summaries for this conversation
Initialize Working Memory
- Create empty working memory buffer
- Optionally seed with high-level goal

These retrieved memories are assembled into the context passed to ReactMemAgentManager.reason().
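The context-assembly steps above could be sketched as follows. The `MemoryContext` container, the `assemble_context` function, and the four callables passed into it are hypothetical stand-ins for the memory service clients, which are still TBD. For brevity the sketch reuses a single query embedding for both fact and episode retrieval, whereas the steps above embed the query plus current state for episodes.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class MemoryContext:
    facts: list = field(default_factory=list)
    episodes: list = field(default_factory=list)
    conversation: list = field(default_factory=list)
    working_memory: list = field(default_factory=list)  # starts empty


async def assemble_context(question, conversation_id, embed,
                           fetch_facts, fetch_episodes, fetch_conversation,
                           fact_k=5, episode_k=3):
    # Embed the user query once (simplification, see lead-in).
    vector = await embed(question)

    # The three lookups are independent of each other, so they can
    # run concurrently rather than one after another.
    facts, episodes, conversation = await asyncio.gather(
        fetch_facts(vector, fact_k),
        fetch_episodes(vector, episode_k),
        fetch_conversation(conversation_id),
    )

    return MemoryContext(
        facts=facts, episodes=episodes, conversation=conversation
    )
```

Running the lookups with `asyncio.gather` means the added latency is roughly that of the slowest lookup rather than the sum of all three.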

Core Loop: Memory-Aware Reasoning

The reasoning loop proceeds similarly to standard ReAct, but with augmented context:

Assemble Context
- System prompt / agent persona
- Long-term facts (from retrieval)
- Relevant episodes (from retrieval)
- Conversation memory (loaded summaries)
- Working memory (current invocation trace)
- Current user query
Reason
- LLM generates thought and decision
- Returns Action (tool call) or Final (answer)
Act (if Action)
- Execute tool
- Get observation result
Update Working Memory
- Append thought, action, observation to working memory
- Check working memory size
- If it exceeds the threshold: compress older entries, preserve recent ones
Loop until Final answer or max iterations
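A stripped-down version of this loop, with the LLM and tools replaced by plain callables, might look like the sketch below. The `run_loop` name, the dictionary shape returned by `decide`, and the choice to raise an error when the iteration limit is reached are all assumptions for illustration; context assembly and compression are left out to keep the control flow visible.

```python
def run_loop(decide, tools, question, max_iterations=10):
    """Run a ReAct-style loop.

    decide(question, working_memory) is a hypothetical stand-in for the
    LLM call. It returns either
      {"thought": ..., "tool": name, "input": value}   (an Action), or
      {"thought": ..., "final": answer}                (a Final answer).
    """
    working_memory = []  # list of (type, content) tuples

    for _ in range(max_iterations):
        decision = decide(question, working_memory)

        if "final" in decision:
            return decision["final"], working_memory

        # Act: execute the chosen tool and capture its observation.
        observation = tools[decision["tool"]](decision["input"])

        # Update working memory with the full step.
        working_memory.append(("thought", decision["thought"]))
        working_memory.append(
            ("action", f"{decision['tool']}({decision['input']!r})")
        )
        working_memory.append(("observation", str(observation)))

    # Assumption: the behavior at the iteration limit is not specified here.
    raise RuntimeError("max iterations reached without a final answer")
```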

Working Memory Compression Trigger: When the token count of working memory exceeds the configured threshold (e.g., 50% of the context window), invoke summarization:

Keep most recent N steps verbatim
Summarize older steps into condensed form
Preserve critical information (tool results, key decisions)
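One possible shape for this compression step is sketched below. The function names are hypothetical, `count_tokens` is a crude whitespace-based stand-in for a real tokenizer, and `summarize` stands in for whichever strategy is eventually chosen (LLM summarization or rule-based truncation). Preserving critical information such as tool results is left to the `summarize` callable.

```python
def count_tokens(text):
    # Crude stand-in: counts whitespace-separated words, not model tokens.
    return len(text.split())


def compress_working_memory(items, threshold, keep_recent, summarize):
    """Collapse older entries into one summary once the threshold is exceeded.

    items:       list of strings, oldest first
    threshold:   token budget that triggers compression
    keep_recent: number of most recent entries to keep verbatim
    summarize:   callable taking the older entries, returning a string
    """
    total = sum(count_tokens(item) for item in items)
    if total <= threshold or len(items) <= keep_recent:
        return items  # nothing to compress

    # Split by index rather than items[:-keep_recent], which would
    # misbehave when keep_recent is 0.
    split = len(items) - keep_recent
    older, recent = items[:split], items[split:]

    return ["[summary] " + summarize(older)] + recent
```

For example, with a threshold of 5 tokens and `keep_recent=2`, a four-entry trace is reduced to one summary entry followed by the last two entries unchanged.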

Invocation End: Memory Extraction

After sending the final response:

Conversation Memory Update
- Generate invocation summary: "User asked X, agent did Y, result was Z"
- Append summary to conversation memory
Fact Extraction (Optional/Conditional)
- LLM pass: "Did the agent learn anything worth persisting?"
- If yes: Extract fact statements, store to long-term facts
- Alternative: Agent explicitly calls StoreFact tool during reasoning
Episode Recording (If Task-Like)
- If invocation resembled a task (multi-step problem solving):
  - Extract: goal, key steps, outcome, lessons learned
  - Generate embedding for retrieval
  - Store to episodic memory
Discard Working Memory
- Working memory cleared (already summarized)
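The end-of-invocation steps could be wired together roughly as follows. Everything here is illustrative: the function names are hypothetical, the summary is built from a fixed template rather than an LLM pass, and the "task-like" test (at least two actions in the trace) is an assumed heuristic, not one defined by this specification. Fact extraction and embedding generation are omitted.

```python
def summarize_invocation(query, trace, result):
    # Template-based stand-in for an LLM-generated invocation summary.
    actions = [content for kind, content in trace if kind == "action"]
    did = ", ".join(actions) if actions else "answered directly"
    return {
        "query": query,
        "summary": f"User asked {query!r}, agent did: {did}",
        "result": result,
    }


def is_task_like(trace, min_actions=2):
    # Assumed heuristic: multi-step problem solving means several actions.
    return sum(1 for kind, _ in trace if kind == "action") >= min_actions


def end_invocation(conversation, query, trace, result):
    """Update conversation memory, optionally build an episode, clear the trace.

    conversation: list of invocation summaries (mutated in place)
    trace:        working memory as (type, content) tuples (cleared in place)
    Returns an episode dict for task-like invocations, otherwise None.
    """
    conversation.append(summarize_invocation(query, trace, result))

    episode = None
    if is_task_like(trace):
        episode = {
            "goal": query,
            "steps": [c for kind, c in trace if kind == "action"],
            "outcome": result,
            "lessons": "",  # would be filled by a separate extraction pass
        }

    # Working memory is discarded only after it has been summarized.
    trace.clear()
    return episode
```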

Conversation End: Memory Promotion

When the conversation concludes (explicit signal or timeout):

Promote Facts to Long-term
- Scan conversation memory for high-value facts
- Store to persistent knowledge graph
- Update graph embeddings for retrieval
Record Conversation Episode (Optional)
- If conversation had overarching theme/goal:
  - Summarize entire conversation as high-level episode
  - Store to episodic memory

APIs

New Memory Service APIs

Fact Retrieval

RetrieveFactsRequest:
  - query: string (search query)
  - embedding: vector (pre-computed)
  - top_k: int (default: 5)
  - user: string
  - collection: string

RetrieveFactsResponse:
  - facts: list[Fact]
  - error: Error

Episode Retrieval

RetrieveEpisodesRequest:
  - query: string
  - embedding: vector
  - top_k: int (default: 3)
  - user: string
  - collection: string

RetrieveEpisodesResponse:
  - episodes: list[Episode]
  - error: Error
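To make the retrieval semantics concrete, here is a minimal in-memory sketch of embedding-based top-k lookup scoped to a user and collection. It is not the planned backend (which is TBD): the `retrieve_top_k` function, the record dictionaries, and the brute-force cosine similarity scan are assumptions chosen to keep the example self-contained.

```python
import math


def cosine(a, b):
    # Cosine similarity of two equal-length vectors; 0.0 if either is zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(records, embedding, user, collection, top_k=5):
    """Return the top_k records most similar to the query embedding.

    Records belonging to other users or collections are filtered out
    before ranking, so they can never appear in the result.
    """
    scoped = [
        r for r in records
        if r["user"] == user and r["collection"] == collection
    ]
    scoped.sort(key=lambda r: cosine(r["embedding"], embedding), reverse=True)
    return scoped[:top_k]
```

Filtering by user and collection before ranking, rather than after, keeps the isolation guarantee independent of the similarity scores.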

Fact Storage

StoreFactRequest:
  - content: string
  - source: string (conversation_id)
  - user: string
  - collection: string
  - metadata: dict

StoreFactResponse:
  - fact_id: string
  - error: Error

Episode Storage

StoreEpisodeRequest:
  - goal: string
  - steps: list[string]
  - outcome: string
  - lessons: string
  - user: string
  - collection: string
  - metadata: dict

StoreEpisodeResponse:
  - episode_id: string
  - error: Error

Conversation Memory Management

GetConversationMemoryRequest:
  - conversation_id: string
  - user: string

GetConversationMemoryResponse:
  - conversation: ConversationMemory
  - error: Error

UpdateConversationMemoryRequest:
  - conversation_id: string
  - invocation_summary: InvocationSummary
  - user: string

UpdateConversationMemoryResponse:
  - success: boolean
  - error: Error
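A minimal in-memory stand-in for the conversation memory operations might look like this. The `ConversationStore` class and its method names are hypothetical, and the `purge` method is an extra included only to illustrate the purge option discussed under Security Considerations; the real service would sit behind the request/response schemas above.

```python
class ConversationStore:
    """In-memory conversation memory keyed by (user, conversation_id)."""

    def __init__(self):
        self._data = {}

    def get(self, user, conversation_id):
        # Return a copy so callers cannot mutate stored state by accident.
        return list(self._data.get((user, conversation_id), []))

    def update(self, user, conversation_id, invocation_summary):
        # Append one invocation summary, creating the record if needed.
        key = (user, conversation_id)
        self._data.setdefault(key, []).append(invocation_summary)
        return True

    def purge(self, user, conversation_id):
        # Remove a conversation; returns whether anything was deleted.
        return self._data.pop((user, conversation_id), None) is not None
```

Including the user in the key means two users who happen to reuse the same conversation_id never see each other's summaries.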

Modified Agent APIs

AgentRequest (Extended)

AgentRequest:
  # Existing fields
  - question: string
  - history: list[AgentStep]
  - user: string
  - streaming: boolean
  - state: string
  - group: list[string]

  # New fields for memory
  - conversation_id: string (identifies conversation for memory retrieval)
  - enable_memory: boolean (default: false, set true for react_mem)

AgentResponse (No changes required)

Existing schema supports memory-enabled agents
Memory operations are transparent to client

Implementation Details

Service Configuration

The react_mem service will be configured similarly to react, with additional memory-related parameters:

Configuration key: agent-mem (distinct from agent used by react)

Configuration parameters:

max-iterations: Maximum ReAct loop iterations
working-memory-threshold: Token count trigger for compression (default: 2000)
fact-retrieval-top-k: Number of facts to retrieve (default: 5)
episode-retrieval-top-k: Number of episodes to retrieve (default: 3)
enable-auto-fact-extraction: Auto-extract facts at invocation end (default: true)
enable-episode-recording: Auto-record episodes (default: true)
additional-context: Additional system context (same as react)
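As an illustration of how these parameters and their defaults might be loaded, consider the sketch below. The `AgentMemConfig` class and `load_config` function are hypothetical, and the default of 10 for max-iterations is an assumption, since no default is given above. The other defaults are the ones listed.

```python
from dataclasses import dataclass, fields


@dataclass
class AgentMemConfig:
    max_iterations: int = 10               # assumed default, not specified
    working_memory_threshold: int = 2000
    fact_retrieval_top_k: int = 5
    episode_retrieval_top_k: int = 3
    enable_auto_fact_extraction: bool = True
    enable_episode_recording: bool = True
    additional_context: str = ""


def load_config(raw):
    """Build a config from a dict using the hyphenated parameter names."""
    known = {f.name for f in fields(AgentMemConfig)}
    kwargs = {}
    for key, value in raw.items():
        name = key.replace("-", "_")
        if name not in known:
            # Fail loudly on typos rather than silently ignoring them.
            raise ValueError(f"unknown agent-mem parameter: {key}")
        kwargs[name] = value
    return AgentMemConfig(**kwargs)
```

For example, `load_config({"fact-retrieval-top-k": 8})` overrides one value and leaves the remaining parameters at their defaults.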

Memory Service Specifications

Memory service implementation details are TBD and will be defined in a separate technical specification.

The memory services must implement the following interfaces:

Long-term Facts Service
- Interface: Fact storage, retrieval, deletion
- Storage backend: TBD (likely existing graph store + vector index)
- Embedding model: TBD (consistency with existing embeddings)
Episodic Memory Service
- Interface: Episode storage, retrieval, search
- Storage backend: TBD (vector store + structured metadata store)
- Retention policy: TBD
Conversation Memory Service
- Interface: CRUD operations on conversation records
- Storage backend: TBD (in-memory with optional persistence)
- TTL/cleanup policy: TBD
Working Memory Manager
- Interface: Append, compress, retrieve working memory
- Compression strategy: TBD (summarization vs. truncation)
- Implementation: In-memory only

These services will be registered as client specifications similar to existing services (GraphRagClientSpec, ToolClientSpec, etc.).

Prompt Engineering

The memory-aware prompts will need to incorporate retrieved context effectively:

Prompt structure (conceptual):

System: You are an agent with access to persistent memory...

Long-term Facts:
- [Retrieved fact 1]
- [Retrieved fact 2]
...

Relevant Past Episodes:
- Episode 1: [summary]
- Episode 2: [summary]
...

Conversation History:
- Previous invocation 1: [summary]
- Previous invocation 2: [summary]
...

Current Task:
Working Memory:
- [Current steps taken]

Question: [User query]

Think step by step and decide on next action or provide final answer.

Prompt templates will be defined in configuration, similar to existing agent prompts.
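The conceptual structure could be assembled by a small helper like the one below. This is a sketch only: the `build_prompt` function is hypothetical, the "(none)" placeholder for empty sections is an assumption, and in practice the text would come from configured prompt templates rather than being hard-coded.

```python
def build_prompt(facts, episodes, history, working_memory, question):
    """Assemble a memory-aware prompt from lists of strings."""

    def section(title, lines):
        # Render a titled bullet list; mark empty sections explicitly.
        body = "\n".join(f"- {line}" for line in lines) or "- (none)"
        return f"{title}:\n{body}"

    parts = [
        "System: You are an agent with access to persistent memory...",
        section("Long-term Facts", facts),
        section("Relevant Past Episodes", episodes),
        section("Conversation History", history),
        "Current Task:\n" + section("Working Memory", working_memory),
        f"Question: {question}",
        "Think step by step and decide on next action or provide final answer.",
    ]
    return "\n\n".join(parts)
```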

Tool Extensions (Optional Refinement)

To support mid-invocation retrieval (optional future enhancement):

RetrieveFacts Tool

Description: "Retrieve additional facts from knowledge base"
Arguments: query (string)
Implementation: Calls fact retrieval service

RecallEpisodes Tool

Description: "Search for similar past problem-solving episodes"
Arguments: query (string)
Implementation: Calls episode retrieval service

StoreFact Tool

Description: "Explicitly store a fact for future reference"
Arguments: fact_content (string)
Implementation: Calls fact storage service

These tools would be optional and configured per deployment.

Client Specifications

The react_mem service will register the following client specifications:

Existing (inherited from react):

TextCompletionClientSpec
GraphRagClientSpec
PromptClientSpec
ToolClientSpec
StructuredQueryClientSpec

New (memory-specific):

FactRetrievalClientSpec
EpisodeRetrievalClientSpec
FactStorageClientSpec
EpisodeStorageClientSpec
ConversationMemoryClientSpec

Security Considerations

Memory Isolation

All memory operations must be scoped to user and collection
Cross-user memory leakage must be prevented
Fact and episode retrieval must respect access controls

PII and Sensitive Information

Fact extraction should avoid storing sensitive personal information
Conversation memory may contain PII and must be handled accordingly
Memory retention policies should align with privacy requirements
Provide an option to disable memory or purge memory for specific users/conversations

Embedding Security

Embeddings may leak information through similarity searches
Consider privacy-preserving embedding techniques for sensitive deployments

Resource Limits

Prevent unbounded memory growth through:
- Per-user/collection memory quotas
- Time-based expiration of old memories
- Relevance-based pruning of unused memories
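Two of these limits, time-based expiration and a per-user quota, could be combined as in the following sketch. The `prune` function and the `last_used` field are assumptions (the record definitions above carry only a creation timestamp), and "least recently used" is used here as a simple proxy for relevance.

```python
from datetime import datetime, timedelta, timezone


def prune(records, now, max_age, quota):
    """Drop expired records, then keep at most `quota` of the rest.

    records: list of dicts with "timestamp" and "last_used" datetimes
    max_age: timedelta after which a record expires
    quota:   maximum number of records to retain
    """
    # Time-based expiration, measured from creation time.
    fresh = [r for r in records if now - r["timestamp"] <= max_age]

    # Quota enforcement: keep the most recently used records.
    fresh.sort(key=lambda r: r["last_used"], reverse=True)
    return fresh[:quota]
```

A background cleanup job could run a function like this per user and collection, deleting whatever is not returned.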

Performance Considerations

Retrieval Latency

Embedding generation adds latency at invocation start
Vector searches for facts and episodes may add 50-200ms per retrieval
Mitigation: Parallel retrieval of facts and episodes, caching embeddings

Context Window Usage

Retrieved memories consume context window
Must balance memory breadth vs. depth
Working memory compression necessary to avoid overflow

Storage Overhead

Each conversation generates conversation memory records
Each invocation may create fact and episode records
Mitigation: Background cleanup jobs, retention policies

Scaling

Memory stores must scale with user base
Vector search performance critical for large fact/episode databases
Consider sharding strategies for multi-tenant deployments

Testing Strategy

Unit Tests

Memory service interfaces (mocked backends)
Working memory compression logic
Prompt assembly with memory context
Fact/episode extraction logic

Integration Tests

End-to-end agent invocation with memory enabled
Memory retrieval and storage flows
Conversation continuity across invocations
Working memory compression triggers

Performance Tests

Latency impact of memory retrieval
Context window utilization with varying memory sizes
Large-scale memory retrieval (1000s of facts/episodes)

Correctness Tests

Verify facts are correctly stored and retrieved
Verify episodes improve problem-solving on similar tasks
Verify conversation memory maintains coherence

Migration Plan

Phase 1: Module Creation

Create react_mem directory structure
Implement ReactMemService and ReactMemAgentManager shells
No memory operations yet (functionally equivalent to react)
Deploy and verify drop-in compatibility

Phase 2: Memory Services

Implement memory service interfaces and backends
Add memory retrieval at invocation start (read-only)
Deploy and verify retrieval performance

Phase 3: Memory Writing

Enable conversation memory updates
Enable fact and episode extraction
Deploy with memory writing enabled

Phase 4: Conversation Lifecycle

Implement conversation-end memory promotion
Add cleanup and retention policies
Full agentic memory capabilities enabled

Rollout Strategy

Deploy react_mem alongside react (not as replacement)
Users opt-in by changing configuration to use react_mem
Monitor performance and memory growth
Gradually migrate users based on feedback

Timeline

Phase 1 (Module Creation): 1-2 weeks
Phase 2 (Memory Services): 3-4 weeks (dependent on memory service spec)
Phase 3 (Memory Writing): 2-3 weeks
Phase 4 (Conversation Lifecycle): 1-2 weeks
Total: 7-11 weeks

Open Questions

Embedding Model: Should we use the same embedding model as existing graph RAG, or introduce a dedicated memory embedding model?
Fact Extraction Trigger: Auto-extract via LLM pass, or require explicit StoreFact tool usage?
Conversation End Detection: How do we reliably detect conversation end (timeout, explicit signal, heuristic)?
Memory Cleanup: What retention policies should apply to facts, episodes, and conversation memory?
Cross-conversation Learning: Should episodic memory span conversations, or be scoped per conversation?
Working Memory Compression: Summarization (LLM-based) or truncation (rule-based)?
Memory Search UX: Should memory retrieval be visible to users, or completely transparent?
Backward Compatibility: Should react service gain conversation_id field for potential future memory support?

References

Existing react agent implementation: trustgraph-flow/trustgraph/agent/react/
AgentService base class: trustgraph-base/trustgraph/base/agent_service.py
Agent schemas: trustgraph-base/trustgraph/schema/services/agent.py
Similar memory-augmented agent architectures: Reflexion, MemGPT, ChatGPT memory features
