trustgraph/docs/tech-specs/agentic-memory.md
2025-12-03 16:29:47 +00:00

20 KiB

Agentic Memory Technical Specification

Overview

This specification describes the implementation of an agent manager with multi-layered memory capabilities for TrustGraph. The new react_mem module extends the existing ReAct pattern with persistent memory across invocations and conversations, enabling agents to learn from past interactions and maintain context over time.

The implementation supports the following use cases:

  1. Long-term Knowledge Retention: Store and retrieve factual information learned during agent interactions
  2. Episodic Memory: Remember past problem-solving approaches and their outcomes
  3. Conversation Continuity: Maintain context across multiple invocations within a conversation
  4. Working Memory Management: Handle extended reasoning chains with automatic compression
  5. Experience-Based Reasoning: Leverage similar past experiences when addressing new queries

Goals

  • Drop-in Compatibility: react_mem implements the same AgentService interface as react, allowing users to choose between implementations via configuration
  • Memory Persistence: Enable agents to retain and retrieve information across invocations and conversations
  • Scalable Context Management: Handle long-running conversations without unbounded memory growth
  • Retrieval Efficiency: Use embedding-based retrieval to surface relevant memory efficiently
  • Incremental Adoption: Deploy alongside existing react module without breaking changes
  • Transparent Operation: Memory operations should not significantly impact response latency
  • Configurable Behavior: Allow tuning of memory extraction, compression, and retrieval strategies

Background

Current Architecture

The existing react agent manager (module: trustgraph-flow/trustgraph/agent/react/) implements a stateless ReAct (Reasoning and Acting) loop:

  1. Receives AgentRequest with question and history
  2. Reasons about the problem using LLM
  3. Takes action by invoking tools
  4. Returns AgentResponse with thought, observation, or final answer
  5. Maintains working memory only for the current invocation via history field

Current limitations include:

  • No Long-term Memory: Each invocation starts fresh with no access to prior conversations
  • Limited Context Window: History is bounded by max iterations and token limits
  • No Learning: Agent cannot benefit from past problem-solving experiences
  • Conversation Fragmentation: No mechanism to connect related invocations across a conversation
  • Context Loss: Important information from early steps may be lost as history grows

This specification addresses these gaps by introducing a four-tier memory architecture. By maintaining long-term facts, episodic memories, conversation summaries, and managed working memory, TrustGraph agents can:

  • Build and leverage a persistent knowledge base
  • Learn from past successes and failures
  • Maintain coherent context across extended conversations
  • Handle complex multi-step reasoning without context overflow

Technical Design

Architecture

The agentic memory system requires the following technical components:

  1. ReactMemAgentManager

    • Extends AgentManager with memory-aware reasoning loop
    • Orchestrates memory retrieval at invocation start
    • Manages working memory compression during execution
    • Triggers memory extraction at invocation end

    Module: trustgraph-flow/trustgraph/agent/react_mem/agent_manager.py

  2. ReactMemService

    • Implements AgentService interface (drop-in replacement for react.Processor)
    • Manages conversation-level memory lifecycle
    • Handles configuration of memory services and policies
    • Coordinates memory promotion at conversation end

    Module: trustgraph-flow/trustgraph/agent/react_mem/service.py

  3. Memory Services (Implementation Details: TBD)

    • Long-term Facts Service
    • Episodic Memory Service
    • Conversation Memory Service
    • Working Memory Management Service

    Module: trustgraph-flow/trustgraph/agent/react_mem/memory/ (TBD)

  4. Memory Schema Definitions

    • Data structures for facts, episodes, conversation records
    • Request/response schemas for memory operations

    Module: trustgraph-base/trustgraph/schema/agent_memory.py

Memory Architecture

Four-Tier Memory System

Long-term Facts

  • Purpose: Persistent knowledge graph of facts learned across all conversations
  • Scope: Global per user/collection
  • Retrieval: Embedding-based semantic search on user query
  • Lifetime: Permanent until explicitly deleted
  • Storage: Graph database (existing TrustGraph knowledge store)

Episodic Memory

  • Purpose: Records of past problem-solving episodes with outcomes
  • Scope: Global per user/collection
  • Retrieval: Embedding-based similarity search on current goal/context
  • Lifetime: Permanent with optional TTL or relevance-based pruning
  • Storage: Vector store with structured metadata

Conversation Memory

  • Purpose: Summaries and key points from prior invocations in current conversation
  • Scope: Current conversation only
  • Retrieval: Sequential access (not searched)
  • Lifetime: Duration of conversation
  • Storage: In-memory with optional persistence for long conversations

Working Memory

  • Purpose: Reasoning trace for current invocation (thoughts, actions, observations)
  • Scope: Current invocation only
  • Retrieval: Directly included in LLM context
  • Lifetime: Current invocation (discarded after summarization)
  • Storage: In-memory list

Data Models

Memory Records

Fact Record

Fact:
  - id: string (unique identifier)
  - content: string (fact statement)
  - source: string (conversation_id where learned)
  - timestamp: datetime
  - embedding: vector
  - metadata: dict

Episode Record

Episode:
  - id: string (unique identifier)
  - goal: string (what was being attempted)
  - steps: list[string] (key actions taken)
  - outcome: string (result achieved)
  - lessons: string (insights/learnings)
  - timestamp: datetime
  - embedding: vector
  - metadata: dict

Conversation Record

ConversationMemory:
  - conversation_id: string
  - invocations: list[InvocationSummary]
  - metadata: dict

InvocationSummary:
  - query: string
  - summary: string (what agent did)
  - result: string (outcome)
  - timestamp: datetime

Working Memory Item

WorkingMemoryItem:
  - type: enum[thought, action, observation]
  - content: string
  - timestamp: datetime
  - metadata: dict (e.g., token_count)

Memory Operations Lifecycle

Invocation Start: Context Assembly

When ReactMemService receives an AgentRequest:

  1. Retrieve Long-term Facts

    • Embed user query
    • Query facts store with embedding
    • Retrieve top-k relevant facts (configurable k)
  2. Retrieve Episodic Memory

    • Embed user query + current state
    • Query episodes store with embedding
    • Retrieve top-k similar past episodes
  3. Load Conversation Memory

    • Fetch conversation record by conversation_id
    • Load all prior invocation summaries for this conversation
  4. Initialize Working Memory

    • Create empty working memory buffer
    • Optionally seed with high-level goal

These retrieved memories are assembled into the context passed to ReactMemAgentManager.reason().

Core Loop: Memory-Aware Reasoning

The reasoning loop proceeds similarly to standard ReAct, but with augmented context:

  1. Assemble Context

    • System prompt / agent persona
    • Long-term facts (from retrieval)
    • Relevant episodes (from retrieval)
    • Conversation memory (loaded summaries)
    • Working memory (current invocation trace)
    • Current user query
  2. Reason

    • LLM generates thought and decision
    • Returns Action (tool call) or Final (answer)
  3. Act (if Action)

    • Execute tool
    • Get observation result
  4. Update Working Memory

    • Append thought, action, observation to working memory
    • Check working memory size
    • If exceeds threshold: compress older entries, preserve recent ones
  5. Loop until Final answer or max iterations

Working Memory Compression Trigger: When token count of working memory exceeds threshold (e.g., 50% of context window), invoke summarization:

  • Keep most recent N steps verbatim
  • Summarize older steps into condensed form
  • Preserve critical information (tool results, key decisions)

Invocation End: Memory Extraction

After sending final response:

  1. Conversation Memory Update

    • Generate invocation summary: "User asked X, agent did Y, result was Z"
    • Append summary to conversation memory
  2. Fact Extraction (Optional/Conditional)

    • LLM pass: "Did the agent learn anything worth persisting?"
    • If yes: Extract fact statements, store to long-term facts
    • Alternative: Agent explicitly calls StoreFact tool during reasoning
  3. Episode Recording (If Task-Like)

    • If invocation resembled a task (multi-step problem solving):
      • Extract: goal, key steps, outcome, lessons learned
      • Generate embedding for retrieval
      • Store to episodic memory
  4. Discard Working Memory

    • Working memory cleared (already summarized)

Conversation End: Memory Promotion

When conversation concludes (explicit signal or timeout):

  1. Promote Facts to Long-term

    • Scan conversation memory for high-value facts
    • Store to persistent knowledge graph
    • Update graph embeddings for retrieval
  2. Record Conversation Episode (Optional)

    • If conversation had overarching theme/goal:
      • Summarize entire conversation as high-level episode
      • Store to episodic memory

APIs

New Memory Service APIs

Fact Retrieval

RetrieveFactsRequest:
  - query: string (search query)
  - embedding: vector (pre-computed)
  - top_k: int (default: 5)
  - user: string
  - collection: string

RetrieveFactsResponse:
  - facts: list[Fact]
  - error: Error

Episode Retrieval

RetrieveEpisodesRequest:
  - query: string
  - embedding: vector
  - top_k: int (default: 3)
  - user: string
  - collection: string

RetrieveEpisodesResponse:
  - episodes: list[Episode]
  - error: Error

Fact Storage

StoreFactRequest:
  - content: string
  - source: string (conversation_id)
  - user: string
  - collection: string
  - metadata: dict

StoreFactResponse:
  - fact_id: string
  - error: Error

Episode Storage

StoreEpisodeRequest:
  - goal: string
  - steps: list[string]
  - outcome: string
  - lessons: string
  - user: string
  - collection: string
  - metadata: dict

StoreEpisodeResponse:
  - episode_id: string
  - error: Error

Conversation Memory Management

GetConversationMemoryRequest:
  - conversation_id: string
  - user: string

GetConversationMemoryResponse:
  - conversation: ConversationMemory
  - error: Error

UpdateConversationMemoryRequest:
  - conversation_id: string
  - invocation_summary: InvocationSummary
  - user: string

UpdateConversationMemoryResponse:
  - success: boolean
  - error: Error

Modified Agent APIs

AgentRequest (Extended)

AgentRequest:
  # Existing fields
  - question: string
  - history: list[AgentStep]
  - user: string
  - streaming: boolean
  - state: string
  - group: list[string]

  # New fields for memory
  - conversation_id: string (identifies conversation for memory retrieval)
  - enable_memory: boolean (default: false, set true for react_mem)

AgentResponse (No changes required)

  • Existing schema supports memory-enabled agents
  • Memory operations are transparent to client

Implementation Details

Service Configuration

The react_mem service will be configured similarly to react, with additional memory-related parameters:

Configuration key: agent-mem (distinct from agent used by react)

Configuration parameters:

  • max-iterations: Maximum ReAct loop iterations
  • working-memory-threshold: Token count trigger for compression (default: 2000)
  • fact-retrieval-top-k: Number of facts to retrieve (default: 5)
  • episode-retrieval-top-k: Number of episodes to retrieve (default: 3)
  • enable-auto-fact-extraction: Auto-extract facts at invocation end (default: true)
  • enable-episode-recording: Auto-record episodes (default: true)
  • additional-context: Additional system context (same as react)

Memory Service Specifications

Memory service implementation details are TBD and will be defined in a separate technical specification.

The memory services must implement the following interfaces:

  1. Long-term Facts Service

    • Interface: Fact storage, retrieval, deletion
    • Storage backend: TBD (likely existing graph store + vector index)
    • Embedding model: TBD (consistency with existing embeddings)
  2. Episodic Memory Service

    • Interface: Episode storage, retrieval, search
    • Storage backend: TBD (vector store + structured metadata store)
    • Retention policy: TBD
  3. Conversation Memory Service

    • Interface: CRUD operations on conversation records
    • Storage backend: TBD (in-memory with optional persistence)
    • TTL/cleanup policy: TBD
  4. Working Memory Manager

    • Interface: Append, compress, retrieve working memory
    • Compression strategy: TBD (summarization vs. truncation)
    • Implementation: In-memory only

These services will be registered as client specifications similar to existing services (GraphRagClientSpec, ToolClientSpec, etc.).

Prompt Engineering

The memory-aware prompts will need to incorporate retrieved context effectively:

Prompt structure (conceptual):

System: You are an agent with access to persistent memory...

Long-term Facts:
- [Retrieved fact 1]
- [Retrieved fact 2]
...

Relevant Past Episodes:
- Episode 1: [summary]
- Episode 2: [summary]
...

Conversation History:
- Previous invocation 1: [summary]
- Previous invocation 2: [summary]
...

Current Task:
Working Memory:
- [Current steps taken]

Question: [User query]

Think step by step and decide on next action or provide final answer.

Prompt templates will be defined in configuration, similar to existing agent prompts.

Tool Extensions (Optional Refinement)

To support mid-invocation retrieval (optional future enhancement):

RetrieveFacts Tool

  • Description: "Retrieve additional facts from knowledge base"
  • Arguments: query (string)
  • Implementation: Calls fact retrieval service

RecallEpisodes Tool

  • Description: "Search for similar past problem-solving episodes"
  • Arguments: query (string)
  • Implementation: Calls episode retrieval service

StoreFact Tool

  • Description: "Explicitly store a fact for future reference"
  • Arguments: fact_content (string)
  • Implementation: Calls fact storage service

These tools would be optional and configured per deployment.

Client Specifications

The react_mem service will register the following client specifications:

Existing (inherited from react):

  • TextCompletionClientSpec
  • GraphRagClientSpec
  • PromptClientSpec
  • ToolClientSpec
  • StructuredQueryClientSpec

New (memory-specific):

  • FactRetrievalClientSpec
  • EpisodeRetrievalClientSpec
  • FactStorageClientSpec
  • EpisodeStorageClientSpec
  • ConversationMemoryClientSpec

Security Considerations

Memory Isolation

  • All memory operations must be scoped to user and collection
  • Cross-user memory leakage must be prevented
  • Fact and episode retrieval must respect access controls

PII and Sensitive Information

  • Fact extraction should avoid storing sensitive personal information
  • Conversation memory may contain PII and must be handled accordingly
  • Memory retention policies should align with privacy requirements
  • Option to disable memory or purge memory for specific users/conversations

Embedding Security

  • Embeddings may leak information through similarity searches
  • Consider privacy-preserving embedding techniques for sensitive deployments

Resource Limits

  • Prevent unbounded memory growth through:
    • Per-user/collection memory quotas
    • Time-based expiration of old memories
    • Relevance-based pruning of unused memories

Performance Considerations

Retrieval Latency

  • Embedding generation adds latency at invocation start
  • Vector searches for facts and episodes may add 50-200ms per retrieval
  • Mitigation: Parallel retrieval of facts and episodes, caching embeddings

Context Window Usage

  • Retrieved memories consume context window
  • Must balance memory breadth vs. depth
  • Working memory compression necessary to avoid overflow

Storage Overhead

  • Each conversation generates conversation memory records
  • Each invocation may create fact and episode records
  • Mitigation: Background cleanup jobs, retention policies

Scaling

  • Memory stores must scale with user base
  • Vector search performance critical for large fact/episode databases
  • Consider sharding strategies for multi-tenant deployments

Testing Strategy

Unit Tests

  • Memory service interfaces (mocked backends)
  • Working memory compression logic
  • Prompt assembly with memory context
  • Fact/episode extraction logic

Integration Tests

  • End-to-end agent invocation with memory enabled
  • Memory retrieval and storage flows
  • Conversation continuity across invocations
  • Working memory compression triggers

Performance Tests

  • Latency impact of memory retrieval
  • Context window utilization with varying memory sizes
  • Large-scale memory retrieval (1000s of facts/episodes)

Correctness Tests

  • Verify facts are correctly stored and retrieved
  • Verify episodes improve problem-solving on similar tasks
  • Verify conversation memory maintains coherence

Migration Plan

Phase 1: Module Creation

  • Create react_mem directory structure
  • Implement ReactMemService and ReactMemAgentManager shells
  • No memory operations yet (functionally equivalent to react)
  • Deploy and verify drop-in compatibility

Phase 2: Memory Services

  • Implement memory service interfaces and backends
  • Add memory retrieval at invocation start (read-only)
  • Deploy and verify retrieval performance

Phase 3: Memory Writing

  • Enable conversation memory updates
  • Enable fact and episode extraction
  • Deploy with memory writing enabled

Phase 4: Conversation Lifecycle

  • Implement conversation-end memory promotion
  • Add cleanup and retention policies
  • Full agentic memory capabilities enabled

Rollout Strategy

  • Deploy react_mem alongside react (not as replacement)
  • Users opt-in by changing configuration to use react_mem
  • Monitor performance and memory growth
  • Gradually migrate users based on feedback

Timeline

  • Phase 1 (Module Creation): 1-2 weeks
  • Phase 2 (Memory Services): 3-4 weeks (dependent on memory service spec)
  • Phase 3 (Memory Writing): 2-3 weeks
  • Phase 4 (Conversation Lifecycle): 1-2 weeks
  • Total: 7-11 weeks

Open Questions

  • Embedding Model: Should we use the same embedding model as existing graph RAG, or introduce a dedicated memory embedding model?
  • Fact Extraction Trigger: Auto-extract via LLM pass, or require explicit StoreFact tool usage?
  • Conversation End Detection: How do we reliably detect conversation end (timeout, explicit signal, heuristic)?
  • Memory Cleanup: What retention policies for facts, episodes, and conversation memory?
  • Cross-conversation Learning: Should episodic memory span conversations, or be scoped per conversation?
  • Working Memory Compression: Summarization (LLM-based) or truncation (rule-based)?
  • Memory Search UX: Should memory retrieval be visible to users, or completely transparent?
  • Backward Compatibility: Should react service gain conversation_id field for potential future memory support?

References

  • Existing react agent implementation: trustgraph-flow/trustgraph/agent/react/
  • AgentService base class: trustgraph-base/trustgraph/base/agent_service.py
  • Agent schemas: trustgraph-base/trustgraph/schema/services/agent.py
  • Similar memory-augmented agent architectures: Reflexion, MemGPT, ChatGPT memory features