IAI-MCP fork to work with opencode plugin and qdrant db on non-AVX CPUs
Find a file
2026-05-22 12:02:04 +00:00
.forgejo/workflows add /oc bot 2026-05-22 09:41:45 +02:00
.github Update 2026-05-10 20:42:02 -07:00
bench Initial release: iai-mcp v0.1.0 2026-05-06 01:04:47 -07:00
deploy feat: ship OpenCode capture plugin in deploy/opencode/ 2026-05-22 10:03:16 +02:00
mcp-wrapper chore(deps): update dependency @types/node to v22.19.19 2026-05-22 12:02:04 +00:00
scripts feat: add monitor.sh — daemon health, topology, sleep cycles, edge health 2026-05-21 10:35:07 +02:00
src/iai_mcp feat(maintenance): add dedup-edges subcommand to clean Qdrant duplicates 2026-05-16 12:08:28 +02:00
tests Initial release: iai-mcp v0.1.0 2026-05-06 01:04:47 -07:00
.env.example Initial release: iai-mcp v0.1.0 2026-05-06 01:04:47 -07:00
.gitignore update gitignore 2026-05-12 16:47:31 +02:00
CODE_OF_CONDUCT.md Update 2026-05-10 20:42:02 -07:00
CONTRIBUTING.md Update 2026-05-10 20:42:02 -07:00
LICENSE Initial release: iai-mcp v0.1.0 2026-05-06 01:04:47 -07:00
logo.png Initial release: iai-mcp v0.1.0 2026-05-06 01:04:47 -07:00
pyproject.toml Removed 4 orphaned deps 2026-05-22 09:02:17 +00:00
README.md feat: ship OpenCode capture plugin in deploy/opencode/ 2026-05-22 10:03:16 +02:00
renovate.json Add renovate.json 2026-05-22 06:08:24 +00:00
SECURITY.md Update 2026-05-10 20:42:02 -07:00

IAI-MCP

The best-benchmarked open-source memory system for AI coding assistants.

Every claim ships with the harness that proves it. Run the benchmarks yourself.

License: MIT Python 3.11 | 3.12 Platform: macOS | Linux

Verbatim recall >= 99% p95 < 100ms AES-256-GCM Local only, no telemetry MCP compatible Backend: LanceDB | Qdrant


iai-mcp

Independent Autistic Intelligence — a local memory layer for Claude (and other MCP-compatible assistants).

Table of contents


What it is

A local server that speaks the MCP protocol and gives Claude, and any other MCP-compatible assistant, a long-term memory. It captures every turn of every session verbatim, organizes those captures over time into a personal map of who you are, and serves a small slice of relevant memory back at the start of each new conversation. You never have to say "remember this" or "what did we say last time?".

I built this for myself. It worked. I've been running it daily for months, and now I'm sharing it. The benchmarks were mostly for my own curiosity. I wanted to know if it actually works or if I'd just gotten used to it.


Quick start

Prerequisites

  • macOS (Apple Silicon tested) or Linux
  • Python 3.11 or 3.12
  • Node.js 18+
  • Claude Code or OpenCode as the MCP host
  • ~500 MB free disk

Install

git clone https://github.com/CodeAbra/iai-mcp.git
cd iai-mcp
bash scripts/install.sh

The installer creates a Python venv at ~/.venv/iai-mcp, installs dependencies (LanceDB, sentence-transformers, torch-hd, NetworkX, igraph), builds the TypeScript MCP wrapper, pre-downloads the default embedding model (~130 MB), symlinks the CLI to ~/.local/bin/iai-mcp, and on macOS registers the daemon with launchd. On Linux, install the systemd unit from deploy/systemd/iai-mcp-daemon.service.

Make sure ~/.local/bin is on your PATH:

export PATH="$HOME/.local/bin:$PATH"  # add to ~/.zshrc or ~/.bashrc
iai-mcp --version

Install the Stop hook

This is what makes capture ambient. Without it you'd have to save memories by hand.

mkdir -p ~/.claude/hooks
cp deploy/hooks/iai-mcp-session-capture.sh ~/.claude/hooks/
chmod +x ~/.claude/hooks/iai-mcp-session-capture.sh

Register in ~/.claude/settings.json:

{
  "hooks": {
    "Stop": [
      {
        "matcher": "*",
        "hooks": [
          {
            "type": "command",
            "command": "$HOME/.claude/hooks/iai-mcp-session-capture.sh"
          }
        ]
      }
    ]
  }
}

Connect Claude

claude mcp add iai-mcp -- node "$(pwd)/mcp-wrapper/dist/index.js"

Or edit ~/.claude.json directly:

{
  "mcpServers": {
    "iai-mcp": {
      "command": "node",
      "args": ["/absolute/path/to/iai-mcp/mcp-wrapper/dist/index.js"]
    }
  }
}

Use the absolute path. ~ and $HOME won't expand here.

For Claude Desktop (untested), edit ~/Library/Application Support/Claude/claude_desktop_config.json.

Install the OpenCode plugin

This is the OpenCode equivalent of the Claude Code Stop hook. It listens for idle sessions and writes a deferred-capture JSONL file for the daemon to drain.

mkdir -p ~/.config/opencode/plugins
cp deploy/opencode/iai-mcp-capture.js ~/.config/opencode/plugins/

Make sure @opencode-ai/plugin is installed:

cd ~/.config/opencode
npm install @opencode-ai/plugin

Connect OpenCode

Add the MCP server to ~/.config/opencode/config.json:

{
  "mcp": {
    "iai-mcp": {
      "type": "local",
      "command": ["node", "/absolute/path/to/iai-mcp/mcp-wrapper/dist/index.js"],
      "enabled": true
    }
  }
}

Use the absolute path to mcp-wrapper/dist/index.js. The plugin file (iai-mcp-capture.js) must already be in ~/.config/opencode/plugins/.

Verify

iai-mcp doctor
iai-mcp daemon status

Restart your MCP host (Claude Code or OpenCode). Start a session, do some work, exit. Then:

tail ~/.iai-mcp/logs/capture-$(date -u +%Y-%m-%d).log

You should see a rc=0 line. That's your first memory.

Linux (systemd)

Install the systemd user unit:

mkdir -p ~/.config/systemd/user
cp deploy/systemd/iai-mcp-daemon.service ~/.config/systemd/user/
systemctl --user daemon-reload
systemctl --user enable --now iai-mcp-daemon.service

For survival past logout (headless servers): loginctl enable-linger $USER

Check status: systemctl --user status iai-mcp-daemon.service


Usage

You do not call iai-mcp directly during a session. Once it's connected:

Capture is automatic. Every turn, yours and the assistant's, is recorded verbatim with timestamps and session metadata. You don't say "remember this."

Recall is automatic. When a new session starts, the daemon assembles a small relevant slice of your history and injects it into the conversation prefix. You don't say "what did we say."

Consolidation runs idle. Between sessions, the daemon merges duplicates, strengthens recall pathways for things retrieved often, and prunes weak edges. The system gets quietly better at remembering you over time.

After a few weeks of regular use the difference becomes noticeable. The assistant stops asking the same orientation questions, references things you mentioned in passing, and adapts to your style without being told.


How it works

The daemon is a Python process that runs in the background. Your MCP client connects to it via a Unix socket. No network exposure.

Memory is stored in three tiers:

Episodic is verbatim, timestamped fragments of what was said. Write-once, never overwritten or rewritten.

Semantic is summaries induced from clusters of related episodes during idle-time consolidation.

Procedural is a small set of stable parameters about you, learned over time: preferences, style cues, recurring patterns. Eleven sealed knobs that shift based on what works.

A background pass runs periodically (sleep cycles): it clusters episodes, builds semantic summaries, decays old unreinforced connections, and reinforces frequently co-retrieved paths. Things you haven't revisited fade naturally. There's an optional "insight of the day" step that makes one Anthropic API call, but it's off by default.

Recall combines three signals: semantic similarity, graph-link strength, and recency. All ranked together.

All records are encrypted at rest with AES-256-GCM. The key lives in ~/.iai-mcp/.key (mode 0600). Back it up. Lose the key, lose the memories.

Everything lives at ~/.iai-mcp/. Embeddings are computed locally with bge-small-en-v1.5. The only data that leaves the machine is your normal conversation with whatever LLM API your client uses.

Storage backends

The default backend is LanceDB (embedded, zero-config, local). For larger deployments or multi-machine setups, an optional Qdrant backend is available. Set QDRANT_URL to enable it — the system auto-selects the backend via a factory function.

Migrate from LanceDB to Qdrant: iai-mcp migrate qdrant

Claude Code  <--MCP-stdio-->  TypeScript wrapper  <--UNIX socket-->  Python daemon  <-->  LanceDB / Qdrant

Benchmarks

I made these because I wanted honest numbers. Every harness ships in bench/. Run them on your machine, get your own results.

Metric Target Measured
Verbatim recall (byte-exact) >=99% >=99% at N=10k
Recall p95 latency <100 ms <100 ms at N=10k
RAM at steady state <=300 MB ~150-300 MB
Session-start tokens (warm cache) <=3,000 <=3,000
Session-start tokens (cold) <=8,000 <=8,000
python -m bench.verbatim                     # verbatim fidelity
python -m bench.neural_map                   # recall latency
python -m bench.memory_footprint             # RAM usage
python -m bench.tokens                       # session-start cost
python -m bench.total_session_cost           # full 10-turn cost
python -m bench.trajectory                   # 30-session corpus
python -m bench.contradiction_longitudinal   # falsifiability
python -m bench.longmemeval_blind            # LongMemEval-S blind run

The LongMemEval-S run is blind on purpose. No dataset-specific tuning, no hyperparameter sweep. The numbers are what they are.


Configuration

Variable Default What it does
IAI_MCP_STORE ~/.iai-mcp/ Data directory
IAI_MCP_EMBED_MODEL bge-small-en-v1.5 Embedding model. bge-m3 for multilingual at ~3x size.
QDRANT_URL (unset) Qdrant server URL. When set, uses Qdrant backend instead of LanceDB.
QDRANT_API_KEY (unset) Qdrant API key

Switching embedders requires re-embedding the store: iai-mcp migrate reembed.

Non-AVX CPUs

If your CPU lacks AVX support, rebuild the venv with the CPU-only PyTorch wheel:

bash scripts/rebuild-venv.sh

Doctor

iai-mcp doctor runs 14 checks against the daemon, the store, and the runtime state. Output is one line per check: PASS, WARN, or FAIL.

iai-mcp doctor

What it checks:

# Check What it means
a Daemon alive Is the daemon process running?
b Socket fresh Can the UNIX socket accept a connection?
c Lock healthy Is the process lock held correctly?
d No orphan core No leftover stdio core process without a daemon
e State file valid .daemon-state.json parses and has expected fields
f LanceDB readable Can the records table be opened and queried?
g No duplicate binders Only one process is bound to the socket
h Crypto file state Encryption key exists, correct permissions (0600)
i Lance versions count LanceDB version manifests aren't piling up
j Lifecycle current state Current FSM state is valid
k Lifecycle history 24h Recent lifecycle transitions look sane
l Sleep cycle status Last sleep cycle completed or is running normally
m Heartbeat scanner Wrapper heartbeat files are fresh
n HID idle source Idle detection source is available

14/14 PASS is healthy. 13/14 with check (b) failing during a sleep cycle is also normal (the socket is busy during consolidation). Multiple FAILs or a FAIL on (a) or (f) means something is actually wrong.


Notes for AI assistants helping with installation

If you're an AI model helping a user install or debug iai-mcp, read this before diagnosing anything.

The daemon has four FSM states: WAKE, TRANSITIONING, SLEEP, DREAMING. Separately, the wrapper-layer lifecycle has four states: WAKE, DROWSY, SLEEP, HIBERNATION. These are two different state machines in two different files. Don't confuse them.

Things that look broken but are not:

  • last_tick_at hasn't changed in hours. This is a sleep-cycle tick timestamp, not a process heartbeat. The daemon can be alive and healthy with a stale last_tick_at. Check iai-mcp daemon status instead.
  • fsm_state: DREAMING for hours. Normal. The daemon is running a sleep cycle. Wake transitions are event-driven (a new session arriving wakes it), not time-driven.
  • status: deferred in the capture log. This is the only path the Stop hook takes. It writes captures to ~/.iai-mcp/.deferred-captures/ and the daemon picks them up on its next tick. This is by design, not a failure.
  • iai-mcp doctor shows 13/14 PASS with check (b) "socket unreachable." Expected when the daemon FSM is in SLEEP or DREAMING. The socket is intentionally unresponsive during consolidation passes. This is healthy.
  • Cold start takes a few seconds. The daemon needs to initialize the LanceDB store and load the embedding model into memory. Don't kill it because it hasn't responded in 5 seconds.

Actual problems look like: iai-mcp daemon status returning ok: False, doctor showing multiple FAIL results, or the daemon process not existing at all (pgrep -f iai_mcp.daemon returns nothing).

When in doubt, run iai-mcp doctor and read what it says. The output is self-explanatory.


Status and limitations

This is experimental. I built it for myself, it works on my machine, and I'm sharing it because it might be useful to you. No SLA, no support guarantee. Breaking changes are possible between versions. Pin a commit hash if you depend on stability.

Limitations worth knowing about:

  • The default embedding model is English-only. The assistant translates to English on the way into memory. The opt-in bge-m3 model removes this constraint at a cost of ~3x storage and slower indexing.
  • No cross-machine sync. The data lives where the daemon runs. Backup is cp -a ~/.iai-mcp/ somewhere safe.
  • No GUI. Inspection happens through CLI subcommands (iai-mcp doctor, iai-mcp daemon status, iai-mcp topology).
  • Cold start on a freshly booted machine takes a few seconds while the daemon initializes caches.
  • Recall quality on the first ~10 sessions is mediocre. The system needs material to consolidate before it gets useful.

Compatibility

Claude Code is the primary host, validated in daily use.

OpenCode is supported via the iai-mcp-capture.js plugin (see Install the OpenCode plugin above) and MCP server config in ~/.config/opencode/config.json.

Claude Desktop should work (uses claude_desktop_config.json instead of ~/.claude.json) but hasn't been tested end to end.

Other MCP-over-stdio hosts speak the same protocol and should work in principle. Not tested.

Linux support via systemd is available. macOS via launchd.


About the name

IAI stands for Independent Autistic Intelligence.

  • Independent. Fully local. The daemon runs on your machine, embeddings are computed locally, no telemetry, no cloud dependency. Your memory is your data and stays your data.
  • Autistic. Describes the memory style, not a diagnosis or a metaphor. The memory is built around verbatim recall, attention to specific cues, and refusal to smooth rare events into typical ones. Most memory systems compress and summarize aggressively, aiming to give the assistant a gist of the past. This one preserves what was actually said and surfaces it on a precise cue. The trade-off is intentional: more storage and a stricter retrieval interface, in exchange for not losing details.
  • Intelligence. Used in the systems sense, something that observes, adapts, and stays viable over time, not the marketing sense.

Authors

By Areg Aramovich Noya, in collaboration with the team at lcgc.dev.

I built this because I needed it. It works for me. If it works for you, take it.

License

MIT

Contributing

Issues and PRs welcome. If your change touches retrieval, capture, or consolidation, include bench re-runs.