Refactor AGENTS.md from encyclopedia to map; move spec into docs/

Splits the 990-line AGENTS.md into a 184-line map (architecture, where-to-find index, always-on invariants, capability matrix, maintenance contract) plus 18 new docs/*.md files holding the deep content per topic (storage, schema and query languages, indexes, embeddings, branches/commits, runs, merge, changes, execution, policy, server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).

Adds scripts/check-agents-md.sh and a check_agents_md CI job that verifies every docs/ link in AGENTS.md resolves and every doc in the canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-09 01:35:18 +02:00 · 2026-04-28 23:31:08 +02:00 · 2026-04-28 23:31:08 +02:00 · a335d98854
commit a335d98854
parent cfea41e942
23 changed files with 1069 additions and 924 deletions
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -99,6 +99,18 @@ jobs:
          echo "run_full_ci=$run_full_ci" >> "$GITHUB_OUTPUT"
          echo "run_rustfs_ci=$run_rustfs_ci" >> "$GITHUB_OUTPUT"

+  check_agents_md:
+    name: Check AGENTS.md Links
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+    steps:
+      - name: Checkout source
+        uses: actions/checkout@v5.0.1
+
+      - name: Verify AGENTS.md ↔ docs/ cross-links
+        run: bash scripts/check-agents-md.sh
+
  test:
    name: Test Workspace
    needs: classify_changes
--- a/AGENTS.md
+++ b/AGENTS.md
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -0,0 +1,69 @@
+# Architecture
+
+OmniGraph is a typed property-graph engine built as a coordination layer over many Lance datasets, with Git-style branches and commits across the whole graph, multi-modal querying (vector + FTS + BM25 + RRF + graph traversal) in one runtime, an HTTP server with Cedar policy, and a CLI driven by a single `omnigraph.yaml`.
+
+## Stack
+
+```
+┌──────────────────────────────────────────────────────────────────┐
+│  CLI (omnigraph)        HTTP Server (omnigraph-server, Axum)     │
+│  - 13 cmd families      - REST + OpenAPI                         │
+│  - Aliases, configs     - Bearer auth + Cedar policy             │
+└──────────────────────────────┬───────────────────────────────────┘
+                               │
+┌──────────────────────────────▼───────────────────────────────────┐
+│  omnigraph-compiler                                              │
+│  - Pest grammars: schema.pest, query.pest                        │
+│  - Catalog (Node/Edge/Interface types)                           │
+│  - IR + lowering (NodeScan / Expand / Filter / AntiJoin)         │
+│  - Schema migration planner                                      │
+│  - Embedding client (OpenAI-style for query-time normalization)  │
+└──────────────────────────────┬───────────────────────────────────┘
+                               │
+┌──────────────────────────────▼───────────────────────────────────┐
+│  omnigraph (engine)                                              │
+│  - GraphCoordinator + ManifestRepo (__manifest)                  │
+│  - CommitGraph (_graph_commits.lance)                            │
+│  - RunRegistry  (_graph_runs.lance, __run__ branches)            │
+│  - GraphIndex (CSR/CSC) + RuntimeCache (LRU 8)                   │
+│  - exec::query / mutation / merge                                │
+│  - Embedding client (Gemini for runtime ingest)                  │
+└──────────────────────────────┬───────────────────────────────────┘
+                               │
+┌──────────────────────────────▼───────────────────────────────────┐
+│  Lance 4.x  (per-table dataset)                                  │
+│  - Columnar (Arrow) storage, fragments                           │
+│  - Manifest versions per dataset                                 │
+│  - Per-dataset branches (copy-on-write)                          │
+│  - Indexes: BTREE, Inverted (FTS/BM25), IVF/HNSW vector          │
+│  - merge_insert (upsert), append, delete                         │
+│  - compact_files, cleanup_old_versions                           │
+└──────────────────────────────┬───────────────────────────────────┘
+                               │
+┌──────────────────────────────▼───────────────────────────────────┐
+│  Object store: local FS, S3, RustFS, MinIO, S3-compatible        │
+└──────────────────────────────────────────────────────────────────┘
+```
+
+## L1 / L2 framing
+
+Throughout the docs, capabilities are split into:
+
+- **L1 — Inherited from Lance**: what OmniGraph gets "for free" by sitting on top of the Lance dataset format (columnar Arrow storage, per-dataset versions and branches, index types, `merge_insert`, `compact_files` / `cleanup_old_versions`).
+- **L2 — Added by OmniGraph**: typing (schema language), graph semantics, multi-dataset coordination via `__manifest`, graph-level branches and commits, the `.gq` query language and IR, the topology index, the HTTP server, Cedar policy, the CLI.
+
+## Concurrency model
+
+- **MVCC**: every Lance write bumps a per-dataset version; the OmniGraph manifest version coordinates which sub-table versions are visible together.
+- **Snapshot isolation**: a query holds one `Snapshot` for its lifetime; concurrent writes don't leak in.
+- **Cross-branch isolation**: copy-on-write means readers and writers on different branches don't block each other.
+- **Run isolation**: each transactional run lives on its own `__run__<id>` branch.
+- **Schema-apply lock**: `__schema_apply_lock__` system branch serializes schema migrations.
+- **Fail-points** (`failpoints` cargo feature): `failpoints::maybe_fail("operation.step")?` in `branch_create`, publish, etc., for deterministic failure injection in tests.
+
+## Workspace crates
+
+- `omnigraph-compiler` — schema and query grammars, catalog, IR, lowering, type checker, lint, migration planner, OpenAI-style embedding client.
+- `omnigraph` (engine, published as `omnigraph-engine` on crates.io since v0.2.2) — the Lance-backed runtime: manifest, commit graph, run registry, snapshot, exec, merge, loader, Gemini embedding client.
+- `omnigraph-cli` — the `omnigraph` binary.
+- `omnigraph-server` — the `omnigraph-server` binary (Axum HTTP server).
--- a/docs/audit.md
+++ b/docs/audit.md
@@ -0,0 +1,6 @@
+# Audit / Actor tracking
+
+- `Omnigraph::audit_actor_id: Option<String>` is the actor in effect.
+- `_as` variants of every write API let callers override the actor: `begin_run_as`, `publish_run_as`, `ingest_as`, `mutate_as`, `branch_merge_as`, etc.
+- Actor IDs are persisted both on `RunRecord.actor_id` and on `GraphCommit.actor_id`, with optional split storage in `_graph_commit_actors.lance` and `_graph_run_actors.lance`.
+- HTTP server uses the bearer-token actor automatically; CLI uses the local user / explicit env (no implicit actor).
--- a/docs/branches-commits.md
+++ b/docs/branches-commits.md
@@ -0,0 +1,51 @@
+# Branches, Commits, Snapshots
+
+## L1 — Lance per-dataset branches
+
+Lance supports branching at the dataset level: a branch is a named lineage of versions, and `fork_branch_from_state(source_branch, target_branch, source_version)` creates a copy-on-write fork.
+
+## L2 — Graph-level branches
+
+OmniGraph builds *graph branches* on top by branching every sub-table coherently:
+
+- `branch_create(name)` / `branch_create_from(target, name)` — disallowed name `main`; fails if branch exists; ensures the schema-apply lock is idle.
+- `branch_list()` — returns public branches, **filters internal** `__run__…` and `__schema_apply_lock__` prefixes.
+- `branch_delete(name)` — refuses if there are descendants or active runs on the branch; cleans up owned per-branch fragments.
+- **Lazy forking**: a branch only forks a sub-table when that sub-table is first mutated on it. Pure-read branches share fragments with their source.
+- `sync_branch(branch)` — re-binds the in-memory handle to the latest head of the branch.
+
+## L2 — Commit graph (`db/commit_graph.rs`)
+
+Stored as a Lance dataset `_graph_commits.lance` (with stable row IDs):
+
+```
+GraphCommit {
+  graph_commit_id: ULID,
+  manifest_branch: Option<String>,
+  manifest_version: u64,
+  parent_commit_id: Option<String>,
+  merged_parent_commit_id: Option<String>,   // populated for merge commits
+  actor_id: Option<String>,
+  created_at: i64 (microseconds since epoch),
+}
+```
+
+- Every successful publish (load / change / merge / schema_apply / publish_run) appends one commit.
+- Merge commits have two parents; linear commits have one.
+- `_graph_commit_actors.lance` — optional separate actor map (created on demand).
+- API: `list_commits(branch)`, `get_commit(id)`, `head_commit_id_for_branch(branch)`.
+
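The two parent fields are enough to reconstruct a commit's ancestry. A minimal sketch, assuming the commits are already loaded into an in-memory map keyed by id; the `Commit` struct, `ancestors`, and `sample_store` are hypothetical names used for illustration and are not the engine's API:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Trimmed-down, hypothetical stand-in for `GraphCommit`: id and parent links only.
#[derive(Debug, Clone)]
struct Commit {
    id: String,
    parent_commit_id: Option<String>,
    merged_parent_commit_id: Option<String>, // set only on merge commits
}

/// Breadth-first walk over both parent links.
/// Returns ancestor ids (excluding `start`) in the order they are first reached.
fn ancestors(store: &HashMap<String, Commit>, start: &str) -> Vec<String> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<String> = VecDeque::new();
    let mut out: Vec<String> = Vec::new();
    seen.insert(start.to_string());
    queue.push_back(start.to_string());
    while let Some(id) = queue.pop_front() {
        if let Some(commit) = store.get(&id) {
            for parent in [&commit.parent_commit_id, &commit.merged_parent_commit_id] {
                if let Some(p) = parent {
                    // `insert` returns false when the id was already visited.
                    if seen.insert(p.clone()) {
                        out.push(p.clone());
                        queue.push_back(p.clone());
                    }
                }
            }
        }
    }
    out
}

/// Sample history: root `a`, two children `b` and `c`, and merge commit `m`.
fn sample_store() -> HashMap<String, Commit> {
    let rows = [
        ("a", None, None),
        ("b", Some("a"), None),
        ("c", Some("a"), None),
        ("m", Some("b"), Some("c")),
    ];
    let mut store = HashMap::new();
    for (id, parent, merged) in rows {
        let commit = Commit {
            id: id.to_string(),
            parent_commit_id: parent.map(String::from),
            merged_parent_commit_id: merged.map(String::from),
        };
        store.insert(commit.id.clone(), commit);
    }
    store
}
```

With the sample store, `ancestors(&sample_store(), "m")` returns `["b", "c", "a"]`: the merge's first parent, then its merged parent, then their shared root.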
+## L2 — Snapshots & time travel
+
+- `snapshot()` — current snapshot for the bound branch; cached.
+- `snapshot_of(target)` — snapshot at a `ReadTarget` (branch | snapshot id).
+- `snapshot_at_version(v: u64)` — historical snapshot from any manifest version.
+- `entity_at(table_key, id, version)` — single-entity time travel without building a full snapshot.
+- A `Snapshot` is a `(version, HashMap<table_key, SubTableEntry>)` — cheap to build, snapshot-isolated cross-table reads.
+
+## L2 — Internal system branches
+
+Filtered from `branch_list()` but visible to internals:
+
+- `__run__<run-id>` — ephemeral isolation branch for a transactional run.
+- `__schema_apply_lock__` — serializes schema migrations.
--- a/docs/changes.md
+++ b/docs/changes.md
@@ -0,0 +1,24 @@
+# Change Detection / Diff
+
+`changes/mod.rs`. Three-level algorithm:
+
+1. **Manifest diff**: skip sub-tables whose `(table_version, table_branch)` is unchanged.
+2. **Lineage check**:
+   - Same branch lineage → fast path: use the per-row `_row_last_updated_at_version` column to classify Insert/Update/Delete.
+   - Different lineages → ID-based streaming comparison.
+3. **Row-level diff**: streaming, no full materialization.
+
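The ID-based comparison used for different lineages can be pictured as a merge-join over two id-sorted streams. The sketch below is only an illustration under simplifying assumptions: rows are held in in-memory `BTreeMap`s instead of being streamed from Lance, row content is reduced to a string, and `diff_by_id` and `rows` are hypothetical names.

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ChangeOp {
    Insert,
    Update,
    Delete,
}

/// Walks two id-sorted row sets in lockstep and classifies each id.
fn diff_by_id(
    from: &BTreeMap<String, String>,
    to: &BTreeMap<String, String>,
) -> Vec<(String, ChangeOp)> {
    let mut changes = Vec::new();
    let mut a = from.iter().peekable();
    let mut b = to.iter().peekable();
    loop {
        // Decide which side holds the smaller id without consuming anything yet.
        let ord = match (a.peek(), b.peek()) {
            (None, None) => break,
            (Some(_), None) => Ordering::Less,
            (None, Some(_)) => Ordering::Greater,
            (Some((ida, _)), Some((idb, _))) => ida.cmp(idb),
        };
        match ord {
            // Id only on the `from` side: the row was deleted.
            Ordering::Less => {
                let (id, _) = a.next().unwrap();
                changes.push((id.clone(), ChangeOp::Delete));
            }
            // Id only on the `to` side: the row was inserted.
            Ordering::Greater => {
                let (id, _) = b.next().unwrap();
                changes.push((id.clone(), ChangeOp::Insert));
            }
            // Id on both sides: an update only if the content differs.
            Ordering::Equal => {
                let (id, old) = a.next().unwrap();
                let (_, new) = b.next().unwrap();
                if old != new {
                    changes.push((id.clone(), ChangeOp::Update));
                }
            }
        }
    }
    changes
}

/// Test helper: builds an id -> content map from string pairs.
fn rows(pairs: &[(&str, &str)]) -> BTreeMap<String, String> {
    pairs
        .iter()
        .map(|(id, body)| (id.to_string(), body.to_string()))
        .collect()
}
```

Because both inputs are consumed in id order, only the current row of each side has to be held at a time, which is what makes a streaming implementation possible.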
+## Public API
+
+- `diff_between(from: ReadTarget, to: ReadTarget, filter: Option<ChangeFilter>) -> ChangeSet`
+- `diff_commits(from_commit_id, to_commit_id, filter)` — cross-branch safe.
+
+## Types
+
+```
+ChangeOp: Insert | Update | Delete
+EntityKind: Node | Edge
+EntityChange { table_key, kind, type_name, id, op, manifest_version, endpoints?: {src, dst} }
+ChangeFilter { kinds?, type_names?, ops? }
+ChangeSet { from_version, to_version, branch?, changes[], stats }
+```
--- a/docs/ci.md
+++ b/docs/ci.md
@@ -0,0 +1,10 @@
+# CI / Release Workflows
+
+`.github/workflows/`:
+
+- **ci.yml**: text-only changes skip; otherwise `cargo test --workspace --locked` on ubuntu-latest with protobuf compiler. OpenAPI-drift check that auto-commits the regenerated `openapi.json` for same-repo PRs. Also runs the AGENTS.md cross-link integrity check (`scripts/check-agents-md.sh`).
+- **AWS feature build job**: `cargo build/test -p omnigraph-server --features aws` on ubuntu-latest.
+- **RustFS S3 integration**: spins up RustFS in Docker, runs `s3_storage`, `server_opens_s3_repo_directly_and_serves_snapshot_and_read`, and `local_cli_s3_end_to_end_init_load_read_flow`.
+- **release-edge.yml**: on every push to main, retags `edge`, builds Linux/macOS-Intel/macOS-arm64 archives + sha256, publishes a rolling prerelease.
+- **release.yml**: on `v*` tags, builds the 3-platform matrix and updates the Homebrew tap (`scripts/update-homebrew-formula.sh`) by force-pushing the regenerated formula to `ModernRelay/homebrew-tap`.
+- **package.yml**: manual ECR image build; emits two image tags per commit (`<sha>`, `<sha>-aws`) via CodeBuild.
--- a/docs/cli-reference.md
+++ b/docs/cli-reference.md
@@ -0,0 +1,83 @@
+# CLI Reference (`omnigraph`)
+
+A reference for the `omnigraph` binary's command surface and `omnigraph.yaml` schema. For a quick-start guide, see [cli.md](cli.md).
+
+13 top-level command families, 40+ subcommands. All commands accept a positional `URI`, `--uri`, or `--target <name>` (resolved against `omnigraph.yaml`).
+
+## Top-level commands
+
+| Command | Purpose |
+|---|---|
+| `init` | `--schema <pg>` → initialize a repo (also scaffolds `omnigraph.yaml` if missing) |
+| `load` | bulk load a branch (`--mode overwrite\|append\|merge`) |
+| `ingest` | branch-creating transactional load (`--from <base>`) |
+| `read` | run named query (params via `--params`, `--params-file`, or alias args) |
+| `change` | run mutation query |
+| `snapshot` | print current snapshot (per-table version + row count) |
+| `export` | dump to JSONL on stdout (`--type T`, `--table K` filters) |
+| `branch create \| list \| delete \| merge` | branching ops |
+| `commit list \| show` | inspect commit graph |
+| `run list \| show \| publish \| abort` | transactional run ops |
+| `schema plan \| apply \| show (alias: get)` | migrations |
+| `query lint \| check` | offline / repo-backed validation |
+| `optimize` | non-destructive Lance compaction |
+| `cleanup --keep N --older-than 7d --confirm` | destructive version GC |
+| `embed` | offline JSONL embedding pipeline |
+| `policy validate \| test \| explain` | Cedar tooling |
+| `version` / `-v` | print `omnigraph 0.3.x` |
+
+## `omnigraph.yaml` schema
+
+```yaml
+project: { name }
+graphs:
+  <name>:
+    uri: <local|s3://|http(s)://>
+    bearer_token_env: <ENV_NAME>
+server:
+  graph: <name>
+  bind: <ip:port>
+cli:
+  graph: <name>
+  branch: <name>
+  output_format: json|jsonl|csv|kv|table
+  table_max_column_width: 80
+  table_cell_layout: truncate|wrap
+query:
+  roots: [<dir>, …]   # search path for .gq files
+auth:
+  env_file: ./.env.omni
+aliases:
+  <alias>:
+    command: read|change
+    query: <path-to-.gq>
+    name: <query-name>
+    args: [<positional-name>, …]
+    graph: <name>
+    branch: <name>
+    format: <output-format>
+policy:
+  file: ./policy.yaml
+```
+
+## Output formats (read command)
+
+- `json` — pretty-printed object with metadata + rows
+- `jsonl` — one metadata line then one JSON object per row
+- `csv` — RFC 4180-ish quoting
+- `table` — fitted text table, honors `table_max_column_width` + `table_cell_layout`
+- `kv` — grouped per-row key/value blocks
+
+## Param resolution
+
+Precedence (high to low): explicit `--params` / `--params-file`, alias positional args, `omnigraph.yaml` defaults. JS-safe-integer handling is built in (`is_js_safe_integer_i64`, `JS_MAX_SAFE_INTEGER_U64`) so 64-bit ids round-trip safely through JSON clients.
+
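The JS-safe range exists because JavaScript numbers are IEEE-754 doubles, which represent integers exactly only up to 2^53 - 1. The sketch below illustrates the check; the names `is_js_safe` and `to_json_number_or_string` are hypothetical, and the string fallback is an assumption chosen for illustration, not necessarily how the CLI or SDK encodes large values.

```rust
/// Largest integer a JavaScript number represents exactly: 2^53 - 1.
const JS_MAX_SAFE_INTEGER: i64 = 9_007_199_254_740_991;

/// True when `v` survives a round trip through a JavaScript number unchanged.
fn is_js_safe(v: i64) -> bool {
    (-JS_MAX_SAFE_INTEGER..=JS_MAX_SAFE_INTEGER).contains(&v)
}

/// Hypothetical encoding: emit a bare JSON number when safe,
/// otherwise a quoted decimal string so no precision is lost.
fn to_json_number_or_string(v: i64) -> String {
    if is_js_safe(v) {
        v.to_string()
    } else {
        format!("\"{}\"", v)
    }
}
```

For example, `to_json_number_or_string(42)` yields `42`, while `i64::MAX` is emitted as the quoted string `"9223372036854775807"`.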
+## Bearer token resolution (CLI)
+
+1. `graphs.<name>.bearer_token_env`
+2. `OMNIGRAPH_BEARER_TOKEN` global env
+3. `auth.env_file` referenced `.env`
+
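The lookup order above can be sketched as a small pure function. This is an illustration only: `resolve_bearer_token` and `env_of` are hypothetical names, the process environment and the parsed `auth.env_file` are passed in as plain maps, and the assumption that the env file is searched under the same variable names is mine, not something this reference states.

```rust
use std::collections::HashMap;

const GLOBAL_VAR: &str = "OMNIGRAPH_BEARER_TOKEN";

/// Resolves a token in the documented order.
/// `graph_token_env` is the value of `graphs.<name>.bearer_token_env`, if set.
fn resolve_bearer_token(
    graph_token_env: Option<&str>,
    process_env: &HashMap<String, String>,
    env_file: &HashMap<String, String>,
) -> Option<String> {
    // 1. The per-graph variable named by `bearer_token_env`.
    if let Some(name) = graph_token_env {
        if let Some(token) = process_env.get(name) {
            return Some(token.clone());
        }
    }
    // 2. The global variable.
    if let Some(token) = process_env.get(GLOBAL_VAR) {
        return Some(token.clone());
    }
    // 3. The env file (assumed to be searched under the same names).
    graph_token_env
        .and_then(|name| env_file.get(name))
        .or_else(|| env_file.get(GLOBAL_VAR))
        .cloned()
}

/// Test helper: builds a variable map from string pairs.
fn env_of(pairs: &[(&str, &str)]) -> HashMap<String, String> {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}
```

Passing the environment in as maps keeps the function deterministic and easy to test without touching real process state.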
+## Duration parsing (cleanup)
+
+`s | m | h | d | w` units, e.g. `--older-than 7d`.
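
A minimal sketch of a parser for this grammar, assuming a whole-number count followed by exactly one unit letter (no fractions, no compound values such as `1h30m`). `parse_duration` is a hypothetical name and the error strings are illustrative, not the CLI's actual messages.

```rust
use std::time::Duration;

/// Parses values such as `30s`, `90m`, `12h`, `7d`, `2w`.
fn parse_duration(input: &str) -> Result<Duration, String> {
    let input = input.trim();
    // The last character is the unit; everything before it is the count.
    let unit = input
        .chars()
        .last()
        .ok_or_else(|| "empty duration".to_string())?;
    let digits = &input[..input.len() - unit.len_utf8()];
    let seconds_per_unit: u64 = match unit {
        's' => 1,
        'm' => 60,
        'h' => 3_600,
        'd' => 86_400,
        'w' => 604_800,
        other => return Err(format!("unknown unit '{}'", other)),
    };
    let count: u64 = digits
        .parse()
        .map_err(|_| format!("invalid number '{}'", digits))?;
    // Reject values whose second count would overflow u64.
    count
        .checked_mul(seconds_per_unit)
        .map(Duration::from_secs)
        .ok_or_else(|| "duration overflow".to_string())
}
```

Under these assumptions `parse_duration("7d")` is `Ok(Duration::from_secs(604_800))`, while inputs such as `7x`, `d`, or the empty string return an error.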
--- a/docs/constants.md
+++ b/docs/constants.md
@@ -0,0 +1,20 @@
+# Constants & Tunables (cheat sheet)
+
+| Name | Value | Where |
+|---|---|---|
+| `MANIFEST_DIR` | `__manifest` | `db/manifest/layout.rs` |
+| Commit graph dir | `_graph_commits.lance` | `db/commit_graph.rs` |
+| Run registry dir | `_graph_runs.lance` | `db/run_registry.rs` |
+| Run branch prefix | `__run__` | `db/run_registry.rs` |
+| Schema apply lock | `__schema_apply_lock__` | `db/mod.rs` |
+| Merge stage batch | `MERGE_STAGE_BATCH_ROWS = 8192` | `exec/merge.rs` |
+| Maintenance concurrency | `OMNIGRAPH_MAINTENANCE_CONCURRENCY=8` | `db/omnigraph/optimize.rs` |
+| Graph index cache size | `8` (LRU) | `runtime_cache.rs` |
+| Default body limit | `1 MB` | `omnigraph-server/lib.rs` |
+| Ingest body limit | `32 MB` | `omnigraph-server/lib.rs` |
+| Engine embed model | `gemini-embedding-2-preview` | `omnigraph/embedding.rs` |
+| Compiler embed model | `text-embedding-3-small` | `omnigraph-compiler/embedding.rs` |
+| Embed timeout | `30 000 ms` | both clients |
+| Embed retries | `4` | both clients |
+| Embed retry backoff | `200 ms` | both clients |
+| LANCE memory pool default | `1 GB` (raised in v0.3.0) | runtime |
--- a/docs/embeddings.md
+++ b/docs/embeddings.md
@@ -0,0 +1,31 @@
+# Embeddings
+
+OmniGraph has **two** embedding clients with different defaults and purposes.
+
+## Compiler-side client (`omnigraph-compiler/src/embedding.rs`) — query-time normalization
+
+- Default model: `text-embedding-3-small` (OpenAI-style schema)
+- Env: `NANOGRAPH_EMBED_MODEL`, `OPENAI_API_KEY`, `OPENAI_BASE_URL` (default `https://api.openai.com/v1`), `NANOGRAPH_EMBEDDINGS_MOCK`, `NANOGRAPH_EMBED_TIMEOUT_MS=30000`, `NANOGRAPH_EMBED_RETRY_ATTEMPTS=4`, `NANOGRAPH_EMBED_RETRY_BACKOFF_MS=200`
+- Methods: `embed_text(input, expected_dim)`, `embed_texts(inputs, expected_dim)`
+- Mock mode: deterministic FNV-1a + xorshift64 → L2-normalized vectors
+
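To make the mock-mode recipe concrete, here is a sketch of one way to combine the three ingredients: FNV-1a hashes the text into a seed, xorshift64 expands the seed into pseudo-random components, and the result is L2-normalized. The shift triple (13, 7, 17), the mapping of bits to floats, and the function names are assumptions for illustration; the real client may differ in all of them.

```rust
/// 64-bit FNV-1a hash (standard offset basis and prime).
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for b in bytes {
        hash ^= *b as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// One xorshift64 step; advances `state` and returns the new value.
fn xorshift64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Deterministic pseudo-embedding: same text and dim always give the same vector.
fn mock_embedding(text: &str, dim: usize) -> Vec<f32> {
    let mut state = fnv1a_64(text.as_bytes());
    if state == 0 {
        state = 0x9e37_79b9_7f4a_7c15; // xorshift must not start from zero
    }
    let mut v: Vec<f32> = (0..dim)
        .map(|_| {
            let r = xorshift64(&mut state);
            // Map the top 24 bits to the range [-1, 1).
            (r >> 40) as f32 / (1u32 << 23) as f32 - 1.0
        })
        .collect();
    // L2-normalize so the vector has unit length.
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}
```

Determinism is the point of mock mode: tests can compare vectors exactly without network access or an API key.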
+## Engine-side client (`omnigraph/src/embedding.rs`) — runtime ingest
+
+- Model: `gemini-embedding-2-preview`
+- Env: `GEMINI_API_KEY`, `OMNIGRAPH_GEMINI_BASE_URL` (default Google generativelanguage v1beta), `OMNIGRAPH_EMBED_TIMEOUT_MS=30000`, `OMNIGRAPH_EMBED_RETRY_ATTEMPTS=4`, `OMNIGRAPH_EMBED_RETRY_BACKOFF_MS=200`, `OMNIGRAPH_EMBEDDINGS_MOCK`
+- Two task types: `embed_query_text` (RETRIEVAL_QUERY) and `embed_document_text` (RETRIEVAL_DOCUMENT)
+- Exponential backoff with retryable detection (timeouts, 429, 5xx)
+
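The retry behaviour can be sketched as two small helpers. This is an illustration under assumptions: the delay is taken to double on every attempt starting from the base backoff, with no jitter, and the names are hypothetical. Timeouts are also retryable according to the list above, but they are not HTTP statuses and so are not modelled here.

```rust
use std::time::Duration;

/// HTTP statuses treated as transient: 429 and any 5xx.
fn is_retryable_status(status: u16) -> bool {
    status == 429 || (500..=599).contains(&status)
}

/// Delay before retry number `attempt` (0-based), doubling each time.
/// Saturates instead of overflowing for very large attempt numbers.
fn backoff_delay(base: Duration, attempt: u32) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor)
}

/// Full schedule of delays for a given number of retry attempts.
fn retry_schedule(base: Duration, attempts: u32) -> Vec<Duration> {
    (0..attempts).map(|a| backoff_delay(base, a)).collect()
}
```

With the documented defaults (200 ms backoff, 4 attempts) and the doubling assumption, the schedule would be 200 ms, 400 ms, 800 ms, 1600 ms.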
+## Schema integration
+
+Mark a Vector property with `@embed("source_text_property")`. At ingest, the engine pulls the source text and writes the embedding into the vector column, stored as an L2-normalized `FixedSizeList(Float32, dim)`.
+
+## CLI `omnigraph embed` (offline file pipeline)
+
+Operates on **JSONL files** (not on a repo). Three modes (mutually exclusive):
+
+- (default) `fill_missing` — only embed rows whose target field is empty
+- `--reembed-all` — overwrite all
+- `--clean` — strip embeddings
+
+Inputs are either a single seed manifest YAML or `--input/--output/--spec`. Selectors `--type T`, `--select T:field=value` filter rows. Streams JSONL → JSONL.
--- a/docs/errors.md
+++ b/docs/errors.md
@@ -0,0 +1,21 @@
+# Errors and Result Serialization
+
+## Error taxonomy (`omnigraph::error::OmniError`)
+
+- `Compiler(...)` — schema/query parse/typecheck errors
+- `Lance(String)` — storage layer
+- `DataFusion(String)` — execution layer
+- `Io(io::Error)`
+- `Manifest(ManifestError { kind: BadRequest|NotFound|Conflict|Internal, … })`
+- `MergeConflicts(Vec<MergeConflict>)`
+
+Compiler-side `NanoError` covers parse / catalog / type / plan / execution / arrow / lance / IO / manifest / unique-constraint, each with structured spans (`SourceSpan { start, end }`) for ariadne-style diagnostics.
+
+## Result serialization (`omnigraph_compiler::result::QueryResult`)
+
+- `to_arrow_ipc()` — efficient binary
+- `to_sdk_json()` — JS-safe JSON (large i64 wrapped in metadata)
+- `to_rust_json()` — Rust-friendly JSON
+- `batches()` — direct Arrow `RecordBatch` access
+
+Mutation results: `{ affectedNodes: usize, affectedEdges: usize }` (also exposed as a tiny Arrow batch).
--- a/docs/execution.md
+++ b/docs/execution.md
@@ -0,0 +1,76 @@
+# Query Execution, Mutations, and Loading
+
+## Query execution (`exec/query.rs`)
+
+Pipeline:
+
+1. Parse + typecheck via `omnigraph-compiler`.
+2. Lower to IR.
+3. If `Expand` or `AntiJoin` is present, build (or fetch from `RuntimeCache`) a `GraphIndex`.
+4. Run `execute_query` against the snapshot.
+
+### Multi-modal search modes (`SearchMode`)
+
+The executor recognizes three modes that may be combined in a single query:
+
+- **`nearest`** — vector ANN (uses Lance vector index; `LIMIT` required).
+- **`bm25`** — BM25 over an inverted index.
+- **`rrf`** — Reciprocal Rank Fusion of two rankings, with k (default 60).
+
+Hybrid example: `order { rrf(nearest($d.embedding, $q), bm25($d.body, $q_text)) desc } limit 20`.
+
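Reciprocal Rank Fusion scores each document as the sum of `1 / (k + rank)` over the rankings it appears in. The sketch below shows that formula on two id lists; it assumes 1-based ranks and breaks ties by id, and the function name `rrf_fuse` is hypothetical rather than the executor's implementation.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

/// Fuses two rankings (best first) into one list sorted by descending RRF score.
fn rrf_fuse(rank_a: &[&str], rank_b: &[&str], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in [rank_a, rank_b] {
        for (i, id) in ranking.iter().enumerate() {
            // Ranks are 1-based: the top hit contributes 1 / (k + 1).
            let contribution = 1.0 / (k + (i as f64 + 1.0));
            *scores.entry((*id).to_string()).or_insert(0.0) += contribution;
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by id so the output is deterministic.
    fused.sort_by(|x, y| {
        y.1.partial_cmp(&x.1)
            .unwrap_or(Ordering::Equal)
            .then_with(|| x.0.cmp(&y.0))
    });
    fused
}
```

For rankings `[d1, d2, d3]` and `[d3, d1, d4]` with `k = 60`, the fused order is `d1, d3, d2, d4`: documents present in both lists rise above those found by only one.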
+### Joins / set operations
+
+- Joins are implicit: MATCH bindings + traversals are implemented as scans + CSR/CSC lookups.
+- `not { … }` lowers to an `AntiJoin` over the inner pipeline.
+
+### Scoped reads
+
+- `query(target, source, name, params)` — at any branch or snapshot.
+- `run_query_at(version, …)` — direct historical query at a manifest version.
+
+### Concurrency
+
+- Snapshot isolation per query: all reads inside a query use the same `Snapshot`.
+- Readers and writers on different branches don't block each other.
+
+## Mutation execution (`exec/mutation.rs`)
+
+Resolves expression values to literals, converts to typed Arrow arrays (`literal_to_typed_array(lit, DataType, num_rows)`), then writes:
+
+- `insert` → Lance `WriteMode::Append`
+- `update` → Lance `merge_insert(WhenMatched::Update)`
+- `delete` → Lance `merge_insert(WhenMatched::Delete)` (logical) or filtered overwrite.
+
+Multi-statement mutations are atomic at the manifest commit boundary.
+
+## Bulk loader (`loader/mod.rs`)
+
+- **JSONL only** in v1, with two record shapes:
+  - Node: `{"type":"NodeType", "data":{…}}`
+  - Edge: `{"edge":"EdgeType", "from":"src_id", "to":"dst_id", "data":{…}}`
+- Lines starting with `//` are treated as comments.
+- Schema validation on every row (typecheck, required props, blob base64 decoding).
+- Edge endpoint resolution by node `@key`.
+
+## Load modes (`LoadMode`)
+
+| Mode | Semantics |
+|---|---|
+| `Overwrite` | Replace all data in the target tables on the branch |
+| `Append` | Strict insert; duplicates error |
+| `Merge` | Upsert by id (`merge_insert`) |
+
+## `load` vs `ingest`
+
+- `load(branch, data, mode)` — direct load to a branch.
+- `ingest(branch, from, data, mode)` — branch-creating, transactional load:
+  1. If target advanced since the run started, fork a fresh run branch from `from`.
+  2. Load into the run branch (Append).
+  3. If target hasn't moved, fast-publish; otherwise abort.
+- Returns `IngestResult { branch, base_branch, branch_created, mode, tables[] }`.
+- `ingest_as(actor_id)` records the actor on the resulting commit.
+
+## Embeddings during load
+
+If a node type has `@embed` properties, the loader calls the engine embedding client (Gemini, RETRIEVAL_DOCUMENT) per row to populate the vector column. See [embeddings.md](embeddings.md).
--- a/docs/indexes.md
+++ b/docs/indexes.md
@@ -0,0 +1,26 @@
+# Indexes
+
+## L1 — Lance index types OmniGraph exposes
+
+| Index | Use | Notes |
+|---|---|---|
+| **BTREE scalar** | range / equality on any scalar | created on `@key`, `@index(...)`, and on key columns by `ensure_indices()` |
+| **Inverted (FTS)** | `search`, `fuzzy`, `match_text`, `bm25` | created on text columns referenced by FTS queries |
+| **Vector** | `nearest()` k-NN | Lance picks IVF_PQ vs HNSW family by configuration; OmniGraph stores as FixedSizeList(Float32, dim) |
+
+## L2 — OmniGraph orchestration
+
+- `ensure_indices()` / `ensure_indices_on(branch)` — idempotent build of BTREE + inverted indexes for the current head; safe to re-run.
+- Indexes are built on the *branch head* (not on a snapshot), so reads always see the current index state.
+- **Lazy branch forking for indexes**: a branch that hasn't mutated a sub-table doesn't need its own index — the main lineage's index is reused until the first write triggers a copy-on-write fork.
+- Vector index parameters (metric, nlist, nprobe, etc.) are not exposed in the schema; they default at the Lance layer and are picked up automatically when an index is asked for on a Vector column.
+
+## L2 — Graph topology index (`graph_index/mod.rs`)
+
+This is OmniGraph-specific (not Lance):
+
+- `TypeIndex`: dense `u32 ↔ String id` mapping per node type.
+- `CsrIndex`: Compressed Sparse Row representation of edges per edge type — `offsets[i]..offsets[i+1]` slices into `targets`.
+- `GraphIndex { type_indices, csr (out), csc (in) }` — built on demand from a snapshot's edge tables.
+- Cached in `RuntimeCache::graph_indices` (LRU, max 8 entries, keyed by snapshot id + edge table versions).
+- Built only when an `Expand` or `AntiJoin` IR op is present in the lowered query, so pure scans skip it.
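
The `offsets` / `targets` layout can be illustrated with a small standalone CSR builder. This is a sketch over dense `u32` node ids (the role `TypeIndex` plays above); the struct mirrors the name `CsrIndex` for readability, but the construction code and method names are assumptions, not the crate's implementation.

```rust
/// Compressed Sparse Row adjacency: the neighbours of node `i` are
/// `targets[offsets[i]..offsets[i + 1]]`.
struct CsrIndex {
    offsets: Vec<usize>,
    targets: Vec<u32>,
}

impl CsrIndex {
    /// Builds the index from `(src, dst)` pairs over dense node ids `0..num_nodes`.
    fn from_edges(num_nodes: usize, edges: &[(u32, u32)]) -> Self {
        // Pass 1: count the out-degree of each source node.
        let mut offsets = vec![0usize; num_nodes + 1];
        for &(src, _) in edges {
            offsets[src as usize + 1] += 1;
        }
        // Prefix sum turns the counts into slice boundaries.
        for i in 0..num_nodes {
            offsets[i + 1] += offsets[i];
        }
        // Pass 2: drop each destination into the next free slot of its source.
        let mut cursor = offsets.clone();
        let mut targets = vec![0u32; edges.len()];
        for &(src, dst) in edges {
            let slot = cursor[src as usize];
            targets[slot] = dst;
            cursor[src as usize] += 1;
        }
        CsrIndex { offsets, targets }
    }

    /// Outgoing neighbours of `node` as a contiguous slice.
    fn neighbors(&self, node: u32) -> &[u32] {
        let i = node as usize;
        &self.targets[self.offsets[i]..self.offsets[i + 1]]
    }
}
```

A CSC (incoming) index can be built with the same code by swapping `src` and `dst` in the input pairs, which is one way to obtain both directions from a single edge table.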
--- a/docs/maintenance.md
+++ b/docs/maintenance.md
@@ -0,0 +1,22 @@
+# Maintenance: Optimize & Cleanup
+
+`db/omnigraph/optimize.rs`.
+
+## `optimize_all_tables(db)` — non-destructive
+
+- Lance `compact_files()` on every node + edge table on `main`.
+- Rewrites small fragments into fewer large ones; old fragments remain reachable via older manifests.
+- Bounded by `OMNIGRAPH_MAINTENANCE_CONCURRENCY` (default 8).
+- Returns `[TableOptimizeStats { table_key, fragments_removed, fragments_added, committed }]`.
+
+## `cleanup_all_tables(db, options)` — destructive
+
+- Lance `cleanup_old_versions()` per table.
+- Removes manifests (and their unique fragments) older than the retention policy.
+- `CleanupPolicyOptions { keep_versions: Option<u32>, older_than: Option<Duration> }` — at least one is required.
+- Returns `[TableCleanupStats { table_key, bytes_removed, old_versions_removed }]`.
+- CLI guards with `--confirm`; without it, prints a preview line.
+
+## Tombstones
+
+Logical sub-table delete markers in `__manifest`; `tombstone_object_id(table_key, version)` excludes a sub-table version from snapshot reconstruction.
--- a/docs/merge.md
+++ b/docs/merge.md
@@ -0,0 +1,30 @@
+# Merging (three-way) and Conflicts
+
+`exec/merge.rs`.
+
+## Strategy
+
+Ordered, row-by-row cursor merge:
+
+- `OrderedTableCursor` scans each table sorted by `id` and supports peek/pop matching.
+- `StagedTableWriter` buffers `MERGE_STAGE_BATCH_ROWS = 8192` rows into a temp Lance dataset (`OMNIGRAPH_MERGE_STAGING_DIR`).
+- The merge runs per sub-table; results are published as one atomic manifest update.
+
+## Outcome enum
+
+`MergeOutcome { AlreadyUpToDate | FastForward | Merged }`
+
+## Conflict types (`error.rs`)
+
+```
+MergeConflictKind:
+  DivergentInsert        // same id inserted on both branches
+  DivergentUpdate        // updated differently on both branches
+  DeleteVsUpdate         // one side deletes, other updates
+  OrphanEdge             // edge references a node deleted by the other side
+  UniqueViolation
+  CardinalityViolation
+  ValueConstraintViolation
+```
+
+Returned as `OmniError::MergeConflicts(Vec<MergeConflict { table_key, row_id?, kind, message }>)`. The HTTP server surfaces this as a 409 with structured `merge_conflicts[]` (top 3 + "+N more").
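
The first three conflict kinds follow from comparing one row's state in the merge base, in "ours", and in "theirs". The sketch below shows that per-row decision under simplifying assumptions: row content is reduced to an optional string (`None` means the row is absent), identical changes on both sides are treated as agreement rather than a conflict, and `merge_row`, `RowOutcome`, and `ConflictKind` are hypothetical names. Constraint-based kinds such as `OrphanEdge` need cross-row information and are not modelled.

```rust
#[derive(Debug, PartialEq, Eq)]
enum ConflictKind {
    DivergentInsert,
    DivergentUpdate,
    DeleteVsUpdate,
}

#[derive(Debug, PartialEq, Eq)]
enum RowOutcome {
    /// Both sides hold the same state (unchanged, or changed identically).
    Agreed,
    /// Only our side changed the row.
    TakeOurs,
    /// Only their side changed the row.
    TakeTheirs,
    Conflict(ConflictKind),
}

/// Three-way decision for a single row id.
fn merge_row(base: Option<&str>, ours: Option<&str>, theirs: Option<&str>) -> RowOutcome {
    if ours == theirs {
        return RowOutcome::Agreed;
    }
    if ours == base {
        return RowOutcome::TakeTheirs;
    }
    if theirs == base {
        return RowOutcome::TakeOurs;
    }
    // From here on both sides changed the row, and in different ways.
    let kind = match (base, ours, theirs) {
        // No base row: both sides inserted the id with different content.
        (None, _, _) => ConflictKind::DivergentInsert,
        // One side deleted the row while the other modified it.
        (Some(_), None, _) | (Some(_), _, None) => ConflictKind::DeleteVsUpdate,
        // Both sides modified the row differently.
        (Some(_), Some(_), Some(_)) => ConflictKind::DivergentUpdate,
    };
    RowOutcome::Conflict(kind)
}
```

Applied row by row over id-sorted cursors, a decision function of this shape lets non-conflicting rows stream straight to the staged writer while conflicts are collected for the error.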
--- a/docs/policy.md
+++ b/docs/policy.md
@@ -0,0 +1,44 @@
+# Authorization (Cedar policy)
+
+OmniGraph integrates AWS Cedar (`cedar-policy = 4.9`) for ABAC.
+
+## Policy actions
+
+1. `read` — query / snapshot / list branches & commits
+2. `export` — NDJSON export
+3. `change` — mutations
+4. `schema_apply` — apply schema migrations
+5. `branch_create`
+6. `branch_delete`
+7. `branch_merge`
+8. `run_publish`
+9. `run_abort`
+10. `admin` — reserved
+
+## Scope kinds
+
+- `branch_scope` — applied to source branch (`read`, `export`, `change`)
+- `target_branch_scope` — applied to destination (`schema_apply`, branch ops, run ops)
+- `protected_branches` — named list with special rules; rule scopes are `any | protected | unprotected`
+
+## Configuration
+
+`omnigraph.yaml`:
+
+```yaml
+policy:
+  file: ./policy.yaml          # Cedar rules + groups
+  tests: ./policy.tests.yaml   # declarative test cases
+```
+
+Each rule must use exactly one of `branch_scope` or `target_branch_scope`.
+
+## CLI
+
+- `omnigraph policy validate` — parse + count actors, exit 1 on parse error.
+- `omnigraph policy test` — run cases in `policy.tests.yaml`, exit 1 on any expectation mismatch.
+- `omnigraph policy explain --actor … --action … [--branch …] [--target-branch …]` — show decision and matched rule.
+
+## Server enforcement
+
+Every mutating endpoint calls `authorize_request()` *before* the handler runs; decisions are logged with actor / action / branch / outcome / matched rule.
--- a/docs/query-language.md
+++ b/docs/query-language.md
@@ -0,0 +1,103 @@
+# Query Language (`.gq`)
+
+Pest grammar at `crates/omnigraph-compiler/src/query/query.pest`. AST in `query/ast.rs`. Type checker in `query/typecheck.rs`. Lowering in `ir/lower.rs`.
+
+## Query declarations
+
+```
+query <name>($p1: T1, $p2: T2?, …)
+  @description("…") @instruction("…") {
+  …
+}
+```
+
+Two body shapes:
+
+- **Read**: `match { … } return { … } [order { … }] [limit N]`
+- **Mutation**: one or more of `insert | update | delete` statements
+
+Param types reuse all schema scalars; trailing `?` makes a param optional. The compiler reserves `$__nanograph_now` for `now()`.
+
+## MATCH clauses
+
+- **Binding**: `$x: NodeType { prop: <literal | $param | now()>, … }`
+- **Traversal**: `$src EDGE_NAME { min, max? } $dst` — variable-length paths via hop bounds; default 1..1 if bounds omitted.
+- **Filter**: `<expr> <op> <expr>` with operators `>=`, `<=`, `!=`, `>`, `<`, `=`, and string `contains`.
+- **Negation**: `not { clause+ }` — desugars to anti-join over the inner pipeline.
+
+## Search clauses (multi-modal)
+
+Used inside MATCH or as expressions inside RETURN/ORDER:
+
+| Function | Purpose | Underlying Lance facility |
+|---|---|---|
+| `nearest($x.vec, $q)` | k-NN vector search (cosine) | Lance vector index (IVF / HNSW) |
+| `search(field, q)` | Generic FTS | Inverted index |
+| `fuzzy(field, q [, max_edits])` | Levenshtein-tolerant text search | Inverted index |
+| `match_text(field, q)` | Pattern match | Inverted index |
+| `bm25(field, q)` | BM25 scoring | Inverted index |
+| `rrf(rank_a, rank_b [, k])` | Reciprocal Rank Fusion of two rankings (default k=60) | OmniGraph fuses scored rankings |
+
+`nearest()` requires a `LIMIT`; the compiler resolves the query vector via the param map (or via the runtime embedding client when bound to a text input).
+
+## RETURN clause
+
+`return { <expr> [as <alias>], … }` with expressions:
+
+- Variable / property access: `$x`, `$x.prop`
+- Literals: string, int, float, bool, list
+- `now()`
+- Aggregates: `count`, `sum`, `avg`, `min`, `max`
+- All search functions above (so you can return a score column)
+- `AliasRef` — re-use a previous projection alias
+
+## ORDER & LIMIT
+
+- `order { <expr> [asc|desc], … }` — supports plain expressions and `nearest(...)`.
+- `limit <integer>` — required when there is a `nearest(...)` ordering.
+
+## Mutation statements
+
+- `insert <Type> { prop: <value>, … }`
+- `update <Type> set { prop: <value>, … } where <prop> <op> <value>`
+- `delete <Type> where <prop> <op> <value>`
+
+`<value>` is a literal, `$param`, or `now()`. Multi-statement mutations execute atomically (added in v0.2.0).
+
+## IR (Intermediate Representation)
+
+`QueryIR { name, params, pipeline: Vec<IROp>, return_exprs, order_by, limit }`
+
+Pipeline operations:
+
+- `NodeScan { variable, type_name, filters }`
+- `Expand { src_var, dst_var, edge_type, direction (Out|In), dst_type, min_hops, max_hops, dst_filters }` — destination filters are pushed *into* the expand so Lance scalar pushdown can prune.
+- `Filter { left, op, right }`
+- `AntiJoin { outer_var, inner: Vec<IROp> }` — for `not { … }`
+
+Lowering:
+
+1. Partition MATCH clauses (bindings, traversals, filters, negations).
+2. Identify "deferred" bindings (a destination of a traversal that has filters) so the Expand can carry the filter as a pushdown.
+3. Emit NodeScan for the first binding, then Expand operations, then remaining Filter operations, then AntiJoins for negations.
+4. Translate RETURN / ORDER expressions; preserve LIMIT.
+
+## Linting & validation (`query/lint.rs`)
+
+Codes seen so far:
+
+- **Q000** (Error): parse error
+- **L201** (Warning): nullable property never set by any UPDATE — "{type}.{prop} exists in schema but no update query sets it"
+- (Warning): mutation declares no params — hardcoded mutations are easy to miss
+- Plus all type errors from `typecheck_query_decl()` (undefined types, mismatched operators, undefined edges, etc.)
+
+Output:
+
+```
+QueryLintOutput { status, schema_source, query_path,
+  queries_processed, errors, warnings, infos,
+  results: [{ name, kind, status, error?, warnings[] }],
+  findings: [{ severity, code, message, type_name?, property?, query_names[] }] }
+```
+
+CLI exits non-zero only on `status = Error`.
--- a/docs/releases/v0.3.1.md
+++ b/docs/releases/v0.3.1.md
@@ -0,0 +1,19 @@
+# Omnigraph v0.3.1
+
+Omnigraph v0.3.1 is a performance and operability point release.
+
+## Highlights
+
+- **Parallel per-type load writes**: the bulk loader writes to each node/edge table concurrently rather than serially, materially reducing wall-clock time on multi-table loads.
+- **`omnigraph optimize` and `omnigraph cleanup` CLI commands**: previously only available via the engine API. `optimize` runs Lance `compact_files()` across every node/edge table; `cleanup` runs Lance `cleanup_old_versions()` with a `--keep`/`--older-than` policy and requires `--confirm` for the destructive form.
+- **Dst-id deduplication during edge expand hydration**: avoids redundant lookups when the same destination id appears multiple times in an `Expand` step (#45).
+
+## Included Changes
+
+- Parallel per-type load writes (#46)
+- `omnigraph optimize` / `cleanup` CLI commands and runtime APIs (#46)
+- Dedupe dst ids before hydrating nodes in `execute_expand` (#45)
+
+## Upgrade Notes
+
+No breaking changes. Existing v0.3.0 repos can be opened directly with v0.3.1.
--- a/docs/runs.md
+++ b/docs/runs.md
@ -0,0 +1,38 @@
+# Runs (transactional graph mutations)
+
+Implemented in `db/run_registry.rs`, with the run lifecycle in `db/omnigraph.rs`. Run records are stored in `_graph_runs.lance`, and their actor map in `_graph_run_actors.lance`.
+
+## RunRecord
+
+```
+RunRecord {
+  run_id: RunId (ULID),
+  target_branch: String,           // where the run will publish
+  run_branch: "__run__<id>",       // ephemeral isolation branch
+  base_snapshot_id: String,
+  base_manifest_version: u64,
+  operation_hash: Option<String>,  // idempotency key
+  actor_id: Option<String>,
+  status: Running | Published | Failed | Aborted,
+  published_snapshot_id: Option<String>,
+  created_at, updated_at: i64 (microseconds),
+}
+```
+
+## Lifecycle
+
+1. `begin_run(target_branch, op_hash)` / `begin_run_as(target_branch, op_hash, actor_id)` — forks `__run__<id>` from the target's current head, appends a `RunRecord`.
+2. Mutations on `run_branch` (via the normal write APIs) — isolated from concurrent activity on the target.
+3. `publish_run(id)` / `publish_run_as(id, actor)`:
+   - **Fast path**: if the target hasn't moved since `base_snapshot_id`, promote the run snapshot directly.
+   - **Merge path**: if it has moved, perform a three-way merge (see [merge.md](merge.md)) into the target.
+   - On success: `status = Published`, `published_snapshot_id` set, run branch cleaned up asynchronously.
+4. `abort_run(id)` / `fail_run(id)` — terminal; cleans up run branch best-effort.
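+
+The choice between the two publish paths in step 3 can be sketched as below. This is a minimal illustration, not the engine's code: `Run`, `PublishPath`, and `choose_publish_path` are hypothetical names, and refusing to publish a run that is no longer `Running` is an assumption of the sketch.
+
+```rust
+// Hypothetical, trimmed-down types; the real RunRecord carries more fields.
+#[derive(Debug, PartialEq)]
+enum RunStatus {
+    Running,
+    Published,
+    Failed,
+    Aborted,
+}
+
+#[derive(Debug, PartialEq)]
+enum PublishPath {
+    FastPath,
+    MergePath,
+}
+
+struct Run {
+    base_snapshot_id: String,
+    status: RunStatus,
+}
+
+// Decide how a run would publish, given the target branch's current snapshot id.
+// Returns None for runs that are already terminal (assumed behavior).
+fn choose_publish_path(run: &Run, target_head_snapshot_id: &str) -> Option<PublishPath> {
+    if run.status != RunStatus::Running {
+        return None;
+    }
+    if run.base_snapshot_id == target_head_snapshot_id {
+        // Target has not moved since the run began: promote the run snapshot directly.
+        Some(PublishPath::FastPath)
+    } else {
+        // Target moved: a three-way merge into the target is needed.
+        Some(PublishPath::MergePath)
+    }
+}
+```
+
+Comparing snapshot ids keeps the check cheap: no table data is read unless the merge path is actually taken.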
+
+## Idempotency
+
+`operation_hash` is an optional field clients can use to detect a duplicate `begin_run` retry.
+
+## Cleanup
+
+`cleanup_terminal_run_branches_for_target(branch)` is called as branches change. Failures are swallowed, so a run branch that could not be removed is cleaned up lazily on the next branch operation.
--- a/docs/schema-language.md
+++ b/docs/schema-language.md
@ -0,0 +1,79 @@
+# Schema Language (`.pg`)
+
+Pest grammar at `crates/omnigraph-compiler/src/schema/schema.pest`. AST at `schema/ast.rs`. Catalog at `catalog/mod.rs`.
+
+## Top-level declarations
+
+- `interface <Name> { property* }` — reusable property contracts.
+- `node <Name> [implements <Iface>, ...] { property* | constraint* }`
+- `edge <Name>: <FromType> -> <ToType> [@card(min..max)] { property* | constraint* }`
+- Comments: line `//` and block `/* … */`.
+
+## Property declarations
+
+`<ident>: <TypeRef> [annotation*]`
+
+## Built-in scalar types
+
+| Scalar | Arrow type |
+|---|---|
+| `String` | Utf8 |
+| `Blob` | LargeBinary |
+| `Bool` | Boolean |
+| `I32` / `I64` | Int32 / Int64 |
+| `U32` / `U64` | UInt32 / UInt64 |
+| `F32` / `F64` | Float32 / Float64 |
+| `Date` | Date32 |
+| `DateTime` | Date64 |
+| `Vector(<dim>)` | FixedSizeList(Float32, dim), `1 ≤ dim ≤ i32::MAX` |
+| `[<scalar>]` | List(scalar) |
+| `enum(v1, v2, …)` | Utf8 restricted to a sorted, de-duplicated set of allowed string values |
+| `<scalar>?` | Same as scalar but `nullable: true` |
+
+## Constraints (body level)
+
+| Constraint | On | Effect |
+|---|---|---|
+| `@key(p, …)` | node | Primary key; implies index on key columns; `key_property()` returns the first key |
+| `@unique(p, …)` | node, edge | Uniqueness across listed columns |
+| `@index(p, …)` | node, edge | Build a scalar (BTREE) index on the columns |
+| `@range(p, min..max)` | node | Numeric range validation (open ranges allowed) |
+| `@check(p, "regex")` | node | Regex pattern validation |
+| `@card(min..max?)` | edge | Edge multiplicity — default `0..*`; `0..1`, `1..1`, `1..*`, etc. |
+
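+Edge bodies only allow `@unique` and `@index`. `@card` is written on the edge declaration itself, after `<ToType>`, as in the `edge` form under "Top-level declarations".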
+
+## Annotations
+
+- `@<ident>` or `@<ident>(<literal>)` on any declaration or property.
+- Known annotations:
+  - `@embed` on a Vector property — names the *source* property whose text gets embedded into this vector at ingest (`embed_sources` map in NodeType).
+  - `@description("…")`, `@instruction("…")` on query declarations (carried through to clients).
+- Custom annotations are accepted by the parser and surfaced in catalog metadata; unrecognized annotations don't fail compilation.
+
+## Catalog construction
+
+- Pass 0: collect interfaces.
+- Pass 1: collect nodes, expand `implements`, build constraint and `@embed` mappings, build the Arrow schema for each node table (`id: Utf8` plus all properties; blob columns get `LargeBinary`).
+- Pass 2: collect edges, validate that `from_type` / `to_type` exist, normalize edge names case-insensitively for lookup, validate constraints for edges. Edge Arrow schema: `id: Utf8, src: Utf8, dst: Utf8` plus edge properties.
+
+## Schema IR & stable type IDs
+
+- `IR_VERSION = 1` (`catalog/schema_ir.rs`).
+- Each interface/node/edge gets a `stable_type_id` (kind+name hashed) so renames can be tracked.
+- Serialized as JSON for diff/migration plans.
+
+## Schema migration planning
+
+`plan_schema_migration(accepted, desired) -> SchemaMigrationPlan { supported, steps[] }` with step types:
+
+- `AddType { type_kind, name }`
+- `RenameType { type_kind, from, to }`
+- `AddProperty { type_kind, type_name, property_name, property_type }`
+- `RenameProperty { type_kind, type_name, from, to }`
+- `AddConstraint { type_kind, type_name, constraint }`
+- `UpdateTypeMetadata { … annotations }`
+- `UpdatePropertyMetadata { … annotations }`
+- `UnsupportedChange { entity, reason }` (forces `supported=false`)
+
+`apply_schema()` returns `SchemaApplyResult { supported, applied, manifest_version, steps }` and is gated by an internal `__schema_apply_lock__` system branch so concurrent schema applies serialize.
--- a/docs/server.md
+++ b/docs/server.md
@ -0,0 +1,68 @@
+# HTTP Server (`omnigraph-server`)
+
+Axum 0.8 + tokio + utoipa-generated OpenAPI. Single repo per process; deploy multiple processes for multi-tenancy.
+
+## Endpoint inventory
+
+| Method | Path | Auth | Action | Handler |
+|---|---|---|---|---|
+| GET | `/healthz` | none | — | `server_health` |
+| GET | `/openapi.json` | none | — | `server_openapi` (strips security if auth disabled) |
+| GET | `/snapshot?branch=` | bearer + `read` | snapshot of branch | `server_snapshot` |
+| POST | `/read` | bearer + `read` | run named query | `server_read` |
+| POST | `/export` | bearer + `export` | NDJSON stream | `server_export` |
+| POST | `/change` | bearer + `change` | mutation | `server_change` |
+| GET | `/schema` | bearer + `read` | get current `.pg` source | `server_schema_get` |
+| POST | `/schema/apply` | bearer + `schema_apply` (target=`main`) | migrate | `server_schema_apply` |
+| POST | `/ingest` | bearer + `branch_create` (if new) + `change` | bulk load | `server_ingest` (32 MB body limit) |
+| GET | `/branches` | bearer + `read` | list branches | `server_branch_list` |
+| POST | `/branches` | bearer + `branch_create` | create | `server_branch_create` |
+| DELETE | `/branches/{branch}` | bearer + `branch_delete` | delete | `server_branch_delete` |
+| POST | `/branches/merge` | bearer + `branch_merge` | merge `source → target` | `server_branch_merge` |
+| GET | `/runs` | bearer + `read` | list | `server_run_list` |
+| GET | `/runs/{run_id}` | bearer + `read` | show | `server_run_show` |
+| POST | `/runs/{run_id}/publish` | bearer + `run_publish` | publish | `server_run_publish` |
+| POST | `/runs/{run_id}/abort` | bearer + `run_abort` | abort | `server_run_abort` |
+| GET | `/commits?branch=` | bearer + `read` | list | `server_commit_list` |
+| GET | `/commits/{commit_id}` | bearer + `read` | show | `server_commit_show` |
+
+## Streaming
+
+Only `/export` streams (`application/x-ndjson`, MPSC channel + `Body::from_stream`). Everything else is buffered JSON.
+
+## Error model
+
+Uniform `ErrorOutput { error, code?, merge_conflicts[] }` with `code ∈ unauthorized | forbidden | bad_request | not_found | conflict | internal`. Merge conflicts attach structured `MergeConflictOutput { table_key, row_id?, kind, message }`.
+
+HTTP status codes used: 200, 400, 401, 403, 404, 409, 500.
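+
+One way to picture how the `code` values line up with the status codes is the sketch below. The `ErrorCode` enum and `http_status` function are hypothetical, and the one-to-one mapping is an assumption based on the conventional meaning of each status, not something read from the handler code.
+
+```rust
+// Hypothetical enum mirroring the documented `code` values.
+#[derive(Debug, Clone, Copy, PartialEq)]
+enum ErrorCode {
+    Unauthorized,
+    Forbidden,
+    BadRequest,
+    NotFound,
+    Conflict,
+    Internal,
+}
+
+// Assumed mapping from error code to HTTP status.
+fn http_status(code: ErrorCode) -> u16 {
+    match code {
+        ErrorCode::BadRequest => 400,
+        ErrorCode::Unauthorized => 401,
+        ErrorCode::Forbidden => 403,
+        ErrorCode::NotFound => 404,
+        ErrorCode::Conflict => 409,
+        ErrorCode::Internal => 500,
+    }
+}
+```
+
+Under this reading, a merge conflict would surface as `conflict` with status 409 and a non-empty `merge_conflicts` array.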
+
+## Body limits
+
+- Default: 1 MB
+- `/ingest`: 32 MB
+
+## Auth model (`bearer + SHA-256`)
+
+- Tokens are SHA-256 hashed on startup; only the hashes are kept in memory, and the plaintext is not retained.
+- Constant-time comparison via `subtle::ConstantTimeEq`.
+- Three token sources, in order of precedence:
+  1. `OMNIGRAPH_SERVER_BEARER_TOKENS_AWS_SECRET` — AWS Secrets Manager (build with `--features aws`)
+  2. `OMNIGRAPH_SERVER_BEARER_TOKENS_FILE` or `OMNIGRAPH_SERVER_BEARER_TOKENS_JSON` — JSON `{actor_id: token, …}`
+  3. `OMNIGRAPH_SERVER_BEARER_TOKEN` — single legacy token, actor `default`
+- If no tokens are configured, the server runs unauthenticated (local dev) and `/openapi.json` strips the security scheme.
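+
+The precedence rules above can be sketched as a pure function over the environment. `TokenSource` and `resolve_token_source` are hypothetical names, and treating an empty value as unset is an assumption of this sketch. The ordering between `_FILE` and `_JSON` is not specified above, so both are folded into one tier.
+
+```rust
+use std::collections::HashMap;
+
+// Hypothetical summary of which source the server would use.
+#[derive(Debug, PartialEq)]
+enum TokenSource {
+    AwsSecret,
+    FileOrJson,
+    SingleLegacyToken,
+    Unauthenticated,
+}
+
+// Pick the highest-precedence configured source from a map of environment variables.
+// Empty values are treated as unset (assumption).
+fn resolve_token_source(env: &HashMap<String, String>) -> TokenSource {
+    let is_set = |name: &str| env.get(name).map_or(false, |v| !v.is_empty());
+    if is_set("OMNIGRAPH_SERVER_BEARER_TOKENS_AWS_SECRET") {
+        TokenSource::AwsSecret
+    } else if is_set("OMNIGRAPH_SERVER_BEARER_TOKENS_FILE")
+        || is_set("OMNIGRAPH_SERVER_BEARER_TOKENS_JSON")
+    {
+        TokenSource::FileOrJson
+    } else if is_set("OMNIGRAPH_SERVER_BEARER_TOKEN") {
+        TokenSource::SingleLegacyToken
+    } else {
+        TokenSource::Unauthenticated
+    }
+}
+```
+
+Passing `std::env::vars().collect()` as the map would evaluate the rule against the real process environment; taking the map as a parameter keeps the function easy to test.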
+
+See [deployment.md](deployment.md) for token-source operational details.
+
+## Tracing & observability
+
+- `tower_http::TraceLayer::new_for_http()`
+- Policy decisions logged at INFO level with actor, action, branch, decision, matched rule
+- Startup logs: token source name, repo URI, bind address
+- Graceful SIGINT shutdown
+
+## Not implemented (by design or "TBD")
+
+- CORS — not configured; add `tower_http::cors` if needed.
+- Rate limiting — none.
+- Pagination — none (commits/branches/runs return everything; export streams).
+- Multi-tenant routing — one repo per process.
--- a/docs/storage.md
+++ b/docs/storage.md
@ -0,0 +1,46 @@
+# Storage
+
+## L1 — Lance dataset (per node/edge type)
+
+Every node type and every edge type is its own Lance dataset:
+
+- **Columnar Arrow storage**: each property is a column; nullable per Arrow schema.
+- **Fragments**: data is partitioned into fragments; new writes create new fragments.
+- **Manifest versioning**: every commit produces a new dataset version; old versions remain readable.
+- **Stable row IDs**: enabled by OmniGraph for the commit-graph and run-registry datasets so durable references survive compaction.
+- **Append / delete / `merge_insert`**: native Lance write modes.
+- **Per-dataset branches** (Lance native): copy-on-write at the dataset level.
+- **Object-store agnostic**: file://, s3://, gs://, az://, http (read-only via Lance) — OmniGraph wires file:// and s3:// (`storage.rs`).
+
+## L2 — Multi-dataset coordination via `__manifest`
+
+OmniGraph is **not** a single Lance dataset; it is a *graph* of datasets coordinated through one append-only manifest table.
+
+- **Manifest table**: `__manifest/` Lance dataset.
+- **Layout** (`db/manifest/layout.rs`, `db/manifest/state.rs`):
+  - `nodes/{fnv1a64-hex(type_name)}` — one Lance dataset per node type
+  - `edges/{fnv1a64-hex(edge_type_name)}` — one Lance dataset per edge type
+  - `__manifest/` — the catalog of all sub-tables and their published versions
+  - `_graph_commits.lance` / `_graph_commit_actors.lance` — the commit graph and its actor map
+  - `_graph_runs.lance` / `_graph_run_actors.lance` — the run registry and its actor map
+- **Manifest row schema** (`object_id, object_type, location, metadata, base_objects, table_key, table_version, table_branch, row_count`):
+  - `object_type` ∈ `table | table_version | table_tombstone`
+  - `table_key` ∈ `node:<TypeName> | edge:<EdgeName>`
+  - `table_branch` is `null` for the main lineage and the branch name otherwise
+- **Snapshot reconstruction**: latest visible `table_version` per `(table_key, table_branch)` minus tombstones whose `tombstone_version >= table_version`.
+- **Atomic publish**: multi-dataset commits publish via a `ManifestBatchPublisher` so a single write to `__manifest` flips all the new sub-table versions visible at once.
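+
+The snapshot-reconstruction rule above can be sketched as follows. This is a simplified illustration, not the engine's code: `ManifestRow`, `RowKind`, and `reconstruct_snapshot` are hypothetical names, a tombstone row's `table_version` is assumed to play the role of `tombstone_version`, and "visible" is reduced to "present in the input slice".
+
+```rust
+use std::collections::HashMap;
+
+// Hypothetical, simplified view of a manifest row; only the fields the rule needs.
+enum RowKind {
+    TableVersion,
+    TableTombstone,
+}
+
+struct ManifestRow {
+    kind: RowKind,
+    table_key: &'static str,            // e.g. "node:Person"
+    table_branch: Option<&'static str>, // None = main lineage
+    table_version: u64,                 // for a tombstone: assumed to be the tombstone_version
+}
+
+type Key = (&'static str, Option<&'static str>);
+
+// Latest table_version per (table_key, table_branch), dropping any entry
+// covered by a tombstone whose version is >= that table_version.
+fn reconstruct_snapshot(rows: &[ManifestRow]) -> HashMap<Key, u64> {
+    let mut latest: HashMap<Key, u64> = HashMap::new();
+    let mut tombstones: HashMap<Key, u64> = HashMap::new();
+    for row in rows {
+        let key = (row.table_key, row.table_branch);
+        let slot = match row.kind {
+            RowKind::TableVersion => latest.entry(key).or_insert(0),
+            RowKind::TableTombstone => tombstones.entry(key).or_insert(0),
+        };
+        // Keep the highest version seen for this key.
+        *slot = (*slot).max(row.table_version);
+    }
+    latest.retain(|key, version| match tombstones.get(key) {
+        Some(tombstone_version) => *tombstone_version < *version,
+        None => true,
+    });
+    latest
+}
+```
+
+Because the manifest is append-only, a reader only needs the rows up to a given manifest version to rebuild the graph as of that point.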
+
+## URI scheme support (`storage.rs`)
+
+| Scheme | Backend | Notes |
+|---|---|---|
+| local path / `file://` | `LocalStorageAdapter` (tokio) | Normalized to absolute paths |
+| `s3://bucket/prefix` | `S3StorageAdapter` (object_store) | Honors `AWS_ENDPOINT_URL_S3`, `AWS_ALLOW_HTTP`, `AWS_S3_FORCE_PATH_STYLE` |
+| `http(s)://host:port` | HTTP client to `omnigraph-server` | Used by CLI as a target, not a storage backend |
+
+## Object-store env vars (S3-compatible)
+
+- `AWS_REGION`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`
+- `AWS_ENDPOINT_URL`, `AWS_ENDPOINT_URL_S3` — for MinIO / RustFS / GCS-via-XML
+- `AWS_S3_FORCE_PATH_STYLE=true` — path-style URLs
+- `AWS_ALLOW_HTTP=true` — allow plain HTTP (local dev)
--- a/scripts/check-agents-md.sh
+++ b/scripts/check-agents-md.sh
@ -0,0 +1,77 @@
+#!/usr/bin/env bash
+# Verify that AGENTS.md and docs/ stay in sync.
+#
+# Two checks:
+#   1. Every docs/*.md path linked from AGENTS.md exists on disk.
+#   2. Every doc in the canonical set is linked from AGENTS.md.
+#
+# Exit non-zero on any drift.
+
+set -euo pipefail
+
+repo_root="$(cd "$(dirname "$0")/.." && pwd)"
+cd "$repo_root"
+
+agents_file="AGENTS.md"
+if [[ ! -f "$agents_file" ]]; then
+  echo "error: $agents_file not found" >&2
+  exit 1
+fi
+
+# Canonical set: every docs/*.md (top-level), plus the releases/ index dir if present.
+canonical=()
+while IFS= read -r line; do
+  canonical+=("$line")
+done < <(find docs -mindepth 1 -maxdepth 1 -type f -name '*.md' | sort)
+if [[ -d docs/releases ]]; then
+  canonical+=("docs/releases/")
+fi
+
+# Extract docs/ links from AGENTS.md (markdown link form: (docs/...))
+linked=()
+while IFS= read -r line; do
+  linked+=("$line")
+done < <(grep -oE '\(docs/[^)]+\)' "$agents_file" | sed -E 's/^\(|\)$//g' | sort -u)
+
+fail=0
+
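+# Check 1: every linked path exists.
+# ${arr[@]+"${arr[@]}"} expands to nothing for an empty array, so `set -u` does not abort on bash < 4.4.
+for link in ${linked[@]+"${linked[@]}"}; do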
+  # Strip in-page anchors like #foo
+  path="${link%%#*}"
+  if [[ "$path" == */ ]]; then
+    if [[ ! -d "$path" ]]; then
+      echo "error: AGENTS.md links to missing directory: $path" >&2
+      fail=1
+    fi
+  else
+    if [[ ! -f "$path" ]]; then
+      echo "error: AGENTS.md links to missing file: $path" >&2
+      fail=1
+    fi
+  fi
+done
+
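+# Check 2: every canonical doc is linked at least once.
+# The ${arr[@]+"${arr[@]}"} form tolerates empty arrays under `set -u` on bash < 4.4.
+for doc in ${canonical[@]+"${canonical[@]}"}; do
+  found=0
+  for link in ${linked[@]+"${linked[@]}"}; do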
+    path="${link%%#*}"
+    if [[ "$path" == "$doc" ]]; then
+      found=1
+      break
+    fi
+  done
+  if [[ "$found" -eq 0 ]]; then
+    echo "error: doc not linked from AGENTS.md: $doc" >&2
+    fail=1
+  fi
+done
+
+if [[ "$fail" -ne 0 ]]; then
+  echo >&2
+  echo "AGENTS.md / docs/ are out of sync. Either update AGENTS.md links or rename/remove the doc." >&2
+  exit 1
+fi
+
+echo "AGENTS.md ↔ docs/ links OK (${#linked[@]} links, ${#canonical[@]} docs)."