Merge branch 'main' into ragnorc/omnigraph-mcp-crate

Folds in v0.7.1 (release #290 + optimize/write-path/internal-table-compaction fixes #288/#291/#297) under the MCP branch.

Conflict resolutions (5 files):
- crates/omnigraph-server/Cargo.toml: take main's 0.7.1 path-dep constraints; keep our omnigraph-mcp dep (bumped to 0.7.1) + http dep.
- crates/omnigraph-server/src/handlers.rs: keep our server_list_queries doc-comment (exposed @mcp(expose) subset, invoke_query-gated) — it supersedes main's pre-@mcp(expose) text, since this branch adds the per-query expose flag.
- docs/user/operations/server.md: keep our GET /queries description (invoke_query gate + @mcp(expose) exposure) over main's read-gated/list-all text.
- docs/dev/index.md: keep both in-flight RFC rows; renumber this branch's tenancy RFC 013 -> 014 (rfc-014-tenancy-cells.md) since main now owns RFC-013 (rfc-013-write-path-latency.md). Title + index link updated; link-check green.
- openapi.json: regenerated from merged source (OMNIGRAPH_UPDATE_OPENAPI=1) — now info.version 0.7.1 with our invoke_query/@mcp schema.

Coherence: omnigraph-mcp bumped 0.7.0 -> 0.7.1 to match the workspace; Cargo.lock updated. cargo build --workspace green; server/mcp/api-types/compiler suites green (schema_routes.rs reopen-after-apply flakes under parallel IO on a near-full disk, passes single-threaded — a pre-existing main test, unchanged by the merge).
2026-06-24 02:38:06 +02:00 · 2026-06-23 18:26:45 +02:00 · 2026-06-23 18:26:45 +02:00 · adc36adf32
commit adc36adf32
parent 8dab7e2e61 6d4606a830
44 changed files with 3595 additions and 528 deletions
--- a/docs/dev/docs-issues.md
+++ b/docs/dev/docs-issues.md
@@ -0,0 +1,70 @@
+# User Docs Coherence Ledger
+
+**Last review:** 2026-06-20 (against 0.7.1)
+**Status:** all user-doc findings from the 2026-06-20 sweep resolved; follow-ups outside `docs/user` are tracked under "Open" below. Living ledger for future audits.
+
+This page tracks stale or incoherent user-doc claims found during broad docs
+reviews. Findings are validated against current **code/behavior**, not just
+cross-doc consistency. Record new findings as they surface; mark them resolved
+(with the fixing commit) once the public pages are corrected.
+
+## Resolved — 2026-06-20 docs/user coherence sweep
+
+Every finding from the 2026-06-20 review was validated (all reproduced) and
+fixed on branch `docs/user-coherence-0-7-1`.
+
+| Pri | Finding | Resolution |
+|---|---|---|
+| P1 | `cluster apply` documented as catalog-only / "Stage 3A" with graph+schema deferred — in both `cli/reference.md` and the shipped CLI help (`cli.rs`) | Rewrote both to describe the real converge behavior (creates graphs, applies schema with soft drops, writes catalog, executes approved deletes in one ordered run); `deferred` now means the genuinely-unsupported case (standalone schema delete). |
+| P1 | Stored-query exposure had two contracts: `server.md` documented a per-query `mcp:{expose:false}` knob; cluster docs said all queries are listed | Confirmed in code: cluster registry has no expose field (`QueryConfig`), boot bridge hardcodes `expose: true` (`omnigraph-server` settings), no GQ-level annotation. Removed the knob from `server.md`; documented "every applied query is listed; per-query exposure may become a Cedar-policy decision later". |
+| P1 | The same stale "`mcp.expose == true` subset" contract lived in the **OpenAPI surface**: utoipa annotations (`handlers.rs:1029,1037`, `omnigraph-api-types/src/lib.rs:404`) drove `openapi.json` (Greptile catch on #293) | Updated the three Rust doc-comment/annotation strings to "every stored query" and regenerated `openapi.json` (`OMNIGRAPH_UPDATE_OPENAPI=1`); drift test green. Same-change per AGENTS.md rule 4. |
+| P2 | `schema/index.md` claimed `allow_data_loss` honored "uniformly across transports" incl. HTTP `POST /schema/apply` | Scoped to the direct/embedded path; added that cluster-managed graphs evolve via `cluster apply` (soft drops only) and the HTTP route is 409-disabled for cluster serving. |
+| P2 | `/load` missing from admission / body-limit / rate-limit / manifest-conflict prose (named `/ingest` only); constants called it "Ingest body limit" | Documented `/load` as canonical everywhere with `/ingest` as the deprecated alias; renamed the constant to "Load (bulk-write) body limit". |
+| P2 | CLI "Bearer token resolution" section listed removed `omnigraph.yaml` keys (`graphs.<name>.bearer_token_env`, `auth.env_file`) | Replaced with a pointer to the keyed-credential model (`OMNIGRAPH_TOKEN_<NAME>` → `~/.omnigraph/credentials` → `OMNIGRAPH_BEARER_TOKEN`); no plaintext-in-config path. |
+| P2 | Flat route names in a cluster-only server (`POST /query`, `POST /mutate`, `GET /queries`, `POST /queries/{name}`) | Added a one-line note that the per-graph subsections use shorthand under `/graphs/{id}/…`; the endpoint table is already fully qualified. |
+| — | `version` printed `omnigraph 0.3.x` | → `0.7.x`. |
+| — | `search/indexes.md` used deprecated `ingest --mode merge` | → `load --mode merge`. |
+| — | `config.md` `deferred` disposition described as "graph/schema change, later phase" | → "an unsupported change (e.g. standalone schema delete)". |
+| — | Stale stage labels (`Stage 3A`, `Stage 2C`, `Stage 1`) in active reference docs | Removed / reworded to plain language; release notes keep history. |
+
+## Open — surfaced 2026-06-20, not yet fixed
+
+- **Stale "config-only apply" / "Stage 3A" comments in `omnigraph-cluster`
+  source** (internal rustdoc, not user docs — out of scope for the docs sweep
+  above): `src/types.rs:147` ("Applied changes execute (config-only query/policy
+  catalog writes)"), `src/types.rs:265` ("Output of config-only cluster apply"),
+  `src/diff.rs:256`, and `src/tests.rs:1129` ("config-only apply (Stage 3A)").
+  Apply now also runs graph creates, schema applies, and approved deletes
+  (`diff.rs:411` `GraphCreate` / `SchemaApply`; the Stage-4 create/schema/delete
+  executors + tests `apply_creates_graph_and_unblocks_dependents`,
+  `apply_schema_update_and_dependent_query_in_one_run`,
+  `apply_blocks_graph_delete_without_approval`). Update these comments in a
+  cluster-crate change.
+- **Cross-repo drift from this sweep** (separate repos):
+  - `omnigraph-ts` SDK — its generated `spec/openapi.json` +
+    `packages/sdk/src/generated/types.gen.ts` still describe the `GET /queries`
+    catalog as the `mcp.expose` subset. **No hand-fix:** the SDK's
+    `scripts/sync-spec.ts` pulls openapi.json from a *tagged* omnigraph release
+    (`/omnigraph/v{version}/openapi.json`), and the catalog fix landed on main
+    *after* the v0.7.1 tag — so it is in no tag yet and a hand-edit would be
+    overwritten on the next sync. It flows in automatically when the SDK bumps
+    to a tag containing the fix (v0.7.2+). Tracked, not actioned.
+  - `omnigraph-cookbooks/docs/best-practices.md` `bearer_token_env` chain —
+    **RESOLVED** by omnigraph-cookbooks PR #26 (2026-06-21), which deleted
+    `docs/best-practices.md` as part of the 0.7 restructure; the stale chain
+    survives nowhere on `main`.
+
+## Verification checklist (re-run on the next docs audit)
+
+```bash
+rg -n "Stage [0-9]|graph/schema changes are deferred|reserved for later stages" docs/user crates/omnigraph-cli/src/cli.rs
+rg -n "POST /query|POST /mutate|GET /queries|POST /queries/\{name\}|POST /schema/apply" docs/user
+rg -n "ingest --mode|Ingest body limit|/ingest" docs/user
+rg -n "0\.3\.x|bearer_token_env|auth\.env_file" docs/user
+rg -n "expose: false|mcp\.expose" docs/user
+```
+
+Expected: active user docs have no matches for stale phrases, or the remaining
+matches are explicitly marked as deprecated aliases, "no longer exist" notes, or
+route shorthand disclaimed relative to `/graphs/{id}`. Release notes are allowed
+to preserve historical behavior.
--- a/docs/dev/index.md
+++ b/docs/dev/index.md
@@ -41,6 +41,7 @@ constraints. User-facing behavior should still be documented through
 | Error taxonomy and serialization | [errors.md](../user/operations/errors.md) |
 | Constants and tunables | [constants.md](../user/reference/constants.md) |
 | Transaction model public contract | [transactions.md](../user/branching/transactions.md) |
+| User-doc coherence cleanup ledger | [docs-issues.md](docs-issues.md) |

 ## Project Operations

@@ -91,7 +92,8 @@ Working documents for in-flight feature work. Removed when the work lands.
 | Restructure the CLI around explicit planes — one graph-addressing model, declared capability surface, plane-grouped help (expands RFC-009 Phase 4) | [rfc-010-cli-planes-restructure.md](rfc-010-cli-planes-restructure.md) |
 | CLI refactoring — one addressing & config model post-`omnigraph.yaml`: scope + `--graph` + derived access path, served-default / privileged-direct, profiles, named queries, capability classifier (completes RFC-008) | [rfc-011-cli-refactoring.md](rfc-011-cli-refactoring.md) |
 | Provider-independent embedding configuration — one resolved `EmbeddingConfig` + sealed provider enum (Gemini/OpenAI/Mock), identity recorded in the schema IR, query-time same-space validation, NFR floor | [rfc-012-embedding-provider-config.md](rfc-012-embedding-provider-config.md) |
-| Tenancy model — cluster-as-tenant cells (silo the data, pool the compute): `CellRuntime` lifts the per-cluster runtime, one server hosts N cells resolved by host before auth, WorkOS org→cell 1:1 with per-cell audience, tiered dedicated/pooled/on-prem on one binary | [rfc-013-tenancy-cells.md](rfc-013-tenancy-cells.md) |
+| Write-path latency — capture-once `WriteTxn`, version-pinned opens, one `GraphPublishAuthority` fed declarative `PublishPlan`s, manifest-authoritative lineage, epoch fence, bounded history (compaction + cleanup), and an IO-counted cost contract (`iss-write-s3-roundtrip-amplification`, `iss-991`) | [rfc-013-write-path-latency.md](rfc-013-write-path-latency.md) |
+| Tenancy model — cluster-as-tenant cells (silo the data, pool the compute): `CellRuntime` lifts the per-cluster runtime, one server hosts N cells resolved by host before auth, WorkOS org→cell 1:1 with per-cell audience, tiered dedicated/pooled/on-prem on one binary | [rfc-014-tenancy-cells.md](rfc-014-tenancy-cells.md) |

 ## Boundary

--- a/docs/dev/invariants.md
+++ b/docs/dev/invariants.md
@@ -285,11 +285,14 @@ them explicit.
  because Lance branch names can be deleted/recreated at the same version number;
  the manifest e_tag is carried into synthetic snapshot ids when available, and
  a detected same-branch manifest refresh clears read caches as the fallback for
-  e_tag-less table locations/topology. Remaining: the internal metadata tables
-  (`__manifest`, `_graph_commits`) are still not compacted, so the probe and
-  refresh cost still grows with fragment count on a long-lived graph (the
-  `optimize`-covers-internal-tables follow-up); the commit graph is not yet
-  reconcilable from the manifest; and the traversal id-map is still rebuilt.
+  e_tag-less table locations/topology. Remaining: `optimize` now compacts the
+  internal metadata tables (`__manifest`, `_graph_commits`) too (RFC-013 step 2),
+  so a *periodically-optimized* graph keeps the probe/refresh/per-write scan flat
+  in history; but they are not yet brought into `cleanup` (version GC), so the
+  `_versions/` chain still grows until an explicit cleanup (the cleanup half is
+  deferred — it needs the Q8 cleanup-resurrection watermark first). The commit
+  graph is not yet reconcilable from the manifest; and the traversal id-map is
+  still rebuilt.
 - **Commit-graph parent under concurrency:** `record_graph_commit` now refreshes
  the commit-graph head from storage before appending, so a same-branch write
  after an external commit no longer forks the commit DAG by parenting off a
--- a/docs/dev/rfc-013-write-path-latency.md
+++ b/docs/dev/rfc-013-write-path-latency.md
--- a/docs/dev/rfc-014-tenancy-cells.md
+++ b/docs/dev/rfc-014-tenancy-cells.md
@@ -1,4 +1,4 @@
-# RFC-013: Tenancy model — cluster-as-tenant cells, pooled compute
+# RFC-014: Tenancy model — cluster-as-tenant cells, pooled compute

 **Status:** Proposed — general architecture (server topology, identity, deployment).
 **Date:** 2026-06-16
--- a/docs/dev/testing.md
+++ b/docs/dev/testing.md
@@ -27,6 +27,8 @@ The engine's `tests/` is the principal coverage surface; most graph-shaped behav
 | `forbidden_apis.rs` | Defense-in-depth source-walk guard: engine code (`exec/`, `db/omnigraph/`, `loader/`, `changes/`) must not reach around the sealed storage trait to Lance inline-commit APIs, nor open datasets directly (`Dataset::open` / `DatasetBuilder::from_uri`/`from_namespace`) — reads route through `Snapshot::open` and the held-handle cache; `// forbidden-api-allow: <reason>` sentinel exempts reviewed lines |
 | `lance_surface_guards.rs` | Pins the Lance API surfaces omnigraph depends on (named runtime + compile-only guards; see [lance.md](lance.md)) — the first smoke check on any Lance version bump; e.g. `compact_files_still_fails_on_blob_columns` turns red when the upstream blob-compaction fix lands |
 | `warm_read_cost.rs` | Cost-budget tests for the warm read path (query-latency work), measured at the object-store boundary with Lance `IOTracker` (the LanceDB IO-counted pattern): a warm same-branch read does 0 manifest opens, 0 commit-graph opens, 1 version probe, validates the schema once (Fix 1 / finding A / Fix 2 at commit-history depth); stale same-branch reads perform exactly 2 probes and refresh manifest-only; recreated non-main branches with the same Lance version refresh by incarnation; recreated branch-owned table handles are distinguished by table e_tag or refresh-time cache clearing; recreated traversal topology is protected by synthetic snapshot-id incarnation or refresh-time cache clearing; a warm *repeat* read does 0 table opens via the held-handle cache and a write re-opens only the changed table at its new version/e_tag (Fix 3/6A). See "Cost-budget tests" below |
+| `write_cost.rs` | Cost-budget tests for the WRITE path (RFC-013), the latency twin of `warm_read_cost.rs` on the **shared `helpers::cost` harness** (`measure`/`IoCounts`/`assert_flat`/`local_graph`). Runs on **local FS**; gates the **internal-table** term (`__manifest`/`_graph_commits` scans flat in commit-history depth — `internal_table_scans_are_flat_in_history`, now **green every-PR** since RFC-013 step 2 brought the internal tables into `optimize`; the test compacts at each depth before measuring) plus green every-PR guards (single-insert `data_writes` bounded, a per-write read-op ceiling that fails the moment a round-trip is added, and a `measure_with_staged` fitness assert that a keyed insert routes through `stage_merge_insert` once with no `stage_append`/vector-index build). The **data-table opener** term is S3-only — see `write_cost_s3.rs` and the backend-split note in "Cost-budget tests" below |
+| `helpers/cost.rs` | The shared cost-budget harness (not a test): `IoCounts`/`StagedCounts` (counts by table class), `measure`/`measure_with_staged` (the one place the `with_query_io_probes` + `MergeWriteProbes` task-local + `IOTracker` wiring lives), `assert_flat(curve, select, slack, what)`, and store-agnostic `local_graph`/`s3_graph` fixtures. `warm_read_cost.rs`, `write_cost.rs`, and `write_cost_s3.rs` all consume it so a cost test body is written once and reads in one vocabulary |
 | `lifecycle.rs` | Graph lifecycle, schema state |
 | `point_in_time.rs` | Snapshots, time travel (`snapshot_at_version`, `entity_at`) |
 | `changes.rs` | `diff_between` / `diff_commits` |
@@ -70,9 +72,10 @@ The engine's `tests/` is the principal coverage surface; most graph-shaped behav

 ## RustFS / S3 integration

-CI runs three S3-backed tests against a containerized RustFS server (`.github/workflows/ci.yml` → `rustfs_integration` job):
+CI runs these S3-backed tests against a containerized RustFS server (`.github/workflows/ci.yml` → `rustfs_integration` job):

 - `cargo test -p omnigraph-engine --test s3_storage`
+- `cargo test -p omnigraph-engine --test write_cost_s3` (RFC-013 step 3a's data-table opener cost gate — flat across commit depth on S3; the term local FS can't reproduce)
 - `cargo test -p omnigraph-server --test s3` (single-graph serving + config-free `--cluster s3://` boot)
 - `cargo test -p omnigraph-cluster --test s3_cluster` (full control-plane lifecycle on the bucket)
 - `cargo test -p omnigraph-cli --test system_local local_cli_s3_end_to_end_init_load_read_flow`
@@ -127,7 +130,7 @@ When you pick up any change, walk through this:
 6. **For substrate-touching changes** (Lance behavior), reach for `failpoints` or fixture-driven scenarios, not stubbed-out mocks.
 7. **For server / API changes**, confirm the OpenAPI regeneration happens in `openapi.rs` and that the diff lands in `openapi.json`.
 8. **Verify your change makes an existing test fail before it makes the new one pass.** If you can break the code without breaking a test, your coverage gap is the problem to fix first.
-9. **Bound hot-path cost at history depth.** If the change touches a read or open path, add or extend a test that asserts a *bounded* cost (e.g. a warm same-branch read performs zero `Dataset::open`, or a fixed object-op count) against a fixture with realistic *commit-history depth*, not just realistic row counts. Cost that scales with history is invisible on a shallow fixture and only bites in production. See "Cost-budget tests" below.
+9. **Bound hot-path cost at history depth.** If the change touches a read, **write**, or open path, add or extend a test that asserts a *bounded* cost (e.g. a warm same-branch read performs zero `Dataset::open`, or a per-write read-op count flat across commit depth) against a fixture with realistic *commit-history depth*, not just realistic row counts. Reuse the shared `helpers::cost` harness (`measure`/`IoCounts`/`assert_flat`) — don't hand-roll `IOTracker` wiring. Cost that scales with history is invisible on a shallow fixture and only bites in production. See "Cost-budget tests" below.

 ## Cost-budget tests: bound hot-path cost at history depth

@@ -135,6 +138,7 @@ Correctness bugs fail loudly in tests; cost-scaling bugs pass every test and deg

 - **Assert a cost budget, not just a result.** For a read/open path, assert the number of `Dataset::open` calls (or object-store ops) a warm query performs, and that it does not grow with commit count. The reference is LanceDB's IO-counted tests, which assert a cached read costs 0-1 IO and carry a named regression test against "a list call on every subsequent query."
 - **Test at history depth.** Build a fixture with many *commits* (not many rows) and assert warm-read cost is flat across depths. A shallow fixture cannot catch an O(commits) cost.
+- **Use the shared harness, and gate each term on the backend where it manifests.** `helpers::cost` (`measure`/`IoCounts`/`assert_flat`/`local_graph`/`s3_graph`) is the one place the `IOTracker`/task-local plumbing lives — consume it, don't duplicate it. The write path has *two distinct* depth terms that split cleanly across backends, and conflating them is a real trap (the local data-table read count grows with depth too, but for a different reason — the merge-insert/RI scan reading O(depth) *fragments*, reduced by compaction, not by the opener): (1) the **internal-table** scan term (`__manifest`/`_graph_commits` fragment scans) reproduces on **any** backend including local FS, so `write_cost.rs` gates it on local every-PR; (2) the **data-table opener** term (latest-version resolution) is a per-object-store-RPC phenomenon — local-FS resolves latest with one cheap `read_dir` regardless of the opener used, so the namespace-vs-direct difference is **invisible on local** and only shows on a real object store (per-version GETs), gated by the bucket-gated `write_cost_s3.rs`. Same harness, different fixture; each term asserted where it actually appears.
 - This is the testing companion to invariant 15 in [docs/dev/invariants.md](invariants.md) (hot-path cost is bounded by work, not history).

 When in doubt, re-read [docs/dev/invariants.md](invariants.md) — quality gates apply to every change.
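The flatness gate described in the `testing.md` hunks above (measure a per-write cost at several commit depths, then assert the selected term does not grow) can be illustrated with a minimal sketch. This is not the project's `helpers::cost` code, which this diff does not show: the `IoCounts` fields, the `check_flat` name, and the "compare against the shallowest depth" rule are all assumptions chosen for illustration, loosely modeled on the documented `assert_flat(curve, select, slack, what)` shape.

```rust
/// Hypothetical stand-in for one measurement. The real `IoCounts` in
/// `helpers::cost` is not shown in this diff and may have different fields.
#[derive(Debug, Clone, Copy)]
pub struct IoCounts {
    pub internal_reads: u64,
    pub data_reads: u64,
}

/// Checks that the selected cost term stays within `slack` of the
/// shallowest measurement at every deeper point of the curve.
/// `curve` is a list of (commit depth, measured counts) pairs.
pub fn check_flat(
    curve: &[(u64, IoCounts)],
    select: impl Fn(&IoCounts) -> u64,
    slack: u64,
    what: &str,
) -> Result<(), String> {
    // An empty curve cannot demonstrate flatness, so treat it as a failure.
    let Some((base_depth, base)) = curve.first() else {
        return Err(format!("{what}: empty curve"));
    };
    let baseline = select(base);
    for (depth, counts) in &curve[1..] {
        let cost = select(counts);
        if cost > baseline + slack {
            return Err(format!(
                "{what}: cost {cost} at depth {depth} exceeds baseline \
                 {baseline} (depth {base_depth}) + slack {slack}"
            ));
        }
    }
    Ok(())
}
```

A test would build the curve by measuring the same write at, say, depths 1, 10, and 100, then call `check_flat(&curve, |c| c.internal_reads, 0, "internal-table scans")` and fail on `Err`. Selecting one term at a time matches the point made above about gating each depth term separately.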
--- a/docs/releases/v0.7.1.md
+++ b/docs/releases/v0.7.1.md
@@ -0,0 +1,67 @@
+# Omnigraph v0.7.1
+
+A patch release on top of v0.7.0: three correctness fixes (camelCase filters,
+cluster-apply crash loops, branch-merge OOM on embedding tables), one CLI
+catalog-metadata improvement, and a warm-read performance fix. No breaking
+changes, no on-disk format change, and no migration — drop-in over v0.7.0.
+
+## Fixes
+
+- **camelCase property filters now execute (#283).** A query — or a chained
+  mutation — that filtered on a camelCase schema field (e.g. `repoName`) linted
+  and planned cleanly but failed at run time with `No field named reponame.
+  Column names are case sensitive.` The identifier's case was destroyed at two
+  engine→Lance boundaries: the read-filter pushdown built the column with a
+  case-normalizing constructor, and the pending-batch mutation scan re-parsed
+  the predicate through a normalizing SQL context. Both now preserve case (the
+  read path uses a case-preserving column reference; the pending scan disables
+  SQL identifier normalization), so camelCase fields work consistently in read
+  and write predicates and a camelCase `@index` equality still routes to the
+  scalar index. The fix is correct-by-construction rather than a per-query
+  guard; a regression test pins index routing so a silent full-scan fallback
+  can't slip back in.
+
+- **`cluster apply` no longer crash-loops a booting server (#284).** Applying a
+  schema change while a graph had non-main (agent/review) branches, or a
+  migration that needed a backfill, could throw a freshly-booting
+  `omnigraph-server --cluster` into an inescapable crash loop. Neither input is
+  an engine bug — the engine rejects both cleanly and before moving any graph
+  state — but `cluster apply` wrote a recovery sidecar before calling the
+  engine and left it in place on the clean rejection, and the server refuses to
+  boot while a sidecar is pending. The asymmetric-cleanup path is fixed so a
+  pre-movement rejection leaves no stale sidecar, breaking the loop.
+
+- **Branch-merge fast-forward no longer OOMs on embedding tables (#277).** A
+  branch→main fast-forward merge of a forked, embedding-bearing table
+  re-derived the whole branch through a single Lance `merge_insert` — a
+  full-outer hash join over the entire delta — which exhausted the DataFusion
+  memory pool on high-dimensional embeddings (e.g. 8k rows × 3072-dim) and hung
+  or failed the merge. New rows now stream through `stage_append` (no hash
+  join), only genuinely-changed rows are upserted, embeddings are no longer
+  stringified to diff them, and index coverage defers to the reconciler, so a
+  fast-forward merge completes in bounded work. The three-way merge path is
+  unchanged.
+
+## Improvements
+
+- **`omnigraph queries list` surfaces stored-query `@description` /
+  `@instruction` (#280).** The CLI now shows a stored query's catalog metadata —
+  what it does and how to invoke it — in both human and `--json` output,
+  matching what `GET /queries` already returned. Previously both fields were
+  silently dropped on the CLI side.
+
+- **Warm reads no longer pay an O(history) metadata tax (#268).** Warm reads
+  used to re-derive per-query metadata (coordinator re-open, `__manifest` +
+  commit-graph re-scans, per-table re-open, double schema validation) at a cost
+  that scaled with commit history and never warmed up. A warm same-branch read
+  now does one cheap version probe, one schema read, and zero table opens on a
+  warm repeat (warm coordinator reuse, open-by-location+version, validate-once,
+  held `Dataset` handles + one shared Lance `Session` per graph). This also
+  closes a commit-DAG fork where a same-branch write after an external commit
+  could append off a stale cached head.
+
+## Upgrade notes
+
+Drop-in over v0.7.0 — no configuration, schema, or data changes. Upgrade the
+server and CLI together as usual. Graphs created on v0.7.0 read and write
+identically on v0.7.1.
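The `cluster apply` fix in the release notes above (#284) is about symmetric cleanup: a recovery sidecar written before the engine call must not outlive a rejection that happened before any graph state moved. A minimal in-memory sketch of that shape follows. Everything here is hypothetical (`SidecarSlot`, `EngineOutcome`, `apply_with_sidecar`, `server_can_boot`); the real sidecar lives on disk, and the choice to keep it after a post-movement failure is an assumption about what recovery needs, not something this diff states.

```rust
/// Hypothetical outcome of the engine call made by an apply step.
#[derive(Debug, PartialEq, Eq)]
pub enum EngineOutcome {
    Applied,
    /// The engine refused cleanly, before moving any graph state.
    RejectedBeforeMovement(String),
    /// Assumed case: something failed after state started to move.
    FailedAfterMovement(String),
}

/// In-memory stand-in for the on-disk recovery sidecar.
#[derive(Debug, Default)]
pub struct SidecarSlot {
    pub pending: Option<String>,
}

/// Writes the sidecar, runs the engine step, and clears the sidecar on
/// every path where no recovery is needed, including a clean rejection.
pub fn apply_with_sidecar(
    slot: &mut SidecarSlot,
    op_id: &str,
    engine: impl FnOnce() -> EngineOutcome,
) -> EngineOutcome {
    slot.pending = Some(op_id.to_string());
    let outcome = engine();
    match &outcome {
        // Nothing to recover: success, or nothing ever moved.
        EngineOutcome::Applied | EngineOutcome::RejectedBeforeMovement(_) => {
            slot.pending = None;
        }
        // Assumption: recovery still needs the sidecar here.
        EngineOutcome::FailedAfterMovement(_) => {}
    }
    outcome
}

/// The server refuses to boot while a sidecar is pending.
pub fn server_can_boot(slot: &SidecarSlot) -> bool {
    slot.pending.is_none()
}
```

In this model the pre-fix behavior corresponds to leaving `pending` set on `RejectedBeforeMovement`, which makes `server_can_boot` return `false` on every restart even though there is nothing to recover.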
--- a/docs/user/cli/reference.md
+++ b/docs/user/cli/reference.md
@@ -28,7 +28,7 @@ Top-level command families and subcommands. Graph-targeting commands accept a po
 | `policy validate \| test \| explain` | Cedar tooling against a cluster's applied policies (`--cluster <dir>`; `--graph <id>` picks a graph's bundle when several apply). `test` takes `--tests <file>`; `explain` takes `--actor`/`--action`/`--branch`/`--target-branch` |
 | `queries list \| validate` | inspect a cluster's applied stored-query registry (`--cluster <dir\|uri>`; `--graph <id>` to scope one graph). `list` prints each query's kind (read/mutation), name, typed params, and `[mcp: …]` exposure; a query's `@description`/`@instruction` are shown as indented `description:` / `instruction:` lines when declared (omitted otherwise). `--json` emits `{name, mcp_expose, tool_name, mutation, params}` plus `description`/`instruction` **only when present** — matching the HTTP `GET /queries` catalog ([server.md](../operations/server.md)). `validate` type-checks the registry and exits non-zero on a broken query |
 | `profile list \| show [<name>]` | read-only inspection of `~/.omnigraph/config.yaml` profiles. `list` shows each profile's binding (server/cluster/store) + default graph and marks the `$OMNIGRAPH_PROFILE`-active one; JSON keeps `binding` and adds `scope_kind`, `target`, `valid`, and `error`; `show` resolves one profile's scope (endpoint + default graph), defaulting to the active profile, else the flat operator defaults |
-| `version` / `-v` | print `omnigraph 0.3.x` |
+| `version` / `-v` | print `omnigraph 0.7.x` |

 ## Command capabilities

@@ -189,22 +189,26 @@ omnigraph cluster import   --config company-brain --json
 omnigraph cluster force-unlock <LOCK_ID> --config company-brain --json
 ```

-`--config` is a directory containing `cluster.yaml`; it defaults to `.`.
-Stage 3A accepts graphs, schemas, stored queries, and policy bundle file
+`--config` is a directory containing `cluster.yaml`; it defaults to `.`. The
+config declares graphs, schemas, stored queries, and policy bundle file
 references. `cluster plan` reads local JSON state from
 `<config-dir>/__cluster/state.json`; a missing file means empty state. Plan,
 apply, refresh, and import acquire `__cluster/lock.json` by default and release
-it before returning. `cluster apply` executes only stored-query/policy catalog
-writes (content-addressed under `__cluster/resources/`) and requires an
-existing `state.json`; graph/schema changes are deferred with warnings, and
-applied resources do not serve traffic until an `omnigraph-server --cluster
-<dir>` restart picks them up. `cluster status` reads state only and reports any existing
-lock metadata. `force-unlock` removes a lock only when the supplied id exactly
-matches the lock file. `refresh` requires an existing `state.json`; `import`
-creates one only when it is missing. Both observe declared graphs read-only at
-`<config-dir>/graphs/<graph-id>.omni`. External state backends, graph/schema
-apply, automatic stale-lock breaking, `plan --refresh`, pipelines, UI specs,
-embeddings, aliases, and bindings are reserved for later stages. See
+it before returning. `cluster apply` converges the cluster to its config in one
+ordered run: it creates declared graphs, applies schema updates (soft drops
+only — see [schema](../schema/index.md)), writes stored-query/policy catalog
+resources (content-addressed under `__cluster/resources/`), and executes
+approved graph deletes; it requires an existing `state.json` (run `import`
+first). Applied state does not serve traffic until an `omnigraph-server
+--cluster <dir>` restart picks up the new revision. Standalone schema deletes
+remain unsupported and are reported as `deferred` with a warning. `cluster
+status` reads state only and reports any existing lock metadata. `force-unlock`
+removes a lock only when the supplied id exactly matches the lock file.
+`refresh` requires an existing `state.json`; `import` creates one only when it
+is missing. Both observe declared graphs read-only at
+`<config-dir>/graphs/<graph-id>.omni`. External state backends, automatic
+stale-lock breaking, `plan --refresh`, pipelines, UI specs, embeddings,
+aliases, and bindings are not yet supported. See
 [cluster-config.md](../clusters/config.md).

 ## Output formats (`query` command, alias: `read`)
@@ -221,9 +225,12 @@ Precedence (high to low): explicit `--params` / `--params-file`, alias positiona

 ## Bearer token resolution (CLI)

-1. `graphs.<name>.bearer_token_env`
-2. `OMNIGRAPH_BEARER_TOKEN` global env
-3. `auth.env_file` referenced `.env`
+See **Credentials keyed by server name** above: a remote command resolves its
+token via `OMNIGRAPH_TOKEN_<NAME>` env → the `[<name>]` section in
+`~/.omnigraph/credentials` → the default `OMNIGRAPH_BEARER_TOKEN` env, and a
+keyed token is only ever sent to the server it is keyed to. Plaintext tokens are
+never stored in operator config; the removed `omnigraph.yaml` keys
+(`graphs.<name>.bearer_token_env`, `auth.env_file`) no longer exist.
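The three-step lookup order in the replacement text above (keyed env var, then the `[<name>]` section of `~/.omnigraph/credentials`, then the default env var) can be sketched as a pure function. This is an illustration under stated assumptions, not the CLI's code: the environment and the credentials file are modeled as plain maps, and the rule for turning a server name into the `OMNIGRAPH_TOKEN_<NAME>` suffix (uppercase, non-alphanumerics to `_`) is a guess that the docs in this diff do not confirm.

```rust
use std::collections::HashMap;

/// Which step of the chain produced the token.
#[derive(Debug, PartialEq, Eq)]
pub enum TokenSource {
    KeyedEnv,
    CredentialsFile,
    DefaultEnv,
}

/// ASSUMPTION: the suffix is the server name uppercased, with every
/// non-alphanumeric character replaced by '_'. The real mapping may differ.
pub fn keyed_env_var(server_name: &str) -> String {
    let suffix: String = server_name
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() { c.to_ascii_uppercase() } else { '_' })
        .collect();
    format!("OMNIGRAPH_TOKEN_{suffix}")
}

/// Resolves a token for `server_name`, highest precedence first.
/// `env` models process environment variables; `credentials` models the
/// credentials file as a map from section name to token (hypothetical shape).
pub fn resolve_token(
    server_name: &str,
    env: &HashMap<String, String>,
    credentials: &HashMap<String, String>,
) -> Option<(String, TokenSource)> {
    // 1. Env var keyed to this server.
    if let Some(token) = env.get(&keyed_env_var(server_name)) {
        return Some((token.clone(), TokenSource::KeyedEnv));
    }
    // 2. Section for this server in the credentials file.
    if let Some(token) = credentials.get(server_name) {
        return Some((token.clone(), TokenSource::CredentialsFile));
    }
    // 3. Unkeyed default env var.
    env.get("OMNIGRAPH_BEARER_TOKEN")
        .map(|token| (token.clone(), TokenSource::DefaultEnv))
}
```

Because steps 1 and 2 are looked up by the target server's own name, a keyed token can only be returned for the server it is keyed to, which is the property the text above calls out.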

 ## Duration parsing (cleanup)

--- a/docs/user/clusters/config.md
+++ b/docs/user/clusters/config.md
@@ -212,7 +212,7 @@ resource is planned as a create. If present, the file must use this shape:
 ```

 `state_revision`, `resource_statuses`, `approval_records`, `recovery_records`,
-and `observations` are optional so older Stage 1 state fixtures keep working.
+and `observations` are optional so earlier state fixtures keep working.
 Missing `state_revision` is treated as `0`. Resource status values are
 `pending`, `planned`, `applying`, `applied`, `drifted`, `blocked`, or `error`.

@@ -238,9 +238,10 @@ profile in the ledger; pre-profile ledgers are backfilled by an Update with
 catalog changes and count toward convergence.

 Each plan change carries a `disposition` field — an honest preview of what
-`cluster apply` will do with it in this stage: `applied` (executes), `derived`
-(a `graph.<id>` composite-digest update that converges automatically once its
-query digests land), `deferred` (graph/schema change, later phase), or
+`cluster apply` will do with it: `applied` (executes — graph creates, schema
+updates, catalog writes, approved deletes), `derived` (a `graph.<id>`
+composite-digest update that converges automatically once its query digests
+land), `deferred` (an unsupported change, e.g. a standalone schema delete), or
 `blocked` (query/policy gated by an unapplied or missing dependency, with the
 condition in `reason`).

@@ -496,5 +497,5 @@ matches the argument. A wrong id, missing lock, invalid lock JSON, or unsupporte
 lock version exits non-zero and leaves the file untouched.

 This is manual recovery for abandoned local locks. OmniGraph does not perform
-PID-liveness checks, TTL expiry, stale-lock breaking, or automatic unlock in
-Stage 2C.
+PID-liveness checks, TTL expiry, stale-lock breaking, or automatic unlock
+today.
--- a/docs/user/operations/maintenance.md
+++ b/docs/user/operations/maintenance.md
@@ -6,6 +6,8 @@

 - Compacts every node + edge table on `main`, then reindexes them, then **publishes the resulting version to the `__manifest`** so the manifest's recorded version tracks the compacted-and-reindexed state. Reads pin the manifest version, so without this publish the work would be invisible to readers *and* would break the version precondition of the next schema apply / strict update/delete ("stale view … refresh and retry"). The publish advances the graph version (a system-attributed commit) only for tables that actually changed.
 - Rewrites small fragments into fewer large ones; old fragments remain reachable via older versions until `cleanup` runs.
+- **Also compacts the internal system tables** `__manifest`, `_graph_commits`, and `_graph_commit_actors` (RFC-013 step 2), which accumulate one fragment per commit (the actor table only on the authenticated write path, where every commit carries an actor) and otherwise make every write's metadata scan grow with history. These take a simpler path than data tables: they are not `__manifest`-tracked (readers open them at their latest version), so compaction just advances their version in place — **no manifest publish and no recovery sidecar**. (The sidecar-free property is not because it is one commit — `compact_files` can emit a `ReserveFragments` commit before the `Rewrite`, and the auto-cleanup strip below is a further commit — but because every one of those commits is content-preserving and the table is read at its latest version, so a crash at any point leaves it readable and content-identical and the next `optimize` re-plans.) They appear in the returned stats under `table_key` `"__manifest"` / `"_graph_commits"` / `"_graph_commit_actors"` (the latter two only when present). They are **not yet covered by `cleanup`**, so their version chain still grows until the cleanup half lands (it requires a cleanup-resurrection safeguard first); run `optimize` on a cadence to keep per-write metadata scans flat.
+- **`optimize` is non-destructive by construction: it never garbage-collects versions, on any table (data or internal).** Compaction rewrites fragments and advances the version; old versions stay reachable until you run `cleanup`. This holds even for a graph created by an older binary that stored an on-by-default Lance `auto_cleanup` hook: `compact_files` / `optimize_indices` commit with the hook enabled and expose no skip override, so `optimize` strips the stale `lance.auto_cleanup.*` config from **any** table before compacting it, which keeps Lance's commit-time GC hook from firing and silently pruning `__manifest`-pinned versions. (Graphs created by current binaries store no such config; the strip is the upgrade-path safety net.) The internal-table path additionally tolerates a concurrent live writer: it runs a **bounded** rebase-and-retry, so transient contention does not fail the operator's `optimize` or the live write, but sustained contention past the retry budget surfaces a loud conflict error rather than looping forever (bounded and observable, not a silent give-up). The data-table path holds the per-table write queue while it compacts, so it does not contend with mutations on that table in the first place.
 - **Reindex (index coverage maintenance).** A scalar/FTS/vector index only covers the fragments it was built over. Rows appended after the index was built (e.g. by `load --mode merge`, whose commit does not rebuild an already-existing index) are scanned unindexed, and compaction itself rewrites fragments out of an index's coverage. `optimize` runs Lance's incremental `optimize_indices` after compaction to fold those fragments back in (a delta merge, not a full retrain), restoring full coverage so equality/range/traversal predicates stay index-accelerated. This is why a table with **no compaction work but stale index coverage still commits** a new version under `optimize`. Run `optimize` on a cadence at least as frequent as your freshness window so recently-loaded rows do not linger in the unindexed flat-scan tail.
 - **Create declared-but-missing indexes (the index reconciler).** `@index`/`@key` declares intent; `schema apply` records it but builds nothing, and `load`/`mutate` defer a column that cannot be built yet (a `Vector` column with no trainable vectors). `optimize` materializes any such declared-but-unbuilt index over the compacted layout — so it is the convergence path for an `@index` added after data exists, or a vector index whose embeddings arrived via a later `embed`. A column still not buildable (no vectors yet) is reported on the table's stat as `pending_indexes` (visible in `--json`), not treated as a failure; the next `optimize` retries. So `optimize` is the single operator-facing index reconciler: it compacts, restores coverage, **and** builds declared-but-missing indexes.
 - Each table's compact→reindex→publish serializes with concurrent mutations on the same table. A crash mid-operation is recovered automatically on the next open (both compaction and reindex are content-preserving, so roll-forward is always safe).
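The bounded rebase-and-retry described for the internal-table path can be pictured roughly as follows. This is a minimal sketch, not the engine's code: `CommitError`, `OptimizeError`, `compact_with_bounded_retry`, and the `max_attempts` budget are all hypothetical names introduced here for illustration.

```rust
/// Hypothetical outcome of one compaction commit attempt.
#[derive(Debug, PartialEq)]
enum CommitError {
    /// A concurrent writer advanced the table first; rebase and try again.
    Conflict,
    /// Any other failure is surfaced immediately, without retrying.
    Fatal(String),
}

/// Hypothetical error returned to the operator running `optimize`.
#[derive(Debug, PartialEq)]
enum OptimizeError {
    /// Contention outlasted the retry budget: loud and observable.
    RetryBudgetExhausted { attempts: u32 },
    Fatal(String),
}

/// Runs `attempt` at most `max_attempts` times. Each call is expected to
/// rebase onto the table's latest version and try to commit, returning the
/// new version on success.
fn compact_with_bounded_retry<F>(max_attempts: u32, mut attempt: F) -> Result<u64, OptimizeError>
where
    F: FnMut(u32) -> Result<u64, CommitError>,
{
    for n in 1..=max_attempts {
        match attempt(n) {
            Ok(version) => return Ok(version),
            // Transient contention: loop again, but only within the budget.
            Err(CommitError::Conflict) => continue,
            Err(CommitError::Fatal(msg)) => return Err(OptimizeError::Fatal(msg)),
        }
    }
    // Never loop forever: report the conflict instead of silently giving up.
    Err(OptimizeError::RetryBudgetExhausted { attempts: max_attempts })
}
```

The point of the shape is that there are only two exits: a committed version, or an explicit error that says how many attempts were spent.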
--- a/docs/user/operations/server.md
+++ b/docs/user/operations/server.md
@ -40,7 +40,7 @@ storage root, with no local config directory. `--bind`,

 ### Stored-query validation at startup

-If a graph declares a `queries:` registry (see [cli-reference](../cli/reference.md)), the server **loads and type-checks every stored query against that graph's live schema at startup**. Query parse/type failures quarantine that graph; if no graph remains healthy, startup refuses. Two MCP-exposed queries claiming the same tool name are likewise graph-local startup failures. Non-blocking advisories (e.g. an MCP-exposed query with a vector parameter an agent cannot supply) are logged. Validate offline before deploying with `omnigraph queries validate`. Discover the exposed queries as a typed tool catalog with `GET /queries`, and invoke one over HTTP with `POST /queries/{name}` (both below).
+If a graph declares a `queries:` registry (see [cli-reference](../cli/reference.md)), the server **loads and type-checks every stored query against that graph's live schema at startup**. Query parse/type failures quarantine that graph; if no graph remains healthy, startup refuses. Two MCP-exposed queries claiming the same tool name are likewise graph-local startup failures. Non-blocking advisories (e.g. an MCP-exposed query with a vector parameter an agent cannot supply) are logged. Validate offline before deploying with `omnigraph queries validate`. Discover the stored queries as a typed tool catalog with `GET /queries`, and invoke one over HTTP with `POST /queries/{name}` (both below).

 ## Endpoint inventory

@ -77,6 +77,11 @@ Server-level management endpoints:
 |---|---|---|---|
 | GET | `/graphs` | bearer + `graph_list` on `Server::"root"` | list ready/served graphs |

+> The per-graph subsections below name routes in shorthand (`GET /queries`,
+> `POST /query`, `POST /mutate`, `POST /queries/{name}`); every one is served
+> under the `/graphs/{id}/…` prefix shown in the table — only `/graphs` and
+> `/healthz` are flat.
+
 ### Stored-query catalog (`GET /queries`)

 List the graph's **exposed** (`@mcp(expose: true)`) stored queries as a typed tool catalog — enough for a client to register each as a tool without fetching `.gq` source. (The server also projects these queries as live MCP tools at `POST /graphs/{id}/mcp` — see [mcp.md](mcp.md); this catalog endpoint is the REST view of the same registry.) Each entry: `{ name, tool_name, description, instruction, mutation, params }`, where each param is `{ name, kind, item_kind?, vector_dim?, nullable, description? }`. `kind` is one of `string | bool | int | bigint | float | date | datetime | blob | vector | list` (decomposed so a consumer maps it with a closed `switch`, never re-parsing GQ type spelling). `bigint` (I64/U64), `date`, `datetime`, and `blob` are carried as JSON **strings** — a 64-bit integer loses precision as a JSON number, dates are ISO strings, and a blob is a URI string.
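A consumer of the catalog might map `kind` with a closed match along these lines. This is a sketch under assumptions: `ParamKind`, `parse_kind`, and `carried_as_json_string` are hypothetical client-side names, and only the string carriage of `bigint`, `date`, `datetime`, and `blob` comes from the text above; treating the remaining kinds as non-string JSON values is an assumption.

```rust
/// Client-side mirror of the documented `kind` values (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ParamKind {
    String,
    Bool,
    Int,
    BigInt,
    Float,
    Date,
    DateTime,
    Blob,
    Vector,
    List,
}

/// Parses the catalog's `kind` spelling; unknown spellings are rejected
/// rather than guessed at.
fn parse_kind(s: &str) -> Option<ParamKind> {
    match s {
        "string" => Some(ParamKind::String),
        "bool" => Some(ParamKind::Bool),
        "int" => Some(ParamKind::Int),
        "bigint" => Some(ParamKind::BigInt),
        "float" => Some(ParamKind::Float),
        "date" => Some(ParamKind::Date),
        "datetime" => Some(ParamKind::DateTime),
        "blob" => Some(ParamKind::Blob),
        "vector" => Some(ParamKind::Vector),
        "list" => Some(ParamKind::List),
        _ => None,
    }
}

/// True when the value travels as a JSON string. The match has no wildcard
/// arm, so adding a variant forces this function to be revisited.
fn carried_as_json_string(kind: ParamKind) -> bool {
    match kind {
        ParamKind::String
        | ParamKind::BigInt
        | ParamKind::Date
        | ParamKind::DateTime
        | ParamKind::Blob => true,
        // Assumption: these kinds use non-string JSON values.
        ParamKind::Bool
        | ParamKind::Int
        | ParamKind::Float
        | ParamKind::Vector
        | ParamKind::List => false,
    }
}
```

Because the set of kinds is closed, the client never has to re-parse GQ type spelling to decide how to encode an argument.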
@ -179,8 +184,8 @@ Uniform `ErrorOutput { error, code?, merge_conflicts[], manifest_conflict? }` wi
 caller's pre-write view of one table's manifest version was stale.
 `ManifestConflictOutput { table_key, expected, actual }` tells the client
 which table to refresh and retry. This is the conflict shape produced by
-concurrent `/mutate` (or its `/change` alias) or `/ingest` calls landing
-the same `(table, branch)` race.
+concurrent `/mutate` (or its `/change` alias) or `/load` (or its deprecated
+`/ingest` alias) calls landing the same `(table, branch)` race.

 HTTP status codes used: 200, 400, 401, 403, 404, 409, 429, 500.

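A client reacting to the conflict shape above could branch roughly like this. The sketch rests on assumptions: the `u64` field types, the `NextStep` enum, the `next_step` helper, and the pairing of `manifest_conflict` with status 409 are all hypothetical; the text only names the fields and lists 409 among the status codes in use.

```rust
/// Mirrors the documented `ManifestConflictOutput { table_key, expected, actual }`.
/// Field types are assumed; the docs name the fields but not their types.
#[derive(Debug, Clone, PartialEq)]
struct ManifestConflict {
    table_key: String,
    expected: u64,
    actual: u64,
}

/// Hypothetical client decision after a write request returns.
#[derive(Debug, PartialEq)]
enum NextStep {
    Done,
    /// Re-read the named table's view, then resubmit the write.
    RefreshAndRetry { table_key: String },
    /// Anything else goes back to the caller unchanged.
    GiveUp,
}

fn next_step(status: u16, conflict: Option<&ManifestConflict>) -> NextStep {
    match (status, conflict) {
        (200, _) => NextStep::Done,
        // Assumed: a stale-manifest conflict arrives as 409 plus `manifest_conflict`.
        (409, Some(c)) => NextStep::RefreshAndRetry { table_key: c.table_key.clone() },
        _ => NextStep::GiveUp,
    }
}
```

Only the stale-view case is retried automatically; a 409 without `manifest_conflict` (for example a merge conflict) is left for the caller to handle.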
@ -207,7 +212,8 @@ Cedar policy authorization runs **before** admission accounting so
 denied requests don't consume admission slots.

 Today admission gates every mutating handler: `/mutate` (and its
-deprecated alias `/change`), `/ingest`, `/branches/{create,delete,merge}`,
+deprecated alias `/change`), `/load` (and its deprecated alias `/ingest`),
+`/branches/{create,delete,merge}`,
 and `/schema/apply`. Read-only endpoints (`/snapshot`, `/query`, `/read`,
 `/export`, `/branches` GET, `/commits`, `/schema` GET) are not
 admission-gated.
@ -215,7 +221,7 @@ admission-gated.
 ## Body limits

 - Default: 1 MB
- `/ingest`: 32 MB
+- `/load` (and its deprecated `/ingest` alias): 32 MB

 ## Auth model (`bearer + SHA-256`)

@ -243,7 +249,7 @@ See [deployment.md](../deployment.md) for token-source operational details.

 - CORS — not configured; add `tower_http::cors` if needed.
 - Rate limiting — per-actor admission control gates `/mutate` (alias
-  `/change`), `/ingest`, `/branches/{create,delete,merge}`,
+  `/change`), `/load` (alias `/ingest`), `/branches/{create,delete,merge}`,
  `/schema/apply` (see "Per-actor
  admission control" above). No global rate limiter is configured;
  add `tower_http::limit` if a graph-wide cap is needed.
--- a/docs/user/reference/constants.md
+++ b/docs/user/reference/constants.md
@ -18,7 +18,7 @@
 | Expand CSR-build cost factor | `CSR_BUILD_FACTOR = 1.5` | traversal |
 | Expand mode override | `OMNIGRAPH_TRAVERSAL_MODE` (`indexed`\|`csr`; unset = cost-based auto) | traversal |
 | Default body limit | `1 MB` | HTTP server |
-| Ingest body limit | `32 MB` | HTTP server |
+| Load (bulk-write) body limit | `32 MB` | HTTP server (`/load`; shared by the deprecated `/ingest` alias) |
 | Default embed provider/model | `openai-compatible` / `openai/text-embedding-3-large` | engine embedding |
 | OpenAI-direct embed model | `text-embedding-3-large` | engine embedding |
 | Gemini-direct embed model | `gemini-embedding-2` | engine embedding |
--- a/docs/user/schema/index.md
+++ b/docs/user/schema/index.md
@ -72,6 +72,8 @@ Applying a plan reports whether it was supported, the steps applied, and the res

 `DropProperty` and `DropType` steps default to `Soft` mode: the catalog tombstones the entry but the prior column / dataset remains time-travel-reachable via `snapshot_at_version(prev)` until `omnigraph cleanup` runs. Soft drops are reversible.

-Pass `--allow-data-loss` (CLI) or `allow_data_loss: true` (HTTP `POST /schema/apply` body, SDK `SchemaApplyOptions`) to promote every drop in the plan to `Hard` mode. Hard drops run `cleanup_old_versions` on the affected dataset immediately after the manifest publish, making the prior column / dataset unreachable. **Irreversible.**
+Pass `--allow-data-loss` (CLI `schema apply`) or `allow_data_loss: true` (SDK `SchemaApplyOptions`) to promote every drop in the plan to `Hard` mode. Hard drops run `cleanup_old_versions` on the affected dataset immediately after the manifest publish, making the prior column / dataset unreachable. **Irreversible.**

-The flag is honored uniformly across transports — `omnigraph schema apply --allow-data-loss`, `POST /schema/apply { schema_source, allow_data_loss: true }`, and `apply_schema_with_options(.., SchemaApplyOptions { allow_data_loss: true })` produce identical plans and identical effects.
+This is the **direct/embedded** schema-apply path — `omnigraph schema apply --store …` and the embedded SDK `apply_schema_with_options(.., SchemaApplyOptions { allow_data_loss: true })` produce identical plans and identical effects.
+
+**Cluster-managed graphs are different.** A graph served from a cluster evolves only through `omnigraph cluster apply`, which performs **soft drops only** (no `allow_data_loss` path), and the HTTP `POST /schema/apply` route is **disabled (returns 409) for cluster-backed serving** — see [server](../operations/server.md) and [cluster-config](../clusters/config.md). Direct `schema apply` against a cluster-managed storage path is likewise refused.
--- a/docs/user/search/indexes.md
+++ b/docs/user/search/indexes.md
@ -22,7 +22,7 @@ list/`Blob` columns → none.

 > **Coverage and cost.** Each indexed column adds index files and build time, and
 > an index only covers the fragments it was built over. Rows appended after the
-> index was built (e.g. by `ingest --mode merge`) are scanned unindexed until a
+> index was built (e.g. by `load --mode merge`) are scanned unindexed until a
 > reindex extends coverage; see [maintenance](../operations/maintenance.md) → `optimize`.

 ## L2 — OmniGraph orchestration