* docs(rfc-013): bank the #295 spec-review comments as step-5 constraints (§5.1) 3b shipped a minimal WriteTxn{branch,base} and deferred the full §4.1 opener unification (pinned-base opener, shared Session, write-local handle cache, strict-op conflict-timing move) to step 5. The greptile comments on the #295 spec were moot for #298 (none of those constructs were built) but are load-bearing for step 5: (1) the handle cache must be Send+Sync (Mutex, not RefCell); (2) the strict-op timing move needs an explicit retry contract — txn discarded after any commit, retry re-opens a fresh base — which is the SAME contract as the stale-view false-fail (§1d.2); (3) the opener-equivalence test must advance HEAD externally then assert pinned-base, not the trivial HEAD==base. * feat(engine): fold graph lineage into the __manifest publish CAS (RFC-013 Phase 7) Graph lineage no longer lives in a second write to _graph_commits.lance. Each commit's graph_commit + graph_head:<branch> rows now ride the SAME __manifest merge-insert as the table-version rows (one atomic version), and CommitGraph reads its cache from the manifest projection (read_graph_lineage). _graph_commits.lance is no longer written commit rows (it remains only as a Lance branch-ref carrier). Mechanism: a LineageIntent { graph_commit_id (ULID, minted once), branch, actor, merged_parent, created_at } threads through ManifestBatchPublisher::publish. Inside the publisher retry loop the parent is resolved per attempt from the just-loaded branch-scoped manifest (the should_replace_head winner over the visible graph_commit rows — branch-correct by Lance branch isolation; the graph_head row is written for forward-compat + the §7.1 contention point but is not the parent source, so a freshly-forked branch resolves the right fork-point parent). A CAS-conflict retry re-reads the advanced head → correct new parent; the commit_id is stable across retries. Closes two known gaps BY CONSTRUCTION (one write, no second step to fail/ race): - manifest→commit-graph atomicity (no crash window between manifest + lineage), - commit-graph parent under concurrency (no refresh→append TOCTOU; the per-write commit_graph.refresh() is gone). Recovery, branch-merge, and genesis route their lineage through the same CAS (merge: one commit_merge_with_actor; recovery: publish_recovery_commit folds the recovery commit, actor=omnigraph:recovery; genesis rides the init __manifest write). The dead _graph_commits write helpers (append_commit/_merge/_actor) are #[allow(dead_code)] (the actor sidecar table is still enumerated by optimize). Verified (sequential): build clean; the new lineage_projection gate (manifest-only — _graph_commits/_actors have 0 rows; full lineage reconstructs via the projection); branching/merge_truth_table (exhaustive, branch-aware)/composite_flow/point_in_time/ changes/consistency/recovery; failpoints (59, incl. recovery lifecycle + the now-closed atomicity gap); full --workspace. Cost tests REVERT to their pre-fold values (writes +1, write_cost ceiling 80) — the proof of true single-CAS (no extra write). invariants.md marks both gaps CLOSED. PENDING (next stages, this PR): the §7.1 concurrent graph_head one-winner gate (stage 5 — two concurrent same-branch commits, exactly one wins); the stamp bump v4 + migrate_v3_to_v4 backfill + read-only refuse for EXISTING graphs (stage 4); full doc-sync of storage.md/architecture.md/writes.md. * feat(engine): migrate existing v3 graphs to manifest lineage (RFC-013 Phase 7 stage 4) The Phase-7 fold made CommitGraph read lineage from the __manifest projection, so a pre-Phase-7 (internal-schema v3) graph — lineage in _graph_commits.lance, none in __manifest — would read an empty commit DAG. Stage 4 makes existing graphs upgrade seamlessly and not break reads. - Stamp 3 -> 4 + migrate_v3_to_v4: bumps INTERNAL_MANIFEST_SCHEMA_VERSION and adds the 3 => migrate_v3_to_v4 arm. The migration reads this branch's _graph_commits/_actors, emits one graph_commit row per commit + exactly one graph_head:<branch> for the head (should_replace_head winner, deterministic id-sort — no hash-map-order in migration output), merge-inserts into __manifest, then set_stamp(4) LAST. Idempotency guard first (read_graph_lineage non-empty -> just stamp); crash before set_stamp re-enters at v3 and the guard completes it. Does NOT touch the unenforced-PK metadata. Runs per branch: migrate_on_open backfills main; load_publish_state backfills each branch on its first write (root_uri/branch threaded through migrate_internal_schema). - v3-read fallback: CommitGraph version-gates the lineage source — stamp < 4 reads the (re-activated) _graph_commits.lance; >= 4 uses the manifest projection. So a READ-ONLY open of an un-migrated graph reads correct history with no write. Correctness catch: the legacy _graph_commit_actors.lance was never branched, so the fallback reads it FLAT (no branch checkout) while checking out the branch only on the commits dataset. - Read-only stamp-refuse: a ReadOnly open of a FUTURE-stamped graph now refuses with the same upgrade error (future-proofing the next format bump; the write path already refused via migrate_internal_schema). - Docs: storage/architecture/writes/invariants/constants updated to manifest-stored lineage; release note docs/releases/v0.8.0.md (format v4, old writers clean-break, data preserved, upgrade writers first). 6 new tests (v3 backfill, idempotent, v3 read-only fallback, future-stamp refuse in both modes, crash-before-stamp completes, legacy branch+flat-actor read). Full engine suite + failpoints (59) + cargo test --workspace --locked green; check-agents-md passes. * test(engine): graph_head concurrency gate — disjoint same-branch writers form a linear commit DAG (RFC-013 Phase 7) Two (or N) writers committing disjoint tables on one branch still share the mutable `graph_head:<branch>` manifest row, so the only row-level CAS contention is that row. The contract — exactly one writer wins each CAS round; the loser retries inside the publisher, re-resolves its parent off the freshly-advanced head, and re-commits, so every writer lands and the graph_commit DAG stays a single LINEAR chain (no fork) — had no acceptance test. This adds it. - concurrent_disjoint_writes_share_head_and_form_linear_chain: two disjoint writers + distinct LineageIntent, tokio::join!; both commit; the on-disk DAG is genesis -> c -> c' (asserted linear: exactly one genesis, no two commits share a parent, the head is the unique non-parent). - n_concurrent_disjoint_writers_converge_to_one_linear_chain: N=8 disjoint writers each with an app-level retry loop (the publisher's internal budget can be exhausted under contention); all converge to one linear chain of 8. - concurrent_disjoint_writes_form_linear_chain_on_s3: the same race on a real object store (true conditional-put CAS), bucket-gated. Cites both tests from the §7.1 contention note in invariants.md. Test-only; no production change. * perf(engine): fold the lineage parent scan into the publish path's single __manifest scan (RFC-013 P2) Each lineage publish scanned `__manifest` twice: `load_publish_state` read table state via one scan, then `resolve_lineage_rows` did a second full `read_graph_lineage` scan only to find the parent commit. Fold the `graph_commit` extraction into the existing scan. - `read_manifest_scan` gains a `collect_lineage` flag. The publish path (`read_publish_scan`) collects the `graph_commit` rows in the same pass; the table-state hot path leaves them in the forward-compat skip arm, so it never pays the O(commits) lineage JSON decode (it also skips reading the `object_id` column entirely). One shared `decode_graph_commit_row` serves both the folded path and the standalone `read_graph_lineage`, so the two cannot drift. - `resolve_lineage_rows` is now sync and takes the already-parsed rows; the per-attempt re-read is preserved because `load_publish_state` runs once per CAS attempt, so a retry still re-parents off the advanced head. - `load_publish_state` returns a named `LoadedPublishState` instead of a four-tuple; the thin `read_registered_table_locations` / `read_tombstone_versions` accessors fold away. `read_manifest_entries` becomes `#[cfg(test)]`: the fold removes its last production caller, leaving only the test-only namespace module (`db/manifest.rs`: `#[cfg(test)] mod namespace`), so gating it keeps it from becoming dead code in non-test builds. Measured at depth ~5: per-write `__manifest` reads drop 44 -> 26 (total reads 54 -> 36). write_cost.rs gains a `manifest_reads <= 34` sub-ceiling that trips if a publish-path scan is re-added, and its calibration comment is corrected. * test(engine): red — transient legacy-open failure silently completes the v3→v4 migration A pre-Phase-7 (internal schema v3) graph keeps its graph lineage in `_graph_commits.lance`; the v3→v4 internal-schema migration backfills it into `__manifest` and stamps v4. `read_legacy_commit_cache` currently maps EVERY `Dataset::open` error to "no legacy data" (`Err(_) => empty`), so a transient or corrupt open during the one-time migration backfills nothing and still stamps v4 — orphaning the real lineage permanently (the migration runs once; the v3 fallback is then disabled). Add a `migration.v3_to_v4.legacy_open` failpoint that injects a non-not-found Lance error at the legacy open, and a fault-injection regression test in the `failpoints` binary. Against the current swallow the migration completes anyway, so the test fails on its "migration must abort" assertion — the predicted symptom. The fix follows in the next commit. Test support reachable from the `failpoints` integration binary (it compiles the crate without `cfg(test)`): the v3-fixture helpers and a stamp/row-count reader are gated `cfg(any(test, feature = "failpoints"))`, still excluded from release builds. Failpoint tests stay in the integration binary because the fail registry is process-global. * fix(engine): propagate non-not-found legacy-open errors in the v3→v4 migration `read_legacy_commit_cache` mapped EVERY `Dataset::open` error to an empty cache (`Err(_) => empty`) on both the legacy commits dataset and its actor sidecar. The v3→v4 internal-schema migration reads this once before stamping internal-schema v4; a transient or corrupt open therefore backfilled nothing and stamped v4 anyway, orphaning the graph's real lineage permanently (the migration runs once, and the stamp-gated v3 fallback is disabled at v4). This is the "no silent failures" deny-list violation, and realistic on object storage. Both opens now match the not-found variants — Lance maps an object-store NotFound to `DatasetNotFound` — as the benign "no legacy data" / "no authors" signal, and propagate anything else as a loud error. The two arms share the variant contract but carry different rationale (commits-absent is the legitimate empty signal; actor-sidecar-absent is benign, but a corrupt actor open silently wiping authorship before stamping v4 is the same loss hole), commented at each site. Pinned by the `lance_surface_guards.rs::dataset_open_missing_returns_not_found_variant` guard (turns red if a Lance bump changes the absence variant) and greens the fault-injection regression test from the previous commit. * test(engine): cover the per-branch v3→v4 migration against a real Lance branch `seed_legacy_v3_lineage` writes every commit (including the "feature"-tagged one) to MAIN's `_graph_commits.lance` with `manifest_branch` as a mere field, so the production per-branch migration path — `read_legacy_commit_cache` checking out a real Lance branch, and a branch-scoped `__manifest` — was never exercised. Add `seed_legacy_v3_lineage_with_branch`, which forks a real `feature` Lance branch on BOTH `_graph_commits.lance` and `__manifest` (the branch inherits main's stripped v3 state), and a test that migrates the BRANCH and asserts the branch's lineage lands in the BRANCH's `__manifest` (genesis + A + branch commit, `graph_head:feature` → branch commit, parents + actors intact) with main's `__manifest` untouched. This empirically resolves the open question behind the merge robustness work: the fast-path `read_graph_lineage(dataset)` has no `manifest_branch` filter, but `__manifest` is Lance-branched per graph-branch, so a branch reads only its own lineage — the test confirms migrating one branch does not leak into another. No branch filter is needed. * refactor(engine): type the lineage-backfill merge conflict via the publisher classifier `state::merge_lineage_rows` (the v3→v4 lineage backfill's standalone `__manifest` merge-insert) stringified its `execute_reader` error, discarding the Lance variant. Route it through the publisher's `map_lance_publish_error` (now `pub(crate)`) so a concurrent first-open's row-level CAS loss surfaces as the SAME typed `OmniError::Manifest{ details: RowLevelCasContention }` the publisher's own retry consumes — one vocabulary, no raw-Lance matching in the migration. Deliberately NOT unified with `optimize::is_retryable_lance_conflict`: that classifier also matches `CommitConflict`/`RetryableCommitConflict` from the compaction commit path, which a row-level merge-insert never emits. Cross-linked with a comment at both sites. Behavior-preserving: the only path that changes is the error TYPE on a CAS loss (previously an opaque `Lance` string, now a typed conflict); no success/failure outcome changes. The bounded re-open retry that consumes the new type lands next. * test(engine): red — concurrent v3→v4 migrations error instead of converging `migrate_v2_to_v3` is concurrent-runner idempotent by design; v3→v4 regressed it. `merge_lineage_rows` uses `conflict_retries(0)` and `migrate_v3_to_v4` has no app-level retry, so when two processes open the same legacy graph at once the backfill's row-level CAS loser errors the whole open instead of converging. The test opens two `__manifest` handles at the same pre-migration (v3, empty-lineage) HEAD and runs both `migrate_internal_schema` calls under `tokio::join!`, forcing the `graph_head:main` CAS to fire every run. Against the current code the loser fails with `RowLevelCasContention` ("Attempted 0 retries.") — the predicted symptom — so the "both must converge" assertion panics. The bounded re-open retry that makes both converge lands next. * fix(engine): make the v3→v4 lineage backfill converge under concurrent runners `migrate_v2_to_v3` is concurrent-runner idempotent; v3→v4 was not. Two processes (or open-for-write handles) opening the same legacy graph at once both reach the backfill merge, and `merge_lineage_rows`'s `conflict_retries(0)` made the row-level CAS loser error the whole open instead of converging. Two contention points, both now handled all-or-nothing: 1. The backfill merge on `graph_head:<branch>`. Wrap (fast-path re-read → read legacy → merge) in a bounded re-open retry loop: a `RowLevelCasContention` loss re-opens the manifest past the winner's (atomic) commit and re-loops; the fast-path re-read then sees the winner's lineage and stamps. On budget exhaustion it returns a `RowLevelCasContention`-typed error so the publisher's OUTER retry loop completes it. The retry decision reuses the publisher's `is_retryable_publish_conflict` so the two stay in lockstep. 2. The terminal stamp bump. Making the merge loser converge newly lets BOTH runners reach `set_stamp(4)` — an `UpdateConfig` commit on the same key — so the loser gets `lance::Error::IncompatibleTransaction` (NOT a row-level CAS, so the merge loop doesn't catch it). This surfaced only under the concurrent full-suite run, not the isolated test. Both write the SAME value, so the conflict is benign: `commit_v4_stamp_idempotently` re-opens and, if the stamp already reached the target, succeeds; else re-applies (bounded). Greens the race test from the previous commit (3x isolated, 5x full-suite, no flake). The new `IncompatibleTransaction` match is pinned by `lance_surface_guards.rs::lance_error_incompatible_transaction_variant_exists`. * fix(engine): refuse a future internal-schema stamp on the branch read path `load_commit_cache_for_branch` dispatched on the branch's internal-schema stamp — `< CURRENT` to the v3 legacy fallback, `>= CURRENT` to the manifest projection — but never refused a `> CURRENT` branch stamp, so a newer-binary shape would be misread by the projection rather than rejected. Add `refuse_if_stamp_too_new(stamp)` (re-exported `pub(crate)` from `migrations`) right after the branch stamp is read, mirroring the main read path's `refuse_if_internal_schema_too_new`. This is defense-in-depth, not a live hole: migrations run main-first (main migrates on open; each branch on its first write), so main's stamp is always >= every branch's and the main path refuses first. The guard closes the gap if that ordering invariant is ever weakened. Tested by force-stamping a real branch past CURRENT and asserting the branch read refuses with the upgrade error (the test misreads via the projection — returns Ok — without the guard, confirmed by removing it). * docs(rfc-013): record the v3→v4 migration robustness fixes invariants.md Known Gaps: the `migrate_v3_to_v4` entry now states the migration is loud on non-not-found legacy-open errors and concurrent-runner idempotent (bounded re-open retry on the merge CAS + idempotent stamp bump), and that the branch read path refuses a `> CURRENT` stamp. lance.md: note the two new surface guards the migration depends on (`dataset_open_missing_returns_not_found_variant`, `lance_error_incompatible_transaction_variant_exists`). testing.md: note the migration fault-injection test in the failpoints row. * refactor: remove dead code and silence warnings across engine + cluster Dead-code sweep follow-up to the RFC-013 stack. No behavior change. - engine: delete the orphaned `validate_edge_cardinality` — the load path uses `validate_edge_cardinality_with_pending_loader` for every mode (including Overwrite, which it treats as the replacement table image), so the old standalone validator had no caller — and correct its sibling's now-stale doc reference. Gate `TableStore::append_batch` `#[cfg(test)]`: it is the inline- commit residual kept only for recovery test setup, with no non-test caller. - cluster: drop unused imports in `lib.rs`, delete the unused `ClusterStore::payload_display`, and raise `LiveGraphObservation` / `GraphObservationJson` / `PolicyTarget` to `pub(crate)` to match the functions that return them. Both lib crates now build warning-free. * fix(engine): match Lance's typed DatasetAlreadyExists, not the message string The internal create-or-open idempotency fallbacks in `db/commit_graph.rs` and `db/recovery_audit.rs` classified the "already exists" race by `err.to_string().contains("Dataset already exists")` — a Lance display string, not an API contract. A wording change upstream would silently break the fallback (a re-create would error instead of opening the existing table). Match the typed `lance::Error::DatasetAlreadyExists { .. }` variant instead — the same discipline as the v3→v4 migration's not-found classifier — pinned by the new `lance_surface_guards.rs::lance_error_dataset_already_exists_variant_exists` guard so a Lance rename turns red instead of silently regressing. * refactor(engine): consolidate now_micros into one crate::db helper Four `fn now_micros() -> Result<i64>` copies (commit_graph, recovery_audit, graph_coordinator, manifest/graph) had already drifted: three mapped the clock error to `OmniError::manifest("...UNIX_EPOCH...")` while recovery_audit used `OmniError::manifest_internal("...unix epoch...")`. Replace all four with one `pub(crate) fn now_micros()` in `db/mod.rs` (the majority `manifest` variant), and repoint the eight call sites at `crate::db::now_micros()`. No test asserts on the failure message, so unifying the variant is behavior-safe; the timestamp-mapping contract can no longer fork across the rows it stamps. * refactor(engine): drop the dead snapshot param from roll_back_sidecar `roll_back_sidecar` took `snapshot: &Snapshot` only to discard it with `let _ = snapshot;` — rollbacks now always publish (the restored HEAD plus a recovery-commit lineage row), so the snapshot is never read to decide whether to skip a publish. Remove the parameter, the two call-site arguments, and the suppressor. A signature must not advertise inputs it does not consume. The `Snapshot` import stays — `process_sidecar`, `roll_forward_all`, and `record_audit_recovery_rollforward` still take it. * test(engine): red — open_at_branch wedges a branch on a missing commit-graph ref A v4 graph keeps its graph lineage in `__manifest` (RFC-013 Phase 7); the `_graph_commits.lance` branch ref is a derived artifact. An interrupted fork-reclaim or a `cleanup` race can drop that derived ref while the manifest lineage stays intact. Per invariants 7 + 15 a missing derived ref must not fail a logical read of the lineage. This wedge builds a real v4 `feature` branch (its `graph_head:feature` row in `__manifest`), force-deletes ONLY the `_graph_commits.lance` `feature` ref, then asserts the branch reads (`open_at_branch` / list-commits / `merge_base`) succeed from `__manifest` while a write that needs the derived ref (`create_branch`) fails loudly with the typed actionable error. Red against current code: `open_at_branch`'s hard `checkout_branch(branch)?` on the missing ref errors `OmniError::Lance` (Lance "Not found: _graph_commits.lance/tree/feature/_versions"), wedging the logical read. * fix(engine): read manifest lineage independent of the derived _graph_commits ref `CommitGraph::open_at_branch` did a hard `checkout_branch(branch)?` on the `_graph_commits.lance` branch ref before reading lineage — so a missing derived ref (an interrupted fork-reclaim, or a `cleanup` race) wedged the branch's commit-list / merge-base / snapshot resolution even though the lineage is readable from the authoritative `__manifest` (RFC-013 Phase 7). That is a derived/physical artifact failing a logical read — invariants 7 and 15. Make the held commits handle `Option<Dataset>` (mirroring `actor_dataset`). `open_at_branch` and `refresh` check out the derived ref best-effort: a typed not-found (`RefNotFound`/`NotFound`) yields a `None` handle while the read re-syncs from `__manifest`; any other open error still propagates. The manifest existence gate is unchanged — `load_commit_cache_for_branch` keeps its hard `?`, so a truly absent branch still fails loudly at the manifest. `create_branch` (the only writer that forks a ref) and the folded-in version lookup return a loud, actionable error on `None`, deferring repair to `cleanup`'s existing orphan reconciler rather than inlining a write on a read-side refresh. Reads (`head_commit`/`load_commits`/`get_commit`/`merge_base`) never touch the handle. Greens the wedge regression from the preceding commit. * fix(engine): v3→v4 retry loops return retryable contention on exhaustion `commit_v4_stamp_idempotently`'s retry loop used `0..=STAMP_RETRY_BUDGET` (6 iterations) with an `attempt < STAMP_RETRY_BUDGET` guard, so the LAST iteration's `IncompatibleTransaction` fell through to `Err(e) => OmniError::Lance(...)` — stringified, non-retryable — instead of the intended `RowLevelCasContention`, and the post-loop contention return was dead code. The publisher's outer retry only re-runs `is_retryable_publish_conflict`, so under sustained concurrent v3→v4 migration the one-time stamp bump could fail instead of converging, defeating the idempotency the migration is supposed to add. Fix the loop to `0..BUDGET` with an UNGUARDED `IncompatibleTransaction` arm: the retryable variant is always handled inside the loop (re-open + same-value check + retry), so it can never reach the stringifying catch-all, and the post-loop is the SINGLE reachable exhaustion path — the typed `RowLevelCasContention`. The `Err(e)` arm now catches only genuine non-contention errors. Apply the same range alignment to the sibling merge loop in `migrate_v3_to_v4` (behaviorally correct today — its `Err(err)` returns the already-typed contention — but it carried the identical off-by-one structure the stamp loop was copied from; aligning both stops the next copy from re-introducing it). Test-first. The exhaustion path is otherwise near-unreachable — a real concurrent winner stamps the same value, so the re-read returns Ok on the first retry — so a new `migration.v4_stamp.force_incompatible` failpoint forces every stamp attempt to lose, driving exhaustion deterministically. Against the pre-fix loop the new `v4_stamp_exhaustion_returns_retryable_contention` test goes red with `Lance("Incompatible transaction: injected failpoint triggered…")`; with the fix it asserts the typed `RowLevelCasContention`. Found by automated review on #299. * feat(engine): minimum-supported internal-schema floor + retirement tripwire The internal-schema migration chain (`migrate_internal_schema`) had a too-new ceiling but no floor, so every old `migrate_vN_…` arm and the v3 legacy readers it needs stay forever — the pile grows by one migration + readers + tests every schema version. Add `MIN_SUPPORTED_INTERNAL_SCHEMA_VERSION` (1 today, a pure no-op: `read_stamp` floors an absent stamp at 1 and no real graph carries 0) as the oldest stamp this binary opens; raising it is how the chain sheds old code. Collapse the one-sided `refuse_if_stamp_too_new` into `refuse_if_stamp_unsupported` checking both bounds, so the floor lands at all three stamp-enforcement sites — the write-path migrate dispatcher, the read-only open guard, and the branch lineage-read path (`commit_graph.rs`) — via one compiler-enforced rename. A hand-wired floor twin would have had to touch each site, and the branch-read path is easy to miss; one combined guard cannot half-enforce. Rename the read-only wrapper `refuse_if_internal_schema_unsupported` to match. A compile-time tripwire (`const _: () = assert!(LOWEST_REGISTERED_MIGRATION_SOURCE == MIN_SUPPORTED…)`) fails the build if a future floor bump forgets to delete the now-dead migration arm (or vice versa) — stronger than a runtime test, impossible to skip, and it doubles as the use that keeps the mirror const live. Tests: a sub-floor graph is refused in both open modes (twin of `future_stamp_is_refused_in_both_open_modes`); the guard accepts exactly [MIN, CURRENT]. No behavior change for any real graph. The retirement runbook lives on the `MIN_SUPPORTED` doc-comment + invariants.md. * fix(engine): compose migration contention with publisher retry; precise recovery-converge audit commit Three review-surfaced fixes on the RFC-013 Phase 7 path. Publisher retry vs migration contention: `publish()` propagated a `load_publish_state` error fatally via `?`, so a `RowLevelCasContention` surfaced by the v3->v4 migration's exhausted merge/stamp budgets aborted the publish instead of being retried — only `merge_rows` conflicts hit the retry. This contradicted the migration's own design, which returns that typed error EXPECTING the publisher to re-run the load (by which point a concurrent winner has usually finished the migration, so the next scan is a no-op). Route a retryable load error through the same retry path as a retryable `merge_rows` conflict. Regression test (failpoints): a one-shot retryable contention injected into `load_publish_state` now commits via the retry; red without the fix (the write fails with the injected contention). Recovery-converge audit commit id: `converge_or_defer_roll_forward` recorded the branch HEAD as the audit row's `graph_commit_id`, but a concurrent user write can advance `graph_head` past the recovery commit between the winner's publish and this read — attributing the audit to a later, wrong commit. Use the latest `RECOVERY_ACTOR`-authored commit (what `publish_recovery_commit` mints), which is the recovery commit by construction. The audit's actor was already correct (it comes from `sidecar.actor_id`, not the commit). Dead param: drop the unused `snapshot` from `record_audit_recovery_rollforward` (removing the `let _ = snapshot;` suppressor). `storage` stays — it is used to delete the sidecar.
14 KiB
Architecture
OmniGraph is a typed property-graph engine built as a coordination layer over many Lance datasets, with Git-style branches and commits across the whole graph, multi-modal querying (vector + FTS + BM25 + RRF + graph traversal) in one runtime, an HTTP server with Cedar policy, and a CLI driven by a per-operator ~/.omnigraph/config.yaml plus team-owned cluster directories.
Reading guide
Three views, increasing zoom:
- System context — what OmniGraph is and what it touches.
- Layer view — the eight-layer stack inside one OmniGraph process.
- Component zoom-ins — what's inside each layer.
For runtime flows (read query, mutation), see docs/dev/execution.md. For the on-disk layout of a graph, see docs/user/storage.md.
L1 (orange in the diagrams) is what we inherit from Lance; L2 (blue) is what OmniGraph adds. The L1/L2 framing is also called out in prose at the bottom of this doc.
System context
flowchart LR
classDef external fill:#fef3e8,stroke:#c46900,color:#000
classDef omnigraph fill:#e8f4fd,stroke:#1e6aa8,color:#000
classDef store fill:#f0f0f0,stroke:#555,color:#000
cli[CLI users]:::external
http[HTTP clients<br/>and SDKs]:::external
agents[Agents]:::external
embed[Embedding providers<br/>OpenAI / Gemini]:::external
og[OmniGraph<br/>kernel]:::omnigraph
cedar[Cedar policy<br/>engine]:::external
s3[Object store<br/>local FS / S3 / RustFS]:::store
cli --> og
http --> og
agents --> og
og --> embed
og --> cedar
og --> s3
OmniGraph runs as a single process (one binary, multiple crates). External dependencies are the embedding APIs (called during ingest and at query-time normalization), Cedar (called for every privileged action), and an object store (everything OmniGraph persists lands here).
Layer view
Inside the OmniGraph process, work flows through these layers:
flowchart TB
classDef l2 fill:#e8f4fd,stroke:#1e6aa8,color:#000
classDef l1 fill:#fef3e8,stroke:#c46900,color:#000
subgraph CLIs[CLI and HTTP server]
cli[omnigraph CLI]:::l2
srv[omnigraph-server<br/>Axum + Cedar]:::l2
end
subgraph compiler[omnigraph-compiler]
front[parse → AST → typecheck → catalog → IR]:::l2
end
subgraph engine[omnigraph engine]
plan[exec query and mutation]:::l2
gi[graph index CSR/CSC<br/>RuntimeCache LRU 8]:::l2
coord[coordinator<br/>ManifestCoordinator · CommitGraph]:::l2
end
subgraph storage[storage trait — wraps Lance]
ts[table_store · storage.rs<br/>direct lance::Dataset today]:::l2
end
subgraph lance_layer[Lance 4.x — substrate]
lance[per-dataset versions, fragments<br/>BTREE · Inverted FTS · IVF/HNSW vector<br/>merge_insert · compact_files · cleanup_old_versions]:::l1
end
subgraph object_store[Object store]
os[local FS · S3 · RustFS · MinIO]:::l1
end
CLIs -- "string + params" --> compiler
compiler -- IROp --> engine
engine -- "scan / write request" --> storage
storage -- "Stream of RecordBatch" --> engine
storage -- "Lance API calls" --> lance_layer
lance_layer -- bytes --> object_store
The storage seam is partly aspirational. TableStorage exists as the sealed staged-write trait, but capability/stat surfaces and full call-site migration are still roadmap. The diagram shows the intended boundary.
Component zoom-ins
Compiler — omnigraph-compiler
flowchart LR
classDef l2 fill:#e8f4fd,stroke:#1e6aa8,color:#000
src[".gq source"]:::l2
p[parser Pest<br/>query.pest · schema.pest]:::l2
ast[AST<br/>QueryDecl · Mutation · Schema]:::l2
cat[catalog<br/>NodeType · EdgeType · Interface]:::l2
tc[typecheck<br/>typecheck_query]:::l2
low[lower<br/>lower_query]:::l2
ir[IROp pipeline<br/>NodeScan · Expand · Filter · AntiJoin]:::l2
src --> p --> ast --> tc
cat --> tc
tc --> low --> ir
The compiler crate has zero Lance dependency. It owns the schema language, the query language, and the AST → IR lowering.
Code paths:
- Parser:
crates/omnigraph-compiler/src/query/parser.rs,crates/omnigraph-compiler/src/query/query.pest - Typecheck:
crates/omnigraph-compiler/src/query/typecheck.rs:83(typecheck_query) - Lower:
crates/omnigraph-compiler/src/ir/lower.rs:11(lower_query) - Catalog:
crates/omnigraph-compiler/src/catalog/
Engine — omnigraph crate
flowchart TB
classDef l2 fill:#e8f4fd,stroke:#1e6aa8,color:#000
subgraph exec[exec module]
eq[query · execute_query<br/>query.rs:347]:::l2
em[mutation · mutate<br/>mutation.rs:511]:::l2
ld[loader · ingest<br/>loader/mod.rs:74]:::l2
end
subgraph state[graph state]
coord[GraphCoordinator]:::l2
mr[ManifestCoordinator<br/>db/manifest.rs]:::l2
cg[CommitGraph<br/>projection of __manifest graph_commit/graph_head rows]:::l2
stg[MutationStaging<br/>per-query in-memory accumulator<br/>exec/staging.rs]:::l2
end
subgraph idx[graph index]
gi[GraphIndex<br/>CSR/CSC built per query]:::l2
rc[RuntimeCache LRU=8]:::l2
end
subgraph io[Lance I/O]
ts[table_store]:::l2
st[storage adapter<br/>storage.rs]:::l2
end
eq --> gi
eq --> ts
em --> stg
em --> ts
ld --> stg
ld --> ts
eq --> mr
em --> mr
coord --> mr
coord --> cg
ts --> st
The engine binds the compiler IR to Lance. It owns multi-dataset coordination, the graph topology index, the per-query staging accumulator, and the snapshot/manifest read path.
Code paths:
- Read entry:
Omnigraph::queryatcrates/omnigraph/src/exec/query.rs:7 - Mutation entry:
Omnigraph::mutateatcrates/omnigraph/src/exec/mutation.rs:511 - Manifest commit:
ManifestCoordinator::commitatcrates/omnigraph/src/db/manifest.rs:280 - Graph index:
crates/omnigraph/src/graph_index/ - Loader:
Omnigraph::ingestatcrates/omnigraph/src/loader/mod.rs:74
Mutation atomicity — in-memory accumulator (MR-794)
Inserts and updates inside mutate_as and the bulk loader's
Append/Merge modes go through MutationStaging
(crates/omnigraph/src/exec/staging.rs),
a per-query in-memory accumulator. No Lance HEAD advance happens during
op execution; one stage_* + commit_staged per touched table runs
at end-of-query, then the publisher commits the manifest atomically.
op-1 (insert/update) → push RecordBatch → MutationStaging.pending[table]
op-2 (insert/update) → read committed via Lance + pending via DataFusion
MemTable (read-your-writes) → push batch
op-N → push batch
─── end of query ───────────────────────────────────────
finalize: per pending table:
concat batches → stage_append OR stage_merge_insert OR stage_overwrite
→ commit_staged
publisher: ManifestBatchPublisher::publish (one cross-table CAS)
A failed op leaves Lance HEAD untouched on the staged tables: the next mutation proceeds normally with no drift to reconcile. Concrete contracts:
D₂parse-time rule: a query is either insert/update-only or delete-only. Mixed → reject. Deletes still inline-commit (Lance 4.0.0 has no public two-phase delete); D₂ keeps the inline path safe.LoadMode::Overwriteuses LanceOperation::Overwritethrough the same staged path. Loader validation runs against the replacement in-memory batches before anycommit_staged, and the publish window is covered bySidecarKind::Loadrecovery.- Read sites consume
TableStore::scan_with_pending, which Lance-scans the committed snapshot at the capturedexpected_versionand unions with a DataFusionMemTableover the pending batches.
This pattern realizes read-your-writes within a multi-statement mutation and keeps failure scope bounded for inserts/updates by construction at the writer layer. See docs/dev/invariants.md and docs/dev/writes.md for the publisher CAS contract this builds on.
Storage trait — today vs. roadmap
flowchart LR
classDef now fill:#e8f4fd,stroke:#1e6aa8,color:#000
classDef future fill:#fff,stroke:#888,stroke-dasharray:5 5,color:#444
subgraph today[Today]
d1[table_store<br/>opens lance::Dataset directly]:::now
d2[storage.rs<br/>S3 / file URI plumbing]:::now
end
subgraph roadmap[Roadmap - storage capabilities]
t[trait Dataset<br/>schema · stats · placement<br/>capabilities · scan · write]:::future
impl1[LanceStorage]:::future
impl2[future test impl]:::future
end
today -.-> roadmap
t --> impl1
t --> impl2
The staged-write trait exists today as TableStorage, implemented by TableStore. Full engine migration plus capability and statistics surfaces remain roadmap, so the planner cannot yet reason about all pushdown opportunities through a documented trait surface.
Index lifecycle — today vs. roadmap
flowchart LR
classDef now fill:#e8f4fd,stroke:#1e6aa8,color:#000
classDef future fill:#fff,stroke:#888,stroke-dasharray:5 5,color:#444
subgraph today[Today]
ei[ensure_indices<br/>omnigraph.rs:445]:::now
manual[called manually<br/>or from optimize]:::now
end
subgraph roadmap[Roadmap - manifest reconciler]
rec[Reconciler<br/>observes manifest]:::future
diff[coverage diff<br/>fragments − fragment_bitmap]:::future
wp[worker pool<br/>builds index segments]:::future
end
manual --> ei
today -.-> roadmap
rec --> diff --> wp
Today, indexes are built explicitly via ensure_indices. Reads degrade gracefully when index coverage is partial — Lance's scanner unions indexed and scan paths automatically. The roadmap reconciler observes manifest state and converges coverage in the background.
Server / CLI
flowchart LR
classDef l2 fill:#e8f4fd,stroke:#1e6aa8,color:#000
cli[omnigraph CLI<br/>command families]:::l2
srv_in[Axum HTTP<br/>REST + OpenAPI]:::l2
auth[Bearer auth<br/>SHA-256 hashed tokens]:::l2
pol[Cedar policy gate<br/>per request]:::l2
wl[WorkloadController<br/>per-actor admission]:::l2
eng[engine API<br/>Arc<Omnigraph>]:::l2
wq[WriteQueueManager<br/>per-(table, branch)]:::l2
cli -.-> eng
srv_in --> auth --> pol --> wl --> eng
eng --> wq
The server applies Cedar policy at the HTTP boundary today. The roadmap, called out in docs/dev/invariants.md as a known gap, is to push policy into the planner as predicates. After Cedar, mutating handlers go through WorkloadController (per-actor admission cap + byte budget; PR 2 / MR-686) before reaching the engine. The engine itself holds an Arc<WriteQueueManager> so concurrent mutations on the same (table, branch) serialize at the queue, while disjoint keys run in parallel — see docs/user/server.md "Per-actor admission control" and docs/dev/writes.md. The CLI bypasses the HTTP layer (and admission) and calls the engine API directly.
Code paths:
- Server entry:
crates/omnigraph-server/src/lib.rs - Auth:
crates/omnigraph-server/src/auth.rs - Policy:
crates/omnigraph-server/src/policy.rs - CLI:
crates/omnigraph-cli/src/main.rs
L1 / L2 framing
Throughout the docs, capabilities are split into:
- L1 — Inherited from Lance: what OmniGraph gets "for free" by sitting on top of the Lance dataset format (columnar Arrow storage, per-dataset versions and branches, index types,
merge_insert,compact_files/cleanup_old_versions). - L2 — Added by OmniGraph: typing (schema language), graph semantics, multi-dataset coordination via
__manifest, graph-level branches and commits, the.gqquery language and IR, the topology index, the HTTP server, Cedar policy, the CLI.
Concurrency model
- MVCC: every Lance write bumps a per-dataset version; the OmniGraph manifest version coordinates which sub-table versions are visible together.
- Snapshot isolation: a query holds one
Snapshotfor its lifetime; concurrent writes don't leak in. - Cross-branch isolation: copy-on-write means readers and writers on different branches don't block each other.
- Per-query staging:
mutate_asandload(Append/Merge) accumulate insert/update batches in an in-memoryMutationStaging; onestage_*+commit_stagedper touched table runs at end-of-query, then the publisher commits the manifest atomically. A mid-query failure leaves Lance HEAD untouched on staged tables. (MR-794; pre-v0.4.0 used a__run__<id>staging branch + Run state machine, removed in MR-771.) - Schema-apply lock:
__schema_apply_lock__system branch serializes schema migrations. - Fail-points (
failpointscargo feature):failpoints::maybe_fail("operation.step")?inbranch_create, publish, etc., for deterministic failure injection in tests.
Workspace crates
omnigraph-compiler— schema and query grammars, catalog, IR, lowering, type checker, lint, migration planner, OpenAI-style embedding client.omnigraph(engine, published asomnigraph-engineon crates.io since v0.2.2) — the Lance-backed runtime: manifest, commit graph, snapshot, exec (incl. per-queryMutationStagingaccumulator), merge, loader, Gemini embedding client.omnigraph-cli— theomnigraphbinary.omnigraph-server— theomnigraph-serverbinary (Axum HTTP server).