# Direct-Publish Write Path
> History: the Run state machine and `__run__<id>` staging branches were
> removed in MR-771 (shipped v0.4.0). Writes now go directly to the target
> table; this document specifies that direct-publish path.

`mutate_as` and `load` write **directly to the target table**
and call `ManifestBatchPublisher::publish` once at the end with
`expected_table_versions` (the per-table manifest versions captured before
the first write). Cross-table OCC is enforced inside the publisher; the
publisher's row-level CAS on `__manifest` is the single fence.
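The fence described above can be sketched as a check-then-apply step over captured versions. This is a minimal illustration, not the publisher's actual code: `Manifest`, `PublishError`, and the method signatures are hypothetical stand-ins, and the real publisher performs the check as a row-level CAS on `__manifest` rather than on an in-memory map.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `__manifest`: table key -> current version.
#[derive(Debug, Default)]
pub struct Manifest {
    versions: HashMap<String, u64>,
}

#[derive(Debug, PartialEq)]
pub enum PublishError {
    /// Another writer advanced `table` after `expected` was captured.
    ExpectedVersionMismatch { table: String, expected: u64, actual: u64 },
}

impl Manifest {
    /// Tables that were never published are treated as version 0 (assumption).
    pub fn version_of(&self, table: &str) -> u64 {
        self.versions.get(table).copied().unwrap_or(0)
    }

    /// Publish new versions for all touched tables, or for none of them.
    pub fn publish(
        &mut self,
        expected: &HashMap<String, u64>,
        new_versions: &HashMap<String, u64>,
    ) -> Result<(), PublishError> {
        // Check phase: every touched table must still be at its captured version.
        for (table, &want) in expected {
            let actual = self.version_of(table);
            if actual != want {
                return Err(PublishError::ExpectedVersionMismatch {
                    table: table.clone(),
                    expected: want,
                    actual,
                });
            }
        }
        // Apply phase: reached only when every check passed.
        for (table, &version) in new_versions {
            self.versions.insert(table.clone(), version);
        }
        Ok(())
    }
}
```

In this sketch `&mut self` provides the exclusivity that the real CAS provides; a writer holding stale versions gets an error and the manifest is left unchanged.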
## What this means in practice
- No `RunRecord`, no `_graph_runs.lance`, no `_graph_run_actors.lance`.
- No `omnigraph run *` CLI subcommands and no `/runs/*` HTTP endpoints.
- No `__run__<id>` staging branches; `__run__*` is no longer a reserved
name. The branch-name guard was removed in MR-770, and any stale
`__run__*` branch on an upgraded graph is swept off `__manifest` by the
v2→v3 internal-schema migration on first read-write open. (The inert
`_graph_runs.lance` bytes remain until a `delete_prefix` primitive lands.)
- Cancelled mutation futures leave **no graph-level state** — only orphaned
Lance fragments, which the existing `omnigraph cleanup` pipe reclaims.
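The one-time sweep in the list above can be pictured as follows. This is a sketch under stated assumptions: `Graph`, its fields, and `migrate_on_read_write_open` are hypothetical names for illustration, and the real migration deletes branch refs on the `__manifest` dataset rather than filtering a vector.

```rust
/// Prefix of the legacy staging branches named in the text above.
const LEGACY_RUN_PREFIX: &str = "__run__";

/// Hypothetical in-memory model of a graph's branch list and schema stamp.
pub struct Graph {
    pub internal_schema_version: u32,
    pub branches: Vec<String>,
}

/// Sweeps legacy run branches once, then stamps the graph as v3.
/// Returns the names that were removed.
pub fn migrate_on_read_write_open(graph: &mut Graph) -> Vec<String> {
    // Already migrated: `__run__*` is now an ordinary name and is left alone.
    if graph.internal_schema_version >= 3 {
        return Vec::new();
    }
    let (legacy, kept): (Vec<String>, Vec<String>) = std::mem::take(&mut graph.branches)
        .into_iter()
        .partition(|name| name.starts_with(LEGACY_RUN_PREFIX));
    graph.branches = kept;
    graph.internal_schema_version = 3;
    legacy
}
```

Gating on the stamp is what keeps a user-created `__run__*` branch safe after the upgrade: the sweep cannot run a second time.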
## Read-your-writes within a multi-statement mutation
A `.gq` query with multiple ops (e.g. `insert Person … insert Knows …`)
must observe earlier ops' writes when validating later ops (referential
integrity, edge cardinality). After MR-794 step 2+ this is implemented
via an in-memory `MutationStaging` accumulator in
[`crates/omnigraph/src/exec/staging.rs`](../../crates/omnigraph/src/exec/staging.rs),
shared by both `mutate_as` and the bulk loader:
- On the first touch of each table, the pre-write manifest version is
captured into `expected_versions[table_key]` (the publisher's CAS
fence at end-of-query).
- Each insert/update op pushes a `RecordBatch` into the per-table
pending accumulator. Lance HEAD does **not** advance during op
execution.
- Read sites (validation, predicate matching for `update`) consume
`TableStore::scan_with_pending`, which scans committed rows via Lance
and applies the same SQL filter to the pending batches via a DataFusion
`MemTable`. Same-query writes are therefore visible to subsequent reads.
- At end-of-query, `MutationStaging::finalize` issues exactly one
`stage_*` + `commit_staged` per touched table (concatenating
accumulated batches; merge-mode dedupes by `id`, last-write-wins),
and the publisher publishes the manifest atomically across all
touched sub-tables. Cross-table conflicts surface as
`ManifestConflictDetails::ExpectedVersionMismatch`.
- **Deletes still inline-commit.** Lance's `Dataset::delete` is not
exposed as a two-phase op in 4.0.0; deletes go through `delete_where`
immediately and record their post-write state in
`MutationStaging.inline_committed`. The parse-time D₂ rule (below)
prevents inserts/updates from coexisting with deletes in one query,
so the inline path is safe for delete-only mutations.

This upholds the manifest-atomic mutation and read-your-writes invariants
tracked in [docs/dev/invariants.md](invariants.md).
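The insert/update flow in the list above can be sketched with plain vectors in place of `RecordBatch`es. This is an illustration only: `Row`, `Staging`, and the method signatures here are hypothetical simplifications, not the types in `exec/staging.rs`, and the sketch covers just the three ideas of first-touch version capture, reads that see pending rows, and a single deduplicated write per table.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Debug, PartialEq)]
pub struct Row {
    pub id: String,
    pub age: i64,
}

/// Hypothetical, simplified accumulator: nothing is committed until `finalize`.
#[derive(Default)]
pub struct Staging {
    expected_versions: BTreeMap<String, u64>,
    pending: BTreeMap<String, Vec<Row>>,
}

impl Staging {
    /// Stage one row. The committed version is recorded on first touch only.
    pub fn stage(&mut self, table: &str, committed_version: u64, row: Row) {
        self.expected_versions
            .entry(table.to_string())
            .or_insert(committed_version);
        self.pending.entry(table.to_string()).or_default().push(row);
    }

    /// Apply the same predicate to committed rows and to pending rows.
    pub fn scan_with_pending(
        &self,
        table: &str,
        committed: &[Row],
        pred: impl Fn(&Row) -> bool,
    ) -> Vec<Row> {
        let empty: Vec<Row> = Vec::new();
        let pending = self.pending.get(table).unwrap_or(&empty);
        committed
            .iter()
            .chain(pending.iter())
            .filter(|r| pred(*r))
            .cloned()
            .collect()
    }

    /// One output per touched table: (expected version, rows deduped by id).
    pub fn finalize(self) -> BTreeMap<String, (u64, Vec<Row>)> {
        let mut out = BTreeMap::new();
        for (table, rows) in self.pending {
            let mut deduped: Vec<Row> = Vec::new();
            for row in rows {
                let pos = deduped.iter().position(|r| r.id == row.id);
                match pos {
                    Some(i) => deduped[i] = row, // last write wins
                    None => deduped.push(row),
                }
            }
            let expected = self.expected_versions[&table];
            out.insert(table, (expected, deduped));
        }
        out
    }
}
```

The caller would hand each `(expected version, rows)` pair to one staged write plus commit, then publish all tables together; those steps are outside this sketch.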
### D₂ — parse-time mixed-mode rejection
A single mutation query is either insert/update-only or delete-only.
A mixed query is rejected at parse time with an error directing the user to
split it. Reason: mixing creates ordering hazards
(insert→delete on the same row would silently no-op because the staged
insert isn't visible to delete; cascading deletes of just-inserted
edges break referential integrity). Until Lance exposes a two-phase
delete API, the parse-time rejection keeps both paths atomic and
correct. Tracked: MR-793, plus a Lance-upstream ticket.
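A minimal sketch of such a check, assuming the parser has already reduced a query to a list of op kinds. `Op`, `MutationKind`, `MixedModeError`, and `classify` are hypothetical names; the real rule lives in the query parser and returns the engine's own error type. Treating an empty op list as insert/update-only is also an assumption of this sketch.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Op {
    Insert,
    Update,
    Delete,
}

#[derive(Debug, PartialEq)]
pub enum MutationKind {
    InsertUpdateOnly,
    DeleteOnly,
}

/// Returned when a query mixes deletes with inserts or updates.
#[derive(Debug, PartialEq)]
pub struct MixedModeError;

/// Classify a mutation query, rejecting mixed insert/update + delete.
pub fn classify(ops: &[Op]) -> Result<MutationKind, MixedModeError> {
    let has_delete = ops.iter().any(|op| matches!(op, Op::Delete));
    let has_write = ops.iter().any(|op| matches!(op, Op::Insert | Op::Update));
    match (has_write, has_delete) {
        (true, true) => Err(MixedModeError),
        (false, true) => Ok(MutationKind::DeleteOnly),
        _ => Ok(MutationKind::InsertUpdateOnly),
    }
}
```

Because the check needs only the op kinds, it can run before any table is opened, which is what keeps the rejection free of side effects.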
### MR-793 status (storage trait two-phase invariant) — partial
MR-793 hoists the staged-write pattern into a `TableStorage` trait
surface with sealed-trait enforcement and opaque `SnapshotHandle` /
`StagedHandle` types — see `crates/omnigraph/src/storage_layer.rs`.
The trait is the canonical surface for new engine code; existing call
sites still use the inherent `TableStore` methods (mechanical migration
deferred to a follow-up cycle — tracked).
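For readers unfamiliar with the pattern, a sealed trait with an opaque staged handle looks roughly like this. The names `TableStorage`, `StagedHandle`, `TableStore`, and `commit_staged` are borrowed from the text, but every signature, the `stage_rows` method, and the counters are assumptions made for illustration; the real definitions are in `storage_layer.rs`.

```rust
mod storage_layer {
    // Private module: code outside `storage_layer` cannot name `Sealed`,
    // so it cannot implement `TableStorage` for its own types.
    mod sealed {
        pub trait Sealed {}
    }

    /// Opaque handle: the field is private, so callers can only pass it back.
    pub struct StagedHandle {
        pending_rows: usize,
    }

    pub trait TableStorage: sealed::Sealed {
        /// Phase 1 (hypothetical signature): prepare a write without advancing HEAD.
        fn stage_rows(&self, rows: usize) -> StagedHandle;
        /// Phase 2: make the staged write visible and return the new version.
        fn commit_staged(&mut self, staged: StagedHandle) -> u64;
    }

    #[derive(Default)]
    pub struct TableStore {
        version: u64,
        rows: usize,
    }

    impl TableStore {
        pub fn version(&self) -> u64 {
            self.version
        }
        pub fn row_count(&self) -> usize {
            self.rows
        }
    }

    impl sealed::Sealed for TableStore {}

    impl TableStorage for TableStore {
        fn stage_rows(&self, rows: usize) -> StagedHandle {
            StagedHandle { pending_rows: rows }
        }

        fn commit_staged(&mut self, staged: StagedHandle) -> u64 {
            self.rows += staged.pending_rows;
            self.version += 1;
            self.version
        }
    }
}
```

Staging takes `&self` and committing takes `&mut self` in this sketch, so the type system already shows which phase can advance the version.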
Three writers have been migrated onto staged primitives:
* **`ensure_indices`** (`db/omnigraph/table_ops.rs::build_indices_on_dataset_for_catalog`)
— scalar indices (BTree, Inverted) now use `stage_create_*_index` +
`commit_staged`. Vector indices stay inline (residual — Lance
`build_index_metadata_from_segments` is `pub(crate)` in 4.0.0;
companion ticket to lance-format/lance#6658 needed).
* **`branch_merge::publish_rewritten_merge_table`**
(`exec/merge.rs`) — merge_insert now uses `stage_merge_insert` +
`commit_staged`. Deletes stay inline (Lance #6658 residual).
* **`schema_apply` rewritten_tables** (`db/omnigraph/schema_apply.rs`)
— non-empty rewrites use `stage_overwrite` + `commit_staged`.
Empty-batch rewrites stay inline (Lance `InsertBuilder::execute_uncommitted`
rejects empty data; the empty case is rare and bounded by the
schema-apply lock branch).

A defense-in-depth integration test (`tests/forbidden_apis.rs`) walks
engine source and fails if non-allow-listed code calls Lance's
inline-commit APIs directly. The trait surface itself is the primary
enforcement (sealed + only-callable-via-trait once call sites land);
the grep test catches type-system bypass attempts.
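The shape of such a source-walking guard can be sketched as below. This is not the contents of `tests/forbidden_apis.rs`: `Violation` and `scan_source` are hypothetical, the forbidden patterns are passed in rather than hard-coded because this document does not list them, and skipping `//` comment lines is an assumption of the sketch.

```rust
/// One forbidden call found in a non-allow-listed file.
#[derive(Debug, PartialEq)]
pub struct Violation {
    pub file: String,
    pub line: usize,
    pub pattern: String,
}

/// Scan one file's source text for forbidden substrings.
/// Allow-listed files are skipped entirely; line numbers are 1-based.
pub fn scan_source(
    file: &str,
    source: &str,
    forbidden: &[&str],
    allow_listed_files: &[&str],
) -> Vec<Violation> {
    if allow_listed_files.contains(&file) {
        return Vec::new();
    }
    let mut violations = Vec::new();
    for (idx, line) in source.lines().enumerate() {
        let code = line.trim_start();
        // Assumption: a line comment cannot call anything, so ignore it.
        if code.starts_with("//") {
            continue;
        }
        for pat in forbidden {
            if code.contains(*pat) {
                violations.push(Violation {
                    file: file.to_string(),
                    line: idx + 1,
                    pattern: (*pat).to_string(),
                });
            }
        }
    }
    violations
}
```

A test built on this would read every engine source file, call `scan_source` on each, and fail if the combined list of violations is non-empty.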
The "finalize → publisher residual" described below applies equally to
the migrated writers — Lance has no multi-dataset atomic commit
primitive, so the per-table commit_staged → manifest publish gap is
the same drift class. Closing it requires either upstream Lance
multi-dataset commit OR the omnigraph-side recovery-on-open reconciler
described in `.context/mr-793-design.md` §15 (deferred to MR-795).
### Inline-commit method residuals on `TableStorage` (MR-793 acceptance §1 option b)
MR-793's acceptance criterion §1 ("`TableStore` public API has no method that performs a manifest commit as a side effect of writing") is met **per-method** by enumerating every inline-commit method that remains on the trait surface, naming why it cannot yet be removed, and keeping the residual comment at every call site:
| Method on `TableStorage` | Inline-commit reason | Closes when |
|---|---|---|
| `delete_where` | `DeleteBuilder::execute_uncommitted` is not in Lance v6.0.1 (closed upstream as [#6658](https://github.com/lance-format/lance/issues/6658) but first ships in `v7.0.0-beta.10`); see [docs/dev/lance.md](lance.md) | MR-A: Lance v7.x bump migrates `delete_where` to staged, retires the parse-time D₂ mutation rule, and extends recovery sidecar coverage |
| `create_vector_index` | Vector indices take Lance's "segment commit path"; `build_index_metadata_from_segments` is `pub(crate)` (Lance [#6666](https://github.com/lance-format/lance/issues/6666) still open) | Lance #6666 lands and `stage_create_vector_index` joins the trait |

MR-854 (Phase 1b + Phase 9) closed the remaining residuals on the engine surface: every `db.table_store.X(...)` call site was converted to `db.storage().X(...)` (trait dispatch through `&dyn TableStorage`), and the legacy inline-commit inherent methods on `TableStore` (`append_batch`, `merge_insert_batch`, `merge_insert_batches`, `overwrite_batch`, `create_btree_index`, `create_inverted_index`) were demoted from `pub` to `pub(crate)`. They survive only as the bulk loader's `LoadMode::{Append, Overwrite, Merge}` concurrent fast-paths (see "`LoadMode::Overwrite` residual" below) and as internal helpers for the staged primitives; no engine call site outside `table_store.rs` and `loader::write_batch_to_dataset` reaches them.

After **MR-A (Lance v7 bump) + Lance #6666 ship**, the trait surface exposes only staged-write primitives + `commit_staged`. Until then this matrix names the two remaining residuals explicitly, every call site carries a one-line residual comment, and no engine code outside `table_store.rs` is permitted to reach the inline-commit Lance APIs (enforced by the `tests/forbidden_apis.rs` guard).
### `LoadMode::Overwrite` residual
The bulk loader's Append and Merge modes use the staged-write path
described above. `LoadMode::Overwrite` keeps the legacy inline-commit
path: truncate-then-append doesn't fit the staged shape cleanly in
Lance v6.0.1, and overwrite has no in-flight read-your-writes
requirement (the prior data is being wiped). A mid-overwrite failure
can leave Lance HEAD on a partially truncated table; the next overwrite
replaces it. Overwrite is operator-driven (rare in agent workloads), so
this residual stays documented until Lance exposes
`Operation::Overwrite { fragments }` as a two-phase op.
### Open-time recovery sweep
The staged-write rewire eliminates one drift class **by construction at
the writer layer**: an op that fails before pushing to the in-memory
accumulator (validation errors, missing endpoints, parse-time D₂
rejection) leaves Lance HEAD untouched on every staged table. This is
the case the `partial_failure_leaves_target_queryable_and_unblocks_next_mutation`
test pins.

A second, narrower drift class, the **finalize → publisher window**,
is closed by the open-time recovery sweep on the next open:

`MutationStaging::finalize` runs `stage_*` + `commit_staged` per touched
table sequentially, then the publisher commits the manifest. Lance has
no multi-dataset atomic commit, so the per-table `commit_staged` calls
are independent operations: if `commit_staged` on table N+1 fails *after*
`commit_staged` on tables 1..N succeeded, or if the publisher's CAS
pre-check rejects *after* every `commit_staged` succeeded, tables 1..N
are left at `Lance HEAD = manifest_pinned + 1`.
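The window described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `TableState`, `Failure`, `finalize`, `drifted`, and `three_tables` are hypothetical names invented for illustration, not the engine's types, and a version bump stands in for a real `commit_staged` call.

```rust
use std::collections::BTreeMap;

/// Toy model of one table: its Lance HEAD and the version pinned in the manifest.
#[derive(Debug, Clone, PartialEq)]
struct TableState {
    lance_head: u64,
    manifest_pinned: u64,
}

/// Where the simulated write fails (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Failure {
    /// The per-table commit on the table at this 0-based position fails.
    CommitAt(usize),
    /// Every per-table commit succeeds, then the publisher rejects.
    PublisherRejects,
}

/// Stand-in for the finalize loop: commit each table in order, then publish.
fn finalize(
    tables: &mut BTreeMap<String, TableState>,
    failure: Option<Failure>,
) -> Result<(), String> {
    for (i, (name, table)) in tables.iter_mut().enumerate() {
        if failure == Some(Failure::CommitAt(i)) {
            return Err(format!("commit failed on table {name}"));
        }
        // Each per-table commit is independent: nothing undoes it on a later failure.
        table.lance_head += 1;
    }
    if failure == Some(Failure::PublisherRejects) {
        return Err("publisher CAS pre-check rejected".to_string());
    }
    // Publisher step: only reached when every per-table commit succeeded.
    for table in tables.values_mut() {
        table.manifest_pinned = table.lance_head;
    }
    Ok(())
}

/// Names of tables whose Lance HEAD is ahead of the manifest pin.
fn drifted(tables: &BTreeMap<String, TableState>) -> Vec<String> {
    tables
        .iter()
        .filter(|(_, t)| t.lance_head > t.manifest_pinned)
        .map(|(name, _)| name.clone())
        .collect()
}

/// Three tables, all starting with HEAD equal to the manifest pin.
fn three_tables() -> BTreeMap<String, TableState> {
    ["a", "b", "c"]
        .iter()
        .map(|name| (name.to_string(), TableState { lance_head: 5, manifest_pinned: 5 }))
        .collect()
}
```

In this model, failing the commit on the third table leaves the first two one version ahead of their pins, and a publisher rejection leaves all three ahead. That is the state the recovery sweep has to resolve.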
**Recovery protocol** (the lifecycle of every staged-write writer:
`MutationStaging::finalize`, `schema_apply::apply_schema_with_lock`,
`branch_merge_on_current_target`, `ensure_indices_for_branch`,
`optimize_all_tables`):

1. **Phase A**: the writer writes a sidecar JSON to
   `__recovery/{ulid}.json` BEFORE its first HEAD-advancing commit
   (`commit_staged`, or `compact_files` for `optimize_all_tables`,
   which advances the Lance HEAD via a reserve-fragments + rewrite
   commit rather than a staged write). The sidecar names every
   `(table_key, table_path, expected_version, post_commit_pin)` it
   intends to commit, plus the writer kind and `actor_id`.
2. **Phase B**: the writer's per-table `commit_staged` loop runs.
3. **Phase C**: the publisher commits the manifest.
4. **Phase D**: the writer deletes the sidecar.
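The four phases can be sketched as a small in-memory model. Everything here is an assumption made for illustration: the `TablePin` and `Sidecar` field names mirror the tuple listed in Phase A, but the real sidecar JSON layout is not shown in this document, and `RecoveryDir`, `CrashAfter`, `run_writer`, and `example_sidecar` are hypothetical helpers, not engine APIs.

```rust
use std::collections::BTreeMap;

/// One table the writer intends to commit (fields mirror the Phase A tuple).
#[derive(Debug, Clone, PartialEq)]
struct TablePin {
    table_key: String,
    table_path: String,
    expected_version: u64,
    post_commit_pin: u64,
}

/// Hypothetical in-memory shape of a recovery sidecar.
#[derive(Debug, Clone, PartialEq)]
struct Sidecar {
    writer_kind: String,
    actor_id: String,
    tables: Vec<TablePin>,
}

/// Simulated crash point: the writer stops right after the named phase.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CrashAfter {
    PhaseA,
    PhaseB,
    PhaseC,
}

/// In-memory stand-in for the `__recovery/` directory, keyed by file path.
type RecoveryDir = BTreeMap<String, Sidecar>;

fn run_writer(
    dir: &mut RecoveryDir,
    id: &str,
    sidecar: Sidecar,
    crash: Option<CrashAfter>,
) -> Result<(), String> {
    let path = format!("__recovery/{id}.json");

    // Phase A: persist the sidecar before any HEAD-advancing commit.
    dir.insert(path.clone(), sidecar);
    if crash == Some(CrashAfter::PhaseA) {
        return Err("crashed after Phase A".to_string());
    }

    // Phase B: per-table commits would run here (elided in this sketch).
    if crash == Some(CrashAfter::PhaseB) {
        return Err("crashed after Phase B".to_string());
    }

    // Phase C: the manifest publish would run here (elided in this sketch).
    if crash == Some(CrashAfter::PhaseC) {
        return Err("crashed after Phase C".to_string());
    }

    // Phase D: the write is fully published, so the sidecar is removed.
    dir.remove(&path);
    Ok(())
}

/// A sample sidecar with made-up values.
fn example_sidecar() -> Sidecar {
    Sidecar {
        writer_kind: "Mutation".to_string(),
        actor_id: "example-actor".to_string(),
        tables: vec![TablePin {
            table_key: "node:Person".to_string(),
            table_path: "nodes/person.lance".to_string(),
            expected_version: 7,
            post_commit_pin: 8,
        }],
    }
}
```

The property the model is meant to show is that a sidecar remains in the directory exactly when the writer stopped somewhere between Phase A and Phase D, which is the signal the next open uses to decide whether recovery is needed.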
> **Phase letter convention.** Throughout the recovery code, log
> messages, failpoint names (e.g. `branch_merge.post_phase_b_pre_manifest_commit`),
> and the per-writer integration tests, "Phase A/B/C/D" refers
> exclusively to the four-step lifecycle above. The per-table
> staged-write contract (`stage_*` then `commit_staged`, two steps)
> is referred to by those API verbs — never by phase letters — so a
> reader of `recovery.rs`, `failpoints.rs`, or this document only
> encounters phase letters in the per-writer context.
recovery: document MR-847 ship across all reference docs (Phase 10) Update the doc surface to reflect MR-847 having shipped end to end — sidecar protocol, classifier, all-or-nothing decision tree, roll-forward via ManifestBatchPublisher, roll-back via Dataset::restore with fragment-set short-circuit, audit trail in _graph_commit_recoveries.lance, OpenMode::{ReadWrite, ReadOnly}, and the four migrated writers all carrying sidecars across Phase B → Phase C. - docs/invariants.md §VI.23: change from "upheld at the writer-trait surface for inserts/updates/etc., per-table commit_staged → manifest publish window remains" to "upheld at the writer-trait surface AND across process boundaries". The MR-847 sweep closes the residual on the next Omnigraph::open. The "continuous in-process" property (no ExpectedVersionMismatch surfacing to subsequent writers between Phase B failure and process restart) is honest follow-up at MR-856. - docs/runs.md: replace "Finalize → publisher residual" section with "Open-time recovery sweep (MR-847)" — describes the sidecar protocol lifecycle (Phases A-D), the sweep's classifier + decision dispatch, the audit trail, and the operator-facing query (omnigraph commit list --filter actor=omnigraph:recovery). - AGENTS.md capability matrix "Atomic single-dataset commits" row: drop the "Layer (3) is not yet shipped — tracked in MR-847" caveat; describe the three layers as all shipping; reference MR-856 for the background-reconciler follow-up. - docs/storage.md: add _graph_commit_recoveries.lance and __recovery/{ulid}.json to the on-disk layout (mermaid + prose). - docs/branches-commits.md: new "Recovery audit trail (MR-847)" subsection describing the join from _graph_commits.lance:actor_id="omnigraph:recovery" to _graph_commit_recoveries.lance:graph_commit_id for operator post-mortem. - docs/maintenance.md: note the MR-847 recovery floor on cleanup — --keep < 3 may garbage-collect Lance versions the recovery sweep needs as a rollback target. 
Default --keep 10 is safe. - docs/testing.md: add tests/recovery.rs to the engine integration-test table; expand the failpoints.rs row to mention the four MR-847 per-writer Phase B → recovery integration tests. - .context/mr-847-design.md: prepend a "Status: DONE" stanza listing every commit hash + scope across phases 1-10. AGENTS.md ↔ docs/ cross-link check passes (26 links, 26 docs). Full workspace test sweep passes with --features failpoints (361 tests across 20 binaries). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:46:03 +02:00
A failure between Phase A and Phase D leaves the sidecar on disk. The
next `Omnigraph::open` (gated on `OpenMode::ReadWrite`) runs the
recovery sweep in `crates/omnigraph/src/db/manifest/recovery.rs`:
- For each sidecar in `__recovery/`, compare every named table's
Lance HEAD to the manifest pin. Classify per the all-or-nothing
decision tree (RolledPastExpected / NoMovement / UnexpectedAtP1 /
UnexpectedMultistep / InvariantViolation).
- If any table classifies as `InvariantViolation` (Lance HEAD behind the
  manifest pin, which should be impossible), **abort** with a loud error
  and leave the sidecar on disk for operator review.
- Otherwise, if every table is `RolledPastExpected`, **roll forward**:
a single `ManifestBatchPublisher::publish` call extends every pin
atomically. `SchemaApply` sidecars are eligible only when schema-state
recovery promoted the matching staging files in the same recovery pass;
otherwise full open-time recovery rolls them back and refresh-time
recovery leaves them for the next read-write open.
- Otherwise **roll back**: per-table `Dataset::restore` to the
manifest-pinned table version, then a single `ManifestBatchPublisher::publish`
of the restored HEAD — symmetric with roll-forward, so `manifest == HEAD`
after recovery (no residual drift). This convergence is what lets a
failed-then-retried schema apply succeed instead of failing one version higher
each iteration. The audit row's `to_version` records the logical
rolled-back-to version (`manifest_pinned`); the manifest is published at the
restore commit (`manifest_pinned + 1`, same content).
- After a successful roll-forward or roll-back, an audit row is
recorded — `_graph_commits.lance` carries
  a commit tagged `actor_id = "omnigraph:recovery"`, and a sibling
  `_graph_commit_recoveries.lance` row carries `recovery_kind`,
  `recovery_for_actor` (the original sidecar's actor), `operation_id`, and
  per-table outcomes. Operators run
  `omnigraph commit list --filter actor=omnigraph:recovery` to find recoveries.
- Sidecar deleted as the final step.
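The decision list above can be sketched as a small pure function. This is an illustrative model of full open-time recovery only, not the engine's code: `TableState::Other`, `Decision`, `decide`, `schema_apply_blocked`, and `rollback_versions` are hypothetical names introduced here, and only the classifications named in the list are modelled.

```rust
/// Per-table classification. `Other` stands in for every state the list
/// above does not name explicitly (hypothetical catch-all).
#[derive(Clone, Copy, Debug, PartialEq)]
enum TableState {
    RolledPastExpected,
    InvariantViolation,
    Other,
}

/// All-or-nothing outcome for one sidecar.
#[derive(Debug, PartialEq)]
enum Decision {
    Abort,
    RollForward,
    RollBack,
}

/// `schema_apply_blocked` is true for a `SchemaApply` sidecar whose staging
/// files were not promoted in this recovery pass (assumed flag).
fn decide(tables: &[TableState], schema_apply_blocked: bool) -> Decision {
    // One impossible table poisons the whole sidecar.
    if tables.iter().any(|t| *t == TableState::InvariantViolation) {
        return Decision::Abort;
    }
    // Roll forward only when every table moved as expected.
    let all_rolled = tables.iter().all(|t| *t == TableState::RolledPastExpected);
    if all_rolled && !schema_apply_blocked {
        Decision::RollForward
    } else {
        Decision::RollBack
    }
}

/// After a roll-back: (audit `to_version`, version the manifest is published at).
fn rollback_versions(manifest_pinned: u64) -> (u64, u64) {
    (manifest_pinned, manifest_pinned + 1)
}
```

A mixed sidecar such as `[RolledPastExpected, Other]` rolls back as a whole, which is the all-or-nothing property the list describes.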
Triggers for the residual: transient Lance write errors during finalize
(object-store retry budget exhaustion, disk full), and persistent publisher
contention that exhausts the `PUBLISHER_RETRY_BUDGET = 5` retries.
**Long-running servers**: `Omnigraph::refresh` runs roll-forward-only
recovery in-process — the common Phase B → Phase C residual closes
without a restart. The next mutation on the same handle (after refresh)
no longer surfaces `ExpectedVersionMismatch` for the failed table.
Sidecars that would require a `Dataset::restore` (mixed / unexpected
state) are deferred to the next `OpenMode::ReadWrite` open: restore is
unsafe under concurrency because Lance's `check_restore_txn` accepts
the restore against in-flight Append/Update/Delete commits and
silently orphans them (pinned by
`tests/staged_writes.rs::lance_restore_loses_to_concurrent_append_via_orphaning`).
Continuous in-process recovery for the rollback path is the goal of a
future background reconciler with per-(table, branch) writer-queue
acquisition.
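The split between refresh-time and open-time recovery might be modelled as below. This is a sketch under assumptions: `RecoveryMode`, `Decision`, `Action`, and `action_for` are hypothetical names for illustration, and the abort path is left out.

```rust
/// Which recovery pass is running (hypothetical names).
#[derive(Clone, Copy, Debug, PartialEq)]
enum RecoveryMode {
    /// Read-write open: no concurrent writers, restore is safe.
    Full,
    /// `Omnigraph::refresh`: other writers may be in flight.
    RollForwardOnly,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Decision {
    RollForward,
    RollBack,
}

#[derive(Debug, PartialEq)]
enum Action {
    /// Publish the already-committed table versions to the manifest.
    PublishForward,
    /// `Dataset::restore` per table, then publish the restored HEAD.
    RestoreAndPublish,
    /// Leave the sidecar on disk for the next read-write open.
    DeferToNextReadWriteOpen,
}

fn action_for(mode: RecoveryMode, decision: Decision) -> Action {
    match (mode, decision) {
        // Roll-forward is safe in both modes: the publisher CAS is the fence.
        (_, Decision::RollForward) => Action::PublishForward,
        (RecoveryMode::Full, Decision::RollBack) => Action::RestoreAndPublish,
        // Restore could orphan a concurrent commit, so refresh never restores.
        (RecoveryMode::RollForwardOnly, Decision::RollBack) => {
            Action::DeferToNextReadWriteOpen
        }
    }
}
```

Keeping the mode as an explicit argument means the unsafe combination (restore during refresh) cannot be reached by accident.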
The publisher-CAS contract is unchanged: a *concurrent writer* that
advances any of our touched tables between snapshot capture and
publisher commit produces exactly one winner. The residual above is
about *our* abandoned commits in the failure path, not about
concurrency races.
## Conflict shape
Concurrent writers to the same `(table, branch)` produce exactly one
success; every other writer fails. The losing writer's error is
`OmniError::Manifest` with kind `Conflict` and details
`ManifestConflictDetails::ExpectedVersionMismatch { table_key, expected,
actual }`. The HTTP server maps this to **409 Conflict** with body
`{"error": "...", "code": "conflict", "manifest_conflict": { "table_key":
"...", "expected": N, "actual": M }}` — see [docs/user/server.md](../user/server.md).
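As a rough illustration of the conflict shape, the sketch below compares captured versions against current ones and renders the 409 body. It is an in-memory model, not the publisher: `check_expected_versions` and `conflict_body` are hypothetical helpers, the error message is supplied by the caller, and strings are assumed to need no JSON escaping.

```rust
use std::collections::BTreeMap;

/// Mirrors the fields of `ManifestConflictDetails::ExpectedVersionMismatch`.
#[derive(Debug, PartialEq)]
struct ExpectedVersionMismatch {
    table_key: String,
    expected: u64,
    actual: u64,
}

/// Compare the versions captured before the first write against the
/// versions seen at publish time; the first mismatch loses.
fn check_expected_versions(
    expected: &BTreeMap<String, u64>,
    current: &BTreeMap<String, u64>,
) -> Result<(), ExpectedVersionMismatch> {
    for (table_key, &want) in expected {
        // Sketch-only choice: a table missing from `current` counts as version 0.
        let have = current.get(table_key).copied().unwrap_or(0);
        if have != want {
            return Err(ExpectedVersionMismatch {
                table_key: table_key.clone(),
                expected: want,
                actual: have,
            });
        }
    }
    Ok(())
}

/// Render the 409 body in the shape quoted above (no escaping performed).
fn conflict_body(message: &str, m: &ExpectedVersionMismatch) -> String {
    format!(
        r#"{{"error": "{}", "code": "conflict", "manifest_conflict": {{"table_key": "{}", "expected": {}, "actual": {}}}}}"#,
        message, m.table_key, m.expected, m.actual
    )
}
```

A client that receives this body can refresh, re-read the table at `actual`, and retry the write.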
## Audit
`actor_id` lands in `_graph_commits.lance` via `record_graph_commit` (no
intermediate run record). Audit history is queried via `omnigraph commit
list`.
## Migration code
`db/manifest/migrations.rs` carries the v2→v3 internal-schema step (MR-770):
a one-time sweep that deletes legacy `__run__*` staging branches off
`__manifest`. It runs in `Omnigraph::open(ReadWrite)` (via
`manifest::migrate_on_open`, before the coordinator reads branch state) and
again on the publisher's write path; both are idempotent once the stamp is at
v3. Deleting the inert `_graph_runs.lance` / `_graph_run_actors.lance` dataset
*bytes* is still deferred — it needs a `StorageAdapter::delete_prefix`
primitive — but those bytes are invisible to graph-level state.
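The stamp-gated sweep can be pictured with an in-memory model. This is a sketch only: `ManifestState` and `sweep_legacy_run_branches` are hypothetical, and the real step operates on Lance branch refs rather than a `Vec<String>`.

```rust
const RUN_BRANCH_PREFIX: &str = "__run__";
const TARGET_STAMP: u32 = 3;

/// Minimal stand-in for the manifest's internal-schema stamp and branch list.
struct ManifestState {
    stamp: u32,
    branches: Vec<String>,
}

/// Deletes legacy staging branches once, then bumps the stamp.
/// Returns how many branches were removed.
fn sweep_legacy_run_branches(state: &mut ManifestState) -> usize {
    // Already migrated: nothing to do, which makes repeat calls a no-op.
    if state.stamp >= TARGET_STAMP {
        return 0;
    }
    let before = state.branches.len();
    state.branches.retain(|b| !b.starts_with(RUN_BRANCH_PREFIX));
    state.stamp = TARGET_STAMP;
    before - state.branches.len()
}
```

Because the stamp check comes first, calling the sweep from both the open path and the write path costs one comparison after the first successful run.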
## Mid-query partial failure: closed by MR-794
The pre-MR-794 design had a known limitation: a multi-statement `.gq`
mutation where op-N inline-committed a Lance fragment and op-N+1 then
failed left the touched table at `Lance HEAD = manifest_version + 1`,
blocking the next mutation with `ExpectedVersionMismatch`.
MR-794 (step 1 + step 2+) closed this for inserts/updates **by
construction at the writer layer**: insert and update batches accumulate
in memory; no Lance HEAD advance happens during op execution; one
`stage_*` + `commit_staged` per touched table runs at end-of-query, and
only after every op succeeded. A failed op leaves Lance HEAD untouched
on the staged tables, so the next mutation proceeds normally with no
drift to reconcile.
The cancellation case (future drop mid-mutation) inherits the same
guarantee — the in-memory accumulator evaporates with the dropped task
and no Lance write was ever issued.
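A toy model of the accumulator makes the guarantee concrete. Everything here is hypothetical (`Table`, `Op`, `run_query`); it only mimics the shape described above: buffer per table, commit once per touched table, and only after every op succeeded.

```rust
use std::collections::BTreeMap;

/// Stand-in for a table: `head` plays the role of Lance HEAD.
#[derive(Default)]
struct Table {
    head: u64,
    rows: Vec<String>,
}

enum Op {
    Insert { table: &'static str, row: &'static str },
    /// An op that fails during execution.
    Fail,
}

fn run_query(tables: &mut BTreeMap<String, Table>, ops: &[Op]) -> Result<(), String> {
    // Phase 1: execute ops against an in-memory buffer; no table is touched.
    let mut pending: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for op in ops {
        match op {
            Op::Insert { table, row } => {
                pending.entry(table.to_string()).or_default().push(row.to_string());
            }
            // Early return drops `pending`; every HEAD stays where it was.
            Op::Fail => return Err("op failed".to_string()),
        }
    }
    // Phase 2: one commit per touched table, reached only on full success.
    for (name, rows) in pending {
        let t = tables.entry(name).or_default();
        t.rows.extend(rows);
        t.head += 1;
    }
    Ok(())
}
```

Dropping the future mid-query has the same effect as the `Op::Fail` arm: the buffer goes away and phase 2 never runs.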
For delete-touching mutations the legacy inline-commit shape is
preserved (Lance has no public two-phase delete in 4.0.0) — the same
narrow window remains. The parse-time D₂ rule prevents inserts/updates
from coexisting with deletes in one query, so a pure-delete failure
cannot drift any staged-table state. If a delete-only multi-table
mutation fails mid-cascade, the same workaround as before applies
(retry; rely on `omnigraph cleanup` once a later successful commit
moves HEAD past the orphan version). Closing this requires Lance to
expose `DeleteJob::execute_uncommitted`; tracked in MR-793 and a
Lance-upstream ticket.
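The D₂ rule mentioned above amounts to a check over the kinds of operation in one query. A minimal sketch, assuming a hypothetical `OpKind` enum and `check_d2` function rather than the parser's real types:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum OpKind {
    Insert,
    Update,
    Delete,
}

/// Rejects a query that mixes deletes with inserts or updates.
fn check_d2(ops: &[OpKind]) -> Result<(), String> {
    let has_delete = ops.iter().any(|k| *k == OpKind::Delete);
    let has_staged_write = ops
        .iter()
        .any(|k| matches!(k, OpKind::Insert | OpKind::Update));
    if has_delete && has_staged_write {
        // Error text is illustrative, not the engine's message.
        Err("query mixes deletes with inserts/updates".to_string())
    } else {
        Ok(())
    }
}
```

Rejecting the mix at parse time keeps the staged path and the inline-commit delete path from ever sharing a query.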