2026-05-30 23:20:56 +02:00
# Direct-Publish Write Path
Refactor AGENTS.md from encyclopedia to map; move spec into docs/
Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).
Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:31:08 +02:00
2026-05-30 23:20:56 +02:00
> History: the Run state machine and `__run__<id>` staging branches were
> removed in MR-771 (shipped v0.4.0). Writes now go directly to the target
> table; this document specifies that direct-publish path.
`mutate_as` and `load` write **directly to the target table**
2026-04-30 08:52:50 +02:00
and call `ManifestBatchPublisher::publish` once at the end with
`expected_table_versions` (the per-table manifest versions captured before
the first write). Cross-table OCC is enforced inside the publisher; the
publisher's row-level CAS on `__manifest` is the single fence.
Refactor AGENTS.md from encyclopedia to map; move spec into docs/
Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).
Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:31:08 +02:00
2026-04-30 08:52:50 +02:00
## What this means in practice
Refactor AGENTS.md from encyclopedia to map; move spec into docs/
Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).
Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:31:08 +02:00
2026-04-30 08:52:50 +02:00
- No `RunRecord` , no `_graph_runs.lance` , no `_graph_run_actors.lance` .
- No `omnigraph run *` CLI subcommands and no `/runs/*` HTTP endpoints.
feat(engine): sweep & remove legacy __run__ branch guard (MR-770) (#132)
* feat(engine): sweep legacy __run__ branches via v2→v3 manifest migration
Pre-v0.4.0 graphs can carry stale `__run__<id>` staging branches on the
`__manifest` dataset, left by the Run state machine removed in MR-771. Lance's
`list_branches` still enumerates them, so they leak into `branch_list()` and
count as blocking branches at schema-apply time.
Add a one-time `migrate_v2_to_v3` arm to the internal-schema dispatcher: on the
first read-write open it enumerates `__manifest` branches, deletes every
`__run__*` ref, and bumps the stamp to 3. Idempotent under retry (re-enumerates
fresh each run). The `"__run__"` prefix is inlined so the migration does not
depend on the run_registry guard that MR-770 removes next.
This is the prerequisite sweep; the guard removal follows in the next commit.
* refactor(engine): remove the legacy __run__ branch guard (MR-770)
With the v2→v3 migration sweeping stale `__run__*` branches off `__manifest`
on first read-write open, the defense-in-depth `is_internal_run_branch` guard
is no longer needed.
- delete `db/run_registry.rs`; drop the module + re-export from `db/mod.rs`
- collapse `is_internal_system_branch` to the schema-apply-lock check only
- `ensure_public_branch_ref`: drop the run-ref rejection; `__run__*` is now an
ordinary branch name
- `branch_merge`: reject `is_internal_system_branch` (was run-only) so the
schema-apply lock is rejected consistently with create/delete — a small,
deliberate tightening
- update the inline schema-apply test + the writes integration tests
(`public_branch_apis_reject_internal_run_refs` →
`public_branch_apis_reject_internal_system_refs`, which also asserts
`__run__*` now creates successfully)
- docs: flip the "pending production sweep / defense-in-depth" notes to
"auto-swept by the v2→v3 migration"; document the read-only-open limitation
Known residual: the inert `_graph_runs.lance` / `_graph_run_actors.lance` bytes
remain until a `StorageAdapter::delete_prefix` primitive lands.
* fix(engine): run __run__ sweep at Omnigraph::open, not only on publish
Review (PR #132) caught a regression: removing __run__ from
`is_internal_system_branch` exposed legacy `__run__*` branches to the
schema-apply blocking-branch checks (schema_apply.rs:104 and :778) and to
`branch_list()`, but the v2→v3 sweep ran only inside the publisher's
`load_publish_state`. On a pre-v0.4.0 graph whose first write is a schema
apply, the blocking-branch check fires before any publish, so apply failed
with "found non-main branches: __run__…". The same lazy timing also created a
reverse hazard: a user-created `__run__*` branch on a still-v2 graph could be
deleted by the first publish's sweep.
Fix: run the internal-schema migration in `Omnigraph::open(ReadWrite)` (new
`manifest::migrate_on_open`), before the coordinator reads branch state. The
sweep now lands before any branch-observing code, and a graph is stamped v3 at
open — so the one-time sweep can never catch a legitimately-created branch.
Both checks and `branch_list` see the swept graph; correct by construction for
every write path.
Accepted residual: a read-only open of an unmigrated legacy graph still lists
`__run__*` (read-only opens must not write, so they can't sweep). Documented.
Regression test `legacy_run_branch_is_swept_on_open_and_does_not_block_schema_apply`
confirmed RED before the fix (panicked on the branch_list leak assertion) and
GREEN after. Also updates the stale schema_apply.rs comment, the writes.md
"Migration code" section, and adds the v3 row to storage.md's migration table.
* test(engine): sweep multiple legacy __run__ branches; doc nit
Strengthen the v2→v3 migration test to synthesize three `__run__*` branches
(a real legacy graph accumulates one per run) so the migration's delete loop
is exercised on a single reused dataset handle, not just a single branch.
Confirms multi-branch deletion is safe.
Also drop a stale "active runs" reference from the branch_delete doc line.
* fix(engine): force-delete in __run__ sweep for concurrency safety
`migrate_v2_to_v3` ran `Dataset::delete_branch` (= `branches().delete(.., false)`),
which errors "BranchContents not found" if the branch is already gone. Since the
sweep now runs in `Omnigraph::open(ReadWrite)`, two processes opening the same
legacy v2 graph concurrently would race: one wins each delete, the other's open
fails. The migration only claimed idempotency under *sequential* retry.
Switch to `Dataset::force_delete_branch` (= `delete(.., true)`), Lance's
documented path for cleaning up zombie branches, which tolerates an
already-absent branch. The sweep is now idempotent under concurrent runners and
robust to partial/zombie state. Found in self-review; no behavior change for the
common single-open path.
* docs(release): note MR-770 __run__ cleanup in v0.6.1
* docs(branches): reconcile branch cleanup semantics
2026-06-07 17:33:14 +02:00
- No `__run__<id>` staging branches; `__run__*` is no longer a reserved
name. The branch-name guard was removed in MR-770, and any stale
`__run__*` branch on an upgraded graph is swept off `__manifest` by the
v2→v3 internal-schema migration on first read-write open. (The inert
`_graph_runs.lance` bytes remain until a `delete_prefix` primitive lands.)
2026-04-30 08:52:50 +02:00
- Cancelled mutation futures leave **no graph-level state** — only orphaned
Lance fragments, which the existing `omnigraph cleanup` pipe reclaims.
Refactor AGENTS.md from encyclopedia to map; move spec into docs/
Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).
Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:31:08 +02:00
2026-04-30 08:52:50 +02:00
## Read-your-writes within a multi-statement mutation
Refactor AGENTS.md from encyclopedia to map; move spec into docs/
Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).
Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:31:08 +02:00
2026-04-30 08:52:50 +02:00
A `.gq` query with multiple ops (e.g. `insert Person … insert Knows …` )
must observe earlier ops' writes when validating later ops (referential
2026-05-01 10:43:19 +02:00
integrity, edge cardinality). After MR-794 step 2+ this is implemented
via an in-memory `MutationStaging` accumulator in
2026-05-15 03:45:22 +03:00
[`crates/omnigraph/src/exec/staging.rs` ](../../crates/omnigraph/src/exec/staging.rs ),
2026-05-01 10:43:19 +02:00
shared by both `mutate_as` and the bulk loader:
Refactor AGENTS.md from encyclopedia to map; move spec into docs/
Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).
Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:31:08 +02:00
2026-04-30 08:52:50 +02:00
- On the first touch of each table, the pre-write manifest version is
2026-05-01 10:43:19 +02:00
captured into `expected_versions[table_key]` (the publisher's CAS
fence at end-of-query).
- Each insert/update op pushes a `RecordBatch` into the per-table
pending accumulator. Lance HEAD does **not** advance during op
execution.
- Read sites (validation, predicate matching for `update` ) consume
`TableStore::scan_with_pending` , which scans committed via Lance
and applies the same SQL filter to the pending batches via DataFusion
`MemTable` . Same-query writes are visible to subsequent reads.
- At end-of-query, `MutationStaging::finalize` issues exactly one
`stage_*` + `commit_staged` per touched table (concatenating
accumulated batches; merge-mode dedupes by `id` , last-write-wins),
and the publisher publishes the manifest atomically across all
touched sub-tables. Cross-table conflicts surface as
2026-04-30 08:52:50 +02:00
`ManifestConflictDetails::ExpectedVersionMismatch` .
2026-05-01 10:43:19 +02:00
- **Deletes still inline-commit.** Lance's `Dataset::delete` is not
exposed as a two-phase op in 4.0.0; deletes go through `delete_where`
immediately and record their post-write state in
`MutationStaging.inline_committed` . The parse-time D₂ rule (below)
prevents inserts/updates from coexisting with deletes in one query,
so the inline path is safe for delete-only mutations.
Refactor AGENTS.md from encyclopedia to map; move spec into docs/
Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).
Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:31:08 +02:00
2026-05-15 03:45:22 +03:00
This upholds the manifest-atomic mutation and read-your-writes invariants
tracked in [docs/dev/invariants.md ](invariants.md ).
2026-05-01 10:43:19 +02:00
### D₂ — parse-time mixed-mode rejection
A single mutation query is either insert/update-only or delete-only.
Mixed → rejected at parse time with a clear error directing the user to
split the query. Reason: mixing creates ordering hazards
(insert→delete on the same row would silently no-op because the staged
insert isn't visible to delete; cascading deletes of just-inserted
edges break referential integrity). Until Lance exposes a two-phase
delete API, the parse-time rejection keeps both paths atomic and
correct. Tracked: MR-793, plus a Lance-upstream ticket.
MR-793 phases 1-6: TableStorage trait + staged-write surface for engine writers
Hoists Lance's stage+commit two-phase write pattern from "discipline at
each writer" to a sealed trait surface (`TableStorage`). New engine code
that needs to advance Lance HEAD MUST go through `stage_*` + `commit_staged`;
the trait's opaque `SnapshotHandle` / `StagedHandle` types keep
`lance::Dataset` and `lance::Transaction` out of trait signatures.
Phases landed (see .context/mr-793-design.md for the full plan):
* 1a: `crates/omnigraph/src/storage_layer.rs` — `TableStorage` trait,
sealed (only in-tree types can impl), single impl on `TableStore`
delegating to existing inherent methods; `Omnigraph::storage()`
accessor returns `&dyn TableStorage`.
* 2: three new staged primitives — `stage_overwrite`,
`stage_create_btree_index`, `stage_create_inverted_index` —
implementing the simple branch of Lance's `CreateIndexBuilder::execute`
(scalar indices only; vector indices stay inline because
`build_index_metadata_from_segments` is `pub(crate)` in lance-4.0.0).
Six new tests in `tests/staged_writes.rs` pin both the new primitives
and the inline residuals (`delete_where`, `create_vector_index`).
* 3: `tests/forbidden_apis.rs` — defense-in-depth integration test
walks engine source, fails on direct lance::* inline-commit API use
outside `table_store.rs` / `db/manifest/`. Skips comment lines and
honors `// forbidden-api-allow:` sentinels.
* 4: `ensure_indices` migration — scalar index builds now route through
`stage_create_*_index` + `commit_staged` instead of
`create_*_index(&mut Dataset)`. Vector indices stay inline (residual,
named honestly at the call site).
* 5: `branch_merge::publish_rewritten_merge_table` migration — the
merge_insert phase now uses `stage_merge_insert` + `commit_staged`;
delete phase stays inline (Lance #6658 residual, named honestly).
* 6: `schema_apply` rewritten_tables migration — non-empty rewrites
use `stage_overwrite` + `commit_staged`; empty-batch rewrites stay
inline because `InsertBuilder::execute_uncommitted` rejects empty
data. The narrow inline window is bounded by `__schema_apply_lock__`.
Verified-green test surface:
* `cargo test -p omnigraph-engine` — 68 lib + ~120 integration tests
(incl. 6 new staged_writes tests + the new forbidden_apis test).
* `cargo test -p omnigraph-engine --features failpoints --test failpoints`
— 5 tests, all green.
* `cargo test --workspace` — green.
Deferred to follow-up sessions (see design doc §17 split):
* Phase 1b — convert remaining engine call sites to `&dyn TableStorage`
(mostly READS that don't touch the staged-write invariant).
* Phase 7 — recovery-on-open reconciler (closes Phase B → Phase C
residual across process restarts; new subsystem).
* Phase 8 — index-coverage reconciler (full §VII.35 compliance —
removes synchronous index work from the publish path).
* Phase 9 — demote unused `TableStore` inherent methods to `pub(crate)`
(depends on Phase 1b).
Lance upstream blockers documented:
* lance-format/lance#6658 — two-phase delete API (open, no PRs).
* Companion: `build_index_metadata_from_segments` should be `pub` so
vector-index builds can be staged outside the lance crate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 11:03:15 +02:00
### MR-793 status (storage trait two-phase invariant) — partial
MR-793 hoists the staged-write pattern into a `TableStorage` trait
surface with sealed-trait enforcement and opaque `SnapshotHandle` /
`StagedHandle` types — see `crates/omnigraph/src/storage_layer.rs` .
The trait is the canonical surface for new engine code; existing call
sites still use the inherent `TableStore` methods (mechanical migration
deferred to a follow-up cycle — tracked).
Three writers have been migrated onto staged primitives:
* **`ensure_indices` ** (`db/omnigraph/table_ops.rs::build_indices_on_dataset_for_catalog` )
— scalar indices (BTree, Inverted) now use `stage_create_*_index` +
`commit_staged` . Vector indices stay inline (residual — Lance
`build_index_metadata_from_segments` is `pub(crate)` in 4.0.0;
companion ticket to lance-format/lance#6658 needed).
* **`branch_merge::publish_rewritten_merge_table` **
(`exec/merge.rs` ) — merge_insert now uses `stage_merge_insert` +
`commit_staged` . Deletes stay inline (Lance #6658 residual).
* **`schema_apply` rewritten_tables** (`db/omnigraph/schema_apply.rs` )
— non-empty rewrites use `stage_overwrite` + `commit_staged` .
Empty-batch rewrites stay inline (Lance `InsertBuilder::execute_uncommitted`
rejects empty data; the empty case is rare and bounded by the
schema-apply lock branch).
A defense-in-depth integration test (`tests/forbidden_apis.rs` ) walks
engine source and fails if non-allow-listed code calls Lance's
inline-commit APIs directly. The trait surface itself is the primary
enforcement (sealed + only-callable-via-trait once call sites land);
the grep test catches type-system bypass attempts.
The "finalize → publisher residual" described below applies equally to
the migrated writers — Lance has no multi-dataset atomic commit
primitive, so the per-table commit_staged → manifest publish gap is
the same drift class. Closing it requires either upstream Lance
multi-dataset commit OR the omnigraph-side recovery-on-open reconciler
described in `.context/mr-793-design.md` §15 (deferred to MR-795).
2026-05-02 19:46:07 +02:00
### Inline-commit method residuals on `TableStorage` (MR-793 acceptance §1 option b)
MR-793's acceptance criterion §1 ("`TableStore` public API has no method that performs a manifest commit as a side effect of writing") is met **per-method** by enumerating every inline-commit method that remains on the trait surface, naming why it cannot yet be removed, and keeping the residual comment at every call site:
| Method on `TableStore` | Inline-commit reason | Closes when |
|---|---|---|
| `delete_where` | `DeleteJob` is `pub(crate)` in lance-4.0.0 — no public two-phase delete API | [lance-format/lance#6658 ](https://github.com/lance-format/lance/issues/6658 ) lands and `stage_delete` joins the trait |
| `create_vector_index` | Vector indices take Lance's "segment commit path"; the helper `build_index_metadata_from_segments` is `pub(crate)` | [lance-format/lance#6666 ](https://github.com/lance-format/lance/issues/6666 ) lands and `stage_create_vector_index` joins the trait |
| `append_batch` | Legacy inherent method; some engine call sites haven't migrated to `stage_append + commit_staged` yet | MR-793 Phase 1b (call-site conversion) + Phase 9 (demote to `pub(crate)` ) |
| `merge_insert_batch` / `merge_insert_batches` | Legacy inherent method | Same — Phase 1b + Phase 9 |
| `overwrite_batch` | Legacy inherent method | Same — Phase 1b + Phase 9 |
| `create_btree_index` (inherent) | Legacy inherent method (the migrated callers use `stage_create_btree_index` + `commit_staged` ; the inherent stays for tests / un-migrated paths) | Same — Phase 1b + Phase 9 |
| `create_inverted_index` (inherent) | Same | Same — Phase 1b + Phase 9 + index-class split (MR-848) |
| `truncate_table` (inherent on `TableStore` ) | Used by `overwrite_batch` internally | Phase 9 |
After **lance#6658 + lance#6666 ship + MR-793 Phase 1b + MR-793 Phase 9 all complete** , the trait surface exposes only staged-write primitives + `commit_staged` . Until then this matrix names every residual explicitly, every call site carries a one-line residual comment, and no engine code outside `table_store.rs` is permitted to reach the inline-commit Lance APIs (enforced by the `tests/forbidden_apis.rs` guard).
2026-05-01 10:43:19 +02:00
### `LoadMode::Overwrite` residual
The bulk loader's Append and Merge modes use the staged-write path
described above. `LoadMode::Overwrite` keeps the legacy inline-commit
path: truncate-then-append doesn't fit the staged shape cleanly in
Lance 4.0.0, and overwrite has no in-flight read-your-writes
requirement (the prior data is being wiped). A mid-overwrite failure
can leave Lance HEAD on a partially-truncated table; the next overwrite
will replace it. Operator-driven (rare in agent workloads); document
permanently until Lance exposes `Operation::Overwrite { fragments }` as
a two-phase op.
Refactor AGENTS.md from encyclopedia to map; move spec into docs/
Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).
Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:31:08 +02:00
2026-05-03 13:56:36 +02:00
### Open-time recovery sweep
2026-05-01 13:47:55 +02:00
The staged-write rewire eliminates one drift class **by construction at
the writer layer**: an op that fails before pushing to the in-memory
accumulator (validation errors, missing endpoints, parse-time D₂
rejection) leaves Lance HEAD untouched on every staged table. This is
the case the `partial_failure_leaves_target_queryable_and_unblocks_next_mutation`
test pins.
recovery: document MR-847 ship across all reference docs (Phase 10)
Update the doc surface to reflect MR-847 having shipped end to end —
sidecar protocol, classifier, all-or-nothing decision tree, roll-forward
via ManifestBatchPublisher, roll-back via Dataset::restore with
fragment-set short-circuit, audit trail in
_graph_commit_recoveries.lance, OpenMode::{ReadWrite, ReadOnly}, and
the four migrated writers all carrying sidecars across Phase B → Phase C.
- docs/invariants.md §VI.23: change from "upheld at the writer-trait
surface for inserts/updates/etc., per-table commit_staged → manifest
publish window remains" to "upheld at the writer-trait surface AND
across process boundaries". The MR-847 sweep closes the residual on
the next Omnigraph::open. The "continuous in-process" property
(no ExpectedVersionMismatch surfacing to subsequent writers between
Phase B failure and process restart) is honest follow-up at MR-856.
- docs/runs.md: replace "Finalize → publisher residual" section with
"Open-time recovery sweep (MR-847)" — describes the sidecar protocol
lifecycle (Phases A-D), the sweep's classifier + decision dispatch,
the audit trail, and the operator-facing query
(omnigraph commit list --filter actor=omnigraph:recovery).
- AGENTS.md capability matrix "Atomic single-dataset commits" row:
drop the "Layer (3) is not yet shipped — tracked in MR-847" caveat;
describe the three layers as all shipping; reference MR-856 for the
background-reconciler follow-up.
- docs/storage.md: add _graph_commit_recoveries.lance and
__recovery/{ulid}.json to the on-disk layout (mermaid + prose).
- docs/branches-commits.md: new "Recovery audit trail (MR-847)"
subsection describing the join from
_graph_commits.lance:actor_id="omnigraph:recovery" to
_graph_commit_recoveries.lance:graph_commit_id for operator
post-mortem.
- docs/maintenance.md: note the MR-847 recovery floor on cleanup —
--keep < 3 may garbage-collect Lance versions the recovery sweep
needs as a rollback target. Default --keep 10 is safe.
- docs/testing.md: add tests/recovery.rs to the engine integration-test
table; expand the failpoints.rs row to mention the four MR-847
per-writer Phase B → recovery integration tests.
- .context/mr-847-design.md: prepend a "Status: DONE" stanza listing
every commit hash + scope across phases 1-10.
AGENTS.md ↔ docs/ cross-link check passes (26 links, 26 docs).
Full workspace test sweep passes with --features failpoints (361 tests
across 20 binaries).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:46:03 +02:00
A second, narrower drift class — the **finalize → publisher window** —
2026-05-03 13:56:36 +02:00
is closed across one open cycle by the open-time recovery sweep:
recovery: document MR-847 ship across all reference docs (Phase 10)
Update the doc surface to reflect MR-847 having shipped end to end —
sidecar protocol, classifier, all-or-nothing decision tree, roll-forward
via ManifestBatchPublisher, roll-back via Dataset::restore with
fragment-set short-circuit, audit trail in
_graph_commit_recoveries.lance, OpenMode::{ReadWrite, ReadOnly}, and
the four migrated writers all carrying sidecars across Phase B → Phase C.
- docs/invariants.md §VI.23: change from "upheld at the writer-trait
surface for inserts/updates/etc., per-table commit_staged → manifest
publish window remains" to "upheld at the writer-trait surface AND
across process boundaries". The MR-847 sweep closes the residual on
the next Omnigraph::open. The "continuous in-process" property
(no ExpectedVersionMismatch surfacing to subsequent writers between
Phase B failure and process restart) is honest follow-up at MR-856.
- docs/runs.md: replace "Finalize → publisher residual" section with
"Open-time recovery sweep (MR-847)" — describes the sidecar protocol
lifecycle (Phases A-D), the sweep's classifier + decision dispatch,
the audit trail, and the operator-facing query
(omnigraph commit list --filter actor=omnigraph:recovery).
- AGENTS.md capability matrix "Atomic single-dataset commits" row:
drop the "Layer (3) is not yet shipped — tracked in MR-847" caveat;
describe the three layers as all shipping; reference MR-856 for the
background-reconciler follow-up.
- docs/storage.md: add _graph_commit_recoveries.lance and
__recovery/{ulid}.json to the on-disk layout (mermaid + prose).
- docs/branches-commits.md: new "Recovery audit trail (MR-847)"
subsection describing the join from
_graph_commits.lance:actor_id="omnigraph:recovery" to
_graph_commit_recoveries.lance:graph_commit_id for operator
post-mortem.
- docs/maintenance.md: note the MR-847 recovery floor on cleanup —
--keep < 3 may garbage-collect Lance versions the recovery sweep
needs as a rollback target. Default --keep 10 is safe.
- docs/testing.md: add tests/recovery.rs to the engine integration-test
table; expand the failpoints.rs row to mention the four MR-847
per-writer Phase B → recovery integration tests.
- .context/mr-847-design.md: prepend a "Status: DONE" stanza listing
every commit hash + scope across phases 1-10.
AGENTS.md ↔ docs/ cross-link check passes (26 links, 26 docs).
Full workspace test sweep passes with --features failpoints (361 tests
across 20 binaries).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:46:03 +02:00
`MutationStaging::finalize` runs `stage_*` + `commit_staged` per touched
table sequentially, then the publisher commits the manifest. Lance has
no multi-dataset atomic commit, so the per-table `commit_staged` calls
are independent operations: if commit_staged on table N+1 fails *after*
commit_staged on tables 1..N succeeded, or if the publisher's CAS
pre-check rejects *after* every commit_staged succeeded, tables 1..N
are left at `Lance HEAD = manifest_pinned + 1` .
**Recovery protocol** (lifecycle of every staged-write writer —
`MutationStaging::finalize` , `schema_apply::apply_schema_with_lock` ,
fix: optimize publishes compaction; recovery roll-back converges manifest (#141)
* test(optimize): cover manifest publish + HEAD-drift reconcile
Red against the pre-fix optimize, which ran compact_files without
publishing the compacted version to __manifest:
- maintenance: optimize must publish so the manifest table_version
tracks the compacted Lance HEAD and a later schema apply succeeds;
and must reconcile a pre-existing manifest-behind-HEAD drift (forged
via raw Lance compaction) so strict writes commit again.
- end_to_end + composite_flow: post-optimize query / strict update /
reopen in the full lifecycle (the canonical flow previously omitted
post-optimize writes as a documented "known limitation").
- failpoints: a crash between compaction and the manifest publish rolls
forward on next open.
* fix(optimize): publish compaction to manifest and reconcile HEAD drift
optimize ran Lance compact_files without publishing the new version to
__manifest, so the manifest table_version lagged the Lance HEAD: reads
stayed pinned to the pre-compaction version, and the next schema apply or
strict update/delete failed its HEAD-vs-manifest precondition with
"stale view ... refresh and retry" (open-time recovery rollback inflated
the gap on retry).
optimize now publishes each compacted table's version under the
per-(table, main) write queue, guarded by a manifest CAS and a
SidecarKind::Optimize recovery sidecar (loose-match; roll-forward is safe
because compaction is content-preserving). When a table has nothing left
to compact but its Lance HEAD is already ahead of the manifest pin
(pre-fix drift, or a recovery restore commit), optimize reconciles the
manifest forward to HEAD (metadata-only, no sidecar). Caches and the
CSR/CSC graph index are invalidated after a publish.
Docs updated (maintenance, storage, branches-commits, writes, testing).
* test(recovery): rollback convergence + optimize-defer regressions
Red against the current code, landed before the fix:
- recovery: after the open-time sweep rolls a sidecar back, the manifest
must track Lance HEAD (no residual drift) so a follow-up schema apply
succeeds — the original "+1 per retry" loop. Today roll-back restores
without publishing, so the manifest lags HEAD and the apply fails its
HEAD-vs-manifest precondition.
- maintenance: optimize must refuse while a recovery sidecar is pending —
operating on an unrecovered graph could publish a partial write the
sweep would roll back.
Also removes optimize_reconciles_preexisting_manifest_head_drift: the
ad-hoc drift reconcile it covered is replaced by recovery-side convergence.
* fix(recovery): converge manifest on roll-back; optimize defers on pending recovery
Root of PR #141's review findings and the original "+1 per retry" loop:
a Lance HEAD ahead of the manifest was ambiguous (benign content-preserving
drift vs. a partial write a sidecar will roll back), and optimize's reconcile
guessed it benign. Close the class instead of guessing:
- Recovery roll-back now PUBLISHES the restored version (via a
push_table_update_at_head helper shared with roll-forward), so the manifest
tracks the Lance HEAD after recovery — symmetric with roll-forward. This
fixes the +1 loop (after one roll-back the retry's HEAD-vs-manifest
precondition passes) and removes the only remaining source of orphaned
drift. The audit still records the logical rolled-back-to version; the
manifest is published at the restore commit (identical content).
- optimize drops the ad-hoc drift reconcile and instead REFUSES when a
__recovery sidecar is pending, so it only ever operates on a recovered
graph (manifest == HEAD); its compaction publish can no longer commit a
partial write. With the reconcile gone, the blob-skip-vs-reconcile gap is
moot.
Updates the rollback recovery-test helper (manifest == HEAD after roll-back),
the failpoints assertions, and the user/dev docs.
* test(recovery): fix rollback assertion for manifest convergence
The roll-back-publishes change makes the manifest version advance after a
SchemaApply roll-back (to the old-schema content), so the
schema_apply_without_schema_staging_rolls_back_on_next_open assertion must
be `version > pre`, not `version == pre`. This update was dropped during
the commit churn and surfaced as a CI Test Workspace failure; the
old-schema-preserved intent stays covered by count_rows + _schema.pg + the
RolledBack convergence invariant.
2026-06-08 01:50:12 +02:00
`branch_merge_on_current_target` , `ensure_indices_for_branch` ,
`optimize_all_tables` ):
recovery: document MR-847 ship across all reference docs (Phase 10)
Update the doc surface to reflect MR-847 having shipped end to end —
sidecar protocol, classifier, all-or-nothing decision tree, roll-forward
via ManifestBatchPublisher, roll-back via Dataset::restore with
fragment-set short-circuit, audit trail in
_graph_commit_recoveries.lance, OpenMode::{ReadWrite, ReadOnly}, and
the four migrated writers all carrying sidecars across Phase B → Phase C.
- docs/invariants.md §VI.23: change from "upheld at the writer-trait
surface for inserts/updates/etc., per-table commit_staged → manifest
publish window remains" to "upheld at the writer-trait surface AND
across process boundaries". The MR-847 sweep closes the residual on
the next Omnigraph::open. The "continuous in-process" property
(no ExpectedVersionMismatch surfacing to subsequent writers between
Phase B failure and process restart) is honest follow-up at MR-856.
- docs/runs.md: replace "Finalize → publisher residual" section with
"Open-time recovery sweep (MR-847)" — describes the sidecar protocol
lifecycle (Phases A-D), the sweep's classifier + decision dispatch,
the audit trail, and the operator-facing query
(omnigraph commit list --filter actor=omnigraph:recovery).
- AGENTS.md capability matrix "Atomic single-dataset commits" row:
drop the "Layer (3) is not yet shipped — tracked in MR-847" caveat;
describe the three layers as all shipping; reference MR-856 for the
background-reconciler follow-up.
- docs/storage.md: add _graph_commit_recoveries.lance and
__recovery/{ulid}.json to the on-disk layout (mermaid + prose).
- docs/branches-commits.md: new "Recovery audit trail (MR-847)"
subsection describing the join from
_graph_commits.lance:actor_id="omnigraph:recovery" to
_graph_commit_recoveries.lance:graph_commit_id for operator
post-mortem.
- docs/maintenance.md: note the MR-847 recovery floor on cleanup —
--keep < 3 may garbage-collect Lance versions the recovery sweep
needs as a rollback target. Default --keep 10 is safe.
- docs/testing.md: add tests/recovery.rs to the engine integration-test
table; expand the failpoints.rs row to mention the four MR-847
per-writer Phase B → recovery integration tests.
- .context/mr-847-design.md: prepend a "Status: DONE" stanza listing
every commit hash + scope across phases 1-10.
AGENTS.md ↔ docs/ cross-link check passes (26 links, 26 docs).
Full workspace test sweep passes with --features failpoints (361 tests
across 20 binaries).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:46:03 +02:00
1. **Phase A** : writer writes a sidecar JSON to
fix: optimize publishes compaction; recovery roll-back converges manifest (#141)
* test(optimize): cover manifest publish + HEAD-drift reconcile
Red against the pre-fix optimize, which ran compact_files without
publishing the compacted version to __manifest:
- maintenance: optimize must publish so the manifest table_version
tracks the compacted Lance HEAD and a later schema apply succeeds;
and must reconcile a pre-existing manifest-behind-HEAD drift (forged
via raw Lance compaction) so strict writes commit again.
- end_to_end + composite_flow: post-optimize query / strict update /
reopen in the full lifecycle (the canonical flow previously omitted
post-optimize writes as a documented "known limitation").
- failpoints: a crash between compaction and the manifest publish rolls
forward on next open.
* fix(optimize): publish compaction to manifest and reconcile HEAD drift
optimize ran Lance compact_files without publishing the new version to
__manifest, so the manifest table_version lagged the Lance HEAD: reads
stayed pinned to the pre-compaction version, and the next schema apply or
strict update/delete failed its HEAD-vs-manifest precondition with
"stale view ... refresh and retry" (open-time recovery rollback inflated
the gap on retry).
optimize now publishes each compacted table's version under the
per-(table, main) write queue, guarded by a manifest CAS and a
SidecarKind::Optimize recovery sidecar (loose-match; roll-forward is safe
because compaction is content-preserving). When a table has nothing left
to compact but its Lance HEAD is already ahead of the manifest pin
(pre-fix drift, or a recovery restore commit), optimize reconciles the
manifest forward to HEAD (metadata-only, no sidecar). Caches and the
CSR/CSC graph index are invalidated after a publish.
Docs updated (maintenance, storage, branches-commits, writes, testing).
* test(recovery): rollback convergence + optimize-defer regressions
Red against the current code, landed before the fix:
- recovery: after the open-time sweep rolls a sidecar back, the manifest
must track Lance HEAD (no residual drift) so a follow-up schema apply
succeeds — the original "+1 per retry" loop. Today roll-back restores
without publishing, so the manifest lags HEAD and the apply fails its
HEAD-vs-manifest precondition.
- maintenance: optimize must refuse while a recovery sidecar is pending —
operating on an unrecovered graph could publish a partial write the
sweep would roll back.
Also removes optimize_reconciles_preexisting_manifest_head_drift: the
ad-hoc drift reconcile it covered is replaced by recovery-side convergence.
* fix(recovery): converge manifest on roll-back; optimize defers on pending recovery
Root of PR #141's review findings and the original "+1 per retry" loop:
a Lance HEAD ahead of the manifest was ambiguous (benign content-preserving
drift vs. a partial write a sidecar will roll back), and optimize's reconcile
guessed it benign. Close the class instead of guessing:
- Recovery roll-back now PUBLISHES the restored version (via a
push_table_update_at_head helper shared with roll-forward), so the manifest
tracks the Lance HEAD after recovery — symmetric with roll-forward. This
fixes the +1 loop (after one roll-back the retry's HEAD-vs-manifest
precondition passes) and removes the only remaining source of orphaned
drift. The audit still records the logical rolled-back-to version; the
manifest is published at the restore commit (identical content).
- optimize drops the ad-hoc drift reconcile and instead REFUSES when a
__recovery sidecar is pending, so it only ever operates on a recovered
graph (manifest == HEAD); its compaction publish can no longer commit a
partial write. With the reconcile gone, the blob-skip-vs-reconcile gap is
moot.
Updates the rollback recovery-test helper (manifest == HEAD after roll-back),
the failpoints assertions, and the user/dev docs.
* test(recovery): fix rollback assertion for manifest convergence
The roll-back-publishes change makes the manifest version advance after a
SchemaApply roll-back (to the old-schema content), so the
schema_apply_without_schema_staging_rolls_back_on_next_open assertion must
be `version > pre`, not `version == pre`. This update was dropped during
the commit churn and surfaced as a CI Test Workspace failure; the
old-schema-preserved intent stays covered by count_rows + _schema.pg + the
RolledBack convergence invariant.
2026-06-08 01:50:12 +02:00
`__recovery/{ulid}.json` BEFORE its first HEAD-advancing commit
(`commit_staged` , or `compact_files` for `optimize_all_tables` ,
which advances the Lance HEAD via a reserve-fragments + rewrite
commit rather than a staged write). The
recovery: document MR-847 ship across all reference docs (Phase 10)
Update the doc surface to reflect MR-847 having shipped end to end —
sidecar protocol, classifier, all-or-nothing decision tree, roll-forward
via ManifestBatchPublisher, roll-back via Dataset::restore with
fragment-set short-circuit, audit trail in
_graph_commit_recoveries.lance, OpenMode::{ReadWrite, ReadOnly}, and
the four migrated writers all carrying sidecars across Phase B → Phase C.
- docs/invariants.md §VI.23: change from "upheld at the writer-trait
surface for inserts/updates/etc., per-table commit_staged → manifest
publish window remains" to "upheld at the writer-trait surface AND
across process boundaries". The MR-847 sweep closes the residual on
the next Omnigraph::open. The "continuous in-process" property
(no ExpectedVersionMismatch surfacing to subsequent writers between
Phase B failure and process restart) is honest follow-up at MR-856.
- docs/runs.md: replace "Finalize → publisher residual" section with
"Open-time recovery sweep (MR-847)" — describes the sidecar protocol
lifecycle (Phases A-D), the sweep's classifier + decision dispatch,
the audit trail, and the operator-facing query
(omnigraph commit list --filter actor=omnigraph:recovery).
- AGENTS.md capability matrix "Atomic single-dataset commits" row:
drop the "Layer (3) is not yet shipped — tracked in MR-847" caveat;
describe the three layers as all shipping; reference MR-856 for the
background-reconciler follow-up.
- docs/storage.md: add _graph_commit_recoveries.lance and
__recovery/{ulid}.json to the on-disk layout (mermaid + prose).
- docs/branches-commits.md: new "Recovery audit trail (MR-847)"
subsection describing the join from
_graph_commits.lance:actor_id="omnigraph:recovery" to
_graph_commit_recoveries.lance:graph_commit_id for operator
post-mortem.
- docs/maintenance.md: note the MR-847 recovery floor on cleanup —
--keep < 3 may garbage-collect Lance versions the recovery sweep
needs as a rollback target. Default --keep 10 is safe.
- docs/testing.md: add tests/recovery.rs to the engine integration-test
table; expand the failpoints.rs row to mention the four MR-847
per-writer Phase B → recovery integration tests.
- .context/mr-847-design.md: prepend a "Status: DONE" stanza listing
every commit hash + scope across phases 1-10.
AGENTS.md ↔ docs/ cross-link check passes (26 links, 26 docs).
Full workspace test sweep passes with --features failpoints (361 tests
across 20 binaries).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:46:03 +02:00
sidecar names every `(table_key, table_path, expected_version,
post_commit_pin)` it intends to commit + the writer kind +
actor_id.
2. **Phase B** : writer's per-table `commit_staged` loop runs.
3. **Phase C** : publisher commits the manifest.
4. **Phase D** : writer deletes the sidecar.
2026-05-05 22:46:03 +02:00
> **Phase letter convention.** Throughout the recovery code, log
> messages, failpoint names (e.g. `branch_merge.post_phase_b_pre_manifest_commit`),
> and the per-writer integration tests, "Phase A/B/C/D" refers
> exclusively to the four-step lifecycle above. The per-table
> staged-write contract (`stage_*` then `commit_staged`, two steps)
> is referred to by those API verbs — never by phase letters — so a
> reader of `recovery.rs`, `failpoints.rs`, or this document only
> encounters phase letters in the per-writer context.
recovery: document MR-847 ship across all reference docs (Phase 10)
Update the doc surface to reflect MR-847 having shipped end to end —
sidecar protocol, classifier, all-or-nothing decision tree, roll-forward
via ManifestBatchPublisher, roll-back via Dataset::restore with
fragment-set short-circuit, audit trail in
_graph_commit_recoveries.lance, OpenMode::{ReadWrite, ReadOnly}, and
the four migrated writers all carrying sidecars across Phase B → Phase C.
- docs/invariants.md §VI.23: change from "upheld at the writer-trait
surface for inserts/updates/etc., per-table commit_staged → manifest
publish window remains" to "upheld at the writer-trait surface AND
across process boundaries". The MR-847 sweep closes the residual on
the next Omnigraph::open. The "continuous in-process" property
(no ExpectedVersionMismatch surfacing to subsequent writers between
Phase B failure and process restart) is honest follow-up at MR-856.
- docs/runs.md: replace "Finalize → publisher residual" section with
"Open-time recovery sweep (MR-847)" — describes the sidecar protocol
lifecycle (Phases A-D), the sweep's classifier + decision dispatch,
the audit trail, and the operator-facing query
(omnigraph commit list --filter actor=omnigraph:recovery).
- AGENTS.md capability matrix "Atomic single-dataset commits" row:
drop the "Layer (3) is not yet shipped — tracked in MR-847" caveat;
describe the three layers as all shipping; reference MR-856 for the
background-reconciler follow-up.
- docs/storage.md: add _graph_commit_recoveries.lance and
__recovery/{ulid}.json to the on-disk layout (mermaid + prose).
- docs/branches-commits.md: new "Recovery audit trail (MR-847)"
subsection describing the join from
_graph_commits.lance:actor_id="omnigraph:recovery" to
_graph_commit_recoveries.lance:graph_commit_id for operator
post-mortem.
- docs/maintenance.md: note the MR-847 recovery floor on cleanup —
--keep < 3 may garbage-collect Lance versions the recovery sweep
needs as a rollback target. Default --keep 10 is safe.
- docs/testing.md: add tests/recovery.rs to the engine integration-test
table; expand the failpoints.rs row to mention the four MR-847
per-writer Phase B → recovery integration tests.
- .context/mr-847-design.md: prepend a "Status: DONE" stanza listing
every commit hash + scope across phases 1-10.
AGENTS.md ↔ docs/ cross-link check passes (26 links, 26 docs).
Full workspace test sweep passes with --features failpoints (361 tests
across 20 binaries).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:46:03 +02:00
A failure between Phase A and Phase D leaves the sidecar on disk. The
next `Omnigraph::open` (gated on `OpenMode::ReadWrite` ) runs the
recovery sweep in `crates/omnigraph/src/db/manifest/recovery.rs` :
- For each sidecar in `__recovery/` , compare every named table's
Lance HEAD to the manifest pin. Classify per the all-or-nothing
decision tree (RolledPastExpected / NoMovement / UnexpectedAtP1 /
UnexpectedMultistep / InvariantViolation).
recovery: address PR #72 review findings
Bot reviewers (cubic, cursor, chatgpt-codex) caught 4 merge-blocking
bugs + 3 strongly-recommended fixes + 3 doc errors in the initial PR.
Each fix has a paired test demonstrating the bug before the fix.
Merge-blocking fixes:
- BranchMerge moved to loose-match classifier arm. publish_rewritten_
merge_table runs multiple commit_staged calls per table (merge_insert
+ delete_where + index rebuilds). Strict classification rolled back
valid completed Phase B work as UnexpectedMultistep. Three new unit
tests pin the loose-match behavior for BranchMerge.
- branch_merge sidecar uses self.active_branch() (the resolved target
branch) instead of inferring from the first sorted table key. The
previous heuristic could record None (== main) when the merge target
was a non-main branch, causing recovery to publish to the wrong
manifest namespace.
- Best-effort sidecar delete in all 5 writer sites (mutation, loader,
schema_apply, branch_merge, ensure_indices). Previously, a sidecar
cleanup failure after a successful manifest publish would error out
the user's call for a write that already landed. Now: log a warning
and ignore — the next open's recovery sweep tidies the stale sidecar
via NoMovement classification.
- ensure_indices sidecar scoped to tables that need work via new
helpers needs_index_work_node / needs_index_work_edge. Previously
the sidecar pinned every catalog table; if only one needed indexing,
the others classified as NoMovement and the all-or-nothing decision
rolled back legitimate index work.
Strongly-recommended fixes:
- recover_manifest_drift now takes &mut GraphCoordinator and refreshes
between sidecars. Sidecar B's classification needs to see sidecar
A's manifest changes, otherwise B can be classified against stale
pins and incorrectly roll back work that just landed.
- list_sidecars sorts URIs before reading. Sidecar filenames are
ULIDs (chronologically sortable), so this gives deterministic,
time-ordered processing. Filesystem-order was nondeterministic.
- ReadOnly opens skip recover_schema_state_files too (was: only the
MR-847 sweep was gated). Read-only consumers may run with read-only
credentials; silent open-time mutations violate the contract.
Doc cleanups:
- Removed stale "Phase 4 placeholder" comment from
recover_manifest_drift.
- docs/runs.md decision-tree wording now correctly surfaces the
InvariantViolation abort path.
- docs/branches-commits.md clarifies actor_id is in
_graph_commit_actors.lance (joined by graph_commit_id), not on
_graph_commits.lance itself.
Test surface (post-fixes):
- 25 unit tests in db::manifest::recovery (+4 from this commit).
- 10 integration tests in tests/recovery.rs (+3 from this commit).
- ~672 tests across ~25 binaries pass with --features failpoints.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 12:21:40 +02:00
- If any table is `InvariantViolation` (Lance HEAD < manifest pinned —
should be impossible), **abort** with a loud error and leave the
sidecar on disk for operator review.
- Otherwise, if every table is `RolledPastExpected` , **roll forward** :
a single `ManifestBatchPublisher::publish` call extends every pin
2026-05-05 16:04:48 +02:00
atomically. `SchemaApply` sidecars are eligible only when schema-state
recovery promoted the matching staging files in the same recovery pass;
otherwise full open-time recovery rolls them back and refresh-time
recovery leaves them for the next read-write open.
recovery: document MR-847 ship across all reference docs (Phase 10)
Update the doc surface to reflect MR-847 having shipped end to end —
sidecar protocol, classifier, all-or-nothing decision tree, roll-forward
via ManifestBatchPublisher, roll-back via Dataset::restore with
fragment-set short-circuit, audit trail in
_graph_commit_recoveries.lance, OpenMode::{ReadWrite, ReadOnly}, and
the four migrated writers all carrying sidecars across Phase B → Phase C.
- docs/invariants.md §VI.23: change from "upheld at the writer-trait
surface for inserts/updates/etc., per-table commit_staged → manifest
publish window remains" to "upheld at the writer-trait surface AND
across process boundaries". The MR-847 sweep closes the residual on
the next Omnigraph::open. The "continuous in-process" property
(no ExpectedVersionMismatch surfacing to subsequent writers between
Phase B failure and process restart) is honest follow-up at MR-856.
- docs/runs.md: replace "Finalize → publisher residual" section with
"Open-time recovery sweep (MR-847)" — describes the sidecar protocol
lifecycle (Phases A-D), the sweep's classifier + decision dispatch,
the audit trail, and the operator-facing query
(omnigraph commit list --filter actor=omnigraph:recovery).
- AGENTS.md capability matrix "Atomic single-dataset commits" row:
drop the "Layer (3) is not yet shipped — tracked in MR-847" caveat;
describe the three layers as all shipping; reference MR-856 for the
background-reconciler follow-up.
- docs/storage.md: add _graph_commit_recoveries.lance and
__recovery/{ulid}.json to the on-disk layout (mermaid + prose).
- docs/branches-commits.md: new "Recovery audit trail (MR-847)"
subsection describing the join from
_graph_commits.lance:actor_id="omnigraph:recovery" to
_graph_commit_recoveries.lance:graph_commit_id for operator
post-mortem.
- docs/maintenance.md: note the MR-847 recovery floor on cleanup —
--keep < 3 may garbage-collect Lance versions the recovery sweep
needs as a rollback target. Default --keep 10 is safe.
- docs/testing.md: add tests/recovery.rs to the engine integration-test
table; expand the failpoints.rs row to mention the four MR-847
per-writer Phase B → recovery integration tests.
- .context/mr-847-design.md: prepend a "Status: DONE" stanza listing
every commit hash + scope across phases 1-10.
AGENTS.md ↔ docs/ cross-link check passes (26 links, 26 docs).
Full workspace test sweep passes with --features failpoints (361 tests
across 20 binaries).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:46:03 +02:00
- Otherwise **roll back** : per-table `Dataset::restore` to the
fix: optimize publishes compaction; recovery roll-back converges manifest (#141)
* test(optimize): cover manifest publish + HEAD-drift reconcile
Red against the pre-fix optimize, which ran compact_files without
publishing the compacted version to __manifest:
- maintenance: optimize must publish so the manifest table_version
tracks the compacted Lance HEAD and a later schema apply succeeds;
and must reconcile a pre-existing manifest-behind-HEAD drift (forged
via raw Lance compaction) so strict writes commit again.
- end_to_end + composite_flow: post-optimize query / strict update /
reopen in the full lifecycle (the canonical flow previously omitted
post-optimize writes as a documented "known limitation").
- failpoints: a crash between compaction and the manifest publish rolls
forward on next open.
* fix(optimize): publish compaction to manifest and reconcile HEAD drift
optimize ran Lance compact_files without publishing the new version to
__manifest, so the manifest table_version lagged the Lance HEAD: reads
stayed pinned to the pre-compaction version, and the next schema apply or
strict update/delete failed its HEAD-vs-manifest precondition with
"stale view ... refresh and retry" (open-time recovery rollback inflated
the gap on retry).
optimize now publishes each compacted table's version under the
per-(table, main) write queue, guarded by a manifest CAS and a
SidecarKind::Optimize recovery sidecar (loose-match; roll-forward is safe
because compaction is content-preserving). When a table has nothing left
to compact but its Lance HEAD is already ahead of the manifest pin
(pre-fix drift, or a recovery restore commit), optimize reconciles the
manifest forward to HEAD (metadata-only, no sidecar). Caches and the
CSR/CSC graph index are invalidated after a publish.
Docs updated (maintenance, storage, branches-commits, writes, testing).
* test(recovery): rollback convergence + optimize-defer regressions
Red against the current code, landed before the fix:
- recovery: after the open-time sweep rolls a sidecar back, the manifest
must track Lance HEAD (no residual drift) so a follow-up schema apply
succeeds — the original "+1 per retry" loop. Today roll-back restores
without publishing, so the manifest lags HEAD and the apply fails its
HEAD-vs-manifest precondition.
- maintenance: optimize must refuse while a recovery sidecar is pending —
operating on an unrecovered graph could publish a partial write the
sweep would roll back.
Also removes optimize_reconciles_preexisting_manifest_head_drift: the
ad-hoc drift reconcile it covered is replaced by recovery-side convergence.
* fix(recovery): converge manifest on roll-back; optimize defers on pending recovery
Root of PR #141's review findings and the original "+1 per retry" loop:
a Lance HEAD ahead of the manifest was ambiguous (benign content-preserving
drift vs. a partial write a sidecar will roll back), and optimize's reconcile
guessed it benign. Close the class instead of guessing:
- Recovery roll-back now PUBLISHES the restored version (via a
push_table_update_at_head helper shared with roll-forward), so the manifest
tracks the Lance HEAD after recovery — symmetric with roll-forward. This
fixes the +1 loop (after one roll-back the retry's HEAD-vs-manifest
precondition passes) and removes the only remaining source of orphaned
drift. The audit still records the logical rolled-back-to version; the
manifest is published at the restore commit (identical content).
- optimize drops the ad-hoc drift reconcile and instead REFUSES when a
__recovery sidecar is pending, so it only ever operates on a recovered
graph (manifest == HEAD); its compaction publish can no longer commit a
partial write. With the reconcile gone, the blob-skip-vs-reconcile gap is
moot.
Updates the rollback recovery-test helper (manifest == HEAD after roll-back),
the failpoints assertions, and the user/dev docs.
* test(recovery): fix rollback assertion for manifest convergence
The roll-back-publishes change makes the manifest version advance after a
SchemaApply roll-back (to the old-schema content), so the
schema_apply_without_schema_staging_rolls_back_on_next_open assertion must
be `version > pre`, not `version == pre`. This update was dropped during
the commit churn and surfaced as a CI Test Workspace failure; the
old-schema-preserved intent stays covered by count_rows + _schema.pg + the
RolledBack convergence invariant.
2026-06-08 01:50:12 +02:00
manifest-pinned table version, then a single `ManifestBatchPublisher::publish`
of the restored HEAD — symmetric with roll-forward, so `manifest == HEAD`
after recovery (no residual drift). This convergence is what lets a
failed-then-retried schema apply succeed instead of failing one version higher
each iteration. The audit row's `to_version` records the logical
rolled-back-to version (`manifest_pinned` ); the manifest is published at the
restore commit (`manifest_pinned + 1` , same content).
recovery: address PR #72 review findings
Bot reviewers (cubic, cursor, chatgpt-codex) caught 4 merge-blocking
bugs + 3 strongly-recommended fixes + 3 doc errors in the initial PR.
Each fix has a paired test demonstrating the bug before the fix.
Merge-blocking fixes:
- BranchMerge moved to loose-match classifier arm. publish_rewritten_
merge_table runs multiple commit_staged calls per table (merge_insert
+ delete_where + index rebuilds). Strict classification rolled back
valid completed Phase B work as UnexpectedMultistep. Three new unit
tests pin the loose-match behavior for BranchMerge.
- branch_merge sidecar uses self.active_branch() (the resolved target
branch) instead of inferring from the first sorted table key. The
previous heuristic could record None (== main) when the merge target
was a non-main branch, causing recovery to publish to the wrong
manifest namespace.
- Best-effort sidecar delete in all 5 writer sites (mutation, loader,
schema_apply, branch_merge, ensure_indices). Previously, a sidecar
cleanup failure after a successful manifest publish would error out
the user's call for a write that already landed. Now: log a warning
and ignore — the next open's recovery sweep tidies the stale sidecar
via NoMovement classification.
- ensure_indices sidecar scoped to tables that need work via new
helpers needs_index_work_node / needs_index_work_edge. Previously
the sidecar pinned every catalog table; if only one needed indexing,
the others classified as NoMovement and the all-or-nothing decision
rolled back legitimate index work.
Strongly-recommended fixes:
- recover_manifest_drift now takes &mut GraphCoordinator and refreshes
between sidecars. Sidecar B's classification needs to see sidecar
A's manifest changes, otherwise B can be classified against stale
pins and incorrectly roll back work that just landed.
- list_sidecars sorts URIs before reading. Sidecar filenames are
ULIDs (chronologically sortable), so this gives deterministic,
time-ordered processing. Filesystem-order was nondeterministic.
- ReadOnly opens skip recover_schema_state_files too (was: only the
MR-847 sweep was gated). Read-only consumers may run with read-only
credentials; silent open-time mutations violate the contract.
Doc cleanups:
- Removed stale "Phase 4 placeholder" comment from
recover_manifest_drift.
- docs/runs.md decision-tree wording now correctly surfaces the
InvariantViolation abort path.
- docs/branches-commits.md clarifies actor_id is in
_graph_commit_actors.lance (joined by graph_commit_id), not on
_graph_commits.lance itself.
Test surface (post-fixes):
- 25 unit tests in db::manifest::recovery (+4 from this commit).
- 10 integration tests in tests/recovery.rs (+3 from this commit).
- ~672 tests across ~25 binaries pass with --features failpoints.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 12:21:40 +02:00
- After a successful roll-forward or roll-back, an audit row is
recorded — `_graph_commits.lance` carries
recovery: document MR-847 ship across all reference docs (Phase 10)
Update the doc surface to reflect MR-847 having shipped end to end —
sidecar protocol, classifier, all-or-nothing decision tree, roll-forward
via ManifestBatchPublisher, roll-back via Dataset::restore with
fragment-set short-circuit, audit trail in
_graph_commit_recoveries.lance, OpenMode::{ReadWrite, ReadOnly}, and
the four migrated writers all carrying sidecars across Phase B → Phase C.
- docs/invariants.md §VI.23: change from "upheld at the writer-trait
surface for inserts/updates/etc., per-table commit_staged → manifest
publish window remains" to "upheld at the writer-trait surface AND
across process boundaries". The MR-847 sweep closes the residual on
the next Omnigraph::open. The "continuous in-process" property
(no ExpectedVersionMismatch surfacing to subsequent writers between
Phase B failure and process restart) is honest follow-up at MR-856.
- docs/runs.md: replace "Finalize → publisher residual" section with
"Open-time recovery sweep (MR-847)" — describes the sidecar protocol
lifecycle (Phases A-D), the sweep's classifier + decision dispatch,
the audit trail, and the operator-facing query
(omnigraph commit list --filter actor=omnigraph:recovery).
- AGENTS.md capability matrix "Atomic single-dataset commits" row:
drop the "Layer (3) is not yet shipped — tracked in MR-847" caveat;
describe the three layers as all shipping; reference MR-856 for the
background-reconciler follow-up.
- docs/storage.md: add _graph_commit_recoveries.lance and
__recovery/{ulid}.json to the on-disk layout (mermaid + prose).
- docs/branches-commits.md: new "Recovery audit trail (MR-847)"
subsection describing the join from
_graph_commits.lance:actor_id="omnigraph:recovery" to
_graph_commit_recoveries.lance:graph_commit_id for operator
post-mortem.
- docs/maintenance.md: note the MR-847 recovery floor on cleanup —
--keep < 3 may garbage-collect Lance versions the recovery sweep
needs as a rollback target. Default --keep 10 is safe.
- docs/testing.md: add tests/recovery.rs to the engine integration-test
table; expand the failpoints.rs row to mention the four MR-847
per-writer Phase B → recovery integration tests.
- .context/mr-847-design.md: prepend a "Status: DONE" stanza listing
every commit hash + scope across phases 1-10.
AGENTS.md ↔ docs/ cross-link check passes (26 links, 26 docs).
Full workspace test sweep passes with --features failpoints (361 tests
across 20 binaries).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 00:46:03 +02:00
a commit tagged `actor_id = "omnigraph:recovery"` , and a sibling
`_graph_commit_recoveries.lance` row carries `recovery_kind` ,
`recovery_for_actor` (the original sidecar's actor), `operation_id` ,
per-table outcomes. Operators run `omnigraph commit list --filter
actor=omnigraph:recovery` to find recoveries.
- Sidecar deleted as the final step.
Triggers for the residual: transient Lance write errors during finalize
(object-store retry budget exhaustion, disk full); persistent publisher
contention exceeding `PUBLISHER_RETRY_BUDGET = 5` retries.
2026-05-04 00:15:42 +02:00
**Long-running servers**: `Omnigraph::refresh` runs roll-forward-only
recovery in-process — the common Phase B → Phase C residual closes
without a restart. The next mutation on the same handle (after refresh)
no longer surfaces `ExpectedVersionMismatch` for the failed table.
Sidecars that would require a `Dataset::restore` (mixed / unexpected
state) are deferred to the next `OpenMode::ReadWrite` open: restore is
unsafe under concurrency because Lance's `check_restore_txn` accepts
the restore against in-flight Append/Update/Delete commits and
silently orphans them (pinned by
`tests/staged_writes.rs::lance_restore_loses_to_concurrent_append_via_orphaning` ).
Continuous in-process recovery for the rollback path is the goal of a
future background reconciler with per-(table, branch) writer-queue
acquisition.
2026-05-01 13:47:55 +02:00
The publisher-CAS contract is unchanged: a *concurrent writer* that
advances any of our touched tables between snapshot capture and
publisher commit produces exactly one winner. The residual above is
about *our* abandoned commits in the failure path, not about
concurrency races.
2026-04-30 08:52:50 +02:00
## Conflict shape
Refactor AGENTS.md from encyclopedia to map; move spec into docs/
Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).
Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 23:31:08 +02:00
2026-04-30 08:52:50 +02:00
Concurrent writers to the same `(table, branch)` produce exactly one
success and one failure. The losing writer's error is
`OmniError::Manifest` with kind `Conflict` and details
`ManifestConflictDetails::ExpectedVersionMismatch { table_key, expected,
actual }`. The HTTP server maps this to **409 Conflict** with body
`{"error": "...", "code": "conflict", "manifest_conflict": { "table_key":
2026-05-15 03:45:22 +03:00
"...", "expected": N, "actual": M }}` — see [docs/user/server.md ](../user/server.md ).
2026-04-30 08:52:50 +02:00
## Audit
`actor_id` lands in `_graph_commits.lance` via `record_graph_commit` (no
intermediate run record). Audit history is queried via `omnigraph commit
list`.
## Migration code
feat(engine): sweep & remove legacy __run__ branch guard (MR-770) (#132)
* feat(engine): sweep legacy __run__ branches via v2→v3 manifest migration
Pre-v0.4.0 graphs can carry stale `__run__<id>` staging branches on the
`__manifest` dataset, left by the Run state machine removed in MR-771. Lance's
`list_branches` still enumerates them, so they leak into `branch_list()` and
count as blocking branches at schema-apply time.
Add a one-time `migrate_v2_to_v3` arm to the internal-schema dispatcher: on the
first read-write open it enumerates `__manifest` branches, deletes every
`__run__*` ref, and bumps the stamp to 3. Idempotent under retry (re-enumerates
fresh each run). The `"__run__"` prefix is inlined so the migration does not
depend on the run_registry guard that MR-770 removes next.
This is the prerequisite sweep; the guard removal follows in the next commit.
* refactor(engine): remove the legacy __run__ branch guard (MR-770)
With the v2→v3 migration sweeping stale `__run__*` branches off `__manifest`
on first read-write open, the defense-in-depth `is_internal_run_branch` guard
is no longer needed.
- delete `db/run_registry.rs`; drop the module + re-export from `db/mod.rs`
- collapse `is_internal_system_branch` to the schema-apply-lock check only
- `ensure_public_branch_ref`: drop the run-ref rejection; `__run__*` is now an
ordinary branch name
- `branch_merge`: reject `is_internal_system_branch` (was run-only) so the
schema-apply lock is rejected consistently with create/delete — a small,
deliberate tightening
- update the inline schema-apply test + the writes integration tests
(`public_branch_apis_reject_internal_run_refs` →
`public_branch_apis_reject_internal_system_refs`, which also asserts
`__run__*` now creates successfully)
- docs: flip the "pending production sweep / defense-in-depth" notes to
"auto-swept by the v2→v3 migration"; document the read-only-open limitation
Known residual: the inert `_graph_runs.lance` / `_graph_run_actors.lance` bytes
remain until a `StorageAdapter::delete_prefix` primitive lands.
* fix(engine): run __run__ sweep at Omnigraph::open, not only on publish
Review (PR #132) caught a regression: removing __run__ from
`is_internal_system_branch` exposed legacy `__run__*` branches to the
schema-apply blocking-branch checks (schema_apply.rs:104 and :778) and to
`branch_list()`, but the v2→v3 sweep ran only inside the publisher's
`load_publish_state`. On a pre-v0.4.0 graph whose first write is a schema
apply, the blocking-branch check fires before any publish, so apply failed
with "found non-main branches: __run__…". The same lazy timing also created a
reverse hazard: a user-created `__run__*` branch on a still-v2 graph could be
deleted by the first publish's sweep.
Fix: run the internal-schema migration in `Omnigraph::open(ReadWrite)` (new
`manifest::migrate_on_open`), before the coordinator reads branch state. The
sweep now lands before any branch-observing code, and a graph is stamped v3 at
open — so the one-time sweep can never catch a legitimately-created branch.
Both checks and `branch_list` see the swept graph; correct by construction for
every write path.
Accepted residual: a read-only open of an unmigrated legacy graph still lists
`__run__*` (read-only opens must not write, so they can't sweep). Documented.
Regression test `legacy_run_branch_is_swept_on_open_and_does_not_block_schema_apply`
confirmed RED before the fix (panicked on the branch_list leak assertion) and
GREEN after. Also updates the stale schema_apply.rs comment, the writes.md
"Migration code" section, and adds the v3 row to storage.md's migration table.
* test(engine): sweep multiple legacy __run__ branches; doc nit
Strengthen the v2→v3 migration test to synthesize three `__run__*` branches
(a real legacy graph accumulates one per run) so the migration's delete loop
is exercised on a single reused dataset handle, not just a single branch.
Confirms multi-branch deletion is safe.
Also drop a stale "active runs" reference from the branch_delete doc line.
* fix(engine): force-delete in __run__ sweep for concurrency safety
`migrate_v2_to_v3` ran `Dataset::delete_branch` (= `branches().delete(.., false)`),
which errors "BranchContents not found" if the branch is already gone. Since the
sweep now runs in `Omnigraph::open(ReadWrite)`, two processes opening the same
legacy v2 graph concurrently would race: one wins each delete, the other's open
fails. The migration only claimed idempotency under *sequential* retry.
Switch to `Dataset::force_delete_branch` (= `delete(.., true)`), Lance's
documented path for cleaning up zombie branches, which tolerates an
already-absent branch. The sweep is now idempotent under concurrent runners and
robust to partial/zombie state. Found in self-review; no behavior change for the
common single-open path.
* docs(release): note MR-770 __run__ cleanup in v0.6.1
* docs(branches): reconcile branch cleanup semantics
2026-06-07 17:33:14 +02:00
`db/manifest/migrations.rs` carries the v2→v3 internal-schema step (MR-770):
a one-time sweep that deletes legacy `__run__*` staging branches off
`__manifest` . It runs in `Omnigraph::open(ReadWrite)` (via
`manifest::migrate_on_open` , before the coordinator reads branch state) and
again on the publisher's write path; both are idempotent once the stamp is at
v3. Deleting the inert `_graph_runs.lance` / `_graph_run_actors.lance` dataset
*bytes* is still deferred — it needs a `StorageAdapter::delete_prefix`
primitive — but those bytes are invisible to graph-level state.
2026-04-30 08:52:50 +02:00
2026-05-01 10:43:19 +02:00
## Mid-query partial failure: closed by MR-794
The pre-MR-794 design had a known limitation: a multi-statement `.gq`
mutation where op-N inline-committed a Lance fragment and op-N+1 then
failed left the touched table at `Lance HEAD = manifest_version + 1` ,
blocking the next mutation with `ExpectedVersionMismatch` .
MR-794 (step 1 + step 2+) closed this for inserts/updates **by
construction at the writer layer**: insert and update batches accumulate
in memory; no Lance HEAD advance happens during op execution; one
`stage_*` + `commit_staged` per touched table runs at end-of-query, and
only after every op succeeded. A failed op leaves Lance HEAD untouched
on the staged tables, so the next mutation proceeds normally with no
drift to reconcile.
The cancellation case (future drop mid-mutation) inherits the same
guarantee — the in-memory accumulator evaporates with the dropped task
and no Lance write was ever issued.
For delete-touching mutations the legacy inline-commit shape is
preserved (Lance has no public two-phase delete in 4.0.0) — the same
narrow window remains. The parse-time D₂ rule prevents inserts/updates
from coexisting with deletes in one query, so a pure-delete failure
cannot drift any staged-table state. If a delete-only multi-table
mutation fails mid-cascade, the same workaround as before applies
(retry; rely on `omnigraph cleanup` once a later successful commit
moves HEAD past the orphan version). Closing this requires Lance to
expose `DeleteJob::execute_uncommitted` ; tracked in MR-793 and a
Lance-upstream ticket.