Update the doc surface to reflect MR-847 having shipped end to end —
sidecar protocol, classifier, all-or-nothing decision tree, roll-forward
via ManifestBatchPublisher, roll-back via Dataset::restore with
fragment-set short-circuit, audit trail in
_graph_commit_recoveries.lance, OpenMode::{ReadWrite, ReadOnly}, and
the four migrated writers all carrying sidecars across Phase B → Phase C.
- docs/invariants.md §VI.23: change from "upheld at the writer-trait
surface for inserts/updates/etc., per-table commit_staged → manifest
publish window remains" to "upheld at the writer-trait surface AND
across process boundaries". The MR-847 sweep closes the residual on
the next Omnigraph::open. The "continuous in-process" property
(no ExpectedVersionMismatch surfacing to subsequent writers between
Phase B failure and process restart) is honest follow-up at MR-856.
- docs/runs.md: replace "Finalize → publisher residual" section with
"Open-time recovery sweep (MR-847)" — describes the sidecar protocol
lifecycle (Phases A-D), the sweep's classifier + decision dispatch,
the audit trail, and the operator-facing query
(omnigraph commit list --filter actor=omnigraph:recovery).
- AGENTS.md capability matrix "Atomic single-dataset commits" row:
drop the "Layer (3) is not yet shipped — tracked in MR-847" caveat;
describe the three layers as all shipping; reference MR-856 for the
background-reconciler follow-up.
- docs/storage.md: add _graph_commit_recoveries.lance and
__recovery/{ulid}.json to the on-disk layout (mermaid + prose).
- docs/branches-commits.md: new "Recovery audit trail (MR-847)"
subsection describing the join from
_graph_commits.lance:actor_id="omnigraph:recovery" to
_graph_commit_recoveries.lance:graph_commit_id for operator
post-mortem.
- docs/maintenance.md: note the MR-847 recovery floor on cleanup —
--keep < 3 may garbage-collect Lance versions the recovery sweep
needs as a rollback target. Default --keep 10 is safe.
- docs/testing.md: add tests/recovery.rs to the engine integration-test
table; expand the failpoints.rs row to mention the four MR-847
per-writer Phase B → recovery integration tests.
- .context/mr-847-design.md: prepend a "Status: DONE" stanza listing
every commit hash + scope across phases 1-10.
AGENTS.md ↔ docs/ cross-link check passes (26 links, 26 docs).
Full workspace test sweep passes with --features failpoints (361 tests
across 20 binaries).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4 KiB
Branches, Commits, Snapshots
L1 — Lance per-dataset branches
Lance supports branching at the dataset level: a branch is a named lineage of versions, and fork_branch_from_state(source_branch, target_branch, source_version) creates a copy-on-write fork.
L2 — Graph-level branches
OmniGraph builds graph branches on top by branching every sub-table coherently:
branch_create(name)/branch_create_from(target, name)— disallowed namemain; fails if branch exists; ensures the schema-apply lock is idle.branch_list()— returns public branches, filters internal__run__…and__schema_apply_lock__prefixes.branch_delete(name)— refuses if there are descendants or active runs on the branch; cleans up owned per-branch fragments.- Lazy forking: a branch only forks a sub-table when that sub-table is first mutated on it. Pure-read branches share fragments with their source.
sync_branch(branch)— re-binds the in-memory handle to the latest head of the branch.
L2 — Commit graph (db/commit_graph.rs)
In-memory shape of a graph commit:
GraphCommit {
graph_commit_id: ULID,
manifest_branch: Option<String>,
manifest_version: u64,
parent_commit_id: Option<String>,
merged_parent_commit_id: Option<String>, // populated for merge commits
actor_id: Option<String>, // joined in-memory from _graph_commit_actors.lance, NOT a column on _graph_commits.lance
created_at: i64 (microseconds since epoch),
}
Storage is split across two Lance datasets (both with stable row IDs):
_graph_commits.lance— every column above exceptactor_id._graph_commit_actors.lance— optional separate(graph_commit_id, actor_id)map, created on demand. Theactor_idfield above is populated by joining this dataset in-memory at load time.
Notes:
- Every successful publish (load / change / merge / schema_apply) appends one commit.
- Merge commits have two parents; linear commits have one.
- API:
list_commits(branch),get_commit(id),head_commit_id_for_branch(branch).
L2 — Snapshots & time travel
snapshot()— current snapshot for the bound branch; cached.snapshot_of(target)— snapshot at aReadTarget(branch | snapshot id).snapshot_at_version(v: u64)— historical snapshot from any manifest version.entity_at(table_key, id, version)— single-entity time travel without building a full snapshot.- A
Snapshotis a(version, HashMap<table_key, SubTableEntry>)— cheap to build, snapshot-isolated cross-table reads.
L2 — Internal system branches
Filtered from branch_list() but visible to internals:
__schema_apply_lock__— serializes schema migrations.__run__<run-id>— legacy from the pre-v0.4.0 Run state machine (removed in MR-771). The branch-name guard predicateis_internal_run_branchis kept as defense-in-depth so users cannot create a branch matching the legacy prefix; the filter will be removed once production legacy branches are swept (MR-770).
L2 — Recovery audit trail (MR-847)
The four migrated writers (MutationStaging::finalize, schema_apply, branch_merge, ensure_indices) protect their multi-table commits with a sidecar at __recovery/{ulid}.json written before Phase B and deleted after Phase C. The next Omnigraph::open (gated on OpenMode::ReadWrite) runs the recovery sweep in crates/omnigraph/src/db/manifest/recovery.rs: classify per-table state, decide all-or-nothing per sidecar, roll forward / back, record an audit row.
Audit rows live in _graph_commit_recoveries.lance (sibling to _graph_commits.lance) and reference the commit graph by graph_commit_id. The linked _graph_commits.lance row carries actor_id="omnigraph:recovery" (the system actor). To find recoveries for a specific original actor: omnigraph commit list --filter actor=omnigraph:recovery, then join to _graph_commit_recoveries.lance by graph_commit_id to read recovery_for_actor. Schema: see crates/omnigraph/src/db/recovery_audit.rs.