mirror of https://github.com/ModernRelay/omnigraph.git synced 2026-06-09 01:35:18 +02:00

Ragnor Comerford e62d9166fb

fix: optimize publishes compaction; recovery roll-back converges manifest (#141 )

* test(optimize): cover manifest publish + HEAD-drift reconcile

Red against the pre-fix optimize, which ran compact_files without
publishing the compacted version to __manifest:

- maintenance: optimize must publish so the manifest table_version
  tracks the compacted Lance HEAD and a later schema apply succeeds;
  and must reconcile a pre-existing manifest-behind-HEAD drift (forged
  via raw Lance compaction) so strict writes commit again.
- end_to_end + composite_flow: post-optimize query / strict update /
  reopen in the full lifecycle (the canonical flow previously omitted
  post-optimize writes as a documented "known limitation").
- failpoints: a crash between compaction and the manifest publish rolls
  forward on next open.

* fix(optimize): publish compaction to manifest and reconcile HEAD drift

optimize ran Lance compact_files without publishing the new version to
__manifest, so the manifest table_version lagged the Lance HEAD: reads
stayed pinned to the pre-compaction version, and the next schema apply or
strict update/delete failed its HEAD-vs-manifest precondition with
"stale view ... refresh and retry" (open-time recovery rollback inflated
the gap on retry).

optimize now publishes each compacted table's version under the
per-(table, main) write queue, guarded by a manifest CAS and a
SidecarKind::Optimize recovery sidecar (loose-match; roll-forward is safe
because compaction is content-preserving). When a table has nothing left
to compact but its Lance HEAD is already ahead of the manifest pin
(pre-fix drift, or a recovery restore commit), optimize reconciles the
manifest forward to HEAD (metadata-only, no sidecar). Caches and the
CSR/CSC graph index are invalidated after a publish.

Docs updated (maintenance, storage, branches-commits, writes, testing).

* test(recovery): rollback convergence + optimize-defer regressions

Red against the current code, landed before the fix:
- recovery: after the open-time sweep rolls a sidecar back, the manifest
  must track Lance HEAD (no residual drift) so a follow-up schema apply
  succeeds — the original "+1 per retry" loop. Today roll-back restores
  without publishing, so the manifest lags HEAD and the apply fails its
  HEAD-vs-manifest precondition.
- maintenance: optimize must refuse while a recovery sidecar is pending —
  operating on an unrecovered graph could publish a partial write the
  sweep would roll back.

Also removes optimize_reconciles_preexisting_manifest_head_drift: the
ad-hoc drift reconcile it covered is replaced by recovery-side convergence.

* fix(recovery): converge manifest on roll-back; optimize defers on pending recovery

Root of PR #141's review findings and the original "+1 per retry" loop:
a Lance HEAD ahead of the manifest was ambiguous (benign content-preserving
drift vs. a partial write a sidecar will roll back), and optimize's reconcile
guessed it benign. Close the class instead of guessing:

- Recovery roll-back now PUBLISHES the restored version (via a
  push_table_update_at_head helper shared with roll-forward), so the manifest
  tracks the Lance HEAD after recovery — symmetric with roll-forward. This
  fixes the +1 loop (after one roll-back the retry's HEAD-vs-manifest
  precondition passes) and removes the only remaining source of orphaned
  drift. The audit still records the logical rolled-back-to version; the
  manifest is published at the restore commit (identical content).
- optimize drops the ad-hoc drift reconcile and instead REFUSES when a
  __recovery sidecar is pending, so it only ever operates on a recovered
  graph (manifest == HEAD); its compaction publish can no longer commit a
  partial write. With the reconcile gone, the blob-skip-vs-reconcile gap is
  moot.

Updates the rollback recovery-test helper (manifest == HEAD after roll-back),
the failpoints assertions, and the user/dev docs.

* test(recovery): fix rollback assertion for manifest convergence

The roll-back-publishes change makes the manifest version advance after a
SchemaApply roll-back (to the old-schema content), so the
schema_apply_without_schema_staging_rolls_back_on_next_open assertion must
be `version > pre`, not `version == pre`. This update was dropped during
the commit churn and surfaced as a CI Test Workspace failure; the
old-schema-preserved intent stays covered by count_rows + _schema.pg + the
RolledBack convergence invariant.

2026-06-08 02:50:12 +03:00

5.7 KiB

Raw Blame History

Branches, Commits, Snapshots

L1 — Lance per-dataset branches

Lance supports branching at the dataset level: a branch is a named lineage of versions, and fork_branch_from_state(source_branch, target_branch, source_version) creates a copy-on-write fork.

L2 — Graph-level branches

OmniGraph builds graph branches on top by branching every sub-table coherently:

branch_create(name) / branch_create_from(target, name) — disallowed name main; fails if branch exists; ensures the schema-apply lock is idle. Atomic and authority-first like branch_delete: it flips the __manifest branch (authority), then creates the derived commit-graph branch, force-dropping any orphaned commit-graph ref left by an incomplete prior delete (the manifest branch is fresh, so a same-named commit-graph branch is provably a zombie). If commit-graph creation fails, the manifest branch is rolled back so the name never half-exists.
branch_list() — returns public branches, filters the internal __schema_apply_lock__ branch.
branch_delete(name) — refuses if there are descendants on the branch, or if it is the current branch. The manifest is the single authority for branch existence: deletion flips the __manifest branch ref first (one atomic op), after which the branch is gone from every snapshot. The owned per-table forks and the commit-graph branch are derived state, reclaimed best-effort with force_delete_branch after the flip. A failure during that reclaim (transient object-store error) does not fail the call or block the authority flip; the leftover forks are unreachable orphans that the cleanup reconciler converges. One consequence: if a delete's best-effort reclaim fails, reusing that branch name before the next cleanup surfaces a clear error pointing at cleanup (the stale fork would otherwise collide on first write).
Lazy forking: a branch only forks a sub-table when that sub-table is first mutated on it. Pure-read branches share fragments with their source. A fork collision is classified by the manifest authority, not by Lance branch versions: if the live manifest already records the fork on the active branch, a concurrent first-write won and the caller gets a retryable "refresh and retry"; if the manifest does not, a physical branch there is an orphan and the caller is pointed at cleanup.
sync_branch(branch) — re-binds the in-memory handle to the latest head of the branch.

L2 — Commit graph (`db/commit_graph.rs`)

In-memory shape of a graph commit:

GraphCommit {
  graph_commit_id: ULID,
  manifest_branch: Option<String>,
  manifest_version: u64,
  parent_commit_id: Option<String>,
  merged_parent_commit_id: Option<String>,   // populated for merge commits
  actor_id: Option<String>,                  // joined in-memory from _graph_commit_actors.lance, NOT a column on _graph_commits.lance
  created_at: i64 (microseconds since epoch),
}

Storage is split across two Lance datasets (both with stable row IDs):

_graph_commits.lance — every column above except actor_id.
_graph_commit_actors.lance — optional separate (graph_commit_id, actor_id) map, created on demand. The actor_id field above is populated by joining this dataset in-memory at load time.

Notes:

Every successful publish (load / change / merge / schema_apply) appends one commit.
Merge commits have two parents; linear commits have one.
API: list_commits(branch), get_commit(id), head_commit_id_for_branch(branch).

L2 — Snapshots & time travel

snapshot() — current snapshot for the bound branch; cached.
snapshot_of(target) — snapshot at a ReadTarget (branch | snapshot id).
snapshot_at_version(v: u64) — historical snapshot from any manifest version.
entity_at(table_key, id, version) — single-entity time travel without building a full snapshot.
A Snapshot is a (version, HashMap<table_key, SubTableEntry>) — cheap to build, snapshot-isolated cross-table reads.

L2 — Internal system branches

Internal or legacy branch refs:

__schema_apply_lock__ — serializes schema migrations; filtered from branch_list() but visible to internals.
__run__<run-id> — legacy from the pre-v0.4.0 Run state machine (removed in MR-771). These are swept off __manifest on the first read-write open by the v2→v3 internal-schema migration (MR-770), and __run__* is no longer a reserved name. Known limitation: a pre-v0.4.0 graph opened read-only still surfaces any stale __run__* branch in branch_list() until its first read-write open (the migration is write-path-only, like all manifest migrations).

L2 — Recovery audit trail

The five migrated writers (MutationStaging::finalize, schema_apply, branch_merge, ensure_indices, optimize_all_tables) protect their multi-table commits with a sidecar at __recovery/{ulid}.json written before Phase B and deleted after Phase C. The next Omnigraph::open (gated on OpenMode::ReadWrite) runs the recovery sweep in crates/omnigraph/src/db/manifest/recovery.rs: classify per-table state, decide all-or-nothing per sidecar, roll forward / back, record an audit row.

Audit rows live in _graph_commit_recoveries.lance (sibling to _graph_commits.lance) and reference the commit graph by graph_commit_id. The linked recovery commit is identified by that same graph_commit_id, and actor_id="omnigraph:recovery" is stored in _graph_commit_actors.lance (joined by graph_commit_id) — _graph_commits.lance itself does not carry the actor_id column. To find recoveries for a specific original actor: omnigraph commit list --filter actor=omnigraph:recovery, then join to _graph_commit_recoveries.lance by graph_commit_id to read recovery_for_actor. Schema: see crates/omnigraph/src/db/recovery_audit.rs.

5.7 KiB Raw Blame History