mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-27 02:39:38 +02:00
feat(engine): graph lineage in __manifest — single-source fold, v3→v4 migration, schema-version floor (#299)
Some checks are pending
CI / Classify Changes (push) Waiting to run
CI / Check AGENTS.md Links (push) Waiting to run
CI / Container Entrypoint (push) Waiting to run
CI / Test Workspace (push) Blocked by required conditions
CI / Test omnigraph-server --features aws (push) Blocked by required conditions
CI / RustFS S3 Integration (push) Blocked by required conditions
Release Edge / Prepare edge release (push) Waiting to run
Release Edge / Build edge omnigraph-linux-x86_64 (push) Blocked by required conditions
Release Edge / Build edge omnigraph-macos-arm64 (push) Blocked by required conditions
Release Edge / Build edge omnigraph-windows-x86_64 (push) Blocked by required conditions
Release Edge / Smoke Windows installer (push) Blocked by required conditions
* docs(rfc-013): bank the #295 spec-review comments as step-5 constraints (§5.1)

3b shipped a minimal WriteTxn{branch,base} and deferred the full §4.1 opener unification (pinned-base opener, shared Session, write-local handle cache, strict-op conflict-timing move) to step 5. The greptile comments on the #295 spec were moot for #298 (none of those constructs were built) but are load-bearing for step 5:

(1) the handle cache must be Send+Sync (Mutex, not RefCell);
(2) the strict-op timing move needs an explicit retry contract (txn discarded after any commit, retry re-opens a fresh base), which is the SAME contract as the stale-view false-fail (§1d.2);
(3) the opener-equivalence test must advance HEAD externally then assert pinned-base, not the trivial HEAD==base.

* feat(engine): fold graph lineage into the __manifest publish CAS (RFC-013 Phase 7)

Graph lineage no longer lives in a second write to _graph_commits.lance. Each commit's graph_commit + graph_head:<branch> rows now ride the SAME __manifest merge-insert as the table-version rows (one atomic version), and CommitGraph reads its cache from the manifest projection (read_graph_lineage). _graph_commits.lance no longer receives commit rows (it remains only as a Lance branch-ref carrier).

Mechanism: a LineageIntent { graph_commit_id (ULID, minted once), branch, actor, merged_parent, created_at } threads through ManifestBatchPublisher::publish. Inside the publisher retry loop the parent is resolved per attempt from the just-loaded branch-scoped manifest (the should_replace_head winner over the visible graph_commit rows, branch-correct by Lance branch isolation; the graph_head row is written for forward-compat + the §7.1 contention point but is not the parent source, so a freshly-forked branch resolves the right fork-point parent). A CAS-conflict retry re-reads the advanced head → correct new parent; the commit_id is stable across retries.
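
The per-attempt parent resolution described in the Mechanism paragraph can be sketched roughly as follows. This is a toy model, not the engine's code: `Manifest`, `cas_insert`, the `interfere` hook and the "greatest id wins" head rule are all assumptions made for illustration (the real head choice is `should_replace_head`, and the real CAS is a Lance merge-insert). Only the shape is taken from the message: the commit id is minted once, while the parent is re-read on every attempt.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the intent: the commit id is minted once, up front.
#[derive(Clone, Debug)]
pub struct LineageIntent {
    pub graph_commit_id: String,
    pub branch: String,
}

#[derive(Clone, Debug, PartialEq)]
pub struct GraphCommit {
    pub id: String,
    pub parent: Option<String>,
}

/// Toy manifest: a version counter plus the visible graph_commit rows.
#[derive(Default)]
pub struct Manifest {
    pub version: u64,
    pub commits: BTreeMap<String, GraphCommit>,
}

impl Manifest {
    /// Assumed head rule for this sketch: the greatest id is the head.
    pub fn head(&self) -> Option<String> {
        self.commits.keys().next_back().cloned()
    }

    /// Compare-and-swap: the row lands only if the caller saw the current version.
    pub fn cas_insert(&mut self, expected: u64, row: GraphCommit) -> Result<(), ()> {
        if self.version != expected {
            return Err(());
        }
        self.commits.insert(row.id.clone(), row);
        self.version += 1;
        Ok(())
    }
}

/// Publish with retry. `interfere` is a test hook that stands in for a
/// concurrent writer landing between this attempt's read and its CAS.
pub fn publish(
    manifest: &mut Manifest,
    intent: &LineageIntent,
    max_attempts: usize,
    mut interfere: impl FnMut(&mut Manifest, usize),
) -> Result<GraphCommit, String> {
    for attempt in 0..max_attempts {
        // Version and parent come from the same per-attempt read.
        let seen_version = manifest.version;
        let parent = manifest.head();
        interfere(manifest, attempt);
        // The id is reused unchanged on every attempt; only the parent moves.
        let row = GraphCommit { id: intent.graph_commit_id.clone(), parent };
        if manifest.cas_insert(seen_version, row.clone()).is_ok() {
            return Ok(row);
        }
    }
    Err(format!("gave up after {max_attempts} attempts"))
}
```

Because the parent is derived inside the loop, a writer that loses a round re-parents onto the winner's commit instead of forking the chain.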

Closes two known gaps BY CONSTRUCTION (one write, no second step to fail/race):

- manifest→commit-graph atomicity (no crash window between manifest + lineage),
- commit-graph parent under concurrency (no refresh→append TOCTOU; the per-write commit_graph.refresh() is gone).

Recovery, branch-merge, and genesis route their lineage through the same CAS (merge: one commit_merge_with_actor; recovery: publish_recovery_commit folds the recovery commit, actor=omnigraph:recovery; genesis rides the init __manifest write). The dead _graph_commits write helpers (append_commit/_merge/_actor) are #[allow(dead_code)] (the actor sidecar table is still enumerated by optimize).

Verified (sequential): build clean; the new lineage_projection gate (manifest-only: _graph_commits/_actors have 0 rows; full lineage reconstructs via the projection); branching/merge_truth_table (exhaustive, branch-aware)/composite_flow/point_in_time/changes/consistency/recovery; failpoints (59, incl. recovery lifecycle + the now-closed atomicity gap); full --workspace. Cost tests REVERT to their pre-fold values (writes +1, write_cost ceiling 80), which is the proof of true single-CAS (no extra write). invariants.md marks both gaps CLOSED.

PENDING (next stages, this PR): the §7.1 concurrent graph_head one-winner gate (stage 5: two concurrent same-branch commits, exactly one wins); the stamp bump v4 + migrate_v3_to_v4 backfill + read-only refuse for EXISTING graphs (stage 4); full doc-sync of storage.md/architecture.md/writes.md.

* feat(engine): migrate existing v3 graphs to manifest lineage (RFC-013 Phase 7 stage 4)

The Phase-7 fold made CommitGraph read lineage from the __manifest projection, so a pre-Phase-7 (internal-schema v3) graph, with lineage in _graph_commits.lance and none in __manifest, would read an empty commit DAG. Stage 4 makes existing graphs upgrade seamlessly and not break reads.

- Stamp 3 -> 4 + migrate_v3_to_v4: bumps INTERNAL_MANIFEST_SCHEMA_VERSION and adds the 3 => migrate_v3_to_v4 arm.
  The migration reads this branch's _graph_commits/_actors, emits one graph_commit row per commit + exactly one graph_head:<branch> for the head (should_replace_head winner, deterministic id-sort, so no hash-map-order in migration output), merge-inserts into __manifest, then set_stamp(4) LAST. Idempotency guard first (read_graph_lineage non-empty -> just stamp); crash before set_stamp re-enters at v3 and the guard completes it. Does NOT touch the unenforced-PK metadata. Runs per branch: migrate_on_open backfills main; load_publish_state backfills each branch on its first write (root_uri/branch threaded through migrate_internal_schema).
- v3-read fallback: CommitGraph version-gates the lineage source: stamp < 4 reads the (re-activated) _graph_commits.lance; >= 4 uses the manifest projection. So a READ-ONLY open of an un-migrated graph reads correct history with no write. Correctness catch: the legacy _graph_commit_actors.lance was never branched, so the fallback reads it FLAT (no branch checkout) while checking out the branch only on the commits dataset.
- Read-only stamp-refuse: a ReadOnly open of a FUTURE-stamped graph now refuses with the same upgrade error (future-proofing the next format bump; the write path already refused via migrate_internal_schema).
- Docs: storage/architecture/writes/invariants/constants updated to manifest-stored lineage; release note docs/releases/v0.8.0.md (format v4, old writers clean-break, data preserved, upgrade writers first).

6 new tests (v3 backfill, idempotent, v3 read-only fallback, future-stamp refuse in both modes, crash-before-stamp completes, legacy branch+flat-actor read). Full engine suite + failpoints (59) + cargo test --workspace --locked green; check-agents-md passes.
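
The ordering that makes the stage-4 migration crash-safe (guard first, backfill, stamp last) can be illustrated with a small in-memory model. Everything here is a hypothetical stand-in: `Graph`, its fields and the `Crash` switch are not the engine's types, and the real backfill is a merge-insert into `__manifest` rather than a `Vec` assignment.

```rust
/// Toy graph state. Field names are assumptions for this sketch.
#[derive(Default)]
pub struct Graph {
    /// Internal-schema stamp (3 = legacy lineage, 4 = manifest lineage).
    pub stamp: u32,
    /// Stand-in for the rows of the legacy `_graph_commits.lance`.
    pub legacy_commits: Vec<String>,
    /// Stand-in for the `graph_commit` rows in `__manifest`.
    pub manifest_lineage: Vec<String>,
}

/// Simulated crash point, used only to exercise the re-entry path.
pub enum Crash {
    None,
    BeforeStamp,
}

pub fn migrate_v3_to_v4(g: &mut Graph, crash: Crash) -> Result<(), &'static str> {
    if g.stamp >= 4 {
        return Ok(()); // already migrated
    }
    // Idempotency guard first: if lineage is already in the manifest, a
    // previous run crashed before stamping, so skip straight to the stamp.
    if g.manifest_lineage.is_empty() {
        let mut rows = g.legacy_commits.clone();
        rows.sort(); // deterministic id order, independent of input order
        g.manifest_lineage = rows;
    }
    if matches!(crash, Crash::BeforeStamp) {
        return Err("crashed before set_stamp");
    }
    // The stamp is written LAST, so a crash above re-enters at v3.
    g.stamp = 4;
    Ok(())
}
```

A run that dies before the stamp leaves the graph at v3 with lineage already backfilled; the next run sees non-empty lineage, skips the backfill, and only stamps, so rows are never duplicated.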

* test(engine): graph_head concurrency gate - disjoint same-branch writers form a linear commit DAG (RFC-013 Phase 7)

Two (or N) writers committing disjoint tables on one branch still share the mutable `graph_head:<branch>` manifest row, so the only row-level CAS contention is that row. The contract (exactly one writer wins each CAS round; the loser retries inside the publisher, re-resolves its parent off the freshly-advanced head, and re-commits, so every writer lands and the graph_commit DAG stays a single LINEAR chain with no fork) had no acceptance test. This adds it.

- concurrent_disjoint_writes_share_head_and_form_linear_chain: two disjoint writers + distinct LineageIntent, tokio::join!; both commit; the on-disk DAG is genesis -> c -> c' (asserted linear: exactly one genesis, no two commits share a parent, the head is the unique non-parent).
- n_concurrent_disjoint_writers_converge_to_one_linear_chain: N=8 disjoint writers each with an app-level retry loop (the publisher's internal budget can be exhausted under contention); all converge to one linear chain of 8.
- concurrent_disjoint_writes_form_linear_chain_on_s3: the same race on a real object store (true conditional-put CAS), bucket-gated.

Cites both tests from the §7.1 contention note in invariants.md. Test-only; no production change.

* perf(engine): fold the lineage parent scan into the publish path's single __manifest scan (RFC-013 P2)

Each lineage publish scanned `__manifest` twice: `load_publish_state` read table state via one scan, then `resolve_lineage_rows` did a second full `read_graph_lineage` scan only to find the parent commit. Fold the `graph_commit` extraction into the existing scan.

- `read_manifest_scan` gains a `collect_lineage` flag.
  The publish path (`read_publish_scan`) collects the `graph_commit` rows in the same pass; the table-state hot path leaves them in the forward-compat skip arm, so it never pays the O(commits) lineage JSON decode (it also skips reading the `object_id` column entirely). One shared `decode_graph_commit_row` serves both the folded path and the standalone `read_graph_lineage`, so the two cannot drift.
- `resolve_lineage_rows` is now sync and takes the already-parsed rows; the per-attempt re-read is preserved because `load_publish_state` runs once per CAS attempt, so a retry still re-parents off the advanced head.
- `load_publish_state` returns a named `LoadedPublishState` instead of a four-tuple; the thin `read_registered_table_locations` / `read_tombstone_versions` accessors fold away. `read_manifest_entries` becomes `#[cfg(test)]`: the fold removes its last production caller, leaving only the test-only namespace module (`db/manifest.rs`: `#[cfg(test)] mod namespace`), so gating it keeps it from becoming dead code in non-test builds.

Measured at depth ~5: per-write `__manifest` reads drop 44 -> 26 (total reads 54 -> 36). write_cost.rs gains a `manifest_reads <= 34` sub-ceiling that trips if a publish-path scan is re-added, and its calibration comment is corrected.

* test(engine): red - transient legacy-open failure silently completes the v3→v4 migration

A pre-Phase-7 (internal schema v3) graph keeps its graph lineage in `_graph_commits.lance`; the v3→v4 internal-schema migration backfills it into `__manifest` and stamps v4. `read_legacy_commit_cache` currently maps EVERY `Dataset::open` error to "no legacy data" (`Err(_) => empty`), so a transient or corrupt open during the one-time migration backfills nothing and still stamps v4, orphaning the real lineage permanently (the migration runs once; the v3 fallback is then disabled).

Add a `migration.v3_to_v4.legacy_open` failpoint that injects a non-not-found Lance error at the legacy open, and a fault-injection regression test in the `failpoints` binary. Against the current swallow the migration completes anyway, so the test fails on its "migration must abort" assertion, which is the predicted symptom. The fix follows in the next commit.

Test support reachable from the `failpoints` integration binary (it compiles the crate without `cfg(test)`): the v3-fixture helpers and a stamp/row-count reader are gated `cfg(any(test, feature = "failpoints"))`, still excluded from release builds. Failpoint tests stay in the integration binary because the fail registry is process-global.

* fix(engine): propagate non-not-found legacy-open errors in the v3→v4 migration

`read_legacy_commit_cache` mapped EVERY `Dataset::open` error to an empty cache (`Err(_) => empty`) on both the legacy commits dataset and its actor sidecar. The v3→v4 internal-schema migration reads this once before stamping internal-schema v4; a transient or corrupt open therefore backfilled nothing and stamped v4 anyway, orphaning the graph's real lineage permanently (the migration runs once, and the stamp-gated v3 fallback is disabled at v4). This is the "no silent failures" deny-list violation, and realistic on object storage.

Both opens now match the not-found variants (Lance maps an object-store NotFound to `DatasetNotFound`) as the benign "no legacy data" / "no authors" signal, and propagate anything else as a loud error. The two arms share the variant contract but carry different rationale (commits-absent is the legitimate empty signal; actor-sidecar-absent is benign, but a corrupt actor open silently wiping authorship before stamping v4 is the same loss hole), commented at each site.
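
The difference between the old swallow (`Err(_) => empty`) and the fixed classifier comes down to one match arm. The sketch below uses a hypothetical `OpenError` enum and a pre-computed open result in place of Lance's real error type and `Dataset::open`; only the variant name `DatasetNotFound` is taken from the message above.

```rust
/// Hypothetical error type for this sketch; not Lance's actual enum.
#[derive(Debug, PartialEq)]
pub enum OpenError {
    /// The dataset does not exist: the benign "no legacy data" signal.
    DatasetNotFound,
    /// Anything else (transient I/O, corruption, permissions, ...).
    Other(String),
}

/// Classify the result of opening the legacy dataset.
/// Absence maps to an empty cache; every other error is propagated.
pub fn read_legacy_rows(
    open: Result<Vec<String>, OpenError>,
) -> Result<Vec<String>, OpenError> {
    match open {
        Ok(rows) => Ok(rows),
        // Only the typed not-found variant means "nothing to backfill".
        Err(OpenError::DatasetNotFound) => Ok(Vec::new()),
        // A catch-all `Err(_) => Ok(Vec::new())` here would reproduce the bug:
        // a transient failure would look like an empty legacy dataset.
        Err(other) => Err(other),
    }
}
```

With this shape the migration can only reach its stamp step after a genuine read or a genuine absence, never after a failed read.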

Pinned by the `lance_surface_guards.rs::dataset_open_missing_returns_not_found_variant` guard (turns red if a Lance bump changes the absence variant) and greens the fault-injection regression test from the previous commit.

* test(engine): cover the per-branch v3→v4 migration against a real Lance branch

`seed_legacy_v3_lineage` writes every commit (including the "feature"-tagged one) to MAIN's `_graph_commits.lance` with `manifest_branch` as a mere field, so the production per-branch migration path (`read_legacy_commit_cache` checking out a real Lance branch, and a branch-scoped `__manifest`) was never exercised. Add `seed_legacy_v3_lineage_with_branch`, which forks a real `feature` Lance branch on BOTH `_graph_commits.lance` and `__manifest` (the branch inherits main's stripped v3 state), and a test that migrates the BRANCH and asserts the branch's lineage lands in the BRANCH's `__manifest` (genesis + A + branch commit, `graph_head:feature` → branch commit, parents + actors intact) with main's `__manifest` untouched.

This empirically resolves the open question behind the merge robustness work: the fast-path `read_graph_lineage(dataset)` has no `manifest_branch` filter, but `__manifest` is Lance-branched per graph-branch, so a branch reads only its own lineage. The test confirms migrating one branch does not leak into another. No branch filter is needed.

* refactor(engine): type the lineage-backfill merge conflict via the publisher classifier

`state::merge_lineage_rows` (the v3→v4 lineage backfill's standalone `__manifest` merge-insert) stringified its `execute_reader` error, discarding the Lance variant. Route it through the publisher's `map_lance_publish_error` (now `pub(crate)`) so a concurrent first-open's row-level CAS loss surfaces as the SAME typed `OmniError::Manifest{ details: RowLevelCasContention }` the publisher's own retry consumes: one vocabulary, no raw-Lance matching in the migration.

Deliberately NOT unified with `optimize::is_retryable_lance_conflict`: that classifier also matches `CommitConflict`/`RetryableCommitConflict` from the compaction commit path, which a row-level merge-insert never emits. Cross-linked with a comment at both sites.

Behavior-preserving: the only path that changes is the error TYPE on a CAS loss (previously an opaque `Lance` string, now a typed conflict); no success/failure outcome changes. The bounded re-open retry that consumes the new type lands next.

* test(engine): red - concurrent v3→v4 migrations error instead of converging

`migrate_v2_to_v3` is concurrent-runner idempotent by design; v3→v4 regressed it. `merge_lineage_rows` uses `conflict_retries(0)` and `migrate_v3_to_v4` has no app-level retry, so when two processes open the same legacy graph at once the backfill's row-level CAS loser errors the whole open instead of converging.

The test opens two `__manifest` handles at the same pre-migration (v3, empty-lineage) HEAD and runs both `migrate_internal_schema` calls under `tokio::join!`, forcing the `graph_head:main` CAS to fire every run. Against the current code the loser fails with `RowLevelCasContention` ("Attempted 0 retries."), which is the predicted symptom, so the "both must converge" assertion panics. The bounded re-open retry that makes both converge lands next.

* fix(engine): make the v3→v4 lineage backfill converge under concurrent runners

`migrate_v2_to_v3` is concurrent-runner idempotent; v3→v4 was not. Two processes (or open-for-write handles) opening the same legacy graph at once both reach the backfill merge, and `merge_lineage_rows`'s `conflict_retries(0)` made the row-level CAS loser error the whole open instead of converging.

Two contention points, both now handled all-or-nothing:

1. The backfill merge on `graph_head:<branch>`.
   Wrap (fast-path re-read → read legacy → merge) in a bounded re-open retry loop: a `RowLevelCasContention` loss re-opens the manifest past the winner's (atomic) commit and re-loops; the fast-path re-read then sees the winner's lineage and stamps. On budget exhaustion it returns a `RowLevelCasContention`-typed error so the publisher's OUTER retry loop completes it. The retry decision reuses the publisher's `is_retryable_publish_conflict` so the two stay in lockstep.
2. The terminal stamp bump. Making the merge loser converge newly lets BOTH runners reach `set_stamp(4)` (an `UpdateConfig` commit on the same key), so the loser gets `lance::Error::IncompatibleTransaction` (NOT a row-level CAS, so the merge loop doesn't catch it). This surfaced only under the concurrent full-suite run, not the isolated test. Both write the SAME value, so the conflict is benign: `commit_v4_stamp_idempotently` re-opens and, if the stamp already reached the target, succeeds; else re-applies (bounded).

Greens the race test from the previous commit (3x isolated, 5x full-suite, no flake). The new `IncompatibleTransaction` match is pinned by `lance_surface_guards.rs::lance_error_incompatible_transaction_variant_exists`.

* fix(engine): refuse a future internal-schema stamp on the branch read path

`load_commit_cache_for_branch` dispatched on the branch's internal-schema stamp (`< CURRENT` to the v3 legacy fallback, `>= CURRENT` to the manifest projection) but never refused a `> CURRENT` branch stamp, so a newer-binary shape would be misread by the projection rather than rejected. Add `refuse_if_stamp_too_new(stamp)` (re-exported `pub(crate)` from `migrations`) right after the branch stamp is read, mirroring the main read path's `refuse_if_internal_schema_too_new`.

This is defense-in-depth, not a live hole: migrations run main-first (main migrates on open; each branch on its first write), so main's stamp is always >= every branch's and the main path refuses first.
The guard closes the gap if that ordering invariant is ever weakened. Tested by force-stamping a real branch past CURRENT and asserting the branch read refuses with the upgrade error (without the guard the test misreads via the projection and returns Ok, confirmed by removing it).

* docs(rfc-013): record the v3→v4 migration robustness fixes

invariants.md Known Gaps: the `migrate_v3_to_v4` entry now states the migration is loud on non-not-found legacy-open errors and concurrent-runner idempotent (bounded re-open retry on the merge CAS + idempotent stamp bump), and that the branch read path refuses a `> CURRENT` stamp.

lance.md: note the two new surface guards the migration depends on (`dataset_open_missing_returns_not_found_variant`, `lance_error_incompatible_transaction_variant_exists`).

testing.md: note the migration fault-injection test in the failpoints row.

* refactor: remove dead code and silence warnings across engine + cluster

Dead-code sweep follow-up to the RFC-013 stack. No behavior change.

- engine: delete the orphaned `validate_edge_cardinality`: the load path uses `validate_edge_cardinality_with_pending_loader` for every mode (including Overwrite, which it treats as the replacement table image), so the old standalone validator had no caller. Also correct its sibling's now-stale doc reference. Gate `TableStore::append_batch` `#[cfg(test)]`: it is the inline-commit residual kept only for recovery test setup, with no non-test caller.
- cluster: drop unused imports in `lib.rs`, delete the unused `ClusterStore::payload_display`, and raise `LiveGraphObservation` / `GraphObservationJson` / `PolicyTarget` to `pub(crate)` to match the functions that return them.

Both lib crates now build warning-free.

* fix(engine): match Lance's typed DatasetAlreadyExists, not the message string

The internal create-or-open idempotency fallbacks in `db/commit_graph.rs` and `db/recovery_audit.rs` classified the "already exists" race by `err.to_string().contains("Dataset already exists")`, which is a Lance display string, not an API contract. A wording change upstream would silently break the fallback (a re-create would error instead of opening the existing table). Match the typed `lance::Error::DatasetAlreadyExists { .. }` variant instead (the same discipline as the v3→v4 migration's not-found classifier), pinned by the new `lance_surface_guards.rs::lance_error_dataset_already_exists_variant_exists` guard so a Lance rename turns red instead of silently regressing.

* refactor(engine): consolidate now_micros into one crate::db helper

Four `fn now_micros() -> Result<i64>` copies (commit_graph, recovery_audit, graph_coordinator, manifest/graph) had already drifted: three mapped the clock error to `OmniError::manifest("...UNIX_EPOCH...")` while recovery_audit used `OmniError::manifest_internal("...unix epoch...")`. Replace all four with one `pub(crate) fn now_micros()` in `db/mod.rs` (the majority `manifest` variant), and repoint the eight call sites at `crate::db::now_micros()`. No test asserts on the failure message, so unifying the variant is behavior-safe; the timestamp-mapping contract can no longer fork across the rows it stamps.

* refactor(engine): drop the dead snapshot param from roll_back_sidecar

`roll_back_sidecar` took `snapshot: &Snapshot` only to discard it with `let _ = snapshot;`. Rollbacks now always publish (the restored HEAD plus a recovery-commit lineage row), so the snapshot is never read to decide whether to skip a publish. Remove the parameter, the two call-site arguments, and the suppressor. A signature must not advertise inputs it does not consume.
The `Snapshot` import stays: `process_sidecar`, `roll_forward_all`, and `record_audit_recovery_rollforward` still take it.

* test(engine): red - open_at_branch wedges a branch on a missing commit-graph ref

A v4 graph keeps its graph lineage in `__manifest` (RFC-013 Phase 7); the `_graph_commits.lance` branch ref is a derived artifact. An interrupted fork-reclaim or a `cleanup` race can drop that derived ref while the manifest lineage stays intact. Per invariants 7 + 15 a missing derived ref must not fail a logical read of the lineage.

This wedge builds a real v4 `feature` branch (its `graph_head:feature` row in `__manifest`), force-deletes ONLY the `_graph_commits.lance` `feature` ref, then asserts the branch reads (`open_at_branch` / list-commits / `merge_base`) succeed from `__manifest` while a write that needs the derived ref (`create_branch`) fails loudly with the typed actionable error.

Red against current code: `open_at_branch`'s hard `checkout_branch(branch)?` on the missing ref errors `OmniError::Lance` (Lance "Not found: _graph_commits.lance/tree/feature/_versions"), wedging the logical read.

* fix(engine): read manifest lineage independent of the derived _graph_commits ref

`CommitGraph::open_at_branch` did a hard `checkout_branch(branch)?` on the `_graph_commits.lance` branch ref before reading lineage, so a missing derived ref (an interrupted fork-reclaim, or a `cleanup` race) wedged the branch's commit-list / merge-base / snapshot resolution even though the lineage is readable from the authoritative `__manifest` (RFC-013 Phase 7). That is a derived/physical artifact failing a logical read (invariants 7 and 15).

Make the held commits handle `Option<Dataset>` (mirroring `actor_dataset`). `open_at_branch` and `refresh` check out the derived ref best-effort: a typed not-found (`RefNotFound`/`NotFound`) yields a `None` handle while the read re-syncs from `__manifest`; any other open error still propagates.
The manifest existence gate is unchanged: `load_commit_cache_for_branch` keeps its hard `?`, so a truly absent branch still fails loudly at the manifest. `create_branch` (the only writer that forks a ref) and the folded-in version lookup return a loud, actionable error on `None`, deferring repair to `cleanup`'s existing orphan reconciler rather than inlining a write on a read-side refresh. Reads (`head_commit`/`load_commits`/`get_commit`/`merge_base`) never touch the handle.

Greens the wedge regression from the preceding commit.

* fix(engine): v3→v4 retry loops return retryable contention on exhaustion

`commit_v4_stamp_idempotently`'s retry loop used `0..=STAMP_RETRY_BUDGET` (6 iterations) with an `attempt < STAMP_RETRY_BUDGET` guard, so the LAST iteration's `IncompatibleTransaction` fell through to `Err(e) => OmniError::Lance(...)` (stringified, non-retryable) instead of the intended `RowLevelCasContention`, and the post-loop contention return was dead code. The publisher's outer retry only re-runs `is_retryable_publish_conflict`, so under sustained concurrent v3→v4 migration the one-time stamp bump could fail instead of converging, defeating the idempotency the migration is supposed to add.

Fix the loop to `0..BUDGET` with an UNGUARDED `IncompatibleTransaction` arm: the retryable variant is always handled inside the loop (re-open + same-value check + retry), so it can never reach the stringifying catch-all, and the post-loop is the SINGLE reachable exhaustion path, the typed `RowLevelCasContention`. The `Err(e)` arm now catches only genuine non-contention errors. Apply the same range alignment to the sibling merge loop in `migrate_v3_to_v4` (behaviorally correct today, since its `Err(err)` returns the already-typed contention, but it carried the identical off-by-one structure the stamp loop was copied from; aligning both stops the next copy from re-introducing it).

Test-first.
The exhaustion path is otherwise near-unreachable (a real concurrent winner stamps the same value, so the re-read returns Ok on the first retry), so a new `migration.v4_stamp.force_incompatible` failpoint forces every stamp attempt to lose, driving exhaustion deterministically. Against the pre-fix loop the new `v4_stamp_exhaustion_returns_retryable_contention` test goes red with `Lance("Incompatible transaction: injected failpoint triggered…")`; with the fix it asserts the typed `RowLevelCasContention`. Found by automated review on #299.

* feat(engine): minimum-supported internal-schema floor + retirement tripwire

The internal-schema migration chain (`migrate_internal_schema`) had a too-new ceiling but no floor, so every old `migrate_vN_…` arm and the v3 legacy readers it needs stay forever; the pile grows by one migration + readers + tests every schema version. Add `MIN_SUPPORTED_INTERNAL_SCHEMA_VERSION` (1 today, a pure no-op: `read_stamp` floors an absent stamp at 1 and no real graph carries 0) as the oldest stamp this binary opens; raising it is how the chain sheds old code.

Collapse the one-sided `refuse_if_stamp_too_new` into `refuse_if_stamp_unsupported` checking both bounds, so the floor lands at all three stamp-enforcement sites (the write-path migrate dispatcher, the read-only open guard, and the branch lineage-read path in `commit_graph.rs`) via one compiler-enforced rename. A hand-wired floor twin would have had to touch each site, and the branch-read path is easy to miss; one combined guard cannot half-enforce. Rename the read-only wrapper `refuse_if_internal_schema_unsupported` to match.

A compile-time tripwire (`const _: () = assert!(LOWEST_REGISTERED_MIGRATION_SOURCE == MIN_SUPPORTED…)`) fails the build if a future floor bump forgets to delete the now-dead migration arm (or vice versa). That is stronger than a runtime test, impossible to skip, and it doubles as the use that keeps the mirror const live.
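
A minimal sketch of the two-sided guard and the compile-time tripwire might look like this. The constant names come from the message above; the values (floor 1, current 4), the error strings, and the function body are assumptions for illustration, and the elided tail of the real `assert!` expression is not reproduced.

```rust
/// Oldest internal-schema stamp this binary opens (1 today, per the message).
pub const MIN_SUPPORTED_INTERNAL_SCHEMA_VERSION: u32 = 1;
/// Current stamp after the v3 -> v4 bump.
pub const INTERNAL_MANIFEST_SCHEMA_VERSION: u32 = 4;
/// Source version of the oldest migration arm still registered.
pub const LOWEST_REGISTERED_MIGRATION_SOURCE: u32 = 1;

// Compile-time tripwire: raising the floor without deleting the dead
// migration arm (or the reverse) makes this assertion fail at build time.
const _: () = assert!(
    LOWEST_REGISTERED_MIGRATION_SOURCE == MIN_SUPPORTED_INTERNAL_SCHEMA_VERSION,
    "floor and lowest registered migration source have drifted"
);

/// One guard for both bounds, so no call site can enforce only half of it.
/// The error wording here is hypothetical.
pub fn refuse_if_stamp_unsupported(stamp: u32) -> Result<(), String> {
    if stamp < MIN_SUPPORTED_INTERNAL_SCHEMA_VERSION {
        Err(format!("stamp {stamp} is below the supported floor"))
    } else if stamp > INTERNAL_MANIFEST_SCHEMA_VERSION {
        Err(format!("stamp {stamp} is newer than this binary; upgrade"))
    } else {
        Ok(())
    }
}
```

Because the `const _` item is evaluated during compilation, changing either constant alone stops the build rather than waiting for a test run to notice.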

Tests: a sub-floor graph is refused in both open modes (twin of `future_stamp_is_refused_in_both_open_modes`); the guard accepts exactly [MIN, CURRENT]. No behavior change for any real graph. The retirement runbook lives on the `MIN_SUPPORTED` doc-comment + invariants.md.

* fix(engine): compose migration contention with publisher retry; precise recovery-converge audit commit

Three review-surfaced fixes on the RFC-013 Phase 7 path.

Publisher retry vs migration contention: `publish()` propagated a `load_publish_state` error fatally via `?`, so a `RowLevelCasContention` surfaced by the v3->v4 migration's exhausted merge/stamp budgets aborted the publish instead of being retried; only `merge_rows` conflicts hit the retry. This contradicted the migration's own design, which returns that typed error EXPECTING the publisher to re-run the load (by which point a concurrent winner has usually finished the migration, so the next scan is a no-op). Route a retryable load error through the same retry path as a retryable `merge_rows` conflict. Regression test (failpoints): a one-shot retryable contention injected into `load_publish_state` now commits via the retry; red without the fix (the write fails with the injected contention).

Recovery-converge audit commit id: `converge_or_defer_roll_forward` recorded the branch HEAD as the audit row's `graph_commit_id`, but a concurrent user write can advance `graph_head` past the recovery commit between the winner's publish and this read, attributing the audit to a later, wrong commit. Use the latest `RECOVERY_ACTOR`-authored commit (what `publish_recovery_commit` mints), which is the recovery commit by construction. The audit's actor was already correct (it comes from `sidecar.actor_id`, not the commit).

Dead param: drop the unused `snapshot` from `record_audit_recovery_rollforward` (removing the `let _ = snapshot;` suppressor). `storage` stays, since it is used to delete the sidecar.
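
The two retry fixes above (the stamp loop's exhaustion path, and the publisher retrying a contended load) compose as an inner bounded loop that returns a typed retryable error and an outer loop that re-runs on exactly that type. The sketch below is a simplified model under stated assumptions: `Error`, `Attempt`, `commit_stamp`, `is_retryable` and `publish` are hypothetical names, and the budget of 5 is inferred from the message's "`0..=STAMP_RETRY_BUDGET` (6 iterations)".

```rust
/// Hypothetical error vocabulary for this sketch.
#[derive(Debug, PartialEq)]
pub enum Error {
    /// Typed, retryable: the outer loop may run the operation again.
    RowLevelCasContention,
    /// Anything else: fatal.
    Other(String),
}

/// Outcome of a single stamp attempt.
pub enum Attempt {
    Done,
    IncompatibleTransaction,
    Failed(String),
}

pub const STAMP_RETRY_BUDGET: usize = 5;

/// Inner loop: `0..BUDGET` with an unguarded retryable arm, so the
/// post-loop return is the single exhaustion path.
pub fn commit_stamp(mut try_once: impl FnMut() -> Attempt) -> Result<(), Error> {
    for _ in 0..STAMP_RETRY_BUDGET {
        match try_once() {
            Attempt::Done => return Ok(()),
            // No `attempt < BUDGET` guard: every loss is retried in-loop.
            Attempt::IncompatibleTransaction => continue,
            Attempt::Failed(msg) => return Err(Error::Other(msg)),
        }
    }
    Err(Error::RowLevelCasContention)
}

pub fn is_retryable(e: &Error) -> bool {
    matches!(e, Error::RowLevelCasContention)
}

/// Outer loop: a retryable error from the load step is retried, not
/// propagated; a non-retryable one aborts immediately.
pub fn publish(
    mut load: impl FnMut() -> Result<(), Error>,
    outer_budget: usize,
) -> Result<(), Error> {
    let mut last = Error::RowLevelCasContention;
    for _ in 0..outer_budget {
        match load() {
            Ok(()) => return Ok(()),
            Err(e) if is_retryable(&e) => last = e,
            Err(e) => return Err(e),
        }
    }
    Err(last)
}
```

In this model the inner loop can never leak a stringified contention error, which is what lets the outer classifier stay a simple type match.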
parent b6c19bfa5d
commit 1c5cb8741e
36 changed files with 3798 additions and 657 deletions

@@ -263,7 +263,7 @@ omnigraph policy explain --cluster ./company-brain --graph knowledge --actor act
 | Schema language | — | `.pg` + Pest grammar + catalog + interfaces + constraints + annotations |
 | Query language | — | `.gq` + Pest grammar + IR + lowering + linter |
 | Schema migration planning | — | `plan_schema_migration` + `apply_schema` step types + `__schema_apply_lock__` |
-| Commit graph (DAG) across whole graph | — | `_graph_commits.lance` with linear + merge parents, ULID ids, actor map |
+| Commit graph (DAG) across whole graph | — | Lineage (linear + merge parents, ULID ids, actor) stored as `graph_commit`/`graph_head` rows in `__manifest`, written in the same publish CAS as the table-version rows (RFC-013 Phase 7 — no separate `_graph_commits.lance` write; manifest→commit-graph atomicity gap closed); the in-memory commit graph is a projection of those rows |
 | Per-query atomic writes | — | In-memory `MutationStaging.pending` accumulator + `stage_*` / `commit_staged` per touched table at end-of-query + publisher CAS via `commit_with_expected` (single manifest commit per `mutate_as` / `load`); D₂ parse-time rule keeps inserts/updates and deletes from mixing |
 | Three-way row-level merge | — | `OrderedTableCursor` + `StagedTableWriter`, structured `MergeConflictKind` |
 | Change feeds | — | `diff_between` / `diff_commits` with manifest fast path + ID streaming |

@@ -474,7 +474,7 @@ pub(crate) async fn preview_schema_migration(
     Ok(preview.plan)
 }
 
-struct LiveGraphObservation {
+pub(crate) struct LiveGraphObservation {
     manifest_version: u64,
     schema_digest: String,
 }
@@ -494,7 +494,7 @@ pub(crate) async fn observe_live_graph(graph_uri: &str) -> Result<LiveGraphObser
     })
 }
 
-struct GraphObservationJson<'a> {
+pub(crate) struct GraphObservationJson<'a> {
     address: &'a str,
     graph_uri: &'a str,
     observed_at: &'a str,
@ -949,7 +949,7 @@ pub(crate) fn validate_id(kind: &str, path: &str, value: &str, diagnostics: &mut
|
|||
}
|
||||
}
|
||||
|
||||
enum PolicyTarget {
|
||||
pub(crate) enum PolicyTarget {
|
||||
Cluster,
|
||||
Graph(String),
|
||||
WrongKind(String),
|
||||
|
|
|
|||
|
|
@@ -1,8 +1,6 @@
 use std::collections::{BTreeMap, BTreeSet};
-use std::fs::{self, OpenOptions};
-use std::io::{ErrorKind, Write};
+use std::fs::{self};
 use std::path::{Path, PathBuf};
-use std::process;

 use omnigraph::db::{Omnigraph, ReadTarget, SchemaApplyOptions};
 use omnigraph_compiler::SchemaMigrationPlan;
@@ -26,11 +24,7 @@ mod store;
 mod sweep;
 mod types;
 use config::{
-QueriesDecl, future_field_diagnostics, graph_address, initial_import_state, load_desired,
-normalize_policy_target, observe_declared_graphs, observe_live_graph, parse_cluster_config,
-policy_address, preview_schema_migration, query_address, resolve_config_path,
-resolve_query_decls, schema_address, state_resource_digests, validate_cluster_header,
-validate_id, validate_query_source,
+QueriesDecl, graph_address, initial_import_state, load_desired, observe_declared_graphs, parse_cluster_config, preview_schema_migration, schema_address, state_resource_digests, validate_cluster_header,
 };
 use diff::{
 FailedGraphOrigin, ResourceKind, append_embedding_profile_changes,
@@ -42,13 +36,12 @@ pub use serve::{
 cluster_root_for_graph_uri, read_serving_snapshot, read_serving_snapshot_from_storage,
 resolve_graph_storage_uri,
 };
-use store::{ClusterStore, StateLockGuard, StateSnapshot};
+use store::ClusterStore;
 use sweep::{
 mark_approvals_consumed, record_approval_consumed, sweep_recovery_sidecars,
 tombstone_graph_subtree, warn_pending_recovery_sidecars,
 };
 pub use types::*;
-use types::*;

 pub const CLUSTER_CONFIG_FILE: &str = "cluster.yaml";
 pub const CLUSTER_GRAPHS_DIR: &str = "graphs";
@@ -408,10 +408,6 @@ impl ClusterStore {
 }
 }

-pub(crate) fn payload_display(&self, kind: &ResourceKind, digest: &str) -> Option<String> {
-Self::payload_relative(kind, digest).map(|relative| self.display(&relative))
-}
-
 pub(crate) async fn payload_exists(&self, kind: &ResourceKind, digest: &str) -> bool {
 let Some(relative) = Self::payload_relative(kind, digest) else {
 return false;
@@ -1,6 +1,5 @@
 use std::collections::{HashMap, VecDeque};
 use std::sync::Arc;
-use std::time::{SystemTime, UNIX_EPOCH};

 use arrow_array::{
 Array, RecordBatch, RecordBatchIterator, StringArray, TimestampMicrosecondArray, UInt64Array,
@@ -29,7 +28,16 @@ pub struct GraphCommit {

 pub struct CommitGraph {
 root_uri: String,
-dataset: Dataset,
+/// Handle on `_graph_commits.lance` at the active branch, held only for the
+/// branch-management WRITES (`create_branch`, formerly `version`) and
+/// `refresh`. It is a DERIVED artifact (RFC-013 Phase 7): graph lineage lives
+/// in `__manifest`, and reads (`head_commit`/`load_commits`/`get_commit`/
+/// `merge_base`) never touch it. `None` means the branch's
+/// `_graph_commits.lance` ref is missing (an interrupted fork-reclaim or a
+/// `cleanup` race) while the manifest lineage is still authoritative — so the
+/// READS stay correct and only a subsequent `create_branch` surfaces the loud
+/// actionable error. Mirrors `actor_dataset`'s best-effort `Option`.
+dataset: Option<Dataset>,
 actor_dataset: Option<Dataset>,
 active_branch: Option<String>,
 actor_by_commit_id: HashMap<String, String>,
@@ -38,20 +46,19 @@ pub struct CommitGraph {
 }

 impl CommitGraph {
-pub async fn init(root_uri: &str, manifest_version: u64) -> Result<Self> {
+/// Create the commit-graph datasets for a fresh graph. The genesis
+/// `graph_commit` + `graph_head` rows live in `__manifest` (folded into the
+/// init write — RFC-013 Phase 7), so `_graph_commits.lance` is created EMPTY
+/// here: it exists only to carry the Lance branch refs that `create_branch` /
+/// `list_branches` / the `cleanup` orphan reconciler operate on. No commit
+/// rows are ever written to it. The in-memory cache is sourced from the
+/// manifest projection — the same path as [`open`], so genesis is seen
+/// identically whether the graph was just initialized or reopened.
+pub async fn init(root_uri: &str) -> Result<Self> {
 let root = root_uri.trim_end_matches('/');
 let uri = graph_commits_uri(root);
-let genesis = GraphCommit {
-graph_commit_id: ulid::Ulid::new().to_string(),
-manifest_branch: None,
-manifest_version,
-parent_commit_id: None,
-merged_parent_commit_id: None,
-actor_id: None,
-created_at: now_micros()?,
-};

-let batch = commits_to_batch(&[genesis.clone()])?;
+let batch = RecordBatch::new_empty(commit_graph_schema());
 let reader = RecordBatchIterator::new(vec![Ok(batch)], commit_graph_schema());
 let params = WriteParams {
 mode: WriteMode::Create,
@@ -66,17 +73,30 @@ impl CommitGraph {
 .map_err(|e| OmniError::Lance(e.to_string()))?;
 let actor_dataset = create_commit_actor_dataset(root).await?;

+let (commit_by_id, head_commit) = load_commit_cache_from_manifest(root, None).await?;
 Ok(Self {
 root_uri: root.to_string(),
-dataset,
+dataset: Some(dataset),
 actor_dataset: Some(actor_dataset),
 active_branch: None,
 actor_by_commit_id: HashMap::new(),
-commit_by_id: HashMap::from([(genesis.graph_commit_id.clone(), genesis.clone())]),
-head_commit: Some(genesis),
+commit_by_id,
+head_commit,
 })
 }

+/// Insert a just-published commit into the in-memory cache (RFC-013 Phase 7).
+/// The durable write already happened in the manifest publish CAS; this only
+/// keeps the cache consistent for same-handle reads, with no storage I/O.
+/// Head selection matches the manifest-sourced load (`should_replace_head`).
+pub fn insert_committed(&mut self, commit: GraphCommit) {
+if should_replace_head(self.head_commit.as_ref(), &commit) {
+self.head_commit = Some(commit.clone());
+}
+self.commit_by_id
+.insert(commit.graph_commit_id.clone(), commit);
+}
+
 pub async fn open(root_uri: &str) -> Result<Self> {
 let root = root_uri.trim_end_matches('/');
 let wrapper = crate::instrumentation::commit_graph_wrapper();
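The doc comment on `insert_committed` says head selection matches the manifest-sourced load. A small sketch of why sharing one predicate gives that guarantee follows. The diff does not show the body of `should_replace_head`, so the `replaces_head` rule below (higher `manifest_version` wins) is an assumption made only so the example can run; `Commit` and `Cache` are simplified hypothetical types.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical commit record (the real `GraphCommit` has more fields).
#[derive(Debug, Clone, PartialEq)]
pub struct Commit {
    pub id: String,
    pub manifest_version: u64,
    pub parent: Option<String>,
}

/// ASSUMPTION: a stand-in ordering rule. The real `should_replace_head`
/// is not shown in this diff and may compare other fields.
pub fn replaces_head(current: Option<&Commit>, candidate: &Commit) -> bool {
    match current {
        None => true,
        Some(head) => candidate.manifest_version > head.manifest_version,
    }
}

#[derive(Debug, Default)]
pub struct Cache {
    pub by_id: HashMap<String, Commit>,
    pub head: Option<Commit>,
}

impl Cache {
    /// Incremental path: record a commit that was already durably published.
    pub fn insert_committed(&mut self, commit: Commit) {
        if replaces_head(self.head.as_ref(), &commit) {
            self.head = Some(commit.clone());
        }
        self.by_id.insert(commit.id.clone(), commit);
    }

    /// Bulk path: fold a full projection of rows through the same insert,
    /// so both paths necessarily agree on the head.
    pub fn from_rows(rows: impl IntoIterator<Item = Commit>) -> Self {
        let mut cache = Cache::default();
        for row in rows {
            cache.insert_committed(row);
        }
        cache
    }
}
```

Because the bulk load is a fold over the same insert, a handle that loaded two commits and a handle that loaded one and then inserted the second end up with the same head.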
@@ -87,17 +107,24 @@ impl CommitGraph {
 crate::instrumentation::open_dataset_tracked(&graph_commit_actors_uri(root), wrapper)
 .await
 .ok();
-let actor_by_commit_id = match &actor_dataset {
-Some(dataset) => load_commit_actor_cache(dataset).await?,
-None => HashMap::new(),
-};
-let (commit_by_id, head_commit) = load_commit_cache(&dataset, &actor_by_commit_id).await?;
+// RFC-013 step 4: source the in-memory cache from the `__manifest`
+// lineage projection (which carries the actor inline), not from
+// `_graph_commits.lance`. The dataset handles above are retained for the
+// branch-management ops (create/delete/list/version) that still target
+// the commit-graph dataset; the actor dataset is only kept for the
+// dual-write append path. The projection-equivalence gate proves this
+// cache equals the prior `_graph_commits.lance` read. A pre-Phase-7 (v3)
+// graph not yet migrated falls back to the legacy read — see
+// `load_commit_cache_for_branch`.
+let (commit_by_id, head_commit) = load_commit_cache_for_branch(root, None).await?;
 Ok(Self {
 root_uri: root.to_string(),
-dataset,
+// `open` targets main and never checks out a branch (main cannot be
+// deleted/recreated), so the handle is always present here.
+dataset: Some(dataset),
 actor_dataset,
 active_branch: None,
-actor_by_commit_id,
+actor_by_commit_id: HashMap::new(),
 commit_by_id,
 head_commit,
 })
@@ -109,25 +136,33 @@ impl CommitGraph {
 let dataset =
 crate::instrumentation::open_dataset_tracked(&graph_commits_uri(root), wrapper.clone())
 .await?;
-let dataset = dataset
-.checkout_branch(branch)
-.await
-.map_err(|e| OmniError::Lance(e.to_string()))?;
+// Best-effort checkout of the DERIVED `_graph_commits.lance` branch ref.
+// It is held only for `create_branch` (a write); the lineage READ below
+// comes from `__manifest`. A missing ref (interrupted fork-reclaim /
+// `cleanup` race) must not wedge the read, so a typed not-found yields a
+// `None` handle — a subsequent `create_branch` then surfaces the loud
+// error. Any OTHER open error (transient IO / corrupt) still propagates,
+// matching the `force_delete_branch` / `read_legacy_commit_cache` idiom.
+let dataset = match dataset.checkout_branch(branch).await {
+Ok(ds) => Some(ds),
+Err(lance::Error::RefNotFound { .. }) | Err(lance::Error::NotFound { .. }) => None,
+Err(e) => return Err(OmniError::Lance(e.to_string())),
+};
 let actor_dataset =
 crate::instrumentation::open_dataset_tracked(&graph_commit_actors_uri(root), wrapper)
 .await
 .ok();
-let actor_by_commit_id = match &actor_dataset {
-Some(dataset) => load_commit_actor_cache(dataset).await?,
-None => HashMap::new(),
-};
-let (commit_by_id, head_commit) = load_commit_cache(&dataset, &actor_by_commit_id).await?;
+// Hard `?`: the manifest existence gate. `load_commit_cache_for_branch`
+// opens the branch's `__manifest` (its own `checkout_branch` on the
+// authoritative table), so a TRULY absent branch still fails loudly here —
+// only the derived `_graph_commits.lance` ref is allowed to be missing.
+let (commit_by_id, head_commit) = load_commit_cache_for_branch(root, Some(branch)).await?;
 Ok(Self {
 root_uri: root.to_string(),
 dataset,
 actor_dataset,
 active_branch: Some(branch.to_string()),
-actor_by_commit_id,
+actor_by_commit_id: HashMap::new(),
 commit_by_id,
 head_commit,
 })
@@ -136,40 +171,49 @@ impl CommitGraph {
 pub async fn refresh(&mut self) -> Result<()> {
 let root = self.root_uri.clone();
 let wrapper = crate::instrumentation::commit_graph_wrapper();
-self.dataset = crate::instrumentation::open_dataset_tracked(
+let dataset = crate::instrumentation::open_dataset_tracked(
 &graph_commits_uri(&root),
 wrapper.clone(),
 )
 .await?;
-if let Some(branch) = &self.active_branch {
-self.dataset = self
-.dataset
-.checkout_branch(branch)
-.await
-.map_err(|e| OmniError::Lance(e.to_string()))?;
-}
+// Same best-effort checkout as `open_at_branch`: a missing DERIVED branch
+// ref leaves the handle `None` (only `create_branch` then errors), while
+// the in-memory cache re-syncs from the authoritative manifest below.
+self.dataset = match &self.active_branch {
+Some(branch) => match dataset.checkout_branch(branch).await {
+Ok(ds) => Some(ds),
+Err(lance::Error::RefNotFound { .. }) | Err(lance::Error::NotFound { .. }) => None,
+Err(e) => return Err(OmniError::Lance(e.to_string())),
+},
+None => Some(dataset),
+};
 self.actor_dataset =
 crate::instrumentation::open_dataset_tracked(&graph_commit_actors_uri(&root), wrapper)
 .await
 .ok();
-self.actor_by_commit_id = match &self.actor_dataset {
-Some(dataset) => load_commit_actor_cache(dataset).await?,
-None => HashMap::new(),
-};
 let (commit_by_id, head_commit) =
-load_commit_cache(&self.dataset, &self.actor_by_commit_id).await?;
+load_commit_cache_for_branch(&root, self.active_branch.as_deref()).await?;
 self.commit_by_id = commit_by_id;
 self.head_commit = head_commit;
 Ok(())
 }

-pub fn version(&self) -> u64 {
-self.dataset.version().version
-}
-
 pub async fn create_branch(&mut self, name: &str) -> Result<()> {
-let mut ds = self.dataset.clone();
-ds.create_branch(name, self.version(), None)
+// The held `_graph_commits.lance` handle is the only thing that can fork a
+// branch ref. If it is missing (an interrupted fork-reclaim or a `cleanup`
+// race dropped the derived ref while manifest lineage stayed authoritative),
+// fail loudly + actionably rather than silently. Repair is the existing
+// `cleanup` orphan reconciler (`reconcile_commit_graph_orphans`), not an
+// inline write on this path.
+let Some(dataset) = &self.dataset else {
+let branch = self.active_branch.as_deref().unwrap_or("main");
+return Err(OmniError::manifest_internal(format!(
+"commit-graph branch ref for '{branch}' is missing; run `omnigraph cleanup` then retry"
+)));
+};
+let version = dataset.version().version;
+let mut ds = dataset.clone();
+ds.create_branch(name, version, None)
 .await
 .map_err(|e| OmniError::Lance(e.to_string()))?;
 Ok(())
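The `open_at_branch`, `refresh`, and `create_branch` changes above share one idiom: a typed not-found on the derived ref becomes a `None` handle so reads keep working, any other error still propagates, and only the write that needs the handle fails. A self-contained sketch of that shape follows, using a hypothetical `StoreError` enum and `Handle` struct in place of `lance::Error` and `Dataset`.

```rust
/// Hypothetical stand-in for the storage error type.
#[derive(Debug, PartialEq)]
pub enum StoreError {
    RefNotFound(String),
    Io(String),
}

/// Hypothetical stand-in for a checked-out dataset handle.
#[derive(Debug, PartialEq)]
pub struct Handle {
    pub version: u64,
}

/// Best-effort open: a missing ref yields `None`, anything else is a real error.
pub fn best_effort<T>(result: Result<T, StoreError>) -> Result<Option<T>, StoreError> {
    match result {
        Ok(handle) => Ok(Some(handle)),
        Err(StoreError::RefNotFound(_)) => Ok(None),
        Err(other) => Err(other),
    }
}

/// The write path is the only place where a missing handle becomes an error,
/// and the message tells the operator what to do next.
pub fn fork_version(handle: Option<&Handle>, active_branch: &str) -> Result<u64, String> {
    match handle {
        Some(h) => Ok(h.version),
        None => Err(format!(
            "commit-graph branch ref for '{active_branch}' is missing; run cleanup then retry"
        )),
    }
}
```

The design choice is to defer the failure to the operation that actually depends on the missing artifact, rather than failing every read because a derived ref is absent.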
@@ -216,7 +260,17 @@ impl CommitGraph {
 Ok(branches.into_keys().collect())
 }

-pub async fn append_commit(
+// DEAD as of RFC-013 Phase 7: graph commits are recorded in `__manifest`
+// (folded into the publish CAS), never appended to `_graph_commits.lance`.
+// These append helpers are retained only because the actor sidecar table they
+// touch is still enumerated by `optimize` (internal-table compaction); they
+// have no caller on any write path. The single-source invariant is guarded by
+// `tests/lineage_projection.rs`, which fails if `_graph_commits.lance` ever
+// gains a commit row. Do NOT call these to record a commit — use the
+// coordinator's `commit_*_with_actor` / `commit_merge_with_actor`, which carry
+// the lineage intent into the manifest publish.
+#[allow(dead_code)]
+async fn append_commit(
 &mut self,
 manifest_branch: Option<&str>,
 manifest_version: u64,
@@ -233,7 +287,8 @@ impl CommitGraph {
 .await
 }

-pub async fn append_merge_commit(
+#[allow(dead_code)]
+async fn append_merge_commit(
 &mut self,
 manifest_branch: Option<&str>,
 manifest_version: u64,
@@ -251,6 +306,7 @@ impl CommitGraph {
 .await
 }

+#[allow(dead_code)]
 async fn append_commit_with_parents(
 &mut self,
 manifest_branch: Option<&str>,
@@ -267,16 +323,22 @@ impl CommitGraph {
 parent_commit_id: parent_commit_id.map(|s| s.to_string()),
 merged_parent_commit_id: merged_parent_commit_id.map(|s| s.to_string()),
 actor_id: actor_id.map(str::to_string),
-created_at: now_micros()?,
+created_at: crate::db::now_micros()?,
 };

 let batch = commits_to_batch(&[commit.clone()])?;
 let reader = RecordBatchIterator::new(vec![Ok(batch)], commit_graph_schema());
-let mut ds = self.dataset.clone();
+// This helper is dead on every write path (RFC-013 Phase 7) — reached only
+// by the transitional v3 fixtures, which always hold the commits dataset.
+// A `None` here would be a fixture bug, so fail loudly rather than silently.
+let mut ds = self
+.dataset
+.clone()
+.ok_or_else(|| OmniError::manifest_internal("commit-graph dataset is missing"))?;
 ds.append(reader, None)
 .await
 .map_err(|e| OmniError::Lance(e.to_string()))?;
-self.dataset = ds;
+self.dataset = Some(ds);
 if let Some(actor_id) = actor_id {
 self.append_actor(&graph_commit_id, actor_id).await?;
 }
@@ -289,6 +351,7 @@ impl CommitGraph {
 Ok(graph_commit_id)
 }

+#[allow(dead_code)] // RFC-013 Phase 7: dead — see `append_commit`.
 async fn append_actor(&mut self, graph_commit_id: &str, actor_id: &str) -> Result<()> {
 if self
 .actor_by_commit_id
@@ -301,7 +364,7 @@ impl CommitGraph {
 let record = CommitActorRecord {
 graph_commit_id: graph_commit_id.to_string(),
 actor_id: actor_id.to_string(),
-created_at: now_micros()?,
+created_at: crate::db::now_micros()?,
 };
 let batch = commit_actors_to_batch(&[record])?;
 let reader = RecordBatchIterator::new(vec![Ok(batch)], commit_actor_schema());
@@ -452,7 +515,12 @@ async fn create_commit_actor_dataset(root_uri: &str) -> Result<Dataset> {
 };
 match Dataset::write(reader, &uri as &str, Some(params)).await {
 Ok(dataset) => Ok(dataset),
-Err(err) if err.to_string().contains("Dataset already exists") => Dataset::open(&uri)
+// Create-or-open idempotency: a concurrent/prior create raced us. Match
+// the typed `DatasetAlreadyExists` variant, not the display string — the
+// message is not a Lance API contract (a wording change would silently
+// break this fallback). Pinned by
+// `lance_surface_guards.rs::lance_error_dataset_already_exists_variant_exists`.
+Err(lance::Error::DatasetAlreadyExists { .. }) => Dataset::open(&uri)
 .await
 .map_err(|open_err| OmniError::Lance(open_err.to_string())),
 Err(err) => Err(OmniError::Lance(err.to_string())),
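The hunk above replaces a display-string check with a match on a typed error variant. A minimal sketch of the create-or-open fallback in that style follows. `Store`, `StoreError`, and `Outcome` are hypothetical in-memory stand-ins, not Lance types.

```rust
use std::collections::HashSet;

/// Hypothetical typed errors; the variant, not the message, is the contract.
#[derive(Debug, PartialEq)]
pub enum StoreError {
    AlreadyExists { uri: String },
    NotFound { uri: String },
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Created,
    Opened,
}

/// Hypothetical in-memory dataset registry.
#[derive(Default)]
pub struct Store {
    datasets: HashSet<String>,
}

impl Store {
    pub fn create(&mut self, uri: &str) -> Result<(), StoreError> {
        // `insert` returns false when the key was already present.
        if !self.datasets.insert(uri.to_string()) {
            return Err(StoreError::AlreadyExists { uri: uri.to_string() });
        }
        Ok(())
    }

    pub fn open(&self, uri: &str) -> Result<(), StoreError> {
        if self.datasets.contains(uri) {
            Ok(())
        } else {
            Err(StoreError::NotFound { uri: uri.to_string() })
        }
    }

    /// Idempotent create: losing a create race falls back to open.
    /// Matching the variant keeps working even if the error text is reworded.
    pub fn create_or_open(&mut self, uri: &str) -> Result<Outcome, StoreError> {
        match self.create(uri) {
            Ok(()) => Ok(Outcome::Created),
            Err(StoreError::AlreadyExists { .. }) => self.open(uri).map(|()| Outcome::Opened),
            Err(other) => Err(other),
        }
    }
}
```

Calling `create_or_open` twice on the same URI yields `Created` and then `Opened`, with no dependence on how the error happens to be formatted.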
@@ -490,6 +558,156 @@ fn commits_to_batch(commits: &[GraphCommit]) -> Result<RecordBatch> {
 .map_err(|e| OmniError::Lance(e.to_string()))
 }

+/// Build the in-memory commit cache for `branch`, choosing the source by the
+/// branch manifest's internal-schema stamp (RFC-013 step 4 forward/back-compat):
+///
+/// - stamp ≥ v4 (post-Phase-7, the normal case): the `__manifest` lineage
+/// projection — `graph_commit`/`graph_head` rows folded into the publish CAS.
+/// - stamp < v4 (a pre-Phase-7 graph not yet migrated): the legacy
+/// `_graph_commits.lance` read. This is the **transitional v3 fallback** that
+/// lets a READ-ONLY open of an un-migrated graph still see correct history —
+/// a read-only open never runs the v3→v4 backfill (it must not write), so
+/// without this gate it would read an empty DAG from `__manifest`. A
+/// read-write open backfills `__manifest` on its first write and thereafter
+/// takes the projection branch.
+///
+/// Both sources pick the head with `should_replace_head`, so the cache is
+/// identical regardless of which branch is taken. Remove the fallback once no
+/// graph below internal-schema v4 remains.
+async fn load_commit_cache_for_branch(
+root_uri: &str,
+branch: Option<&str>,
+) -> Result<(HashMap<String, GraphCommit>, Option<GraphCommit>)> {
+let stamp = crate::db::manifest::internal_schema_stamp_at(root_uri, branch).await?;
+// Defense-in-depth: refuse a branch whose stamp this binary cannot serve —
+// newer than CURRENT, or below MIN_SUPPORTED — for the same reason the main
+// read path does (`refuse_if_internal_schema_unsupported`). A `> CURRENT` stamp
+// means a newer binary wrote a shape we can't read, so the projection below
+// would misread it; a `< MIN` stamp predates the legacy readers this binary
+// still carries. Not a live hole today: migrations run main-first
+// (`migrate_on_open` migrates main; each branch migrates on its own first
+// write), so main's stamp bounds every branch's and the main read path already
+// refuses first. The guard closes the gap if that ordering is ever weakened.
+crate::db::manifest::refuse_if_stamp_unsupported(stamp)?;
+if stamp < crate::db::manifest::INTERNAL_MANIFEST_SCHEMA_VERSION {
+// Transitional: un-migrated v3 graph — read lineage from the legacy
+// `_graph_commits.lance` so reads (incl. read-only opens) see history.
+return read_legacy_commit_cache(root_uri, branch).await;
+}
+load_commit_cache_from_manifest(root_uri, branch).await
+}
+
+/// Build the in-memory commit cache from the `__manifest` graph-lineage
+/// projection (RFC-013 step 4) rather than `_graph_commits.lance`. The lineage
+/// rows carry the actor inline, so no separate actor-table read is needed. Head
+/// selection is identical to [`load_commit_cache`] (`should_replace_head`), so
+/// the resulting cache is equivalent to the prior `_graph_commits.lance` read.
+async fn load_commit_cache_from_manifest(
+root_uri: &str,
+branch: Option<&str>,
+) -> Result<(HashMap<String, GraphCommit>, Option<GraphCommit>)> {
+let (rows, _heads) =
+crate::db::manifest::ManifestCoordinator::read_graph_lineage_at(root_uri, branch).await?;
+let mut commit_by_id = HashMap::with_capacity(rows.len());
+let mut head_commit = None;
+for row in rows {
+let commit = GraphCommit {
+graph_commit_id: row.graph_commit_id,
+manifest_branch: row.manifest_branch,
+manifest_version: row.manifest_version,
+parent_commit_id: row.parent_commit_id,
+merged_parent_commit_id: row.merged_parent_commit_id,
+actor_id: row.actor_id,
+created_at: row.created_at,
+};
+if should_replace_head(head_commit.as_ref(), &commit) {
+head_commit = Some(commit.clone());
+}
+commit_by_id.insert(commit.graph_commit_id.clone(), commit);
+}
+Ok((commit_by_id, head_commit))
+}
+
+/// Read the legacy `_graph_commits.lance` (+ its actor sidecar) for `branch`
+/// into an in-memory cache — the transitional source for graphs not yet
+/// migrated to internal-schema v4 (RFC-013 step 4). Two callers, both
+/// transitional: the v3→v4 migration backfill (which copies these rows into
+/// `__manifest`) and the read-only v3 fallback in `CommitGraph::open*`. Returns
+/// `(commit_by_id, head)`, with the head picked by `should_replace_head` —
+/// identical to the manifest projection. A genuinely ABSENT (not-found) commit
+/// dataset or actor sidecar yields an empty cache (no head); any OTHER open error
+/// (transient IO / corrupt file) propagates loudly rather than being read as
+/// "empty" — a swallow here would let the v3→v4 migration backfill nothing and
+/// still stamp v4, orphaning the real lineage permanently. This keeps the legacy
+/// readers alive while any v3 graph survives; once no graph is below v4 it can
+/// retire.
+pub(crate) async fn read_legacy_commit_cache(
+root_uri: &str,
+branch: Option<&str>,
+) -> Result<(HashMap<String, GraphCommit>, Option<GraphCommit>)> {
+let root = root_uri.trim_end_matches('/');
+let commits_uri = graph_commits_uri(root);
+let commits_open = match crate::failpoints::maybe_fail_lance_open("migration.v3_to_v4.legacy_open")
+{
+Ok(()) => Dataset::open(&commits_uri).await,
+Err(injected) => Err(injected),
+};
+let mut dataset = match commits_open {
+Ok(dataset) => dataset,
+// An ABSENT commits dataset is the legitimate "no legacy data" signal —
+// a graph with no `_graph_commits.lance` (or none on this branch) yields
+// an empty cache. But ONLY a genuine not-found gets that treatment: a
+// transient/corrupt open (IO / CorruptFile / …) must propagate, never be
+// read as "empty". The v3→v4 migration calls this once before stamping
+// v4; swallowing a non-not-found error here would backfill nothing and
+// stamp v4 anyway, orphaning the real lineage permanently (the migration
+// never re-runs, and the v3 fallback is then disabled). Lance maps an
+// object-store NotFound to `DatasetNotFound`; the variant match (vs an
+// existence probe) is exactly right and not over-strict — pinned by
+// `lance_surface_guards.rs::dataset_open_missing_returns_not_found_variant`.
+Err(lance::Error::DatasetNotFound { .. }) | Err(lance::Error::NotFound { .. }) => {
+return Ok((HashMap::new(), None));
+}
+Err(e) => return Err(OmniError::Lance(e.to_string())),
+};
+if let Some(branch) = branch.filter(|b| *b != "main") {
+dataset = dataset
+.checkout_branch(branch)
+.await
+.map_err(|e| OmniError::Lance(e.to_string()))?;
+}
+
+// The actor sidecar may be absent (older graphs without authored commits);
+// an empty actor map then leaves every commit's actor `None`. It is read
+// FLAT (no branch checkout): the pre-Phase-7 commit graph never forked the
+// actor dataset — actors are keyed by `graph_commit_id` globally — so a
+// branch's commits resolve their actor from the same single actor table.
+// This matches the live `CommitGraph::open_at_branch`, which also opens the
+// actor dataset on main while checking out the branch only on the commits
+// dataset.
+let actors_open =
+match crate::failpoints::maybe_fail_lance_open("migration.v3_to_v4.legacy_open") {
+Ok(()) => Dataset::open(&graph_commit_actors_uri(root)).await,
+Err(injected) => Err(injected),
+};
+let actor_by_commit_id = match actors_open {
+Ok(actor_dataset) => load_commit_actor_cache(&actor_dataset).await?,
+// An ABSENT actor sidecar is benign (older graphs without authored
+// commits) — every commit's actor stays `None`. A not-found is therefore
+// the empty-map signal. But a CORRUPT/transient actor open must NOT be
+// read as "no authors": silently wiping all authorship and then stamping
+// v4 is the same permanent-loss hole as the commits arm, so anything
+// other than not-found propagates. (Same variant contract, different
+// rationale — absence is normal here, error is not.)
+Err(lance::Error::DatasetNotFound { .. }) | Err(lance::Error::NotFound { .. }) => {
+HashMap::new()
+}
+Err(e) => return Err(OmniError::Lance(e.to_string())),
+};
+
+load_commit_cache(&dataset, &actor_by_commit_id).await
+}
+
 async fn load_commit_cache(
 dataset: &Dataset,
 actor_by_commit_id: &HashMap<String, String>,
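The doc comment on `load_commit_cache_for_branch` describes a three-way decision on the internal-schema stamp: refuse what this binary cannot serve, read the legacy dataset for an un-migrated graph, otherwise read the manifest projection. A compact sketch of that decision follows. The constant values are assumptions: the diff states that the current version is v4 and the fallback covers v3, but it does not show the value of the minimum supported stamp, so `MIN_SUPPORTED = 3` is hypothetical, as are the `Source` enum and `choose_source` function.

```rust
/// Current internal-schema version (v4 per the doc comment in the diff).
pub const CURRENT: u32 = 4;
/// ASSUMPTION: the oldest stamp this sketch still reads; the real floor is not shown.
pub const MIN_SUPPORTED: u32 = 3;

/// Where the in-memory commit cache is loaded from.
#[derive(Debug, PartialEq)]
pub enum Source {
    /// `graph_commit` / `graph_head` rows projected from `__manifest`.
    Manifest,
    /// Transitional read of the legacy `_graph_commits.lance`.
    Legacy,
}

/// Decide the lineage source for a branch from its schema stamp.
pub fn choose_source(stamp: u32) -> Result<Source, String> {
    if stamp > CURRENT {
        // A newer binary wrote a shape this one cannot interpret.
        return Err(format!("stamp v{stamp} is newer than supported v{CURRENT}"));
    }
    if stamp < MIN_SUPPORTED {
        // Older than the legacy readers this binary still carries.
        return Err(format!("stamp v{stamp} is below the floor v{MIN_SUPPORTED}"));
    }
    if stamp < CURRENT {
        Ok(Source::Legacy)
    } else {
        Ok(Source::Manifest)
    }
}
```

The refusal checks run before the source choice, so an unsupported stamp can never fall through into either reader.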
@@ -694,11 +912,170 @@ async fn open_for_branch(root_uri: &str, branch: Option<&str>) -> Result<CommitG
 }
 }

-fn now_micros() -> Result<i64> {
-let duration = SystemTime::now()
-.duration_since(UNIX_EPOCH)
-.map_err(|e| OmniError::manifest(format!("system clock before UNIX_EPOCH: {}", e)))?;
-Ok(duration.as_micros() as i64)
+/// Identities of the commits written into a synthetic pre-Phase-7 (v3) graph by
+/// [`seed_legacy_v3_lineage`], for assertions after migration.
+//
+// Gated on `test` OR the `failpoints` feature: the v3→v4 migration fault-injection
+// test lives in the `failpoints` integration binary (the fail registry is
+// process-global, so failpoint tests must not run in-source), and that binary
+// compiles the crate without `cfg(test)` — so it needs this fixture under the
+// feature too. Still excluded from release builds.
+#[cfg(any(test, feature = "failpoints"))]
+#[derive(Debug, Clone)]
+pub struct V3LineageFixture {
+/// The genesis (parentless) commit id.
+pub genesis: String,
+/// A direct, authored commit on main (actor `act-a`).
+pub commit_a: String,
+/// A commit tagged to the `feature` branch (actor `act-feature`).
+pub feature_commit: String,
+/// The merge commit on main: parent = `commit_a`, merged_parent =
+/// `feature_commit`, actor `act-merger`. This is the head of main.
+pub merge_commit: String,
+/// Every commit id written, in append order (for count assertions).
+pub all_ids: Vec<String>,
+}
+
+/// Build a synthetic pre-Phase-7 (internal-schema v3) graph at `root_uri`: graph
+/// lineage lives ONLY in `_graph_commits.lance` (+ its actor sidecar), `__manifest`
+/// carries NO `graph_commit`/`graph_head` rows, and the stamp is set to v3. This
+/// reproduces exactly the on-disk shape a graph created by a pre-RFC-013-Phase-7
+/// binary would have, so the v3→v4 migration and the v3-read fallback can be
+/// tested against it.
+///
+/// The lineage is a realistic DAG with a branch + a real merge: genesis → A →
+/// (feature commit, off to the side) → merge(A, feature) at the head of main,
+/// with authored actors on the non-genesis commits. Reaches the dead-on-the-
+/// write-path `append_commit_with_parents`/`append_actor` (still present for
+/// exactly this transitional purpose) to write the legacy rows.
+#[cfg(any(test, feature = "failpoints"))]
+pub async fn seed_legacy_v3_lineage(root_uri: &str) -> Result<V3LineageFixture> {
+let root = root_uri.trim_end_matches('/');
+
+// 1. Create `__manifest` (Phase-7 folds genesis lineage into it) and the
+// EMPTY legacy `_graph_commits.lance`. We then append the v3-style commit
+// rows below — a real v3 graph carried its genesis in `_graph_commits`.
+crate::db::manifest::seed_manifest_for_v3_fixture(root).await?;
+let mut cg = CommitGraph::init(root).await?;
+// Clear the cache that init seeded from the (genesis-bearing) manifest, so
+// the appended rows below are the whole story and parents come out right.
+cg.commit_by_id.clear();
+cg.head_commit = None;
+
+// 2. Append the legacy lineage to `_graph_commits.lance` on main.
+let genesis = cg
+.append_commit_with_parents(None, 1, None, None, None)
+.await?;
+let commit_a = cg
+.append_commit_with_parents(None, 2, Some(&genesis), None, Some("act-a"))
+.await?;
+let feature_commit = cg
+.append_commit_with_parents(Some("feature"), 3, Some(&commit_a), None, Some("act-feature"))
+.await?;
+let merge_commit = cg
+.append_commit_with_parents(
+None,
+4,
+Some(&commit_a),
+Some(&feature_commit),
+Some("act-merger"),
+)
+.await?;
+
+// 3. Strip the genesis lineage rows the Phase-7 init folded into `__manifest`
+// and rewind the stamp to v3, so the manifest matches a true pre-Phase-7
+// graph (no lineage in `__manifest`, stamp v3).
+crate::db::manifest::strip_lineage_and_set_v3_stamp_for_fixture(root).await?;
+
+Ok(V3LineageFixture {
+genesis: genesis.clone(),
+commit_a: commit_a.clone(),
+feature_commit: feature_commit.clone(),
+merge_commit: merge_commit.clone(),
+all_ids: vec![genesis, commit_a, feature_commit, merge_commit],
+})
+}
+
+/// Identities of a synthetic pre-Phase-7 (v3) graph that carries a REAL Lance
+/// branch (built by [`seed_legacy_v3_lineage_with_branch`]).
+#[cfg(test)]
+#[derive(Debug, Clone)]
+pub struct V3BranchedLineageFixture {
+/// The genesis (parentless) commit on main.
+pub genesis: String,
+/// A direct authored commit on main (actor `act-a`). The head of main.
+pub commit_a: String,
+/// A commit on the real `feature` Lance branch (actor `act-branch`),
+/// parented off `commit_a`. The head of `feature`.
+pub branch_commit: String,
+/// The branch name forked on both `_graph_commits.lance` and `__manifest`.
+pub branch: String,
+}
+
+/// Build a synthetic pre-Phase-7 (internal-schema v3) graph at `root_uri` that
+/// carries a REAL Lance branch `feature` on BOTH `_graph_commits.lance` and
+/// `__manifest`, reproducing exactly the on-disk shape of a branched graph
+/// created by a pre-RFC-013-Phase-7 binary:
+///
+/// - `_graph_commits.lance`: main has `genesis → A`; the `feature` Lance branch
+/// adds `branch_commit` (parent `A`). Authored actors land in the FLAT actor
+/// sidecar (the pre-Phase-7 commit graph never forked the actor table).
+/// - `__manifest`: main is stamped v3 with NO lineage rows; the `feature` branch
+/// is forked from main's v3 state, so it too is v3 with NO lineage of its own.
+///
+/// This is the fixture the per-branch v3→v4 migration runs against: it lets a
+/// test prove that migrating the `feature` branch reads the branch's legacy
+/// lineage, writes it into the BRANCH's `__manifest`, and leaves main untouched —
+/// the case the main-only [`seed_legacy_v3_lineage`] cannot exercise.
+#[cfg(test)]
+pub async fn seed_legacy_v3_lineage_with_branch(root_uri: &str) -> Result<V3BranchedLineageFixture> {
+let root = root_uri.trim_end_matches('/');
+
+// 1. `__manifest` (genesis folded by Phase-7 init) + an empty legacy
+// `_graph_commits.lance`. Clear the init-seeded cache so the rows we
+// append below are the whole story.
+crate::db::manifest::seed_manifest_for_v3_fixture(root).await?;
+let mut cg = CommitGraph::init(root).await?;
|
||||
cg.commit_by_id.clear();
|
||||
cg.head_commit = None;
|
||||
|
||||
// 2. Main lineage on `_graph_commits.lance`: genesis → A (authored).
|
||||
let genesis = cg
|
||||
.append_commit_with_parents(None, 1, None, None, None)
|
||||
.await?;
|
||||
let commit_a = cg
|
||||
.append_commit_with_parents(None, 2, Some(&genesis), None, Some("act-a"))
|
||||
.await?;
|
||||
|
||||
// 3. Fork a real `feature` Lance branch on `_graph_commits.lance`, switch the
|
||||
// handle to it, and append an authored branch commit (its actor lands in
|
||||
// the flat main actor table — exactly the pre-Phase-7 shape).
|
||||
cg.create_branch("feature").await?;
|
||||
let commits_ds = cg
|
||||
.dataset
|
||||
.take()
|
||||
.expect("commits dataset present after create_branch")
|
||||
.checkout_branch("feature")
|
||||
.await
|
||||
.map_err(|e| OmniError::Lance(e.to_string()))?;
|
||||
cg.dataset = Some(commits_ds);
|
||||
cg.active_branch = Some("feature".to_string());
|
||||
let branch_commit = cg
|
||||
.append_commit_with_parents(Some("feature"), 3, Some(&commit_a), None, Some("act-branch"))
|
||||
.await?;
|
||||
|
||||
// 4. Rewind main's `__manifest` to the v3 shape (strip the folded genesis
|
||||
// lineage, set stamp 3) BEFORE forking — so the `feature` manifest branch
|
||||
// inherits the stripped v3 state (no lineage, stamp 3).
|
||||
crate::db::manifest::strip_lineage_and_set_v3_stamp_for_fixture(root).await?;
|
||||
crate::db::manifest::fork_manifest_branch_for_v3_fixture(root, "feature").await?;
|
||||
|
||||
Ok(V3BranchedLineageFixture {
|
||||
genesis,
|
||||
commit_a,
|
||||
branch_commit,
|
||||
branch: "feature".to_string(),
|
||||
})
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
|
|
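The main-only fixture above appends four commits whose parent links form a small DAG: genesis, then A, then a `feature` commit parented on A, then a merge commit whose first parent is A and whose merged-in parent is the `feature` commit. The sketch below models just that shape in memory so the links are easy to see. The `Commit` struct, the string labels, and the `reachable` helper are illustrative assumptions, not types from the repository; the real ids are generated at runtime.

```rust
use std::collections::{HashMap, HashSet};

/// One lineage row, reduced to the two parent links the fixture sets.
/// (Hypothetical type for illustration only.)
struct Commit {
    parent: Option<&'static str>,
    merged_parent: Option<&'static str>,
}

/// The four-commit lineage the main-only fixture appends, keyed by
/// illustrative labels rather than real commit ids.
fn fixture_lineage() -> HashMap<&'static str, Commit> {
    let mut graph = HashMap::new();
    graph.insert("genesis", Commit { parent: None, merged_parent: None });
    graph.insert("commit_a", Commit { parent: Some("genesis"), merged_parent: None });
    graph.insert(
        "feature_commit",
        Commit { parent: Some("commit_a"), merged_parent: None },
    );
    graph.insert(
        "merge_commit",
        Commit { parent: Some("commit_a"), merged_parent: Some("feature_commit") },
    );
    graph
}

/// Every commit reachable from `start` through either parent link,
/// with `start` itself included.
fn reachable(
    graph: &HashMap<&'static str, Commit>,
    start: &'static str,
) -> HashSet<&'static str> {
    let mut seen = HashSet::new();
    let mut stack = vec![start];
    while let Some(id) = stack.pop() {
        if !seen.insert(id) {
            continue; // already visited via the other parent link
        }
        if let Some(commit) = graph.get(id) {
            stack.extend(commit.parent);
            stack.extend(commit.merged_parent);
        }
    }
    seen
}
```

Walking from the merge commit reaches all four commits, while walking from A reaches only A and genesis, which is the property a migration test over this fixture would want to preserve.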
@@ -709,6 +1086,83 @@ mod tests {
 
     use super::*;
 
+    // RFC-013 step 4: the v3-read fallback / migration source reads a NAMED
+    // branch's lineage from a real Lance branch on `_graph_commits.lance`, while
+    // resolving actors from the FLAT actor table (the pre-Phase-7 commit graph
+    // forked only the commits dataset, never the actor sidecar). This guards
+    // both that branch-checkout path and the flat-actor resolution — the case
+    // the main-branch fixture (commits on main only) does not exercise.
+    #[tokio::test]
+    async fn read_legacy_commit_cache_resolves_branch_commits_with_flat_actors() {
+        let dir = tempfile::tempdir().unwrap();
+        let uri = dir.path().to_str().unwrap();
+
+        // A v3 graph needs `__manifest` to exist for `CommitGraph::init`'s
+        // genesis-cache seed; we clear that cache and write our own legacy rows.
+        crate::db::manifest::seed_manifest_for_v3_fixture(uri)
+            .await
+            .unwrap();
+        let mut cg = CommitGraph::init(uri).await.unwrap();
+        cg.commit_by_id.clear();
+        cg.head_commit = None;
+
+        // Main lineage: genesis → A (authored). The actor lands in the FLAT
+        // `_graph_commit_actors.lance` (never branched).
+        let genesis = cg
+            .append_commit_with_parents(None, 1, None, None, None)
+            .await
+            .unwrap();
+        let commit_a = cg
+            .append_commit_with_parents(None, 2, Some(&genesis), None, Some("act-a"))
+            .await
+            .unwrap();
+
+        // Fork a real Lance branch on `_graph_commits.lance`, switch the handle
+        // to it, and append an authored branch commit (its actor also goes to
+        // the flat main actor table — exactly the pre-Phase-7 shape).
+        cg.create_branch("feature").await.unwrap();
+        cg.dataset = Some(
+            cg.dataset
+                .take()
+                .unwrap()
+                .checkout_branch("feature")
+                .await
+                .unwrap(),
+        );
+        cg.active_branch = Some("feature".to_string());
+        let branch_commit = cg
+            .append_commit_with_parents(
+                Some("feature"),
+                3,
+                Some(&commit_a),
+                None,
+                Some("act-branch"),
+            )
+            .await
+            .unwrap();
+
+        // The legacy read at the branch sees the inherited main commits + the
+        // branch commit, the head is the branch commit, and the authored actors
+        // resolve from the flat table (no branch checkout on the actor dataset).
+        let (commits, head) = read_legacy_commit_cache(uri, Some("feature")).await.unwrap();
+        assert_eq!(commits.len(), 3, "branch inherits genesis + A + its own commit");
+        assert_eq!(
+            head.as_ref().unwrap().graph_commit_id,
+            branch_commit,
+            "the branch commit is the head"
+        );
+        assert_eq!(
+            commits.get(&commit_a).unwrap().actor_id.as_deref(),
+            Some("act-a"),
+            "main commit's actor resolves from the flat actor table",
+        );
+        assert_eq!(
+            commits.get(&branch_commit).unwrap().actor_id.as_deref(),
+            Some("act-branch"),
+            "branch commit's actor resolves from the flat actor table",
+        );
+    }
+
     #[test]
     fn load_commits_from_batches_returns_error_for_bad_schema() {
         let batch = RecordBatch::try_new(
@@ -106,13 +106,17 @@ impl GraphCoordinator {
         storage: Arc<dyn StorageAdapter>,
     ) -> Result<Self> {
         let root = normalize_root_uri(root_uri)?;
+        // The genesis graph commit is folded into the manifest init write, so
+        // `__manifest` is the single source of graph lineage from version one
+        // (RFC-013 Phase 7). `CommitGraph::init` then creates the empty
+        // branch-ref dataset and seeds its cache from that manifest genesis.
         let manifest = ManifestCoordinator::init(&root, catalog).await?;
-        let commit_graph = Some(CommitGraph::init(&root, manifest.version()).await?);
+        let commit_graph = CommitGraph::init(&root).await?;
         Ok(Self {
             root_uri: root,
             storage,
             manifest,
-            commit_graph,
+            commit_graph: Some(commit_graph),
             bound_branch: None,
         })
     }
@@ -438,7 +442,12 @@ impl GraphCoordinator {
             .exists(&graph_commits_uri(self.root_uri()))
             .await?
         {
-            let _ = CommitGraph::init(self.root_uri(), self.manifest.version()).await?;
+            // A graph opened without a commit-graph dataset gets the empty
+            // branch-ref dataset created lazily here. Graph lineage lives in
+            // `__manifest` (RFC-013 Phase 7) — a graph initialized by current
+            // code already carries its genesis there, and the commit graph
+            // sources its cache from it. No genesis is written here.
+            CommitGraph::init(self.root_uri()).await?;
         }
         self.commit_graph = match self.current_branch() {
             Some(branch) => Some(CommitGraph::open_at_branch(self.root_uri(), branch).await?),
@@ -452,12 +461,8 @@ impl GraphCoordinator {
         updates: &[SubTableUpdate],
         actor_id: Option<&str>,
     ) -> Result<PublishedSnapshot> {
-        let manifest_version = self.commit_manifest_updates(updates).await?;
-        let snapshot_id = self.record_graph_commit(manifest_version, actor_id).await?;
-        Ok(PublishedSnapshot {
-            manifest_version,
-            _snapshot_id: snapshot_id,
-        })
+        self.commit_updates_with_actor_with_expected(updates, &HashMap::new(), actor_id)
+            .await
     }
 
     /// Commit with publisher-level OCC fence. The `expected_table_versions` map
@@ -471,45 +476,9 @@ impl GraphCoordinator {
         expected_table_versions: &HashMap<String, u64>,
         actor_id: Option<&str>,
     ) -> Result<PublishedSnapshot> {
-        let manifest_version = self
-            .commit_manifest_updates_with_expected(updates, expected_table_versions)
-            .await?;
-        let snapshot_id = self.record_graph_commit(manifest_version, actor_id).await?;
-        Ok(PublishedSnapshot {
-            manifest_version,
-            _snapshot_id: snapshot_id,
-        })
-    }
-
-    pub(crate) async fn commit_manifest_updates(
-        &mut self,
-        updates: &[SubTableUpdate],
-    ) -> Result<u64> {
-        let manifest_version = self.manifest.commit(updates).await?;
-        failpoints::maybe_fail(crate::failpoints::names::GRAPH_PUBLISH_AFTER_MANIFEST_COMMIT)?;
-        Ok(manifest_version)
-    }
-
-    pub(crate) async fn commit_manifest_updates_with_expected(
-        &mut self,
-        updates: &[SubTableUpdate],
-        expected_table_versions: &HashMap<String, u64>,
-    ) -> Result<u64> {
-        let manifest_version = self
-            .manifest
-            .commit_with_expected(updates, expected_table_versions)
-            .await?;
-        failpoints::maybe_fail(crate::failpoints::names::GRAPH_PUBLISH_AFTER_MANIFEST_COMMIT)?;
-        Ok(manifest_version)
-    }
-
-    pub(crate) async fn commit_manifest_changes(
-        &mut self,
-        changes: &[ManifestChange],
-    ) -> Result<u64> {
-        let manifest_version = self.manifest.commit_changes(changes).await?;
-        failpoints::maybe_fail(crate::failpoints::names::GRAPH_PUBLISH_AFTER_MANIFEST_COMMIT)?;
-        Ok(manifest_version)
+        let changes = updates_to_changes(updates);
+        self.commit_changes_with_actor_with_expected(&changes, expected_table_versions, actor_id)
+            .await
     }
 
     pub(crate) async fn commit_changes_with_actor(
@@ -517,71 +486,110 @@ impl GraphCoordinator {
         changes: &[ManifestChange],
         actor_id: Option<&str>,
     ) -> Result<PublishedSnapshot> {
-        let manifest_version = self.commit_manifest_changes(changes).await?;
-        let snapshot_id = self.record_graph_commit(manifest_version, actor_id).await?;
+        self.commit_changes_with_actor_with_expected(changes, &HashMap::new(), actor_id)
+            .await
+    }
+
+    /// Publish `changes` and record one graph commit in the SAME manifest CAS
+    /// (RFC-013 Phase 7). The lineage intent (a freshly minted commit id, the
+    /// branch, the actor) rides the publish so the `graph_commit` + `graph_head`
+    /// rows land atomically with the table-version rows — one manifest version,
+    /// no separate write, no `commit_graph.refresh()` to pick a parent (the
+    /// publisher resolves it under the CAS). The in-memory commit cache is then
+    /// updated from the intent + the resolved parent without a re-read.
+    async fn commit_changes_with_actor_with_expected(
+        &mut self,
+        changes: &[ManifestChange],
+        expected_table_versions: &HashMap<String, u64>,
+        actor_id: Option<&str>,
+    ) -> Result<PublishedSnapshot> {
+        self.ensure_commit_graph_initialized().await?;
+        let intent = self.new_lineage_intent(actor_id, None)?;
+        failpoints::maybe_fail(crate::failpoints::names::GRAPH_PUBLISH_BEFORE_COMMIT_APPEND)?;
+        let outcome = self
+            .manifest
+            .commit_changes_with_lineage(changes, expected_table_versions, Some(&intent))
+            .await?;
+        failpoints::maybe_fail(crate::failpoints::names::GRAPH_PUBLISH_AFTER_MANIFEST_COMMIT)?;
+        let snapshot_id = self.apply_lineage_to_cache(intent, &outcome);
         Ok(PublishedSnapshot {
-            manifest_version,
+            manifest_version: outcome.version,
             _snapshot_id: snapshot_id,
         })
     }
 
-    pub(crate) async fn record_graph_commit(
+    /// Publish a branch-merge: `updates` (the merged table versions) plus the
+    /// merge commit, in one manifest CAS (RFC-013 Phase 7). The merge commit's
+    /// merged-in parent is `merged_parent_commit_id` (the source head, stable);
+    /// its first parent is resolved by the publisher as the current target-branch
+    /// head — the live head, which is the post-merge correct parent even if the
+    /// target advanced since the merge began.
+    pub(crate) async fn commit_merge_with_actor(
         &mut self,
-        manifest_version: u64,
-        actor_id: Option<&str>,
-    ) -> Result<SnapshotId> {
-        self.ensure_commit_graph_initialized().await?;
-        let current_branch = self.current_branch().map(str::to_string);
-        let Some(commit_graph) = &mut self.commit_graph else {
-            return Ok(SnapshotId::synthetic(
-                current_branch.as_deref(),
-                manifest_version,
-                self.manifest_incarnation().e_tag.as_deref(),
-            ));
-        };
-        failpoints::maybe_fail(crate::failpoints::names::GRAPH_PUBLISH_BEFORE_COMMIT_APPEND)?;
-        // Refresh the commit-graph head from storage before selecting the
-        // parent. `append_commit` parents the new commit on the IN-MEMORY head
-        // (`head_commit_id`, zero storage read), but the manifest was just
-        // committed against a freshly rebased pin (`commit_all` opens a fresh
-        // coordinator) while THIS coordinator's cached head may be stale because
-        // an external writer advanced the branch. Without this refresh a
-        // same-branch write after an external commit appends off the stale head
-        // and FORKS the commit DAG (the new commit and the external commit
-        // sharing a parent). Refreshing makes the parent the true current head;
-        // the just-committed manifest version has no commit-graph row yet, so the
-        // fresh head is exactly the prior commit. (record_merge_commit is
-        // unaffected — it passes explicit parents, never the cached head.)
-        commit_graph.refresh().await?;
-        let graph_commit_id = commit_graph
-            .append_commit(current_branch.as_deref(), manifest_version, actor_id)
-            .await?;
-        Ok(SnapshotId::new(graph_commit_id))
-    }
-
-    pub(crate) async fn record_merge_commit(
-        &mut self,
-        manifest_version: u64,
-        parent_commit_id: &str,
+        updates: &[SubTableUpdate],
         merged_parent_commit_id: &str,
         actor_id: Option<&str>,
     ) -> Result<SnapshotId> {
         self.ensure_commit_graph_initialized().await?;
-        let current_branch = self.current_branch().map(str::to_string);
-        let commit_graph = self.commit_graph.as_mut().ok_or_else(|| {
-            OmniError::manifest("branch merge requires _graph_commits.lance".to_string())
-        })?;
+        let intent =
+            self.new_lineage_intent(actor_id, Some(merged_parent_commit_id.to_string()))?;
         failpoints::maybe_fail(crate::failpoints::names::GRAPH_PUBLISH_BEFORE_COMMIT_APPEND)?;
-        let graph_commit_id = commit_graph
-            .append_merge_commit(
-                current_branch.as_deref(),
-                manifest_version,
-                parent_commit_id,
-                merged_parent_commit_id,
-                actor_id,
-            )
+        let changes = updates_to_changes(updates);
+        let outcome = self
+            .manifest
+            .commit_changes_with_lineage(&changes, &HashMap::new(), Some(&intent))
             .await?;
-        Ok(SnapshotId::new(graph_commit_id))
+        failpoints::maybe_fail(crate::failpoints::names::GRAPH_PUBLISH_AFTER_MANIFEST_COMMIT)?;
+        Ok(self.apply_lineage_to_cache(intent, &outcome))
     }
 
+    /// Mint a [`LineageIntent`] for the next commit on the current branch: a
+    /// fresh ULID (stable across the publisher's CAS retries) and a timestamp.
+    /// The parent is NOT chosen here — the publisher resolves it per attempt
+    /// against the manifest it commits against.
+    fn new_lineage_intent(
+        &self,
+        actor_id: Option<&str>,
+        merged_parent_commit_id: Option<String>,
+    ) -> Result<crate::db::manifest::LineageIntent> {
+        Ok(crate::db::manifest::LineageIntent {
+            graph_commit_id: ulid::Ulid::new().to_string(),
+            branch: self.current_branch().map(str::to_string),
+            actor_id: actor_id.map(str::to_string),
+            merged_parent_commit_id,
+            created_at: crate::db::now_micros()?,
+        })
+    }
+
+    /// Insert the just-published commit into the in-memory commit cache from the
+    /// intent + the publisher-resolved parent + the new manifest version. No
+    /// storage I/O: the durable write already happened in the publish CAS, and
+    /// this keeps a same-handle read's `head_commit_id` consistent with the
+    /// snapshot it just advanced. Falls back to a synthetic id only when the
+    /// commit graph is somehow absent (never on a real write).
+    fn apply_lineage_to_cache(
+        &mut self,
+        intent: crate::db::manifest::LineageIntent,
+        outcome: &crate::db::manifest::CommitOutcome,
+    ) -> SnapshotId {
+        let Some(commit_graph) = &mut self.commit_graph else {
+            return SnapshotId::synthetic(
+                self.bound_branch.as_deref(),
+                outcome.version,
+                self.manifest.incarnation().e_tag.as_deref(),
+            );
+        };
+        let commit = GraphCommit {
+            graph_commit_id: intent.graph_commit_id.clone(),
+            manifest_branch: intent.branch,
+            manifest_version: outcome.version,
+            parent_commit_id: outcome.parent_commit_id.clone(),
+            merged_parent_commit_id: intent.merged_parent_commit_id,
+            actor_id: intent.actor_id,
+            created_at: intent.created_at,
+        };
+        commit_graph.insert_committed(commit);
+        SnapshotId::new(intent.graph_commit_id)
+    }
+
     async fn open_commit_graph_for_branch(
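The doc comments in this hunk describe a split of responsibilities: the commit id is minted once before publishing, while the parent is resolved inside the publisher on every CAS attempt from the manifest state that attempt loaded. The following is a minimal in-memory sketch of that retry shape, not the repository's publisher. `ToyManifest`, `Intent`, `Outcome`, `try_cas`, and the `interfere` callback are all hypothetical names introduced here, and a plain version counter stands in for the Lance merge-insert.

```rust
/// Toy stand-in for the shared manifest: a version counter plus the
/// current branch head. (Hypothetical; not the real `__manifest`.)
#[derive(Debug, Clone, Default)]
struct ToyManifest {
    version: u64,
    head: Option<String>,
}

/// The part of a commit fixed before publishing: the id is minted once
/// and reused unchanged across retries.
struct Intent {
    commit_id: String,
}

/// What the publish reports back so the caller can update its cache.
#[derive(Debug, PartialEq)]
struct Outcome {
    version: u64,
    parent_commit_id: Option<String>,
}

/// Compare-and-swap: apply only if nobody advanced the manifest since
/// `expected_version` was read.
fn try_cas(shared: &mut ToyManifest, expected_version: u64, new_head: &str) -> bool {
    if shared.version != expected_version {
        return false;
    }
    shared.version += 1;
    shared.head = Some(new_head.to_string());
    true
}

/// Publish with retries. `interfere` simulates a concurrent writer and
/// runs between the load and the CAS of each attempt.
fn publish(
    shared: &mut ToyManifest,
    intent: &Intent,
    mut interfere: impl FnMut(&mut ToyManifest, u32),
) -> Outcome {
    let mut attempt = 0;
    loop {
        // Load a snapshot and resolve the parent from it, per attempt.
        let snapshot = shared.clone();
        let parent = snapshot.head.clone();
        interfere(shared, attempt);
        if try_cas(shared, snapshot.version, &intent.commit_id) {
            return Outcome {
                version: shared.version,
                parent_commit_id: parent,
            };
        }
        // Conflict: loop to re-read the advanced head; the id stays the same.
        attempt += 1;
    }
}
```

In this sketch a conflicting writer on the first attempt forces a second pass, and the commit ends up parented on the writer's head rather than on the stale one, which is the fork the removed `commit_graph.refresh()` comment was guarding against.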
@@ -625,6 +633,15 @@ fn graph_commits_uri(root_uri: &str) -> String {
     join_uri(root_uri, GRAPH_COMMITS_DIR)
 }
 
+/// Wrap each `SubTableUpdate` as a `ManifestChange::Update` for the publisher.
+fn updates_to_changes(updates: &[SubTableUpdate]) -> Vec<ManifestChange> {
+    updates
+        .iter()
+        .cloned()
+        .map(ManifestChange::Update)
+        .collect()
+}
+
 fn normalize_branch_name(branch: &str) -> Result<Option<String>> {
     let branch = branch.trim();
     if branch.is_empty() {
@@ -35,7 +35,9 @@ pub(crate) use metadata::TableVersionMetadata;
 use metadata::{OMNIGRAPH_ROW_COUNT_KEY, table_version_metadata_for_state};
 #[cfg(test)]
 use namespace::{branch_manifest_namespace, staged_table_namespace};
-use publisher::{GraphNamespacePublisher, ManifestBatchPublisher};
+pub(crate) use migrations::refuse_if_stamp_unsupported;
+pub(crate) use publisher::LineageIntent;
+use publisher::{GraphNamespacePublisher, ManifestBatchPublisher, PublishOutcome};
 pub(crate) use recovery::{
     RecoveryMode, RecoverySidecar, RecoverySidecarHandle, SidecarKind, SidecarTablePin,
     SidecarTableRegistration, SidecarTombstone, confirm_sidecar_phase_b, delete_sidecar,
@@ -43,6 +45,7 @@ pub(crate) use recovery::{
     recover_manifest_drift, schema_apply_serial_queue_key, write_sidecar,
 };
 pub use state::SubTableEntry;
+pub(crate) use state::{GraphLineageRow, read_graph_lineage};
 #[cfg(test)]
 use state::string_column;
 use state::{ManifestState, read_manifest_state};
@@ -50,8 +53,34 @@ use state::{ManifestState, read_manifest_state};
 const OBJECT_TYPE_TABLE: &str = "table";
 const OBJECT_TYPE_TABLE_VERSION: &str = "table_version";
 const OBJECT_TYPE_TABLE_TOMBSTONE: &str = "table_tombstone";
+/// Immutable per-commit graph-lineage row (RFC-013 Phase 7). One row per graph
+/// commit; the projected form reconstructs a [`GraphCommit`]. `__manifest` is
+/// the single source — written in the same publish CAS as the table-version
+/// rows (no `_graph_commits.lance` row).
+const OBJECT_TYPE_GRAPH_COMMIT: &str = "graph_commit";
+/// Mutable per-branch head pointer for the graph lineage (RFC-013 Phase 7).
+/// `object_id` is `graph_head:<branch>` (`graph_head:main` for the main branch).
+const OBJECT_TYPE_GRAPH_HEAD: &str = "graph_head";
 const TABLE_VERSION_MANAGEMENT_KEY: &str = "table_version_management";
 
+/// Stable head-key segment for the main branch in `graph_head:<branch>` rows.
+/// `table_branch`/`manifest_branch` encode main as null, but `object_id` must be
+/// non-null, so the head row needs a literal — matching the `"main"` sentinel
+/// already used by `SnapshotId::synthetic` and `open_for_branch`.
+pub(crate) const MAIN_BRANCH_HEAD_KEY: &str = "main";
+
+/// The result of a manifest commit that may have folded in a graph commit
+/// (RFC-013 Phase 7).
+#[derive(Debug, Clone)]
+pub(crate) struct CommitOutcome {
+    /// The new `__manifest` version after the publish.
+    pub version: u64,
+    /// The parent the publisher resolved for the recorded commit, or `None` when
+    /// no lineage was recorded or the commit is the genesis. Lets the caller
+    /// update its in-memory commit cache without re-reading the manifest.
+    pub parent_commit_id: Option<String>,
+}
+
 /// Apply pending internal-schema migrations against `__manifest` on the
 /// open-for-write path, independent of a publish.
 ///
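The constants in this hunk fix the key scheme for head rows: `object_id` is `graph_head:<branch>`, with the literal `main` standing in for the main branch because the branch is otherwise encoded as null. A small sketch of how such a key could be built is shown here. The two constants mirror the ones in the diff; the `graph_head_object_id` helper is a hypothetical name for illustration and may not match how the publisher actually assembles the id.

```rust
/// Mirrors the constants introduced in the diff.
const OBJECT_TYPE_GRAPH_HEAD: &str = "graph_head";
const MAIN_BRANCH_HEAD_KEY: &str = "main";

/// Hypothetical helper: build the `object_id` of a head row.
/// `None` means the main branch, which needs a non-null literal segment.
fn graph_head_object_id(branch: Option<&str>) -> String {
    let segment = branch.unwrap_or(MAIN_BRANCH_HEAD_KEY);
    format!("{OBJECT_TYPE_GRAPH_HEAD}:{segment}")
}
```

Under these assumptions, `graph_head_object_id(None)` yields `graph_head:main` and `graph_head_object_id(Some("feature"))` yields `graph_head:feature`.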
@@ -65,7 +94,105 @@ const TABLE_VERSION_MANAGEMENT_KEY: &str = "table_version_management";
 /// Idempotent: a no-op stamp read when the on-disk version already matches.
 pub(crate) async fn migrate_on_open(root_uri: &str) -> Result<()> {
     let mut dataset = open_manifest_dataset(root_uri, None).await?;
-    migrations::migrate_internal_schema(&mut dataset).await
+    // Main branch: the v3→v4 lineage backfill reads `_graph_commits.lance` at
+    // main. Named branches migrate on their own first write via the publisher.
+    migrations::migrate_internal_schema(&mut dataset, root_uri, None).await
 }
 
+/// The on-disk internal-schema stamp of `__manifest` at `branch` (main when
+/// `None`). The transitional v3-read fallback in `CommitGraph` uses this to
+/// decide whether to source lineage from `__manifest` (stamp ≥ v4, post-Phase-7)
+/// or from the legacy `_graph_commits.lance` (stamp < v4, not yet migrated).
+pub(crate) async fn internal_schema_stamp_at(root_uri: &str, branch: Option<&str>) -> Result<u32> {
+    let dataset = open_manifest_dataset(root_uri, branch).await?;
+    Ok(migrations::read_stamp(&dataset))
+}
+
+/// Refuse to open a graph whose `__manifest` is stamped outside this binary's
+/// supported internal-schema range (newer than CURRENT, or older than
+/// MIN_SUPPORTED). The read-only open path calls this — it skips the write-path
+/// migration where the refusal otherwise lives — so an old binary still refuses a
+/// newer graph instead of silently misreading it, and a too-new binary refuses a
+/// below-floor graph instead of opening an unmigrated one.
+pub(crate) async fn refuse_if_internal_schema_unsupported(root_uri: &str) -> Result<()> {
+    let stamp = internal_schema_stamp_at(root_uri, None).await?;
+    migrations::refuse_if_stamp_unsupported(stamp)
+}
+
+/// The internal-schema version this binary writes. Exposed so the v3-read
+/// fallback can compare a branch's on-disk stamp against it.
+pub(crate) const INTERNAL_MANIFEST_SCHEMA_VERSION: u32 =
+    migrations::INTERNAL_MANIFEST_SCHEMA_VERSION;
+
+/// Test-only: create a `__manifest` for a minimal catalog, the first half of a
+/// synthetic pre-Phase-7 (v3) graph (see `commit_graph::seed_legacy_v3_lineage`).
+/// A small two-type schema is enough — the v3→v4 migration touches only the
+/// lineage rows, never the table-version rows.
+#[cfg(any(test, feature = "failpoints"))]
+pub(crate) async fn seed_manifest_for_v3_fixture(root_uri: &str) -> Result<()> {
+    let schema = omnigraph_compiler::schema::parser::parse_schema(
+        "node Person { name: String }\nedge Knows: Person -> Person { }\n",
+    )
+    .map_err(|e| OmniError::manifest(e.to_string()))?;
+    let catalog =
+        omnigraph_compiler::catalog::build_catalog(&schema).map_err(|e| OmniError::manifest(e.to_string()))?;
+    ManifestCoordinator::init(root_uri, &catalog).await?;
+    Ok(())
+}
+
+/// Test-only: strip the `graph_commit`/`graph_head` rows that Phase-7 init folds
+/// into `__manifest`, then rewind the internal-schema stamp to v3 — completing a
+/// synthetic pre-Phase-7 graph whose lineage lives only in `_graph_commits.lance`.
+#[cfg(any(test, feature = "failpoints"))]
+pub(crate) async fn strip_lineage_and_set_v3_stamp_for_fixture(root_uri: &str) -> Result<()> {
+    let mut dataset = open_manifest_dataset(root_uri, None).await?;
+    dataset
+        .delete("object_type = 'graph_commit' OR object_type = 'graph_head'")
+        .await
+        .map_err(|e| OmniError::Lance(e.to_string()))?;
+    // Re-open so the stamp write lands on the post-delete HEAD.
+    let mut dataset = open_manifest_dataset(root_uri, None).await?;
+    migrations::set_stamp_for_test(&mut dataset, 3).await
+}
+
+/// Test-only: fork a real Lance branch `name` on `__manifest` from main's CURRENT
+/// state. Call AFTER `strip_lineage_and_set_v3_stamp_for_fixture` so the forked
+/// branch inherits the v3 stamp with no lineage rows — i.e. a faithful
+/// pre-Phase-7 branch whose `__manifest` carries no lineage of its own. The
+/// branch's commits live only on the `_graph_commits.lance` branch until the
+/// per-branch v3→v4 migration runs against this branch's `__manifest`.
+#[cfg(test)]
+pub(crate) async fn fork_manifest_branch_for_v3_fixture(root_uri: &str, name: &str) -> Result<()> {
+    let mut dataset = open_manifest_dataset(root_uri, None).await?;
+    let version = dataset.version().version;
+    dataset
+        .create_branch(name, version, None)
+        .await
+        .map_err(|e| OmniError::Lance(e.to_string()))?;
+    Ok(())
+}
+
+/// Test-support re-export of the read-write migration entry point for the
+/// `failpoints` integration binary (which can't reach `pub(crate)` items). Gated
+/// on `test` OR `failpoints`; never in a release build.
+#[cfg(any(test, feature = "failpoints"))]
+pub async fn migrate_on_open_for_test(root_uri: &str) -> Result<()> {
+    migrate_on_open(root_uri).await
+}
+
+/// Test-support: the number of `graph_commit` lineage rows in `__manifest` at
+/// `branch` (main when `None`), plus the on-disk internal-schema stamp. Lets the
+/// `failpoints` integration binary assert the migration neither stamped nor
+/// backfilled when a legacy-open fault fired. Gated on `test` OR `failpoints`.
+#[cfg(any(test, feature = "failpoints"))]
+pub async fn lineage_row_count_and_stamp_for_test(
+    root_uri: &str,
+    branch: Option<&str>,
+) -> Result<(usize, u32)> {
+    let dataset = open_manifest_dataset(root_uri, branch).await?;
+    let stamp = migrations::read_stamp(&dataset);
+    let (rows, _heads) = read_graph_lineage(&dataset).await?;
+    Ok((rows.len(), stamp))
+}
+
 /// Immutable point-in-time view of the database.
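Two stamp-driven decisions are documented in this hunk: the open path refuses a graph stamped above CURRENT or below MIN_SUPPORTED, and the v3-read fallback sources lineage from `__manifest` at stamp v4 or later and from the legacy `_graph_commits.lance` otherwise. The sketch below shows both checks as pure functions. The numeric values are assumptions: `CURRENT = 4` is inferred from the "v3→v4 migration" wording, `MIN_SUPPORTED = 3` is a guess for illustration, and the `String` error and `LineageSource` enum are hypothetical stand-ins for the crate's own types.

```rust
/// Assumed values for illustration only; the real constants live in the
/// crate's `migrations` module and may differ.
const CURRENT: u32 = 4;
const MIN_SUPPORTED: u32 = 3;

/// Hypothetical enum naming the two places lineage can be read from.
#[derive(Debug, PartialEq)]
enum LineageSource {
    Manifest,
    LegacyCommitGraph,
}

/// Refuse any stamp outside the supported range, in either direction.
fn refuse_if_stamp_unsupported(stamp: u32) -> Result<(), String> {
    if stamp > CURRENT {
        return Err(format!("graph stamped v{stamp} is newer than this binary (v{CURRENT})"));
    }
    if stamp < MIN_SUPPORTED {
        return Err(format!("graph stamped v{stamp} is below the supported floor (v{MIN_SUPPORTED})"));
    }
    Ok(())
}

/// Pick the lineage source from a branch's on-disk stamp: v4 and later
/// read the manifest projection, earlier stamps read the legacy dataset.
fn lineage_source_for(stamp: u32) -> LineageSource {
    if stamp >= 4 {
        LineageSource::Manifest
    } else {
        LineageSource::LegacyCommitGraph
    }
}
```

Keeping both checks as functions of the stamp alone makes the boundary cases (exactly at the floor, exactly at the current version, one past either end) straightforward to test without touching storage.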
@@ -313,6 +440,9 @@ impl ManifestCoordinator {
     /// Create a new graph at `root_uri` from a catalog.
     ///
     /// Creates per-type Lance datasets and the namespace `__manifest` table.
+    /// The genesis graph commit is folded into the init write, so `__manifest`
+    /// is the single source of graph lineage from version one — callers read it
+    /// back through the lineage projection rather than via a second write.
     pub async fn init(root_uri: &str, catalog: &Catalog) -> Result<Self> {
         let root = root_uri.trim_end_matches('/');
         let (dataset, known_state) = init_manifest_graph(root, catalog).await?;
@@ -419,17 +549,58 @@ impl ManifestCoordinator {
         changes: &[ManifestChange],
         expected_table_versions: &HashMap<String, u64>,
     ) -> Result<u64> {
-        if changes.is_empty() && expected_table_versions.is_empty() {
-            return Ok(self.version());
+        Ok(self
+            .commit_changes_with_lineage(changes, expected_table_versions, None)
+            .await?
+            .version)
+    }
+
+    /// Publish `changes` and, when `lineage` is present, record the graph commit
+    /// in the SAME merge-insert (RFC-013 Phase 7). `__manifest` is the single
+    /// source of graph lineage: the `graph_commit` + `graph_head:<branch>` rows
+    /// ride the table-version publish so the whole commit lands at one manifest
+    /// version — no separate write, no manifest→commit-graph atomicity gap, no
+    /// per-write commit-graph refresh. Returns the new version and the parent the
+    /// publisher resolved for the commit (so the caller can update its in-memory
+    /// commit cache without a re-read).
+    pub(crate) async fn commit_changes_with_lineage(
+        &mut self,
+        changes: &[ManifestChange],
+        expected_table_versions: &HashMap<String, u64>,
+        lineage: Option<&LineageIntent>,
+    ) -> Result<CommitOutcome> {
+        if changes.is_empty() && expected_table_versions.is_empty() && lineage.is_none() {
+            return Ok(CommitOutcome {
+                version: self.version(),
+                parent_commit_id: None,
+            });
         }
 
-        self.dataset = self
+        let PublishOutcome {
+            dataset,
+            parent_commit_id,
+        } = self
             .publisher
-            .publish(changes, expected_table_versions)
+            .publish(changes, expected_table_versions, lineage)
             .await?;
+        self.dataset = dataset;
 
         self.known_state = read_manifest_state(&self.dataset).await?;
-        Ok(self.version())
+        Ok(CommitOutcome {
+            version: self.version(),
+            parent_commit_id,
+        })
     }
 
+    /// Project the graph-lineage rows out of `__manifest` at `branch` without an
+    /// open coordinator. Opens the manifest fresh; used by `CommitGraph` to
+    /// source its in-memory cache from the manifest projection.
+    pub(crate) async fn read_graph_lineage_at(
+        root_uri: &str,
+        branch: Option<&str>,
+    ) -> Result<(Vec<GraphLineageRow>, HashMap<String, String>)> {
+        let dataset = open_manifest_dataset(root_uri, branch).await?;
+        read_graph_lineage(&dataset).await
+    }
+
     /// Current manifest version.
@@ -14,9 +14,17 @@ use super::layout::{manifest_uri, open_manifest_dataset, type_name_hash};
|
|||
use super::metadata::TableVersionMetadata;
|
||||
use super::migrations::stamp_current_version;
|
||||
use super::state::{
|
||||
-    ManifestState, SubTableEntry, entries_to_batch, manifest_schema, read_manifest_state,
+    GraphLineageRow, ManifestState, SubTableEntry, entries_to_batch, graph_lineage_row_parts,
+    manifest_schema, read_manifest_state,
|
||||
};
|
||||
|
||||
/// The manifest version the init `Dataset::write` produces (Lance datasets start
|
||||
/// at version one). The genesis graph commit pins this version — a snapshot at
|
||||
/// it is the empty, freshly-initialized graph. The two config-only commits that
|
||||
/// follow (`update_config`, `stamp_current_version`) advance the live manifest
|
||||
/// version but add no table data, so genesis correctly stays pinned at one.
|
||||
const GENESIS_MANIFEST_VERSION: u64 = 1;
|
||||
|
||||
pub(super) async fn init_manifest_graph(
|
||||
root_uri: &str,
|
||||
catalog: &Catalog,
|
||||
|
|
@@ -24,7 +32,21 @@ pub(super) async fn init_manifest_graph(
|
|||
let root = root_uri.trim_end_matches('/');
|
||||
let (entries, version_metadata) = build_initial_entries(root, catalog).await?;
|
||||
|
||||
-    let manifest_batch = entries_to_batch(&entries, &version_metadata)?;
+    // Genesis graph commit: parentless, actorless, minted once and folded into
+    // the init write so `__manifest` is the single source of graph lineage from
+    // version one (no `_graph_commits.lance` row, no separate publish).
+    let genesis = GraphLineageRow {
+        graph_commit_id: ulid::Ulid::new().to_string(),
+        manifest_branch: None,
+        manifest_version: GENESIS_MANIFEST_VERSION,
+        parent_commit_id: None,
+        merged_parent_commit_id: None,
+        actor_id: None,
+        created_at: crate::db::now_micros()?,
+    };
+    let genesis_lineage = graph_lineage_row_parts(&genesis, None)?;
+
+    let manifest_batch = entries_to_batch(&entries, &version_metadata, &genesis_lineage)?;
|
||||
let schema = manifest_schema();
|
||||
let reader = RecordBatchIterator::new(vec![Ok(manifest_batch)], schema);
|
||||
let params = WriteParams {
|
||||
|
|
|
|||
|
|
@@ -37,6 +37,9 @@ use lance::Dataset;
|
|||
|
||||
use crate::error::{OmniError, Result};
|
||||
|
||||
use crate::db::commit_graph::GraphCommit;
|
||||
use super::state::{GraphLineageRow, graph_lineage_row_parts, merge_lineage_rows, read_graph_lineage};
|
||||
|
||||
/// Current internal schema version this binary expects to find on disk.
|
||||
///
|
||||
/// History:
|
||||
|
|
@@ -50,14 +53,62 @@ use crate::error::{OmniError, Result};
|
|||
/// `__manifest` dataset by the pre-v0.4.0 Run state machine (removed in
|
||||
/// MR-771). Once swept, the `is_internal_run_branch` defense-in-depth guard
|
||||
/// is no longer needed (MR-770).
|
||||
-pub(super) const INTERNAL_MANIFEST_SCHEMA_VERSION: u32 = 3;
+/// - v4 — RFC-013 Phase 7 folds graph lineage into `__manifest` as
+/// `graph_commit`/`graph_head` rows written in the publish CAS. A pre-Phase-7
+/// (v3) graph has its lineage only in `_graph_commits.lance`, so the new
+/// binary would read an empty commit DAG. This one-time per-branch backfill
+/// copies the lineage from `_graph_commits.lance` into `__manifest`
+/// (`migrate_v3_to_v4`). `_graph_commits.lance` is left in place as the
+/// branch-ref carrier; no commit rows are ever written to it again.
+pub(crate) const INTERNAL_MANIFEST_SCHEMA_VERSION: u32 = 4;
|
||||
|
||||
/// The oldest on-disk internal-schema stamp this binary will open. A graph below
|
||||
/// this floor is refused (`refuse_if_stamp_unsupported`) with a "migrate it
|
||||
/// forward with an older release first" error, instead of obliging this binary to
|
||||
/// carry that version's `migrate_vN_…` arm and the legacy readers it needs
|
||||
/// forever. Raising the floor is how the migration chain sheds old code.
|
||||
///
|
||||
/// **Retirement runbook** — turning "accumulates forever" into a sliding window:
|
||||
/// 1. *Shed version N* once no graph below `N+1` remains in the fleet: bump this
|
||||
/// floor AND `LOWEST_REGISTERED_MIGRATION_SOURCE` to `N+1`, then delete the
|
||||
/// `N =>` arm in `migrate_internal_schema`, `migrate_vN_to_vN+1`, and its
|
||||
/// helpers + tests. The compile-time tripwire keeps the two consts in lockstep,
/// so a half-done shed fails the build.
|
||||
/// 2. *Retire the v3 legacy readers entirely* once MIN ≥ 4: `git rm` the
|
||||
/// `commit_graph/commit_graph_legacy_v3.rs` seam file and flip the single
|
||||
/// `stamp < CURRENT` gate in `load_commit_cache_for_branch` to read the
|
||||
/// manifest projection unconditionally.
|
||||
///
|
||||
/// MIN = 1 today is a pure no-op: `read_stamp` floors an absent stamp at 1 and no
|
||||
/// real graph carries 0, so nothing is refused.
|
||||
pub(crate) const MIN_SUPPORTED_INTERNAL_SCHEMA_VERSION: u32 = 1;
|
||||
|
||||
/// The lowest `current` value the `migrate_internal_schema` dispatcher still has a
|
||||
/// `match` arm for. Mirrors the lowest registered migration source so a floor bump
|
||||
/// that forgets to delete the now-dead arm (or vice versa) is caught by the
|
||||
/// compile-time tripwire below. Migration arms aren't an enumerable registry, so
|
||||
/// this hand-mirrored const is the minimal enforced coupling — cheaper than
|
||||
/// reshaping the dispatcher into a data-driven table.
|
||||
const LOWEST_REGISTERED_MIGRATION_SOURCE: u32 = 1;
|
||||
|
||||
/// Retirement tripwire (compile-time): the refusal floor and the lowest migration
|
||||
/// arm must move together. Raising `MIN_SUPPORTED` without deleting the now-dead
|
||||
/// below-floor arm — or vice versa — fails the build with this message, which is
|
||||
/// stronger than a runtime test and impossible to skip. Migration arms can't be
|
||||
/// enumerated, so this const-mirror is the check.
|
||||
const _: () = assert!(
|
||||
LOWEST_REGISTERED_MIGRATION_SOURCE == MIN_SUPPORTED_INTERNAL_SCHEMA_VERSION,
|
||||
"internal-schema floor drifted from the lowest registered migration arm: when raising \
|
||||
MIN_SUPPORTED_INTERNAL_SCHEMA_VERSION, delete every below-floor `N =>` arm + migrate_vN_… \
|
||||
+ its helpers/tests and bump LOWEST_REGISTERED_MIGRATION_SOURCE to match (or vice versa)",
|
||||
);
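The tripwire above relies on a `const` item whose initializer is an `assert!`: the expression is evaluated during compilation, so a mismatch is a build error rather than a test failure. A minimal standalone sketch of that pattern follows. `FLOOR`, `LOWEST_ARM`, and the `lockstep` helper are hypothetical names chosen for illustration and are not part of omnigraph.

```rust
// Hypothetical stand-ins for the refusal floor and the lowest migration arm.
const FLOOR: u32 = 1;
const LOWEST_ARM: u32 = 1;

// Illustrative helper (an assumption, not omnigraph code): a `const fn` can be
// called both at compile time and at runtime.
const fn lockstep(floor: u32, lowest_arm: u32) -> bool {
    floor == lowest_arm
}

// Evaluated while compiling: changing only one of the two constants above
// makes this assertion fail and stops the build with the message below.
const _: () = assert!(
    lockstep(FLOOR, LOWEST_ARM),
    "floor drifted from the lowest registered migration arm",
);
```

Because the check lives in a `const` item, it cannot be skipped by filtering tests or by building without the test harness.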
|
||||
|
||||
const INTERNAL_SCHEMA_VERSION_KEY: &str = "omnigraph:internal_schema_version";
|
||||
const OBJECT_ID_PK_KEY: &str = "lance-schema:unenforced-primary-key";
|
||||
|
||||
/// Read the on-disk stamp from `__manifest`'s schema-level metadata.
|
||||
/// Absent ⇒ v1 (pre-stamp world).
|
||||
-pub(super) fn read_stamp(dataset: &Dataset) -> u32 {
+pub(crate) fn read_stamp(dataset: &Dataset) -> u32 {
|
||||
dataset
|
||||
.schema()
|
||||
.metadata
|
||||
|
|
@@ -72,20 +123,52 @@ pub(super) async fn stamp_current_version(dataset: &mut Dataset) -> Result<()> {
|
|||
set_stamp(dataset, INTERNAL_MANIFEST_SCHEMA_VERSION).await
|
||||
}
|
||||
|
||||
/// Refuse to open a manifest whose stamp this binary cannot serve — in either
|
||||
/// direction — with a clear upgrade path. Shared by every place a stamp is read
|
||||
/// and enforced: the write-path migration dispatcher, the read-only open guard,
|
||||
/// and the branch lineage-read path. Checking both bounds in one function means a
|
||||
/// new stamp-reading caller gets the floor and the ceiling together and cannot
|
||||
/// half-enforce.
|
||||
///
|
||||
/// - `stamp > CURRENT`: the graph was written by a newer binary — upgrade omnigraph.
|
||||
/// - `stamp < MIN_SUPPORTED`: the graph predates the oldest migration this binary
|
||||
/// still carries — migrate it forward with an older release first, then reopen.
|
||||
pub(crate) fn refuse_if_stamp_unsupported(stamp: u32) -> Result<()> {
|
||||
if stamp > INTERNAL_MANIFEST_SCHEMA_VERSION {
|
||||
return Err(OmniError::manifest(format!(
|
||||
"__manifest is stamped at internal schema v{} but this binary expects v{} \
|
||||
— upgrade omnigraph before opening this graph",
|
||||
stamp, INTERNAL_MANIFEST_SCHEMA_VERSION,
|
||||
)));
|
||||
}
|
||||
if stamp < MIN_SUPPORTED_INTERNAL_SCHEMA_VERSION {
|
||||
return Err(OmniError::manifest(format!(
|
||||
"__manifest is stamped at internal schema v{} but this binary supports v{} or later \
|
||||
— open it with an older omnigraph release to migrate it forward first, then reopen",
|
||||
stamp, MIN_SUPPORTED_INTERNAL_SCHEMA_VERSION,
|
||||
)));
|
||||
}
|
||||
Ok(())
|
||||
}
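The two-sided guard above can be illustrated in isolation. The sketch below is a simplified model, not the omnigraph implementation: `MIN_SUPPORTED`, `CURRENT`, `StampRefusal`, and `check_stamp` are hypothetical names, and a typed enum stands in for the formatted `OmniError::manifest` messages so the two refusal directions are easy to tell apart.

```rust
// Hypothetical bounds for illustration; the diff defines the real constants.
const MIN_SUPPORTED: u32 = 1;
const CURRENT: u32 = 4;

/// Which side of the supported window a stamp fell on (illustrative type).
#[derive(Debug, PartialEq)]
enum StampRefusal {
    /// Written by a newer binary: the fix is to upgrade.
    TooNew { stamp: u32, current: u32 },
    /// Older than the oldest migration still carried: migrate with an older release first.
    TooOld { stamp: u32, min_supported: u32 },
}

/// Check both bounds in one place so no caller can enforce only one of them.
fn check_stamp(stamp: u32) -> Result<(), StampRefusal> {
    if stamp > CURRENT {
        return Err(StampRefusal::TooNew { stamp, current: CURRENT });
    }
    if stamp < MIN_SUPPORTED {
        return Err(StampRefusal::TooOld { stamp, min_supported: MIN_SUPPORTED });
    }
    Ok(())
}
```

With these assumed bounds, every stamp from 1 through 4 is accepted, 5 is refused as too new, and 0 is refused as too old.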
|
||||
|
||||
/// Apply any pending internal-schema migrations to the manifest dataset.
|
||||
///
|
||||
/// Idempotent: when the on-disk stamp matches the binary, this is a single
|
||||
/// metadata read with no writes.
|
||||
-pub(super) async fn migrate_internal_schema(dataset: &mut Dataset) -> Result<()> {
+///
+/// `root_uri` + `branch` identify which graph + branch this `dataset` is a
+/// manifest for. The v3→v4 lineage backfill needs them to read that branch's
+/// `_graph_commits.lance`. `migrate_on_open` passes the main branch
+/// (`branch = None`); the publisher's `load_publish_state` passes its own
+/// branch, so each branch backfills on its first write.
+pub(super) async fn migrate_internal_schema(
+    dataset: &mut Dataset,
+    root_uri: &str,
+    branch: Option<&str>,
+) -> Result<()> {
     let mut current = read_stamp(dataset);

-    if current > INTERNAL_MANIFEST_SCHEMA_VERSION {
-        return Err(OmniError::manifest(format!(
-            "__manifest is stamped at internal schema v{} but this binary expects v{} \
-             — upgrade omnigraph before opening this graph for writes",
-            current, INTERNAL_MANIFEST_SCHEMA_VERSION,
-        )));
-    }
+    refuse_if_stamp_unsupported(current)?;
|
||||
|
||||
while current < INTERNAL_MANIFEST_SCHEMA_VERSION {
|
||||
match current {
|
||||
|
|
@@ -97,6 +180,10 @@ pub(super) async fn migrate_internal_schema(dataset: &mut Dataset) -> Result<()>
|
|||
migrate_v2_to_v3(dataset).await?;
|
||||
current = 3;
|
||||
}
|
||||
3 => {
|
||||
migrate_v3_to_v4(dataset, root_uri, branch).await?;
|
||||
current = 4;
|
||||
}
|
||||
other => {
|
||||
return Err(OmniError::manifest_internal(format!(
|
||||
"no internal-schema migration registered for v{} → v{}",
|
||||
|
|
@@ -202,6 +289,218 @@ async fn migrate_v2_to_v3(dataset: &mut Dataset) -> Result<()> {
|
|||
set_stamp(dataset, 3).await
|
||||
}
|
||||
|
||||
/// v3 → v4: backfill the graph lineage from `_graph_commits.lance` into
|
||||
/// `__manifest`, then bump the stamp.
|
||||
///
|
||||
/// RFC-013 Phase 7 made `__manifest` the single source of graph lineage
|
||||
/// (`graph_commit` / `graph_head:<branch>` rows, written in the publish CAS).
|
||||
/// A pre-Phase-7 (v3) graph has its lineage only in `_graph_commits.lance` and
|
||||
/// none in `__manifest`, so the new binary would read an EMPTY commit DAG. This
|
||||
/// one-time per-branch migration copies that branch's commits + the single head
|
||||
/// into `__manifest` so reads see the real history. `_graph_commits.lance`
|
||||
/// itself is left untouched as the branch-ref carrier (no commit row is ever
|
||||
/// written to it again).
|
||||
///
|
||||
/// `dataset` is the `__manifest` for `branch` (main when `branch` is `None`);
|
||||
/// the migration runs per-branch on that branch's first write, so it reads
|
||||
/// `_graph_commits.lance` at the SAME branch.
|
||||
///
|
||||
/// Idempotency + crash recovery: the stamp bump is the LAST step, and the
|
||||
/// lineage merge is keyed on `object_id` (re-inserting the same commit rows is a
|
||||
/// no-op update). A crash after the merge but before the stamp bump re-enters
|
||||
/// here at v3 and re-runs harmlessly. As a fast path, if `__manifest` already
|
||||
/// carries `graph_commit` rows (a previous run completed the merge), we skip
|
||||
/// straight to the stamp bump.
|
||||
///
|
||||
/// Concurrent runners: two processes (or two open-for-write handles) can open the
|
||||
/// same legacy graph at once and both reach the backfill merge. `merge_lineage_rows`
|
||||
/// uses `conflict_retries(0)`, so the row-level CAS loser on `graph_head:<branch>`
|
||||
/// must be re-driven here rather than failing the open — `migrate_v2_to_v3` is
|
||||
/// concurrent-runner idempotent and this step must be too. The bounded loop
|
||||
/// re-reads the fast path (a concurrent winner's merge is one atomic Lance commit,
|
||||
/// so a re-read sees either zero or all of its rows, never partial), re-opens the
|
||||
/// stale handle past the winner's commit, and retries. On budget exhaustion it
|
||||
/// returns a `RowLevelCasContention`-typed error so the publisher's OUTER retry
|
||||
/// loop (which only re-runs `is_retryable_publish_conflict` conflicts) completes
|
||||
/// it on the next attempt — the same converge-on-next-attempt contract the
|
||||
/// recovery sweep uses.
|
||||
async fn migrate_v3_to_v4(
|
||||
dataset: &mut Dataset,
|
||||
root_uri: &str,
|
||||
branch: Option<&str>,
|
||||
) -> Result<()> {
|
||||
// Mirror the publisher's budget (`publisher::PUBLISHER_RETRY_BUDGET = 5`); kept
|
||||
// as a local const rather than re-exporting that private one — the two are the
|
||||
// same shape (bounded row-level-CAS retries) but independent knobs.
|
||||
const MIGRATION_MERGE_RETRY_BUDGET: u32 = 5;
|
||||
|
||||
// Exclusive range + an unguarded retryable arm (see `commit_v4_stamp_idempotently`
|
||||
// for the rationale): every retryable conflict re-opens and retries inside the
|
||||
// loop, and the SINGLE reachable exhaustion path is the typed contention return
|
||||
// below — so the retryable variant can never fall through to the `Err(err)`
|
||||
// propagate arm on the last iteration.
|
||||
for _ in 0..MIGRATION_MERGE_RETRY_BUDGET {
|
||||
// Fast path / idempotency + concurrent-winner guard: if the backfill
|
||||
// already landed (a previous run, OR a concurrent runner that won the CAS
|
||||
// — its merge is atomic, so this is all-or-nothing), don't re-merge — just
|
||||
// (re)stamp. `dataset` is re-opened past any winner's commit below, so this
|
||||
// re-read sees the winner's rows on a retry.
|
||||
let (existing_lineage, _heads) = read_graph_lineage(dataset).await?;
|
||||
if !existing_lineage.is_empty() {
|
||||
return commit_v4_stamp_idempotently(dataset, root_uri, branch).await;
|
||||
}
|
||||
|
||||
// Read this branch's legacy commit cache (commits + the head). An absent or
|
||||
// empty `_graph_commits.lance` yields no commits — nothing to backfill.
|
||||
let (commit_by_id, head) =
|
||||
crate::db::commit_graph::read_legacy_commit_cache(root_uri, branch).await?;
|
||||
if commit_by_id.is_empty() {
|
||||
return commit_v4_stamp_idempotently(dataset, root_uri, branch).await;
|
||||
}
|
||||
|
||||
let parts = build_lineage_backfill_parts(&commit_by_id, head.as_ref(), branch)?;
|
||||
|
||||
match merge_lineage_rows(dataset.clone(), &parts).await {
|
||||
Ok(new_dataset) => {
|
||||
*dataset = new_dataset;
|
||||
// Stamp LAST. Crash window: a failure between the merge above and
|
||||
// this stamp bump leaves stamp v3 + lineage present in `__manifest`.
|
||||
// The next open re-enters at v3, the fast path at the top sees the
|
||||
// lineage and skips straight to the stamp bump — completing the
|
||||
// migration with no duplicate rows (the merge is keyed on
|
||||
// `object_id`). Pinned by
|
||||
// `crash_after_merge_before_stamp_completes_on_next_open`.
|
||||
return commit_v4_stamp_idempotently(dataset, root_uri, branch).await;
|
||||
}
|
||||
// A concurrent runner won the `graph_head:<branch>` CAS. Our in-hand
|
||||
// handle is stale at the pre-contention HEAD, so a re-open is required
|
||||
// to see the winner's commit; then re-loop (the fast path will see the
|
||||
// winner's lineage and stamp). Bounded by the budget.
|
||||
Err(err) if super::publisher::is_retryable_publish_conflict(&err) => {
|
||||
*dataset = super::layout::open_manifest_dataset(root_uri, branch).await?;
|
||||
continue;
|
||||
}
|
||||
Err(err) => return Err(err),
|
||||
}
|
||||
}
|
||||
|
||||
// Budget exhausted under sustained contention. Return a CAS-typed error (not a
|
||||
// plain conflict) so the publisher's outer retry loop — which only re-runs
|
||||
// `is_retryable_publish_conflict` — re-runs `load_publish_state` and completes
|
||||
// the migration, rather than giving up.
|
||||
Err(OmniError::manifest_row_level_cas_contention(format!(
|
||||
"v3→v4 lineage backfill exhausted {} retries against concurrent runners",
|
||||
MIGRATION_MERGE_RETRY_BUDGET
|
||||
)))
|
||||
}
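The loop shape used by `migrate_v3_to_v4` (exclusive range, an unguarded retryable arm, and a single typed exhaustion return) can be sketched without Lance. Everything below is a simplified model under stated assumptions: `MergeError`, `RETRY_BUDGET`, and `retry_bounded` are hypothetical names, and a closure stands in for the re-open and merge steps.

```rust
/// Hypothetical error type: only `Contention` is retryable.
#[derive(Debug, PartialEq)]
enum MergeError {
    Contention,
    Fatal(String),
}

// Assumed budget, mirroring the value 5 used in the diff.
const RETRY_BUDGET: u32 = 5;

/// Run `attempt` at most `RETRY_BUDGET` times. Every retryable failure is
/// handled inside the loop, so the only way to fall out of the loop is to
/// exhaust the budget, and that is reported as the retryable variant.
fn retry_bounded<T>(
    mut attempt: impl FnMut(u32) -> Result<T, MergeError>,
) -> Result<T, MergeError> {
    for n in 0..RETRY_BUDGET {
        match attempt(n) {
            Ok(value) => return Ok(value),
            // Unguarded retryable arm: it can never reach the fatal arm below,
            // not even on the last iteration.
            Err(MergeError::Contention) => continue,
            Err(fatal) => return Err(fatal),
        }
    }
    // Single exhaustion path: still typed as contention so an outer retry
    // loop can pick it up.
    Err(MergeError::Contention)
}
```

The design point is that the last iteration is not special: a guard such as `n < RETRY_BUDGET - 1` on the retryable arm would let the final conflict fall through to the fatal arm and change its type.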
|
||||
|
||||
/// Stamp the v3→v4 migration's terminal version idempotently under concurrent
|
||||
/// runners. `set_stamp` issues an `UpdateConfig` Lance commit; once the merge CAS
|
||||
/// loser is made to converge (above), BOTH runners reach this stamp bump and race
|
||||
/// it — the loser gets `lance::Error::IncompatibleTransaction` (two `UpdateConfig`
|
||||
/// commits touching the same metadata key), which is NOT a row-level CAS
|
||||
/// contention and so is not caught by the merge loop. But both write the SAME
|
||||
/// value, so the conflict is benign: re-open and, if the stamp already reached the
|
||||
/// target (the concurrent runner finished it), succeed; otherwise re-apply.
|
||||
/// Bounded; on exhaustion surface a CAS-typed error for the publisher's outer
|
||||
/// retry, same as the merge loop.
|
||||
async fn commit_v4_stamp_idempotently(
|
||||
dataset: &mut Dataset,
|
||||
root_uri: &str,
|
||||
branch: Option<&str>,
|
||||
) -> Result<()> {
|
||||
const STAMP_RETRY_BUDGET: u32 = 5;
|
||||
// Exclusive range + an UNGUARDED `IncompatibleTransaction` arm: the retryable
|
||||
// variant is always handled inside the loop (re-open + same-value check + retry),
|
||||
// so it can never fall through to the stringifying `Err(e)` catch-all, and the
|
||||
// SINGLE reachable exhaustion path is the typed contention return below. (A
|
||||
// `0..=BUDGET` range with an `attempt < BUDGET` guard let the last iteration's
|
||||
// retryable conflict reach the catch-all and return a non-retryable
|
||||
// `OmniError::Lance` — the publisher's outer retry would then give up.)
|
||||
for _ in 0..STAMP_RETRY_BUDGET {
|
||||
// Call `stamp_internal_schema` (the bare `update_schema_metadata` write) rather
// than `set_stamp`, so the raw Lance error variant is in hand; `set_stamp`
// pre-stringifies it.
|
||||
let stamp_result = stamp_internal_schema(dataset).await;
|
||||
match stamp_result {
|
||||
Ok(_) => return Ok(()),
|
||||
Err(lance::Error::IncompatibleTransaction { .. }) => {
|
||||
// A concurrent runner's `UpdateConfig` preempted ours — the
|
||||
// retryable case. Re-open past its commit; if it already stamped to
|
||||
// the target we're done (the value is identical), else fall through
|
||||
// to retry on the advanced handle.
|
||||
*dataset = super::layout::open_manifest_dataset(root_uri, branch).await?;
|
||||
if read_stamp(dataset) >= INTERNAL_MANIFEST_SCHEMA_VERSION {
|
||||
return Ok(());
|
||||
}
|
||||
}
|
||||
Err(e) => return Err(OmniError::Lance(e.to_string())),
|
||||
}
|
||||
}
|
||||
|
||||
// Exhausted the budget against sustained concurrent stampers. Return a
|
||||
// CAS-typed (retryable) error so the publisher's OUTER retry — which only
|
||||
// re-runs `is_retryable_publish_conflict` — completes it, rather than the
|
||||
// stringified `OmniError::Lance` it would treat as fatal.
|
||||
Err(OmniError::manifest_row_level_cas_contention(format!(
|
||||
"v3→v4 stamp bump exhausted {} retries against concurrent runners",
|
||||
STAMP_RETRY_BUDGET
|
||||
)))
|
||||
}
|
||||
|
||||
/// The single `update_schema_metadata` write that bumps the on-disk internal-schema
|
||||
/// stamp to the current version. Extracted from `commit_v4_stamp_idempotently`'s
|
||||
/// retry loop so a `failpoints` test can inject a concurrent-stamper
|
||||
/// `IncompatibleTransaction` deterministically (the loop's exhaustion path is
|
||||
/// otherwise near-unreachable). Returns the RAW `lance::Error` so the loop can match
|
||||
/// the `IncompatibleTransaction` variant — `set_stamp` pre-stringifies it.
|
||||
async fn stamp_internal_schema(dataset: &mut Dataset) -> std::result::Result<(), lance::Error> {
|
||||
crate::failpoints::maybe_fail_lance_incompatible("migration.v4_stamp.force_incompatible")?;
|
||||
dataset
|
||||
.update_schema_metadata([(
|
||||
INTERNAL_SCHEMA_VERSION_KEY.to_string(),
|
||||
INTERNAL_MANIFEST_SCHEMA_VERSION.to_string(),
|
||||
)])
|
||||
.await
|
||||
.map(|_| ())
|
||||
}
|
||||
|
||||
/// Build the `__manifest` rows for the v3→v4 backfill: one immutable
|
||||
/// `graph_commit` row per commit, plus EXACTLY ONE `graph_head:<branch>` row for
|
||||
/// the actual head. Each commit encodes to a `[graph_commit, graph_head]` pair,
|
||||
/// but only the head commit's head row is kept — the others would be redundant
|
||||
/// updates of the same `graph_head:<branch>` object_id (the head is per-branch,
|
||||
/// not per-commit).
|
||||
fn build_lineage_backfill_parts(
|
||||
commit_by_id: &std::collections::HashMap<String, GraphCommit>,
|
||||
head: Option<&GraphCommit>,
|
||||
branch: Option<&str>,
|
||||
) -> Result<Vec<super::state::GraphLineageRowPart>> {
|
||||
let head_id = head.map(|h| h.graph_commit_id.as_str());
|
||||
// Deterministic iteration order (the source is a HashMap): merge-insert is
|
||||
// keyed on `object_id` so the final manifest content is order-independent,
|
||||
// but a stable order keeps the produced batch reproducible regardless.
|
||||
let mut commits: Vec<&GraphCommit> = commit_by_id.values().collect();
|
||||
commits.sort_by(|a, b| a.graph_commit_id.cmp(&b.graph_commit_id));
|
||||
let mut parts = Vec::with_capacity(commits.len() + 1);
|
||||
for commit in commits {
|
||||
let row = GraphLineageRow {
|
||||
graph_commit_id: commit.graph_commit_id.clone(),
|
||||
manifest_branch: commit.manifest_branch.clone(),
|
||||
manifest_version: commit.manifest_version,
|
||||
parent_commit_id: commit.parent_commit_id.clone(),
|
||||
merged_parent_commit_id: commit.merged_parent_commit_id.clone(),
|
||||
actor_id: commit.actor_id.clone(),
|
||||
created_at: commit.created_at,
|
||||
};
|
||||
let [commit_part, head_part] = graph_lineage_row_parts(&row, branch)?;
|
||||
parts.push(commit_part);
|
||||
if Some(commit.graph_commit_id.as_str()) == head_id {
|
||||
parts.push(head_part);
|
||||
}
|
||||
}
|
||||
Ok(parts)
|
||||
}
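The "one commit row per commit, exactly one head row" rule can be shown with plain data. This is a sketch under assumptions, not the real encoding: `Part` and `backfill_parts` are hypothetical, the map values stand in for whatever a `GraphCommit` carries, and the real code builds `GraphLineageRowPart` values through `graph_lineage_row_parts`.

```rust
use std::collections::HashMap;

/// Illustrative stand-in for the rows written to the manifest.
#[derive(Debug, PartialEq)]
enum Part {
    Commit(String),
    Head { branch: String, commit_id: String },
}

/// Emit one `Commit` part per commit in sorted id order, plus a single `Head`
/// part placed right after the commit that is the branch head.
fn backfill_parts(commits: &HashMap<String, u64>, head_id: Option<&str>, branch: &str) -> Vec<Part> {
    // HashMap iteration order is unspecified, so sort for a reproducible batch.
    let mut ids: Vec<&String> = commits.keys().collect();
    ids.sort();

    let mut parts = Vec::with_capacity(ids.len() + 1);
    for id in ids {
        parts.push(Part::Commit(id.clone()));
        // Only the head commit contributes a head row; the head is per branch.
        if Some(id.as_str()) == head_id {
            parts.push(Part::Head {
                branch: branch.to_string(),
                commit_id: id.clone(),
            });
        }
    }
    parts
}
```

If `head_id` is `None`, or names a commit that is absent from the map, no head row is produced at all.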
|
||||
|
||||
async fn set_stamp(dataset: &mut Dataset, version: u32) -> Result<()> {
|
||||
dataset
|
||||
.update_schema_metadata([(INTERNAL_SCHEMA_VERSION_KEY.to_string(), version.to_string())])
|
||||
|
|
@@ -209,3 +508,42 @@ async fn set_stamp(dataset: &mut Dataset, version: u32) -> Result<()> {
|
|||
.map_err(|e| OmniError::Lance(e.to_string()))?;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Test-only: force the on-disk internal-schema stamp to `version`. Used to
|
||||
/// synthesize a pre-migration graph (rewinding to v3) and to simulate a crash
|
||||
/// that lost the final stamp bump. Gated on `test` OR `failpoints` so the
|
||||
/// fault-injection migration test (in the `failpoints` integration binary,
|
||||
/// compiled without `cfg(test)`) can reach it too.
|
||||
#[cfg(any(test, feature = "failpoints"))]
|
||||
pub(crate) async fn set_stamp_for_test(dataset: &mut Dataset, version: u32) -> Result<()> {
|
||||
set_stamp(dataset, version).await
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
/// The floor never refuses any stamp the binary can actually serve — a graph
|
||||
/// at MIN through CURRENT passes, only sub-MIN / super-CURRENT are rejected.
|
||||
/// With MIN = 1 and CURRENT = 4 this proves the live range is exactly [1, 4]
|
||||
/// and that the floor is a no-op for every real graph (lowest real stamp is 1).
|
||||
#[test]
|
||||
fn unsupported_guard_accepts_exactly_the_supported_range() {
|
||||
for stamp in MIN_SUPPORTED_INTERNAL_SCHEMA_VERSION..=INTERNAL_MANIFEST_SCHEMA_VERSION {
|
||||
assert!(
|
||||
refuse_if_stamp_unsupported(stamp).is_ok(),
|
||||
"stamp v{stamp} is within [MIN, CURRENT] and must be accepted"
|
||||
);
|
||||
}
|
||||
if MIN_SUPPORTED_INTERNAL_SCHEMA_VERSION > 0 {
|
||||
assert!(
|
||||
refuse_if_stamp_unsupported(MIN_SUPPORTED_INTERNAL_SCHEMA_VERSION - 1).is_err(),
|
||||
"a sub-floor stamp must be refused"
|
||||
);
|
||||
}
|
||||
assert!(
|
||||
refuse_if_stamp_unsupported(INTERNAL_MANIFEST_SCHEMA_VERSION + 1).is_err(),
|
||||
"a future stamp must be refused"
|
||||
);
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@@ -35,8 +35,8 @@ use super::layout::{open_manifest_dataset, tombstone_object_id, version_object_i
|
|||
use super::metadata::parse_namespace_version_request;
|
||||
use super::migrations::migrate_internal_schema;
|
||||
use super::state::{
|
||||
-    manifest_rows_batch, manifest_schema, read_manifest_entries, read_registered_table_locations,
-    read_tombstone_versions,
+    GraphLineageRow, GraphLineageRowPart, graph_lineage_row_parts, head_lineage_row,
+    manifest_rows_batch, manifest_schema, read_publish_scan,
|
||||
};
|
||||
use super::{
|
||||
ManifestChange, OBJECT_TYPE_TABLE, OBJECT_TYPE_TABLE_TOMBSTONE, OBJECT_TYPE_TABLE_VERSION,
|
||||
|
|
@@ -50,13 +50,48 @@ use super::{
|
|||
/// iteration re-runs `load_publish_state` and the expected-version pre-check.
|
||||
const PUBLISHER_RETRY_BUDGET: u32 = 5;
|
||||
|
||||
/// The graph-lineage commit to record atomically with a manifest publish
|
||||
/// (RFC-013 Phase 7). One logical commit per publish: the `graph_commit_id` is
|
||||
/// minted once by the caller and stays stable across the publisher's CAS
|
||||
/// retries; only the parent re-resolves per attempt (against the freshly loaded
|
||||
/// `__manifest`), so a retry after a concurrent commit parents off the new head
|
||||
/// — the TOCTOU the dual-write era's `commit_graph.refresh()` guarded is closed
|
||||
/// by construction.
|
||||
#[derive(Debug, Clone)]
|
||||
pub(crate) struct LineageIntent {
|
||||
/// ULID minted once before the publish loop; the graph commit's identity.
|
||||
pub graph_commit_id: String,
|
||||
/// The branch this commit lands on (`None` = main). Selects the
|
||||
/// `graph_head:<branch>` pointer row that gets updated.
|
||||
pub branch: Option<String>,
|
||||
/// Authoring actor, or `None` for unauthored / system writes.
|
||||
pub actor_id: Option<String>,
|
||||
/// The merged-in source head — `Some` only for a branch-merge commit.
|
||||
pub merged_parent_commit_id: Option<String>,
|
||||
/// Commit timestamp (microseconds since the UNIX epoch).
|
||||
pub created_at: i64,
|
||||
}
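The contract described for `LineageIntent` (identity fixed before the loop, parent re-resolved on every attempt) can be sketched as follows. This is an illustration under a loud assumption: the head is taken to be the commit with the highest manifest version, which only stands in for the real `should_replace_head` selection that this diff does not show. `Commit`, `head`, and `build_commit` are hypothetical names.

```rust
/// Simplified lineage row (illustrative, not `GraphLineageRow`).
#[derive(Debug, Clone, PartialEq)]
struct Commit {
    id: String,
    parent: Option<String>,
    manifest_version: u64,
}

/// ASSUMPTION: the head is the visible commit with the highest manifest
/// version. The real selection rule (`should_replace_head`) may differ.
fn head(rows: &[Commit]) -> Option<&Commit> {
    rows.iter().max_by_key(|c| c.manifest_version)
}

/// Build the commit for one publish attempt. `commit_id` is minted once by the
/// caller and reused across attempts; only the parent depends on `rows`, the
/// lineage visible to this particular attempt.
fn build_commit(rows: &[Commit], commit_id: &str, new_version: u64) -> Commit {
    Commit {
        id: commit_id.to_string(),
        parent: head(rows).map(|h| h.id.clone()),
        manifest_version: new_version,
    }
}
```

A retry that runs after a concurrent writer has landed a commit passes the refreshed rows, so the same commit id is parented on the new head instead of the stale one.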
|
||||
|
||||
/// The result of a manifest publish that may have folded in a graph commit.
|
||||
#[derive(Debug)]
|
||||
pub(super) struct PublishOutcome {
|
||||
/// The advanced `__manifest` dataset (its version is the published version).
|
||||
pub dataset: Dataset,
|
||||
/// The parent the publisher resolved for the recorded commit, if a
|
||||
/// [`LineageIntent`] was supplied. Returned so the caller can update its
|
||||
/// in-memory commit cache without a re-read. `None` when no lineage was
|
||||
/// recorded, or when the commit is the genesis (no parent).
|
||||
pub parent_commit_id: Option<String>,
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
pub(super) trait ManifestBatchPublisher: Send + Sync {
|
||||
async fn publish(
|
||||
&self,
|
||||
changes: &[ManifestChange],
|
||||
expected_table_versions: &HashMap<String, u64>,
|
||||
-    ) -> Result<Dataset>;
+        lineage: Option<&LineageIntent>,
+    ) -> Result<PublishOutcome>;
|
||||
}
|
||||
|
||||
pub(super) struct GraphNamespacePublisher {
|
||||
|
|
@@ -76,6 +111,19 @@ struct PendingVersionRow {
|
|||
row_count: Option<u64>,
|
||||
}
|
||||
|
||||
/// Everything one CAS attempt needs out of a single `__manifest` scan
|
||||
/// (RFC-013 P2): the open dataset, table state for the pre-check + pending-row
|
||||
/// build, and the `graph_commit` lineage rows for parent resolution. Folding the
|
||||
/// lineage into this struct is what lets `resolve_lineage_rows` skip its own
|
||||
/// `read_graph_lineage` scan.
|
||||
struct LoadedPublishState {
|
||||
dataset: Dataset,
|
||||
registered_tables: HashMap<String, String>,
|
||||
existing_versions: HashMap<(String, u64), SubTableEntry>,
|
||||
existing_tombstones: HashMap<(String, u64), ()>,
|
||||
lineage_rows: Vec<GraphLineageRow>,
|
||||
}
|
||||
|
||||
impl GraphNamespacePublisher {
|
||||
pub(super) fn new(root_uri: &str, branch: Option<&str>) -> Self {
|
||||
Self {
|
||||
|
|
@@ -90,22 +138,31 @@ impl GraphNamespacePublisher {
|
|||
open_manifest_dataset(&self.root_uri, self.branch.as_deref()).await
|
||||
}
|
||||
|
||||
-    async fn load_publish_state(
-        &self,
-    ) -> Result<(
-        Dataset,
-        HashMap<String, String>,
-        HashMap<(String, u64), SubTableEntry>,
-        HashMap<(String, u64), ()>,
-    )> {
+    async fn load_publish_state(&self) -> Result<LoadedPublishState> {
+        // Test seam: inject a retryable contention here to exercise the outer
+        // retry loop's re-run-on-retryable-load-error path (no-op without the
+        // `failpoints` feature). The migration surfaces the same typed error.
+        crate::failpoints::maybe_fail_retryable_contention(
+            crate::failpoints::names::PUBLISH_LOAD_STATE_RETRYABLE_CONTENTION,
+        )?;
|
||||
let mut dataset = self.dataset().await?;
|
||||
// Run pending internal-schema migrations exactly once per publish on
|
||||
// the open-for-write path; idempotent when the on-disk stamp already
|
||||
-        // matches this binary. See `db/manifest/migrations.rs`.
-        migrate_internal_schema(&mut dataset).await?;
-        let registered_tables = read_registered_table_locations(&dataset).await?;
-        let existing_entries = read_manifest_entries(&dataset).await?;
-        let existing_versions = existing_entries
+        // matches this binary. Pass this publisher's branch so the v3→v4 lineage
+        // backfill reads `_graph_commits.lance` at the SAME branch it is
+        // publishing to (each branch backfills on its first write). See
+        // `db/manifest/migrations.rs`.
+        migrate_internal_schema(&mut dataset, &self.root_uri, self.branch.as_deref()).await?;
+        // ONE `__manifest` scan for everything the publish needs: table
+        // locations, version entries, tombstones, AND the `graph_commit` lineage
+        // rows for parent resolution (RFC-013 P2). The lineage extraction rides
+        // this pass instead of a second `read_graph_lineage` scan in
+        // `resolve_lineage_rows`; the per-attempt re-read is preserved because
+        // `load_publish_state` runs once per CAS attempt, so a retry sees the
+        // advanced head and re-parents correctly.
+        let scan = read_publish_scan(&dataset).await?;
+        let existing_versions = scan
+            .version_entries
|
||||
.iter()
|
||||
.map(|entry| {
|
||||
(
|
||||
|
|
@@ -114,13 +171,14 @@ impl GraphNamespacePublisher {
|
|||
)
|
||||
})
|
||||
.collect();
|
||||
-        let existing_tombstones = read_tombstone_versions(&dataset).await?;
-        Ok((
+        let existing_tombstones = scan.tombstones.into_iter().collect();
+        Ok(LoadedPublishState {
             dataset,
-            registered_tables,
+            registered_tables: scan.table_locations,
             existing_versions,
             existing_tombstones,
-        ))
+            lineage_rows: scan.lineage_rows,
+        })
|
||||
}
|
||||
|
||||
fn build_pending_rows(
|
||||
|
|
@@ -266,6 +324,50 @@ impl GraphNamespacePublisher {
|
|||
Ok(rows)
|
||||
}
|
||||
|
||||
/// Resolve the parent for `intent` against the just-loaded `dataset` and
|
||||
/// build the two lineage rows (`graph_commit` + `graph_head:<branch>`) to
|
||||
/// fold into the publish batch. Runs INSIDE the CAS retry loop, so the
|
||||
/// parent is read from the manifest state this attempt will commit against —
|
||||
/// a retry after a concurrent commit re-reads the advanced head and parents
|
||||
/// correctly (TOCTOU closed). `new_manifest_version` is the version this
|
||||
/// publish produces (the recorded commit pins it).
|
||||
///
|
||||
/// The parent is the current head of the branch's lineage — the
|
||||
/// `should_replace_head` winner over the visible `graph_commit` rows, the
|
||||
/// same selection the commit-graph cache uses. (The denormalized
|
||||
/// `graph_head:<branch>` row is written for forward-compat but is not the
|
||||
/// parent source here: a branch freshly forked from main inherits main's
|
||||
/// commits but not yet a `graph_head:<its-name>` row, and the head-over-rows
|
||||
/// computation gives the correct fork-point parent in that case.)
|
||||
///
|
||||
/// `lineage_rows` is the `graph_commit` set this attempt already parsed in
|
||||
/// `load_publish_state`'s single scan (RFC-013 P2) — NOT a fresh
|
||||
/// `read_graph_lineage` scan. The per-attempt re-read is still preserved: the
|
||||
/// retry loop re-runs `load_publish_state`, so each attempt's `lineage_rows`
|
||||
/// reflects the head as it stands for that attempt.
|
||||
fn resolve_lineage_rows(
|
||||
lineage_rows: &[GraphLineageRow],
|
||||
intent: &LineageIntent,
|
||||
new_manifest_version: u64,
|
||||
) -> Result<(Vec<PendingVersionRow>, Option<String>)> {
|
||||
let parent_commit_id = head_lineage_row(lineage_rows).map(|h| h.graph_commit_id.clone());
|
||||
|
||||
let commit = GraphLineageRow {
|
||||
graph_commit_id: intent.graph_commit_id.clone(),
|
||||
manifest_branch: intent.branch.clone(),
|
||||
manifest_version: new_manifest_version,
|
||||
parent_commit_id: parent_commit_id.clone(),
|
||||
merged_parent_commit_id: intent.merged_parent_commit_id.clone(),
|
||||
actor_id: intent.actor_id.clone(),
|
||||
created_at: intent.created_at,
|
||||
};
|
||||
let parts = graph_lineage_row_parts(&commit, intent.branch.as_deref())?;
|
||||
Ok((
|
||||
parts.into_iter().map(lineage_part_to_pending).collect(),
|
||||
parent_commit_id,
|
||||
))
|
||||
}
|
||||
|
||||
fn pending_rows_to_batch(rows: Vec<PendingVersionRow>) -> Result<arrow_array::RecordBatch> {
|
||||
let mut object_ids = Vec::with_capacity(rows.len());
|
||||
let mut object_types = Vec::with_capacity(rows.len());
|
||||
@@ -420,7 +522,25 @@ impl GraphNamespacePublisher {
                 }))
             })
             .collect::<Result<Vec<_>>>()?;
-        self.publish(&changes, &HashMap::new()).await
+        Ok(self.publish(&changes, &HashMap::new(), None).await?.dataset)
     }
 }

+/// Map a `state::GraphLineageRowPart` onto a `PendingVersionRow` so a graph
+/// commit's two lineage rows ride the same publish batch as the table-version
+/// rows (RFC-013 Phase 7). Lineage rows carry no table identity: `table_key` is
+/// the empty string (never matched by a real key) and `location`/`row_count`
+/// are null.
+fn lineage_part_to_pending(part: GraphLineageRowPart) -> PendingVersionRow {
+    PendingVersionRow {
+        object_id: part.object_id,
+        object_type: part.object_type.to_string(),
+        location: None,
+        metadata: Some(part.metadata),
+        table_key: String::new(),
+        table_version: part.table_version,
+        table_branch: part.table_branch,
+        row_count: None,
+    }
+}
+
@@ -429,7 +549,17 @@ impl GraphNamespacePublisher {
 /// merge-insert join key, annotated as an unenforced primary key on
 /// `__manifest`). Translate it to a typed manifest conflict so callers can
 /// match without parsing strings; everything else is opaque storage.
-fn map_lance_publish_error(err: LanceError) -> OmniError {
+///
+/// Shared (`pub(crate)`) with the v3→v4 lineage backfill
+/// (`state::merge_lineage_rows`), which issues its own `__manifest` merge-insert
+/// outside the publisher and must surface the SAME typed
+/// `RowLevelCasContention` so the migration's re-open retry loop can recognize a
+/// CAS loss. This is the merge-insert (`execute_reader`) conflict vocabulary
+/// only. It is deliberately NOT `optimize::is_retryable_lance_conflict`: that one
+/// also matches `CommitConflict`/`RetryableCommitConflict` from the COMPACTION
+/// commit path (`compact_files` -> `apply_commit`), which a row-level merge-insert
+/// never emits — folding it in here would match impossible variants.
+pub(crate) fn map_lance_publish_error(err: LanceError) -> OmniError {
     if matches!(err, LanceError::TooMuchWriteContention { .. }) {
         return OmniError::manifest_row_level_cas_contention(format!(
             "manifest publish lost a row-level CAS race: {}",
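The doc comment in the hunk above describes translating one storage-level contention error into a typed conflict so that callers can match on a variant instead of parsing message strings. A minimal, self-contained sketch of that pattern follows. `StorageError`, `EngineError`, `map_publish_error`, and `is_retryable` are hypothetical stand-ins invented for illustration; they are not the real `LanceError` / `OmniError` types or the functions in this commit.

```rust
/// Hypothetical stand-in for the storage layer's error type.
#[derive(Debug)]
enum StorageError {
    TooMuchWriteContention { message: String },
    Io(String),
}

/// Hypothetical stand-in for the engine's typed error.
#[derive(Debug, PartialEq)]
enum EngineError {
    /// Lost a row-level compare-and-swap race; safe to retry.
    RowLevelCasContention(String),
    /// The caller's precondition failed; must not be retried.
    ExpectedVersionMismatch { table: String, expected: u64, actual: u64 },
    /// Everything else stays opaque.
    Storage(String),
}

/// Translate only the contention variant into the typed conflict.
fn map_publish_error(err: StorageError) -> EngineError {
    match err {
        StorageError::TooMuchWriteContention { message } => EngineError::RowLevelCasContention(
            format!("manifest publish lost a row-level CAS race: {}", message),
        ),
        StorageError::Io(message) => EngineError::Storage(message),
    }
}

/// Retry exactly the typed contention variant and propagate the rest.
fn is_retryable(err: &EngineError) -> bool {
    matches!(err, EngineError::RowLevelCasContention(_))
}
```

Because both the mapper and the predicate key on the same variant, any caller that shares them makes the same retry decision by construction, which is the property the comment attributes to sharing these helpers with the migration.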
@@ -445,14 +575,40 @@ impl ManifestBatchPublisher for GraphNamespacePublisher {
         &self,
         changes: &[ManifestChange],
         expected_table_versions: &HashMap<String, u64>,
-    ) -> Result<Dataset> {
-        if changes.is_empty() && expected_table_versions.is_empty() {
-            return self.dataset().await;
+        lineage: Option<&LineageIntent>,
+    ) -> Result<PublishOutcome> {
+        if changes.is_empty() && expected_table_versions.is_empty() && lineage.is_none() {
+            return Ok(PublishOutcome {
+                dataset: self.dataset().await?,
+                parent_commit_id: None,
+            });
         }

         for attempt in 0..=PUBLISHER_RETRY_BUDGET {
-            let (dataset, known_tables, existing_versions, existing_tombstones) =
-                self.load_publish_state().await?;
+            // `load_publish_state` runs the v3→v4 migration (`migrate_internal_schema`)
+            // on its first scan. The migration's bounded merge/stamp retries surface a
+            // retryable `RowLevelCasContention` on exhaustion EXPECTING this outer loop
+            // to re-run them — a re-run re-reads the manifest, by which point a
+            // concurrent winner has usually completed the migration (next scan is a
+            // no-op). Route a retryable load error through the SAME retry path as a
+            // retryable `merge_rows` conflict below, so that typed contention actually
+            // composes with the publisher retry instead of aborting the publish.
+            let loaded = match self.load_publish_state().await {
+                Ok(loaded) => loaded,
+                Err(err)
+                    if attempt < PUBLISHER_RETRY_BUDGET && is_retryable_publish_conflict(&err) =>
+                {
+                    continue;
+                }
+                Err(err) => return Err(err),
+            };
+            let LoadedPublishState {
+                dataset,
+                registered_tables: known_tables,
+                existing_versions,
+                existing_tombstones,
+                lineage_rows,
+            } = loaded;

             let latest_per_table =
                 Self::latest_visible_per_table(&existing_versions, &existing_tombstones);
@@ -461,19 +617,48 @@ impl ManifestBatchPublisher for GraphNamespacePublisher {
             // surfaced as `ExpectedVersionMismatch` rather than retried.
             Self::check_expected_table_versions(&latest_per_table, expected_table_versions)?;

-            if changes.is_empty() {
-                return Ok(dataset);
-            }
-
-            let rows = Self::build_pending_rows(
+            let mut rows = Self::build_pending_rows(
                 changes,
                 &known_tables,
                 &existing_versions,
                 &existing_tombstones,
             )?;

+            // Fold the graph commit into the SAME batch so table-version rows
+            // and lineage rows land in one merge-insert (one Lance commit, one
+            // manifest version) — no separate write, no manifest→commit-graph
+            // atomicity gap. The merge-insert advances exactly one version on
+            // top of the loaded dataset, so the commit pins
+            // `current + 1`. The parent is resolved here, per attempt, from the
+            // lineage rows THIS attempt's scan loaded (TOCTOU closed on a CAS
+            // retry — a retry re-runs `load_publish_state` → fresh lineage).
+            let parent_commit_id = match lineage {
+                Some(intent) => {
+                    let new_manifest_version = dataset.version().version + 1;
+                    let (commit_rows, parent) =
+                        Self::resolve_lineage_rows(&lineage_rows, intent, new_manifest_version)?;
+                    rows.extend(commit_rows);
+                    parent
+                }
+                None => None,
+            };
+
+            if rows.is_empty() {
+                // Expected-version-only publish with no changes and no lineage:
+                // the precondition held, nothing to write.
+                return Ok(PublishOutcome {
+                    dataset,
+                    parent_commit_id,
+                });
+            }
+
             match self.merge_rows(dataset, rows).await {
-                Ok(new_dataset) => return Ok(new_dataset),
+                Ok(new_dataset) => {
+                    return Ok(PublishOutcome {
+                        dataset: new_dataset,
+                        parent_commit_id,
+                    });
+                }
                 Err(err) => {
                     if attempt < PUBLISHER_RETRY_BUDGET && is_retryable_publish_conflict(&err) {
                         continue;
@@ -497,7 +682,12 @@ impl ManifestBatchPublisher for GraphNamespacePublisher {
 /// contention; if the caller's `expected_table_versions` still holds against
 /// the new manifest state, we re-attempt. Other conflict variants (notably
 /// `ExpectedVersionMismatch`) propagate so the caller learns immediately.
-fn is_retryable_publish_conflict(err: &OmniError) -> bool {
+///
+/// Shared (`pub(crate)`) with the v3→v4 lineage backfill's re-open retry loop
+/// (`migrations::migrate_v3_to_v4`), so the migration's retry decision matches the
+/// publisher's by construction — both retry exactly `RowLevelCasContention` and
+/// propagate everything else.
+pub(crate) fn is_retryable_publish_conflict(err: &OmniError) -> bool {
     matches!(
         err,
         OmniError::Manifest(m)
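The publish hunks above rely on one idea: the commit id is minted once, while the parent is re-resolved on every CAS attempt from the state that attempt just loaded. The sketch below illustrates that idea with an in-memory stand-in, under stated assumptions. `Manifest`, `try_commit`, `publish`, the `interfere` hook, and `RETRY_BUDGET` are all hypothetical names for illustration only, and a `Mutex` replaces the real merge-insert on `__manifest`.

```rust
use std::sync::Mutex;

/// Hypothetical in-memory stand-in for the manifest: a version counter plus
/// the id of the newest commit on the branch.
#[derive(Debug, Default)]
struct Manifest {
    version: u64,
    head: Option<String>,
}

#[derive(Debug, PartialEq)]
enum PublishError {
    /// Retryable: another writer advanced the manifest first.
    CasContention,
}

/// Assumed retry budget; the real constant's value is not shown in the diff.
const RETRY_BUDGET: usize = 3;

/// Commit only if the manifest is still at `expected_version`.
fn try_commit(
    store: &Mutex<Manifest>,
    expected_version: u64,
    commit_id: &str,
) -> Result<u64, PublishError> {
    let mut m = store.lock().unwrap();
    if m.version != expected_version {
        return Err(PublishError::CasContention);
    }
    m.version += 1;
    m.head = Some(commit_id.to_string());
    Ok(m.version)
}

/// Publish `commit_id`, re-reading the head on every attempt. `interfere`
/// runs between load and commit to simulate a concurrent writer.
fn publish(
    store: &Mutex<Manifest>,
    commit_id: &str,
    mut interfere: impl FnMut(usize),
) -> Result<(u64, Option<String>), PublishError> {
    for attempt in 0..=RETRY_BUDGET {
        // Per-attempt load: the parent comes from the state this attempt
        // will commit against, never from an earlier attempt.
        let (loaded_version, parent) = {
            let m = store.lock().unwrap();
            (m.version, m.head.clone())
        };
        interfere(attempt);
        match try_commit(store, loaded_version, commit_id) {
            Ok(version) => return Ok((version, parent)),
            Err(PublishError::CasContention) if attempt < RETRY_BUDGET => continue,
            Err(err) => return Err(err),
        }
    }
    unreachable!("the final attempt always returns")
}
```

If a rival commit lands between the first load and the first commit, the first attempt loses the race, and the second attempt reads the rival as its parent while reusing the same commit id.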

@@ -40,17 +40,14 @@ use lance::Dataset;
 use serde::{Deserialize, Serialize};
 use tracing::warn;

-use crate::db::commit_graph::CommitGraph;
 use crate::db::graph_coordinator::GraphCoordinator;
-use crate::db::recovery_audit::{
-    RecoveryAudit, RecoveryAuditRecord, RecoveryKind, TableOutcome, now_micros,
-};
+use crate::db::recovery_audit::{RecoveryAudit, RecoveryAuditRecord, RecoveryKind, TableOutcome};
 use crate::db::schema_state::SchemaStateRecovery;
 use crate::error::{OmniError, Result};
 use crate::storage::StorageAdapter;

 use super::Snapshot;
-use super::publisher::{GraphNamespacePublisher, ManifestBatchPublisher};
+use super::publisher::{GraphNamespacePublisher, LineageIntent, ManifestBatchPublisher};
 use super::{ManifestChange, SubTableUpdate, TableRegistration, TableTombstone};

 /// System actor identifier recorded on every recovery commit. Operators
@@ -59,6 +56,44 @@ use super::{ManifestChange, SubTableUpdate, TableRegistration, TableTombstone};
 /// into the audit row's `recovery_for_actor` field.
 pub(crate) const RECOVERY_ACTOR: &str = "omnigraph:recovery";

+/// Publish a recovery action's manifest `updates` AND its recovery commit in one
+/// CAS (RFC-013 Phase 7). The recovery commit's lineage (`graph_commit` +
+/// `graph_head`) rides the same merge-insert as the table-version re-pin — there
+/// is no separate `_graph_commits.lance` write and no manifest→commit-graph gap.
+/// `updates` is empty for the no-table-change recovery paths (all-NoMovement
+/// roll-back, stale-sidecar cleanup, orphaned-branch discard); the lineage rows
+/// still publish, so the recovery commit is always durable.
+///
+/// The commit's first parent is resolved by the publisher (the live head of the
+/// recovery's branch); its merged-in parent is the sidecar's recorded source
+/// head for a rolled-forward branch merge, matching the pre-Phase-7 merge-commit
+/// shape. Returns the new manifest version and the minted recovery commit id
+/// (which the audit row references).
+async fn publish_recovery_commit(
+    root_uri: &str,
+    sidecar: &RecoverySidecar,
+    kind: RecoveryKind,
+    updates: &[ManifestChange],
+    expected: &HashMap<String, u64>,
+) -> Result<(u64, String)> {
+    let merged_parent_commit_id = match (sidecar.writer_kind, kind) {
+        (SidecarKind::BranchMerge, RecoveryKind::RolledForward) => {
+            sidecar.merge_source_commit_id.clone()
+        }
+        _ => None,
+    };
+    let intent = LineageIntent {
+        graph_commit_id: ulid::Ulid::new().to_string(),
+        branch: sidecar.branch.clone(),
+        actor_id: Some(RECOVERY_ACTOR.to_string()),
+        merged_parent_commit_id,
+        created_at: crate::db::now_micros()?,
+    };
+    let publisher = GraphNamespacePublisher::new(root_uri, sidecar.branch.as_deref());
+    let outcome = publisher.publish(updates, expected, Some(&intent)).await?;
+    Ok((outcome.dataset.version().version, intent.graph_commit_id))
+}
+
 /// Subdirectory under the graph root holding sidecar files.
 pub(crate) const RECOVERY_DIR_NAME: &str = "__recovery";

@@ -831,20 +866,13 @@ pub(crate) async fn heal_pending_sidecars_roll_forward(
                 // authority) BEFORE opening: a deferred sidecar whose
                 // branch was deleted would otherwise wedge every write
                 // on the dead-branch open.
-                let (branch_exists, main_version) = {
+                let branch_exists = {
                     let mut coord = coordinator.write().await;
                     coord.refresh().await?;
-                    let exists = coord.all_branches().await?.iter().any(|name| name == b);
-                    (exists, coord.snapshot().version())
+                    coord.all_branches().await?.iter().any(|name| name == b)
                 };
                 if !branch_exists {
-                    discard_orphaned_branch_sidecar(
-                        root_uri,
-                        storage.as_ref(),
-                        &sidecar,
-                        main_version,
-                    )
-                    .await?;
+                    discard_orphaned_branch_sidecar(root_uri, storage.as_ref(), &sidecar).await?;
                     processed_any = true;
                     continue;
                 }
@@ -893,7 +921,6 @@ async fn discard_orphaned_branch_sidecar(
     root_uri: &str,
     storage: &dyn StorageAdapter,
     sidecar: &RecoverySidecar,
-    manifest_version: u64,
 ) -> Result<()> {
     warn!(
         operation_id = sidecar.operation_id.as_str(),
@@ -922,22 +949,31 @@ async fn discard_orphaned_branch_sidecar(
             && record.recovery_kind == RecoveryKind::OrphanedBranchDiscarded
     });
     if !already_recorded {
-        let mut graph = CommitGraph::open(root_uri).await?;
-        let graph_commit_id = graph
-            .append_commit(None, manifest_version, Some(RECOVERY_ACTOR))
-            .await?;
-        // Failpoint: the residual window above — commit appended, audit
+        // The orphan-discard commit is recorded on MAIN (the sidecar's own
+        // branch is gone), via a lineage-only publish into `__manifest` (RFC-013
+        // Phase 7) — no `_graph_commits.lance` row. The publisher stamps the
+        // commit at the version it produces.
+        let intent = LineageIntent {
+            graph_commit_id: ulid::Ulid::new().to_string(),
+            branch: None,
+            actor_id: Some(RECOVERY_ACTOR.to_string()),
+            merged_parent_commit_id: None,
+            created_at: crate::db::now_micros()?,
+        };
+        let publisher = GraphNamespacePublisher::new(root_uri, None);
+        publisher.publish(&[], &HashMap::new(), Some(&intent)).await?;
+        // Failpoint: the residual window above — commit published, audit
         // not yet durable.
         crate::failpoints::maybe_fail(crate::failpoints::names::RECOVERY_ORPHAN_DISCARD_AUDIT_APPEND)?;
         audit
             .append(RecoveryAuditRecord {
-                graph_commit_id,
+                graph_commit_id: intent.graph_commit_id,
                 recovery_kind: RecoveryKind::OrphanedBranchDiscarded,
                 recovery_for_actor: sidecar.actor_id.clone(),
                 operation_id: sidecar.operation_id.clone(),
                 sidecar_writer_kind: format!("{:?}", sidecar.writer_kind),
                 per_table_outcomes: Vec::new(),
-                created_at: now_micros()?,
+                created_at: crate::db::now_micros()?,
             })
             .await?;
     }
@@ -1014,13 +1050,7 @@ pub(crate) async fn recover_manifest_drift(
                     .iter()
                     .any(|name| name == b)
             {
-                discard_orphaned_branch_sidecar(
-                    root_uri,
-                    storage.as_ref(),
-                    &sidecar,
-                    coordinator.snapshot().version(),
-                )
-                .await?;
+                discard_orphaned_branch_sidecar(root_uri, storage.as_ref(), &sidecar).await?;
                 continue;
             }
             let mut branch_coord =
@@ -1154,7 +1184,7 @@ async fn process_sidecar(
             );
         }
         return record_audit_recovery_rollforward(
-            root_uri, storage.as_ref(), snapshot, sidecar, &states,
+            root_uri, storage.as_ref(), sidecar, &states,
         )
         .await
         .map(|()| true);
@@ -1176,7 +1206,7 @@ async fn process_sidecar(
                 writer_kind = ?sidecar.writer_kind,
                 "recovery: rolling back sidecar (mixed or unexpected state)"
             );
-            roll_back_sidecar(root_uri, storage.as_ref(), snapshot, sidecar, &states)
+            roll_back_sidecar(root_uri, storage.as_ref(), sidecar, &states)
                 .await
                 .map(|()| true)
         }
@@ -1191,7 +1221,7 @@ async fn process_sidecar(
                 "recovery: rolling back SchemaApply sidecar because schema staging \
                  files were not promoted in this recovery pass"
             );
-            roll_back_sidecar(root_uri, storage.as_ref(), snapshot, sidecar, &states)
+            roll_back_sidecar(root_uri, storage.as_ref(), sidecar, &states)
                 .await
                 .map(|()| true)
         }
@@ -1218,7 +1248,10 @@ async fn process_sidecar(
     crate::failpoints::maybe_fail(
         crate::failpoints::names::RECOVERY_BEFORE_ROLL_FORWARD_PUBLISH,
     )?;
-    let (new_manifest_version, published_versions) =
+    // RFC-013 Phase 7: `roll_forward_all` folds the recovery commit into the
+    // manifest publish CAS, so it also returns the minted `graph_commit_id`
+    // for the audit row below.
+    let (new_manifest_version, published_versions, graph_commit_id) =
         match roll_forward_all(root_uri, sidecar, &states, snapshot).await {
             Ok(published) => published,
             // Convergence-idempotent (invariants 7 & 15): a roll-forward's
@@ -1237,6 +1270,7 @@ async fn process_sidecar(
             }
             Err(err) => return Err(err),
         };
+    let _ = new_manifest_version;
     // `to_version` records the ACTUAL Lance HEAD published for
     // each table (not pin.post_commit_pin, which is a lower bound
     // for loose-match writers like SchemaApply / EnsureIndices /
@@ -1266,7 +1300,7 @@ async fn process_sidecar(
     record_audit(
         root_uri,
         sidecar,
-        new_manifest_version,
+        graph_commit_id,
         RecoveryKind::RolledForward,
         outcomes,
     )
@@ -1435,10 +1469,37 @@ async fn converge_or_defer_roll_forward(
                 .unwrap_or(0),
         });
     }
+    // RFC-013 Phase 7: the winning writer folded its recovery commit into the
+    // manifest CAS, so the converge audit references THAT commit. We lost the CAS
+    // and never minted it, but a recovery commit is distinguishable by its
+    // `RECOVERY_ACTOR` authorship (`publish_recovery_commit`), so the latest
+    // recovery-actored commit on this branch IS it. Do NOT use the branch head:
+    // a concurrent USER write can advance `graph_head` past the recovery commit
+    // between the winner's publish and this read, which would attribute the audit
+    // row to the wrong (later, user) commit. (We only reach here with the sidecar
+    // still on disk: the winner advanced the manifest but crashed before its own
+    // audit+delete, so we finish its bookkeeping.)
+    let cache = match sidecar.branch.as_deref() {
+        Some(branch) => {
+            crate::db::commit_graph::CommitGraph::open_at_branch(root_uri, branch).await?
+        }
+        None => crate::db::commit_graph::CommitGraph::open(root_uri).await?,
+    };
+    let converged_commit_id = match cache
+        .load_commits()
+        .await?
+        .into_iter()
+        .rfind(|c| c.actor_id.as_deref() == Some(RECOVERY_ACTOR))
+    {
+        Some(recovery_commit) => recovery_commit.graph_commit_id,
+        // No recovery commit visible — unexpected on this path (the winner just
+        // published one); fall back to the head rather than an empty id.
+        None => cache.head_commit_id().await?.unwrap_or_default(),
+    };
     record_audit(
         root_uri,
         sidecar,
-        fresh.version(),
+        converged_commit_id,
         RecoveryKind::RolledForward,
         outcomes,
     )
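The converge comment in the hunk above explains why the audit row must reference the latest recovery-authored commit rather than the branch head. A minimal sketch of that selection follows, under two assumptions that the diff does not confirm: that the commit list is ordered oldest first, and that a trimmed-down `Commit` struct is enough to illustrate it. `Commit` and `latest_recovery_commit` are hypothetical names; only the `RECOVERY_ACTOR` value is taken from the source.

```rust
/// Value copied from the `RECOVERY_ACTOR` constant shown earlier in this diff.
const RECOVERY_ACTOR: &str = "omnigraph:recovery";

/// Hypothetical, trimmed-down commit record. The slice passed to
/// `latest_recovery_commit` is assumed to be ordered oldest first.
#[derive(Debug, Clone)]
struct Commit {
    id: String,
    actor_id: Option<String>,
}

/// Id of the newest commit authored by the recovery actor, if any.
/// Searches from the back so a later user commit is skipped, not returned.
fn latest_recovery_commit(commits: &[Commit]) -> Option<&str> {
    commits
        .iter()
        .rfind(|c| c.actor_id.as_deref() == Some(RECOVERY_ACTOR))
        .map(|c| c.id.as_str())
}
```

With commits `c1` (user), `c2` (recovery), `c3` (user), the head is `c3` but the function selects `c2`, which is the misattribution the comment warns about when the head is used instead.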
@@ -1462,7 +1523,6 @@ struct ClassifiedTable {
 async fn roll_back_sidecar(
     root_uri: &str,
     storage: &dyn StorageAdapter,
-    snapshot: &Snapshot,
     sidecar: &RecoverySidecar,
     states: &[ClassifiedTable],
 ) -> Result<()> {
@@ -1522,23 +1582,18 @@ async fn roll_back_sidecar(
             });
         }
     }
-    // Publish the restored HEADs so manifest == HEAD. A degenerate all-NoMovement
-    // roll-back restores nothing — there's nothing to publish, and the audit
-    // records the unchanged snapshot version.
-    let manifest_version = if updates.is_empty() {
-        snapshot.version()
-    } else {
-        let publisher = GraphNamespacePublisher::new(root_uri, sidecar.branch.as_deref());
-        publisher
-            .publish(&updates, &expected)
-            .await?
-            .version()
-            .version
-    };
+    // Publish the restored HEADs so manifest == HEAD AND record the recovery
+    // commit in the same CAS (RFC-013 Phase 7). A degenerate all-NoMovement
+    // roll-back restores no table — `updates` is empty — but the recovery commit
+    // lineage still publishes (a lineage-only merge), so the rollback is recorded
+    // in the commit history just like a roll-forward.
+    let (_manifest_version, graph_commit_id) =
+        publish_recovery_commit(root_uri, sidecar, RecoveryKind::RolledBack, &updates, &expected)
+            .await?;
     record_audit(
         root_uri,
         sidecar,
-        manifest_version,
+        graph_commit_id,
         RecoveryKind::RolledBack,
         outcomes,
     )
@@ -1564,7 +1619,6 @@ async fn roll_back_sidecar(
 async fn record_audit_recovery_rollforward(
     root_uri: &str,
     storage: &dyn StorageAdapter,
-    snapshot: &Snapshot,
     sidecar: &RecoverySidecar,
     states: &[ClassifiedTable],
 ) -> Result<()> {
@@ -1578,10 +1632,22 @@ async fn record_audit_recovery_rollforward(
             to_version: state.manifest_pinned,
         })
         .collect();
+    // The substrate is already in the post-roll-forward state (the prior pass's
+    // table re-pin landed), so there are no table `updates` — but a recovery
+    // commit is still recorded for this cleanup pass via a lineage-only publish
+    // (RFC-013 Phase 7), which the audit row references.
+    let (_manifest_version, graph_commit_id) = publish_recovery_commit(
+        root_uri,
+        sidecar,
+        RecoveryKind::RolledForward,
+        &[],
+        &HashMap::new(),
+    )
+    .await?;
     record_audit(
         root_uri,
         sidecar,
-        snapshot.version(),
+        graph_commit_id,
         RecoveryKind::RolledForward,
         outcomes,
     )
@@ -1601,17 +1667,19 @@ async fn record_audit_recovery_rollforward(
 /// contention; persistent contention surfaces the typed conflict error to
 /// the recovery sweep, which leaves the sidecar in place for the next
 /// open's retry.
-/// Returns `(new_manifest_version, per_table_published_versions)`. The
-/// per-table map is what the audit row's `to_version` should record —
-/// for loose-match writers the actual Lance HEAD can be higher than the
-/// sidecar's `post_commit_pin` (which is a lower bound), so the pin is
-/// the wrong source of truth for an operator-facing audit field.
+/// Returns `(new_manifest_version, per_table_published_versions,
+/// recovery_commit_id)`. The per-table map is what the audit row's `to_version`
+/// should record — for loose-match writers the actual Lance HEAD can be higher
+/// than the sidecar's `post_commit_pin` (which is a lower bound), so the pin is
+/// the wrong source of truth for an operator-facing audit field. The recovery
+/// commit id is the `graph_commit` folded into the publish CAS (RFC-013
+/// Phase 7), which the audit row references.
 async fn roll_forward_all(
     root_uri: &str,
     sidecar: &RecoverySidecar,
     states: &[ClassifiedTable],
     snapshot: &Snapshot,
-) -> Result<(u64, HashMap<String, u64>)> {
+) -> Result<(u64, HashMap<String, u64>, String)> {
     let total_changes =
         sidecar.tables.len() + sidecar.additional_registrations.len() + sidecar.tombstones.len();
     let mut updates: Vec<ManifestChange> = Vec::with_capacity(total_changes);
@@ -1722,9 +1790,10 @@ async fn roll_forward_all(
         );
     }

-    let publisher = GraphNamespacePublisher::new(root_uri, sidecar.branch.as_deref());
-    let new_dataset = publisher.publish(&updates, &expected).await?;
-    Ok((new_dataset.version().version, published_versions))
+    let (new_manifest_version, graph_commit_id) =
+        publish_recovery_commit(root_uri, sidecar, RecoveryKind::RolledForward, &updates, &expected)
+            .await?;
+    Ok((new_manifest_version, published_versions, graph_commit_id))
 }

 /// Open `table_path` at its branch HEAD, read the current Lance HEAD version,
@@ -1794,62 +1863,27 @@ async fn push_table_update(
     Ok(published_version)
 }

-/// Append the audit row describing this recovery action.
+/// Append the audit row describing this recovery action (RFC-013 Phase 7).
 ///
-/// Two-part write: (a) `_graph_commits.lance` row anchored on the recovery
-/// actor (`omnigraph:recovery`); (b) `_graph_commit_recoveries.lance` row
-/// linking back to (a) and naming the original actor + per-table outcomes.
-/// Same not-atomic-pair-write shape as the existing `_graph_commits`
-/// + `_graph_commit_actors` split — a crash between the two leaves an
-/// orphan commit row with no audit row. The recovery sweep tolerates this:
-/// on re-entry the classifier surfaces `NoMovement` for already-restored /
-/// already-published tables, the action is a no-op, and the audit append
-/// is retried.
+/// The recovery COMMIT (`graph_commit` + `graph_head`) was already recorded
+/// durably in `__manifest` by `publish_recovery_commit` (folded into the same
+/// CAS as the table re-pin), so this only writes the `_graph_commit_recoveries`
+/// row, referencing that commit by `graph_commit_id`. A crash between the
+/// recovery publish and this audit append leaves a recovery commit with no audit
+/// row — the same not-atomic-pair-write shape as before; the sweep tolerates it
+/// (on re-entry the classifier surfaces `NoMovement`, the action is a no-op, and
+/// the audit append is retried, minting a fresh recovery commit).
 async fn record_audit(
     root_uri: &str,
     sidecar: &RecoverySidecar,
-    manifest_version: u64,
+    graph_commit_id: String,
     kind: RecoveryKind,
     outcomes: Vec<TableOutcome>,
 ) -> Result<()> {
     // Failpoint: models an audit write failure after the roll-forward /
-    // roll-back publish already landed — the sweep aborts, the sidecar
-    // stays, and re-entry records the audit row (see the retry note in
-    // the doc comment above).
+    // roll-back publish (with its folded-in recovery commit) already landed —
+    // the sweep aborts, the sidecar stays, and re-entry records the audit row.
     crate::failpoints::maybe_fail(crate::failpoints::names::RECOVERY_RECORD_AUDIT)?;
-    // Non-main recovery commits must be appended on the sidecar branch's
-    // commit graph, otherwise parent_commit_id comes from the global
-    // main head. BranchMerge additionally records the source branch's
-    // HEAD as merged_parent_commit_id so future merges between the same
-    // pair recognize "already up-to-date".
-    let target_branch = sidecar.branch.as_deref();
-    let mut graph = match target_branch {
-        Some(branch) => CommitGraph::open_at_branch(root_uri, branch).await?,
-        None => CommitGraph::open(root_uri).await?,
-    };
-    let graph_commit_id = match (
-        sidecar.writer_kind,
-        sidecar.merge_source_commit_id.as_deref(),
-        kind,
-    ) {
-        (SidecarKind::BranchMerge, Some(source_id), RecoveryKind::RolledForward) => {
-            let parent_commit_id = graph.head_commit_id().await?.unwrap_or_default();
-            graph
-                .append_merge_commit(
-                    target_branch,
-                    manifest_version,
-                    &parent_commit_id,
-                    source_id,
-                    Some(RECOVERY_ACTOR),
-                )
-                .await?
-        }
-        _ => {
-            graph
-                .append_commit(target_branch, manifest_version, Some(RECOVERY_ACTOR))
-                .await?
-        }
-    };
     let mut audit = RecoveryAudit::open(root_uri).await?;
     audit
         .append(RecoveryAuditRecord {
@@ -1859,7 +1893,7 @@ async fn record_audit(
             operation_id: sidecar.operation_id.clone(),
             sidecar_writer_kind: format!("{:?}", sidecar.writer_kind),
             per_table_outcomes: outcomes,
-            created_at: now_micros()?,
+            created_at: crate::db::now_micros()?,
         })
         .await?;
     Ok(())

@@ -10,7 +10,10 @@ use crate::error::{OmniError, Result};

 use super::layout::version_object_id;
 use super::metadata::TableVersionMetadata;
-use super::{OBJECT_TYPE_TABLE, OBJECT_TYPE_TABLE_TOMBSTONE, OBJECT_TYPE_TABLE_VERSION};
+use super::{
+    MAIN_BRANCH_HEAD_KEY, OBJECT_TYPE_GRAPH_COMMIT, OBJECT_TYPE_GRAPH_HEAD, OBJECT_TYPE_TABLE,
+    OBJECT_TYPE_TABLE_TOMBSTONE, OBJECT_TYPE_TABLE_VERSION,
+};

 #[derive(Debug, Clone)]
 pub struct SubTableEntry {
|
|
@ -34,11 +37,64 @@ struct TableTombstoneEntry {
|
|||
tombstone_version: u64,
|
||||
}
|
||||
|
||||
/// A graph-lineage commit projected out of the `__manifest` `graph_commit`
|
||||
/// rows (RFC-013 step 4). Field-for-field identical to `commit_graph::GraphCommit`
|
||||
/// so the commit-graph cache can be sourced from the manifest projection without
|
||||
/// touching any reader above that boundary. Kept as a separate struct here to
|
||||
/// keep `state.rs` free of the `commit_graph` module dependency.
|
||||
#[derive(Debug, Clone, PartialEq, Eq)]
|
||||
pub(crate) struct GraphLineageRow {
|
||||
pub(crate) graph_commit_id: String,
|
||||
pub(crate) manifest_branch: Option<String>,
|
||||
pub(crate) manifest_version: u64,
|
||||
pub(crate) parent_commit_id: Option<String>,
|
||||
pub(crate) merged_parent_commit_id: Option<String>,
|
||||
pub(crate) actor_id: Option<String>,
|
||||
pub(crate) created_at: i64,
|
||||
}
|
||||
|
||||
/// JSON payload of a `graph_commit` row's `metadata` column. The immutable
|
||||
/// commit fields that have no dedicated manifest column live here; the mutable
|
||||
/// ones (`graph_commit_id`, `manifest_branch`, `manifest_version`) reuse
|
||||
/// `object_id` / `table_branch` / `table_version`.
|
||||
#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
|
||||
struct GraphCommitMetadata {
|
||||
#[serde(default, skip_serializing_if = "Option::is_none")]
|
||||
parent_commit_id: Option<String>,
|
||||
#[serde(default, skip_serializing_if = "Option::is_none")]
|
||||
merged_parent_commit_id: Option<String>,
|
||||
#[serde(default, skip_serializing_if = "Option::is_none")]
|
||||
actor_id: Option<String>,
|
||||
created_at: i64,
|
||||
}
|
||||
|
||||
/// JSON payload of a `graph_head` row's `metadata` column.
|
||||
#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
|
||||
struct GraphHeadMetadata {
|
||||
head_commit_id: String,
|
||||
#[serde(default, skip_serializing_if = "Option::is_none")]
|
||||
parent_commit_id: Option<String>,
|
||||
}
|
||||
|
||||
/// The `object_id` for a branch's mutable head pointer row. Main encodes as
|
||||
/// `graph_head:main`; named branches as `graph_head:<branch>`.
|
||||
pub(crate) fn graph_head_object_id(branch: Option<&str>) -> String {
|
||||
format!("graph_head:{}", branch.unwrap_or(MAIN_BRANCH_HEAD_KEY))
|
||||
}
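A minimal, self-contained sketch of the head-pointer key encoding described above. The value `"main"` for `MAIN_BRANCH_HEAD_KEY` is inferred from the doc comment (`graph_head:main`) and is an assumption here, not copied from the crate.

```rust
// Assumed value: the doc comment says main encodes as `graph_head:main`.
const MAIN_BRANCH_HEAD_KEY: &str = "main";

/// Build the `object_id` of a branch's head-pointer row.
/// `None` stands for the main branch.
fn graph_head_object_id(branch: Option<&str>) -> String {
    format!("graph_head:{}", branch.unwrap_or(MAIN_BRANCH_HEAD_KEY))
}

fn main() {
    // Main and a named branch map to distinct, stable keys, so a
    // merge-insert keyed on `object_id` can update each head in place.
    println!("{}", graph_head_object_id(None));
    println!("{}", graph_head_object_id(Some("feature-x")));
}
```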
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
struct ManifestScan {
|
||||
table_locations: HashMap<String, String>,
|
||||
version_entries: Vec<SubTableEntry>,
|
||||
tombstones: Vec<TableTombstoneEntry>,
|
||||
/// Graph-lineage `graph_commit` rows, collected in the SAME pass only when
|
||||
/// the caller asked (`collect_lineage`). Empty on the table-state read hot
|
||||
/// path so it never pays the O(commits) lineage JSON decode; populated on the
|
||||
/// publish path, where `load_publish_state` already needs the parent and would
|
||||
/// otherwise scan `__manifest` a second time via `read_graph_lineage`. `graph_head`
|
||||
/// rows are not collected here — parent resolution uses the head-over-commits
|
||||
/// computation, not the denormalized head pointer (see `resolve_lineage_rows`).
|
||||
lineage_rows: Vec<GraphLineageRow>,
|
||||
}
|
||||
|
||||
pub(super) fn manifest_schema() -> SchemaRef {
|
||||
|
|
@ -73,7 +129,8 @@ pub(super) fn manifest_schema() -> SchemaRef {
|
|||
|
||||
pub(super) async fn read_manifest_state(dataset: &Dataset) -> Result<ManifestState> {
|
||||
let version = dataset.version().version;
|
||||
let scan = read_manifest_scan(dataset).await?;
|
||||
// The table-state hot path never needs lineage, so don't pay its JSON decode.
|
||||
let scan = read_manifest_scan(dataset, false).await?;
|
||||
let mut latest_versions = HashMap::<String, SubTableEntry>::new();
|
||||
|
||||
for entry in scan.version_entries {
|
||||
|
|
@ -109,28 +166,85 @@ pub(super) async fn read_manifest_state(dataset: &Dataset) -> Result<ManifestSta
|
|||
Ok(ManifestState { version, entries })
|
||||
}
|
||||
|
||||
// After RFC-013 P2 folded the publish path off this accessor (it now projects
|
||||
// version entries out of `read_publish_scan`'s single scan), the only remaining
|
||||
// caller is `BranchManifestNamespace::version_entries`. That namespace module is
|
||||
// `#[cfg(test)]` (see `db/manifest.rs`: "nothing in production routes through it;
|
||||
// the `LanceNamespace` impls are retained only to validate the contract in unit
|
||||
// tests"), so this stays `#[cfg(test)]` too — otherwise it is dead code in
|
||||
// non-test builds.
|
||||
#[cfg(test)]
|
||||
pub(super) async fn read_manifest_entries(dataset: &Dataset) -> Result<Vec<SubTableEntry>> {
|
||||
Ok(read_manifest_scan(dataset).await?.version_entries)
|
||||
Ok(read_manifest_scan(dataset, false).await?.version_entries)
|
||||
}
|
||||
|
||||
pub(super) async fn read_registered_table_locations(
|
||||
dataset: &Dataset,
|
||||
) -> Result<HashMap<String, String>> {
|
||||
Ok(read_manifest_scan(dataset).await?.table_locations)
|
||||
/// The full table state the publisher needs to build its CAS batch, plus the
|
||||
/// `graph_commit` lineage rows for parent resolution — all from ONE `__manifest`
|
||||
/// scan (RFC-013 P2). Replaces the prior four scans on the publish path (three
|
||||
/// thin accessors + a separate `read_graph_lineage`): `load_publish_state`
|
||||
/// projects every piece it needs out of this single result.
|
||||
pub(super) struct PublishScan {
|
||||
pub(super) table_locations: HashMap<String, String>,
|
||||
pub(super) version_entries: Vec<SubTableEntry>,
|
||||
pub(super) tombstones: Vec<((String, u64), ())>,
|
||||
pub(super) lineage_rows: Vec<GraphLineageRow>,
|
||||
}
|
||||
|
||||
pub(super) async fn read_tombstone_versions(
|
||||
dataset: &Dataset,
|
||||
) -> Result<HashMap<(String, u64), ()>> {
|
||||
Ok(read_manifest_scan(dataset)
|
||||
.await?
|
||||
.tombstones
|
||||
.into_iter()
|
||||
.map(|tombstone| ((tombstone.table_key, tombstone.tombstone_version), ()))
|
||||
.collect())
|
||||
/// One-scan read of everything the publish path needs. `collect_lineage` is
|
||||
/// always on here (the publisher resolves a parent), so the lineage JSON decode
|
||||
/// rides the same pass as the table-state assembly instead of a second scan.
|
||||
pub(super) async fn read_publish_scan(dataset: &Dataset) -> Result<PublishScan> {
|
||||
let scan = read_manifest_scan(dataset, true).await?;
|
||||
Ok(PublishScan {
|
||||
table_locations: scan.table_locations,
|
||||
version_entries: scan.version_entries,
|
||||
tombstones: scan
|
||||
.tombstones
|
||||
.into_iter()
|
||||
.map(|tombstone| ((tombstone.table_key, tombstone.tombstone_version), ()))
|
||||
.collect(),
|
||||
lineage_rows: scan.lineage_rows,
|
||||
})
|
||||
}
|
||||
|
||||
async fn read_manifest_scan(dataset: &Dataset) -> Result<ManifestScan> {
|
||||
/// Decode one `graph_commit` row (`object_type == OBJECT_TYPE_GRAPH_COMMIT`) into
|
||||
/// a [`GraphLineageRow`]. The single decode for both lineage readers — the
|
||||
/// dedicated `read_graph_lineage` scan and the folded `collect_lineage` branch of
|
||||
/// `read_manifest_scan` — so the two cannot drift. The caller has already matched
|
||||
/// the object type; `row` indexes into the per-batch columns.
|
||||
fn decode_graph_commit_row(
|
||||
object_ids: &StringArray,
|
||||
metadata: &StringArray,
|
||||
versions: &UInt64Array,
|
||||
branches: &StringArray,
|
||||
row: usize,
|
||||
) -> Result<GraphLineageRow> {
|
||||
if metadata.is_null(row) {
|
||||
return Err(OmniError::manifest_internal(format!(
|
||||
"manifest graph_commit row missing metadata for {}",
|
||||
object_ids.value(row)
|
||||
)));
|
||||
}
|
||||
let commit_meta: GraphCommitMetadata =
|
||||
serde_json::from_str(metadata.value(row)).map_err(|e| {
|
||||
OmniError::manifest_internal(format!("failed to decode graph_commit metadata: {e}"))
|
||||
})?;
|
||||
Ok(GraphLineageRow {
|
||||
graph_commit_id: object_ids.value(row).to_string(),
|
||||
manifest_branch: if branches.is_null(row) {
|
||||
None
|
||||
} else {
|
||||
Some(branches.value(row).to_string())
|
||||
},
|
||||
manifest_version: required_u64(versions, row, "table_version")?,
|
||||
parent_commit_id: commit_meta.parent_commit_id,
|
||||
merged_parent_commit_id: commit_meta.merged_parent_commit_id,
|
||||
actor_id: commit_meta.actor_id,
|
||||
created_at: commit_meta.created_at,
|
||||
})
|
||||
}
|
||||
|
||||
async fn read_manifest_scan(dataset: &Dataset, collect_lineage: bool) -> Result<ManifestScan> {
|
||||
let batches: Vec<RecordBatch> = dataset
|
||||
.scan()
|
||||
.try_into_stream()
|
||||
|
|
@ -143,6 +257,7 @@ async fn read_manifest_scan(dataset: &Dataset) -> Result<ManifestScan> {
|
|||
let mut table_locations = HashMap::new();
|
||||
let mut version_entries = Vec::new();
|
||||
let mut tombstones = Vec::new();
|
||||
let mut lineage_rows = Vec::new();
|
||||
|
||||
for batch in &batches {
|
||||
let object_types = string_column(batch, "object_type")?;
|
||||
|
|
@ -152,6 +267,13 @@ async fn read_manifest_scan(dataset: &Dataset) -> Result<ManifestScan> {
|
|||
let versions = u64_column(batch, "table_version")?;
|
||||
let branches = string_column(batch, "table_branch")?;
|
||||
let row_counts = u64_column(batch, "row_count")?;
|
||||
// `object_id` is only needed for lineage decoding; skip the lookup
|
||||
// entirely on the table-state hot path (`collect_lineage == false`).
|
||||
let object_ids = if collect_lineage {
|
||||
Some(string_column(batch, "object_id")?)
|
||||
} else {
|
||||
None
|
||||
};
|
||||
|
||||
for row in 0..batch.num_rows() {
|
||||
let table_key = table_keys.value(row).to_string();
|
||||
|
|
@ -195,6 +317,21 @@ async fn read_manifest_scan(dataset: &Dataset) -> Result<ManifestScan> {
|
|||
tombstone_version,
|
||||
});
|
||||
}
|
||||
// `graph_commit` rows (RFC-013) are decoded into the scan ONLY
// when `collect_lineage` is set (the publish path, which resolves
// a parent). When it is NOT set, the guard below fails (and
// `object_ids` is `None`), so these rows fall through to the `_`
// arm together with `graph_head` and any future object type, and
// the table-state hot path never pays the O(commits) lineage JSON
// decode.
|
||||
OBJECT_TYPE_GRAPH_COMMIT if collect_lineage => {
|
||||
let object_ids = object_ids.expect("object_ids read when collect_lineage");
|
||||
lineage_rows.push(decode_graph_commit_row(
|
||||
object_ids, metadata, versions, branches, row,
|
||||
)?);
|
||||
}
|
||||
// Skipped on the table-state path (and for `graph_head` / unknown
|
||||
// future object types on every path): no table snapshot needs them.
|
||||
_ => {}
|
||||
}
|
||||
}
|
||||
|
|
@ -225,21 +362,167 @@ async fn read_manifest_scan(dataset: &Dataset) -> Result<ManifestScan> {
|
|||
table_locations,
|
||||
version_entries: entries,
|
||||
tombstones,
|
||||
lineage_rows,
|
||||
})
|
||||
}
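The `collect_lineage` guard in the scan loop can be illustrated in isolation. The sketch below is a simplified stand-in, not the engine's code: the string values of the two object-type constants are assumptions, and the per-row decode is replaced by counters.

```rust
// Assumed string values; the real constants live in the manifest module.
const OBJECT_TYPE_TABLE_VERSION: &str = "table_version";
const OBJECT_TYPE_GRAPH_COMMIT: &str = "graph_commit";

/// Returns (table-version rows seen, lineage rows decoded).
fn scan(object_types: &[&str], collect_lineage: bool) -> (usize, usize) {
    let mut versions = 0;
    let mut lineage = 0;
    for &object_type in object_types {
        match object_type {
            OBJECT_TYPE_TABLE_VERSION => versions += 1,
            // The guard makes this arm match only on the publish path.
            OBJECT_TYPE_GRAPH_COMMIT if collect_lineage => lineage += 1,
            // `graph_head`, unknown future types, and `graph_commit` rows
            // when the guard is false all land here and are skipped.
            _ => {}
        }
    }
    (versions, lineage)
}

fn main() {
    let rows = ["table_version", "graph_commit", "graph_head"];
    println!("hot path: {:?}", scan(&rows, false));
    println!("publish path: {:?}", scan(&rows, true));
}
```

With the flag off, lineage rows cost one string comparison each and no JSON decode, which is the property the comments in the scan loop rely on.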
|
||||
|
||||
/// Project the graph-lineage rows (`graph_commit` + `graph_head`) out of
|
||||
/// `__manifest` (RFC-013 step 4). Returns every commit and the per-branch head
|
||||
/// map (keyed by branch name, `"main"` for main). `__manifest` is the single
|
||||
/// source of graph lineage: the commit-graph cache is sourced from here, and the
|
||||
/// publisher resolves a new commit's parent from here inside its CAS loop.
|
||||
///
|
||||
/// Dedicated scan (separate from `read_manifest_scan`): it decodes ONLY the two
|
||||
/// lineage object types and builds no table snapshot, so the table-state hot
|
||||
/// path never pays for lineage JSON and this path never pays for table-entry
|
||||
/// assembly.
|
||||
pub(crate) async fn read_graph_lineage(
|
||||
dataset: &Dataset,
|
||||
) -> Result<(Vec<GraphLineageRow>, HashMap<String, String>)> {
|
||||
let batches: Vec<RecordBatch> = dataset
|
||||
.scan()
|
||||
.try_into_stream()
|
||||
.await
|
||||
.map_err(|e| OmniError::Lance(e.to_string()))?
|
||||
.try_collect()
|
||||
.await
|
||||
.map_err(|e| OmniError::Lance(e.to_string()))?;
|
||||
|
||||
let mut graph_commits = Vec::new();
|
||||
let mut graph_heads = HashMap::new();
|
||||
|
||||
for batch in &batches {
|
||||
let object_ids = string_column(batch, "object_id")?;
|
||||
let object_types = string_column(batch, "object_type")?;
|
||||
let metadata = string_column(batch, "metadata")?;
|
||||
let versions = u64_column(batch, "table_version")?;
|
||||
let branches = string_column(batch, "table_branch")?;
|
||||
|
||||
for row in 0..batch.num_rows() {
|
||||
match object_types.value(row) {
|
||||
OBJECT_TYPE_GRAPH_COMMIT => {
|
||||
graph_commits.push(decode_graph_commit_row(
|
||||
object_ids, metadata, versions, branches, row,
|
||||
)?);
|
||||
}
|
||||
OBJECT_TYPE_GRAPH_HEAD => {
|
||||
if metadata.is_null(row) {
|
||||
return Err(OmniError::manifest_internal(format!(
|
||||
"manifest graph_head row missing metadata for {}",
|
||||
object_ids.value(row)
|
||||
)));
|
||||
}
|
||||
let head_meta: GraphHeadMetadata = serde_json::from_str(metadata.value(row))
|
||||
.map_err(|e| {
|
||||
OmniError::manifest_internal(format!(
|
||||
"failed to decode graph_head metadata: {e}"
|
||||
))
|
||||
})?;
|
||||
// `object_id` is `graph_head:<branch>`; the branch key after
|
||||
// the prefix is the projection's map key (`main` for main).
|
||||
let branch_key = object_ids
|
||||
.value(row)
|
||||
.strip_prefix("graph_head:")
|
||||
.unwrap_or_default()
|
||||
.to_string();
|
||||
graph_heads.insert(branch_key, head_meta.head_commit_id);
|
||||
}
|
||||
_ => {}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
Ok((graph_commits, graph_heads))
|
||||
}
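The head-map projection at the end of `read_graph_lineage` can be sketched without Lance or Arrow. The function below is hypothetical: it takes `(object_id, head_commit_id)` pairs instead of record batches, and keeps only the prefix-stripping step.

```rust
use std::collections::HashMap;

/// Project `(object_id, head_commit_id)` pairs into a branch -> head map.
/// Hypothetical helper: the real code reads these from `__manifest` batches.
fn project_heads(rows: &[(&str, &str)]) -> HashMap<String, String> {
    let mut heads = HashMap::new();
    for &(object_id, head_commit_id) in rows {
        // `graph_head:<branch>` -> `<branch>`; an id without the prefix
        // falls back to the empty string, mirroring `unwrap_or_default`.
        let branch_key = object_id
            .strip_prefix("graph_head:")
            .unwrap_or_default()
            .to_string();
        heads.insert(branch_key, head_commit_id.to_string());
    }
    heads
}

fn main() {
    let heads = project_heads(&[("graph_head:main", "c2"), ("graph_head:dev", "c7")]);
    println!("{heads:?}");
}
```

Note that an `object_id` lacking the prefix would be stored under the empty key rather than rejected; whether that case can occur in practice is not stated in the diff.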
|
||||
|
||||
/// The current head of a branch's lineage: the [`GraphLineageRow`] with the
|
||||
/// greatest `(manifest_version, created_at, graph_commit_id)`. This is the same
|
||||
/// ordering the commit-graph cache uses to pick its head (`should_replace_head`)
|
||||
/// — kept in one place so the publisher's per-attempt parent resolution and the
|
||||
/// cache agree by construction. `None` only for a graph with no commits yet
|
||||
/// (a parentless genesis).
|
||||
pub(crate) fn head_lineage_row(rows: &[GraphLineageRow]) -> Option<&GraphLineageRow> {
|
||||
rows.iter().max_by(|a, b| {
|
||||
a.manifest_version
|
||||
.cmp(&b.manifest_version)
|
||||
.then_with(|| a.created_at.cmp(&b.created_at))
|
||||
.then_with(|| a.graph_commit_id.cmp(&b.graph_commit_id))
|
||||
})
|
||||
}
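To make the head ordering concrete, here is a standalone sketch using a trimmed-down row type. `Row` and `row(...)` are illustrative names, not the crate's; only the three ordering fields are kept.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
struct Row {
    manifest_version: u64,
    created_at: i64,
    graph_commit_id: String,
}

// Illustrative constructor for the example rows.
fn row(manifest_version: u64, created_at: i64, id: &str) -> Row {
    Row { manifest_version, created_at, graph_commit_id: id.to_string() }
}

/// Head = greatest `(manifest_version, created_at, graph_commit_id)`.
fn head_lineage_row(rows: &[Row]) -> Option<&Row> {
    rows.iter().max_by(|a, b| {
        a.manifest_version
            .cmp(&b.manifest_version)
            .then_with(|| a.created_at.cmp(&b.created_at))
            .then_with(|| a.graph_commit_id.cmp(&b.graph_commit_id))
    })
}

fn main() {
    // Version wins first; ties fall back to timestamp, then to the id.
    let rows = vec![row(3, 10, "a"), row(5, 1, "b"), row(5, 2, "c"), row(5, 2, "d")];
    println!("{:?}", head_lineage_row(&rows));
}
```

Because the commit id is the last tie-breaker and ids are unique, the result does not depend on the order in which the scan returned the rows.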
|
||||
|
||||
/// One `__manifest` row materializing a piece of a graph commit's lineage. The
|
||||
/// publisher maps these onto its `PendingVersionRow`s (folding lineage into the
|
||||
/// table-version publish batch), and the genesis init path pushes them straight
|
||||
/// into the init batch.
|
||||
pub(crate) struct GraphLineageRowPart {
|
||||
pub(crate) object_id: String,
|
||||
pub(crate) object_type: &'static str,
|
||||
pub(crate) metadata: String,
|
||||
pub(crate) table_version: Option<u64>,
|
||||
pub(crate) table_branch: Option<String>,
|
||||
}
|
||||
|
||||
/// Encode one graph commit into its two `__manifest` rows: the immutable
|
||||
/// `graph_commit` row plus the mutable `graph_head:<branch>` pointer (a
|
||||
/// merge-insert on `object_id` updates the head in place). `branch` is `None`
|
||||
/// for main. The immutable commit fields with no dedicated column live in the
|
||||
/// `graph_commit` row's `metadata` JSON; the mutable head pointer payload lives
|
||||
/// in the `graph_head` row's `metadata`.
|
||||
pub(crate) fn graph_lineage_row_parts(
|
||||
commit: &GraphLineageRow,
|
||||
branch: Option<&str>,
|
||||
) -> Result<[GraphLineageRowPart; 2]> {
|
||||
let commit_metadata = serde_json::to_string(&GraphCommitMetadata {
|
||||
parent_commit_id: commit.parent_commit_id.clone(),
|
||||
merged_parent_commit_id: commit.merged_parent_commit_id.clone(),
|
||||
actor_id: commit.actor_id.clone(),
|
||||
created_at: commit.created_at,
|
||||
})
|
||||
.map_err(|e| {
|
||||
OmniError::manifest_internal(format!("failed to encode graph_commit metadata: {e}"))
|
||||
})?;
|
||||
let head_metadata = serde_json::to_string(&GraphHeadMetadata {
|
||||
head_commit_id: commit.graph_commit_id.clone(),
|
||||
parent_commit_id: commit.parent_commit_id.clone(),
|
||||
})
|
||||
.map_err(|e| {
|
||||
OmniError::manifest_internal(format!("failed to encode graph_head metadata: {e}"))
|
||||
})?;
|
||||
|
||||
Ok([
|
||||
// Only the immutable commit row carries the manifest version + branch.
|
||||
GraphLineageRowPart {
|
||||
object_id: commit.graph_commit_id.clone(),
|
||||
object_type: OBJECT_TYPE_GRAPH_COMMIT,
|
||||
metadata: commit_metadata,
|
||||
table_version: Some(commit.manifest_version),
|
||||
table_branch: commit.manifest_branch.clone(),
|
||||
},
|
||||
// The head row reuses `metadata` for its pointer payload.
|
||||
GraphLineageRowPart {
|
||||
object_id: graph_head_object_id(branch),
|
||||
object_type: OBJECT_TYPE_GRAPH_HEAD,
|
||||
metadata: head_metadata,
|
||||
table_version: None,
|
||||
table_branch: None,
|
||||
},
|
||||
])
|
||||
}
|
||||
|
||||
pub(super) fn entries_to_batch(
|
||||
entries: &[SubTableEntry],
|
||||
version_metadata: &HashMap<String, String>,
|
||||
genesis_lineage: &[GraphLineageRowPart],
|
||||
) -> Result<RecordBatch> {
|
||||
let mut object_ids = Vec::with_capacity(entries.len() * 2);
|
||||
let mut object_types = Vec::with_capacity(entries.len() * 2);
|
||||
let mut locations = Vec::with_capacity(entries.len() * 2);
|
||||
let mut metadata = Vec::with_capacity(entries.len() * 2);
|
||||
let mut table_keys = Vec::with_capacity(entries.len() * 2);
|
||||
let mut table_versions = Vec::with_capacity(entries.len() * 2);
|
||||
let mut table_branches = Vec::with_capacity(entries.len() * 2);
|
||||
let mut row_counts = Vec::with_capacity(entries.len() * 2);
|
||||
let cap = entries.len() * 2 + genesis_lineage.len();
|
||||
let mut object_ids = Vec::with_capacity(cap);
|
||||
let mut object_types = Vec::with_capacity(cap);
|
||||
let mut locations = Vec::with_capacity(cap);
|
||||
let mut metadata = Vec::with_capacity(cap);
|
||||
let mut table_keys = Vec::with_capacity(cap);
|
||||
let mut table_versions = Vec::with_capacity(cap);
|
||||
let mut table_branches = Vec::with_capacity(cap);
|
||||
let mut row_counts = Vec::with_capacity(cap);
|
||||
|
||||
for entry in entries {
|
||||
object_ids.push(entry.table_key.clone());
|
||||
|
|
@ -271,6 +554,22 @@ pub(super) fn entries_to_batch(
|
|||
row_counts.push(Some(entry.row_count));
|
||||
}
|
||||
|
||||
// Genesis graph-lineage rows ride the init write so a fresh graph carries
|
||||
// its `graph_commit` + `graph_head` in `__manifest` from version one (no
|
||||
// separate lineage fragment, no second commit). `table_key` is non-nullable
|
||||
// but lineage rows have no table identity, so the empty string stands in
|
||||
// (never matched by a real key).
|
||||
for part in genesis_lineage {
|
||||
object_ids.push(part.object_id.clone());
|
||||
object_types.push(part.object_type.to_string());
|
||||
locations.push(None);
|
||||
metadata.push(Some(part.metadata.clone()));
|
||||
table_keys.push(String::new());
|
||||
table_versions.push(part.table_version);
|
||||
table_branches.push(part.table_branch.clone());
|
||||
row_counts.push(None);
|
||||
}
|
||||
|
||||
manifest_rows_batch(
|
||||
object_ids,
|
||||
object_types,
|
||||
|
|
@ -283,6 +582,72 @@ pub(super) fn entries_to_batch(
|
|||
)
|
||||
}
|
||||
|
||||
/// Merge-insert a set of graph-lineage rows (`graph_commit` + `graph_head`)
|
||||
/// straight into `__manifest`, keyed on `object_id`. Used only by the v3→v4
|
||||
/// internal-schema backfill (RFC-013 step 4): the normal write path folds
|
||||
/// lineage into the publisher's batch, but the migration writes lineage with
|
||||
/// no accompanying table-version change, so it issues its own merge.
|
||||
///
|
||||
/// Mirrors the publisher's merge knobs (`use_index(false)`, `skip_auto_cleanup`,
|
||||
/// `conflict_retries(0)`) so it has identical CAS / cleanup semantics. The
|
||||
/// migration runs under the open-for-write path and is idempotent (re-inserting
|
||||
/// the same `object_id` rows updates them in place), so it does not need the
|
||||
/// publisher's retry loop. Returns the advanced dataset (its version is the
|
||||
/// commit the lineage landed in).
|
||||
pub(crate) async fn merge_lineage_rows(
|
||||
dataset: Dataset,
|
||||
parts: &[GraphLineageRowPart],
|
||||
) -> Result<Dataset> {
|
||||
let len = parts.len();
|
||||
let mut object_ids = Vec::with_capacity(len);
|
||||
let mut object_types = Vec::with_capacity(len);
|
||||
let mut metadata = Vec::with_capacity(len);
|
||||
let mut table_versions = Vec::with_capacity(len);
|
||||
let mut table_branches = Vec::with_capacity(len);
|
||||
for part in parts {
|
||||
object_ids.push(part.object_id.clone());
|
||||
object_types.push(part.object_type.to_string());
|
||||
metadata.push(Some(part.metadata.clone()));
|
||||
table_versions.push(part.table_version);
|
||||
table_branches.push(part.table_branch.clone());
|
||||
}
|
||||
// Lineage rows carry no table identity: empty `table_key`, null location /
|
||||
// row_count (matching `lineage_part_to_pending` in the publisher).
|
||||
let batch = manifest_rows_batch(
|
||||
object_ids,
|
||||
object_types,
|
||||
vec![None; len],
|
||||
metadata,
|
||||
vec![String::new(); len],
|
||||
table_versions,
|
||||
table_branches,
|
||||
vec![None; len],
|
||||
)?;
|
||||
let reader =
|
||||
arrow_array::RecordBatchIterator::new(vec![Ok(batch)], manifest_schema());
|
||||
let dataset = Arc::new(dataset);
|
||||
let mut merge_builder =
|
||||
lance::dataset::MergeInsertBuilder::try_new(dataset, vec!["object_id".to_string()])
|
||||
.map_err(|e| OmniError::Lance(e.to_string()))?;
|
||||
merge_builder.when_matched(lance::dataset::WhenMatched::UpdateAll);
|
||||
merge_builder.when_not_matched(lance::dataset::WhenNotMatched::InsertAll);
|
||||
merge_builder.conflict_retries(0);
|
||||
merge_builder.use_index(false);
|
||||
merge_builder.skip_auto_cleanup(true);
|
||||
let (new_dataset, _stats) = merge_builder
|
||||
.try_build()
|
||||
.map_err(|e| OmniError::Lance(e.to_string()))?
|
||||
.execute_reader(Box::new(reader))
|
||||
// Route through the publisher's classifier (not a stringify) so a
|
||||
// concurrent first-open's CAS loss on `__manifest` surfaces as the SAME
|
||||
// typed `RowLevelCasContention` the publisher's retry consumes. The
|
||||
// migration's re-open retry loop matches on that to converge instead of
|
||||
// erroring out (FIX B).
|
||||
.await
|
||||
.map_err(super::publisher::map_lance_publish_error)?;
|
||||
Ok(Arc::try_unwrap(new_dataset).unwrap_or_else(|arc| (*arc).clone()))
|
||||
}
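The idempotency claim for the backfill (re-inserting the same `object_id` rows updates them in place) can be modelled with an in-memory map. This is a toy model under the assumption that the merge behaves as an upsert keyed on `object_id`; it does not use or imitate the Lance `MergeInsertBuilder` API.

```rust
use std::collections::HashMap;

/// Toy upsert keyed on `object_id`: matched keys are overwritten,
/// unmatched keys are inserted.
fn merge_insert(table: &mut HashMap<String, String>, rows: &[(&str, &str)]) {
    for &(object_id, metadata) in rows {
        table.insert(object_id.to_string(), metadata.to_string());
    }
}

fn main() {
    let mut table = HashMap::new();
    let rows = [("commit-1", "{\"created_at\":1}"), ("graph_head:main", "commit-1")];
    merge_insert(&mut table, &rows);
    // Running the same backfill again leaves the table unchanged.
    merge_insert(&mut table, &rows);
    println!("{} rows", table.len());
}
```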
|
||||
|
||||
pub(super) fn manifest_rows_batch(
|
||||
object_ids: Vec<String>,
|
||||
object_types: Vec<String>,
|
||||
|
|
|
|||
File diff suppressed because it is too large
|
|
@ -17,6 +17,8 @@ pub use omnigraph::{
|
|||
SchemaApplyResult, SkipReason, TableCleanupStats, TableOptimizeStats, TableRepairStats,
|
||||
};
|
||||
|
||||
use crate::error::{OmniError, Result};
|
||||
|
||||
pub(crate) const SCHEMA_APPLY_LOCK_BRANCH: &str = "__schema_apply_lock__";
|
||||
|
||||
/// Mutation kind, threaded through the version-check call sites so the
|
||||
|
|
@ -74,3 +76,14 @@ pub(crate) fn is_internal_system_branch(name: &str) -> bool {
|
|||
// only internal branch the engine still creates is the schema-apply lock.
|
||||
is_schema_apply_lock_branch(name)
|
||||
}
|
||||
|
||||
/// Microseconds since the UNIX epoch — the `created_at` stamp threaded through
|
||||
/// every graph-lineage / recovery-audit / commit-graph row. One canonical
|
||||
/// helper so the clock-error mapping (variant + message) cannot drift across
|
||||
/// the call sites that record those timestamps.
|
||||
pub(crate) fn now_micros() -> Result<i64> {
|
||||
let duration = std::time::SystemTime::now()
|
||||
.duration_since(std::time::UNIX_EPOCH)
|
||||
.map_err(|e| OmniError::manifest(format!("system clock before UNIX_EPOCH: {e}")))?;
|
||||
Ok(duration.as_micros() as i64)
|
||||
}
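`Duration::as_micros` returns a `u128`, so the `as i64` cast above would truncate silently for out-of-range values (not a practical concern for wall-clock time). A possible variant with a checked conversion is sketched below; the function names and the `String` error type are placeholders, not the crate's `OmniError`.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Convert a duration to whole microseconds, failing instead of truncating.
fn micros_i64(d: Duration) -> Result<i64, String> {
    i64::try_from(d.as_micros()).map_err(|e| format!("timestamp out of i64 range: {e}"))
}

/// Microseconds since the UNIX epoch, with both failure modes surfaced.
fn now_micros_checked() -> Result<i64, String> {
    let since_epoch = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map_err(|e| format!("system clock before UNIX_EPOCH: {e}"))?;
    micros_i64(since_epoch)
}

fn main() {
    match now_micros_checked() {
        Ok(ts) => println!("created_at = {ts}"),
        Err(e) => eprintln!("clock error: {e}"),
    }
}
```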
|
||||
|
|
|
|||
|
|
@ -123,12 +123,12 @@ pub struct Omnigraph {
|
|||
/// calls without a global write lock). Reads (`snapshot`, `version`,
|
||||
/// `current_branch`, `branch_list`, `resolve_*`, `head_commit_id`,
|
||||
/// `list_commits`, …) acquire `.read().await` and parallelize.
|
||||
/// Writes (`refresh`, `branch_create`, `branch_delete`, `commit_*`,
|
||||
/// `record_*`) acquire `.write().await` and serialize. The atomic
|
||||
/// commit invariant — `commit_manifest_updates` followed by
|
||||
/// `record_graph_commit` must be atomic — is preserved by the
|
||||
/// single `.write()` covering both calls inside
|
||||
/// `commit_updates_with_actor_with_expected`. PR 2 Phase 2
|
||||
/// Writes (`refresh`, `branch_create`, `branch_delete`, `commit_*`)
|
||||
/// acquire `.write().await` and serialize. The atomic commit invariant —
|
||||
/// table-version rows and the graph commit are one unit — holds by
|
||||
/// construction since RFC-013 Phase 7: both ride a SINGLE manifest publish
|
||||
/// CAS (`commit_changes_with_lineage`), so there is no two-write window to
|
||||
/// keep atomic. PR 2 Phase 2
|
||||
/// converted from `Mutex` to `RwLock` because the bench showed
|
||||
/// the Mutex was the dominant serializer for disjoint-table
|
||||
/// workloads. Lock acquisition order: always before `runtime_cache`
|
||||
|
|
@ -417,6 +417,14 @@ impl Omnigraph {
|
|||
// first read-write open (an accepted, documented limitation).
|
||||
if matches!(mode, OpenMode::ReadWrite) {
|
||||
crate::db::manifest::migrate_on_open(&root).await?;
|
||||
} else {
|
||||
// A read-only open skips `migrate_on_open` (no object-store writes),
|
||||
// which is where the version refusal otherwise lives. Still refuse a
|
||||
// `__manifest` stamped outside this binary's supported range — newer
|
||||
// than CURRENT (an old binary cannot silently misread a newer graph,
|
||||
// e.g. one folded to internal-schema v4 lineage), or below
|
||||
// MIN_SUPPORTED (predates the readers we carry). Read-only, no write.
|
||||
crate::db::manifest::refuse_if_internal_schema_unsupported(&root).await?;
|
||||
}
|
||||
// Open the coordinator first so the schema-staging recovery sweep can
|
||||
// compare its snapshot against any leftover staging files.
|
||||
|
|
@ -1779,28 +1787,17 @@ impl Omnigraph {
|
|||
table_ops::commit_updates(self, updates).await
|
||||
}
|
||||
|
||||
pub(crate) async fn commit_manifest_updates(
|
||||
/// Publish a branch merge: the merged table `updates` and the merge commit
|
||||
/// in one manifest CAS (RFC-013 Phase 7). The merge commit's merged-in parent
|
||||
/// is `merged_parent_commit_id` (the source head); its first parent is the
|
||||
/// live target-branch head, resolved by the publisher.
|
||||
pub(crate) async fn commit_merge_with_actor(
|
||||
&self,
|
||||
updates: &[crate::db::SubTableUpdate],
|
||||
) -> Result<u64> {
|
||||
table_ops::commit_manifest_updates(self, updates).await
|
||||
}
|
||||
|
||||
pub(crate) async fn record_merge_commit(
|
||||
&self,
|
||||
manifest_version: u64,
|
||||
parent_commit_id: &str,
|
||||
merged_parent_commit_id: &str,
|
||||
actor_id: Option<&str>,
|
||||
) -> Result<String> {
|
||||
table_ops::record_merge_commit(
|
||||
self,
|
||||
manifest_version,
|
||||
parent_commit_id,
|
||||
merged_parent_commit_id,
|
||||
actor_id,
|
||||
)
|
||||
.await
|
||||
table_ops::commit_merge_with_actor(self, updates, merged_parent_commit_id, actor_id).await
|
||||
}
|
||||
|
||||
pub(crate) async fn commit_updates_on_branch_with_expected(
|
||||
|
|
|
|||
|
|
@ -1379,33 +1379,16 @@ pub(super) async fn commit_updates(
|
|||
commit_prepared_updates(db, &prepared, None).await
|
||||
}
|
||||
|
||||
pub(super) async fn commit_manifest_updates(
|
||||
pub(super) async fn commit_merge_with_actor(
|
||||
db: &Omnigraph,
|
||||
updates: &[crate::db::SubTableUpdate],
|
||||
) -> Result<u64> {
|
||||
db.coordinator
|
||||
.write()
|
||||
.await
|
||||
.commit_manifest_updates(updates)
|
||||
.await
|
||||
}
|
||||
|
||||
pub(super) async fn record_merge_commit(
|
||||
db: &Omnigraph,
|
||||
manifest_version: u64,
|
||||
parent_commit_id: &str,
|
||||
merged_parent_commit_id: &str,
|
||||
actor_id: Option<&str>,
|
||||
) -> Result<String> {
|
||||
db.coordinator
|
||||
.write()
|
||||
.await
|
||||
.record_merge_commit(
|
||||
manifest_version,
|
||||
parent_commit_id,
|
||||
merged_parent_commit_id,
|
||||
actor_id,
|
||||
)
|
||||
.commit_merge_with_actor(updates, merged_parent_commit_id, actor_id)
|
||||
.await
|
||||
.map(|snapshot_id| snapshot_id.as_str().to_string())
|
||||
}
|
||||
|
|
|
|||
|
|
@ -14,15 +14,14 @@
|
|||
//! this change additive.
|
||||
//!
|
||||
//! Atomicity caveat: append to `_graph_commit_recoveries.lance` is
|
||||
//! sequential w.r.t. the `CommitGraph::append_commit` write. A crash
|
||||
//! between the two leaves an orphan commit-graph row with no audit row.
|
||||
//! Same shape as the existing `_graph_commits` + `_graph_commit_actors`
|
||||
//! split; the recovery sweep tolerates it the same way (re-entry sees
|
||||
//! `NoMovement` for already-restored / already-published tables; the
|
||||
//! audit append is retried).
|
||||
//! sequential w.r.t. the recovery commit, which RFC-013 Phase 7 records in
|
||||
//! `__manifest` (folded into the recovery publish CAS via `publish_recovery_commit`).
|
||||
//! A crash between the publish and this audit append leaves a recovery commit
|
||||
//! with no audit row. The recovery sweep tolerates it the same way (re-entry
|
||||
//! sees `NoMovement` for already-restored / already-published tables; the audit
|
||||
//! append is retried, minting a fresh recovery commit).
|
||||
|
||||
use std::sync::Arc;
|
||||
use std::time::{SystemTime, UNIX_EPOCH};
|
||||
|
||||
use arrow_array::{
|
||||
Array, RecordBatch, RecordBatchIterator, StringArray, TimestampMicrosecondArray,
|
||||
|
|
@ -195,7 +194,11 @@ async fn create_recoveries_dataset(root_uri: &str) -> Result<Dataset> {
|
|||
};
|
||||
match Dataset::write(reader, &uri as &str, Some(params)).await {
|
||||
Ok(dataset) => Ok(dataset),
|
||||
Err(err) if err.to_string().contains("Dataset already exists") => Dataset::open(&uri)
|
||||
// Create-or-open idempotency: match the typed `DatasetAlreadyExists`
// variant, not the display string (the message text is not a Lance API
// contract). Same discipline as `commit_graph.rs`'s create-or-open; pinned by
// `lance_surface_guards.rs::lance_error_dataset_already_exists_variant_exists`.
|
||||
Err(lance::Error::DatasetAlreadyExists { .. }) => Dataset::open(&uri)
|
||||
.await
|
||||
.map_err(|open_err| OmniError::Lance(open_err.to_string())),
|
||||
Err(err) => Err(OmniError::Lance(err.to_string())),
|
||||
|
|
@ -276,13 +279,6 @@ fn decode_row(batch: &RecordBatch, row: usize) -> Result<RecoveryAuditRecord> {
|
|||
})
|
||||
}
|
||||
|
||||
pub(crate) fn now_micros() -> Result<i64> {
|
||||
SystemTime::now()
|
||||
.duration_since(UNIX_EPOCH)
|
||||
.map(|d| d.as_micros() as i64)
|
||||
.map_err(|e| OmniError::manifest_internal(format!("system clock before unix epoch: {}", e)))
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
|
|
|||
|
|
@ -1795,15 +1795,20 @@ impl Omnigraph {
|
|||
// `tests/failpoints.rs::branch_merge_phase_b_failure_recovered_on_next_open`.
|
||||
crate::failpoints::maybe_fail(crate::failpoints::names::BRANCH_MERGE_POST_PHASE_B_PRE_MANIFEST_COMMIT)?;
|
||||
|
||||
let manifest_version = if updates.is_empty() {
|
||||
self.version().await
|
||||
} else {
|
||||
self.commit_manifest_updates(&updates).await?
|
||||
};
|
||||
// Publish the merged table versions AND the merge commit in one manifest
|
||||
// CAS (RFC-013 Phase 7): `graph_commit` + `graph_head` rows ride the same
|
||||
// merge-insert as the table-version rows. The merge commit's first parent
|
||||
// is resolved by the publisher as the live target-branch head (the
|
||||
// post-merge correct parent even if the target advanced); its merged-in
|
||||
// parent is the source head. `target_head_commit_id` is no longer passed
|
||||
// — it was the pre-merge target head, which the publisher reads live.
|
||||
let _ = target_head_commit_id;
|
||||
self.commit_merge_with_actor(&updates, source_head_commit_id, actor_id)
|
||||
.await?;
|
||||
|
||||
// Recovery sidecar lifecycle: delete after manifest publish.
|
||||
// Best-effort cleanup; the merge already landed durably so
|
||||
// failing the user here is undesirable.
|
||||
// Recovery sidecar lifecycle: delete after the manifest publish (Phase C).
|
||||
// Best-effort cleanup; the merge already landed durably so failing the
|
||||
// user here is undesirable.
|
||||
if let Some((_, handle)) = recovery {
|
||||
if let Err(err) =
|
||||
crate::db::manifest::delete_sidecar(&handle, self.storage_adapter()).await
|
||||
|
|
@ -1815,13 +1820,6 @@ impl Omnigraph {
|
|||
);
|
||||
}
|
||||
}
|
||||
self.record_merge_commit(
|
||||
manifest_version,
|
||||
target_head_commit_id,
|
||||
source_head_commit_id,
|
||||
actor_id,
|
||||
)
|
||||
.await?;
|
||||
|
||||
if changed_edge_tables {
|
||||
self.invalidate_graph_index().await;
|
||||
|
|
|
|||
|
|
@@ -14,6 +14,64 @@ pub(crate) fn maybe_fail(_name: &str) -> Result<()> {
     Ok(())
 }

+/// Failpoint that injects a *Lance* error rather than an `OmniError`. Used to
+/// stand in for a `Dataset::open` failing with a transient/corrupt (non-not-found)
+/// error, so a test can drive the caller's lance-error classification — the
+/// behavior FIX A (`read_legacy_commit_cache`) relies on: a not-found is benign
+/// (empty), anything else propagates. A no-op without the `failpoints` feature
+/// (the injected variant is therefore unreachable in release builds).
+#[allow(unused_variables)]
+pub(crate) fn maybe_fail_lance_open(name: &str) -> std::result::Result<(), lance::Error> {
+    #[cfg(feature = "failpoints")]
+    {
+        fail::fail_point!(name, |_| {
+            Err(lance::Error::io(format!(
+                "injected failpoint triggered: {name}"
+            )))
+        });
+    }
+    Ok(())
+}
+
+/// Failpoint that injects a Lance `IncompatibleTransaction` — the variant a
+/// concurrent `UpdateConfig` stamp race produces. Lets a test drive the v3→v4
+/// stamp loop's exhaustion path (`commit_v4_stamp_idempotently`) deterministically;
+/// it is otherwise near-unreachable, since a real concurrent winner stamps the SAME
+/// value, so the loop's re-read returns `Ok` on the first retry. A no-op without the
+/// `failpoints` feature.
+#[allow(unused_variables)]
+pub(crate) fn maybe_fail_lance_incompatible(name: &str) -> std::result::Result<(), lance::Error> {
+    #[cfg(feature = "failpoints")]
+    {
+        fail::fail_point!(name, |_| {
+            Err(lance::Error::incompatible_transaction_source(
+                format!("injected failpoint triggered: {name}").into(),
+            ))
+        });
+    }
+    Ok(())
+}
+
+/// Failpoint that injects a *retryable* `RowLevelCasContention` `OmniError` — the
+/// typed conflict the manifest publisher's outer retry treats as retryable
+/// (`is_retryable_publish_conflict`). Used to drive the publisher's
+/// retry-on-`load_publish_state`-error path deterministically: the v3→v4 migration
+/// surfaces this same type on exhaustion EXPECTING the publisher to re-run the
+/// load, a path otherwise reachable only under sustained multi-writer contention.
+/// A no-op without the `failpoints` feature.
+#[allow(unused_variables)]
+pub(crate) fn maybe_fail_retryable_contention(name: &str) -> Result<()> {
+    #[cfg(feature = "failpoints")]
+    {
+        fail::fail_point!(name, |_| {
+            return Err(crate::error::OmniError::manifest_row_level_cas_contention(
+                format!("injected retryable contention failpoint: {name}"),
+            ));
+        });
+    }
+    Ok(())
+}
+
 /// Compile-checked catalog of every failpoint name in this crate. Call sites
 /// (`maybe_fail`) and tests (`ScopedFailPoint` / the test rendezvous helper)
 /// reference these constants instead of bare string literals, so a typo is a
@@ -55,6 +113,14 @@ pub mod names {
     pub const SCHEMA_APPLY_AFTER_MANIFEST_COMMIT: &str = "schema_apply.after_manifest_commit";
     pub const SCHEMA_APPLY_AFTER_STAGING_WRITE: &str = "schema_apply.after_staging_write";
     pub const SCHEMA_APPLY_BEFORE_STAGING_WRITE: &str = "schema_apply.before_staging_write";
+    // RFC-013 Phase 7 migration failpoints (this branch).
+    pub const MIGRATION_V3_TO_V4_LEGACY_OPEN: &str = "migration.v3_to_v4.legacy_open";
+    pub const MIGRATION_V4_STAMP_FORCE_INCOMPATIBLE: &str = "migration.v4_stamp.force_incompatible";
+    /// Injects a retryable `RowLevelCasContention` from `load_publish_state` so a
+    /// test can prove the publisher's outer retry re-runs the load (the migration
+    /// surfaces this same typed error on exhaustion).
+    pub const PUBLISH_LOAD_STATE_RETRYABLE_CONTENTION: &str =
+        "publish.load_state_retryable_contention";
 }

 #[cfg(feature = "failpoints")]
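
The failpoint helpers and the `names` catalog above follow a common pattern: a call site asks "should I fail here?" by name, and a test arms that name for a limited scope. The sketch below shows the pattern using only the standard library. It is not the `fail` crate and not omnigraph's implementation; `ARMED`, `ScopedArm`, and this `maybe_fail` are hypothetical stand-ins, and only the constant's value is taken from the diff.

```rust
use std::sync::Mutex;

/// Names live in one module so a typo is a compile error, not a silent no-op.
pub mod names {
    pub const MIGRATION_V3_TO_V4_LEGACY_OPEN: &str = "migration.v3_to_v4.legacy_open";
}

/// Hypothetical global registry of currently armed failpoint names.
static ARMED: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

/// Arms a failpoint on construction and disarms it when dropped.
pub struct ScopedArm(&'static str);

impl ScopedArm {
    pub fn new(name: &'static str) -> Self {
        ARMED.lock().unwrap().push(name);
        ScopedArm(name)
    }
}

impl Drop for ScopedArm {
    fn drop(&mut self) {
        ARMED.lock().unwrap().retain(|armed| *armed != self.0);
    }
}

/// Call-site hook: returns an injected error only while `name` is armed.
pub fn maybe_fail(name: &str) -> Result<(), String> {
    if ARMED.lock().unwrap().iter().any(|armed| *armed == name) {
        return Err(format!("injected failpoint triggered: {name}"));
    }
    Ok(())
}
```

In the real crate the hook compiles to a no-op without the `failpoints` feature; this sketch keeps the registry always on so the behavior is easy to observe.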
@@ -1578,80 +1578,14 @@ fn literal_value_to_f64(v: &omnigraph_compiler::catalog::LiteralValue) -> f64 {

 // ─── Edge cardinality validation ─────────────────────────────────────────────

-pub(crate) async fn validate_edge_cardinality(
-    db: &crate::db::Omnigraph,
-    branch: Option<&str>,
-    edge_name: &str,
-    written_version: u64,
-    written_branch: Option<&str>,
-) -> Result<()> {
-    use arrow_array::Array;
-    let catalog = db.catalog();
-    let edge_type = &catalog.edge_types[edge_name];
-    if edge_type.cardinality.is_default() {
-        return Ok(());
-    }
-
-    // Open edge sub-table at the just-written version, not the snapshot's
-    // (the snapshot still pins to the pre-write version).
-    let snapshot = db.snapshot_for_branch(branch).await?;
-    let table_key = format!("edge:{}", edge_name);
-    let entry = snapshot
-        .entry(&table_key)
-        .ok_or_else(|| OmniError::manifest(format!("no manifest entry for {}", table_key)))?;
-    let ds = db
-        .open_dataset_at_state(
-            &entry.table_path,
-            written_branch.or(entry.table_branch.as_deref()),
-            written_version,
-        )
-        .await?;
-
-    // Scan src column, count per source
-    let batches = db.storage().scan(&ds, Some(&["src"]), None, None).await?;
-
-    let mut counts: HashMap<String, u32> = HashMap::new();
-    for batch in &batches {
-        let srcs = batch
-            .column_by_name("src")
-            .unwrap()
-            .as_any()
-            .downcast_ref::<StringArray>()
-            .unwrap();
-        for i in 0..srcs.len() {
-            *counts.entry(srcs.value(i).to_string()).or_insert(0) += 1;
-        }
-    }
-
-    let card = &edge_type.cardinality;
-    for (src, count) in &counts {
-        if let Some(max) = card.max {
-            if *count > max {
-                return Err(OmniError::manifest(format!(
-                    "@card violation on edge {}: source '{}' has {} edges (max {})",
-                    edge_name, src, count, max
-                )));
-            }
-        }
-        if *count < card.min {
-            return Err(OmniError::manifest(format!(
-                "@card violation on edge {}: source '{}' has {} edges (min {})",
-                edge_name, src, count, card.min
-            )));
-        }
-    }
-
-    Ok(())
-}
-
 /// Validate edge `@card` cardinality with in-memory pending edges visible.
 ///
 /// Loader-level analog to `exec::mutation::validate_edge_cardinality_with_pending`:
 /// opens the committed dataset at the pre-load snapshot version, then
 /// delegates to the shared `count_src_per_edge` + `enforce_cardinality_bounds`
-/// helpers in `exec::staging`. Used by Append/Merge loads (the Overwrite
-/// path uses `validate_edge_cardinality` which opens the just-written
-/// Lance version).
+/// helpers in `exec::staging`. Used by every load mode; for `LoadMode::Overwrite`
+/// it treats the pending edge batches as the replacement table image (the
+/// committed rows are being replaced, so only the pending set is counted).
 ///
 /// `mode` controls dedup behavior. `LoadMode::Merge` passes `Some("id")`
 /// so committed edges that the load is *updating* (same edge id,
@@ -3632,47 +3632,24 @@ async fn branch_merge_phase_b_failure_recovered_on_next_open() {
     );

     // The recovered branch_merge must record a MERGE commit (with
-    // `merged_parent_commit_id` set), not a plain commit. Without
-    // this, future merges between the same pair lose
-    // already-up-to-date detection. We verify by reading
-    // `_graph_commits.lance` and asserting the most recent commit
-    // tagged with the recovery actor has a non-null
-    // `merged_parent_commit_id`.
+    // `merged_parent_commit_id` set), not a plain commit. Without this, future
+    // merges between the same pair lose already-up-to-date detection. RFC-013
+    // Phase 7 records the recovery commit in `__manifest` (folded into the
+    // recovery publish CAS), so we read it through the commit-graph projection
+    // (`CommitGraph::load_commits`) and assert some commit carries a non-null
+    // `merged_parent_commit_id`. Only a recovered branch_merge can produce one
+    // here (we never completed a normal merge in this test).
     {
-        use arrow_array::{Array, StringArray};
-        use futures::TryStreamExt;
-        let commits_dir = dir.path().join("_graph_commits.lance");
-        let ds = lance::Dataset::open(commits_dir.to_str().unwrap())
-            .await
-            .unwrap();
-        let batches: Vec<arrow_array::RecordBatch> = ds
-            .scan()
-            .try_into_stream()
-            .await
-            .unwrap()
-            .try_collect()
-            .await
-            .unwrap();
-        let mut found_recovery_merge = false;
-        for batch in batches {
-            let merged = batch
-                .column_by_name("merged_parent_commit_id")
-                .expect("merged_parent_commit_id column present")
-                .as_any()
-                .downcast_ref::<StringArray>()
-                .expect("merged_parent_commit_id is Utf8");
-            // The actor_id lives in _graph_commit_actors; cross-checking
-            // is heavier than necessary. Detecting any non-null
-            // merged_parent_commit_id in the post-recovery state is
-            // sufficient: only a recovered branch_merge can produce one
-            // here (we never completed a normal merge in this test).
-            for i in 0..merged.len() {
-                if !merged.is_null(i) {
-                    found_recovery_merge = true;
-                    break;
-                }
-            }
-        }
+        let commits =
+            omnigraph::db::commit_graph::CommitGraph::open(dir.path().to_str().unwrap())
+                .await
+                .unwrap()
+                .load_commits()
+                .await
+                .unwrap();
+        let found_recovery_merge = commits
+            .iter()
+            .any(|c| c.merged_parent_commit_id.is_some());
         assert!(
             found_recovery_merge,
             "recovered branch_merge must record `merged_parent_commit_id` so future \
@@ -4496,3 +4473,153 @@ async fn init_failpoint_returns_original_error_not_cleanup_error() {
         "init error must surface the failpoint cause, got: {msg}"
     );
 }
+
+// ── RFC-013 Phase 7 / FIX A: a transient legacy-open failure must abort the ──
+// v3→v4 migration loudly, not silently swallow the lineage and stamp v4.
+//
+// `migrate_v3_to_v4` backfills graph lineage from `_graph_commits.lance` into
+// `__manifest`, then stamps internal-schema v4. The migration runs exactly once
+// per graph (`migrate_internal_schema` is `while stamp < CURRENT`). If a
+// transient or corrupt `Dataset::open` of the legacy commit dataset is treated
+// as "no legacy data" (the pre-fix `Err(_) => empty` arm), the migration backfills
+// NOTHING and stamps v4 — orphaning the real lineage permanently, since the v3
+// fallback is then disabled. The fix matches the not-found variants (benign:
+// genuinely no legacy data) and propagates anything else.
+//
+// This test injects a non-not-found Lance error at the legacy open via the
+// `migration.v3_to_v4.legacy_open` failpoint. The load-bearing assertion is the
+// last one: a once-transient failure leaves the graph RETRYABLE (stamp still v3,
+// no lineage), so a later open with the fault cleared completes the migration —
+// it was not a poison pill.
+#[tokio::test]
+async fn transient_legacy_open_failure_aborts_migration_without_stamping_v4() {
+    let _scenario = FailScenario::setup();
+    let dir = tempfile::tempdir().unwrap();
+    let uri = dir.path().to_str().unwrap().to_string();
+
+    // A real pre-Phase-7 (v3) graph: lineage only in `_graph_commits.lance`,
+    // `__manifest` stamped v3 with no `graph_commit` rows.
+    let fixture = omnigraph::db::commit_graph::seed_legacy_v3_lineage(&uri)
+        .await
+        .unwrap();
+    let (rows_before, stamp_before) =
+        omnigraph::db::manifest::lineage_row_count_and_stamp_for_test(&uri, None)
+            .await
+            .unwrap();
+    assert_eq!(stamp_before, 3, "fixture is stamped v3");
+    assert_eq!(rows_before, 0, "fixture has no lineage in __manifest");
+
+    // Arm the legacy-open fault and run the read-write migration entry point.
+    {
+        let _fp = ScopedFailPoint::new(names::MIGRATION_V3_TO_V4_LEGACY_OPEN, "return");
+        let err = match omnigraph::db::manifest::migrate_on_open_for_test(&uri).await {
+            Ok(()) => panic!("migration must abort when the legacy open fails transiently"),
+            Err(e) => e,
+        };
+        // The injected (non-not-found) Lance error must surface, not be masked.
+        let msg = err.to_string();
+        assert!(
+            msg.contains("injected failpoint triggered: migration.v3_to_v4.legacy_open"),
+            "expected the injected legacy-open error to propagate, got: {msg}"
+        );
+    }
+
+    // The migration left NO drift: stamp still v3, still no lineage. (Pre-fix,
+    // the swallow would have stamped v4 with an empty backfill — permanent loss.)
+    let (rows_after_fault, stamp_after_fault) =
+        omnigraph::db::manifest::lineage_row_count_and_stamp_for_test(&uri, None)
+            .await
+            .unwrap();
+    assert_eq!(
+        stamp_after_fault, 3,
+        "a transient legacy-open failure must NOT stamp the manifest to v4",
+    );
+    assert_eq!(
+        rows_after_fault, 0,
+        "a transient legacy-open failure must NOT partially backfill lineage",
+    );
+
+    // The whole correctness claim: a once-transient failure is retryable. With the
+    // fault cleared, the next migration pass reads the legacy lineage and completes.
+    omnigraph::db::manifest::migrate_on_open_for_test(&uri)
+        .await
+        .unwrap();
+    let (rows_done, stamp_done) =
+        omnigraph::db::manifest::lineage_row_count_and_stamp_for_test(&uri, None)
+            .await
+            .unwrap();
+    assert_eq!(stamp_done, 4, "the retried migration stamps v4");
+    assert_eq!(
+        rows_done,
+        fixture.all_ids.len(),
+        "the retried migration backfills every legacy commit",
+    );
+}
+
+// ── RFC-013 Phase 7 / FIX B follow-up: the v3→v4 stamp-bump retry loop must ──
+// surface a RETRYABLE contention error on exhaustion, not a stringified Lance error.
+//
+// `commit_v4_stamp_idempotently` bumps the internal-schema stamp under concurrent
+// runners: the `UpdateConfig` CAS loser gets `IncompatibleTransaction`, re-opens,
+// confirms the winner stamped the same value, and is done. Genuine exhaustion (every
+// attempt loses) must return a `RowLevelCasContention` so the publisher's OUTER retry
+// completes the one-time open — an `OmniError::Lance` would be treated as fatal. The
+// `migration.v4_stamp.force_incompatible` failpoint forces every stamp attempt to lose,
+// driving the otherwise-near-unreachable exhaustion path deterministically. (Pre-fix —
+// `0..=BUDGET` + an `attempt < BUDGET` guard — the last iteration fell through to the
+// stringifying `Err(e)` arm and returned a non-retryable `OmniError::Lance`.)
+#[tokio::test]
+async fn v4_stamp_exhaustion_returns_retryable_contention() {
+    let _scenario = FailScenario::setup();
+    let dir = tempfile::tempdir().unwrap();
+    let uri = dir.path().to_str().unwrap().to_string();
+
+    // A real v3 graph: the backfill merge succeeds; only the terminal stamp loop
+    // is forced to exhaust.
+    let _fixture = omnigraph::db::commit_graph::seed_legacy_v3_lineage(&uri)
+        .await
+        .unwrap();
+
+    let _fp = ScopedFailPoint::new(names::MIGRATION_V4_STAMP_FORCE_INCOMPATIBLE, "return");
+    let err = match omnigraph::db::manifest::migrate_on_open_for_test(&uri).await {
+        Ok(()) => panic!("migration must error when the stamp bump exhausts its retries"),
+        Err(e) => e,
+    };
+    assert!(
+        matches!(
+            &err,
+            omnigraph::error::OmniError::Manifest(m)
+                if matches!(
+                    m.details,
+                    Some(omnigraph::error::ManifestConflictDetails::RowLevelCasContention)
+                )
+        ),
+        "stamp-bump exhaustion must surface a RETRYABLE RowLevelCasContention so the \
+         publisher's outer retry completes the open, got: {err:?}",
+    );
+}
+
+// The publisher's outer retry must re-run `load_publish_state` on a RETRYABLE error,
+// not propagate it fatally. `load_publish_state` runs `migrate_internal_schema`, whose
+// bounded merge/stamp loops surface a `RowLevelCasContention` on exhaustion EXPECTING
+// this re-run (a clean second scan, by which point a concurrent winner has finished the
+// migration). Before the fix, `load_publish_state().await?` short-circuited the loop —
+// only `merge_rows` conflicts hit the retry — so the typed contention aborted the
+// publish. Inject a ONE-SHOT retryable contention into `load_publish_state`: the write
+// must still commit, because the publisher retries and the cleared second attempt wins.
+#[tokio::test]
+#[serial]
+async fn publisher_retries_retryable_load_publish_state_error() {
+    let _scenario = FailScenario::setup();
+    let dir = tempfile::tempdir().unwrap();
+    let db = helpers::init_and_load(&dir).await;
+
+    // `1*return`: fail only the FIRST `load_publish_state` of the next publish, so the
+    // retry's second call is clean. Set after `init_and_load` so its publishes are
+    // unaffected.
+    let _fp = ScopedFailPoint::new(names::PUBLISH_LOAD_STATE_RETRYABLE_CONTENTION, "1*return");
+    let row = r#"{"type":"Person","data":{"name":"Grace","age":37}}"#;
+    db.load_as("main", None, row, LoadMode::Merge, None)
+        .await
+        .expect("publisher must retry the one-shot retryable load_publish_state error and commit");
+}
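
The FIX B comment above describes an off-by-one in a bounded retry loop: with `0..=BUDGET` plus an `attempt < BUDGET` guard, the final iteration skips the retry arm and lands in the catch-all arm that stringifies the error. The sketch below reconstructs that shape from the comment alone, next to one possible corrected shape. `StampError`, `PublishError`, `stamp_pre_fix`, and `stamp_fixed` are hypothetical names, and the real loop also re-reads the stamp to confirm the winner wrote the same value, which is omitted here.

```rust
/// Hypothetical stand-in for the Lance error the stamp attempt can return.
#[derive(Debug, PartialEq)]
enum StampError {
    Incompatible,
    Other(String),
}

/// Hypothetical stand-in for the publisher-facing error.
#[derive(Debug, PartialEq)]
enum PublishError {
    RetryableContention,
    Fatal(String),
}

const BUDGET: usize = 3;

/// Pre-fix shape as described in the comment: on the last iteration the guard
/// is false, so a conflict falls through to the stringifying arm.
fn stamp_pre_fix(mut attempt_stamp: impl FnMut() -> Result<(), StampError>) -> Result<(), PublishError> {
    for attempt in 0..=BUDGET {
        match attempt_stamp() {
            Ok(()) => return Ok(()),
            Err(StampError::Incompatible) if attempt < BUDGET => continue,
            Err(e) => return Err(PublishError::Fatal(format!("{e:?}"))),
        }
    }
    // Never reached for conflicts: the last iteration already returned `Fatal`.
    Err(PublishError::RetryableContention)
}

/// One way to fix it: every conflict retries, and exhaustion is reported after
/// the loop with the typed, retryable error.
fn stamp_fixed(mut attempt_stamp: impl FnMut() -> Result<(), StampError>) -> Result<(), PublishError> {
    for _ in 0..BUDGET {
        match attempt_stamp() {
            Ok(()) => return Ok(()),
            Err(StampError::Incompatible) => continue,
            Err(e) => return Err(PublishError::Fatal(format!("{e:?}"))),
        }
    }
    Err(PublishError::RetryableContention)
}
```

Keeping the exhaustion result outside the `match` means no guard has to agree with the loop bounds, which is what went wrong in the pre-fix shape.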
@@ -86,6 +86,83 @@ async fn lance_error_too_much_write_contention_variant_exists() {
     );
 }

+// --- Guard 1a: LanceError::IncompatibleTransaction variant exists ----------
+//
+// `db/manifest/migrations.rs::commit_v4_stamp_idempotently` pattern-matches on
+// this variant: two concurrent v3→v4 runners both bump the internal-schema stamp
+// (an `UpdateConfig` commit on the same metadata key), and the loser gets
+// `IncompatibleTransaction`. Since both write the same value the conflict is
+// benign and is retried idempotently. If Lance renames the variant or removes the
+// builder, the match silently stops catching the conflict — this guard fails to
+// force an update.
+
+#[tokio::test]
+async fn lance_error_incompatible_transaction_variant_exists() {
+    let err =
+        lance::Error::incompatible_transaction_source("concurrent UpdateConfig at version N".into());
+    assert!(
+        matches!(err, lance::Error::IncompatibleTransaction { .. }),
+        "Lance::Error::IncompatibleTransaction variant missing or renamed; \
+         update db/manifest/migrations.rs::commit_v4_stamp_idempotently and \
+         this guard, then re-pin docs/dev/lance.md."
+    );
+}
+
+// --- Guard 1c: LanceError::DatasetAlreadyExists variant exists --------------
+//
+// `db/commit_graph.rs` and `db/recovery_audit.rs` create internal Lance tables
+// with a create-or-open idempotency fallback: a concurrent/prior create races,
+// and the `DatasetAlreadyExists` arm falls back to `Dataset::open`. They match
+// the typed variant, NOT the display string ("Dataset already exists: ..."),
+// which is not a Lance API contract. If Lance renames the variant the match
+// silently stops catching the race and a re-create errors instead of opening —
+// this guard turns red to force an update.
+
+#[tokio::test]
+async fn lance_error_dataset_already_exists_variant_exists() {
+    let err = lance::Error::dataset_already_exists("guard");
+    assert!(
+        matches!(err, lance::Error::DatasetAlreadyExists { .. }),
+        "Lance::Error::DatasetAlreadyExists variant missing or renamed; update the \
+         db/commit_graph.rs + db/recovery_audit.rs create-or-open fallbacks and \
+         this guard, then re-pin docs/dev/lance.md."
+    );
+}
+
+// --- Guard 1b: Dataset::open on a missing path returns a not-found variant --
+//
+// `db/commit_graph.rs::read_legacy_commit_cache` (the v3→v4 lineage migration
+// source) classifies a legacy-open error: a genuine not-found is the benign
+// "no legacy data" signal (empty cache), and ANY OTHER error propagates loudly
+// rather than being read as "empty" — a swallow there would let the migration
+// stamp v4 over an empty backfill, orphaning real lineage permanently. That
+// classification relies on Lance mapping an object-store NotFound to
+// `DatasetNotFound` (or, for some paths, `NotFound`). If a Lance bump emits a
+// different variant for a missing dataset, the migration would propagate a
+// genuine "no legacy data" as a hard error — this guard turns red to force the
+// classifier (and this guard) to be updated together.
+
+#[tokio::test]
+async fn dataset_open_missing_returns_not_found_variant() {
+    let dir = tempfile::tempdir().unwrap();
+    // A path that was never written — nothing to open.
+    let missing = dir.path().join("does-not-exist.lance");
+    let err = match Dataset::open(missing.to_str().unwrap()).await {
+        Ok(_) => panic!("opening a never-written dataset path must error"),
+        Err(e) => e,
+    };
+    assert!(
+        matches!(
+            err,
+            lance::Error::DatasetNotFound { .. } | lance::Error::NotFound { .. }
+        ),
+        "Dataset::open on a missing path no longer returns DatasetNotFound/NotFound \
+         (got: {err:?}); update db/commit_graph.rs::read_legacy_commit_cache's \
+         legacy-open classification and this guard together, then re-pin \
+         docs/dev/lance.md."
+    );
+}
+
 // --- Guard 2: ManifestLocation field shape ---------------------------------
 //
 // `db/manifest/metadata.rs:84-88` reads `.path`, `.size`, `.e_tag`,
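
Guard 1b in the hunk above protects a classification rule: a not-found on open means "no legacy data" and yields an empty result, while any other error must propagate. A minimal sketch of that rule follows, using `std::io::ErrorKind::NotFound` as a stand-in for the Lance `DatasetNotFound` / `NotFound` variants. The function name `read_legacy_or_empty` and the closure-based opener are assumptions made for illustration, not the signature of `read_legacy_commit_cache`.

```rust
use std::io;

/// Classify the result of opening a legacy dataset (hypothetical helper).
/// Not-found is benign and maps to an empty list; everything else is returned
/// to the caller so a transient or corrupt open is never read as "empty".
fn read_legacy_or_empty(
    open: impl FnOnce() -> io::Result<Vec<String>>,
) -> io::Result<Vec<String>> {
    match open() {
        Ok(rows) => Ok(rows),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(Vec::new()),
        Err(e) => Err(e),
    }
}
```

The pre-fix behavior described in the FIX A comment corresponds to replacing the last two arms with a single `Err(_) => Ok(Vec::new())`, which is what allowed a transient failure to look like an empty legacy dataset.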
235
crates/omnigraph/tests/lineage_projection.rs
Normal file
@@ -0,0 +1,235 @@
+//! RFC-013 Phase 7 acceptance gate: graph lineage lives ONLY in `__manifest`.
+//!
+//! The `graph_commit` + `graph_head` rows ride the same publish CAS as the
+//! table-version rows, so `_graph_commits.lance` carries NO commit rows. This
+//! gate proves two things over a realistic history (commits on main, a branch,
+//! a merge, all with actors):
+//!
+//! 1. The production commit-graph projection (`CommitGraph::open(...)`, which now
+//!    reads `__manifest`) reconstructs the full lineage correctly — commit set,
+//!    parents, the merge commit's two parents + merge actor, per-branch heads,
+//!    and the inline actors.
+//! 2. `_graph_commits.lance` (and its actor sidecar) hold ZERO commit rows: the
+//!    dual-write is gone and nothing appends to them. This is the load-bearing
+//!    "single source" assertion.
+
+mod helpers;
+
+use futures::TryStreamExt;
+use lance::Dataset;
+
+use omnigraph::db::commit_graph::CommitGraph;
+use omnigraph::db::{GraphCommit, Omnigraph};
+
+use helpers::*;
+
+/// Count rows in a Lance dataset directory under the graph root, or `0` if it
+/// does not exist.
+async fn row_count(root: &str, dir: &str) -> usize {
+    let uri = format!("{}/{}", root.trim_end_matches('/'), dir);
+    let Ok(dataset) = Dataset::open(&uri).await else {
+        return 0;
+    };
+    let batches: Vec<arrow_array::RecordBatch> = dataset
+        .scan()
+        .try_into_stream()
+        .await
+        .unwrap()
+        .try_collect()
+        .await
+        .unwrap();
+    batches.iter().map(|b| b.num_rows()).sum()
+}
+
+/// The production commit-graph projection at `branch`, sourced from `__manifest`.
+async fn projected_commits(root: &str, branch: Option<&str>) -> Vec<GraphCommit> {
+    let graph = match branch {
+        Some(branch) => CommitGraph::open_at_branch(root, branch).await.unwrap(),
+        None => CommitGraph::open(root).await.unwrap(),
+    };
+    let mut commits = graph.load_commits().await.unwrap();
+    commits.sort_by(|a, b| {
+        a.manifest_version
+            .cmp(&b.manifest_version)
+            .then_with(|| a.created_at.cmp(&b.created_at))
+            .then_with(|| a.graph_commit_id.cmp(&b.graph_commit_id))
+    });
+    commits
+}
+
+async fn head_id(root: &str, branch: Option<&str>) -> String {
+    let graph = match branch {
+        Some(branch) => CommitGraph::open_at_branch(root, branch).await.unwrap(),
+        None => CommitGraph::open(root).await.unwrap(),
+    };
+    graph
+        .head_commit()
+        .await
+        .unwrap()
+        .unwrap()
+        .graph_commit_id
+}
+
+#[tokio::test]
+async fn graph_lineage_lives_only_in_manifest() {
+    let dir = tempfile::tempdir().unwrap();
+    let uri = dir.path().to_str().unwrap().to_string();
+
+    // Build a realistic history: several authored commits on main, a branch with
+    // its own authored commits, then an authored merge back into main.
+    let main = init_and_load(&dir).await;
+
+    main.mutate_as(
+        "main",
+        MUTATION_QUERIES,
+        "insert_person",
+        &mixed_params(&[("$name", "Alice")], &[("$age", 30)]),
+        Some("act-alice"),
+    )
+    .await
+    .unwrap();
+    main.mutate_as(
+        "main",
+        MUTATION_QUERIES,
+        "insert_person",
+        &mixed_params(&[("$name", "Bob")], &[("$age", 41)]),
+        Some("act-bob"),
+    )
+    .await
+    .unwrap();
+
+    main.branch_create("feature").await.unwrap();
+
+    let feature = Omnigraph::open(&uri).await.unwrap();
+    feature
+        .mutate_as(
+            "feature",
+            MUTATION_QUERIES,
+            "insert_person",
+            &mixed_params(&[("$name", "Carol")], &[("$age", 27)]),
+            Some("act-carol"),
+        )
+        .await
+        .unwrap();
+    feature
+        .mutate_as(
+            "feature",
+            MUTATION_QUERIES,
+            "insert_person",
+            &mixed_params(&[("$name", "Dave")], &[("$age", 33)]),
+            Some("act-dave"),
+        )
+        .await
+        .unwrap();
+
+    // Advance main once more so the merge is a real (non-fast-forward) merge with
+    // two distinct parents.
+    main.mutate_as(
+        "main",
+        MUTATION_QUERIES,
+        "insert_person",
+        &mixed_params(&[("$name", "Erin")], &[("$age", 38)]),
+        Some("act-erin"),
+    )
+    .await
+    .unwrap();
+
+    let outcome = main
+        .branch_merge_as("feature", "main", Some("act-merger"))
+        .await
+        .unwrap();
+    // A genuine three-way merge (both sides advanced past the base).
+    assert_eq!(
+        outcome,
+        omnigraph::db::MergeOutcome::Merged,
+        "expected a real merge, not fast-forward/up-to-date"
+    );
+
+    // ── single source: nothing writes `_graph_commits.lance` ─────────────────
+    // RFC-013 Phase 7 folds lineage into `__manifest`; the commit-graph dataset
+    // exists only to carry branch refs, so it (and its actor sidecar) hold ZERO
+    // commit rows. If a stray `append_commit` reappears, this turns red.
+    assert_eq!(
+        row_count(&uri, "_graph_commits.lance").await,
+        0,
+        "_graph_commits.lance must carry no commit rows — lineage lives in __manifest"
+    );
+    assert_eq!(
+        row_count(&uri, "_graph_commit_actors.lance").await,
+        0,
+        "_graph_commit_actors.lance must carry no rows — actors live inline in __manifest"
+    );
+
+    // ── main lineage projected from `__manifest` ─────────────────────────────
+    let main_commits = projected_commits(&uri, None).await;
+    // genesis + Alice + Bob + Erin + the merge = 5 on main.
+    assert!(
+        main_commits.len() >= 5,
+        "expected a non-trivial main history, got {} commits",
+        main_commits.len()
+    );
+
+    // Genesis is the unique parentless commit and carries no actor.
+    let genesis: Vec<&GraphCommit> = main_commits
+        .iter()
+        .filter(|c| c.parent_commit_id.is_none())
+        .collect();
+    assert_eq!(genesis.len(), 1, "exactly one genesis (parentless) commit");
+    assert!(
+        genesis[0].actor_id.is_none(),
+        "genesis commit carries no actor"
+    );
+
+    // Every non-genesis commit's parent resolves to a known commit (a connected
+    // lineage — the publisher resolved each parent under the CAS).
+    for commit in &main_commits {
+        if let Some(parent) = &commit.parent_commit_id {
+            assert!(
+                main_commits.iter().any(|c| &c.graph_commit_id == parent),
+                "parent {parent} of {} must be a known commit",
+                commit.graph_commit_id
+            );
+        }
+    }
+
+    // The merge commit carries both parents and the merge actor.
+    let merge_commit = main_commits
+        .iter()
+        .find(|c| c.merged_parent_commit_id.is_some())
+        .expect("a merge commit with a merged parent must exist");
+    assert_eq!(merge_commit.actor_id.as_deref(), Some("act-merger"));
+    assert!(merge_commit.parent_commit_id.is_some());
+    // The merge is the head of main.
+    assert_eq!(
+        head_id(&uri, None).await,
+        merge_commit.graph_commit_id,
+        "the merge commit is the head of main"
+    );
+
+    // ── feature lineage projected from `__manifest` ──────────────────────────
+    let feature_commits = projected_commits(&uri, Some("feature")).await;
+    // The feature head is Dave's commit (the last authored on the branch).
+    let feature_head = head_id(&uri, Some("feature")).await;
+    let feature_head_commit = feature_commits
+        .iter()
+        .find(|c| c.graph_commit_id == feature_head)
+        .expect("feature head must be in the feature projection");
+    assert_eq!(
+        feature_head_commit.actor_id.as_deref(),
+        Some("act-dave"),
+        "feature head is Dave's authored commit"
+    );
+
+    // ── actors surface inline from the manifest metadata ─────────────────────
+    // main's authored commits: Alice, Bob, Erin (direct) + the merge (act-merger)
+    // = 4. Carol/Dave were authored on the feature branch, not main. Genesis has
+    // no actor.
+    let authored = main_commits
+        .iter()
+        .filter(|c| c.actor_id.is_some())
+        .count();
+    assert!(
+        authored >= 4,
+        "expected the authored commits to surface their actor in the projection, saw {authored}"
+    );
+}
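
The acceptance test above checks three structural invariants of the projected lineage: exactly one parentless genesis commit, every first parent resolving to a known commit, and a merge commit identified by a non-null merged parent. Those checks can be sketched without a database as follows. The `Commit` struct and the `check_lineage` / `merge_commits` helpers are hypothetical and only mirror the field names used by `GraphCommit` in the test.

```rust
use std::collections::HashSet;

/// Hypothetical, trimmed-down commit record for illustration.
#[derive(Debug, Clone)]
struct Commit {
    id: String,
    parent: Option<String>,
    merged_parent: Option<String>,
}

/// Checks the two connectivity invariants the test asserts on main:
/// one genesis commit, and every first parent present in the same projection.
fn check_lineage(commits: &[Commit]) -> Result<(), String> {
    let ids: HashSet<&str> = commits.iter().map(|c| c.id.as_str()).collect();
    let genesis = commits.iter().filter(|c| c.parent.is_none()).count();
    if genesis != 1 {
        return Err(format!("expected exactly one genesis commit, found {genesis}"));
    }
    for commit in commits {
        if let Some(parent) = &commit.parent {
            if !ids.contains(parent.as_str()) {
                return Err(format!("parent {parent} of {} is unknown", commit.id));
            }
        }
    }
    Ok(())
}

/// Merge commits are the ones that carry a merged-in parent.
fn merge_commits(commits: &[Commit]) -> Vec<&Commit> {
    commits.iter().filter(|c| c.merged_parent.is_some()).collect()
}
```

As in the test, only the first parent is required to resolve within the projection; the merged-in parent may belong to another branch's history.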
@@ -97,7 +97,9 @@ async fn optimize_on_empty_graph_returns_stats_per_table_with_no_changes() {
     // Schema declares 2 nodes + 2 edges = 4 data tables, plus the 3 internal
     // system tables (`__manifest`, `_graph_commits`, `_graph_commit_actors`) optimize
     // also compacts (RFC-013 step 2) = 7. Compaction should run on each but find
-    // nothing to merge.
+    // nothing to merge. The genesis graph commit rides the SINGLE init
+    // `__manifest` write (RFC-013 Phase 7), so a fresh graph has one fragment per
+    // table — nothing to compact anywhere.
     assert_eq!(stats.len(), 7);
     for s in &stats {
         assert_eq!(s.fragments_removed, 0, "{} should not remove", s.table_key);
@@ -143,17 +145,20 @@ async fn optimize_after_load_then_again_is_idempotent() {
     }
 }

-/// RFC-013 step 2: `optimize` compacts the internal system tables
-/// (`__manifest`, `_graph_commits`), which accumulate one fragment per commit.
-/// After compaction they shed fragments, write no recovery sidecar (a single
-/// atomic Lance commit — no HEAD-before-publish gap), and the graph stays
-/// coherent for subsequent reads + strict writes.
+/// RFC-013 step 2 + Phase 7: `optimize` compacts `__manifest`, which now
+/// accumulates one fragment per commit for BOTH the table-version rows and the
+/// folded-in graph-lineage rows (`graph_commit` + `graph_head`). The
+/// commit-graph datasets (`_graph_commits`, `_graph_commit_actors`) no longer
+/// take a per-commit row (lineage lives in `__manifest`), so they stay flat —
+/// nothing to compact. After compaction `__manifest` sheds fragments, writes no
+/// recovery sidecar (a single atomic Lance commit — no HEAD-before-publish gap),
+/// and the graph stays coherent for subsequent reads + strict writes.
 #[tokio::test]
 async fn optimize_compacts_internal_tables() {
     let dir = tempfile::tempdir().unwrap();
     let mut db = init_and_load(&dir).await;

-    // Build version-history depth so the internal tables accumulate fragments.
+    // Build version-history depth so `__manifest` accumulates fragments.
     for i in 0..20 {
         mutate_main(
             &mut db,
@@ -167,16 +172,32 @@ async fn optimize_compacts_internal_tables() {
|
|||
|
||||
let stats = db.optimize().await.unwrap();
|
||||
|
||||
for key in ["__manifest", "_graph_commits"] {
|
||||
// `__manifest` carries every per-commit fragment (table versions + lineage)
|
||||
// and compacts.
|
||||
let manifest_stats = stats
|
||||
.iter()
|
||||
.find(|s| s.table_key == "__manifest")
|
||||
.expect("optimize stats missing internal table __manifest");
|
||||
assert!(
|
||||
manifest_stats.committed,
|
||||
"__manifest should compact after 20 commits"
|
||||
);
|
||||
assert!(
|
||||
manifest_stats.fragments_removed > 0,
|
||||
"__manifest should shed fragments, removed {}",
|
||||
manifest_stats.fragments_removed
|
||||
);
|
||||
|
||||
// The commit-graph datasets take no per-commit row anymore (RFC-013 Phase 7
|
||||
// folds lineage into `__manifest`), so they stay at one fragment — no-ops.
|
||||
for key in ["_graph_commits", "_graph_commit_actors"] {
|
||||
let s = stats
|
||||
.iter()
|
||||
.find(|s| s.table_key == key)
|
||||
.unwrap_or_else(|| panic!("optimize stats missing internal table {key}"));
|
||||
assert!(s.committed, "{key} should compact after 20 commits");
|
||||
assert!(
|
||||
s.fragments_removed > 0,
|
||||
"{key} should shed fragments, removed {}",
|
||||
s.fragments_removed
|
||||
!s.committed,
|
||||
"{key} carries no per-commit rows after Phase 7 — nothing to compact"
|
||||
);
|
||||
}
|
||||
|
||||
|
|
|
|||
|
|
@@ -685,38 +685,21 @@ async fn list_recovery_audit_kinds(graph_root: &Path) -> Vec<String> {
|
|||
out
|
||||
}
|
||||
|
||||
/// Helper: count `_graph_commits.lance` rows tagged with the recovery actor.
|
||||
/// Helper: count graph commits authored by the recovery actor. RFC-013 Phase 7
|
||||
/// records the recovery commit in `__manifest` (folded into the recovery publish
|
||||
/// CAS), not `_graph_commits.lance`, so this counts through the production
|
||||
/// commit-graph projection (`load_commits`), filtering on the inline actor.
|
||||
async fn count_recovery_actor_commits(graph_root: &Path) -> usize {
|
||||
let actors_dir = graph_root.join("_graph_commit_actors.lance");
|
||||
if !actors_dir.exists() {
|
||||
return 0;
|
||||
}
|
||||
let ds = Dataset::open(actors_dir.to_str().unwrap()).await.unwrap();
|
||||
use arrow_array::{Array, StringArray};
|
||||
use futures::TryStreamExt;
|
||||
let batches: Vec<arrow_array::RecordBatch> = ds
|
||||
.scan()
|
||||
.try_into_stream()
|
||||
let commits = omnigraph::db::commit_graph::CommitGraph::open(graph_root.to_str().unwrap())
|
||||
.await
|
||||
.unwrap()
|
||||
.try_collect()
|
||||
.load_commits()
|
||||
.await
|
||||
.unwrap();
|
||||
let mut count = 0;
|
||||
for batch in &batches {
|
||||
let actors = batch
|
||||
.column_by_name("actor_id")
|
||||
.unwrap()
|
||||
.as_any()
|
||||
.downcast_ref::<StringArray>()
|
||||
.unwrap();
|
||||
for i in 0..actors.len() {
|
||||
if actors.value(i) == "omnigraph:recovery" {
|
||||
count += 1;
|
||||
}
|
||||
}
|
||||
}
|
||||
count
|
||||
commits
|
||||
.iter()
|
||||
.filter(|c| c.actor_id.as_deref() == Some("omnigraph:recovery"))
|
||||
.count()
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
|
|
|
|||
|
|
@@ -130,7 +130,16 @@ async fn single_insert_data_write_is_bounded() {
|
|||
|
||||
/// At a fixed shallow depth, the per-write object-store read count is below a
|
||||
/// documented ceiling. Fails the moment a change *adds* a round-trip on the write
|
||||
/// path — the "no new round-trip" guard (calibrated: ~50 at depth ~5).
|
||||
/// path — the "no new round-trip" guard.
|
||||
///
|
||||
/// Two folds keep the count low: RFC-013 Phase 7 put the `graph_commit` +
|
||||
/// `graph_head` rows in the same publish merge-insert (no extra `__manifest`
|
||||
/// write/scan per commit), and RFC-013 P2 collapsed the publish path's FOUR
|
||||
/// `__manifest` scans (table locations + version entries + tombstones + a
|
||||
/// separate `read_graph_lineage` for the parent) into ONE — the
|
||||
/// `manifest_reads` sub-ceiling below would trip if any of those scans crept
|
||||
/// back. Calibrated at depth ~5: ~26 `__manifest` reads / ~36 total after the
|
||||
/// P2 fold (was ~44 / ~54 with the four separate scans).
|
||||
#[tokio::test]
|
||||
async fn write_op_count_ceiling_at_shallow_depth() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
|
|
@@ -141,6 +150,16 @@ async fn write_op_count_ceiling_at_shallow_depth() {
|
|||
"depth~5: data={} __manifest={} _graph_commits={} total_reads={}",
|
||||
io.data_reads, io.manifest_reads, io.commit_graph_reads, io.total_reads()
|
||||
);
|
||||
// Sub-ceiling on `__manifest` reads specifically: the publish path does one
|
||||
// scan, not four. ~26 measured at this depth; a re-added scan would push it
|
||||
// well past this. (Deterministic on local FS.)
|
||||
const MANIFEST_CEILING: u64 = 34;
|
||||
assert!(
|
||||
io.manifest_reads <= MANIFEST_CEILING,
|
||||
"per-write __manifest reads {} exceeded ceiling {MANIFEST_CEILING} — a publish-path \
|
||||
scan was re-added (RFC-013 P2 folds them into one)",
|
||||
io.manifest_reads,
|
||||
);
|
||||
const CEILING: u64 = 80;
|
||||
assert!(
|
||||
io.total_reads() <= CEILING,
|
||||
|
|
|
|||
|
|
@@ -613,7 +613,10 @@ async fn mixed_insert_and_update_on_same_person_coalesces_to_one_merge() {
|
|||
"dedupe must keep the update's age value, not the insert's",
|
||||
);
|
||||
|
||||
// One-publish guarantee: manifest version advanced by exactly 1.
|
||||
// One-publish guarantee: manifest version advanced by exactly 1. The graph
|
||||
// commit (`graph_commit` + `graph_head` rows) rides the SAME publish CAS as
|
||||
// the table-version rows (RFC-013 Phase 7), so one graph commit is exactly
|
||||
// one manifest version bump.
|
||||
let post_version = version_main(&db).await.unwrap();
|
||||
assert_eq!(
|
||||
post_version,
|
||||
|
|
@@ -659,7 +662,9 @@ async fn multiple_appends_to_same_edge_coalesce_to_one_append() {
|
|||
let edges_after = count_rows(&db, "edge:Knows").await;
|
||||
assert_eq!(edges_after, edges_before + 2);
|
||||
|
||||
// One manifest version bump for the two-edge query (atomic publish).
|
||||
// One manifest version bump for the two-edge query (atomic publish): the
|
||||
// graph commit rides the same publish CAS as the table-version rows
|
||||
// (RFC-013 Phase 7).
|
||||
let post_version = version_main(&db).await.unwrap();
|
||||
assert_eq!(
|
||||
post_version,
|
||||
|
|
@@ -690,6 +695,8 @@ async fn multi_statement_inserts_publish_exactly_once() {
|
|||
.await
|
||||
.unwrap();
|
||||
|
||||
// One manifest version bump: the graph commit rides the same publish CAS
|
||||
// as the table-version rows (RFC-013 Phase 7).
|
||||
let post_version = version_main(&db).await.unwrap();
|
||||
assert_eq!(
|
||||
post_version,
|
||||
|
|
@@ -1005,6 +1012,8 @@ async fn chained_updates_with_overlapping_predicate_respects_intermediate_value(
|
|||
"chained-update final value must reflect the second update applied to op-1's pending value"
|
||||
);
|
||||
|
||||
// One manifest version bump: the graph commit rides the same publish CAS
|
||||
// as the table-version rows (RFC-013 Phase 7).
|
||||
let post_version = version_main(&db).await.unwrap();
|
||||
assert_eq!(
|
||||
post_version,
|
||||
|
|
@@ -1043,6 +1052,9 @@ async fn multi_statement_delete_on_same_node_table() {
|
|||
pre_persons - 2,
|
||||
"both deletes must land",
|
||||
);
|
||||
// One manifest version bump: the graph commit (delete-only queries record
|
||||
// one too) rides the same publish CAS as the table-version rows
|
||||
// (RFC-013 Phase 7).
|
||||
let post_version = version_main(&db).await.unwrap();
|
||||
assert_eq!(
|
||||
post_version,
|
||||
|
|
|
|||
|
|
@@ -133,7 +133,7 @@ flowchart TB
|
|||
subgraph state[graph state]
|
||||
coord[GraphCoordinator]:::l2
|
||||
mr[ManifestCoordinator<br/>db/manifest.rs]:::l2
|
||||
cg[CommitGraph<br/>_graph_commits.lance]:::l2
|
||||
cg[CommitGraph<br/>projection of __manifest graph_commit/graph_head rows]:::l2
|
||||
stg[MutationStaging<br/>per-query in-memory accumulator<br/>exec/staging.rs]:::l2
|
||||
end
|
||||
|
||||
|
|
|
|||
|
|
@@ -28,7 +28,9 @@ for the canonical list. Current reality:
|
|||
|
||||
**Open PRs (land these; relationships in §7):**
|
||||
- **#296** `correctness-by-design-fix` — recovery roll-forward converges on a concurrent
|
||||
manifest advance (this is the fix for the flaky `iss-schema-apply-reopen-recovery-race`).
|
||||
manifest advance (the fix for the flaky `iss-schema-apply-reopen-recovery-race`).
|
||||
**MERGED to main and integrated into this branch** — the converge helper now threads
|
||||
Phase-7's manifest-CAS recovery `graph_commit_id` (see `converge_or_defer_roll_forward`).
|
||||
- **#295** `docs/rfc-013-step-3b` — the step-3b RFC doc.
|
||||
- **#254** `ragnorc/bug-4-schema-apply-occ` — schema-apply vs optimize false-fail
|
||||
(same op-class family as #297, logical side).
|
||||
|
|
@@ -335,6 +337,32 @@ over the window — strictly higher liability than either Design A or waiting fo
|
|||
exposing uncommitted variants for `compact_files` / `optimize_indices` / vector index (#6666
|
||||
open; delete #6658 shipped). Track, don't build yet.
|
||||
|
||||
### 5.1 Step-5 design constraints inherited from the #295 spec review
|
||||
3b shipped a **minimal** `WriteTxn { branch, base }` (schema-once + open-collapse via
|
||||
eliminate/probe/thread) and **deferred** the full §4.1 opener-unification — the pinned-base
|
||||
opener, the shared-`Session` open, the write-local **handle cache**, and the strict-op
|
||||
conflict-timing move — to step 5. So the greptile-bot comments on the #295 *spec* were **moot
|
||||
for #298** (which built none of those constructs) but are **load-bearing constraints for step
|
||||
5** when it builds them. Bank them:
|
||||
1. **Handle cache must be `Send + Sync`** (`Mutex<HashMap<…, Dataset>>`, not `RefCell`) if
|
||||
`WriteTxn::open(&self)` is shared across concurrent stage futures — a `RefCell` compiles
|
||||
but panics when two stages poll. Or make it `&mut self` (no parallel-stage sharing). This
|
||||
is the deny-list "in-process-only `Dataset` impls — `Send + Sync`" item.
|
||||
2. **The strict-op timing move needs an explicit retry contract.** If step 5 moves
|
||||
strict-op conflict detection from open-time `ensure_expected_version` to commit-time CAS
|
||||
(the §4.1 pinned-base design), it MUST specify: the txn is **discarded after any commit**
|
||||
(success or conflict — the handle cache is commit-invalidated), and the retry **re-opens a
|
||||
fresh `WriteTxn` at the new HEAD** (never re-stages against the stale pinned base — that
|
||||
reproduces the lost-update). **This is the same retry/refresh contract as the stale-view
|
||||
false-fail (§1d.2)** — the op-class-aware precondition + "fresh base on retry" are one
|
||||
design point. Today (#298) strict ops keep open-at-HEAD + `ensure_expected_version`, so the
|
||||
contract is unchanged; step 5 owns it the moment it pins strict reads to the base.
|
||||
3. **The opener-equivalence test must be non-trivial.** A differential test that only passes
|
||||
when `HEAD == base` proves nothing about pinning. To actually prove "`WriteTxn::open`
|
||||
returns the pinned base, not HEAD," the test must **advance the branch HEAD externally
|
||||
(direct Lance write), then assert the txn open still reads the base version** — and that a
|
||||
strict write then fails `ExpectedVersionMismatch` at commit (verifying the timing move).
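
Constraints 1 and 2 can be illustrated with a minimal, std-only sketch. Everything here is an assumption made for illustration: `Handle`, `HandleCache`, and `assert_send_sync` are hypothetical names, not the engine's types, and `Handle` is a stand-in rather than the Lance `Dataset`. The sketch only shows that a `Mutex`-guarded map satisfies `Send + Sync` behind a `&self` opener, and that the cache is cleared after any commit attempt.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for an opened dataset handle (not the Lance `Dataset`).
#[derive(Debug)]
pub struct Handle {
    pub table: String,
    pub version: u64,
}

/// Write-local handle cache guarded by a `Mutex`, so `open` can take `&self`
/// and still be shared across concurrently polled stage futures.
#[derive(Default)]
pub struct HandleCache {
    inner: Mutex<HashMap<String, Arc<Handle>>>,
}

impl HandleCache {
    /// Return the cached handle for `table`; on a miss, "open" it at the
    /// pinned `base` version and remember it.
    pub fn open(&self, table: &str, base: u64) -> Arc<Handle> {
        let mut map = self.inner.lock().unwrap();
        map.entry(table.to_string())
            .or_insert_with(|| {
                Arc::new(Handle {
                    table: table.to_string(),
                    version: base,
                })
            })
            .clone()
    }

    /// Drop every cached handle. In this sketch it is called after any commit
    /// attempt, success or conflict, so a retry starts from a fresh base.
    pub fn invalidate(&self) {
        self.inner.lock().unwrap().clear();
    }

    /// Number of cached handles.
    pub fn len(&self) -> usize {
        self.inner.lock().unwrap().len()
    }
}

/// Compile-time check of the bound. A cache built on `RefCell` would not
/// satisfy `Sync` and would be rejected here.
pub fn assert_send_sync<T: Send + Sync>() {}

/// Instantiating the check for `HandleCache` is what makes the compiler verify it.
pub fn cache_is_send_sync() {
    assert_send_sync::<HandleCache>();
}
```

The alternative named in constraint 1, an opener that takes `&mut self`, would need no lock at all, at the cost of ruling out parallel-stage sharing.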
|
||||
|
||||
---
|
||||
|
||||
## 6. Why #297 is still needed even if you do Design A
|
||||
|
|
@@ -353,10 +381,13 @@ open; delete #6658 shipped). Track, don't build yet.
|
|||
step 3b stacks on it.
|
||||
- **#254** — logical-class fix (schema-apply vs optimize false-fail). Same op-class family;
|
||||
both are de-risking inputs for Design A's per-class commit models.
|
||||
- **#296** — recovery roll-forward converges on concurrent manifest advance. This is the fix
|
||||
- **#296** — recovery roll-forward converges on concurrent manifest advance. The fix
|
||||
for the flaky `iss-schema-apply-reopen-recovery-race`. It touches `recovery.rs` and is
|
||||
*aligned* with #297's "postcondition is the state, not winning the CAS" principle — reconcile
|
||||
the monotonic publish with #296's converge helper if #296 lands first.
|
||||
*aligned* with #297's "postcondition is the state, not winning the CAS" principle. **#296
|
||||
landed on main first and is merged into this branch:** the converge helper
|
||||
(`converge_or_defer_roll_forward`) was reconciled with Phase-7's manifest-CAS roll-forward —
|
||||
on convergence the audit references the winner's folded `graph_commit_id` (the current
|
||||
`graph_head`), not a freshly minted one.
|
||||
- **#295** — the step-3b RFC doc (apply §4's three corrections to it).
|
||||
|
||||
---
|
||||
|
|
|
|||
|
|
@@ -253,20 +253,43 @@ them explicit.
|
|||
acknowledged-before-visible bug this branch fixed. Close it (local CAS
|
||||
primitive, or a trait-level lock requirement) before admitting any
|
||||
lock-free `if_match` caller.
|
||||
- **Manifest→commit-graph publish atomicity:** a graph commit advances
|
||||
`__manifest` (the visibility authority) and then appends `_graph_commits` as
|
||||
two separate writes (`commit_updates_with_actor_with_expected`, failpoint
|
||||
`graph_publish.before_commit_append`). A crash between them leaves the manifest
|
||||
at version N with no commit-graph row for N. Live reads and durability are
|
||||
unaffected — the live version resolves via the manifest
|
||||
(`GraphCoordinator::version()`), not the commit-graph head — and the open-time
|
||||
recovery sweep does NOT repair it (`lance_head == manifest_pinned` classifies
|
||||
`NoMovement`; a recovery sidecar would not change this). Impact is bounded to
|
||||
commit history: `commit list` misses N, time-travel by commit id to N fails,
|
||||
and merge-base loses a node (a likely-benign off-by-one re-merge). This affects
|
||||
every publish, not a specific maintenance command. Eventual fix: make the
|
||||
commit graph reconcilable from the manifest (or the two writes atomic) — not a
|
||||
recovery-sidecar concern.
|
||||
- **Manifest→commit-graph publish atomicity — CLOSED (RFC-013 Phase 7):** graph
|
||||
lineage now lives ONLY in `__manifest`, as `graph_commit` + `graph_head:<branch>`
|
||||
rows written in the SAME `MergeInsertBuilder` commit as the table-version rows
|
||||
(`commit_changes_with_lineage` → `GraphNamespacePublisher::publish` with a
|
||||
`LineageIntent`). There is no second write that could fail in between: a graph commit and
|
||||
its lineage land at one manifest version atomically, so a crash after the publish
|
||||
leaves no gap. The commit-graph cache is a derived projection of those manifest
|
||||
rows; nothing writes `_graph_commits.lance` (it persists only to carry branch
|
||||
refs). The prior two-write gap (manifest at N with no `_graph_commits` row for N)
|
||||
is gone by construction. A graph created before Phase 7 (internal schema v3)
|
||||
carries its lineage only in `_graph_commits.lance`; the `migrate_v3_to_v4`
|
||||
internal-schema step (`db/manifest/migrations.rs`) backfills it into `__manifest`
|
||||
per-branch on the first read-write open (idempotent, crash-safe, data-preserving),
|
||||
and a read-only open of an un-migrated v3 graph sources the DAG from
|
||||
`_graph_commits.lance` via a stamp-gated transitional fallback so reads stay
|
||||
correct until the first write migrates it. An old binary refuses a v4-stamped
|
||||
graph (read-write and read-only) with the standard upgrade error. The migration
|
||||
is **loud on failure and concurrent-runner idempotent**: the legacy-open read
|
||||
(`read_legacy_commit_cache`) treats only a genuine not-found as "no legacy data"
|
||||
and propagates any other open error (so a transient/corrupt open can never stamp
|
||||
v4 over an empty backfill — orphaning lineage permanently), and the backfill
|
||||
converges all-or-nothing when two runners open the same legacy graph at once — a
|
||||
bounded re-open retry on the `graph_head:<branch>` row-level CAS plus an
|
||||
idempotent terminal stamp bump (both runners write the same value, so a concurrent
|
||||
`UpdateConfig`/`IncompatibleTransaction` loss re-opens and no-ops if the stamp
|
||||
already landed). The branch read path (`load_commit_cache_for_branch`) also
|
||||
refuses an out-of-range branch stamp (`> CURRENT` or `< MIN_SUPPORTED`;
|
||||
defense-in-depth; not a live hole because migrations run main-first, so main
|
||||
refuses first). The migration chain is **floor-bounded**:
|
||||
`MIN_SUPPORTED_INTERNAL_SCHEMA_VERSION` (migrations.rs; 1 today, a pure no-op) is
|
||||
the oldest stamp this binary opens, enforced symmetrically with the ceiling by the
|
||||
single `refuse_if_stamp_unsupported` guard at all three stamp-read sites
|
||||
(write-path migrate, read-only open, branch lineage-read). Raising MIN sheds the
|
||||
now-dead `migrate_vN_…` arms and (at MIN ≥ 4) the `commit_graph_legacy_v3` legacy
|
||||
readers; a compile-time tripwire (`LOWEST_REGISTERED_MIGRATION_SOURCE`) fails the
|
||||
build if the floor and the lowest registered arm drift. Retirement runbook lives on
|
||||
the `MIN_SUPPORTED_INTERNAL_SCHEMA_VERSION` doc-comment.
|
||||
- **Planner capability/stat surfaces:** cost-aware planning, complete
|
||||
capability advertisement, and explain-with-cost are roadmap. Do not describe
|
||||
them as implemented.
|
||||
|
|
@@ -302,19 +325,23 @@ them explicit.
|
|||
in history; but they are not yet brought into `cleanup` (version GC), so the
|
||||
`_versions/` chain still grows until an explicit cleanup (the cleanup half is
|
||||
deferred — it needs the Q8 cleanup-resurrection watermark first). The commit
|
||||
graph is not yet reconcilable from the manifest; and the traversal id-map is
|
||||
graph IS now reconcilable from the manifest (RFC-013 Phase 7 — it is a pure
|
||||
projection of the `graph_commit`/`graph_head` rows); the traversal id-map is
|
||||
still rebuilt.
|
||||
- **Commit-graph parent under concurrency:** `record_graph_commit` now refreshes
|
||||
the commit-graph head from storage before appending, so a same-branch write
|
||||
after an external commit no longer forks the commit DAG by parenting off a
|
||||
stale cached head (the single-process fork, pre-existing for non-strict
|
||||
inserts and widened to strict ops by Fix 1's `refresh_manifest_only`, is now
|
||||
closed). Residual: two processes writing disjoint tables can still pass their
|
||||
per-table manifest CAS and append off the same parent (a refresh-then-append
|
||||
TOCTOU). The convergent fix is reconcile-from-manifest (parent = the commit at
|
||||
the manifest version the publisher CAS'd against; `manifest_version` is on
|
||||
every commit row), composing with the manifest-to-commit-graph atomicity gap;
|
||||
it needs commit-graph append ordering or a Lance append-CAS to fully close.
|
||||
- **Commit-graph parent under concurrency — CLOSED (RFC-013 Phase 7):** the graph
|
||||
commit is now recorded in the manifest publish CAS, and the publisher resolves
|
||||
the new commit's parent INSIDE its retry loop, per attempt, from the just-loaded
|
||||
`__manifest` (the `should_replace_head` winner over the visible `graph_commit`
|
||||
rows). A CAS-conflict retry re-reads the advanced head and parents correctly, so
|
||||
the refresh-then-append TOCTOU is gone. Two processes writing disjoint tables on
|
||||
the same branch now also contend on the shared `graph_head:<branch>` row (one
|
||||
`object_id`, `WhenMatched::UpdateAll`): one wins, the other retries and re-parents
|
||||
— so the cross-process disjoint-table fork is closed too. This is the intended
|
||||
§7.1 contention point, pinned by
|
||||
`manifest::tests::concurrent_disjoint_writes_share_head_and_form_linear_chain`
|
||||
(two disjoint writers → both commit, single linear chain) and
|
||||
`manifest::tests::n_concurrent_disjoint_writers_converge_to_one_linear_chain`
|
||||
(N=8 disjoint writers with app-level retry → one linear chain of 8, no fork).
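
The per-attempt parent resolution described in the last bullet can be sketched with an in-memory stand-in. This is a toy under stated assumptions, not the publisher: `Manifest`, `publish`, `run_writers`, and `is_linear_chain` are hypothetical names, and a `Mutex` plus a version check stands in for the `__manifest` merge-insert CAS. It shows only the shape of the loop: the commit id is fixed before the loop, the parent is re-read on every attempt, and a conflict retries against the advanced head.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// One lineage row; `parent` is `None` only for the first commit.
#[derive(Clone, Debug, PartialEq)]
pub struct GraphCommit {
    pub id: String,
    pub parent: Option<String>,
}

/// Toy manifest: a version counter, the branch head, and the commit rows.
#[derive(Default)]
pub struct Manifest {
    version: u64,
    head: Option<String>,
    commits: Vec<GraphCommit>,
}

/// Hypothetical publisher. `commit_id` is minted once by the caller and stays
/// stable across retries; the parent is resolved again on every attempt.
/// Returns the number of attempts it took.
pub fn publish(manifest: &Mutex<Manifest>, commit_id: &str) -> u32 {
    let mut attempts = 0;
    loop {
        attempts += 1;
        // Load a snapshot and resolve the parent from it.
        let (seen_version, parent) = {
            let m = manifest.lock().unwrap();
            (m.version, m.head.clone())
        };
        // Widen the window between the read and the commit.
        thread::yield_now();
        // Compare-and-swap: commit only if nobody advanced the manifest.
        let mut m = manifest.lock().unwrap();
        if m.version == seen_version {
            m.commits.push(GraphCommit {
                id: commit_id.to_string(),
                parent,
            });
            m.head = Some(commit_id.to_string());
            m.version += 1;
            return attempts;
        }
        // Conflict: the guard drops here and the next attempt re-reads the head.
    }
}

/// Run `n` concurrent writers and return the commit rows in commit order.
pub fn run_writers(n: usize) -> Vec<GraphCommit> {
    let manifest = Arc::new(Mutex::new(Manifest::default()));
    let handles: Vec<_> = (0..n)
        .map(|i| {
            let m = Arc::clone(&manifest);
            thread::spawn(move || publish(&m, &format!("c{i}")))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let commits = manifest.lock().unwrap().commits.clone();
    commits
}

/// True when every commit's parent is the commit recorded immediately before it.
pub fn is_linear_chain(commits: &[GraphCommit]) -> bool {
    commits.iter().enumerate().all(|(i, c)| match i {
        0 => c.parent.is_none(),
        _ => c.parent.as_deref() == Some(commits[i - 1].id.as_str()),
    })
}
```

Calling `run_writers(8)` and passing the result to `is_linear_chain` mirrors the shape of the N=8 test named above, though the real tests exercise the Lance-backed manifest rather than a mutex.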
|
||||
|
||||
## Deny-list
|
||||
|
||||
|
|
|
|||
|
|
@@ -170,6 +170,7 @@ Migration from Lance 6.0.1 → 7.0.0 landed in this cycle. **Arrow stayed 58, Da
|
|||
- **Native `DirectoryNamespace` no longer recognizes omnigraph's manifest-tracked tables** (`lance-namespace-impls` dir.rs ~L1310): `list/describe/create_table_version` route through `check_table_status`, which reports an omnigraph table absent → `TableNotFound`. The decoupling is *contingent on omnigraph's legacy boolean PK key*, not an unconditional v7 property: v7's namespace eagerly adds the new `lance-schema:unenforced-primary-key:position` key to any `__manifest` lacking it; that write hits the immutable-PK rule above (the boolean key already set the PK), so `ensure_manifest_table_up_to_date` errors and the namespace silently falls back to directory listing. omnigraph keeps the boolean key deliberately — Lance honors it permanently (maps to PK position 0), and one uniform on-disk format beats a new-vs-old split (existing graphs can't be re-keyed to the position key under that same immutability rule). omnigraph production never uses Lance's native namespace (its publisher writes `__manifest` directly via merge_insert; its own `namespace.rs` impls are custom), so this is test-only — the `test_directory_namespace_direct_publish_cannot_replace_native_omnigraph_write_path` surface guard was realigned to the v7 behavior (it now asserts the native namespace is fully decoupled, which only strengthens the guard's thesis).
|
||||
- **Still NOT fixed in 7.0.0:** vector-index two-phase (Lance #6666 open) — `create_vector_index` inline residual retained; blob-column compaction — `compact_files_still_fails_on_blob_columns` guard still red on a fix, `optimize` still skips blob tables behind `LANCE_SUPPORTS_BLOB_COMPACTION`.
|
||||
- **No Lance API surface omnigraph uses changed at *compile* time** (the only compile break was object_store) — but **two runtime behaviors did** (the unenforced-PK immutability and the native-namespace `TableNotFound`, above), each caught by the full engine test suite rather than the build. `CleanupPolicy`, `WriteParams` (apart from the `auto_cleanup` default), `CompactionOptions`, the namespace models (resolved via `lance-namespace-reqwest-client` 0.7.7, unchanged across the bump), `Operation`, `ManifestLocation`, and `MergeInsertBuilder` shapes are all stable. Lesson: a clean build is not a clean alignment — run `cargo test --workspace` before declaring a Lance bump done.
|
||||
- **Two surface guards added by the v3→v4 migration-robustness follow-up** (not a Lance bump, but they pin Lance error surfaces the migration now classifies on): `dataset_open_missing_returns_not_found_variant` (a missing `Dataset::open` returns `DatasetNotFound`/`NotFound` — the legacy-open read in `db/commit_graph.rs::read_legacy_commit_cache` treats only those as "no legacy data" and propagates everything else) and `lance_error_incompatible_transaction_variant_exists` (a concurrent `UpdateConfig` stamp-bump loses with `IncompatibleTransaction` — `db/manifest/migrations.rs::commit_v4_stamp_idempotently` matches it to retry the benign same-value race). Re-run on a Lance bump like the others.
|
||||
|
||||
Bump this date stanza on the next alignment pass.
|
||||
|
||||
|
|
|
|||
|
|
@@ -46,7 +46,7 @@ The engine's `tests/` is the principal coverage surface; most graph-shaped behav
|
|||
| `validators.rs` | Schema constraint enforcement (enum, range, unique, cardinality) across JSONL, insert, update paths |
|
||||
| `policy_engine_chassis.rs` | Engine-layer Cedar enforcement (MR-722): allow + deny through every `_as` writer via the SDK directly — no HTTP — proving embedded and CLI callers hit the same gate as the server, with action × scope shapes matching `authorize_request` |
|
||||
| `maintenance.rs` | `optimize` (compaction), `repair` (explicit uncovered-drift publish), and `cleanup` (version GC): empty/idempotent/no-op edges, policy validation, head preservation; `optimize` publishes its own compaction (`optimize_publishes_compaction_to_manifest_so_schema_apply_succeeds`), skips pre-existing uncovered drift (`optimize_skips_preexisting_manifest_head_drift`), and refuses to run while a `__recovery` sidecar is pending (`optimize_defers_when_recovery_sidecar_is_pending`); `repair` previews/heals verified maintenance drift, refuses raw semantic drift without `--force`, and forced repair publishes only by explicit operator choice; the index reconciler (iss-848): `index_build_tolerates_null_vector_rows` (an untrainable Vector column defers instead of aborting the build, sibling indexes still build) and `optimize_materializes_index_declared_but_unbuilt` (optimize creates a declared-but-deferred index) |
|
||||
| `failpoints.rs` | Failure-injection coverage (gated on `failpoints` feature). Includes the five per-writer Phase B → recovery integration tests (`recovery_rolls_forward_after_finalize_publisher_failure`, `schema_apply_phase_b_failure_recovered_on_next_open`, `branch_merge_phase_b_failure_recovered_on_next_open`, `ensure_indices_phase_b_failure_recovered_on_next_open`, `optimize_phase_b_failure_recovered_on_next_open`) and the write-entry in-process heal contract (the four `*_after_finalize_publisher_failure_heals_without_reopen` tests — load, mutation, schema apply, branch merge: a follow-up write on the same handle rolls a sidecar-covered residual forward without reopen/refresh) and the storage-fault matrix for the sidecar lifecycle (`recovery.sidecar_{write,delete,list}` / `recovery.record_audit` failpoints: Phase A put failure aborts with zero drift, Phase D delete failure is swallowed and healed by the next write, list failures are loud at heal and open, audit-append failures are retried to exactly one audit row; plus the bucket-gated `s3_load_recovers_after_publisher_failure_without_reopen`) and the convergence-idempotent roll-forward regression (`open_sweep_roll_forward_converges_when_manifest_advances_concurrently`: two concurrent open-sweeps race one sidecar at the `recovery.before_roll_forward_publish` rendezvous; the CAS loser must converge, not fail the open — iss-schema-apply-reopen-recovery-race). |
|
||||
| `failpoints.rs` | Failure-injection coverage (gated on `failpoints` feature). Includes the five per-writer Phase B → recovery integration tests (`recovery_rolls_forward_after_finalize_publisher_failure`, `schema_apply_phase_b_failure_recovered_on_next_open`, `branch_merge_phase_b_failure_recovered_on_next_open`, `ensure_indices_phase_b_failure_recovered_on_next_open`, `optimize_phase_b_failure_recovered_on_next_open`) and the write-entry in-process heal contract (the four `*_after_finalize_publisher_failure_heals_without_reopen` tests — load, mutation, schema apply, branch merge: a follow-up write on the same handle rolls a sidecar-covered residual forward without reopen/refresh) and the storage-fault matrix for the sidecar lifecycle (`recovery.sidecar_{write,delete,list}` / `recovery.record_audit` failpoints: Phase A put failure aborts with zero drift, Phase D delete failure is swallowed and healed by the next write, list failures are loud at heal and open, audit-append failures are retried to exactly one audit row; plus the bucket-gated `s3_load_recovers_after_publisher_failure_without_reopen`). Also the v3→v4 migration fault-injection test (`transient_legacy_open_failure_aborts_migration_without_stamping_v4`, `migration.v3_to_v4.legacy_open` failpoint): a transient legacy-open failure aborts the migration loudly and leaves it retryable (stamp stays v3, no partial backfill), never stamping v4 over an empty backfill. Also the v4 stamp-bump exhaustion regression (`v4_stamp_exhaustion_returns_retryable_contention`, `migration.v4_stamp.force_incompatible` failpoint): the stamp retry loop surfaces a retryable `RowLevelCasContention` on exhaustion, not a stringified `Lance`. And the convergence-idempotent roll-forward regression (`open_sweep_roll_forward_converges_when_manifest_advances_concurrently`: two concurrent open-sweeps race one sidecar at the `recovery.before_roll_forward_publish` rendezvous; the CAS loser must converge, not fail the open — iss-schema-apply-reopen-recovery-race). |
|
||||
| `recovery.rs` | Open-time recovery sweep — sidecar I/O, classifier dispatch (NoMovement / RolledPastExpected / UnexpectedAtP1 / UnexpectedMultistep / InvariantViolation), all-or-nothing decision, roll-forward via `ManifestBatchPublisher::publish`, roll-back via `Dataset::restore`, audit row in `_graph_commit_recoveries.lance`, `OpenMode::ReadOnly` skip path |
|
||||
| `composite_flow.rs` | Compositional/narrative end-to-end stories — multi-step flows that compose mechanics covered by other test files. Catches integration regressions where individual operations all pass their unit tests but their composition breaks (sequential merges, post-merge main writes, time-travel through merge DAG, reopen consistency over multi-merge histories, post-optimize and post-cleanup strict writes). |
|
||||
|
||||
|
|
|
|||
|
|
@@ -230,8 +230,9 @@ recovery sweep in `crates/omnigraph/src/db/manifest/recovery.rs`:
|
|||
rolled-back-to version (`manifest_pinned`); the manifest is published at the
|
||||
restore commit (`manifest_pinned + 1`, same content).
|
||||
- After a successful roll-forward or roll-back, an audit row is
|
||||
recorded — `_graph_commits.lance` carries
|
||||
a commit tagged `actor_id = "omnigraph:recovery"`, and a sibling
|
||||
recorded — the graph commit lineage (the `graph_commit` rows in `__manifest`
|
||||
since RFC-013 Phase 7) carries a commit tagged
|
||||
`actor_id = "omnigraph:recovery"`, and a sibling
|
||||
`_graph_commit_recoveries.lance` row carries `recovery_kind`,
|
||||
`recovery_for_actor` (the original sidecar's actor), `operation_id`,
|
||||
per-table outcomes. Operators run `omnigraph commit list --filter
|
||||
|
|
@@ -336,20 +337,40 @@ actual }`. The HTTP server maps this to **409 Conflict** with body
|
|||
|
||||
## Audit
|
||||
|
||||
`actor_id` lands in `_graph_commits.lance` via `record_graph_commit` (no
|
||||
intermediate run record). Audit history is queried via `omnigraph commit
|
||||
list`.
|
||||
`actor_id` lands in the graph commit lineage — the `graph_commit` rows in
|
||||
`__manifest`, written in the publish CAS (RFC-013 Phase 7; previously
|
||||
`_graph_commits.lance`). Audit history is queried via `omnigraph commit list`.
|
||||
|
||||
## Migration code
|
||||
|
||||
`db/manifest/migrations.rs` carries the v2→v3 internal-schema step (MR-770):
|
||||
a one-time sweep that deletes legacy `__run__*` staging branches off
|
||||
`__manifest`. It runs in `Omnigraph::open(ReadWrite)` (via
|
||||
`manifest::migrate_on_open`, before the coordinator reads branch state) and
|
||||
again on the publisher's write path; both are idempotent once the stamp is at
|
||||
v3. Deleting the inert `_graph_runs.lance` / `_graph_run_actors.lance` dataset
|
||||
*bytes* is still deferred — it needs a `StorageAdapter::delete_prefix`
|
||||
primitive — but those bytes are invisible to graph-level state.
|
||||
`db/manifest/migrations.rs` is the single place on-disk `__manifest` shape is
|
||||
reconciled with what the binary expects, stepping the
|
||||
`omnigraph:internal_schema_version` stamp forward one `match`-arm at a time. It
|
||||
runs in `Omnigraph::open(ReadWrite)` (via `manifest::migrate_on_open`, before the
|
||||
coordinator reads branch state) and again on the publisher's write path, so each
|
||||
branch migrates on its first write; every step is idempotent under crash-retry
|
||||
(work first, stamp bump last).
|
||||
|
||||
- **v2→v3** (MR-770): a one-time sweep that deletes legacy `__run__*` staging
|
||||
branches off `__manifest`. Deleting the inert `_graph_runs.lance` /
|
||||
`_graph_run_actors.lance` dataset *bytes* is still deferred — it needs a
|
||||
`StorageAdapter::delete_prefix` primitive — but those bytes are invisible to
|
||||
graph-level state.
|
||||
- **v3→v4** (RFC-013 Phase 7, `migrate_v3_to_v4`): backfills the graph lineage
|
||||
from `_graph_commits.lance` into `__manifest` as `graph_commit` / `graph_head`
|
||||
rows. A graph created before Phase 7 has its lineage only in
|
||||
`_graph_commits.lance`; the new binary reads lineage from the `__manifest`
|
||||
projection, so without this backfill it would see an empty commit DAG. The
|
||||
backfill is per-branch (each branch migrates on its first write), idempotent
|
||||
(keyed on `object_id`; a fast-path guard skips when `__manifest` already
|
||||
carries `graph_commit` rows), and writes exactly one `graph_head:<branch>` row
|
||||
for the actual head. `_graph_commits.lance` is left in place as the branch-ref
|
||||
carrier — no commit row is written to it again. While a graph is below v4, a
|
||||
**read-only** open (which never writes, so never migrates) sources the commit
|
||||
DAG from `_graph_commits.lance` via the stamp-gated transitional fallback in
|
||||
`CommitGraph::open*`, so reads see correct history before the first write
|
||||
migrates the graph. An old binary opening a v4-stamped graph is refused with an
|
||||
"upgrade omnigraph" error in both read-write and read-only modes.
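
The stepping pattern described above can be sketched as a loop over a `match` on the stamp. This is a minimal in-memory illustration, not the code in `db/manifest/migrations.rs`: `ManifestState`, `migrate`, and `TARGET_VERSION` are hypothetical names, and the sketch only models the v2→v3 and v3→v4 arms.

```rust
use std::collections::BTreeSet;

/// Hypothetical in-memory stand-in for one branch of `__manifest`.
#[derive(Debug, Default)]
pub struct ManifestState {
    /// Mirrors the `omnigraph:internal_schema_version` stamp.
    pub schema_version: u32,
    /// Branch names visible on `__manifest`.
    pub branches: Vec<String>,
    /// Commit ids already present as `graph_commit` rows.
    pub graph_commits: BTreeSet<String>,
}

/// Assumed target stamp for this sketch (v4).
pub const TARGET_VERSION: u32 = 4;

/// Steps the stamp forward one arm at a time. Each arm does its work
/// first and bumps the stamp last, so re-running after a crash repeats
/// only idempotent work.
pub fn migrate(state: &mut ManifestState, legacy_commit_ids: &[&str]) -> Result<(), String> {
    while state.schema_version < TARGET_VERSION {
        match state.schema_version {
            2 => {
                // v2→v3: sweep legacy `__run__*` staging branches.
                state.branches.retain(|name| !name.starts_with("__run__"));
                state.schema_version = 3;
            }
            3 => {
                // v3→v4: backfill lineage; set insertion is keyed on the id,
                // so a repeated run cannot create duplicates.
                for id in legacy_commit_ids {
                    state.graph_commits.insert((*id).to_string());
                }
                state.schema_version = 4;
            }
            other => return Err(format!("no migration step from v{}", other)),
        }
    }
    Ok(())
}
```

Because the stamp is written last, a crash between the work and the bump leaves the stamp at the old value, and the next open simply re-enters the same arm.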
|
||||
|
||||
## Mid-query partial failure: closed by MR-794
|
||||
|
||||
|
|
|
|||
61
docs/releases/v0.8.0.md
Normal file
|
|
@ -0,0 +1,61 @@
|
|||
# Omnigraph v0.8.0 (in progress)
|
||||
|
||||
> Draft release notes for the next minor. The version line in `AGENTS.md` and the
|
||||
> crate manifests are bumped when this release is cut — these notes track the
|
||||
> user-visible delta as the RFC-013 work lands.
|
||||
|
||||
This release moves the graph commit lineage into `__manifest` (RFC-013 Phase 7)
|
||||
and ships a **one-time on-disk migration** for existing graphs. It is the first
|
||||
release with an internal-schema change since v0.4.0, so it has an upgrade-order
|
||||
requirement — read the upgrade notes before rolling it out.
|
||||
|
||||
## Graph lineage now lives in `__manifest` (internal schema v4)
|
||||
|
||||
The graph commit DAG (commits, parents, merge parents, per-branch heads, and the
|
||||
authoring actor) is now stored in `__manifest` as `graph_commit` / `graph_head`
|
||||
rows, written in the **same commit (CAS)** as the table-version rows of a graph
|
||||
publish. Previously the lineage lived in a separate `_graph_commits.lance`
|
||||
dataset written after the manifest commit, leaving a narrow window where a crash
|
||||
could land a manifest version with no matching lineage row. Folding the lineage
|
||||
into the publish closes that gap by construction: a graph commit and its lineage
|
||||
now land atomically at one manifest version. The in-memory commit graph is a
|
||||
projection of those manifest rows; `_graph_commits.lance` is retained only as a
|
||||
carrier for Lance branch refs and no longer receives commit rows.
|
||||
|
||||
This bumps the `__manifest` internal schema stamp from **v3 to v4**.
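
To illustrate why folding the lineage into the publish closes the crash window, here is a small in-memory model of a compare-and-swap publish. It is a sketch under assumptions, not the engine's publisher: `Manifest`, `Row`, `Conflict`, and `publish` are hypothetical names, and the real store is a Lance dataset rather than a `Vec`.

```rust
/// Simplified manifest rows; field sets are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub enum Row {
    TableVersion { table_key: String, version: u64 },
    GraphCommit { commit_id: String, parent: Option<String>, actor: String },
    GraphHead { branch: String, commit_id: String },
}

/// Hypothetical in-memory manifest: a version counter plus the rows,
/// each tagged with the manifest version at which it landed.
#[derive(Debug, Default)]
pub struct Manifest {
    pub version: u64,
    pub rows: Vec<(u64, Row)>,
}

/// Returned when the caller's expected version is stale.
#[derive(Debug, PartialEq)]
pub struct Conflict {
    pub expected: u64,
    pub actual: u64,
}

impl Manifest {
    /// Applies the whole batch at one new version, or nothing at all.
    pub fn publish(&mut self, expected: u64, batch: Vec<Row>) -> Result<u64, Conflict> {
        if self.version != expected {
            return Err(Conflict { expected, actual: self.version });
        }
        let next = self.version + 1;
        self.rows.extend(batch.into_iter().map(|row| (next, row)));
        self.version = next;
        Ok(next)
    }
}
```

In this model the table-version row and its `GraphCommit` / `GraphHead` rows travel in one batch, so there is no intermediate state in which a table version is visible without its lineage.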
|
||||
|
||||
## Existing graphs migrate seamlessly on first write
|
||||
|
||||
A graph created by an earlier binary (internal schema v3) keeps its lineage in
|
||||
`_graph_commits.lance` with none in `__manifest`. On the **first read-write
|
||||
open**, Omnigraph backfills that lineage into `__manifest` (the `migrate_v3_to_v4`
|
||||
internal-schema step) and bumps the stamp to v4. The migration:
|
||||
|
||||
- is **per-branch** — each branch backfills on its first write;
|
||||
- is **idempotent and crash-safe** — the stamp bump is the last step, and the
|
||||
backfill is keyed on the commit id, so a crash mid-migration re-runs harmlessly
|
||||
on the next open;
|
||||
- **preserves all data** — every commit, parent, merge parent, actor, and head is
|
||||
carried over; commit ids are stable, so existing references still resolve.
|
||||
|
||||
No data is lost and no operator action is required beyond upgrading the binary.
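
The properties listed above can be sketched as a backfill keyed on the commit id. This is an illustrative model only: `LegacyCommit`, `ManifestLineage`, and `backfill_v3_to_v4` are hypothetical names, the head is passed in rather than derived, and the real step operates on Lance datasets.

```rust
use std::collections::BTreeMap;

/// One commit as read from the legacy lineage dataset (fields assumed).
#[derive(Debug, Clone, PartialEq)]
pub struct LegacyCommit {
    pub id: String,
    pub parent: Option<String>,
    pub actor: String,
}

/// Hypothetical view of the lineage rows of one branch in `__manifest`.
#[derive(Debug, Default)]
pub struct ManifestLineage {
    pub schema_version: u32,
    /// `graph_commit` rows keyed by commit id.
    pub graph_commits: BTreeMap<String, LegacyCommit>,
    /// The single `graph_head` pointer for this branch.
    pub graph_head: Option<String>,
}

/// Copies legacy commits into the manifest and returns how many rows
/// were newly written. The stamp bump is the last step.
pub fn backfill_v3_to_v4(
    manifest: &mut ManifestLineage,
    legacy: &[LegacyCommit],
    legacy_head: Option<&str>,
) -> usize {
    if manifest.schema_version >= 4 {
        return 0; // already migrated
    }
    let mut written = 0;
    for commit in legacy {
        // Keyed on the commit id: rows from an interrupted run are kept.
        if !manifest.graph_commits.contains_key(&commit.id) {
            manifest.graph_commits.insert(commit.id.clone(), commit.clone());
            written += 1;
        }
    }
    manifest.graph_head = legacy_head.map(str::to_string);
    manifest.schema_version = 4;
    written
}
```

A run that crashed after copying some commits leaves the stamp at v3, so the next open re-enters the function and writes only the missing rows.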
|
||||
|
||||
Before its first write migrates the graph, a **read-only** open of a v3 graph
|
||||
(e.g. `omnigraph commit list`, NDJSON export) still reads correct history via a
|
||||
transitional fallback that sources the commit DAG from `_graph_commits.lance` —
|
||||
read-only opens never write, so they never migrate, but they never show an empty
|
||||
history either.
|
||||
|
||||
## Breaking: upgrade writer binaries first
|
||||
|
||||
Internal schema v4 is a hard version gate. Once a graph has been opened for write
|
||||
by a v0.8.0 binary, its `__manifest` is stamped v4, and an **older binary will
|
||||
refuse to open it** — read-write *and* read-only — with an
|
||||
`upgrade omnigraph before opening this graph` error rather than silently
|
||||
misreading the new lineage. This is the standard forward-version protection
|
||||
(same shape as the v1→v2 / v2→v3 steps), now enforced on the read-only path too.
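
The gate can be pictured as a comparison between the on-disk stamp and the highest version the binary supports. The following is a hedged sketch, not the engine's open path: `plan_open`, `OpenMode`, `OpenPlan`, and `UpgradeRequired` are hypothetical names, and only the quoted error text comes from these notes.

```rust
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OpenMode {
    ReadOnly,
    ReadWrite,
}

/// What an open would do for a given stamp (illustrative outcomes).
#[derive(Debug, PartialEq)]
pub enum OpenPlan {
    /// Stamp matches the binary: nothing to do.
    Ready,
    /// Older stamp, read-write open: migrate before reading branch state.
    MigrateOnOpen { from: u32 },
    /// Older stamp, read-only open: never writes, reads legacy lineage.
    LegacyLineageFallback { from: u32 },
}

#[derive(Debug, PartialEq)]
pub struct UpgradeRequired {
    pub on_disk: u32,
    pub supported: u32,
}

impl fmt::Display for UpgradeRequired {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "upgrade omnigraph before opening this graph (on-disk v{}, binary supports v{})",
            self.on_disk, self.supported
        )
    }
}

/// Refuses any stamp newer than the binary supports, in both modes.
pub fn plan_open(on_disk: u32, supported: u32, mode: OpenMode) -> Result<OpenPlan, UpgradeRequired> {
    if on_disk > supported {
        return Err(UpgradeRequired { on_disk, supported });
    }
    if on_disk == supported {
        return Ok(OpenPlan::Ready);
    }
    Ok(match mode {
        OpenMode::ReadWrite => OpenPlan::MigrateOnOpen { from: on_disk },
        OpenMode::ReadOnly => OpenPlan::LegacyLineageFallback { from: on_disk },
    })
}
```

For example, a binary that supports v3 calling `plan_open(4, 3, mode)` gets the error for either mode, while a v4 binary opening a v3 graph read-only takes the fallback branch and leaves the stamp untouched.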
|
||||
|
||||
**Upgrade order:** upgrade every writer (and reader) binary that touches a graph
|
||||
to v0.8.0 before, or together with, the first write under the new version. A
|
||||
mixed fleet where an old binary still writes the same graph is unsupported, as
|
||||
with any internal-schema bump.
|
||||
|
|
@ -20,13 +20,14 @@ OmniGraph is **not** a single Lance dataset; it is a *graph* of datasets coordin
|
|||
- **Layout**:
|
||||
- `nodes/{fnv1a64-hex(type_name)}` — one Lance dataset per node type
|
||||
- `edges/{fnv1a64-hex(edge_type_name)}` — one Lance dataset per edge type
|
||||
- `__manifest/` — the catalog of all sub-tables and their published versions
|
||||
- `_graph_commits.lance` / `_graph_commit_actors.lance` — the commit graph and its actor map
|
||||
- `__manifest/` — the catalog of all sub-tables and their published versions, **and** the graph commit lineage (RFC-013 Phase 7)
|
||||
- `_graph_commits.lance` / `_graph_commit_actors.lance` — legacy / branch-ref carriers. Since RFC-013 Phase 7 the graph lineage lives in `__manifest` (`graph_commit` / `graph_head` rows, written in the publish CAS); `_graph_commits.lance` no longer receives commit rows, but is retained to carry the Lance branch refs that `create_branch` / `list_branches` / the `cleanup` orphan reconciler operate on. A graph created before Phase 7 (internal schema v3) keeps its lineage here until its first read-write open, which migrates it into `__manifest` via `migrate_v3_to_v4`.
|
||||
- (legacy `_graph_runs.lance` / `_graph_run_actors.lance` from pre-v0.4.0 graphs are inert; the run state machine was removed. The internal schema migration sweeps stale `__run__*` branches on first write-open; the inert dataset bytes themselves remain until a prefix-delete storage primitive lands)
|
||||
- **Manifest row schema** (`object_id, object_type, location, metadata, base_objects, table_key, table_version, table_branch, row_count`):
|
||||
- `object_type` ∈ `table | table_version | table_tombstone`
|
||||
- `table_key` ∈ `node:<TypeName> | edge:<EdgeName>`
|
||||
- `object_type` ∈ `table | table_version | table_tombstone | graph_commit | graph_head`
|
||||
- `table_key` ∈ `node:<TypeName> | edge:<EdgeName>` (empty for `graph_commit` / `graph_head` lineage rows)
|
||||
- `table_branch` is `null` for the main lineage and the branch name otherwise
|
||||
- **Graph lineage rows** (RFC-013 Phase 7): one immutable `graph_commit` row per commit (`object_id` = the commit ULID; `metadata` JSON carries parent / merged-parent / actor / timestamp) plus one mutable `graph_head:<branch>` pointer per branch (`graph_head:main` for main). The in-memory commit DAG is a projection of these rows.
|
||||
- **Snapshot reconstruction**: latest visible `table_version` per `(table_key, table_branch)`, minus tombstones: rows where `object_type = table_tombstone` and whose own `table_version` (acting as the tombstone version) is `>=` the entry's `table_version`.
|
||||
- **Atomic publish**: multi-dataset commits publish so that a single write to `__manifest` flips all the new sub-table versions visible at once.
|
||||
- **Row-level CAS on the merge-insert join key**: `object_id` carries an unenforced-primary-key annotation so Lance's bloom-filter conflict resolver rejects two concurrent commits that land the same `object_id` row. Without this annotation, Lance's transparent rebase would admit silent duplicates from racing publishers.
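
The snapshot-reconstruction rule above can be sketched over an in-memory row list. This is one reading of the rule, with assumptions labeled: `ManifestRow`, `ObjectType`, and `reconstruct_snapshot` are hypothetical names, only the two relevant object types are modeled, and the tombstone is compared against the latest version per key.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
pub enum ObjectType {
    TableVersion,
    TableTombstone,
}

/// The subset of manifest columns the rule needs.
#[derive(Debug, Clone)]
pub struct ManifestRow {
    pub object_type: ObjectType,
    pub table_key: String,
    /// `None` models the `null` branch value used for main.
    pub table_branch: Option<String>,
    pub table_version: u64,
}

/// `(table_key, table_branch)`.
pub type SnapshotKey = (String, Option<String>);

/// Latest version per key, dropping keys whose tombstone version is
/// greater than or equal to that latest version.
pub fn reconstruct_snapshot(rows: &[ManifestRow]) -> BTreeMap<SnapshotKey, u64> {
    let mut latest: BTreeMap<SnapshotKey, u64> = BTreeMap::new();
    let mut tombstones: BTreeMap<SnapshotKey, u64> = BTreeMap::new();
    for row in rows {
        let key = (row.table_key.clone(), row.table_branch.clone());
        let target = match row.object_type {
            ObjectType::TableVersion => &mut latest,
            ObjectType::TableTombstone => &mut tombstones,
        };
        // Keep the highest version seen for this key.
        let slot = target.entry(key).or_insert(row.table_version);
        *slot = (*slot).max(row.table_version);
    }
    latest.retain(|key, version| match tombstones.get(key) {
        Some(tombstone) => *tombstone < *version,
        None => true,
    });
    latest
}
```

A table with versions 1 and 3 and no tombstone resolves to 3; a table at version 2 with a tombstone at version 2 disappears from the snapshot; a table at version 5 with an older tombstone at version 4 stays visible.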
|
||||
|
|
@ -90,8 +91,8 @@ flowchart TB
|
|||
- **Graph root** is one directory (or S3 prefix). Everything below is part of one OmniGraph graph.
|
||||
- **`__manifest/`** is a Lance dataset whose rows describe which sub-table version is published at which graph-branch. Reading a snapshot starts here.
|
||||
- **`nodes/`** and **`edges/`** are sibling directories holding one Lance dataset per declared type. Names are `fnv1a64-hex` of the type name to keep paths fixed-length and case-safe.
|
||||
- **`_graph_commits.lance`** is an L2 dataset that records the graph-level commit DAG, with a paired `_graph_commit_actors.lance` for the actor map. (Pre-v0.4.0 graphs also have inert `_graph_runs.lance` / `_graph_run_actors.lance` from the removed Run state machine; the internal schema migration sweeps their stale `__run__*` branches, and the dataset bytes are reclaimed once a prefix-delete primitive lands.)
|
||||
- **`_graph_commit_recoveries.lance`** — one row per crash-recovery action. Joined to `_graph_commits.lance` by `graph_commit_id`; the linked commit row carries `actor_id=omnigraph:recovery`. Operators correlate recoveries with the original mutations they rolled forward / back via this join.
|
||||
- **`_graph_commits.lance`** is an L2 dataset retained only as a branch-ref carrier (and, on a pre-Phase-7 graph, the migration source). Since RFC-013 Phase 7 the graph commit DAG lives in `__manifest` as `graph_commit` / `graph_head` rows written in the publish CAS — `_graph_commits.lance` and its paired `_graph_commit_actors.lance` no longer receive commit rows. A graph created before Phase 7 (internal schema v3) backfills its lineage into `__manifest` on its first read-write open (`migrate_v3_to_v4`). (Pre-v0.4.0 graphs also have inert `_graph_runs.lance` / `_graph_run_actors.lance` from the removed Run state machine; the internal schema migration sweeps their stale `__run__*` branches, and the dataset bytes are reclaimed once a prefix-delete primitive lands.)
|
||||
- **`_graph_commit_recoveries.lance`** — one row per crash-recovery action. Joined by `graph_commit_id` to the graph commit lineage (the `graph_commit` rows in `__manifest` since RFC-013 Phase 7); the linked commit carries `actor_id=omnigraph:recovery`. Operators correlate recoveries with the original mutations they rolled forward / back via this join.
|
||||
- **`__recovery/{ulid}.json`** — transient sidecar files written by a writer before it advances the underlying dataset, deleted once the matching manifest publish succeeds. A sidecar persisting after process exit means the writer crashed mid-commit; the next read-write open processes it. Steady-state directory is empty.
|
||||
- **`_refs/branches/{name}.json`** is graph-level branch metadata — pointers from a branch name to the manifest version it heads.
|
||||
- **Inside each Lance dataset** (orange): the standard Lance directory layout. `_versions/{n}.manifest` records every commit; `data/` holds the actual Arrow fragments; `_indices/{uuid}/` holds index segments with their own `fragment_bitmap` for partial coverage; `_refs/` holds Lance-native per-dataset branches and tags.
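
As an illustration of the `fnv1a64-hex` naming used for `nodes/` and `edges/`, the sketch below computes the standard 64-bit FNV-1a hash and formats it as 16 lowercase hex digits. The hash constants are the published FNV-1a parameters; hashing the UTF-8 bytes of the type name unchanged, the lowercase zero-padded formatting, and the helper name `dataset_path` are assumptions, not confirmed details of OmniGraph.

```rust
/// Standard 64-bit FNV-1a over a byte slice.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    bytes
        .iter()
        // XOR the byte in first, then multiply (the "1a" ordering).
        .fold(OFFSET_BASIS, |hash, &byte| (hash ^ u64::from(byte)).wrapping_mul(PRIME))
}

/// Hypothetical helper: `kind` is `"nodes"` or `"edges"`.
/// Assumes the name is hashed as-is (UTF-8, no case folding).
pub fn dataset_path(kind: &str, type_name: &str) -> String {
    format!("{}/{:016x}", kind, fnv1a64(type_name.as_bytes()))
}
```

Whatever the length or casing of the type name, the directory component is always 16 hex characters, which is what keeps paths fixed-length and safe on case-insensitive filesystems.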
|
||||
|
|
|
|||
|
|
@ -3,12 +3,12 @@
|
|||
| Name | Value | Area |
|
||||
|---|---|---|
|
||||
| `MANIFEST_DIR` | `__manifest` | manifest layout |
|
||||
| Commit graph dir | `_graph_commits.lance` | commit graph |
|
||||
| Commit graph dir | `_graph_commits.lance` | branch-ref carrier + pre-v4 lineage source (lineage lives in `__manifest` since RFC-013 Phase 7) |
|
||||
| Run registry dir (legacy, removed) | `_graph_runs.lance` | inert post-v0.4.0; bytes remain until a prefix-delete primitive lands |
|
||||
| Run branch prefix (legacy, removed) | `__run__` | swept off `__manifest` by the internal schema migration; no longer a reserved name |
|
||||
| Schema apply lock | `__schema_apply_lock__` | schema apply |
|
||||
| Manifest publisher retry budget | `PUBLISHER_RETRY_BUDGET = 5` | manifest publish |
|
||||
| Internal manifest schema version | `INTERNAL_MANIFEST_SCHEMA_VERSION = 3` | manifest migrations |
|
||||
| Internal manifest schema version | `INTERNAL_MANIFEST_SCHEMA_VERSION = 4` | manifest migrations (v4 = graph lineage in `__manifest`, RFC-013 Phase 7) |
|
||||
| Merge stage batch | `MERGE_STAGE_BATCH_ROWS = 8192` | merge execution |
|
||||
| Maintenance concurrency | `OMNIGRAPH_MAINTENANCE_CONCURRENCY=8` | optimize/cleanup |
|
||||
| Lance blob compaction support | `LANCE_SUPPORTS_BLOB_COMPACTION = false` | optimize |
|
||||