mirror of https://github.com/ModernRelay/omnigraph.git synced 2026-06-21 02:28:07 +02:00

Ragnor Comerford 7168ee0ed0

fix(engine): stop branch-merge fast-forward OOM on embedding tables (#277)

* fix(engine): stop branch-merge fast-forward OOM on embedding tables

A branch→main fast-forward merge of a forked, embedding-bearing table
re-derived the whole branch row-by-row: it lumped new + changed rows into
one Lance `merge_insert`, i.e. a full-outer hash join over the entire
delta that exhausts the DataFusion memory pool (8k rows × 3072-dim →
`Resources exhausted: 188MB HashJoinInput, 100MB pool`), so the merge
hung/failed instead of completing.

Fix the data path on existing, substrate-supported primitives:

- Adopt-with-delta split: new rows → `stage_append` (a streaming
`Operation::Append`, no hash join), only genuinely-changed rows →
a bounded `stage_merge_insert`, deletes inline. New `AdoptDelta` /
`compute_adopt_delta` / `publish_adopted_delta` replace the combined
`compute_source_delta` path; the three-way merge path is untouched.
- Stream the append via `stage_append_stream` →
`execute_uncommitted_stream` (the substrate-blessed bulk-append path),
removing the `Vec`+`concat` full-delta materialization. Blob-aware via
`scan_stream_for_rewrite`. Exposed on the sealed `TableStorage` trait.
- Lazy row-signature: stop stringifying every row's embedding eagerly;
compute the signature only for the `(Some,Some)` changed-candidate arm.
- Index coverage is reconciler-owned: the adopt path no longer rebuilds
vector/FTS indexes inline; `optimize`/`ensure_indices` folds the new
rows in (reads stay correct via brute-force tail). Post-merge
index-coverage contract documented in docs/user/branching/merge.md.
- Recovery pin: new `CandidateTableState::AdoptWithDelta` is classified
and pinned so the append's HEAD advance is sidecar-covered
(invariant 5); the `BranchMerge` sidecar's loose classification covers
the two-commit shape.
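
The split and the lazy signature described above can be sketched as follows. This is a minimal illustration, not the engine code: `AdoptDeltaSketch`, `split_delta`, the `u64` ids and the in-memory maps are all hypothetical stand-ins, and the real path streams Lance batches rather than holding maps.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical row-level delta for an adopt (fast-forward) merge.
#[derive(Debug, Default, PartialEq)]
pub struct AdoptDeltaSketch {
    pub appended: Vec<u64>, // ids only on the source: streaming append, no join
    pub changed: Vec<u64>,  // ids on both sides whose signature differs: bounded upsert
    pub deleted: Vec<u64>,  // ids only on the base: delete
}

/// Splits ids into append / upsert / delete sets.
/// `signature` stands in for the expensive row signature (assumption) and is
/// only invoked for ids present on both sides, i.e. the `(Some, Some)` arm.
pub fn split_delta<R>(
    base: &BTreeMap<u64, R>,
    source: &BTreeMap<u64, R>,
    mut signature: impl FnMut(&R) -> String,
) -> AdoptDeltaSketch {
    let mut delta = AdoptDeltaSketch::default();
    let ids: BTreeSet<u64> = base.keys().chain(source.keys()).copied().collect();
    for id in ids {
        match (base.get(&id), source.get(&id)) {
            (None, Some(_)) => delta.appended.push(id),
            (Some(_), None) => delta.deleted.push(id),
            (Some(b), Some(s)) => {
                // Only changed-candidates pay for the signature.
                if signature(b) != signature(s) {
                    delta.changed.push(id);
                }
            }
            (None, None) => unreachable!("every id came from one of the two maps"),
        }
    }
    delta
}
```

With a mostly-new delta, `changed` stays small, so the join-based upsert is bounded by the number of genuinely modified rows rather than by the size of the branch.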

The regression gate is structural, not a brittle size threshold: task-local
write probes assert that an append-only fast-forward merge makes 0
`stage_merge_insert` calls (the OOM hash join), appends via `stage_append`, and
streams (0 whole-delta materializations). Further tests cover functional
correctness, blob round-trip, index deferral, and Phase-B failpoint recovery.

Residual: the classify-time staging round-trip is still O(N) in memory
(architecturally required for the all-or-nothing multi-table publish);
bounding it fully is the fragment-adopt follow-up.

* test(engine): partial branch-merge Phase B must roll back (RED regression)

A branch-merge per-table publish is a multi-commit sequence — adopt:
append → upsert → delete; three-way: merge_insert → delete → index — each
step advancing Lance HEAD before the single manifest publish. Add four
failpoint sites at those windows and four regression tests (mixed delta:
a fresh id, a modified base id, a removed base id) asserting that a crash
mid-sequence rolls the whole merge BACK on the next open and a re-run
re-applies the full delta.

RED against current code: the loose `BranchMerge` classification rolls any
`lance_head > manifest_pinned` forward, so the partial is published and the
merge recorded — the rolled-back-to-base assertion fails with the partial
state visible (e.g. bob appended, dave not deleted). The fix lands next.

The failpoint sites are no-ops unless the `failpoints` feature activates them.

* fix(engine): roll back partial branch-merge Phase B (recovery WAL confirmation)

A branch-merge publishes each table with several Lance commits (adopt:
append → upsert → delete; three-way: merge_insert → delete → index), then
one manifest publish makes them atomic. Recovery classified `BranchMerge`
loosely: any `lance_head > manifest_pinned` with a matching CAS pin rolled
*forward* to the observed HEAD. So a crash mid-sequence published a partial
delta (e.g. the append without its sibling upsert/delete) and recorded the
merge as complete — silent data loss; a re-merge sees "already up to date"
and never repairs it.

Fix: make the recovery sidecar a two-phase WAL for `BranchMerge`. After the
whole per-table publish loop completes, stamp each pin's `confirmed_version`
with its exact achieved Lance version (a second sidecar write), then publish
the manifest. Recovery now:

- rolls FORWARD only to a pin's `confirmed_version` (set ⇒ Phase B finished);
- rolls BACK (`TableClassification::IncompletePhaseB`) when the HEAD moved but
no confirmation was recorded ⇒ a partial publish ⇒ all-or-nothing restore to
the manifest pin, so a re-run re-applies the full delta.
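
The two recovery rules above reduce to a small decision function. The sketch below is illustrative only: `PinSketch`, `Decision` and `classify` are hypothetical names, and the `Mismatch` arm (a confirmation that disagrees with the observed HEAD) is an assumption about how such a case would be surfaced.

```rust
/// Hypothetical, simplified view of one table pin in the recovery sidecar.
#[derive(Debug, Clone, Copy)]
pub struct PinSketch {
    pub manifest_pinned: u64,           // version the manifest currently publishes
    pub confirmed_version: Option<u64>, // stamped only after the whole Phase B finished
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    NoOp,             // HEAD has not advanced past the pin: nothing to recover
    RollForward(u64), // Phase B finished: publish the confirmed version
    RollBack(u64),    // partial Phase B: restore to the manifest pin
    Mismatch,         // assumption: HEAD disagrees with the recorded confirmation
}

pub fn classify(pin: PinSketch, lance_head: u64) -> Decision {
    if lance_head <= pin.manifest_pinned {
        return Decision::NoOp;
    }
    match pin.confirmed_version {
        // Roll forward only to the exact version the writer recorded.
        Some(v) if v == lance_head => Decision::RollForward(v),
        Some(_) => Decision::Mismatch,
        // HEAD moved but nothing was confirmed: the publish was partial.
        None => Decision::RollBack(pin.manifest_pinned),
    }
}
```

Because the decision keys on a recorded version rather than on counting commits, it does not depend on how many Lance commits a per-table publish happens to make.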

Scope: `BranchMerge` only. Other loose writers (`SchemaApply`,
`EnsureIndices`, `Optimize`) keep the loose roll-forward — their drift is
derived state (index coverage, compaction) a partial roll-forward never
corrupts, so confirmation would be cost without benefit.

This is the write-ahead intent record + idempotent roll-forward that the
fast-forward-main commit model requires to be crash-atomic across N tables;
version-recorded (not phase-count-derived), so it survives later changes to
the per-table commit sequence.

Regression tests (failpoints): four partial-window crashes — adopt
after-append / after-upsert, three-way after-merge / after-delete — each with
a mixed delta (new id, modified id, removed id) now roll the whole merge back;
the existing complete-Phase-B tests still roll forward.

* fix(engine): scope merge index docs to fast-forward; record append probe after write

Two PR-review fixes:

- docs(merge): the "a merge does not build indexes inline" note only holds for
the fast-forward / adopt path (deferred to the reconciler). The three-way
`Merged` path still rebuilds indexes inline in its publish, so a
Merged-outcome merge of an embedding table pays the build up front. Scope the
doc so a Merged-outcome user isn't surprised or led to skip a post-merge
optimize.

- `stage_append` recorded its instrumentation probe before the fallible
`execute_uncommitted`, so a failed staging write left the call/row counters
inflated — and diverged from `stage_append_stream`, which records after the
transaction is built. Record after the write succeeds.

* fix(engine): record stage_merge_insert / vector-index probes after write too

The prior commit moved `stage_append`'s instrumentation probe to after the
write, but left the two sibling write primitives with the identical ordering
bug: `stage_merge_insert` recorded before `execute_uncommitted`, and
`create_vector_index` before the index build. A failed write on either would
inflate the probe counter. Move both to record only after the write succeeds,
so all write-primitive probes share one rule (record-after-success) — closing
the class rather than the single instance the review flagged.
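
The record-after-success rule can be illustrated with a small wrapper. This is a sketch under assumptions: `WriteProbe` and `stage_with_probe` are hypothetical, and a `Cell`-based counter stands in for the task-local probes the engine uses.

```rust
use std::cell::Cell;

/// Hypothetical write probe: counts successful staging calls and rows.
#[derive(Default)]
pub struct WriteProbe {
    pub calls: Cell<u64>,
    pub rows: Cell<u64>,
}

impl WriteProbe {
    fn record(&self, rows: u64) {
        self.calls.set(self.calls.get() + 1);
        self.rows.set(self.rows.get() + rows);
    }
}

/// Runs a fallible write and records the probe only if the write succeeded.
pub fn stage_with_probe<E>(
    probe: &WriteProbe,
    rows: u64,
    write: impl FnOnce() -> Result<(), E>,
) -> Result<(), E> {
    write()?; // a failure returns here, before any counter moves
    probe.record(rows);
    Ok(())
}
```

Routing every write primitive through one such helper is one way to keep the ordering rule from drifting between siblings.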

* docs(engine): mark the fragment-adopt excision boundary in the merge code

Comment the transitional row-level merge code so a future fragment-adopt
implementation (Lance branch-merge/rebase #7263 + UUID branch paths #7185)
knows exactly what it deletes and what it keeps:

- `AdoptDelta` / `compute_adopt_delta` / `publish_adopted_delta` — the row-level
re-derivation; removed wholesale when a fast-forward merge becomes a fragment
graft (adopt the source table version's fragments + indexes by reference).
- `stage_append_stream` — its only caller is that merge append; dead with it
unless re-adopted as a general bulk-append path.
- `confirm_sidecar_phase_b` — the boundary marker: this SURVIVES. The recovery
sidecar is the cross-table WAL a fast-forward-main commit still needs; only the
within-table multi-commit reason for `IncompletePhaseB` narrows once each table
is a single graft commit. Keep the sidecar; only simplify the classifier.

Comments only; no behavior change.

* test(engine): pre-upgrade v1 branch-merge sidecar must roll forward (RED)

Phase-B confirmation made the recovery classifier strict for every BranchMerge
sidecar — including ones written by a binary that predates confirmation. A
pre-upgrade crash in the Phase-B→C gap can leave such a sidecar over a COMPLETED
merge; the new classifier reads its absent confirmed_version as a partial and
rolls it back, silently discarding the finished merge (greptile P1 / Cursor High).

This regression test synthesizes that sidecar realistically: crash after Phase B
(real sidecar + advanced Lance HEAD), downgrade the on-disk JSON to the
pre-confirmation v1 shape (schema_version=1, strip confirmed_version), reopen.
RED: the merge rolls back, `bob` is discarded (left ["alice"], want
["alice","bob"]). The versioning fix lands next.

* fix(engine): version the recovery sidecar; read pre-confirmation merges as loose

Phase-B confirmation changed how a BranchMerge sidecar's absent confirmed_version
is interpreted (roll forward → roll back) without versioning the artifact, so the
new classifier silently discarded completed pre-upgrade merges (greptile P1 /
Cursor High). A capability flag would not fix the symmetric direction — keeping
schema_version=1, an OLD binary reading a NEW sidecar sails through its
already-shipped strict gate, ignores the unknown flag, and applies loose
semantics to a new partial → the same data loss on downgrade. Use the versioning
system instead.

- Bump SIDECAR_SCHEMA_VERSION 1 → 2; add a fixed CONFIRMATION_SCHEMA_VERSION = 2
(the generation at which confirmation shipped — pinned, so a later v3 keeps v2
confirmation-aware).
- Make the read gate version-aware (`parse_sidecar`): refuse only versions NEWER
than this binary; accept and interpret older ones with their original
semantics — no operator toil draining pre-upgrade sidecars. Rename
`SidecarSchemaError.supported_version` → `max_supported_version` and reword.
- Dispatch classification by version: the strict BranchMerge confirmation path is
gated on `schema_version >= CONFIRMATION_SCHEMA_VERSION`; a v1 BranchMerge
sidecar falls through to the existing loose roll-forward. Thread
`sidecar.schema_version` from `process_sidecar`.

This is bidirectionally safe: a new binary interprets v1 (loose) and v2 (strict)
and refuses the future; an old binary's `!= 1` gate already refuses v2, so it
never misreads a new sidecar. The flag was an additive-field pattern misapplied
to a semantics change; versioning is the correct mechanism.
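
A minimal sketch of the two gates, to make the bidirectional argument concrete. The constant names and `max_supported_version` come from this change; the `found` field and the functions `check_version`, `uses_confirmation` and `old_binary_accepts` are hypothetical simplifications of `parse_sidecar` and the old `!= 1` gate.

```rust
pub const SIDECAR_SCHEMA_VERSION: u32 = 2; // newest version this binary writes
pub const CONFIRMATION_SCHEMA_VERSION: u32 = 2; // pinned: generation where confirmation shipped

#[derive(Debug, PartialEq)]
pub struct SidecarSchemaError {
    pub found: u32, // hypothetical field: the version read from disk
    pub max_supported_version: u32,
}

/// New-binary read gate: refuse only versions newer than this binary.
pub fn check_version(found: u32) -> Result<u32, SidecarSchemaError> {
    if found > SIDECAR_SCHEMA_VERSION {
        Err(SidecarSchemaError { found, max_supported_version: SIDECAR_SCHEMA_VERSION })
    } else {
        Ok(found)
    }
}

/// Whether a sidecar of this version carries Phase-B confirmation semantics.
pub fn uses_confirmation(schema_version: u32) -> bool {
    schema_version >= CONFIRMATION_SCHEMA_VERSION
}

/// The already-shipped old-binary gate, as described above: anything but v1 is refused.
pub fn old_binary_accepts(found: u32) -> bool {
    found == 1
}
```

A v1 sidecar passes the new gate and is read loosely; a v2 sidecar is read strictly by the new binary and refused outright by the old one, so neither side applies the wrong semantics.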

Honest residual (any approach): an old *partial* sidecar still rolls forward —
v1 carries no confirmation, so partialness is undetectable in it. The fix stops
us from interpreting old sidecars under new rules; it can't retrofit information
they never had.

* fix(engine): harden recovery — mode resolver, loud divergence check, publish classified version

Three correct-by-design fixes from the holistic review of the recovery path, all
in recovery.rs (each closes a class, not an instance):

A. Resolve the classification mode from `(kind, schema_version)` once, instead of
a kind×version match accreting fall-through guards in `classify_table`. New
`ClassificationMode { Strict, Loose, Confirmed }` + an exhaustive
`SidecarKind::classification_mode` — adding a writer kind or version floor is
now one arm in the resolver (the compiler forces it), not a guard threaded
through the classifier. No behavior change; existing classify/decide tests are
the guard.

B. `confirm_sidecar_phase_b` now errors loudly when a pinned table has no achieved
version in the publish `updates`, instead of silently skipping it (which left
the pin unconfirmed → `IncompletePhaseB` → a silent rollback of a COMPLETE
merge). Guards the implicit `pins ⊆ updates` invariant against a future
divergence between the two filters (invariants 9/13). + a unit test.

C. Recovery roll-forward publishes the version classification OBSERVED
(`state.lance_head`), not a fresh HEAD re-read at publish time. For a Confirmed
pin classify already validated `lance_head == confirmed_version`, so this
publishes the recorded WAL intent by construction and closes the
classify→publish re-derivation/TOCTOU for every writer (invariant 15).
`push_table_update_at_head` → `push_table_update(target_version: Option<u64>)`:
roll-forward pins the classified version; roll-back keeps `None` (publishes the
restore commit it just made). In-scope behavior is preserved, so the existing
roll-forward integration tests are the guard; the drift-hardening is
correct-by-construction (deterministic mid-sweep drift injection isn't feasible
— a sync failpoint can't do an async Lance write).
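
As an illustration of item A, a resolver over `(kind, schema_version)` might look like the sketch below. `ClassificationMode` and `SidecarKind::classification_mode` are named in this change, but the variant list is an assumption: `OtherStrictWriter` is a placeholder for whichever kinds are strict, and `CONFIRMATION_FLOOR` stands in for `CONFIRMATION_SCHEMA_VERSION`.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ClassificationMode {
    Strict,
    Loose,
    Confirmed,
}

#[derive(Debug, Clone, Copy)]
pub enum SidecarKind {
    BranchMerge,
    SchemaApply,
    EnsureIndices,
    Optimize,
    OtherStrictWriter, // placeholder, not a real kind name
}

impl SidecarKind {
    /// Stand-in for CONFIRMATION_SCHEMA_VERSION (assumed value 2).
    const CONFIRMATION_FLOOR: u32 = 2;

    /// Resolves the mode once. There is no wildcard arm, so adding a
    /// variant fails to compile until it is given an explicit mode.
    pub fn classification_mode(self, schema_version: u32) -> ClassificationMode {
        match self {
            SidecarKind::BranchMerge if schema_version >= Self::CONFIRMATION_FLOOR => {
                ClassificationMode::Confirmed
            }
            // Pre-confirmation BranchMerge sidecars keep their original loose reading.
            SidecarKind::BranchMerge => ClassificationMode::Loose,
            SidecarKind::SchemaApply | SidecarKind::EnsureIndices | SidecarKind::Optimize => {
                ClassificationMode::Loose
            }
            SidecarKind::OtherStrictWriter => ClassificationMode::Strict,
        }
    }
}
```

The classifier can then branch on the resolved mode alone, without re-examining kind and version at each step.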

2026-06-19 00:15:06 +02:00


Merging Branches

Merging integrates the changes on one branch into another. OmniGraph merges are three-way and row-level: a merge compares both branches against their common ancestor and merges each node/edge table row by row, then publishes the result as one atomic commit across the whole graph.

omnigraph branch merge review/2026-04-25 --into main s3://bucket/graph.omni

`branch merge <source> [--into <target>]` merges `<source>` into `<target>` (default `main`).

Outcomes

A merge resolves to one of three outcomes:

- Already up to date: the target already contains every change on the source; nothing to do.
- Fast-forward: the target has no changes the source lacks, so the target simply advances to the source.
- Merged: both sides diverged; a new merge commit is created with two parents.
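
One way to picture how the three outcomes follow from the commit DAG is the sketch below. It is illustrative only: the `parents` map, the string commit ids, and the functions `ancestors` and `outcome` are hypothetical, not OmniGraph's API.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq)]
pub enum Outcome {
    AlreadyUpToDate,
    FastForward,
    Merged,
}

/// Collects `head` and all of its ancestors from a commit -> parents map.
fn ancestors(parents: &HashMap<&str, Vec<&str>>, head: &str) -> HashSet<String> {
    let mut seen = HashSet::new();
    let mut stack = vec![head.to_string()];
    while let Some(commit) = stack.pop() {
        if seen.insert(commit.clone()) {
            if let Some(ps) = parents.get(commit.as_str()) {
                stack.extend(ps.iter().map(|p| p.to_string()));
            }
        }
    }
    seen
}

pub fn outcome(parents: &HashMap<&str, Vec<&str>>, source: &str, target: &str) -> Outcome {
    if ancestors(parents, target).contains(source) {
        // The target's history already includes the source head.
        Outcome::AlreadyUpToDate
    } else if ancestors(parents, source).contains(target) {
        // The source's history includes the target head: just advance the target.
        Outcome::FastForward
    } else {
        // Neither contains the other: both sides diverged.
        Outcome::Merged
    }
}
```

For a history where `b` and `d` both branch from `a` and `c` follows `b`, merging `c` into `b` is a fast-forward, while merging `c` into `d` produces a merge commit.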

Indexes after a merge

A fast-forward merge (the common case — the target had no conflicting changes, so the source's rows are adopted) does not build or rebuild indexes on the rows it brings into the target. Newly merged rows (and any index a table does not yet have) are covered the next time optimize runs — indexes are derived state, and reads stay correct in the meantime via brute-force scan over the not-yet-covered rows. This keeps a fast-forward merge fast (it never pays an inline vector/FTS rebuild on the publish path), at the cost of brute-force search latency on freshly merged rows until the next optimize.

A three-way merge (the Merged outcome — both branches changed the table and the rows were reconciled) still rebuilds the table's indexes inline today, as part of the publish. So a Merged-outcome merge of an embedding-bearing table pays the index-build cost up front.

Either way, run omnigraph optimize after a large merge to restore (or, for the fast-forward path, establish) full index coverage.

Conflicts

When both branches changed the same data incompatibly, the merge fails with a structured list of conflicts (the HTTP server returns 409 with a merge_conflicts[] array). No partial result is published — the merge is all-or-nothing. The conflict kinds are:

| Kind | Meaning |
| --- | --- |
| `DivergentInsert` | The same id was inserted on both branches. |
| `DivergentUpdate` | The same row was updated differently on both branches. |
| `DeleteVsUpdate` | One side deleted a row the other side updated. |
| `OrphanEdge` | An edge references a node the other side deleted. |
| `UniqueViolation` | The merged result would violate a unique constraint. |
| `CardinalityViolation` | The merged result would violate an edge cardinality constraint. |
| `ValueConstraintViolation` | The merged result would violate a value constraint (enum/range). |

Each conflict carries the table, the row id (when applicable), the kind, and a message. Resolve conflicts by reconciling the two branches — typically by making the conflicting change on one side and re-merging.
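
The three row-level kinds can be read off a three-way comparison of one row against the common ancestor. The sketch below is a simplified illustration, not the engine's classifier: `merge_row` and `RowResult` are hypothetical, rows are reduced to `Option<&str>` (`None` meaning absent), and treating identical changes on both sides as a clean merge is an assumption. The constraint kinds (`OrphanEdge`, `UniqueViolation`, and so on) need the whole merged graph and are out of scope here.

```rust
#[derive(Debug, PartialEq)]
pub enum RowResult<'a> {
    Unchanged,
    Take(Option<&'a str>), // merged value; `None` means the row is deleted
    Conflict(&'static str),
}

pub fn merge_row<'a>(
    base: Option<&'a str>,
    ours: Option<&'a str>,
    theirs: Option<&'a str>,
) -> RowResult<'a> {
    if ours == theirs {
        // Neither side changed, or both made the same change (assumed clean).
        return if ours == base { RowResult::Unchanged } else { RowResult::Take(ours) };
    }
    if ours == base {
        return RowResult::Take(theirs); // only their side changed
    }
    if theirs == base {
        return RowResult::Take(ours); // only our side changed
    }
    // Both sides changed the row, and differently.
    match (base, ours, theirs) {
        (None, Some(_), Some(_)) => RowResult::Conflict("DivergentInsert"),
        (Some(_), Some(_), Some(_)) => RowResult::Conflict("DivergentUpdate"),
        (Some(_), None, Some(_)) | (Some(_), Some(_), None) => {
            RowResult::Conflict("DeleteVsUpdate")
        }
        _ => unreachable!("remaining combinations are handled by the early returns"),
    }
}
```

Reconciling a conflict then amounts to making one side's value equal to the other's (or to the base), after which the same comparison resolves cleanly on re-merge.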

See branches & commits for the branch and commit-DAG model, and changes for diffing two branches before you merge.
