mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-21 02:28:07 +02:00
2 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
7168ee0ed0
|
fix(engine): stop branch-merge fast-forward OOM on embedding tables (#277)
* fix(engine): stop branch-merge fast-forward OOM on embedding tables A branch→main fast-forward merge of a forked, embedding-bearing table re-derived the whole branch row-by-row: it lumped new + changed rows into one Lance `merge_insert`, i.e. a full-outer hash join over the entire delta that exhausts the DataFusion memory pool (8k rows × 3072-dim → `Resources exhausted: 188MB HashJoinInput, 100MB pool`), so the merge hung/failed instead of completing. Fix the data path on existing, substrate-supported primitives: - Adopt-with-delta split: new rows → `stage_append` (a streaming `Operation::Append`, no hash join), only genuinely-changed rows → a bounded `stage_merge_insert`, deletes inline. New `AdoptDelta` / `compute_adopt_delta` / `publish_adopted_delta` replace the combined `compute_source_delta` path; the three-way merge path is untouched. - Stream the append via `stage_append_stream` → `execute_uncommitted_stream` (the substrate-blessed bulk-append path), removing the `Vec`+`concat` full-delta materialization. Blob-aware via `scan_stream_for_rewrite`. Exposed on the sealed `TableStorage` trait. - Lazy row-signature: stop stringifying every row's embedding eagerly; compute the signature only for the `(Some,Some)` changed-candidate arm. - Index coverage is reconciler-owned: the adopt path no longer rebuilds vector/FTS indexes inline; `optimize`/`ensure_indices` folds the new rows in (reads stay correct via brute-force tail). Post-merge index-coverage contract documented in docs/user/branching/merge.md. - Recovery pin: new `CandidateTableState::AdoptWithDelta` is classified and pinned so the append's HEAD advance is sidecar-covered (invariant 5); the `BranchMerge` sidecar's loose classification covers the two-commit shape. The regression gate is structural, not a brittle size threshold: task-local write probes assert an append-only fast-forward merge does 0 `stage_merge_insert` (the OOM hash join), appends via `stage_append`, and streams (0 whole-delta materialization). Plus functional correctness, blob round-trip, index-defer, and a Phase-B failpoint recovery test. Residual: the classify-time staging round-trip is still O(N) in memory (architecturally required for the all-or-nothing multi-table publish); bounding it fully is the fragment-adopt follow-up. * test(engine): partial branch-merge Phase B must roll back (RED regression) A branch-merge per-table publish is a multi-commit sequence — adopt: append → upsert → delete; three-way: merge_insert → delete → index — each step advancing Lance HEAD before the single manifest publish. Add four failpoint sites at those windows and four regression tests (mixed delta: a fresh id, a modified base id, a removed base id) asserting that a crash mid-sequence rolls the whole merge BACK on the next open and a re-run re-applies the full delta. RED against current code: the loose `BranchMerge` classification rolls any `lance_head > manifest_pinned` forward, so the partial is published and the merge recorded — the rolled-back-to-base assertion fails with the partial state visible (e.g. bob appended, dave not deleted). The fix lands next. The failpoint sites are no-ops unless the `failpoints` feature activates them. * fix(engine): roll back partial branch-merge Phase B (recovery WAL confirmation) A branch-merge publishes each table with several Lance commits (adopt: append → upsert → delete; three-way: merge_insert → delete → index), then one manifest publish makes them atomic. Recovery classified `BranchMerge` loosely: any `lance_head > manifest_pinned` with a matching CAS pin rolled *forward* to the observed HEAD. So a crash mid-sequence published a partial delta (e.g. the append without its sibling upsert/delete) and recorded the merge as complete — silent data loss; a re-merge sees "already up to date" and never repairs it. Fix: make the recovery sidecar a two-phase WAL for `BranchMerge`. After the whole per-table publish loop completes, stamp each pin's `confirmed_version` with its exact achieved Lance version (a second sidecar write), then publish the manifest. Recovery now: - rolls FORWARD only to a pin's `confirmed_version` (set ⇒ Phase B finished); - rolls BACK (`TableClassification::IncompletePhaseB`) when the HEAD moved but no confirmation was recorded ⇒ a partial publish ⇒ all-or-nothing restore to the manifest pin, so a re-run re-applies the full delta. Scope: `BranchMerge` only. Other loose writers (`SchemaApply`, `EnsureIndices`, `Optimize`) keep the loose roll-forward — their drift is derived state (index coverage, compaction) a partial roll-forward never corrupts, so confirmation would be cost without benefit. This is the write-ahead intent record + idempotent roll-forward that the fast-forward-main commit model requires to be crash-atomic across N tables; version-recorded (not phase-count-derived), so it survives later changes to the per-table commit sequence. Regression tests (failpoints): four partial-window crashes — adopt after-append / after-upsert, three-way after-merge / after-delete — each with a mixed delta (new id, modified id, removed id) now roll the whole merge back; the existing complete-Phase-B tests still roll forward. * fix(engine): scope merge index docs to fast-forward; record append probe after write Two PR-review fixes: - docs(merge): the "a merge does not build indexes inline" note only holds for the fast-forward / adopt path (deferred to the reconciler). The three-way `Merged` path still rebuilds indexes inline in its publish, so a Merged-outcome merge of an embedding table pays the build up front. Scope the doc so a Merged-outcome user isn't surprised or led to skip a post-merge optimize. - `stage_append` recorded its instrumentation probe before the fallible `execute_uncommitted`, so a failed staging write left the call/row counters inflated — and diverged from `stage_append_stream`, which records after the transaction is built. Record after the write succeeds. * fix(engine): record stage_merge_insert / vector-index probes after write too The prior commit moved `stage_append`'s instrumentation probe to after the write, but left the two sibling write primitives with the identical ordering bug: `stage_merge_insert` recorded before `execute_uncommitted`, and `create_vector_index` before the index build. A failed write on either would inflate the probe counter. Move both to record only after the write succeeds, so all write-primitive probes share one rule (record-after-success) — closing the class rather than the single instance the review flagged. * docs(engine): mark the fragment-adopt excision boundary in the merge code Comment the transitional row-level merge code so a future fragment-adopt implementation (Lance branch-merge/rebase #7263 + UUID branch paths #7185) knows exactly what it deletes and what it keeps: - `AdoptDelta` / `compute_adopt_delta` / `publish_adopted_delta` — the row-level re-derivation; removed wholesale when a fast-forward merge becomes a fragment graft (adopt the source table version's fragments + indexes by reference). - `stage_append_stream` — its only caller is that merge append; dead with it unless re-adopted as a general bulk-append path. - `confirm_sidecar_phase_b` — the boundary marker: this SURVIVES. The recovery sidecar is the cross-table WAL a fast-forward-main commit still needs; only the within-table multi-commit reason for `IncompletePhaseB` narrows once each table is a single graft commit. Keep the sidecar; only simplify the classifier. Comments only; no behavior change. * test(engine): pre-upgrade v1 branch-merge sidecar must roll forward (RED) Phase-B confirmation made the recovery classifier strict for every BranchMerge sidecar — including ones written by a binary that predates confirmation. A pre-upgrade crash in the Phase-B→C gap can leave such a sidecar over a COMPLETED merge; the new classifier reads its absent confirmed_version as a partial and rolls it back, silently discarding the finished merge (greptile P1 / Cursor High). This regression test synthesizes that sidecar realistically: crash after Phase B (real sidecar + advanced Lance HEAD), downgrade the on-disk JSON to the pre-confirmation v1 shape (schema_version=1, strip confirmed_version), reopen. RED: the merge rolls back, `bob` is discarded (left ["alice"], want ["alice","bob"]). The versioning fix lands next. * fix(engine): version the recovery sidecar; read pre-confirmation merges as loose Phase-B confirmation changed how a BranchMerge sidecar's absent confirmed_version is interpreted (roll forward → roll back) without versioning the artifact, so the new classifier silently discarded completed pre-upgrade merges (greptile P1 / Cursor High). A capability flag would not fix the symmetric direction — keeping schema_version=1, an OLD binary reading a NEW sidecar sails through its already-shipped strict gate, ignores the unknown flag, and applies loose semantics to a new partial → the same data loss on downgrade. Use the versioning system instead. - Bump SIDECAR_SCHEMA_VERSION 1 → 2; add a fixed CONFIRMATION_SCHEMA_VERSION = 2 (the generation at which confirmation shipped — pinned, so a later v3 keeps v2 confirmation-aware). - Make the read gate version-aware (`parse_sidecar`): refuse only versions NEWER than this binary; accept and interpret older ones with their original semantics — no operator toil draining pre-upgrade sidecars. Rename `SidecarSchemaError.supported_version` → `max_supported_version` and reword. - Dispatch classification by version: the strict BranchMerge confirmation path is gated on `schema_version >= CONFIRMATION_SCHEMA_VERSION`; a v1 BranchMerge sidecar falls through to the existing loose roll-forward. Thread `sidecar.schema_version` from `process_sidecar`. This is bidirectionally safe: a new binary interprets v1 (loose) and v2 (strict) and refuses the future; an old binary's `!= 1` gate already refuses v2, so it never misreads a new sidecar. The flag was an additive-field pattern misapplied to a semantics change; versioning is the correct mechanism. Honest residual (any approach): an old *partial* sidecar still rolls forward — v1 carries no confirmation, so partialness is undetectable in it. The fix stops us from interpreting old sidecars under new rules; it can't retrofit information they never had. * fix(engine): harden recovery — mode resolver, loud divergence check, publish classified version Three correct-by-design fixes from the holistic review of the recovery path, all in recovery.rs (each closes a class, not an instance): A. Resolve the classification mode from `(kind, schema_version)` once, instead of a kind×version match accreting fall-through guards in `classify_table`. New `ClassificationMode { Strict, Loose, Confirmed }` + an exhaustive `SidecarKind::classification_mode` — adding a writer kind or version floor is now one arm in the resolver (the compiler forces it), not a guard threaded through the classifier. No behavior change; existing classify/decide tests are the guard. B. `confirm_sidecar_phase_b` now errors loudly when a pinned table has no achieved version in the publish `updates`, instead of silently skipping it (which left the pin unconfirmed → `IncompletePhaseB` → a silent rollback of a COMPLETE merge). Guards the implicit `pins ⊆ updates` invariant against a future divergence between the two filters (invariants 9/13). + a unit test. C. Recovery roll-forward publishes the version classification OBSERVED (`state.lance_head`), not a fresh HEAD re-read at publish time. For a Confirmed pin classify already validated `lance_head == confirmed_version`, so this publishes the recorded WAL intent by construction and closes the classify→publish re-derivation/TOCTOU for every writer (invariant 15). `push_table_update_at_head` → `push_table_update(target_version: Option<u64>)`: roll-forward pins the classified version; roll-back keeps `None` (publishes the restore commit it just made). In-scope behavior is preserved, so the existing roll-forward integration tests are the guard; the drift-hardening is correct-by-construction (deterministic mid-sweep drift injection isn't feasible — a sync failpoint can't do an async Lance write). |
||
|
|
612741b387
|
docs(user): split language/branching pages + add front-door pages (Phase 2) (#225)
Content build-out on top of the Phase 1 topic move. No behavior changes. Splits (existing content relocated, cross-linked): - queries/index.md → mutations/index.md (insert/update/delete + the inserts-vs-deletes rule) and search/index.md (the multi-modal search functions + a hybrid-ranking overview tying nearest/bm25/rrf together). queries/index.md now covers the read shape and points at both. - branching/index.md → branching/time-travel.md (snapshots/time travel) and branching/merge.md (three-way merge + the 7 conflict kinds, verified against error.rs MergeConflictKind). New pages (written from the code, user-facing): - quickstart.md — init → load → query → branch, with verified CLI flags. - concepts/index.md — what OmniGraph is + the L1/L2 (Lance/OmniGraph) framing. Expanded operations/audit.md from a 7-line struct dump into a real actor-tracking page (server token-resolved vs CLI --as chain; reading the trail; the omnigraph:recovery reserved actor). Index wiring: docs/user/index.md and AGENTS.md's topic table link every new page; also normalized AGENTS.md's docs/user link display text to match the Phase 1 retargeted paths. Verified: zero broken .md links; check-agents-md.sh green (57 links, 54 docs). Deferred to Phase 3: de-dev polish (grammar paths, IR internals still in queries/branching), guides/, and a possible reference/config.md split (the config schema is already coherent in cli/reference.md). Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> |