omnigraph/crates
Ragnor Comerford 3135ff5d19
MR-793 phases 1-6: TableStorage trait + staged-write surface for engine writers
Hoists Lance's stage+commit two-phase write pattern from "discipline at
each writer" to a sealed trait surface (`TableStorage`). New engine code
that needs to advance Lance HEAD MUST go through `stage_*` + `commit_staged`;
the trait's opaque `SnapshotHandle` / `StagedHandle` types keep
`lance::Dataset` and `lance::Transaction` out of trait signatures.
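
As a rough illustration of the shape described above, the sketch below
pairs a sealed trait with opaque handles and a stage-then-commit flow.
Only the names `TableStorage`, `SnapshotHandle`, `StagedHandle`,
`stage_overwrite` and `commit_staged` come from this commit; the
signatures, the fields and the `InMemoryStore` backend are assumptions
made for the example, and the real trait may well be async and carry
different arguments.

```rust
// Illustrative sketch only. Signatures, fields and InMemoryStore are
// assumptions; they are not the engine's actual definitions.
mod sealed {
    // Public trait inside a private module: downstream crates cannot
    // name it, so they cannot implement TableStorage.
    pub trait Sealed {}
}

/// Opaque table version. In the engine this would wrap a lance::Dataset.
pub struct SnapshotHandle {
    version: u64,
}

impl SnapshotHandle {
    pub fn version(&self) -> u64 {
        self.version
    }
}

/// Opaque staged write. In the engine this would wrap a lance::Transaction.
pub struct StagedHandle {
    base_version: u64,
    rows: Vec<String>,
}

pub trait TableStorage: sealed::Sealed {
    /// Phase 1: prepare a write without advancing HEAD.
    fn stage_overwrite(&self, snapshot: &SnapshotHandle, rows: Vec<String>) -> StagedHandle;
    /// Phase 2: publish the staged write, advancing HEAD by one version.
    fn commit_staged(&mut self, staged: StagedHandle) -> Result<SnapshotHandle, String>;
}

/// Toy in-memory backend standing in for TableStore (hypothetical).
#[derive(Default)]
pub struct InMemoryStore {
    head: u64,
    rows: Vec<String>,
}

impl InMemoryStore {
    pub fn snapshot(&self) -> SnapshotHandle {
        SnapshotHandle { version: self.head }
    }
    pub fn head(&self) -> u64 {
        self.head
    }
    pub fn rows(&self) -> &[String] {
        &self.rows
    }
}

impl sealed::Sealed for InMemoryStore {}

impl TableStorage for InMemoryStore {
    fn stage_overwrite(&self, snapshot: &SnapshotHandle, rows: Vec<String>) -> StagedHandle {
        // Nothing is published here; the write only records its base version.
        StagedHandle { base_version: snapshot.version, rows }
    }

    fn commit_staged(&mut self, staged: StagedHandle) -> Result<SnapshotHandle, String> {
        // Reject writes staged against a version that is no longer HEAD.
        if staged.base_version != self.head {
            return Err(format!(
                "stale write: staged at {}, HEAD is {}",
                staged.base_version, self.head
            ));
        }
        self.rows = staged.rows;
        self.head += 1;
        Ok(self.snapshot())
    }
}
```

In this toy version HEAD moves only inside `commit_staged`, so a failure
between the two phases leaves the published state untouched. That is
the property the staged-write surface is meant to make structural
rather than a per-writer convention.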

Phases landed (see .context/mr-793-design.md for the full plan):
* 1a: `crates/omnigraph/src/storage_layer.rs` — `TableStorage` trait,
  sealed (only in-tree types can impl), single impl on `TableStore`
  delegating to existing inherent methods; `Omnigraph::storage()`
  accessor returns `&dyn TableStorage`.
* 2: three new staged primitives — `stage_overwrite`,
  `stage_create_btree_index`, `stage_create_inverted_index` —
  implementing the simple branch of Lance's `CreateIndexBuilder::execute`
  (scalar indices only; vector indices stay inline because
  `build_index_metadata_from_segments` is `pub(crate)` in lance-4.0.0).
  Six new tests in `tests/staged_writes.rs` pin both the new primitives
  and the inline residuals (`delete_where`, `create_vector_index`).
* 3: `tests/forbidden_apis.rs` — defense-in-depth integration test
  walks engine source, fails on direct lance::* inline-commit API use
  outside `table_store.rs` / `db/manifest/`. Skips comment lines and
  honors `// forbidden-api-allow:` sentinels.
* 4: `ensure_indices` migration — scalar index builds now route through
  `stage_create_*_index` + `commit_staged` instead of
  `create_*_index(&mut Dataset)`. Vector indices stay inline (a known
  residual, flagged as such at the call site).
* 5: `branch_merge::publish_rewritten_merge_table` migration — the
  merge_insert phase now uses `stage_merge_insert` + `commit_staged`;
  delete phase stays inline (Lance #6658 residual, named honestly).
* 6: `schema_apply` rewritten_tables migration — non-empty rewrites
  use `stage_overwrite` + `commit_staged`; empty-batch rewrites stay
  inline because `InsertBuilder::execute_uncommitted` rejects empty
  data. The narrow inline window is bounded by `__schema_apply_lock__`.
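
The source scan from phase 3 can be pictured along these lines. This is
a minimal sketch, not the contents of `tests/forbidden_apis.rs`: the
function name, the caller-supplied pattern list, and the rule that a
`// forbidden-api-allow:` sentinel covers its own line or the next code
line are all assumptions. Directory walking and the path allowlist for
`table_store.rs` / `db/manifest/` are left out.

```rust
/// Returns the 1-based line numbers in `source` that mention one of the
/// `forbidden` substrings. Hypothetical helper for illustration.
pub fn find_violations(source: &str, forbidden: &[&str]) -> Vec<usize> {
    const SENTINEL: &str = "// forbidden-api-allow:";
    let mut hits = Vec::new();
    // True when the previous line was a sentinel comment.
    let mut prev_allows = false;

    for (idx, line) in source.lines().enumerate() {
        let trimmed = line.trim_start();
        if trimmed.starts_with("//") {
            // Comment lines are never violations, but may carry the sentinel
            // that exempts the next code line.
            prev_allows = trimmed.starts_with(SENTINEL);
            continue;
        }
        // A sentinel on the line above or trailing on this line exempts it.
        let allowed = prev_allows || line.contains(SENTINEL);
        prev_allows = false;
        if !allowed && forbidden.iter().any(|p| line.contains(p)) {
            hits.push(idx + 1);
        }
    }
    hits
}
```

A test built on such a helper would read each engine source file, call
it with the list of inline-commit call patterns, and fail if any file
returns a non-empty result.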

Verified-green test surface:
* `cargo test -p omnigraph-engine` — 68 lib + ~120 integration tests
  (incl. 6 new staged_writes tests + the new forbidden_apis test).
* `cargo test -p omnigraph-engine --features failpoints --test failpoints`
  — 5 tests, all green.
* `cargo test --workspace` — green.

Deferred to follow-up sessions (see design doc §17 split):
* Phase 1b — convert remaining engine call sites to `&dyn TableStorage`
  (mostly READS that don't touch the staged-write invariant).
* Phase 7 — recovery-on-open reconciler (closes Phase B → Phase C
  residual across process restarts; new subsystem).
* Phase 8 — index-coverage reconciler (full §VII.35 compliance —
  removes synchronous index work from the publish path).
* Phase 9 — demote unused `TableStore` inherent methods to `pub(crate)`
  (depends on Phase 1b).

Lance upstream blockers documented:
* lance-format/lance#6658 — two-phase delete API (open, no PRs).
* Companion: `build_index_metadata_from_segments` should be `pub` so
  vector-index builds can be staged outside the lance crate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 11:03:15 +02:00
omnigraph MR-793 phases 1-6: TableStorage trait + staged-write surface for engine writers 2026-05-02 11:03:15 +02:00
omnigraph-cli chore: scrub Linear ticket numbers and review-bot mentions from code comments 2026-05-01 22:45:38 +02:00
omnigraph-compiler Parallel per-type load writes + omnigraph optimize/cleanup CLI (#46) 2026-04-25 14:22:14 +03:00
omnigraph-server chore: scrub Linear ticket numbers and review-bot mentions from code comments 2026-05-01 22:45:38 +02:00