omnigraph/docs
Ragnor Comerford 3135ff5d19
MR-793 phases 1-6: TableStorage trait + staged-write surface for engine writers
Hoists Lance's stage+commit two-phase write pattern from "discipline at
each writer" to a sealed trait surface (`TableStorage`). New engine code
that needs to advance Lance HEAD MUST go through `stage_*` + `commit_staged`;
the trait's opaque `SnapshotHandle` / `StagedHandle` types keep
`lance::Dataset` and `lance::Transaction` out of trait signatures.

Phases landed (see .context/mr-793-design.md for the full plan):
* 1a: `crates/omnigraph/src/storage_layer.rs` — `TableStorage` trait,
  sealed (only in-tree types can impl), single impl on `TableStore`
  delegating to existing inherent methods; `Omnigraph::storage()`
  accessor returns `&dyn TableStorage`.
* 2: three new staged primitives — `stage_overwrite`,
  `stage_create_btree_index`, `stage_create_inverted_index` —
  implementing the simple branch of Lance's `CreateIndexBuilder::execute`
  (scalar indices only; vector indices stay inline because
  `build_index_metadata_from_segments` is `pub(crate)` in lance-4.0.0).
  Six new tests in `tests/staged_writes.rs` pin both the new primitives
  and the inline residuals (`delete_where`, `create_vector_index`).
* 3: `tests/forbidden_apis.rs` — defense-in-depth integration test
  walks engine source, fails on direct lance::* inline-commit API use
  outside `table_store.rs` / `db/manifest/`. Skips comment lines and
  honors `// forbidden-api-allow:` sentinels.
* 4: `ensure_indices` migration — scalar index builds now route through
  `stage_create_*_index` + `commit_staged` instead of
  `create_*_index(&mut Dataset)`. Vector indices stay inline (residual,
  named honestly at the call site).
* 5: `branch_merge::publish_rewritten_merge_table` migration — the
  merge_insert phase now uses `stage_merge_insert` + `commit_staged`;
  delete phase stays inline (Lance #6658 residual, named honestly).
* 6: `schema_apply` rewritten_tables migration — non-empty rewrites
  use `stage_overwrite` + `commit_staged`; empty-batch rewrites stay
  inline because `InsertBuilder::execute_uncommitted` rejects empty
  data. The narrow inline window is bounded by `__schema_apply_lock__`.

Verified-green test surface:
* `cargo test -p omnigraph-engine` — 68 lib + ~120 integration tests
  (incl. 6 new staged_writes tests + the new forbidden_apis test).
* `cargo test -p omnigraph-engine --features failpoints --test failpoints`
  — 5 tests, all green.
* `cargo test --workspace` — green.

Deferred to follow-up sessions (see design doc §17 split):
* Phase 1b — convert remaining engine call sites to `&dyn TableStorage`
  (mostly READS that don't touch the staged-write invariant).
* Phase 7 — recovery-on-open reconciler (closes Phase B → Phase C
  residual across process restarts; new subsystem).
* Phase 8 — index-coverage reconciler (full §VII.35 compliance —
  removes synchronous index work from the publish path).
* Phase 9 — demote unused `TableStore` inherent methods to `pub(crate)`
  (depends on Phase 1b).

Lance upstream blockers documented:
* lance-format/lance#6658 — two-phase delete API (open, no PRs).
* Companion: `build_index_metadata_from_segments` should be `pub` so
  vector-index builds can be staged outside the lance crate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 11:03:15 +02:00
..
releases MR-794 step 2: address PR #68 review — merge semantics, cardinality, residual 2026-05-01 13:47:55 +02:00
architecture.md MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup 2026-05-01 10:43:19 +02:00
audit.md MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup 2026-05-01 10:43:19 +02:00
branches-commits.md MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup 2026-05-01 10:43:19 +02:00
changes.md Refactor AGENTS.md from encyclopedia to map; move spec into docs/ 2026-04-28 23:31:08 +02:00
ci.md Address reviewer feedback (Cursor + cubic) on PR #60 2026-04-29 00:09:06 +02:00
cli-reference.md Address reviewer feedback (Cursor + cubic) on PR #60 2026-04-29 00:09:06 +02:00
cli.md MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup 2026-05-01 10:43:19 +02:00
constants.md MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup 2026-05-01 10:43:19 +02:00
deployment.md Document AWS build variant and bearer-token sources 2026-04-18 04:04:45 +03:00
embeddings.md Refactor AGENTS.md from encyclopedia to map; move spec into docs/ 2026-04-28 23:31:08 +02:00
errors.md MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup 2026-05-01 10:43:19 +02:00
execution.md MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup 2026-05-01 10:43:19 +02:00
indexes.md Refactor AGENTS.md from encyclopedia to map; move spec into docs/ 2026-04-28 23:31:08 +02:00
install.md Remove stale Homebrew source-build note 2026-04-11 14:12:49 +03:00
invariants.md MR-794 step 2: address PR #68 review — merge semantics, cardinality, residual 2026-05-01 13:47:55 +02:00
lance.md Add docs/lance.md — task-organized index of Lance upstream docs 2026-04-28 23:48:28 +02:00
maintenance.md Add internal-schema versioning + auto-migration for __manifest 2026-04-29 11:44:14 +00:00
merge.md Refactor AGENTS.md from encyclopedia to map; move spec into docs/ 2026-04-28 23:31:08 +02:00
policy.md Refactor AGENTS.md from encyclopedia to map; move spec into docs/ 2026-04-28 23:31:08 +02:00
query-language.md MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup 2026-05-01 10:43:19 +02:00
runs.md MR-793 phases 1-6: TableStorage trait + staged-write surface for engine writers 2026-05-02 11:03:15 +02:00
schema-language.md Address reviewer feedback (Cursor + cubic) on PR #60 2026-04-29 00:09:06 +02:00
server.md MR-771: demote Run to direct-publish via expected_table_versions CAS 2026-04-30 08:52:50 +02:00
storage.md MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup 2026-05-01 10:43:19 +02:00
testing.md MR-794 step 2: docs — runs/invariants/architecture/execution + cleanup 2026-05-01 10:43:19 +02:00