feat(engine): Stage the delete path; retire the inline-delete residual (#308)

* test(engine): pin zero-row cascade delete must not drift an edge table (red)

A delete <Node> cascades a delete_where into every incident edge type. The
inline delete_where (Dataset::delete) advances Lance HEAD even when zero edges
match, but the cascade records the new version only if deleted_rows > 0 — so a
node with no incident edges leaves edge:Knows HEAD>manifest drift, which trips
the next strict write's ExpectedVersionMismatch and repair refuses it.

Red today: edge:Knows manifest=v5, Lance HEAD=v6. Goes green when delete moves
to the staged two-phase path (iss-950, Lance 7.0 DeleteBuilder::execute_uncommitted),
where a 0-row delete commits no Lance version and the deleted_rows>0 gate becomes
correct by construction.

* fix(engine): a zero-row delete must not advance Lance HEAD

Lance's Dataset::delete commits a new version even when the predicate matches
nothing (build_transaction always emits Operation::Delete), so a node delete
that cascades a delete_where into an incident edge type with no matching edges
advanced that edge table's Lance HEAD while the cascade skipped record_inline
(gated on deleted_rows > 0) — leaving HEAD>manifest drift that wedged the next
strict write and that repair refused as suspicious/unverifiable.

Use Lance 7.0's two-phase DeleteBuilder::execute_uncommitted to read
num_deleted_rows before committing: a no-match delete now advances nothing (no
version, no drift) and the existing deleted_rows>0 gate is correct by
construction. Non-zero deletes commit the staged transaction with
skip_auto_cleanup + affected_rows (parity with the prior inline path).

First step of the staged-delete migration (iss-950); turns the
node_delete_with_no_incident_edges_leaves_no_edge_table_drift regression green.

* feat(engine): stage_delete two-phase primitive (MR-A step 0)

Add TableStore::stage_delete (Lance 7.0 DeleteBuilder::execute_uncommitted),
the two-phase analogue of stage_merge_insert: writes deletion files without
advancing Lance HEAD, returns Option<StagedWrite> (None on 0 rows = true no-op),
carrying the deletion-vector updated_fragments as new_fragments and the
superseded originals as removed_fragment_ids so combine_committed_with_staged
makes the deletion visible to in-query reads.

No affected_rows is threaded: like stage_merge_insert's Operation::Update commit,
the staged delete relies on OmniGraph's per-table write queue + manifest CAS, not
Lance's per-dataset conflict resolver (commit_staged is a single attempt).

Flip the two residual guards to the staged path: staged_writes.rs now asserts
stage_delete does NOT advance HEAD and that a staged delete is read-your-writes
visible (the deletion-vector RYW proof D2 retirement depends on); the
lance_surface_guards delete guard pins execute_uncommitted's UncommittedDelete.

No behavior change yet (callers still use delete_where); Step 1 wires them.

* feat(engine): TableStorage::stage_delete + migrate merge delete path (MR-A step 1a)

Add stage_delete/Option<StagedHandle> to the TableStorage trait (delegates to
TableStore::stage_delete). Migrate the two branch_merge delete sites
(three-way RewriteMerged + adopt delta) from the inline delete_where residual to
stage_delete + commit_staged — identical in shape to the stage_merge_insert +
commit_staged pair above each. HEAD still advances within the merge sequence
(via commit_staged), under the unchanged SidecarKind::BranchMerge Phase-B
confirmation; the _pre_delete/_pre_index failpoints fire by position, unchanged.

merge_truth_table, branching, composite_flow green.

* feat(engine): migrate all delete sites to staged path, retire inline delete (MR-A step 1b/1c)

Routes every delete through the staged write path so delete never advances
Lance HEAD inline — the last inline-commit residual on the mutation path is
gone. `MutationStaging` now accumulates delete predicates (`record_delete`)
alongside pending write batches; at end-of-query `stage_all` combines a
table's predicates into one `(p1) OR (p2) …` `stage_delete` (a deletion-vector
transaction, no HEAD advance) and `commit_all` commits it through the same
`commit_staged` path as inserts/updates. Deletes are now ordinary staged
entries: one sidecar pin at `expected + 1`, no inline special-casing.

Migrated callers (all 5): the 3 mutation.rs sites (delete-node, cascade,
delete-edge) and the 2 merge.rs sites (already on stage_delete in step 1a).
`affected_edges`/`affected` move from post-inline-commit `deleted_rows` to a
committed `count_rows` at record time — exact under D₂, bounded by the cascade
working set. A predicate matching zero rows stages nothing (the staged
equivalent of the old "skip record_inline on 0 deleted rows"), so the zero-row
edge-table drift class stays closed by construction.

Retired scaffolding now that no caller remains:
- `MutationStaging.inline_committed` + `record_inline` → `delete_predicates` +
  `record_delete`; `StagedMutation.inline_committed`/`paths` fields and all the
  `commit_all` inline handling (queue keys, sidecar pins with the
  `record_inline` table_version special-case, the inline recheck loop).
- `open_table_for_mutation`'s post-inline-commit reopen branch (deletes no
  longer advance HEAD mid-query, so a second touch reopens at the pinned
  version like any write).
- `InlineCommitResidual::delete_where` + its `TableStore` impl, the orphaned
  `TableStore::delete_where`, and `DeleteState`. `InlineCommitResidual` now
  carries only `create_vector_index` (Lance #6666 still open).

D₂ stays for now: staged-delete read-your-writes doesn't yet compose into the
pending accumulator (insert-then-delete on one table), so mixed
insert/update/delete in one query is still rejected at parse time. Retiring D₂
is step 2. Doc comments updated to match across exec/, storage_layer, db/.

Tests (all green): writes, consistency, validators, end_to_end, composite_flow,
merge_truth_table, maintenance, recovery, staged_writes, forbidden_apis,
lance_surface_guards, changes, point_in_time (286), plus failpoints (63).

* docs: delete is a staged write, not an inline-commit residual (MR-A step 1)

Update the docs that described `delete` as the inline-commit residual now that
MR-A routes it through `stage_delete`. Always-loaded surfaces (AGENTS.md rule
4 / capability matrix, invariants.md Invariant 4 / truth matrix / known gaps)
plus the dev write-path docs (writes.md, execution.md incl. its mutation
sequence diagram, architecture.md) now state: deletes accumulate as predicates
and stage like inserts/updates, no inline HEAD advance; `InlineCommitResidual`
carries only `create_vector_index` (Lance #6666). The parse-time D₂ rule is
documented as retained — not because delete inline-commits, but because
staged-delete read-your-writes is not yet wired into the pending accumulator
(MR-A step 2). lance.md's 7.0 audit note marked MR-A as landed.

* docs: D₂ is a deliberate boundary, not temporary scaffolding (MR-A close-out)

After MR-A staged the delete path, D₂ (a mutation query is insert/update-only
OR delete-only) was left framed as temporary — "until Lance ships two-phase
delete" / "retire in step 2". Lance shipped that and we used it for the
inline-commit fix; D₂'s original justification is gone. It now stands for a
different, permanent reason: keeping a query to one kind keeps its
read-your-writes unambiguous and each table to one version per query. Retiring
it would buy single-commit mixed atomicity (cheap workaround: split, or a
branch) at the cost of an in-query delete view, pending pruning, edge
id-resolution, and two-commit-per-table ordering in the hot mutation path —
complexity not worth earning. Decision: keep D₂ as a deliberate boundary.

Reframes the now-stale wording everywhere, no logic change:
- The D₂ parse-time error message no longer promises "this restriction lifts
  when Lance exposes a two-phase delete API"; it states the boundary and points
  to a branch+merge for one atomic commit.
- `enforce_no_mixed_destructive_constructive` doc, AGENTS.md, invariants.md
  (Invariant 4 / truth matrix / removed from the known-gaps), writes.md,
  architecture.md, lance.md, and the user mutations doc (which wrongly said
  deletes "commit through a different path" — both stage now).
- Swept remaining stale `delete_where` mentions left from the Step-1 migration:
  the merge.rs "swap when upstream ships" comments (already swapped), the
  forbidden_apis / table_ops residual notes, the staged_writes vector-index
  guard doc (was "same as stage_delete's absence" — stage_delete now exists),
  and test comments/assert messages in recovery/maintenance/writes/failpoints.
  Genuinely-historical records (dated Lance audit, rfc-013, bug-case-fix) left.

Verified: engine builds warning-free; check-agents-md OK; writes/maintenance/
recovery/staged_writes/forbidden_apis all green. Closes MR-A.

* test(engine): overlapping delete predicates must not double-count affected_* (red)

Reproduces a reporting regression from the staged-delete migration flagged in
PR #308 review. Because deletes now stage (instead of inline-committing), two
delete statements in one query both scan the same unchanged committed snapshot;
counting each predicate independently over-reports `affected_*` when they
overlap. The old inline path committed each delete before the next ran, so it
counted distinct.

`delete Person where name = "Alice"` then `delete Person where age > 29` over
the standard fixture (Alice 30, Charlie 35) removes 2 distinct nodes and 3
distinct edges, but the buggy per-statement counting returns 3 nodes / 6 edges.
RED at this commit (asserts left=3, right=2).

* fix(engine): dedup overlapping delete predicates when counting affected_*

Count each delete statement against the committed snapshot MINUS the predicates
a prior delete statement on the same table already recorded:
`(pred) AND NOT ((prior1) OR (prior2) …)`. Summed over statements this is
inclusion-exclusion — `Σ |pₙ \ (p₁ ∪ …)| = |p₁ ∪ p₂ ∪ …|` — exactly the distinct
count the combined `(p1) OR (p2)` staged delete removes. Works for nodes and
edges alike with no edge identity needed; the node ID scan uses the same
exclusion so a later statement also doesn't re-cascade already-deleted nodes.
The ORIGINAL predicate is still what gets recorded (the staged delete removes
the union); only the count uses the exclusion. The common single-delete path is
unchanged (`prior` empty → filter is just the base predicate).

New helper `dedup_delete_filter` + `MutationStaging::recorded_delete_predicates`.
Turns the red regression test green (2 nodes / 3 edges); writes (33),
end_to_end, validators, maintenance, recovery, composite_flow, merge_truth_table,
consistency, changes, and failpoints (63) all stay green.

* test(engine): delete dedup must not drop NULL-column rows (red)

Follow-up to the overlapping-delete fix flagged in PR #308 review (Greptile P1):
the `(base) AND NOT (prior)` exclusion breaks under SQL three-valued logic. If a
prior delete predicate references a NULLable column, a later statement's
matching row whose column is NULL makes `prior` evaluate to UNKNOWN, `NOT
UNKNOWN` is UNKNOWN, and the row is filtered out of the scan — even though the
prior delete never matched it. That drops it from `deleted_ids`, skipping its
cascade (orphaned edges) or, if it is the only match, leaving the node
undeleted. A data bug, not just a miscount.

Data: Charlie(age 35), Zoe(age NULL); Knows Zoe→Charlie. `delete Person where
age > 30` then `delete Person where name = "Zoe"`. Under the buggy `NOT`, Zoe's
scan `(name='Zoe') AND NOT (age>30)` is UNKNOWN → Zoe survives. RED at this
commit (Person count left=1, right=0).

* fix(engine): NULL-safe delete dedup — exclude only definitely-matched prior rows

Change `dedup_delete_filter` from `(base) AND NOT (prior)` to
`(base) AND ((prior) IS NOT TRUE)`. `IS NOT TRUE` keeps both FALSE and UNKNOWN
rows, so a prior predicate that evaluates to SQL UNKNOWN (a NULL in a referenced
column) no longer drops a row this statement legitimately matches — only rows a
prior predicate matched as definitely TRUE are excluded from the count/scan. The
distinct-count semantics are unchanged for non-NULL data.

Turns the red NULL-dedup test green (Zoe deleted, her edge cascaded), and the
overlapping-dedup + writes/end_to_end/validators/maintenance/recovery/
composite_flow/consistency suites stay green.

* docs(engine): note dedup_delete_filter's load-bearing dependency on D₂

Self-review follow-up: the overlapping-delete dedup assumes the committed
snapshot is invariant across a query's statements, which holds only because D₂
forbids mixing writes with deletes (so a delete-touched table has no pending
writes). Make that dependency explicit at the function so a future D₂ relaxation
is forced to revisit the dedup. Comment-only.

* Preserve staged write commit metadata
This commit is contained in:
Ragnor Comerford 2026-06-27 16:48:41 +02:00 committed by GitHub
parent a7d4cba53d
commit 0dcdcf5a9d
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
25 changed files with 996 additions and 535 deletions

View file

@ -72,10 +72,14 @@ converge the physical state.
table versions.
4. **Mutations publish at one boundary.** A `mutate_as` or `load` operation
accumulates constructive writes, commits each touched table at the end, then
publishes one manifest update. Do not commit per statement. Delete-only
queries are the documented inline residual; the parse-time D2 rule prevents
mixing deletes with insert/update until Lance exposes two-phase delete.
accumulates writes — inserts/updates as pending batches, deletes as
predicates — stages each touched table at the end (deletes via
`stage_delete`, no inline HEAD advance), then publishes one manifest update.
Do not commit per statement. The parse-time D2 rule is a deliberate
boundary: one mutation query is constructive (insert/update) or destructive
(delete), not both — so read-your-writes within a query stays unambiguous
and each table commits at most one version. Compose mixed operations via
separate mutations, or a branch for single-commit atomicity.
Read [writes.md](writes.md) and [execution.md](execution.md).
5. **Recovery is part of the commit protocol.** Writers that can advance Lance
@ -150,11 +154,11 @@ converge the physical state.
|---|---|---|
| Multi-table commit | Manifest CAS plus recovery sidecars; not a single Lance primitive | [writes.md](writes.md), [architecture.md](architecture.md) |
| Constructive mutations | In-memory `MutationStaging`, one end-of-query table commit per touched table, then one manifest publish | [writes.md](writes.md), [execution.md](execution.md) |
| Deletes | Inline-commit residual; delete-only queries allowed, mixed insert/update/delete rejected by D2 | [query-language.md](../user/queries/index.md), [writes.md](writes.md) |
| Deletes | Staged like inserts/updates (`stage_delete` via Lance 7.0 `DeleteBuilder::execute_uncommitted`, MR-A) — no inline HEAD advance; mixed insert/update/delete in one query rejected by D2 as a deliberate boundary (constructive XOR destructive per query; compose via separate mutations or a branch) | [query-language.md](../user/queries/index.md), [writes.md](writes.md) |
| Branch delete | Manifest is the single authority, flipped atomically first; per-table forks + commit-graph branch are derived state, reclaimed best-effort (`force_delete_branch`) with the `cleanup` reconciler as the guaranteed backstop. Reusing a name whose reclaim failed before `cleanup` surfaces an actionable error | [branches-commits.md](../user/branching/index.md), [maintenance.md](../user/operations/maintenance.md) |
| Schema validation | Type checks, required fields, defaults, edge endpoint checks, and edge cardinality are enforced on write paths | [schema-language.md](../user/schema/index.md), [execution.md](execution.md) |
| Unique constraints | Intra-batch and write-path checks exist; intake and branch-merge derive the composite key through one shared function (`loader::composite_unique_key`, a separator-free `Vec<String>` tuple) and fail loudly on an un-keyable column type rather than silently exempting it; full cross-version uniqueness against already-committed rows is still a gap | [schema-language.md](../user/schema/index.md) |
| Storage trait | `TableStorage` (via `db.storage()`) is staged-only; the inline-commit residuals (`delete_where`, `create_vector_index`) are split onto a separate sealed `InlineCommitResidual` trait reached via `db.storage_inline_residual()` (MR-854), so §1 holds by construction; capability/stat surfaces are roadmap | [writes.md](writes.md), [architecture.md](architecture.md) |
| Storage trait | `TableStorage` (via `db.storage()`) is staged-only; the sole inline-commit residual (`create_vector_index`) is split onto a separate sealed `InlineCommitResidual` trait reached via `db.storage_inline_residual()` (MR-854), so §1 holds by construction; capability/stat surfaces are roadmap | [writes.md](writes.md), [architecture.md](architecture.md) |
| Index lifecycle | `@index`/`@key` declares *intent*; the physical index is derived state and never fails a logical op. `schema apply` builds no indexes (records intent only; index-only changes touch no table data). `load`/`mutate` build inline through one chokepoint (`build_indices_on_dataset_for_catalog`, type-dispatched by `node_prop_index_kind`: enum + orderable scalar → BTREE, free-text String → FTS, Vector → vector) that fault-isolates an untrainable Vector column into a *pending* index instead of aborting. `optimize`/`ensure_indices` is the reconciler: it creates declared-but-missing indexes and folds appended/rewritten fragments into existing ones (`optimize_indices`), reporting still-pending columns. Explicit maintenance call, not yet a background loop | [indexes.md](../user/search/indexes.md), [maintenance.md](../user/operations/maintenance.md) |
| Traversal IDs | Runtime still builds `TypeIndex`; Lance stable row-id based graph IDs are roadmap | [architecture.md](architecture.md), [query-language.md](../user/queries/index.md) |
| Auth | Bearer token hashing and server-side actor resolution are implemented at the HTTP boundary | [server.md](../user/operations/server.md), [policy.md](../user/operations/policy.md) |
@ -181,19 +185,19 @@ them explicit.
`InlineCommitResidual` trait reached via `db.storage_inline_residual()`, so a
new writer cannot couple a write with a HEAD advance through the default
surface. The dead legacy methods (`append_batch` on the trait,
`merge_insert_batch{,es}`, `create_{btree,inverted}_index`) were removed. The
remaining residuals are `delete_where` and `create_vector_index`. The Lance
6.0.1 → 7.0.0 bump landed, so the staged two-phase delete API
(`DeleteBuilder::execute_uncommitted`, Lance #6658) is now available and MR-A
is unblocked — but the migration itself is still pending, so `delete_where`
stays inline for now. `create_vector_index` remains gated on Lance #6666
(still open). See [lance.md](lance.md) and [writes.md](writes.md). New write
paths should use the staged shape unless a documented Lance blocker applies.
- **Deletes and vector indexes:** `delete_where` and vector index creation still
advance Lance HEAD inline. The public delete two-phase API now exists (Lance
#6658 shipped in 7.0.0), so the delete residual is unblocked pending the MR-A
migration; vector index creation is still blocked (Lance #6666 open). Keep D2
and recovery coverage in place until those residuals are removed.
`merge_insert_batch{,es}`, `create_{btree,inverted}_index`) were removed. MR-A
migrated `delete` onto the staged surface (`TableStorage::stage_delete` via
Lance 7.0 `DeleteBuilder::execute_uncommitted`, #6658) and retired
`InlineCommitResidual::delete_where`, so the sole remaining residual is
`create_vector_index`, gated on Lance #6666 (still open). See [lance.md](lance.md)
and [writes.md](writes.md). New write paths should use the staged shape unless a
documented Lance blocker applies.
- **Vector indexes:** `create_vector_index` still advances Lance HEAD inline —
segment-commit needs `build_index_metadata_from_segments`, `pub(crate)` in Lance
7.0.0 (#6666 open). Keep recovery coverage in place until that residual is
removed. (`delete` is no longer a residual — staged in MR-A. D2 is not a gap:
it is a deliberate constructive-XOR-destructive boundary, documented in
Invariant 4 and the truth matrix.)
- **Blob-column compaction:** Lance `compact_files` mis-decodes blob-v2 columns
under its forced `BlobHandling::AllBinary` read ("more fields in the schema
than provided column indices"), so `optimize` skips any table with a `Blob`