Mirror of https://github.com/ModernRelay/omnigraph.git, synced 2026-06-18 02:24:27 +02:00.
* test(engine): reproduce empty-table Vector @index aborting schema apply
A Vector (IVF) index trains k-means centroids over the column, so Lance
cannot build it on 0 vectors ("Creating empty vector indices with
train=False is not yet implemented"). schema apply reconciles a table's
whole index set whenever any @index on it changes, so adding an unrelated
scalar @index materializes the dormant empty vector index and aborts the
entire migration (all-or-nothing).
This regression test inits a 0-row Doc with a Vector @index, adds a scalar
@index, and asserts the apply succeeds (then loads one embedded row and
asserts the deferred index materializes). It fails today at the apply step
with the vector-index abort; the fix lands in the next commit.
Refs dev-graph iss-empty-vector-index-schema-apply, iss-848.
* fix(engine): defer Vector @index on an empty table instead of aborting schema apply
build_indices_on_dataset_for_catalog materialized a declared Vector @index
unconditionally. On a 0-row table Lance cannot train the IVF index
("Creating empty vector indices with train=False is not yet implemented"),
so any later migration that touches the table (e.g. adding an unrelated
scalar @index, which reconciles the table's whole index set) aborted the
entire migration on the dormant vector index — all-or-nothing.
Guard the vector arm with a row-count check, matching the guard
ensure_indices_for_branch and the branch-merge rebuild already use: an
untrainable column becomes a pending index that a later ensure_indices /
optimize materializes once the table has rows. Reads stay correct meanwhile
(vector search degrades to a brute-force scan).
Stop-gap: the residual rows-present-but-vectors-null window and the full
decoupling (intent recorded at apply, an idempotent coverage reconciler)
are tracked as dev-graph iss-848. Turns the regression test added in the
previous commit green.
Refs dev-graph iss-empty-vector-index-schema-apply, iss-848, iss-687.
* docs(invariants): record the logical-contract-over-physical-state principle
The bug class behind the empty-table vector-index abort (and the schema-apply
vs optimize version drift) is one shape: a physical operation allowed to fail
a logical one. Several hard invariants (2, 5, 7, 13) and deny-list items are
already instances of this, but the unifying rule was never written down.
Add it to docs/dev/invariants.md as a "Governing principle" section above the
hard invariants, naming which invariants and deny-list items instantiate it
and the smell to watch for (a logical operation gated on a physical fact).
Add a one-line always-on rule (7) in AGENTS.md so it stays in working memory,
with the qualifier that genuine logical conflicts still fail loudly — the
licence to lag covers physical convergence, not correctness.
Audience-neutral: no private ticket refs. check-agents-md.sh passes.
* test(engine): index build must tolerate rows with null vectors (load-before-embed)
Loading rows whose vector column is null into a `Vector @index` table fails
today: build_indices (reached via the loader's prepare_updates_for_commit)
calls create_vector_index, and Lance's IVF KMeans errors "cannot train 1
centroids with 0 vectors". The same abort hits ensure_indices/optimize/schema
apply/merge, since they all funnel through build_indices_on_dataset_for_catalog.
This test loads two null-embedding rows and calls ensure_indices; it must not
abort (the untrainable vector column is deferred, sibling indexes still build).
Fails today at the load step; fixed in the next commit.
Refs dev-graph iss-848, iss-empty-vector-index-schema-apply.
* fix(engine): defer unbuildable index columns instead of aborting the write path
build_indices_on_dataset_for_catalog is the chokepoint every write path funnels
through (load/mutate via prepare_updates_for_commit, schema apply, ensure_indices,
optimize, branch merge). Its vector arm called create_vector_index
unconditionally, so a column with no trainable vectors yet — an empty table, or
rows loaded before `embed` populates them — aborted the whole operation with
Lance's IVF KMeans error.
Fault-isolate the vector build: on failure, record the column as a PendingIndex
(table, column, reason), log it, and continue building the sibling indexes; a
later ensure_indices/optimize materializes it once the column is trainable, and
reads use brute-force meanwhile. Manifest/CAS/IO errors at the publish boundary
still propagate. Isolating at the single chokepoint realizes the governing
principle (physical index state never fails a logical operation) for every write
path, and supersedes the earlier symptomatic count_rows==0 stop-gap (removed) —
closing the residual rows-present-but-vectors-null window it left open.
Surfacing pending index status rather than failing is the database norm
(Postgres indisvalid, LanceDB list_indices). ensure_indices and the build_indices
wrappers now return Vec<PendingIndex>; optimize surfaces it in a later commit.
Refs dev-graph iss-848, iss-951 (vector index stays inline-commit until lance#6666).
* test(engine): index-only schema apply must not touch table data
Adding an @index to an existing column should be a pure metadata change once
index materialization moves to the reconciler (iss-848): the apply records the
intent in the catalog/IR but builds nothing inline, so the table's manifest
version is unchanged. Today the indexed_tables block builds the index inline
and bumps the version (4 -> 5). Fixed in the next commit.
Refs dev-graph iss-848.
* fix(engine): schema apply records index intent only; index-only apply is metadata
Schema apply no longer builds indexes inline. The four build_indices calls
(added/renamed/rewritten/index-only tables) are removed; the @index/@key intent
is already persisted in the catalog/IR the apply writes, and the physical index
is materialized off the critical path by ensure_indices/optimize (iss-848).
Concretely:
- AddConstraint (an @index addition — every other added constraint plans as
UnsupportedChange) becomes a pure metadata step alongside the metadata-only
steps: it touches no table data, so the table version is unchanged.
- added/renamed/rewritten tables still write their data; only the trailing
index build is gone. The rewritten table's coverage is restored later by
optimize_indices.
- recovery_pins drops index-only tables (they no longer advance Lance HEAD) and
keeps rewritten tables; their post_commit_pin = expected+1 is now exact (one
rewrite commit), strengthening recovery classification.
- the now-orphaned Omnigraph::build_indices_on_dataset_for_catalog wrapper is
removed.
A migration can no longer abort on an index build, for any index type at any
cardinality. Turns index_only_constraint_apply_touches_no_table_data green.
Refs dev-graph iss-848.
* test(engine): optimize must converge a declared-but-unbuilt index
After iss-848, adding an @index post-data is a metadata-only apply that defers
the physical build, so the column is declared-indexed but unbuilt (reads scan).
`optimize` — the operator's cron reconciler — must materialize it. Today optimize
only maintains coverage of EXISTING indexes (optimize_indices) and never creates
missing ones, so the rank BTREE stays Degraded after optimize. Fixed in the next commit.
Refs dev-graph iss-848.
* fix(engine): optimize materializes declared-but-unbuilt indexes (the reconciler)
`omnigraph optimize` is the operator's cron reconciler. It already compacts and
folds new fragments into EXISTING indexes (optimize_indices); now it also builds
declared-but-missing indexes, so the indexes schema apply / load defer (iss-848)
converge on the next optimize.
Done inside optimize_one_table (not by composing the all-tables ensure_indices,
which is drift-blind and would re-publish the uncovered HEAD>manifest drift that
optimize deliberately skips): after the per-table drift/blob skips and under the
queue + Optimize sidecar already held, a needs_index_create gate (reusing
needs_index_work_node/edge — "declared index missing AND row_count > 0", so empty
tables stay no-ops) admits index-only work, and Phase B builds the missing index
over the just-compacted layout via the build chokepoint. An untrainable vector
column fault-isolates into the new TableOptimizeStats.pending_indexes (the
list_indices/indisvalid analog operators read), not a failure. committed now
reflects index commits, so the existing post-publish cache invalidation covers
them. LanceDB's optimize only maintains existing indexes; creating
declared-but-missing ones is the L2 behavior omnigraph's declarative @index needs.
Turns optimize_materializes_index_declared_but_unbuilt green.
Refs dev-graph iss-848.
* docs: index materialization is deferred to the reconciler (iss-848)
Update the index-lifecycle docs to reflect the new contract: @index/@key
declares intent and the physical index is derived state that never fails a
logical operation. Schema apply builds nothing (records intent only);
load/mutate build inline through one chokepoint that defers an untrainable
Vector column as pending; optimize/ensure_indices is the reconciler that
creates declared-but-missing indexes and maintains coverage, reporting
still-pending columns.
Touches: dev/invariants.md (truth-matrix Index-lifecycle row), AGENTS.md
(capability matrix), user/search/indexes.md (L2 orchestration), user/operations/
maintenance.md (optimize reconciler bullet), dev/testing.md (new tests).
* test(server): schema_apply_route_can_add_index reflects deferred index build
iss-848 made schema apply record @index intent without building the physical
index inline. The route test asserted the index count increased after apply;
on an empty graph it now stays unchanged (the build is deferred to
ensure_indices/optimize). Assert the new contract: apply succeeds and the
physical index count is unchanged.
* fix(engine): precheck vector trainability — don't pin or swallow (PR review)
Two issues Cursor Bugbot caught in the chokepoint fault-isolation:
1. (HIGH) Pending vector pins roll back siblings. needs_index_work_node counted
a missing vector index as work whenever the table had rows, so a column with
no trainable vectors got pinned in the EnsureIndices recovery sidecar — but
the build deferred it (zero commit). On a crash before manifest publish the
classifier sees NoMovement and the all-or-nothing decision (recovery.rs
decide()) rolls back the WHOLE sidecar, undoing a sibling table's committed
index work.
2. (MED) Vector build swallowed fatal errors. The match arm converted every
create_vector_index error into a deferred PendingIndex, hiding genuine
I/O/manifest/Lance failures as "pending".
Fix both with one trainability precheck (vector_column_trainable: >=1 non-null
vector, the ivf_flat(1) minimum) used identically by needs_index_work_node and
the build arm: an untrainable column is never counted as work (so never pinned —
no zero-commit pin) and never attempted (so it can't fail); only a trainable
column is built, and then any error PROPAGATES (stays fatal). The deferred
column is still recorded as a PendingIndex with a clear reason.
Refs dev-graph iss-848.
* feat(cli): surface pending index column + reason in optimize output (PR review)
Codex (P2): pending_indexes was documented as visible in `optimize --json` but
the CLI projection never emitted it — operators would lose the only signal that
optimize has deferred index work. Greptile (P2): the stat dropped the reason, so
operators saw which column was stuck, not why.
Carry the reason: TableOptimizeStats.pending_indexes is now Vec<PendingIndex>
(column + reason), and `omnigraph optimize --json` emits {column, reason} per
pending index; human output prints a "↳ index pending on '<col>': <reason>" line.
Refs dev-graph iss-848.
* test: align CLI index-add test with deferred build; cover post-rename reconcile
- schema_apply_json_adds_index_for_existing_property (cli_schema_config.rs): the
CLI analog of the server test — asserted the index count grew after apply;
under iss-848 the apply defers the build, so the count is unchanged on an
empty graph. Assert the deferred contract. (The only full-suite failure.)
- optimize_materializes_index_after_type_rename (maintenance.rs, new): covers
the gap Greptile flagged — a RenameType writes the renamed table with rows but
no indexes (inline build removed in Commit B); assert the rank index is
Degraded post-rename and Indexed after optimize reconciles it.
Refs dev-graph iss-848.
* test(engine): in-source apply tests reflect deferred index materialization
The two db::omnigraph in-source unit tests asserted the old "schema apply builds
/ preserves indexes inline" behavior (the only remaining full-suite failures):
- test_apply_schema_defers_index_then_reconciler_builds_it (was
test_apply_schema_adds_index_for_existing_property): apply records the @index
intent but builds nothing; assert the BTREE on `age` is absent after apply and
present after ensure_indices. (Uses `age`, unindexed in TEST_SCHEMA — `name
@key` is already FTS-indexed at seed.)
- test_apply_schema_rewrite_defers_index_then_reconciler_restores (was
test_apply_schema_rewrite_preserves_existing_indices): an AddProperty rewrite
no longer rebuilds indexes inline; assert ensure_indices restores id BTREE +
name FTS after the rewrite.
Verified by grep that these + the server/CLI tests are the complete set of
"apply builds an index" assertions; all other index-presence tests run after
load/ensure_indices/primitives, which still build.
Refs dev-graph iss-848.
* fix(engine): optimize always reports pending indexes, not only on create-work (PR review)
Cursor Bugbot (MED): pending_indexes was filled only when needs_index_create was
true, but the vector trainability precheck makes needs_index_work_node exclude an
untrainable Vector column. So a table whose sole missing index is untrainable, but
which optimize still compacts or reindexes, returned an empty pending_indexes —
contradicting the documented operator contract for deferred columns.
Run the (idempotent) build chokepoint unconditionally once past the no-op gate,
rather than gating it on needs_index_create. It skips existing indexes, builds
any buildable missing one, and reports an untrainable column as pending whether
the table entered for compaction, reindex, or index creation. needs_index_create
still gates the no-op decision (so an index-only table still enters the path).
Refs dev-graph iss-848.
* test(engine): reframe staged-BTREE-failure failpoint onto the reconciler path
ensure_indices_stage_btree_failure_leaves_existing_tables_writable fired
`ensure_indices.post_stage_pre_commit_btree` and expected `apply_schema` (adding
a type) to fail mid-BTREE-build. iss-848 removed apply's inline index build, so
that apply now succeeds and the test's unwrap_err panicked — it exercised a
removed code path.
Reframe onto where BTREE builds happen now: seed Person, add an `@index` on
`age` (apply records intent, defers the build), then `ensure_indices` builds the
deferred BTREE and the failpoint fires between stage and commit. Person's HEAD
is unchanged (no drift) and its EnsureIndices sidecar pins NoMovement; a write to
a different, unpinned table (Company) is unaffected (mutations/loads heal
roll-forward and proceed, unlike optimize/repair which refuse on a pending
sidecar). Preserves the original coverage (staged-index stage failure leaves
other tables writable, no drift) in the new architecture.
Refs dev-graph iss-848.
* feat(server): converge deferred indexes promptly after schema apply (iss-848)
Schema apply records @index intent but defers the physical build. On a
long-lived server, spawn a detached best-effort ensure_indices after a
successful apply so the indexes converge promptly instead of waiting for the
operator's next optimize. Fire-and-forget: it never blocks or fails the apply
response, and a failure is logged (the index still converges on the next
optimize). Guarded on result.applied. The CLI is one-shot, so it has no
equivalent; its convergence path is the optimize cadence.
handle.engine is already an Arc, so the spawn takes an owned clone. Convergence
itself is covered by the engine ensure_indices/optimize tests; the existing
empty-graph schema-apply route tests confirm the response is unaffected (the
spawn is a read-only no-op on an empty table).
Refs dev-graph iss-848.
* docs(maintenance): list pending_indexes in optimize per-table stats (consistency)
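The trainability precheck from the PR-review fix above can be modelled in a few lines. This is a hypothetical sketch, not the engine's code. `ColumnState`, `IndexKind`, `BuildError`, `needs_index_work` (standing in for `needs_index_work_node`) and the `build` callback are illustrative. Only `vector_column_trainable` and `PendingIndex` are names taken from the commits.

The point of the sketch is that one predicate feeds both the work gate and the build arm. An untrainable column is therefore neither pinned nor attempted, while a failure on a trainable column still propagates.

```rust
// Hypothetical, simplified model of the index-build chokepoint. Only the
// names `vector_column_trainable` and `PendingIndex` come from the commits;
// everything else stands in for the engine's catalog and Lance calls.

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum IndexKind {
    BTree,
    Fts,
    Vector,
}

#[derive(Debug, Clone, PartialEq)]
pub struct PendingIndex {
    pub table: String,
    pub column: String,
    pub reason: String,
}

#[derive(Debug)]
pub struct BuildError(pub String);

/// One declared-indexed column as the chokepoint sees it.
pub struct ColumnState {
    pub name: String,
    pub kind: IndexKind,
    pub has_index: bool,
    pub row_count: usize, // rows in the table
    pub non_null: usize,  // rows whose value in this column is non-null
}

/// At least one non-null vector: the minimum needed to train one IVF centroid.
pub fn vector_column_trainable(col: &ColumnState) -> bool {
    col.non_null >= 1
}

/// Work gate: count a missing index as work only if building it will commit,
/// so an untrainable vector column is never pinned in a recovery sidecar.
pub fn needs_index_work(col: &ColumnState) -> bool {
    if col.has_index || col.row_count == 0 {
        return false;
    }
    match col.kind {
        IndexKind::Vector => vector_column_trainable(col),
        IndexKind::BTree | IndexKind::Fts => true,
    }
}

/// Build arm: skip existing indexes, defer (never attempt) untrainable vector
/// columns, and propagate any error from a column that was actually attempted.
pub fn build_indices<F>(
    table: &str,
    cols: &[ColumnState],
    mut build: F,
) -> Result<Vec<PendingIndex>, BuildError>
where
    F: FnMut(&ColumnState) -> Result<(), BuildError>,
{
    let mut pending = Vec::new();
    for col in cols {
        if col.has_index {
            continue;
        }
        if col.kind == IndexKind::Vector && !vector_column_trainable(col) {
            pending.push(PendingIndex {
                table: table.to_string(),
                column: col.name.clone(),
                reason: "no non-null vectors to train on yet".to_string(),
            });
            continue;
        }
        build(col)?; // a real failure on a trainable column stays fatal
    }
    Ok(pending)
}
```

Consider a two-row `Doc` table whose embedding column is all null. `needs_index_work` is false for that vector column, and `build_indices` builds only the scalar index and returns one `PendingIndex` for `embedding`.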
310 lines
20 KiB
Markdown
# Architectural Invariants

**Type:** standing review checklist
**Status:** living document
**Audience:** anyone proposing, reviewing, or implementing an OmniGraph change

This file is intentionally short. It records the rules that should be in
working memory for every non-trivial change. Detailed mechanics live in the
area docs linked below.

Use it this way:

- Review the change against **Hard Invariants** and the **Deny-list**.
- If code and docs disagree, either fix the code or add/update a **Known Gap**.
- Keep implementation ledgers, roadmap detail, and historical MR notes in the
  per-area docs. This file is the filter, not the encyclopedia.
## Governing principle: logical contract over physical state

The hard invariants below are instances of one rule. Keep it in view whenever
a change touches the boundary between what the graph *means* and how it is
physically stored.

> **Logical state is the contract. Physical state — index coverage, fragment
> layout, compaction versions, staged writes — is derived, rebuildable, and may
> be produced asynchronously. A physical operation must never fail a logical
> one. Preconditions are checked against logical state; physical reconciliation
> is idempotent and may lag or retry. Genuine logical conflicts still fail
> loudly: the licence to lag covers physical convergence, not correctness.**

Invariants that instantiate it: **2** (manifest-atomic visibility) and **5**
(recovery is part of the commit protocol) — a partially-written physical layer
never changes what a graph commit means; **7** (indexes are derived state) — a
query is correct under partial index coverage, and expensive index work
converges from manifest state instead of gating the write path; **13** (failures
bounded and observable) — the licence to lag is not a licence to drop, so a
physical step that cannot make progress is surfaced, not swallowed. Deny-list
items that enforce it: synchronous inline vector/FTS index rebuilds on the
commit path; state that drifts from Lance or the manifest when it can be
derived; job queues for manifest-derivable state where a reconciler fits.

The failure shape it rules out: a legitimate background operation on the
physical layer (compaction, an index build, an interrupted staged write) is
allowed to break a logical operation (a query's correctness, a migration's
success, a branch's writability). The smell to watch for is a logical operation
whose precondition is a *physical* fact — a cached file version, an index's
existence, a fragment count. Make the precondition logical and let a reconciler
converge the physical state.
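Here is a minimal sketch of what idempotent reconciliation from logical state means for indexes. It uses a toy model; the `reconcile` function and the `Reconciled` type are hypothetical. The declared set is the contract and the built set is derived physical state. Each pass:

- attempts only what is missing,
- reports a column that cannot be built yet as pending rather than failing the pass,
- does nothing when the state has already converged.

```rust
use std::collections::BTreeSet;

/// Outcome of one reconciler pass (hypothetical shape).
#[derive(Debug, Default, PartialEq)]
pub struct Reconciled {
    pub created: Vec<String>,
    pub pending: Vec<String>,
}

/// One idempotent pass. `declared` is the logical contract (@index / @key
/// intent) and `built` is physical state. Only missing indexes are attempted;
/// a column that cannot be built yet is reported, never turned into an error.
pub fn reconcile<F>(
    declared: &BTreeSet<String>,
    built: &mut BTreeSet<String>,
    buildable: F,
) -> Reconciled
where
    F: Fn(&str) -> bool,
{
    // Collect first so `built` can be mutated while applying the plan.
    let missing: Vec<String> = declared.difference(&*built).cloned().collect();
    let mut out = Reconciled::default();
    for col in missing {
        if buildable(col.as_str()) {
            built.insert(col.clone());
            out.created.push(col);
        } else {
            out.pending.push(col);
        }
    }
    out
}
```

Take a declared set {`rank`, `embedding`} whose embedding column cannot be trained yet. The first pass creates `rank` and reports `embedding` as pending. A later pass, once `embedding` is buildable, creates it. A third pass is a no-op.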
## Hard Invariants

1. **Respect the substrate.** Lance owns columnar storage, per-dataset
   versioning, fragments, branches, compaction, cleanup, and index primitives.
   DataFusion should own relational execution where it fits. Do not add custom
   WALs, transaction managers, buffer pools, page formats, or local clones of
   substrate behavior. Read [lance.md](lance.md) before guessing.

2. **Graph visibility is manifest-atomic.** Lance commits are per dataset.
   OmniGraph's graph-level atomicity comes from publishing one manifest update
   for the whole graph, guarded by expected table versions and sidecar recovery.
   No write path may make a subset of touched node/edge tables visible as a
   graph commit.

3. **A query reads one snapshot.** Query execution captures a manifest snapshot
   for its lifetime. Do not re-read branch head mid-query to discover newer
   table versions.

4. **Mutations publish at one boundary.** A `mutate_as` or `load` operation
   accumulates constructive writes, commits each touched table at the end, then
   publishes one manifest update. Do not commit per statement. Delete-only
   queries are the documented inline residual; the parse-time D2 rule prevents
   mixing deletes with insert/update until the delete path migrates to Lance's
   staged two-phase delete (MR-A; see Known Gaps).
   Read [writes.md](writes.md) and [execution.md](execution.md).

5. **Recovery is part of the commit protocol.** Writers that can advance Lance
   HEAD before manifest publish must write `__recovery/{ulid}.json` sidecars.
   `Omnigraph::open` in read-write mode runs the all-or-nothing sweep; the
   write entry points (`load_as`, `mutate_as`, `apply_schema_as`,
   `branch_merge_as`) and `refresh` run roll-forward-only recovery in-process,
   so a long-lived process converges on its next write rather than at restart.
   Do not add a new writer kind without sidecar coverage or an explicit proof
   that no Lance HEAD can move before manifest publish.

6. **Strong consistency is the default.** Reads are snapshot-isolated, writes
   are durable before acknowledgement, and branch reads observe the current
   committed graph state. Any eventual-consistency mode must be explicit,
   read-only, auditable, and non-default.

7. **Indexes are derived state.** Reads must see the correct result for the
   branch they read even when index coverage is partial. Expensive index work
   should converge from manifest state instead of extending the critical write
   path. Scalar staged index builds and vector inline residuals are documented
   in [writes.md](writes.md) and [indexes.md](../user/search/indexes.md).

8. **Schema identity survives renames.** Accepted schema identity must remain
   stable across type and property renames. Rename support belongs in migration
   planning, not in "drop and recreate" behavior. See the known gap below.

9. **Schema/data integrity failures are loud.** Type errors, required-field
   misses, invalid edge endpoints, cardinality violations, and unsupported
   mixed mutation modes fail before a graph commit is published. The system must
   not invent placeholder nodes or silently weaken integrity.

10. **Query semantics are first-class IR concepts.** Search modes, mutations,
    polymorphism, traversal, retrieval scores, imports, and policy predicates
    belong in typed AST/IR/planner structures. Do not smuggle semantics through
    strings, side tables, global state, or transport-specific flags.

11. **Transport/auth stay at the boundary.** Kernel crates should not depend on
    HTTP, OpenAPI, bearer-token parsing, or future transport protocols. The
    server resolves bearer tokens to actors; clients cannot set actor identity
    directly.

12. **Bearer-token plaintext is not retained.** Server startup hashes bearer
    tokens, authentication uses constant-time comparison, and request handling
    carries only the resolved actor identity and hash-derived match state.

13. **Operational failures are bounded and observable.** Timeout, memory, OOM,
    partial result, recovery, and conflict paths must fail loudly or degrade in
    a documented way. If a metric affects plan choice or operator behavior, it
    must be exposed through the relevant trait or observability surface.

14. **Tests match the boundary being changed.** Prefer extending the existing
    test that owns the area. Planner changes need planner-level coverage,
    storage changes need storage/recovery coverage, and end-to-end tests are not
    a substitute for missing lower-level assertions. Read [testing.md](testing.md)
    before adding tests.
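Invariants 2 and 4 rest on a single guarded publish. The sketch below makes simplifying assumptions:

- The manifest is an in-memory `Manifest` published by a hypothetical `publish` function. The real manifest is a Lance dataset.
- The recovery sidecars of invariant 5 are omitted.

Under those assumptions, the compare-and-swap checks every expected table version. It then makes all touched tables visible together in one new graph version, or it changes nothing.

```rust
use std::collections::HashMap;

/// Toy manifest: the graph version plus the pinned version of each table.
#[derive(Debug, Clone, PartialEq)]
pub struct Manifest {
    pub version: u64,
    pub tables: HashMap<String, u64>,
}

#[derive(Debug, PartialEq)]
pub enum PublishError {
    /// A table moved since the writer read it, so nothing is published.
    VersionMismatch { table: String, expected: u64, actual: Option<u64> },
}

/// Compare-and-swap publish: every expected table version must still match,
/// then all updates become visible together in one new manifest version.
pub fn publish(
    current: &Manifest,
    expected: &HashMap<String, u64>,
    updates: &HashMap<String, u64>,
) -> Result<Manifest, PublishError> {
    for (table, &want) in expected {
        let actual = current.tables.get(table).copied();
        if actual != Some(want) {
            return Err(PublishError::VersionMismatch {
                table: table.clone(),
                expected: want,
                actual,
            });
        }
    }
    let mut next = current.clone();
    next.version += 1;
    for (table, &version) in updates {
        next.tables.insert(table.clone(), version);
    }
    Ok(next)
}
```

A writer whose expectations have gone stale gets an error, and no subset of its tables becomes visible. In this toy the caller must still install the returned manifest atomically; in OmniGraph that is the job of the manifest CAS itself.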
## Current Truth Matrix

| Area | Current state | Source |
|---|---|---|
| Multi-table commit | Manifest CAS plus recovery sidecars; not a single Lance primitive | [writes.md](writes.md), [architecture.md](architecture.md) |
| Constructive mutations | In-memory `MutationStaging`, one end-of-query table commit per touched table, then one manifest publish | [writes.md](writes.md), [execution.md](execution.md) |
| Deletes | Inline-commit residual; delete-only queries allowed, mixed insert/update/delete rejected by D2 | [query-language.md](../user/queries/index.md), [writes.md](writes.md) |
| Branch delete | Manifest is the single authority, flipped atomically first; per-table forks + commit-graph branch are derived state, reclaimed best-effort (`force_delete_branch`) with the `cleanup` reconciler as the guaranteed backstop. Reusing a name whose reclaim failed before `cleanup` surfaces an actionable error | [branches-commits.md](../user/branching/index.md), [maintenance.md](../user/operations/maintenance.md) |
| Schema validation | Type checks, required fields, defaults, edge endpoint checks, and edge cardinality are enforced on write paths | [schema-language.md](../user/schema/index.md), [execution.md](execution.md) |
| Unique constraints | Intra-batch and write-path checks exist; intake and branch-merge derive the composite key through one shared function (`loader::composite_unique_key`, a separator-free `Vec<String>` tuple) and fail loudly on an un-keyable column type rather than silently exempting it; full cross-version uniqueness against already-committed rows is still a gap | [schema-language.md](../user/schema/index.md) |
| Storage trait | `TableStorage` (via `db.storage()`) is staged-only; the inline-commit residuals (`delete_where`, `create_vector_index`) are split onto a separate sealed `InlineCommitResidual` trait reached via `db.storage_inline_residual()` (MR-854), so §1 holds by construction; capability/stat surfaces are roadmap | [writes.md](writes.md), [architecture.md](architecture.md) |
| Index lifecycle | `@index`/`@key` declares *intent*; the physical index is derived state and never fails a logical op. `schema apply` builds no indexes (records intent only; index-only changes touch no table data). `load`/`mutate` build inline through one chokepoint (`build_indices_on_dataset_for_catalog`, type-dispatched by `node_prop_index_kind`: enum + orderable scalar → BTREE, free-text String → FTS, Vector → vector) that fault-isolates an untrainable Vector column into a *pending* index instead of aborting. `optimize`/`ensure_indices` is the reconciler: it creates declared-but-missing indexes and folds appended/rewritten fragments into existing ones (`optimize_indices`), reporting still-pending columns. Explicit maintenance call, not yet a background loop | [indexes.md](../user/search/indexes.md), [maintenance.md](../user/operations/maintenance.md) |
| Traversal IDs | Runtime still builds `TypeIndex`; Lance stable row-id based graph IDs are roadmap | [architecture.md](architecture.md), [query-language.md](../user/queries/index.md) |
| Auth | Bearer token hashing and server-side actor resolution are implemented at the HTTP boundary | [server.md](../user/operations/server.md), [policy.md](../user/operations/policy.md) |
| Tests | Tempdir-backed Lance tests are the current substrate; the storage adapter has an in-memory backend for adapter-level contract tests, but Lance datasets bypass it | [testing.md](testing.md) |

The branch-delete reconciler is authority-derived: it reclaims orphaned forks
today and degrades to a no-op if Lance ships an atomic multi-dataset branch
operation, so the design composes with that future rather than blocking it. This
is the same shape as invariant 7 (indexes are derived state); prefer it over a
recovery-sidecar-style approach for any new multi-dataset metadata operation,
since the sidecar would be scaffolding to remove once the substrate closes the gap.
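The authority-derived reconciler shape can be modelled as a set difference. In this hypothetical model, storage lists `(table, branch)` forks and the manifest lists live branches; the `cleanup` signature is illustrative, not the engine's API. A pass:

- reclaims forks whose branch the manifest no longer names,
- keeps any fork whose reclaim failed, so the next pass retries it,
- returns zero, a no-op, when nothing is orphaned.

```rust
use std::collections::BTreeSet;

/// One reconciler pass (hypothetical signature). `live_branches` is the
/// manifest's authority; `forks` are the (table, branch) forks found in
/// storage; `reclaim` tries to delete one fork and reports success.
/// Returns how many orphaned forks were reclaimed on this pass.
pub fn cleanup<F>(
    live_branches: &BTreeSet<String>,
    forks: &mut Vec<(String, String)>,
    mut reclaim: F,
) -> usize
where
    F: FnMut(&str, &str) -> bool,
{
    let before = forks.len();
    // Keep a fork if the manifest still names its branch, or if reclaiming
    // it failed this time (best effort: a later pass retries it).
    forks.retain(|(table, branch)| {
        live_branches.contains(branch) || !reclaim(table.as_str(), branch.as_str())
    });
    before - forks.len()
}
```

Because the pass only compares storage against the manifest, it needs no sidecar. If the substrate ever makes the multi-dataset operation atomic, there are simply no orphans to find.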
## Known Gaps

Do not hide these behind invariant wording. Either move them forward or keep
them explicit.

- **Rename-stable schema identity:** the invariant is that accepted IDs survive
  renames. The current compiler still derives type IDs from `kind:name`; this
  must be fixed before relying on renamed IDs across accepted schemas.
- **Storage abstraction:** `TableStorage` is present, sealed, and canonical for
  staged writes. MR-854 sealed it: `db.storage()` exposes only staged primitives
  and reads, and the inline-commit residuals are split onto a separate sealed
  `InlineCommitResidual` trait reached via `db.storage_inline_residual()`, so a
  new writer cannot couple a write with a HEAD advance through the default
  surface. The dead legacy methods (`append_batch` on the trait,
  `merge_insert_batch{,es}`, `create_{btree,inverted}_index`) were removed. The
  remaining residuals are `delete_where` and `create_vector_index`. The Lance
  6.0.1 → 7.0.0 bump landed, so the staged two-phase delete API
  (`DeleteBuilder::execute_uncommitted`, Lance #6658) is now available and MR-A
  is unblocked — but the migration itself is still pending, so `delete_where`
  stays inline for now. `create_vector_index` remains gated on Lance #6666
  (still open). See [lance.md](lance.md) and [writes.md](writes.md). New write
  paths should use the staged shape unless a documented Lance blocker applies.
- **Deletes and vector indexes:** `delete_where` and vector index creation still
  advance Lance HEAD inline. The public delete two-phase API now exists (Lance
  #6658 shipped in 7.0.0), so the delete residual is unblocked pending the MR-A
  migration; vector index creation is still blocked (Lance #6666 open). Keep D2
  and recovery coverage in place until those residuals are removed.
- **Blob-column compaction:** Lance `compact_files` mis-decodes blob-v2 columns
  under its forced `BlobHandling::AllBinary` read ("more fields in the schema
  than provided column indices"), so `optimize` skips any table with a `Blob`
  property — reporting `SkipReason::BlobColumnsUnsupportedByLance` (loud, not a
  silent drop) behind the `LANCE_SUPPORTS_BLOB_COMPACTION` gate. Reads and writes
  are unaffected; only space/fragment reclamation on blob tables is deferred.
  Remove the skip when the upstream Lance fix lands — the
  `lance_surface_guards.rs::compact_files_still_fails_on_blob_columns` guard
  turns red on that bump to force it.
- **Recovery is serialized against live writers in-process only:** the
  write-entry heal (and `refresh`) serialize against a live writer's sidecar
  lifetime via the per-`(table, branch)` write queues plus the schema-apply
  serialization key — all in-process primitives. A recovery pass in one
  process cannot serialize against a live writer in another (the open-time
  sweep has the same exposure, and always has): it may roll a live foreign
  writer's sidecar forward, which degrades to publisher-CAS contention for
  data writes but can race the schema-staging promotion for a foreign live
  schema apply. Multi-process writers on one graph are already documented
  one-winner-CAS territory; closing this fully needs a cross-process
  serialization primitive (e.g. lease-based use of the schema-apply lock
  branch) — design it before promoting multi-process write topologies.
- **Local `write_text_if_match` is not a cross-process CAS:** object-store
  backends use a true conditional put (ETag If-Match; the in-memory test
  backend too), but upstream `object_store` leaves `PutMode::Update`
  unimplemented for `LocalFileSystem`, so the local path emulates CAS with
  a content-token compare followed by an atomic replace — a check-then-act
  gap plus content-token ABA. Every current caller goes through the cluster
  lock protocol first, which makes this safe. A lock-free caller would get
  S3-correct but local-racy behavior — the same divergence shape as the
  acknowledged-before-visible bug this branch fixed. Close it (local CAS
  primitive, or a trait-level lock requirement) before admitting any
  lock-free `if_match` caller.
|
|
- **Manifest→commit-graph publish atomicity:** a graph commit advances
`__manifest` (the visibility authority) and then appends `_graph_commits` as
two separate writes (`commit_updates_with_actor_with_expected`, failpoint
`graph_publish.before_commit_append`). A crash between them leaves the manifest
at version N with no commit-graph row for N. Live reads and durability are
unaffected, because the live version resolves via the manifest
(`GraphCoordinator::version()`), not the commit-graph head. The open-time
recovery sweep does NOT repair it either (`lance_head == manifest_pinned`
classifies it as `NoMovement`, and a recovery sidecar would not change this).
Impact is bounded to commit history: `commit list` misses N, time-travel by
commit id to N fails, and merge-base loses a node (a likely-benign off-by-one
re-merge). The window exists on every publish, not only in a specific
maintenance command. Eventual fix: make the commit graph reconcilable from the
manifest (or make the two writes atomic); this is not a recovery-sidecar
concern.
- **Planner capability/stat surfaces:** cost-aware planning, complete
capability advertisement, and explain-with-cost are roadmap items. Do not
describe them as implemented.
- **Traversal execution:** current multi-hop execution still uses `TypeIndex`,
ad-hoc ID filtering, and eager materialization in places. Stable row IDs, SIP,
and factorization are target patterns, not current fact.
- **Retrieval ranks:** hybrid search works, but rank/score are not yet carried
everywhere as ordinary columns through the plan.
- **Policy pushdown and `Source`:** Cedar enforcement is at the HTTP boundary
today, and imports are still loader-shaped. Planner predicates and a unified
`Source` operator are roadmap items.
- **Resource bounds:** some operations still lack enforced per-query memory or
time budgets. New long-running work should add explicit bounds rather than
widening the gap.
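The blob-column compaction gap above depends on the skip staying loud. Below is a minimal Rust sketch of such a gate. Only `SkipReason::BlobColumnsUnsupportedByLance` and the `LANCE_SUPPORTS_BLOB_COMPACTION` name come from the text; `PropertyType`, `OptimizeOutcome`, and `plan_compaction` are hypothetical stand-ins, and the real gate may not be a plain `const`.

```rust
/// Hypothetical gate: flip to `true` once upstream Lance compacts blob-v2 columns.
const LANCE_SUPPORTS_BLOB_COMPACTION: bool = false;

/// Hypothetical, simplified property types for a table schema.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PropertyType {
    Int,
    Text,
    Vector,
    Blob,
}

/// Why `optimize` declined to compact a table (only this variant is from the doc).
#[derive(Debug, Clone, PartialEq, Eq)]
enum SkipReason {
    BlobColumnsUnsupportedByLance,
}

/// Hypothetical per-table outcome: every table yields one, skipped or not.
#[derive(Debug, Clone, PartialEq, Eq)]
enum OptimizeOutcome {
    Compacted { table: String },
    Skipped { table: String, reason: SkipReason },
}

/// Decide whether a table may be compacted under the current gate.
fn plan_compaction(table: &str, props: &[PropertyType], blob_gate_open: bool) -> OptimizeOutcome {
    let has_blob = props.iter().any(|p| *p == PropertyType::Blob);
    if has_blob && !blob_gate_open {
        // Loud skip: the caller reports this reason instead of omitting the table.
        OptimizeOutcome::Skipped {
            table: table.to_string(),
            reason: SkipReason::BlobColumnsUnsupportedByLance,
        }
    } else {
        // Real code would run Lance compaction here.
        OptimizeOutcome::Compacted { table: table.to_string() }
    }
}

fn main() {
    let outcomes = vec![
        plan_compaction("Doc", &[PropertyType::Text, PropertyType::Blob], LANCE_SUPPORTS_BLOB_COMPACTION),
        plan_compaction("Person", &[PropertyType::Int, PropertyType::Vector], LANCE_SUPPORTS_BLOB_COMPACTION),
    ];
    for outcome in &outcomes {
        println!("{:?}", outcome);
    }
}
```

The point of the shape is that a skipped table still produces an outcome carrying its reason, so `optimize` output never silently drops it.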
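The check-then-act gap and content-token ABA described for the local `write_text_if_match` emulation can be reproduced deterministically, without threads or a filesystem, by splitting the emulated CAS into its two steps. This is a hypothetical in-memory model: `LocalFile`, `content_token`, and `write_if_match` are illustrative names, and the token is assumed to be a hash of the content rather than whatever the real local backend computes.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical content token: a hash of the stored text. It identifies bytes,
// not writes, which is what makes ABA possible.
fn content_token(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

// Hypothetical in-memory stand-in for one local file.
struct LocalFile {
    content: String,
    writes: u32, // counts every replace so overwritten writes stay visible
}

impl LocalFile {
    // Step 1 of the emulated CAS: compare the current token with the expected one.
    fn check(&self, expected: u64) -> bool {
        content_token(&self.content) == expected
    }

    // Step 2: replace the content (an atomic replace in the real local path).
    fn replace(&mut self, new: &str) {
        self.content = new.to_string();
        self.writes += 1;
    }

    // Emulated CAS: check, then act. Nothing stops another writer in between.
    fn write_if_match(&mut self, expected: u64, new: &str) -> bool {
        if self.check(expected) {
            self.replace(new);
            true
        } else {
            false
        }
    }
}

fn main() {
    // Check-then-act gap: both writers pass the check before either acts.
    let mut f = LocalFile { content: "A".to_string(), writes: 0 };
    let token = content_token(&f.content);
    let writer1_ok = f.check(token);
    let writer2_ok = f.check(token);
    if writer1_ok {
        f.replace("from writer 1");
    }
    if writer2_ok {
        f.replace("from writer 2"); // silently overwrites writer 1's accepted write
    }
    println!("both accepted: {}, final: {}", writer1_ok && writer2_ok, f.content);

    // ABA: content goes A -> B -> A between a caller's token read and its write.
    let mut g = LocalFile { content: "A".to_string(), writes: 0 };
    let stale = content_token(&g.content);
    g.replace("B");
    g.replace("A");
    let accepted = g.write_if_match(stale, "C"); // accepted despite two intervening writes
    println!("ABA accepted: {}, total writes: {}", accepted, g.writes);
}
```

Here the compare and the write are separate steps; a true conditional put performs them as one operation inside the store, which is why the gap is specific to the emulated local path.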
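The eventual fix named for the manifest→commit-graph gap is a commit graph that is reconcilable from the manifest. A minimal sketch of the detection and idempotent backfill step follows, assuming manifest versions are consecutive and start at 1 and that a commit row can be derived from its manifest version. `missing_commit_versions` and `reconcile` are hypothetical names, and a real reconciler would also need the actor, timestamp, and parent metadata this sketch omits.

```rust
use std::collections::BTreeSet;

/// Manifest versions in `1..=manifest_head` with no `_graph_commits` row.
/// Assumes (hypothetically) that versions are consecutive and start at 1.
fn missing_commit_versions(manifest_head: u64, recorded: &BTreeSet<u64>) -> Vec<u64> {
    (1..=manifest_head).filter(|v| !recorded.contains(v)).collect()
}

/// Idempotent backfill: record every missing version and return how many were added.
/// A second run over the same state adds nothing.
fn reconcile(manifest_head: u64, recorded: &mut BTreeSet<u64>) -> usize {
    let missing = missing_commit_versions(manifest_head, recorded);
    for v in &missing {
        // Real code would append a commit row derived from manifest version `v`.
        recorded.insert(*v);
    }
    missing.len()
}

fn main() {
    // Crash between the two writes: manifest is at 3, commit rows exist for 1 and 2.
    let mut recorded: BTreeSet<u64> = BTreeSet::from([1, 2]);
    println!("missing: {:?}", missing_commit_versions(3, &recorded)); // missing: [3]
    println!("added: {}", reconcile(3, &mut recorded)); // added: 1
    println!("added on rerun: {}", reconcile(3, &mut recorded)); // added on rerun: 0
}
```

Because the manifest stays the visibility authority, the reconciler only ever adds rows the manifest already implies, which is what makes it safe to run on every open.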
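For the resource-bounds gap, an explicit bound can be as small as a deadline and a row cap checked inside the work loop, with a typed error instead of a truncated result. A sketch under that assumption: `Budget`, `BudgetError`, and `count_rows` are hypothetical, and real query work would check bounds per batch or per operator rather than over a slice of counts.

```rust
use std::time::{Duration, Instant};

/// Hypothetical typed error: which bound was hit, and how far the work got.
#[derive(Debug, PartialEq, Eq)]
enum BudgetError {
    TimeExceeded { batches_done: usize },
    RowsExceeded { limit: usize },
}

/// Hypothetical explicit bounds for one long-running operation.
struct Budget {
    deadline: Instant,
    max_rows: usize,
}

/// Sum per-batch row counts, checking both bounds on every batch. On a breach it
/// returns an error, never the partial total.
fn count_rows(batches: &[usize], budget: &Budget) -> Result<usize, BudgetError> {
    let mut total: usize = 0;
    for (i, rows) in batches.iter().enumerate() {
        if Instant::now() >= budget.deadline {
            return Err(BudgetError::TimeExceeded { batches_done: i });
        }
        total += *rows;
        if total > budget.max_rows {
            return Err(BudgetError::RowsExceeded { limit: budget.max_rows });
        }
    }
    Ok(total)
}

fn main() {
    let budget = Budget {
        deadline: Instant::now() + Duration::from_secs(5),
        max_rows: 1_000,
    };
    println!("{:?}", count_rows(&[100, 200, 300], &budget)); // Ok(600)
    println!("{:?}", count_rows(&[600, 600], &budget)); // Err(RowsExceeded { limit: 1000 })
}
```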
## Deny-list

If a proposal fits one of these, the burden is on the proposer to prove why the
case is exceptional.

- Custom WAL, transaction manager, buffer pool, page format, or storage engine.
- Per-table graph publishing outside the manifest publisher.
- Re-reading the current branch head during a query instead of using the
captured snapshot.
- New write paths that can advance Lance HEAD before manifest publish without a
recovery sidecar.
- Cross-query `BEGIN`/`COMMIT` transactions in the OSS engine. Use branches and
merges for multi-query workflows.
- Acknowledging writes before durable Lance and manifest persistence.
- Silent fallback to eventual consistency, partial results, or dropped rows.
- State that drifts from Lance or the manifest when it can be derived.
- Job queues for manifest-derivable state where a reconciler is the right shape.
- Synchronous inline vector/FTS index rebuilds on the query commit path, except
for documented Lance API residuals.
- Side-channels for query semantics: hidden globals, magic strings, transport
flags, or out-of-band metadata.
- Cost-blind plan choice when statistics are available or required.
- Hidden statistics for behavior that affects planning or operator choice.
- Hash-map iteration order in result ordering, plan choice, or migration output.
- String-flattened SQL/filter generation when a structured pushdown API is
available.
- Eager multi-hop cross-product materialization when factorization fits.
- Ad-hoc `IN`-list filtering where SIP or another structured selectivity path
fits.
- Discarding retrieval score/rank before fusion or projection decisions.
- Auto-creating placeholder nodes for orphan edges.
- Raw filesystem I/O for cluster-stored state (ledger, lock, sidecars,
approvals, catalog) outside the cluster crate's storage module. Every stored
byte goes through the engine `StorageAdapter` so `file://` and `s3://` stay
one code path.
- Wire-protocol-specific code in compiler or engine crates.
- Cloud-only correctness fixes or forks of the OSS engine for correctness.
- Mutating immutable substrate state in place, including Lance fragments or
index segments.
- Shipping observable behavior as if it were not part of the contract. Output
ordering, error text, timestamp precision, defaults, and latency profiles all
become dependencies once exposed.
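The hash-map iteration-order item is easy to violate by accident, because Rust's `HashMap` uses a randomly seeded hasher by default, so its iteration order can differ between runs. A minimal sketch of the fix for migration output, re-keying through a `BTreeMap` so order depends only on the keys; `migration_plan` and its string inputs are hypothetical.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical: render per-table migration steps in a deterministic order.
/// Iterating `changes` directly would follow `HashMap`'s randomly seeded order.
fn migration_plan(changes: &HashMap<String, String>) -> Vec<String> {
    let ordered: BTreeMap<&String, &String> = changes.iter().collect();
    ordered
        .into_iter()
        .map(|(table, step)| format!("{table}: {step}"))
        .collect()
}

fn main() {
    let mut changes = HashMap::new();
    changes.insert("Person".to_string(), "add scalar index on name".to_string());
    changes.insert("Doc".to_string(), "add scalar index on title".to_string());
    for line in migration_plan(&changes) {
        println!("{line}"); // "Doc: ..." always prints before "Person: ..."
    }
}
```

Sorting at the output boundary keeps the `HashMap` for lookups where order does not matter, while anything a user or a diff can observe stays stable.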
## Review Checklist

Use this as yes/no/NA for any non-trivial design or PR:

- Does it respect Lance/DataFusion instead of rebuilding them?
- Does it preserve manifest-atomic graph visibility?
- Does every query keep one snapshot for its lifetime?
- Do mutations publish once at the commit boundary?
- Can every Lance-HEAD-before-manifest gap recover all-or-nothing?
- Are schema and edge integrity checks strict by default?
- Are query semantics represented in AST/IR/planner structures?
- Are transport, auth, and policy boundaries preserved?
- Are failures bounded, typed, and observable?
- Are result ordering and plan choices deterministic within a snapshot?
- Are stats/capabilities exposed when behavior depends on them?
- Are existing known gaps left no worse and documented if touched?
- Does the test live at the same boundary as the change?
- Does the change avoid every deny-list pattern, or justify the exception?
## Maintenance Policy

Update this file when an invariant changes, a known gap opens or closes, or a
new review anti-pattern deserves deny-list treatment. Prefer stable headings
over numbered sections so other docs can link here without churn.

Removing or relaxing a hard invariant requires the same review process as code.
Adding a known gap is acceptable when it makes reality explicit; leaving stale
claims is not.