Five fixes from PR #68 review (Cursor Bugbot + Codex + Cubic): * **scan_with_pending gains merge-shadow semantics** (Codex P1, Cubic P1#1): new `key_column: Option<&str>` parameter. When set, committed rows whose key value appears in any pending batch are excluded from the scan — making `scan_with_pending` correctly merge-semantic for chained updates instead of naively unioning. execute_update calls with Some("id"). Without this, a chained `update where age > 30` could match a row whose pending value already moved out of range. * **Multi-delete on same table no longer trips ExpectedVersionMismatch** (Cursor Bugbot HIGH): open_table_for_mutation routes through reopen_for_mutation when staging.inline_committed has the table, using the post-inline-commit Lance version captured at record_inline time. The legacy open_for_mutation_on_branch fence (Lance HEAD == manifest pinned) is correct cross-writer but wrong intra-query when deletes have already advanced HEAD on this table. Branch goes away when Lance ships two-phase delete (lance-format/lance#6658). * **Cardinality validation consolidated** (Cursor LOW + Codex P2 + Cubic P1#2 + Cubic P2): new exec/staging::count_src_per_edge + enforce_cardinality_bounds shared by mutation and loader paths. Restores the missing min-cardinality check on the engine path. Loader Merge mode passes Some("id") to dedupe edges being updated by id (not double-count committed + pending). Loader Append mode and engine path pass None (ULID-generated ids never collide). * **Dead count_rows_with_pending removed** (Cursor LOW): never called. * **Misleading concat-helper comment fixed** (Cubic P3): claimed schema normalization the helper doesn't implement. Updated to match reality. * **Documentation honesty** (Cubic P1#3): MR-794 narrows but doesn't eliminate the "Lance HEAD ahead of __manifest" drift class. Drift is unreachable for op-execution failures (the partial_failure test pins this), but a residual remains at the finalize→publisher boundary because Lance has no multi-dataset commit primitive: per-table commit_staged calls run sequentially before manifest commit. Updated docs/runs.md, docs/invariants.md §VI.25, docs/releases/v0.4.1.md to scope the claim precisely. * **Failpoint test pinning the residual**: new mutation.post_finalize_pre_publisher failpoint + two tests in tests/failpoints.rs that confirm the documented residual behavior. Catches future regressions that widen the residual. Test additions on tests/runs.rs: * chained_updates_with_overlapping_predicate_respects_intermediate_value * multi_statement_delete_on_same_node_table * cascade_delete_node_then_explicit_delete_edge_on_same_table * mutation_insert_edge_enforces_min_cardinality * load_merge_mode_dedupes_edge_for_cardinality_count 113/113 engine integration tests pass (runs + end_to_end + consistency + staged_writes + validators). Failpoints feature build runs in CI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
24 KiB
Architectural Invariants & Patterns
Type: Reference / standing document Status: Living — updated as decisions accrue Audience: anyone proposing, reviewing, or implementing a change to any part of OmniGraph
This document captures two things:
- Invariants (Parts I–VI, VIII): load-bearing principles that hold across the architecture. Breaking one is rare and requires explicit justification.
- Current architectural patterns (Part VII): how we realize the invariants today. These are committed conventions, not eternal facts; they may evolve as the engine matures, but until they do, they constrain new work.
These are not query-engine-specific. They apply to every layer.
Status legend
- Status: decided. No annotation needed; this is the default.
- Status: open — see MR-X. The principle is captured, but the concrete default or mechanism is still under discussion. Future work should follow the captured intent or update this document with the resolution.
- Status: aspirational. The invariant describes the target state; current code may not yet uphold it. PRs that move toward upholding it are welcome; PRs that drift away need explicit justification.
Capturing aspirational invariants on purpose: we'd rather record what we want to be true and have current code be measured against it than not have the rule at all.
How to use
- Writing an RFC or design proposal: walk through the relevant sections and state how the proposal upholds each invariant — or why a documented exception is justified.
- Reviewing a PR or design: scan for invariants the change might violate. The deny-list (§IX) is the fastest first pass.
- Debating a tradeoff: invoke the relevant invariant and check whether the tradeoff respects it.
- Updating this document: add to the deny-list freely. Removing or relaxing an invariant requires the same review process as any other architectural decision.
I. Substrate respect — delegate, don't rebuild
The first question for any new component: does the substrate already do this?
Current substrate is Lance for storage, indexes, and MVCC; DataFusion is the working assumption for relational machinery. These are committed choices (MR-737 §2.2, §5.11) but not eternal facts. The invariants below are about respecting whatever substrate we adopt.
-
Don't rebuild what the substrate owns. Storage format, durability (WAL, transaction journal), buffer pool, MVCC, index lifecycle — all delegated. Building parallel implementations turns the project into a different one and locks us out of substrate improvements. Check: Does this proposal introduce a parallel storage format, custom on-disk pages, custom serialization, custom WAL, custom buffer pool?
-
Don't rebuild relational machinery provided by the runtime substrate. Joins, aggregations, parallelism, spill — extension via the substrate's trait surfaces; never reimplementation. Check: Are we extending the substrate via traits, or reimplementing parts of it?
-
Don't maintain state parallel to the substrate. Observe substrate state and derive what we need. State that drifts from the substrate is a bug. Check: Does this proposal track index coverage, manifest versions, or fragment locations independently of the substrate?
II. Layering — the seams hold
-
The IR is the contract between frontend and backend. Frontends emit IR; planner / executor consume it. No frontend logic leaks downward; no executor concerns leak upward. Check: Does the proposal add to the IR, or to a layer? If to a layer, does it cross another layer's concern?
-
Capabilities and statistics flow upward; data flows downward. Lower layers expose what they can do (capabilities) and what they know (statistics). Upper layers consume both. Methods alone are insufficient — methods without capability advertisement force one-size-fits-all plans. Check: When adding a method to a layer trait, did we also expose the capability so the planner can reason about it?
-
One trait boundary per layer. Crossing a layer means going through its trait. Direct calls to lower-layer concrete types from upper layers are forbidden. Check: Does this code call
lance::Datasetdirectly outside engine-storage? Call planner internals from the executor? -
No god modules. Single-module concerns: storage, IR, planner, executor, frontend, reconciler, schema, policy. Each crate has a reference test suite that runs without the others. Check: Does this PR add a concern to a crate that already owns a different one?
-
Wire protocols are interchangeable; the IR is the contract. The kernel produces
Stream<RecordBatch>end-to-end; transports (HTTP/JSON, Arrow Flight, FlightSQL, future protocols) deliver them at the server boundary. No wire-protocol-specific code in kernel crates. Status: aspirational — Flight not yet implemented; tracked in MR-765. Check: Does this code importarrow_flight(or any transport crate) outside the server layer?
III. Distributability — kernel stays remote-friendly
These are technical constraints, independent of whether we ship a distributed product. They preserve the architectural seam.
-
The kernel admits parallel and remote implementations. Trait surfaces are thread-safe; no in-process-only assumptions; remote dataset descriptors (URI, snapshot ref, fragment ID) are accepted without requiring an open in-process handle.
-
IR is location-neutral. No IR operator embeds an assumption about where data lives.
-
Cost models accept new dimensions (network, latency-tier) as additive extensions. No place hard-codes "all cost is local I/O."
-
Background work admits alternate implementations. In-process default; separable worker fleet for distributed deployment uses the same trait. Status: aspirational — distributed deployment is out of scope today (MR-737 §2.2); these constraints preserve the seam.
IV. Evolution — additive over rewrite
-
Additive over rewrite. New IR variants and planner rules slot in. No "tear out and replace" PRs.
-
Capabilities are additive enums. New variants are additive. Existing implementations keep working.
-
Feature-flag behavior changes. Every change that alters runtime behavior ships behind a flag. Old code path stays until the new one is proven.
-
No data drops without a migration. When data needs to move (e.g., adopting stable row IDs), use in-place or dual-write windows. Never "drop and recreate."
-
No breaking schema changes without a migration plan. Schema-IR changes go through the migration planner with safety tier classification. See the MR-694 family.
V. Honesty — what the system tells operators
-
Estimate-vs-actual logging on every estimator. Cost models drift; calibration is a continuous process, not a one-off.
-
Operationally important state is observable. Index coverage, reconciler lag, cost-model accuracy — surfaced through the storage trait's
capabilities()and a unified observability API. -
Honest failure modes. Cost-model misses degrade gracefully (spill, partial-result, bounded abort). No silent OOM.
-
Per-query resource consumption is bounded and exposed. Memory cap, wall-clock timeout, max-rows-scanned, max-fragments-scanned. Operators respect them; bounds exposed via explain.
-
Plans are explainable. Every executed query can be inspected as IR + physical plan + cost annotations. No "you'd have to read the source to know what this does." See MR-684.
VI. Database guarantees — what OmniGraph promises as a system of record
These are user-visible commitments. They state what the engine guarantees and what it does not. For an "agent-native system of record," credibility lives here.
Specific defaults (timeout values, memory caps, TTL windows) are configuration, not invariants — see docs/constants.md and per-deployment configuration. The invariant is that bounds and contracts exist, not their numerical values.
-
Atomicity is per-query. Every
.gqquery is atomic — multi-statement mutations are all-or-nothing via the substrate's atomic-commit primitive. No cross-queryBEGIN/COMMIT; branches and merges fill that role for agent workflows. -
Schema integrity is strict at commit. Type validation, required-field presence (auto-filled from
@defaultif declared), uniqueness across batches and versions, and referential integrity — all enforced before commit succeeds. Per-write softening flags are opt-in, never default. Status: aspirational — referential integrity at scale requires SIP-backed cross-table validation; not yet implemented. Cross-batch / cross-version uniqueness tracked in MR-714. -
Isolation: per-query snapshot; read-your-writes within and across queries in a session. Each query reads from one consistent manifest version. Within a multi-statement mutation, the read subplan inside each write operator sees the writes from earlier statements. Across queries in a session, reads always resolve the latest manifest version — no reader pinning to older snapshots. Status: upheld for inserts/updates after MR-794 step 2+ —
MutationStaging's in-memory accumulator +TableStore::scan_with_pending(DataFusionMemTableunion with the committed Lance scan, with merge-shadow semantics for chained updates) implements read-your-writes within a multi-statement mutation. Delete-touching mutations are limited to delete-only by parse-time D₂; closing the within-query RYW gap for deletes requires Lance's two-phase delete API (tracked: MR-793 / Lance-upstream lance-format/lance#6658). The "Lance HEAD ahead of__manifest" drift class is unreachable for op-execution failures (the partial-failure test pins this), but a narrowed residual remains at the finalize→publisher boundary because Lance has no multi-dataset commit primitive — see docs/runs.md "Finalize → publisher residual". -
Durability before acknowledgement. Commit returns only after the substrate has confirmed durable persistence. No "fast" or "fire-and-forget" durability levels.
-
Causal consistency across sessions. If session A commits and session B subsequently reads, session B sees A's write. Single-coordinator: trivially via single-source manifest. Multi-coordinator: enforced via leader-for-writes plus session-token replica reads. Never weakened. Status: aspirational on the multi-coordinator side.
-
Determinism within a snapshot. Same query + same snapshot + same parameters → order-stable results (deterministic tie-breaks). Plan choice is deterministic given identical statistics. Cross-version determinism is best-effort, not guaranteed (statistics change, plans change). Status: aspirational — current code may rely on HashMap iteration in some paths.
-
Writes are idempotent under retry. Insert / Update / Merge take an explicit
on_conflictpolicy. Clients may provide an idempotency key on writes; the server deduplicates retries within a configurable TTL window. Schema migrations are idempotent under replay. Status: open —on_conflictpolicy lands with mutation IR (MR-737 Phase 8); idempotency-key TTL default is undecided. -
No silent data loss or corruption. Substrate-level checksums are trusted for storage integrity. Semantic-invariant checks at every commit catch higher-level cases (orphan edges, type drift, broken uniqueness). Every operation succeeds, fails loudly with cause, or degrades observably with metrics.
-
Every operation has a documented bound. "May run forever" is forbidden as a default. Defaults are configurable; the invariant is that bounds exist, are documented, and are enforced.
-
Failure scope is bounded. A failing query, fragment-level corruption, or background-task crash does not cascade. Per-table fragment isolation at the storage tier; per-query memory and timeout in the executor. Status: aspirational on the per-query side — per-query memory cap not yet enforced; planned with MR-737 Phase 7.
-
Crash recovery via the same code paths as steady-state. No special "recovery mode." On restart, the engine reads the manifest, finds the latest committed state, and resumes. Substrate atomicity ensures no partial writes survive.
-
Strong consistency by default; relaxation is per-query, never per-default. Strong (read-your-writes, monotonic, snapshot) is the default for every query. Eventual consistency is opt-in per read query for analytical workloads where staleness is acceptable. Never available on writes; always logged for audit. Status: aspirational — eventual-consistency opt-in flag tracked in MR-425.
VII. Current architectural patterns
These are how we realize the invariants today. They are committed conventions — until we explicitly revise them, new code follows them. They are not eternal: a future architecture review may replace any of these with a different mechanism that upholds the same invariants. The deny-list (§IX) protects them in the meantime.
-
Reconciler pattern for derivable state. Index coverage, statistics, anything derivable from manifest state — reconciled, not job-queued. Realizes the "don't maintain state parallel to the substrate" invariant. See MR-737 §5.16.
-
Polymorphism via Union, not per-feature lowering. Interfaces / wildcards / alternation on nodes and edges share one IR (
Polymorphism<T>) and one lowering (Union of per-type concrete plans). Realizes "shared mechanism for shared shape." See MR-737 §5.13. Status: aspirational — node interfaces in MR-579; edge wildcards in MR-744. -
Mutations wrap read subplans. Insert / Update / Delete / Merge are operators that consume read-shaped subplans. Same planner, same cost model, same storage trait. Realizes "writes share the planner with reads." See MR-737 §5.12. Status: aspirational — current mutation path is separate from reads.
-
SIP for cross-operator selectivity propagation. Producers publish ID bitmaps; downstream scans consume them through structured pushdown. Realizes "downstream operators prune via upstream selectivity." Status: aspirational — current code uses IN-list flattening in
Expand. -
Factorize multi-hop, flatten only at projection. Lists carry multiplicity through intermediate operators.
Flattenis inserted by the planner where required, not eagerly. Realizes "intermediate state shouldn't materialize cross-products eagerly." Status: aspirational — current code materializes cross-products eagerly. -
Stable row IDs as dense graph IDs. Don't maintain parallel string→u32 maps. Lance's stable row IDs are the substrate's identity layer; we use them directly. Realizes "use the substrate's identity layer." Status: aspirational — current code rebuilds
TypeIndexper query. -
Rank and score are columns. Retrieval operators emit
_score,_rank. Fusion operators consume rank-bearing batches. Realizes "rank/score is data, not metadata." Status: aspirational — current RRF runs the pipeline twice and discards rank. -
Policy as predicates. Authorization decisions are filter expressions injected into the planner, not enforcement at the API boundary. Realizes "authorization pushes down with other filters." Status: aspirational — Cedar enforcement currently at HTTP boundary only; tracked in MR-722 / MR-725.
-
Imports unify under
Source; transport is interchangeable. A singleSourceIR operator with provider variants (File, Flight, Lance, Stream) handles all imports. Lance-to-Lance is a fast-path that bypasses Arrow encode/decode. Realizes "external data sources share one operator surface." Status: aspirational — current loader is JSONL-only; tracked in MR-765.
VIII. Quality gates — every change passes
-
Tests at every boundary.
MemStoragefor engine tests; planner-only tests; executor-only tests with a stub storage. No layer tested only via end-to-end. -
Reference implementation per trait. Every trait has a primary impl (Lance for storage) and at least a test impl.
-
Documented capability surface. New capabilities are documented with what they advertise, who consumes them, how the planner uses them.
-
Benchmark before optimization. New optimizations land with a benchmark that motivates them; if the motivating workload doesn't exist, the feature waits.
IX. Anti-patterns — deny-list
If a proposal fits one of these, the burden is on the proposer to justify why this case is the exception.
Invariant violations (high bar to override)
- Custom WAL / transaction manager / buffer pool. Substrate owns these (§I.1).
- Wire-protocol-specific code in kernel crates. Kernel produces
Stream<RecordBatch>; transport adapters live at the server boundary only (§II.8). - In-process-only
Datasetimpls. Trait surfaces stay remote-friendly (§III.9). - State that drifts from the substrate / manifest. Derive from observable state (§I.3).
- Cross-query
BEGIN/COMMITtransactions. Branches replace them in OSS (§VI.23). - Acks before durable persistence. "Best-effort commit" is forbidden (§VI.26).
- Reads that see partial commits. Atomicity is non-negotiable (§VI.23).
- Operations without time bounds. Every operation has a documented timeout or backoff (§VI.31).
- "Recovery mode" code paths separate from steady-state. Recovery uses the same code as ordinary reads (§VI.33).
- Eventual consistency as a default. Strong is default; eventual is opt-in per query, never on writes (§VI.34).
- Schema migrations that are not idempotent under replay. Idempotency is required for replay safety (§VI.29).
- Plan choice that varies given identical input statistics. Determinism is required (§VI.28).
- HashMap iteration order in result ordering or plan choice. Use deterministic tie-breaks (§VI.28).
- Cost-blind plan choice. Lowering-order execution is not a planner.
- Hidden statistics. If a metric matters for plan choice, it must be exposed through the trait surface (§II.5).
- Side-channels for query semantics. Search modes, mutations, polymorphism, imports — all first-class IR concepts (§II.4).
- Hand-rolling something the substrate already does. Check the spec first (§I.1).
- Mutating in place state that should be immutable (Lance fragments, index segments). New segments instead.
- Silent failures. OOM, timeout, partial result — all surfaced and bounded (§V.20).
Pattern violations (overridable with justification)
These protect the current architectural patterns (§VII). A future review may revise them.
- Synchronous-inline index updates for indexes expensive to build (vector ANN, FTS). Reconciler pattern instead (§VII.35).
- Job queue for state derivable from manifest. Reconciler pattern instead (§VII.35).
- Per-feature lowering for shapes that share a structure (interfaces, wildcards, alternation). Use one mechanism (§VII.36).
- Per-format import code paths (one path for JSONL, another for Parquet, another for Flight). Use the
SourceIR operator (§VII.43). - Eager materialization of cross-products in multi-hop. Factorize (§VII.39).
- Ad-hoc
IN-list filtering when SIP fits (§VII.38). - String-flattened SQL filter generation when structured pushdown is available.
- Discarding rank in retrieval. Score and rank propagate as columns (§VII.41).
- Auto-creating placeholder nodes for orphan edges (silent invention of data). Reject by default; opt-in per write (§VI.24).
- Double-encoding data when both endpoints speak the same format (e.g., Lance → Arrow → Lance when both are Lance). Use a fast-path (§VII.43).
- Per-write durability fast paths until MemWAL is stable AND a use case justifies the latency vs. risk tradeoff.
X. Review checklist (use against any non-trivial change)
Print this when reviewing an RFC or PR. Each line is yes / no / N/A.
- Does it respect the substrate? (§I)
- Does it cross only one trait boundary per layer? (§II)
- Are capabilities and stats exposed for any new behavior? (§II.5)
- If touching the wire / transport surface, does kernel code stay protocol-agnostic? (§II.8)
- Do trait surfaces stay remote-friendly? (§III)
- Additive, not rewrite? Feature-flagged where behavior changes? (§IV)
- Any new estimator has estimate-vs-actual logging? (§V.18)
- Coverage / lag / budget metrics surfaced? (§V.19–21)
- Failure modes graceful, bounded, observable? (§V.20)
- Atomicity scope respected per query? (§VI.23)
- Schema integrity enforced strict at commit unless explicit opt-out? (§VI.24)
- Isolation level matches default (per-query snapshot, read-your-writes)? (§VI.25)
- Durability ack only after manifest commit? (§VI.26)
- Determinism preserved (order-stable, plan-deterministic)? (§VI.28)
- Idempotency: explicit
on_conflict; idempotency keys honored if used? (§VI.29) - Bounded operations: explicit timeout / memory / concurrency limits? (§VI.31)
- If touching imports / external data, does it go through
Source? (§VII.43) - If implementing a graph / retrieval feature: reuses an existing pattern (reconciler, Union, mutation-wrap-read, SIP, factorize, Source) where applicable? (§VII)
- Tests at every boundary, not just end-to-end? (§VIII.44)
- Reference impl + test impl for any new trait? (§VIII.45)
- None of the deny-list patterns apply? (§IX)
XI. Living document policy
This document is updated when:
- A new architectural decision establishes a new invariant — add it.
- An existing invariant is challenged and either reaffirmed (with the case sharpened) or revised (with explicit migration of any affected code).
- A new architectural pattern is adopted — add to §VII.
- A current pattern (§VII) is replaced — update or remove the entry; update the deny-list.
- A new anti-pattern surfaces in review and deserves a place on the deny-list — add it.
- An aspirational invariant becomes upheld — remove the status annotation.
- An open invariant is decided — record the decision and remove the status annotation.
Updates require the same review process as code. Adding to the deny-list (§IX) is cheap; removing or relaxing an invariant (§I–VI, VIII) requires explicit justification in the proposal. Replacing a pattern (§VII) requires a design discussion linking to the new pattern; until that lands, the existing pattern stays.
When an invariant is contested in the moment, the resolution path is: (a) state the case in the relevant RFC or PR; (b) link it from this document; (c) update this document if the resolution changes the rule.
XII. Source / origin
These invariants and patterns were extracted from the architectural decisions in:
- MR-737 — Query Engine v2 RFC (the kernel scope and seams)
- MR-744 — Edge wildcards / alternation (one cell of the polymorphic-bindings matrix)
- MR-765 — Arrow Flight transport (query, import, export)
- The schema migration program (MR-694 family — additive evolution, safety tiers, idempotent replay)
- The policy program (MR-722 / MR-725 — predicate pushdown)
- The reconciler / index-lifecycle work (MR-737 §5.16, MR-688, MR-679, MR-680)
- The factorization and SIP work (MR-737 §5.2, §5.3 — Kuzu / Ladybug inspiration)
- The polymorphic-bindings framing (MR-737 §5.13 — one mechanism for eight cells)
- The Source-operator framing (MR-737 §5.12 — one mechanism for all imports)
- The database-guarantees discussion (§VI): ACID dimensions, CAP-style consistency model, scale-system precedents (ClickHouse, Turbopuffer, LanceDB, Postgres). Each invariant in §VI corresponds to a specific named decision; see prior architecture discussions for the option space considered.
General precedent: Lance + LanceDB Enterprise architecture; ClickHouse merge subsystem; Kubernetes controllers; Postgres autovacuum; the FDAL stack (Flight + DataFusion + Arrow + Lance).
Adding a new invariant or pattern here means we've learned something — either from a hard call we made and want to preserve, or from a mistake we don't want to repeat. Both are worth recording.