mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-09 01:35:18 +02:00
docs/invariants: add §VI.35-37 + non-commitments for MR-686
Three new §VI invariants name what OmniGraph commits to as an agent-native system of record: branches as the cross-query coordination primitive, per-query isolation as a per-query opt-in (Serializable up, eventual down), and type-aware agent-resolvable merges. Plus an explicit non-commitments subsection so reviewers see what is intentionally out of scope (Strict Serializable across queries, cross-process linearizable single-object writes, auto-resolution of ambiguous merge conflicts). §VII and §VIII renumber by +3 to make room (35-43 -> 38-46, 44-47 -> 47-50); deny-list and review-checklist references in §IX/§X follow. testing.md's pre-existing stale §VII.33/34/36 references resolve to their actual §VIII.47/48/50 targets in the same pass. staged_writes.rs:866's docstring gains an MR-686 forward reference so the load-bearing concurrency-hazard test points readers at the queue work that closes the gap. §VI.34 is preserved alongside the broader §VI.36 to keep its MR-425 pointer addressable; the overlap is documented in §VI.36's status line. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
028b913d9a
commit
c12f6adb0c
4 changed files with 60 additions and 31 deletions
|
|
@ -860,6 +860,13 @@ async fn lance_restore_appends_one_commit_with_checked_out_content() {
|
|||
/// tables before invoking restore — otherwise this hazard becomes
|
||||
/// reachable during in-flight tenant traffic.
|
||||
///
|
||||
/// MR-686 introduces those per-(table_key, branch) writer queues as the
|
||||
/// application-layer mechanism that closes this hazard once continuous
|
||||
/// in-process recovery (MR-870) lands. Until MR-686's queue is wired into
|
||||
/// the recovery path, the open-time-only invocation strategy is the
|
||||
/// only thing keeping this hazard out of production. See
|
||||
/// `docs/invariants.md` §VI.30, §VI.32, §VI.33.
|
||||
///
|
||||
/// This test is the load-bearing constraint any future reconciler must
|
||||
/// honor.
|
||||
#[tokio::test]
|
||||
|
|
|
|||
|
|
@ -247,7 +247,7 @@ flowchart LR
|
|||
manual[called manually<br/>or from optimize]:::now
|
||||
end
|
||||
|
||||
subgraph roadmap[Roadmap — invariants §VII.35]
|
||||
subgraph roadmap[Roadmap — invariants §VII.38]
|
||||
rec[Reconciler<br/>observes manifest]:::future
|
||||
diff[coverage diff<br/>fragments − fragment_bitmap]:::future
|
||||
wp[worker pool<br/>builds index segments]:::future
|
||||
|
|
@ -258,7 +258,7 @@ flowchart LR
|
|||
rec --> diff --> wp
|
||||
```
|
||||
|
||||
Today, indexes are built explicitly via `ensure_indices`. Reads degrade gracefully when index coverage is partial — Lance's scanner unions indexed and scan paths automatically. The roadmap reconciler (per [`docs/invariants.md`](invariants.md) §VII.35) observes manifest state and converges coverage in the background.
|
||||
Today, indexes are built explicitly via `ensure_indices`. Reads degrade gracefully when index coverage is partial — Lance's scanner unions indexed and scan paths automatically. The roadmap reconciler (per [`docs/invariants.md`](invariants.md) §VII.38) observes manifest state and converges coverage in the background.
|
||||
|
||||
### Server / CLI
|
||||
|
||||
|
|
|
|||
|
|
@ -136,47 +136,64 @@ Specific defaults (timeout values, memory caps, TTL windows) are *configuration*
|
|||
34. **Strong consistency by default; relaxation is per-query, never per-default.** Strong (read-your-writes, monotonic, snapshot) is the default for every query. Eventual consistency is opt-in per read query for analytical workloads where staleness is acceptable. Never available on writes; always logged for audit.
|
||||
*Status: aspirational — eventual-consistency opt-in flag tracked in MR-425.*
|
||||
|
||||
35. **Branches are the cross-query coordination primitive.** Branches are cheap to create, fully isolated, per-branch SI, with durable queryable metadata (creator, intent, parent, fork point). Agents use branches for any multi-step coordination that needs atomicity beyond a single query. Lifecycle policies (TTL, auto-cleanup) are deployment configuration; the invariant is that branches *exist* as first-class durable objects with full SI parity to main.
|
||||
*Status: upheld. Lance shallow-clone gives cheap creation; per-branch SI is the same code path as main; metadata in `_refs/branches/{name}.json` already supports a queryable `metadata` map.*
|
||||
|
||||
36. **Per-query isolation is adjustable per-query, never per-default.** Default is Snapshot Isolation (§VI.25). Queries can opt **up** to Serializable for cross-table-invariant safety (`USING SERIALIZABLE`) or **down** to eventual consistency for analytical reads (`USING EVENTUAL`). Stricter than Serializable (Strict Serial / linearizable-across-queries) is **not offered**; branches (§VI.35) replace that role for high-stakes coordination. Stronger and weaker are both per-query opt-ins, never per-default.
|
||||
*Status: SI default upheld. Serializable opt-in aspirational — predicate revalidation under MR-686's per-(table, branch) queue is the implementation seam. Eventual-read opt-in aspirational — tracked in MR-425. Subsumes §VI.34 (which only covers the downgrade direction); §VI.34 is preserved for now to keep its MR-425 pointer addressable.*
|
||||
|
||||
37. **Merges are type-aware and agent-resolvable.** Branch merge resolution combines two layers. **Structural** (row-level last-write-wins by deterministic tie-break) is exact for sets of independent rows. **Semantic** (per-type policies declared in schema) handles CRDT-shaped operations: grow-only set, monotonic counter, last-writer-wins-with-timestamp, multi-valued register, first-writer-wins. Conflicts no policy resolves pause the merge with structured `MergeConflictKind` rows; agents produce resolution rows and resume. Auto-resolution never silently picks a side when policies are ambiguous.
|
||||
*Status: structural merge upheld via `OrderedTableCursor` + `StagedTableWriter`. Type-declared semantic policies aspirational. Pausable merges aspirational — current code fails on conflict, doesn't pause.*
|
||||
|
||||
### Explicit non-commitments
|
||||
|
||||
These are *not* part of the OmniGraph contract. Listed so reviewers and downstream users see what is intentionally out of scope.
|
||||
|
||||
- **Strict Serializable across queries.** Branches (§VI.35) are the replacement for cross-query strict-serial coordination.
|
||||
- **Cross-process linearizable single-object writes** in multi-coordinator deployments without explicit external coordination (Postgres advisory, S3 sentinel, leader election). §VI.27 multi-coordinator stays aspirational with a clear cost model.
|
||||
- **Automatic semantic conflict resolution.** §VI.37 is explicit: ambiguous conflicts always pause for agent or human resolution; auto-resolution requires a per-type policy.
|
||||
|
||||
## VII. Current architectural patterns
|
||||
|
||||
These are *how* we realize the invariants today. They are committed conventions — until we explicitly revise them, new code follows them. They are not eternal: a future architecture review may replace any of these with a different mechanism that upholds the same invariants. The deny-list (§IX) protects them in the meantime.
|
||||
|
||||
35. **Reconciler pattern for derivable state.** Index coverage, statistics, anything derivable from manifest state — reconciled, not job-queued. *Realizes the "don't maintain state parallel to the substrate" invariant.* See MR-737 §5.16.
|
||||
38. **Reconciler pattern for derivable state.** Index coverage, statistics, anything derivable from manifest state — reconciled, not job-queued. *Realizes the "don't maintain state parallel to the substrate" invariant.* See MR-737 §5.16.
|
||||
*Status: partial after MR-793 PR #70 — scalar index builds (BTree, Inverted) now route through the staged primitives `stage_create_*_index` + `commit_staged` instead of inline `create_*_index`; this is the building block. The reconciler pattern itself (background `IndexReconciler` task driven by manifest commits, removing synchronous index work from the publish path) is tracked in MR-848. Vector indices remain inline-commit until lance-format/lance#6666 ships.*
|
||||
|
||||
36. **Polymorphism via Union, not per-feature lowering.** Interfaces / wildcards / alternation on nodes and edges share one IR (`Polymorphism<T>`) and one lowering (Union of per-type concrete plans). *Realizes "shared mechanism for shared shape."* See MR-737 §5.13.
|
||||
39. **Polymorphism via Union, not per-feature lowering.** Interfaces / wildcards / alternation on nodes and edges share one IR (`Polymorphism<T>`) and one lowering (Union of per-type concrete plans). *Realizes "shared mechanism for shared shape."* See MR-737 §5.13.
|
||||
*Status: aspirational — node interfaces in MR-579; edge wildcards in MR-744.*
|
||||
|
||||
37. **Mutations wrap read subplans.** Insert / Update / Delete / Merge are operators that consume read-shaped subplans. Same planner, same cost model, same storage trait. *Realizes "writes share the planner with reads."* See MR-737 §5.12.
|
||||
40. **Mutations wrap read subplans.** Insert / Update / Delete / Merge are operators that consume read-shaped subplans. Same planner, same cost model, same storage trait. *Realizes "writes share the planner with reads."* See MR-737 §5.12.
|
||||
*Status: aspirational — current mutation path is separate from reads.*
|
||||
|
||||
38. **SIP for cross-operator selectivity propagation.** Producers publish ID bitmaps; downstream scans consume them through structured pushdown. *Realizes "downstream operators prune via upstream selectivity."*
|
||||
41. **SIP for cross-operator selectivity propagation.** Producers publish ID bitmaps; downstream scans consume them through structured pushdown. *Realizes "downstream operators prune via upstream selectivity."*
|
||||
*Status: aspirational — current code uses IN-list flattening in `Expand`.*
|
||||
|
||||
39. **Factorize multi-hop, flatten only at projection.** Lists carry multiplicity through intermediate operators. `Flatten` is inserted by the planner where required, not eagerly. *Realizes "intermediate state shouldn't materialize cross-products eagerly."*
|
||||
42. **Factorize multi-hop, flatten only at projection.** Lists carry multiplicity through intermediate operators. `Flatten` is inserted by the planner where required, not eagerly. *Realizes "intermediate state shouldn't materialize cross-products eagerly."*
|
||||
*Status: aspirational — current code materializes cross-products eagerly.*
|
||||
|
||||
40. **Stable row IDs as dense graph IDs.** Don't maintain parallel string→u32 maps. Lance's stable row IDs are the substrate's identity layer; we use them directly. *Realizes "use the substrate's identity layer."*
|
||||
43. **Stable row IDs as dense graph IDs.** Don't maintain parallel string→u32 maps. Lance's stable row IDs are the substrate's identity layer; we use them directly. *Realizes "use the substrate's identity layer."*
|
||||
*Status: aspirational — current code rebuilds `TypeIndex` per query.*
|
||||
|
||||
41. **Rank and score are columns.** Retrieval operators emit `_score`, `_rank`. Fusion operators consume rank-bearing batches. *Realizes "rank/score is data, not metadata."*
|
||||
44. **Rank and score are columns.** Retrieval operators emit `_score`, `_rank`. Fusion operators consume rank-bearing batches. *Realizes "rank/score is data, not metadata."*
|
||||
*Status: aspirational — current RRF runs the pipeline twice and discards rank.*
|
||||
|
||||
42. **Policy as predicates.** Authorization decisions are filter expressions injected into the planner, not enforcement at the API boundary. *Realizes "authorization pushes down with other filters."*
|
||||
45. **Policy as predicates.** Authorization decisions are filter expressions injected into the planner, not enforcement at the API boundary. *Realizes "authorization pushes down with other filters."*
|
||||
*Status: aspirational — Cedar enforcement currently at HTTP boundary only; tracked in MR-722 / MR-725.*
|
||||
|
||||
43. **Imports unify under `Source`; transport is interchangeable.** A single `Source` IR operator with provider variants (File, Flight, Lance, Stream) handles all imports. Lance-to-Lance is a fast-path that bypasses Arrow encode/decode. *Realizes "external data sources share one operator surface."*
|
||||
46. **Imports unify under `Source`; transport is interchangeable.** A single `Source` IR operator with provider variants (File, Flight, Lance, Stream) handles all imports. Lance-to-Lance is a fast-path that bypasses Arrow encode/decode. *Realizes "external data sources share one operator surface."*
|
||||
*Status: aspirational — current loader is JSONL-only; tracked in MR-765.*
|
||||
|
||||
## VIII. Quality gates — every change passes
|
||||
|
||||
44. **Tests at every boundary.** `MemStorage` for engine tests; planner-only tests; executor-only tests with a stub storage. No layer tested only via end-to-end.
|
||||
47. **Tests at every boundary.** `MemStorage` for engine tests; planner-only tests; executor-only tests with a stub storage. No layer tested only via end-to-end.
|
||||
|
||||
45. **Reference implementation per trait.** Every trait has a primary impl (Lance for storage) and at least a test impl.
|
||||
48. **Reference implementation per trait.** Every trait has a primary impl (Lance for storage) and at least a test impl.
|
||||
*Status: partial after MR-793 PR #70 — `TableStorage` (the engine-internal staged-write trait, sealed) has its primary impl on `TableStore` (Lance-backed). The trait's signatures use opaque `SnapshotHandle` / `StagedHandle` types so a future test impl (e.g., `MemStorage`) can land without changing call sites. No test impl yet; `tempfile::tempdir()` + Lance is the de-facto test substrate today (see [docs/testing.md](testing.md)).*
|
||||
|
||||
46. **Documented capability surface.** New capabilities are documented with what they advertise, who consumes them, how the planner uses them.
|
||||
49. **Documented capability surface.** New capabilities are documented with what they advertise, who consumes them, how the planner uses them.
|
||||
|
||||
47. **Benchmark before optimization.** New optimizations land with a benchmark that motivates them; if the motivating workload doesn't exist, the feature waits.
|
||||
50. **Benchmark before optimization.** New optimizations land with a benchmark that motivates them; if the motivating workload doesn't exist, the feature waits.
|
||||
|
||||
## IX. Anti-patterns — deny-list
|
||||
|
||||
|
|
@ -203,21 +220,23 @@ If a proposal fits one of these, the burden is on the proposer to justify why th
|
|||
- **Hand-rolling something the substrate already does.** Check the spec first (§I.1).
|
||||
- **Mutating in place** state that should be immutable (Lance fragments, index segments). New segments instead.
|
||||
- **Silent failures.** OOM, timeout, partial result — all surfaced and bounded (§V.20).
|
||||
- **Strict-serial coordination expressed as locks held across queries.** Branches are the agent-native primitive for that (§VI.35).
|
||||
- **Auto-resolving merge conflicts when the per-type policy is silent or absent.** Pause and surface the conflict; never silently pick a side (§VI.37).
|
||||
|
||||
### Pattern violations (overridable with justification)
|
||||
|
||||
These protect the *current* architectural patterns (§VII). A future review may revise them.
|
||||
|
||||
- **Synchronous-inline index updates** for indexes expensive to build (vector ANN, FTS). Reconciler pattern instead (§VII.35).
|
||||
- **Job queue for state derivable from manifest.** Reconciler pattern instead (§VII.35).
|
||||
- **Per-feature lowering for shapes that share a structure** (interfaces, wildcards, alternation). Use one mechanism (§VII.36).
|
||||
- **Per-format import code paths** (one path for JSONL, another for Parquet, another for Flight). Use the `Source` IR operator (§VII.43).
|
||||
- **Eager materialization of cross-products** in multi-hop. Factorize (§VII.39).
|
||||
- **Ad-hoc `IN`-list filtering** when SIP fits (§VII.38).
|
||||
- **Synchronous-inline index updates** for indexes expensive to build (vector ANN, FTS). Reconciler pattern instead (§VII.38).
|
||||
- **Job queue for state derivable from manifest.** Reconciler pattern instead (§VII.38).
|
||||
- **Per-feature lowering for shapes that share a structure** (interfaces, wildcards, alternation). Use one mechanism (§VII.39).
|
||||
- **Per-format import code paths** (one path for JSONL, another for Parquet, another for Flight). Use the `Source` IR operator (§VII.46).
|
||||
- **Eager materialization of cross-products** in multi-hop. Factorize (§VII.42).
|
||||
- **Ad-hoc `IN`-list filtering** when SIP fits (§VII.41).
|
||||
- **String-flattened SQL filter generation** when structured pushdown is available.
|
||||
- **Discarding rank in retrieval.** Score and rank propagate as columns (§VII.41).
|
||||
- **Discarding rank in retrieval.** Score and rank propagate as columns (§VII.44).
|
||||
- **Auto-creating placeholder nodes for orphan edges** (silent invention of data). Reject by default; opt-in per write (§VI.24).
|
||||
- **Double-encoding data when both endpoints speak the same format** (e.g., Lance → Arrow → Lance when both are Lance). Use a fast-path (§VII.43).
|
||||
- **Double-encoding data when both endpoints speak the same format** (e.g., Lance → Arrow → Lance when both are Lance). Use a fast-path (§VII.46).
|
||||
- **Per-write durability fast paths** until MemWAL is stable AND a use case justifies the latency vs. risk tradeoff.
|
||||
|
||||
## X. Review checklist (use against any non-trivial change)
|
||||
|
|
@ -240,10 +259,12 @@ Print this when reviewing an RFC or PR. Each line is **yes / no / N/A**.
|
|||
- Determinism preserved (order-stable, plan-deterministic)? (§VI.28)
|
||||
- Idempotency: explicit `on_conflict`; idempotency keys honored if used? (§VI.29)
|
||||
- Bounded operations: explicit timeout / memory / concurrency limits? (§VI.31)
|
||||
- If touching imports / external data, does it go through `Source`? (§VII.43)
|
||||
- If proposing cross-query strict-serial coordination, is it expressed via branches rather than long-held locks? (§VI.35)
|
||||
- If touching merge resolution, are silent-pick paths explicitly absent? (§VI.37)
|
||||
- If touching imports / external data, does it go through `Source`? (§VII.46)
|
||||
- If implementing a graph / retrieval feature: reuses an existing pattern (reconciler, Union, mutation-wrap-read, SIP, factorize, Source) where applicable? (§VII)
|
||||
- Tests at every boundary, not just end-to-end? (§VIII.44)
|
||||
- Reference impl + test impl for any new trait? (§VIII.45)
|
||||
- Tests at every boundary, not just end-to-end? (§VIII.47)
|
||||
- Reference impl + test impl for any new trait? (§VIII.48)
|
||||
- None of the deny-list patterns apply? (§IX)
|
||||
|
||||
## XI. Living document policy
|
||||
|
|
@ -276,6 +297,7 @@ These invariants and patterns were extracted from the architectural decisions in
|
|||
- The polymorphic-bindings framing (**MR-737 §5.13** — one mechanism for eight cells)
|
||||
- The Source-operator framing (**MR-737 §5.12** — one mechanism for all imports)
|
||||
- The database-guarantees discussion (§VI): ACID dimensions, CAP-style consistency model, scale-system precedents (ClickHouse, Turbopuffer, LanceDB, Postgres). Each invariant in §VI corresponds to a specific named decision; see prior architecture discussions for the option space considered.
|
||||
- **MR-686** — Per-table writer queues and per-actor admission. Source for §VI.35–37 and the explicit non-commitments subsection (MR-686's queue is the seam that makes Serializable opt-in implementable, and the reason §VI.27 multi-coordinator stays aspirational).
|
||||
|
||||
General precedent: Lance + LanceDB Enterprise architecture; ClickHouse merge subsystem; Kubernetes controllers; Postgres autovacuum; the FDAL stack (Flight + DataFusion + Arrow + Lance).
|
||||
|
||||
|
|
|
|||
|
|
@ -1,6 +1,6 @@
|
|||
# Testing
|
||||
|
||||
This file is the always-on map of the test surface. **Consult it before every task** so you know what tests already cover the area you're about to change, what helpers to reuse, and where a new test belongs. The architectural invariant *"tests at every boundary, not just end-to-end"* lives in [docs/invariants.md §VII.33](invariants.md).
|
||||
This file is the always-on map of the test surface. **Consult it before every task** so you know what tests already cover the area you're about to change, what helpers to reuse, and where a new test belongs. The architectural invariant *"tests at every boundary, not just end-to-end"* lives in [docs/invariants.md §VIII.47](invariants.md).
|
||||
|
||||
## Where tests live, per crate
|
||||
|
||||
|
|
@ -46,7 +46,7 @@ The engine's `tests/` is the principal coverage surface; most graph-shaped behav
|
|||
- **CLI** — `crates/omnigraph-cli/tests/support/mod.rs`: `Command`-style wrapper for invoking `omnigraph`, server-process spawning, fixture resolution, output assertion helpers.
|
||||
- **Server** — no shared helpers; server tests call the `Omnigraph` engine API directly and exercise endpoints over the wire.
|
||||
|
||||
> Note: there is **no `MemStorage` or in-memory backend** today. Tests use `tempfile::tempdir()` for local FS. If you find yourself needing one for layer isolation, that's an architectural ask — see [docs/invariants.md §VII.34](invariants.md) (reference impl + test impl per trait).
|
||||
> Note: there is **no `MemStorage` or in-memory backend** today. Tests use `tempfile::tempdir()` for local FS. If you find yourself needing one for layer isolation, that's an architectural ask — see [docs/invariants.md §VIII.48](invariants.md) (reference impl + test impl per trait).
|
||||
|
||||
## Failpoints (fault injection)
|
||||
|
||||
|
|
@ -72,7 +72,7 @@ Locally, set `OMNIGRAPH_S3_TEST_BUCKET` (and the usual `AWS_*` vars including `A
|
|||
## Examples & benches
|
||||
|
||||
- `crates/omnigraph/examples/bench_expand.rs` — runnable example (not part of CI).
|
||||
- No `benches/` directories. The architectural rule [docs/invariants.md §VII.36](invariants.md) requires benchmark motivation before optimization, so add `benches/` per crate when you ship a perf-driven change.
|
||||
- No `benches/` directories. The architectural rule [docs/invariants.md §VIII.50](invariants.md) requires benchmark motivation before optimization, so add `benches/` per crate when you ship a perf-driven change.
|
||||
|
||||
## Coverage tooling — what's missing
|
||||
|
||||
|
|
@ -104,9 +104,9 @@ When you pick up any change, walk through this:
|
|||
2. **Run those tests locally before editing.** `cargo test --workspace --locked` for the broad pass; `-p <crate> --test <file>` for a focused loop. Confirm a clean baseline.
|
||||
3. **Decide extend-vs-new** explicitly. If you can extend an existing test (assertion, fixture row, parameterization), do that. Only add a new test fn or new file if no existing one owns the area.
|
||||
4. **Reuse the helpers.** `init_and_load()`, fixture files, the CLI `support` harness — re-use them. Don't bootstrap a fresh repo by hand if a helper exists.
|
||||
5. **Mind the boundary.** Per [docs/invariants.md §VII.33](invariants.md), test at the layer the change lives at — planner-level changes deserve planner-level tests, not just end-to-end.
|
||||
5. **Mind the boundary.** Per [docs/invariants.md §VIII.47](invariants.md), test at the layer the change lives at — planner-level changes deserve planner-level tests, not just end-to-end.
|
||||
6. **For substrate-touching changes** (Lance behavior), reach for `failpoints` or fixture-driven scenarios, not stubbed-out mocks.
|
||||
7. **For server / API changes**, confirm the OpenAPI regeneration happens in `openapi.rs` and that the diff lands in `openapi.json`.
|
||||
8. **Verify your change makes an existing test fail before it makes the new one pass.** If you can break the code without breaking a test, your coverage gap is the problem to fix first.
|
||||
|
||||
When in doubt, re-read [docs/invariants.md §VII](invariants.md) — quality gates apply to every change.
|
||||
When in doubt, re-read [docs/invariants.md §VIII](invariants.md) — quality gates apply to every change.
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue