Merge branch 'main' into ragnorc/omnigraph-mcp-crate

Bring the MCP feature branch up to date with main (14 commits). One
conflict — compiler/parser.rs: main's `NanoError` → `CompilerError` rename
vs this branch's `@mcp` / per-param `@description` parser additions; resolved
by keeping the new parsing under the renamed error type. The CLI `queries list`
change (#280, surfacing `@description`/`@instruction`) auto-merged with this
branch's `mcp_expose`/`tool_name` columns.
This commit is contained in:
Ragnor Comerford 2026-06-19 21:59:14 +02:00
commit fbf455a250
No known key found for this signature in database
110 changed files with 6396 additions and 2511 deletions

View file

@ -8,10 +8,9 @@ This page explains what the policy says and how to change it.
| Setting | Value | Why |
|---|---|---|
| **Required status checks (strict)** | `Classify Changes`, `Check AGENTS.md Links`, `Test omnigraph-server --features aws`, `CODEOWNERS matches source`, `CODEOWNERS not hand-edited` | Every PR must pass the AWS-feature build/test, AGENTS.md link integrity, and the CODEOWNERS hygiene checks. **`Test Workspace` is deliberately NOT required** — it runs only on push to `main` (post-merge), tags, and manual `workflow_dispatch`, to keep PR turnaround fast (it was the ~15min+ slow gate). It is therefore *not* listed here: a required check that never reports on PRs (the `test` job is `if: github.event_name != 'pull_request'`) would leave every PR permanently pending — the same job-never-reports trap the CODEOWNERS contexts call out below. The trade-off (a regression lands on `main` and is caught by the post-merge run, so `main` can briefly go red) and its mitigations are documented in [ci.md](ci.md). The two CODEOWNERS contexts must equal the job `name:` values in `.github/workflows/codeowners.yml` **verbatim** — a context naming a job that never reports (the old `CODEOWNERS / drift` used the job *id*, and the job was path-filtered) leaves every PR permanently pending and forces admin overrides. `strict: true` requires the branch to be up-to-date with `main` before merge. |
| **Required approving reviews** | `1` | At least one reviewer. With a 2-person team, going higher would block all merges when one person is unavailable. |
| **Require code-owner reviews** | `true` | The reviewer must be a code owner per `.github/CODEOWNERS`. This is what makes the codeowners chassis enforced. |
| **Dismiss stale reviews on new commits** | `true` | A push after approval invalidates the prior review. Prevents the "approve, then sneak in unreviewed changes" pattern. |
| **Required status checks (strict)** | `Classify Changes`, `Check AGENTS.md Links`, `Test omnigraph-server --features aws` | Every PR must pass the AWS-feature build/test and AGENTS.md link integrity. **`Test Workspace` is deliberately NOT required** — it runs only on push to `main` (post-merge), tags, and manual `workflow_dispatch`, to keep PR turnaround fast (it was the ~15min+ slow gate). It is therefore *not* listed here: a required check that never reports on PRs (the `test` job is `if: github.event_name != 'pull_request'`) would leave every PR permanently pending — the job-never-reports trap. The trade-off (a regression lands on `main` and is caught by the post-merge run, so `main` can briefly go red) and its mitigations are documented in [ci.md](ci.md). Each required context must equal a job `name:` that actually reports on PRs **verbatim** — a context naming a job that never reports leaves every PR permanently pending and forces admin overrides. `strict: true` requires the branch to be up-to-date with `main` before merge. |
| **Required approving reviews** | `0` | No human-review gate. With a 2-person team where both maintainers own everything, requiring an approval meant every PR needed the *other* person (or an admin/bypass override) — friction with no real review value. CI checks are the gate; maintainers merge their own PRs once checks pass. Raise this to `1` if an outside-contributor flow ever needs a review gate. |
| **Require code-owner reviews** | `false` | CODEOWNERS was removed entirely (see the git history of `.github/`); there is no code-owner review requirement. |
| **Require linear history** | `true` | No merge commits — squash or rebase only. Matches recent practice. |
| **Disallow force pushes** | `true` | No history rewrites on `main`. |
| **Disallow branch deletions** | `true` | `main` cannot be deleted. |
@ -57,7 +56,7 @@ Outputs the live policy. Compare against `.github/branch-protection.json` to det
- **Audit trail**: `git log .github/branch-protection.json` shows every change with a reviewable diff and a merge commit.
- **Disaster recovery**: if branch protection is accidentally removed or weakened via the UI, the JSON is the canonical recovery point.
- **Consistency**: pairs with `.github/codeowners-roles.yml` (the CODEOWNERS source of truth). Repository policy lives in the repository.
- **Consistency**: repository policy lives in the repository, reviewed like code.
## What this gates
@ -65,11 +64,11 @@ After branch protection is applied, every PR targeting `main` must:
1. Pass all listed status checks.
2. Be up-to-date with `main` (rebase or merge-from-main).
3. Have at least one approving review from a code owner for the touched paths.
4. Have all review conversations resolved.
5. Be squash- or rebase-merged (no merge commits).
3. Have all review conversations resolved.
4. Be squash- or rebase-merged (no merge commits).
Even repository admins are subject to these rules.
No human approval is required (`required_approving_review_count: 0`). Repository
admins can override the gates (`enforce_admins: false`).
## Subsequent hardening (not in this PR)
@ -77,7 +76,7 @@ The branch-protection policy is the foundation. Future hardening adds:
- **Required signed commits** (`required_signatures: true`) — once maintainers enroll GPG/SSH signing.
- **Tag protection** for `v*` tags via `repos/.../tags/protection`.
- **Required reviewers from specific teams** for high-leverage paths (e.g., `docs/dev/invariants.md`) via CODEOWNERS tier expansion + the N-unique-approvers CI workaround.
- **Required reviewers from specific teams** for high-leverage paths (e.g., `docs/dev/invariants.md`) via a GitHub ruleset's path-scoped required-review rule, if a review gate is ever reintroduced.
- **More required CI checks**: `cargo deny`, `cargo audit`, `cargo fmt --check`, `cargo clippy -D warnings`, CodeQL, secret scanning, schema-lint (MR-946).
See the hardening playbook for the full plan.

217
docs/dev/bug-case-fix.md Normal file
View file

@ -0,0 +1,217 @@
# Bug case study: camelCase property filters lowercased at runtime
**Issue:** [#283](https://github.com/ModernRelay/omnigraph/issues/283) (mirrored
in the dev-graph as `iss-990`)
**Reported on:** 0.7.0 (release binary)
**Status of code:** present on `v0.7.0`; fixed on branch `fix/iss-283-camelcase-filter` (read pushdown + pending mutation scan)
**Severity:** correctness — a valid, lint-clean query fails at run time.
## Symptom
A read query that filters on a **camelCase** schema field lints and plans
cleanly but fails when it executes:
```text
No field named reponame. Column names are case sensitive.
```
Minimal repro:
```pg
node SourceDocument {
repoName: String @index
}
```
```gq
query find($repoName: String) {
match { $d: SourceDocument { repoName: $repoName } }
return { $d.repoName }
}
```
`omnigraph lint` passes; running the query errors. The operator workaround is to
rename the field to all-lowercase (`repo`), which is why this looked like a
schema-design quirk rather than an engine bug.
## Root cause
The filter-pushdown path builds the Lance scan predicate's column reference with
`datafusion::prelude::col(property)`:
- **Site:** `crates/omnigraph/src/exec/query.rs``ir_expr_to_expr`:
```rust
IRExpr::PropAccess { property, .. } => Some(col(property)),
```
- `col(&str)` runs DataFusion's SQL **identifier normalization**
(`Column::from_qualified_name``parse_identifiers_normalized(.., false)`),
which **lowercases unquoted identifiers**. So `col("repoName")` resolves to a
column named `reponame`.
- Lance stores columns **case-preserved** (`repoName`) and resolves them
case-sensitively, so the scan can't find `reponame` and errors.
The IR is not at fault: the parser and lowering preserve the original case
(`property: pm.prop_name.clone()`), which is exactly why the compiler resolves
`repoName` and **lint passes**. The case is destroyed only at the
engine → Lance boundary.
There is a **second** boundary with the same root cause but a *different*
parser: the pending-batch scan in `table_store.rs::scan_pending_batches` splices
the mutation predicate string into a DataFusion `SELECT … WHERE {filter}` over a
`MemTable`, and DataFusion's SQL parser lowercases the unquoted column the same
way (`repoName``reponame`). See **Part 2** of the fix — it surfaces only on a
*chained* mutation that re-reads the pending side, which is why a single
update/delete on a camelCase predicate looked fine.
### Why the rest of the engine is unaffected
The two pushdown sites above were the offenders; the remaining paths already
treat column names case-sensitively and handle camelCase correctly:
- **Projection / return** uses the real Arrow field name (`f.name()`).
- **In-memory filtering** (the fallback for non-pushable predicates) looks the
column up by the preserved property name against the batch schema.
- **The committed Lance mutation scan** (`Scanner::filter(&str)`) preserves an
unquoted identifier's case, so committed-row matching on a camelCase predicate
already worked.
So the read bug surfaces for predicates that *are* pushed down (e.g. an equality
on a scalar camelCase column), and the mutation bug only for the pending-side
re-scan of a chained mutation.
### Why it slipped through
The `ir_filter_to_expr` unit tests only use the all-lowercase field `count`, so
no test exercised a camelCase property. Nothing in CI compared the emitted
column name against the schema's casing.
## Fix
There are **two** engine→Lance boundaries that lose case, and they need
**different** fixes because the two consumers disagree on quoting semantics.
### Part 1 — read pushdown (`exec/query.rs`, `ir_expr_to_expr`)
Use DataFusion's case-preserving column constructor, `ident()`, instead of
`col()`:
```rust
IRExpr::PropAccess { property, .. } => Some(datafusion::prelude::ident(property)),
```
`ident()` builds `Expr::Column(Column::new_unqualified(property))` with no SQL
parse and no normalization, so the case is preserved. Property references here
are always bare column names (the variable is dropped via `..`), so there is no
qualified-name (`a.b`) handling to lose.
This is the right layer and the right shape:
- It is a **no-op for the lowercase columns that work today** (`slug`, `id`,
`status`, …) — lowercasing those was already a no-op — so there is no
regression risk for the common case.
- It makes pushdown **consistent** with projection and in-memory filtering,
which already use case-preserved names.
- It also restores **index use** for camelCase columns: today such a filter
errors before the BTREE is even considered.
### Part 2 — pending mutation scan (`table_store.rs`, `scan_pending_batches`)
`update`/`delete` predicates lower through `predicate_to_sql(..)` into a single
**SQL string** (`format!("{} {} {}", column, op, value_sql)`). That one string
is consumed by **two** different parsers, and *they disagree on what quoting
means*:
- The **committed** side passes the string to Lance's `Scanner::filter(&str)`.
Lance **preserves an unquoted identifier's case** (so unquoted camelCase
*already works* on the committed scan) but treats a double-quoted `"col"` as a
**string literal**`"repoName" = 'acme'` parses as `'repoName' = 'acme'`,
a constant-false predicate that silently matches **zero** committed rows.
- The **pending** side splices the same string into a DataFusion
`SELECT … FROM pending WHERE {filter}` over a `MemTable`. DataFusion's SQL
parser **lowercases** an unquoted identifier (`repoName``reponame`) and
fails to resolve against the case-sensitive `MemTable` schema.
So no single quoting choice for the column satisfies both: quoting fixes the
pending side but breaks the committed side, and vice versa. The fix keeps the
predicate **unquoted** (what the committed Lance scan needs) and makes the
*pending* context case-preserving instead, by disabling SQL identifier
normalization on its `SessionContext`:
```rust
let mut config = SessionConfig::new();
config.options_mut().sql_parser.enable_ident_normalization = false;
let ctx = SessionContext::new_with_config(config);
```
`predicate_to_sql` itself never lowercased anything (it copies the preserved
property name), so its emitted string is unchanged — it gains only a comment
recording the unquoted contract. The projection list in the same function is
already double-quoted and is unaffected (quoted identifiers are case-preserved
under either normalization setting).
Rejected alternatives: banning/normalizing camelCase at the compiler (a real
usability regression — camelCase fields are legitimate), lowercasing column
names in storage (a breaking on-disk change), merely making lint *warn* (a
band-aid that leaves the runtime broken), or **quoting the column in
`predicate_to_sql`** (empirically breaks 7 existing lowercase-column mutation
tests because Lance reads `"col"` as a string literal — see Part 2).
## Scope and caveats
- **Not Windows-specific.** The original report's environment was Windows, but
the cause is platform-independent.
- **The mutation path was only *partially* broken, and not where first
assumed.** The committed side of `scan_with_pending(..)` (Lance
`Scanner::filter(&str)`) and `delete`'s `delete_where(..)` / `Dataset::delete`
preserve an unquoted identifier's case, so a *single* `update`/`delete` on a
camelCase predicate already worked. Only the **pending** side — the in-memory
`MemTable` re-scan that a *chained* mutation hits — lowercased the column.
This was confirmed empirically: a single update+delete on `repoName` passes
unfixed; a chained update that re-reads the pending side fails with
`No field named reponame`. The fix is Part 2 above (disable identifier
normalization on the pending `SessionContext`), **not** quoting the column.
The eventual MR-A migration (`delete_where` → Lance 7
`DeleteBuilder::execute_uncommitted`, structured `Expr`) is the longer-term
shape but is out of scope here.
- **Check the coercion lookup.** Adjacent to the fix, the literal-coercion step
(`prop_data_type(.., schema)`, which keeps the BTREE usable) also resolves the
column by name. Confirm it uses the preserved name; if it mishandles case a
camelCase filter would resolve but lose its index — a silent perf regression,
not a crash.
- **Do not use `col(r#""repoName""#)` as the general read-path fix.** Quoting
would preserve this one name, but it routes through SQL identifier parsing and
changes qualified-name semantics. The IR property here is already a bare
column name, so `ident(property)` / `Column::new_unqualified(property)` is the
precise structured expression.
- **Do not "fix" the mutation string by quoting the column.** It is tempting to
reuse a `quote_ident` helper symmetric with `literal_to_sql`'s value escaping,
but the column quote-rules differ between the two consumers of the predicate
string: Lance's `Scanner::filter(&str)` reads `"col"` as a *string literal*
(silently matching nothing), while DataFusion's `ctx.sql` reads it as a
case-preserved identifier. Because the committed Lance scan already preserves
the *unquoted* identifier's case, the column must stay unquoted and the
pending DataFusion context must be told not to normalize — not the reverse.
## Validation (test-first)
1. **Red:** add an `ir_filter_to_expr` test asserting the emitted
`Expr::Column` name for a camelCase property is `repoName`, not `reponame`.
Fails on current code.
2. **Green:** apply the `col``ident` change (Part 1) and the pending-context
`enable_ident_normalization = false` change (Part 2).
3. **End-to-end:** a camelCase `@index` field with
`match { T { camelField: $x } }` returns the row (the unit test alone can't
catch an engine↔Lance boundary regression).
4. **Mutation parity:** with the same camelCase field, cover:
- `update T where camelField == $x set otherField = ...` updates the intended
row.
- `delete T where camelField == $x` deletes the intended row and cascades as
expected.
- A chained update that hits the pending side of `scan_with_pending` still
works, so both the committed Lance scan and pending DataFusion `MemTable`
predicate paths are case-preserving.
5. **Index preservation:** keep or add a plan/trace assertion that the
camelCase `@index` equality predicate still reaches the scalar-index path.
A result-only test can pass while silently falling back to a full scan.
6. Run the full engine suite (`cargo test -p omnigraph-engine`) — in particular
the existing BTREE index-eligibility tests, which `ident()` must not disturb.

View file

@ -3,7 +3,7 @@
`.github/workflows/`:
- **ci.yml**: text-only changes skip; otherwise `cargo test --workspace --locked` on ubuntu-latest with protobuf compiler. OpenAPI-drift check that auto-commits the regenerated `openapi.json` for same-repository PRs. Also runs the AGENTS.md cross-link integrity check (`scripts/check-agents-md.sh`).
- **`Test Workspace` does not run on pull requests.** The job is gated `if: github.event_name != 'pull_request'`, so the full workspace + failpoints suite runs only on push to `main` (post-merge), on `v*` tags, and on manual `workflow_dispatch`. This was a deliberate PR-latency trade-off — it was the slowest gate (~15min warm, up to the 75min cold ceiling). `RustFS S3 Integration` `needs: test`, so it is push-/dispatch-only for the same reason. The fast PR gates remain: `Classify Changes`, `Check AGENTS.md Links`, `Test omnigraph-server --features aws`, and the two CODEOWNERS checks. `Test Workspace` is correspondingly **not** in the required-check list (`.github/branch-protection.json`); see [branch-protection.md](branch-protection.md).
- **`Test Workspace` does not run on pull requests.** The job is gated `if: github.event_name != 'pull_request'`, so the full workspace + failpoints suite runs only on push to `main` (post-merge), on `v*` tags, and on manual `workflow_dispatch`. This was a deliberate PR-latency trade-off — it was the slowest gate (~15min warm, up to the 75min cold ceiling). `RustFS S3 Integration` `needs: test`, so it is push-/dispatch-only for the same reason. The fast PR gates remain: `Classify Changes`, `Check AGENTS.md Links`, and `Test omnigraph-server --features aws`. `Test Workspace` is correspondingly **not** in the required-check list (`.github/branch-protection.json`); see [branch-protection.md](branch-protection.md).
- **Consequences to internalize:** (1) a regression that the suite would catch now lands on `main` and turns the post-merge run red, rather than being blocked pre-merge — `main` can briefly break, so run `cargo test --workspace --locked` locally before merging anything non-trivial, or trigger this workflow on your branch via the Actions "Run workflow" button. (2) `openapi.json` is no longer auto-regenerated on PRs (that step is inside the `test` job); for server/API changes, regenerate it locally with `OMNIGRAPH_UPDATE_OPENAPI=1 cargo test -p omnigraph-server --test openapi` and commit it, or the strict drift check fails the post-merge `main` run.
- **Applying this policy:** removing `Test Workspace` from the JSON is inert until an admin runs `./scripts/apply-branch-protection.sh`. **Run it immediately after this change merges** — until then GitHub still requires a `Test Workspace` context that no longer reports on PRs, which leaves every open PR permanently pending (the job-never-reports trap).
- **AWS feature build job**: `cargo build/test -p omnigraph-server --features aws` on ubuntu-latest.

View file

@ -1,58 +0,0 @@
# Code ownership
`.github/CODEOWNERS` is **generated** — not hand-edited. The source of truth is `.github/codeowners-roles.yml`, expanded by `.github/scripts/render-codeowners.py`. CI rejects drift between the two and rejects direct edits to `CODEOWNERS` that don't accompany a yml change.
This setup gives every role change a reviewable PR and a permanent in-repository audit trail (`git log .github/codeowners-roles.yml`).
## Who owns what
The tables below are **generated** from `.github/codeowners-roles.yml` by `.github/scripts/render-codeowners.py` (the same render that produces `.github/CODEOWNERS`). They are the always-current "who owns what at this commit" view — don't edit them by hand; edit the yml and re-render.
<!-- BEGIN GENERATED OWNERSHIP — edit codeowners-roles.yml + run render-codeowners.py -->
**Path → owners** (GitHub applies *last match wins*; the `*` catch-all is listed first and is overridden by the specific patterns below it):
| Path | Owners | Role(s) |
|---|---|---|
| `*` | @aaltshuler @ragnorc | engineering |
| `crates/**` | @aaltshuler @ragnorc | engineering |
| `docs/**` | @aaltshuler @ragnorc | docs |
| `README.md` | @aaltshuler @ragnorc | docs |
| `AGENTS.md` | @aaltshuler @ragnorc | docs |
| `CLAUDE.md` | @aaltshuler @ragnorc | docs |
| `SECURITY.md` | @aaltshuler @ragnorc | docs |
**Roles**:
| Role | Members | Description |
|---|---|---|
| `engineering` | @aaltshuler @ragnorc | All production code under crates/**. Engine, CLI, server, compiler. |
| `docs` | @aaltshuler @ragnorc | Documentation under docs/**, plus repo-level docs (README.md, AGENTS.md, CLAUDE.md symlink, SECURITY.md). |
<!-- END GENERATED OWNERSHIP -->
GitHub treats multiple owners on a CODEOWNERS line as **"any one of them satisfies the review requirement"**. To require N distinct approvers on a specific path, layer a CI check on top (not currently configured).
## How to change role membership or path mappings
1. Edit `.github/codeowners-roles.yml`.
2. Open a PR. **CI re-renders for you**: the `CODEOWNERS` workflow regenerates `.github/CODEOWNERS` and the ownership tables above and auto-commits them back to your PR branch on same-repository PRs — you don't have to run the script locally (though you can: `python3 .github/scripts/render-codeowners.py`, requires PyYAML).
On a fork (where CI can't push back), the workflow instead fails with the diff so you can run the script and commit it yourself.
CI fails the PR if:
- a fork PR left a generated artifact out of sync, or
- `CODEOWNERS` was edited without a corresponding yml change (the `CODEOWNERS not hand-edited` check).
## How to add a new role
1. Add a new entry to `roles:` in the yml with a `description` and `members` list.
2. Reference the role from `paths:` (or `default:`).
3. Regenerate + commit as above.
## Why a generator, not direct CODEOWNERS edits?
- **Audit trail**: `git log .github/codeowners-roles.yml` is the canonical record of every role change. The rendered `CODEOWNERS` is a derived artifact.
- **Roles are first-class**: paths reference roles, not raw handles. Renaming a person or rotating a role updates one place, not every path.
- **Future extension**: scheduled rotation (weekly on-call, quarterly leads) plugs into the same yml without changing the path mappings. Not enabled today.
- **Consistency with the product**: omnigraph itself enforces auditable Cedar policy. The repository's code-owner policy follows the same "policy as reviewed code" pattern.

View file

@ -28,7 +28,6 @@ constraints. User-facing behavior should still be documented through
| Three-way merge implementation and conflicts | [merge.md](merge.md) |
| Diff/change-feed implementation | [changes.md](../user/branching/changes.md) |
| Branch protection policy | [branch-protection.md](branch-protection.md) |
| CODEOWNERS source of truth | [codeowners.md](codeowners.md) |
## Language, Runtime, And Boundaries
@ -63,6 +62,16 @@ The `docs/rfcs/` track is the **public, externally-authorable** RFC process. The
maintainer/internal RFCs below (`rfc-00N-*.md`) are a separate, team-owned
track; don't conflate the two.
## Case Studies
Worked write-ups of specific bugs — root cause, fix, and the reasoning that
ruled out the tempting-but-wrong alternatives. Read these for the debugging
pattern, not just the outcome.
| Area | Read |
|---|---|
| camelCase property filters lowercased at runtime (#283) — two engine→Lance boundaries, two different fixes | [bug-case-fix.md](bug-case-fix.md) |
## Active Implementation Plans
Working documents for in-flight feature work. Removed when the work lands.

View file

@ -53,7 +53,13 @@ converge the physical state.
versioning, fragments, branches, compaction, cleanup, and index primitives.
DataFusion should own relational execution where it fits. Do not add custom
WALs, transaction managers, buffer pools, page formats, or local clones of
substrate behavior. Read [lance.md](lance.md) before guessing.
substrate behavior. Read [lance.md](lance.md) before guessing. Respecting the
substrate also means *using* it idiomatically, not only refraining from
rebuilding it: reuse long-lived handles instead of re-opening per call,
resolve latest state through the substrate's cheap primitive instead of
re-scanning, and share its caches/session. Re-deriving per call what the
substrate keeps warm is a substrate violation even when no code is
reimplemented.
2. **Graph visibility is manifest-atomic.** Lance commits are per dataset.
OmniGraph's graph-level atomicity comes from publishing one manifest update
@ -126,6 +132,18 @@ converge the physical state.
a substitute for missing lower-level assertions. Read [testing.md](testing.md)
before adding tests.
15. **One source of truth, cheaply derived.** Lance and the manifest are the
source of truth. Everything the engine needs at runtime is a derived view of
them: read or projected on demand, held warm, refreshed by a cheap probe. Two
failure modes are forbidden. A *parallel copy* the engine maintains can drift
from the source, and that divergence compounds over time. *Cold
re-derivation* rebuilds the view from the full source on every call, so its
cost grows with history. Invariants 1 and 7, and the deny-list "state that
drifts" and "manifest-derivable reconciler" items, are instances; so is
bounding a read's cost to its working set rather than the commit count. This
is the structural face of "engineering is programming integrated over time":
both failure modes are liabilities that compound as the system grows.
## Current Truth Matrix
| Area | Current state | Source |
@ -252,6 +270,37 @@ them explicit.
- **Resource bounds:** some operations still lack enforced per-query memory or
time budgets. New long-running work should add explicit bounds rather than
widening the gap.
- **Read-path re-derivation (largely closed by the query-latency work):**
snapshot resolution used to re-open a fresh coordinator per read (a full
`__manifest` re-scan plus two commit-graph scans), open each table through the
namespace (two more `__manifest` scans per table), validate the schema twice,
and share no Lance `Session`. That was an O(commits) cost that never warmed up.
Fix 1 (warm coordinator reuse behind a `latest_version_id` probe), Fix 2 (open
tables by location+version), finding A (validate once), and Fix 3 (a held
`Dataset`-handle cache keyed by `(table, branch, version, e_tag when Lance
exposes it)` plus one shared `Session` per graph) remove that tax: a warm
same-branch read does one probe, one schema read, and zero opens on a repeat.
Non-main branch freshness compares the manifest incarnation (`version` plus
manifest-location e_tag when available, otherwise Lance manifest timestamp),
because Lance branch names can be deleted/recreated at the same version number;
the manifest e_tag is carried into synthetic snapshot ids when available, and
a detected same-branch manifest refresh clears read caches as the fallback for
e_tag-less table locations/topology. Remaining: the internal metadata tables
(`__manifest`, `_graph_commits`) are still not compacted, so the probe and
refresh cost still grows with fragment count on a long-lived graph (the
`optimize`-covers-internal-tables follow-up); the commit graph is not yet
reconcilable from the manifest; and the traversal id-map is still rebuilt.
- **Commit-graph parent under concurrency:** `record_graph_commit` now refreshes
the commit-graph head from storage before appending, so a same-branch write
after an external commit no longer forks the commit DAG by parenting off a
stale cached head (the single-process fork, pre-existing for non-strict
inserts and widened to strict ops by Fix 1's `refresh_manifest_only`, is now
closed). Residual: two processes writing disjoint tables can still pass their
per-table manifest CAS and append off the same parent (a refresh-then-append
TOCTOU). The convergent fix is reconcile-from-manifest (parent = the commit at
the manifest version the publisher CAS'd against; `manifest_version` is on
every commit row), composing with the manifest-to-commit-graph atomicity gap;
it needs commit-graph append ordering or a Lance append-CAS to fully close.
## Deny-list
@ -277,6 +326,10 @@ case is exceptional.
- Cost-blind plan choice when statistics are available or required.
- Hidden statistics for behavior that affects planning or operator choice.
- Hash-map iteration order in result ordering, plan choice, or migration output.
- Cold re-derivation on the hot path: rebuilding from the full source what could
be held warm and refreshed cheaply, so cost scales with history rather than the
working set (the cost face of invariant 15; "state that drifts" above is its
shadow-copy face).
- String-flattened SQL/filter generation when a structured pushdown API is
available.
- Eager multi-hop cross-product materialization when factorization fits.
@ -313,6 +366,8 @@ Use this as yes/no/NA for any non-trivial design or PR:
- Are stats/capabilities exposed when behavior depends on them?
- Are existing known gaps left no worse and documented if touched?
- Does the test live at the same boundary as the change?
- Is this operation's cost bounded with respect to history and scale, or does it
re-derive warm state from cold storage per call?
- Does the change avoid every deny-list pattern, or justify the exception?
## Maintenance Policy

View file

@ -1,7 +1,7 @@
# RFC: Server Boots from Cluster State — Phase 5 of the Cluster Control Plane
**Status:** Landed (5A policy bindings #175; 5B/5C the `--cluster` boot mode — one PR)
**Implementation deviations:** (1) cluster mode reuses `ServerConfigMode::Multi` (a new settings *source*, not a new enum variant; `config_path` carries the cluster dir). (2) Stored queries load via `QueryRegistry::from_specs` from verified blob *content*, not blob paths. (3) More than one policy bundle binding a single scope is a boot error (the serving pipeline holds one bundle per graph + one server-level; stacking is a later slice). (4) `GET /graphs` keeps its closed-by-default contract — without a cluster-bound bundle there is no server-level Cedar engine, so enumeration refuses.
**Implementation deviations:** (1) cluster mode reuses `ServerConfigMode::Multi` (a new settings *source*, not a new enum variant; `config_path` carries the cluster dir). (2) Stored queries load via `QueryRegistry::from_specs` from verified blob *content*, not blob paths. (3) More than one policy bundle binding a single scope is a boot error (the serving pipeline holds one bundle per graph + one server-level; stacking is a later slice). (4) `GET /graphs` keeps its closed-by-default contract — without a cluster-bound bundle there is no server-level Cedar engine, so enumeration refuses. (5) Graph-attributed startup failures quarantine that graph by default; operators can restore all-or-nothing boot with `--require-all-graphs` / `OMNIGRAPH_REQUIRE_ALL_GRAPHS=1`.
**Date:** 2026-06-10
**Builds on:** Phase 4 complete ([rfc-004-cluster-graph-schema-apply.md](rfc-004-cluster-graph-schema-apply.md), Landed): `cluster apply` converges graphs, schemas, stored queries, and policies into the cluster catalog. Normative context: [cluster-config-specs.md](cluster-config-specs.md) (the migration model's "window 2"), [cluster-axioms.md](cluster-axioms.md) (axiom 15), [cluster-config-implementation-spec.md](cluster-config-implementation-spec.md) (Phase 5 rollout, Compatibility Stance #7#9, exit criterion 7).
**Target release:** unversioned (phased — see Sequencing).
@ -46,8 +46,8 @@ Mode inference gains rule 0: `--cluster <dir>` → **Cluster mode**, which is al
`load_server_settings` grows a cluster branch that reads, in order:
1. `__cluster/state.json`**missing state is a boot error** ("run `cluster import` + `cluster apply` first"). Pending recovery sidecars under `__cluster/recoveries/` are also a boot error (`cluster_recovery_pending`): a server must not start serving a ledger that a sweep is about to rewrite.
2. **Graph set** = state's `graph.<id>` resources (tombstoned graphs are absent by construction). Each graph's URI is the derived root `<dir>/graphs/<id>.omni`. A recorded graph whose root does not open is a boot error — same fail-fast posture as today's bad URI.
1. `__cluster/state.json`**missing state is a boot error** ("run `cluster import` + `cluster apply` first"). Invalid or unattributable recovery sidecars under `__cluster/recoveries/` are also a boot error: a server must not start if it cannot prove the blast radius. Valid graph-attributed sidecars quarantine that graph by default and are logged as `cluster_recovery_pending`; `--require-all-graphs` promotes them back to a boot error.
2. **Graph set** = state's `graph.<id>` resources (tombstoned graphs are absent by construction). Each graph's URI is the derived root `<dir>/graphs/<id>.omni`. A recorded graph whose root does not open quarantines that graph by default; `--require-all-graphs` restores the original fail-fast posture.
3. **Stored queries** = state's `query.<graph>.<name>` entries, content loaded from the catalog blob at the recorded digest. Blob-missing or digest-mismatched is a boot error (the catalog verification semantics from Stage 3B, applied at boot). Queries type-check at engine open exactly as today (`validate_and_attach` — unchanged).
4. **Policies** = state's `policy.<name>` entries, content from catalog blobs, bindings from the applied metadata of D3: bundles bound to `cluster` load as the server-level Cedar engine (`PolicyEngine::load_server`); bundles bound to graphs load per-graph (`PolicyEngine::load_graph`) and install via `with_policy` — the existing two-gate structure, unchanged.
5. `cluster.yaml` is parsed **only** to validate that the directory is a cluster root (and for nothing else — explicitly not for resource content; a divergence between desired config and applied state is *served as applied*, visible via `cluster plan`).
@ -76,16 +76,19 @@ State's `StateResource` records only a digest. To make the ledger serving-suffic
### D4. Readiness and failure posture
Boot is fail-fast, matching the server's existing stance (bad policy YAML refuses boot):
Cluster-global failures are fail-fast, matching the server's existing stance (bad policy YAML refuses boot). Graph-local failures quarantine the affected graph by default so a single bad graph cannot crash-loop an otherwise healthy cluster. Operators who prefer the original all-or-nothing contract pass `--require-all-graphs` or set `OMNIGRAPH_REQUIRE_ALL_GRAPHS=1`, which promotes every graph-local quarantine/open/settings failure to a boot error.
| Condition | Behavior |
|---|---|
| `state.json` missing / unparseable / unsupported version | boot error |
| pending recovery sidecars | boot error (run any state-mutating cluster command to sweep) |
| recorded graph root missing or unopenable | boot error |
| invalid/unreadable/unattributable recovery sidecars | boot error (run any state-mutating cluster command to sweep or inspect) |
| valid graph-attributed recovery sidecars | quarantine that graph; strict mode boot error |
| recorded graph root missing or unopenable | quarantine that graph; strict mode boot error |
| query/policy blob missing or digest-mismatched | boot error (run `cluster refresh` + `apply` to self-heal, then restart) |
| policy entry without `applies_to` metadata | boot error ("re-run cluster apply", D3) |
| stored query fails type-check against the live schema | boot error (existing `validate_and_attach` behavior) |
| stored query fails parse/type-check against the live schema | quarantine that graph; strict mode boot error |
| embedding provider configuration for one graph cannot resolve | quarantine that graph; strict mode boot error |
| every applied graph is quarantined or fails startup | boot error (`cluster_no_healthy_graphs`) |
| state lock held | **not** an error — boot takes no lock; it reads a point-in-time snapshot of an immutable-once-written state file (the CAS discipline means a concurrent apply produces a *new* file atomically; the server reads whichever was current at open) |
### D5. MCP presentation (`@mcp(expose, tool_name)`) in cluster mode
@ -124,7 +127,7 @@ Rollback is the same switch in reverse — nothing in cluster mode mutates `omni
- *Axiom 5*: the server serves deployed reality (applied digests), never desired intent; D3 keeps the ledger the single serving source.
- *Axiom 12*: boot reads without the lock but relies on the atomic-replace write discipline; it never writes state.
- *Axiom 14 / Stance #9*: the expose-all bridge is named, scoped to cluster mode, and carries its Phase 6 sunset.
- *Loud failures (deny-list)*: every degraded condition is a typed boot error with a remedy; no partial serving, no silent fallback to the yaml.
- *Loud failures (deny-list)*: every degraded condition is either a typed cluster-global boot error with a remedy or an explicit graph quarantine logged at startup; no silent fallback to the yaml. `--require-all-graphs` is the opt-in all-or-nothing mode for operators who treat any degraded graph as fatal.
- *Respect the boundaries*: `omnigraph-cluster` stays free of HTTP; the server reads the catalog through a small read-only loader (either a `pub` read surface on `omnigraph-cluster` or a thin module in the server consuming the documented file formats — implementation picks the one that keeps `omnigraph-cluster` dependency-light; the state/blob formats are already a documented contract).
## Sequencing
@ -132,7 +135,7 @@ Rollback is the same switch in reverse — nothing in cluster mode mutates `omni
| Slice | Scope | Gate |
|---|---|---|
| **5A: serving metadata in state** | `applies_to` recorded on policy resources at apply + sweep roll-forward; additive state schema; `status`/plan surfacing | In-crate tests: metadata written/rolled-forward; old state parses; re-apply backfills |
| **5B: `--cluster` boot mode** | Flag + mode inference rule 0; catalog loader (state → `GraphStartupConfig`s + registries + policy engines); readiness table; OpenAPI regen if surface shifts | Server tests: boot from a converged fixture dir, serve `/graphs/{id}/query` + stored queries + Cedar gates; every D4 row refuses boot; e2e: `cluster apply` then serve — "applied means serving" |
| **5B: `--cluster` boot mode** | Flag + mode inference rule 0; catalog loader (state → `GraphStartupConfig`s + registries + policy engines); readiness table; OpenAPI regen if surface shifts | Server tests: boot from a converged fixture dir, serve `/graphs/{id}/query` + stored queries + Cedar gates; D4 cluster-global rows refuse boot; graph-local rows quarantine by default and refuse under `--require-all-graphs`; e2e: `cluster apply` then serve — "applied means serving" |
| **5C: docs + caveat retirement** | `cluster-config.md` mode-switch section; `server.md`/`deployment.md`; retire the "not serving" caveats for cluster-mode deployments; migration guide (D6) | `check-agents-md.sh`; doc accuracy review |
## Exit-criteria coverage

View file

@ -24,8 +24,9 @@ The engine's `tests/` is the principal coverage surface; most graph-shaped behav
| `merge_truth_table.rs` | Merge-pair truth table (MR-786): all 9×9 `(left_op, right_op)` cells from `{noop, addNode, removeNode, addEdge, removeEdge, setProperty, dropProperty, addLabel, removeLabel}`. Adding a new op to `OpVariant` forces a compile error in `build_case` until the new row + column are dispositioned. 36 executable cells run through real `branch_merge` with a structured oracle (`MergeOutcome` / `MergeConflictKind` + graph-state assert); 45 cells involving `dropProperty`/`addLabel`/`removeLabel` are recorded as `Unsupported` until the mutation grammar grows. |
| `writes.rs` | Direct-publish writes: cancellation, non-strict insert/merge rebase under the per-table queue, strict stale-write conflicts, multi-statement atomicity, MR-794 staged-write rewire (D₂ rejection, insert+update coalesce, multi-append coalesce, partial-failure recovery, load RI/cardinality recovery) |
| `staged_writes.rs` | TableStore staged-write primitives (`stage_append`, `stage_merge_insert`, `commit_staged`, `scan_with_staged`, `count_rows_with_staged`) — primitive-level only; engine code uses the in-memory `MutationStaging` accumulator instead |
| `forbidden_apis.rs` | Defense-in-depth source-walk guard: engine code (`exec/`, `db/omnigraph/`, `loader/`, `changes/`) must not reach around the sealed storage trait to Lance inline-commit APIs; `// forbidden-api-allow: <reason>` sentinel exempts reviewed lines |
| `forbidden_apis.rs` | Defense-in-depth source-walk guard: engine code (`exec/`, `db/omnigraph/`, `loader/`, `changes/`) must not reach around the sealed storage trait to Lance inline-commit APIs, nor open datasets directly (`Dataset::open` / `DatasetBuilder::from_uri`/`from_namespace`) — reads route through `Snapshot::open` and the held-handle cache; `// forbidden-api-allow: <reason>` sentinel exempts reviewed lines |
| `lance_surface_guards.rs` | Pins the Lance API surfaces omnigraph depends on (named runtime + compile-only guards; see [lance.md](lance.md)) — the first smoke check on any Lance version bump; e.g. `compact_files_still_fails_on_blob_columns` turns red when the upstream blob-compaction fix lands |
| `warm_read_cost.rs` | Cost-budget tests for the warm read path (query-latency work), measured at the object-store boundary with Lance `IOTracker` (the LanceDB IO-counted pattern): a warm same-branch read does 0 manifest opens, 0 commit-graph opens, 1 version probe, validates the schema once (Fix 1 / finding A / Fix 2 at commit-history depth); stale same-branch reads perform exactly 2 probes and refresh manifest-only; recreated non-main branches with the same Lance version refresh by incarnation; recreated branch-owned table handles are distinguished by table e_tag or refresh-time cache clearing; recreated traversal topology is protected by synthetic snapshot-id incarnation or refresh-time cache clearing; a warm *repeat* read does 0 table opens via the held-handle cache and a write re-opens only the changed table at its new version/e_tag (Fix 3/6A). See "Cost-budget tests" below |
| `lifecycle.rs` | Graph lifecycle, schema state |
| `point_in_time.rs` | Snapshots, time travel (`snapshot_at_version`, `entity_at`) |
| `changes.rs` | `diff_between` / `diff_commits` |
@ -126,5 +127,14 @@ When you pick up any change, walk through this:
6. **For substrate-touching changes** (Lance behavior), reach for `failpoints` or fixture-driven scenarios, not stubbed-out mocks.
7. **For server / API changes**, confirm the OpenAPI regeneration happens in `openapi.rs` and that the diff lands in `openapi.json`.
8. **Verify your change makes an existing test fail before it makes the new one pass.** If you can break the code without breaking a test, your coverage gap is the problem to fix first.
9. **Bound hot-path cost at history depth.** If the change touches a read or open path, add or extend a test that asserts a *bounded* cost (e.g. a warm same-branch read performs zero `Dataset::open`, or a fixed object-op count) against a fixture with realistic *commit-history depth*, not just realistic row counts. Cost that scales with history is invisible on a shallow fixture and only bites in production. See "Cost-budget tests" below.
## Cost-budget tests: bound hot-path cost at history depth
Correctness bugs fail loudly in tests; cost-scaling bugs pass every test and degrade silently in production. The engine read path historically had no cost assertion, and fixtures carry shallow commit history, so an O(commits)-per-query cost stayed green in CI and only surfaced on a long-lived graph (read snapshot resolution re-scanned the internal manifest and commit-graph tables on every query, and those tables were never compacted). Guard against the class:
- **Assert a cost budget, not just a result.** For a read/open path, assert the number of `Dataset::open` calls (or object-store ops) a warm query performs, and that it does not grow with commit count. The reference is LanceDB's IO-counted tests, which assert a cached read costs 0-1 IO and carry a named regression test against "a list call on every subsequent query."
- **Test at history depth.** Build a fixture with many *commits* (not many rows) and assert warm-read cost is flat across depths. A shallow fixture cannot catch an O(commits) cost.
- This is the testing companion to invariant 15 in [docs/dev/invariants.md](invariants.md) (hot-path cost is bounded by work, not history).
When in doubt, re-read [docs/dev/invariants.md](invariants.md) — quality gates apply to every change.

View file

@ -178,6 +178,17 @@ are left at `Lance HEAD = manifest_pinned + 1`.
post_commit_pin)` it intends to commit + the writer kind +
actor_id.
2. **Phase B**: writer's per-table `commit_staged` loop runs.
- **Phase-B confirmation (`BranchMerge` only)**: a `BranchMerge` writer
advances each table's HEAD by *several* commits (append → upsert →
delete), so a bare "HEAD moved" is ambiguous — it could be a complete
publish or one crashed mid-sequence. After the whole per-table loop
finishes, the writer re-writes the sidecar stamping each pin's
`confirmed_version` with the exact achieved version, then proceeds to
Phase C. This is the commit point of the recovery WAL: a crash *after*
confirmation rolls forward to those versions; a crash *during* Phase B
(sidecar still unconfirmed) rolls back. Other writers don't confirm —
their drift is derived state (index coverage, compaction) that a partial
roll-forward never corrupts.
3. **Phase C**: publisher commits the manifest.
4. **Phase D**: writer deletes the sidecar.
@ -197,7 +208,10 @@ recovery sweep in `crates/omnigraph/src/db/manifest/recovery.rs`:
- For each sidecar in `__recovery/`, compare every named table's
Lance HEAD to the manifest pin. Classify per the all-or-nothing
decision tree (RolledPastExpected / NoMovement / UnexpectedAtP1 /
UnexpectedMultistep / InvariantViolation).
UnexpectedMultistep / IncompletePhaseB / InvariantViolation). For a
`BranchMerge` sidecar, a moved HEAD with no `confirmed_version` classifies
as `IncompletePhaseB` (a partial multi-commit publish) and forces roll-back;
with a `confirmed_version`, roll-forward targets exactly that version.
- If any table is `InvariantViolation` (Lance HEAD < manifest pinned
should be impossible), **abort** with a loud error and leave the
sidecar on disk for operator review.