From 8cb127d8faacd96f8e7749ef185fa4a0313bf2c5 Mon Sep 17 00:00:00 2001
From: Andrew Altshuler
Date: Wed, 17 Jun 2026 00:41:10 +0300
Subject: [PATCH 01/13] =?UTF-8?q?docs(readme):=20server-first=20rewrite=20?=
 =?UTF-8?q?=E2=80=94=20deploy,=20agents,=20on-prem=20RustFS=20(#269)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* docs(readme): server-first rewrite — deploy, agents, on-prem RustFS

Reframe the README around the actual product: a self-hostable, multigraph
graph server for context assembly and multi-agent coordination, deployed
Terraform-style on your own object storage (on-prem via RustFS, or
S3/R2/GCS).

- Lead with key capabilities and what you can build, not a local toy.
- Promote "Drive it with an AI agent" (skill + a docs-first setup prompt)
  above the manual deploy walkthrough — agents are the primary operator.
- "Deploy" is the hero: cluster.yaml → object store → validate/plan/apply →
  omnigraph-server, with RustFS as the on-prem path front and center.
- "Query and mutate": stored queries by name + branch/review/merge.
- Security & governance as scannable bullets; Clients & SDKs as a table.
- Embedded local graph demoted to a clearly-labeled "quick test"
  (Signal → Indicates → Pattern), explicitly dev/experiment-only.
- Drop the "serve/served/serving" vocabulary tic in favor of deploy/run.

Co-Authored-By: Claude Opus 4.8

* docs(readme): add the server boot command to Deploy §3 (Greptile P1)

The "Converge and run" step showed only the converge half — the code block
ended at `cluster apply` with no `omnigraph-server` command, leaving a
linear reader without a way to actually start the server. Add the boot line.

Co-Authored-By: Claude Opus 4.8

* docs(readme): simplify the server boot command

Drop the inline OMNIGRAPH_SERVER_BEARER_TOKEN prefix from the Deploy hero —
the example cluster declares a policy so the server boots without it, and
bearer auth is covered in Security & governance. Leaves a single clean line.

Co-Authored-By: Claude Opus 4.8

* docs(readme): boot the server from the cluster dir, not a raw s3:// URI

Pointing --cluster at the bucket hardcodes the storage URI in the run
command. Boot from the config directory instead; the storage URI lives once
in cluster.yaml and the server resolves it — single source of truth, and
consistent with the cluster apply commands above.

Co-Authored-By: Claude Opus 4.8

---------

Co-authored-by: Claude Opus 4.8
---
 README.md | 313 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 165 insertions(+), 148 deletions(-)

diff --git a/README.md b/README.md
index 9c4f8bc..e3876f6 100644
--- a/README.md
+++ b/README.md
@@ -4,197 +4,214 @@
 [![Rust](https://img.shields.io/badge/rust-stable-orange.svg)](rust-toolchain.toml)
 [![Crates.io](https://img.shields.io/crates/v/omnigraph-cli.svg)](https://crates.io/crates/omnigraph-cli)

-**Lakehouse native graph engine built for context assembly**
+**Lakehouse graph db for context assembly & multi-agent coordination**
+Multimodal retrieval, Git-style branching, object storage-native

-Omnigraph acts as operational state & coordination layer for agents.
-Hundreds of agents can enrich the graph on parallel isolated branches and changes can be reviewed and merged safely.
+Omnigraph is the operational state and coordination layer for fleets of agents.
+Run it as a server, declared as code; hundreds of agents operate and enrich the graph on
+parallel isolated branches, and every change is reviewed and merged safely.

-- Git-style versioning & branching -- Multimodal retrieval (graph+vector/fts+filters) optimized for context assembly -- Runs on the local filesystem or any S3-compatible object store (AWS S3, R2, MinIO, RustFS) -- Native blob-as-data support (docs, images, videos, etc) -- VPC, On-prem, hybrid deployment -- [`Lance`](https://github.com/lance-format/lance) format as open storage layer +## Key capabilities -| AS CODE | What it means | +- **A graph server you run, declared as code** — a `cluster.yaml` declares graphs, schemas, stored queries, embedding providers, and policies. `cluster apply` converges it; `omnigraph-server` boots from it and brings every graph online at `/graphs/{id}/…`. +- **Built for fleets of agents** — hundreds of agents enrich the graph on **parallel isolated branches**; changes are reviewed and merged safely, Git-style, across the whole graph. +- **Multimodal retrieval for context assembly** — graph traversal + vector ANN + full-text + Reciprocal Rank Fusion in **one** query runtime. +- **Security as code** — Cedar policy enforced **server-side on every mutation**, per-graph and server-wide; bearer auth; actor/audit tracking. +- **Runs on your infrastructure** — any S3-compatible object store: **on-prem via RustFS / MinIO**, or AWS S3 / R2 / GCS. VPC, on-prem, hybrid — your data never leaves your store. +- **Open, versioned storage** — [`Lance`](https://github.com/lance-format/lance) columnar format: branchable, time-travelable, with native blob-as-data (docs, images, video). 
+ +## What you can build + +| Use case | What it's for | |---|---| -| **Schema AS CODE** | Typed `.pg` schemas, planned, applied, enforced | -| **Context AS CODE** | Linted queries & agentic nudges, versioned and reusable | -| **Security AS CODE** | Cedar policies enforced server-side on every mutation | -| **Dashboards AS CODE** | Declarative views & controls over the graph *(coming)* | +| **Company brain** | Org knowledge unified into one graph every agent can query | +| **Agentic memory** | Durable, versioned memory — a branch per agent or per task, merged on review | +| **Context graph** | Decision traces and codified tribal knowledge for retrieval | +| **Dev graph** | Issues & dependency model that coding agents read and write | +| **R&D / ML data layer** | Experiments and trials written into branches, versioned for training & eval | -## Core Use Cases - -| Use case | What it's for -|---|---| -| **Company brain** | Org knowledge unified into one queryable graph | -| **Context graph** | Decision traces and codified tribal knowledge | -| **Agentic memory** | Durable, versioned memory for long-running agents | -| **Dev graph** | Issues & dependency model for coding agents | -| **R&D data layer** | Experiments & trials data written into branches | -| **ML workflows** | Versioned, branchable graphs for training & eval | -| **Karpathy's LLM wiki** | A living, agent-updatable knowledge base | - -## Quick Install +## Install ```bash curl -fsSL https://raw.githubusercontent.com/ModernRelay/omnigraph/main/scripts/install.sh | bash ``` -This installs `omnigraph` and `omnigraph-server` into `~/.local/bin` from -published release binaries. - -Or install with Homebrew: +This installs `omnigraph` (CLI) and `omnigraph-server` into `~/.local/bin` from +published release binaries. 
Or with Homebrew: ```bash brew tap ModernRelay/tap brew install ModernRelay/tap/omnigraph ``` -## Quick start +## Drive it with an AI agent -The fastest path is an **embedded, local file-backed graph** — no server, no -object store, no Docker: +Omnigraph is built to be run by coding agents — two ways in. -```bash -# A schema and one row of data -cat > schema.pg <<'PG' -node Person { - slug: String @key - name: String - title: String? -} -PG -echo '{"type":"Person","data":{"slug":"alice","name":"Alice","title":"Engineer"}}' > people.jsonl - -# Create → load (--mode is required) → query -omnigraph init --schema schema.pg ./graph.omni -omnigraph load --data people.jsonl --mode overwrite --store ./graph.omni -omnigraph query find_people --store ./graph.omni --params '{"t":"Engineer"}' \ - -e 'query find_people($t: String) { match { $p: Person { title: $t } } return { $p.name } }' - -# Branch, write in isolation, merge — Git-style across the whole graph -omnigraph branch create --from main review/new-hires --store ./graph.omni -omnigraph branch merge review/new-hires --into main --store ./graph.omni -``` - -**Storage backends** — the same flow runs on any backend; only the graph address changes: - -| Backend | Use it for | Graph address | -|---|---|---| -| **Embedded** (local filesystem) | dev, demos, single machine — the default | `./graph.omni` | -| **Object storage** (AWS S3, R2, GCS-S3) | shared, multi-host, durable | `s3://bucket/graph.omni` (+ the `AWS_*` env) | -| **RustFS / MinIO** | rehearse the S3 path locally, no cloud account | `s3://…` against a local endpoint → [deployment guide](docs/user/deployment.md#testing-against-s3-locally) | - -`init` takes the address as its positional argument (`omnigraph init --schema schema.pg
`); `load`, `query`, and `branch` take it via `--store
`. - -For a **served, multi-graph deployment** (the cluster model), see [Common Commands](#common-commands) below. - -## Set it up with an AI agent - -Omnigraph is built to be set up by coding agents. Paste this into Claude Code, -Cursor, or any agent that can read a URL, install a package, and run a shell -command — it installs the skill, reads the docs, and walks you through setup for -your use case: - -```text -Help me set up Omnigraph (a lakehouse-native graph engine for agents). - -1. Install the Omnigraph skill so you operate it correctly: - npx skills add ModernRelay/omnigraph@omnigraph -2. Read the docs at https://github.com/ModernRelay/omnigraph — start with - docs/user/quickstart.md, then docs/user/clusters/index.md. -3. Skim the starter graphs and seed data in the cookbooks: - https://github.com/ModernRelay/omnigraph-cookbooks -4. Ask me what I want to build (company brain, agent memory, dev graph, - research / R&D layer, …). Then install the CLI, stand up a first graph for - that use case, load a little data, and run a query so I can see it working. -``` - -Works with any agent that can browse a URL, install a package, and run a shell. - -## Agent skill & starter graphs - -This repo ships the [**`omnigraph` agent skill**](skills/omnigraph) — the -operational playbook (cluster mode, the two config surfaces, schema evolution, -query linting, data writes, branches, Cedar policy, and common gotchas) that -teaches a coding agent to drive Omnigraph correctly. Install it with: +**Teach your agent the playbook.** This repo ships the +[**`omnigraph` agent skill**](skills/omnigraph): the operational playbook — +cluster mode, the two config surfaces, schema evolution, query linting, data +writes, branches, Cedar policy, and the common gotchas. 
```bash npx skills add ModernRelay/omnigraph@omnigraph ``` +**Or have an agent set it up from scratch.** Paste this into Claude Code, +Cursor, or any agent that can read a URL and run a shell command: + +```text +Help me set up Omnigraph + +1. Read the docs at https://github.com/ModernRelay/omnigraph — start with + docs/user/clusters/index.md, then docs/user/deployment.md. +2. Skim the starter graphs and seed data in the cookbooks: + https://github.com/ModernRelay/omnigraph-cookbooks +3. Ask me what I want to build (company brain, agent memory, dev graph, + research / R&D layer, …). Then stand up a cluster for it, load a little + data, and run a query so I can see it working. +``` + For ready-to-run graphs with real seed data (company brain, VC operating system, pharma & industry intel), [`ModernRelay/omnigraph-cookbooks`](https://github.com/ModernRelay/omnigraph-cookbooks) -is the fastest way to see Omnigraph shaped to a real domain. To rehearse the S3 -path locally, see [deployment.md → Testing against S3 locally](docs/user/deployment.md#testing-against-s3-locally). +is the fastest way to see Omnigraph shaped to a real domain. -## Common Commands +## Deploy -A deployment is a **cluster**. A `cluster.yaml` declares its graphs, schemas, -stored queries, and policies; you converge it with `cluster apply` and serve it. -The server is cluster-first — it boots only from a cluster and serves every graph -under `/graphs/{id}/…`. Day-to-day work goes through that server: graphs are -addressed with `--server ` (+ `--graph `), and `query`/`mutate` -invoke a stored query from the catalog **by name**. +A deployment is a **cluster** — a **multigraph** config directory that declares +its graphs, schemas, stored queries, and policies as code. You manage it +**Terraform-style**: `cluster plan` previews the diff, `cluster apply` converges +it. `omnigraph-server` then boots from the cluster and brings every graph online +at `/graphs/{id}/…`, each behind its own policy. 
-```bash -# 1. Converge the declared cluster, then serve it (--as attributes the apply) -omnigraph cluster apply --config ./company-brain --as you -omnigraph-server --cluster ./company-brain --bind 0.0.0.0:8080 -# or config-free from object storage — the bucket IS the deployment: -# omnigraph-server --cluster s3://my-bucket/company-brain --bind 0.0.0.0:8080 +**1. Declare the cluster.** -# 2. Work against the served graph — stored queries invoked by name -omnigraph query find_people --server prod --graph knowledge --params '{"q":"AI safety"}' -omnigraph mutate add_person --server prod --graph knowledge --params '{"name":"Mina"}' -omnigraph load --data ./data.jsonl --mode merge --server prod --graph knowledge - -# 3. Branch and merge, Git-style across the whole graph -omnigraph branch create --from main review/2026-06 --server prod --graph knowledge -omnigraph branch merge review/2026-06 --into main --server prod --graph knowledge +``` +company-brain/ +├── cluster.yaml +├── people.pg # schema for the "knowledge" graph +├── queries/ # stored queries — the .gq files ARE the declaration +│ └── people.gq +└── base.policy.yaml # a Cedar policy bundle ``` -Set a default scope (or a `--profile`) in `~/.omnigraph/config.yaml` — operator -identity, named servers/clusters, credentials — and the `--server`/`--graph` -flags drop away (`omnigraph query find_people --params …`). 
- -**Local / ad-hoc.** For quick iteration on a standalone graph (no cluster, no -server), address storage directly with `--store` (or a positional `file://` / -`s3://` URI) and run ad-hoc `.gq` with `--query` (the positional then selects -which query in the file): - -```bash -omnigraph init --schema ./schema.pg ./graph.omni -omnigraph load --data ./data.jsonl --mode merge --store ./graph.omni -omnigraph query --query ./queries.gq get_person --params '{"name":"Alice"}' --store ./graph.omni +```yaml +# cluster.yaml +version: 1 +metadata: + name: company-brain +storage: s3://company/clusters/company-brain # ledger, catalog, and graph data live here +graphs: + knowledge: + schema: people.pg + queries: queries/ # every `query ` in queries/*.gq registers +policies: + base: + file: base.policy.yaml + applies_to: [knowledge] # graph-bound; use [cluster] for server-level ``` -See [docs/user/cli/index.md](docs/user/cli/index.md), the -[CLI reference](docs/user/cli/reference.md), the -[cluster guide](docs/user/clusters/index.md), and the -[deployment guide](docs/user/deployment.md) for schema apply, snapshots, commits, -profiles, and policy/queries tooling. +**2. Stand up your object store.** On-prem, run RustFS (or MinIO) — Omnigraph +writes [Lance](https://github.com/lance-format/lance) to it over the standard S3 +API. In the cloud, point the same `AWS_*` env at S3 / R2 / GCS instead. -## Clients +**3. Converge and run.** `apply` creates each graph, applies its schema, and +publishes queries and policies into the content-addressed catalog. It is +idempotent — re-running is always safe. 
-For programmatic access to a running `omnigraph-server`: +```bash +omnigraph cluster validate # parse + typecheck everything +omnigraph cluster plan # preview what apply would do +omnigraph cluster apply # converge -- **TypeScript SDK + MCP server** — [`@modernrelay/omnigraph`](https://www.npmjs.com/package/@modernrelay/omnigraph) and [`@modernrelay/omnigraph-mcp`](https://www.npmjs.com/package/@modernrelay/omnigraph-mcp), versioned in lockstep with `omnigraph-server`. Source, docs, and examples: [`ModernRelay/omnigraph-ts`](https://github.com/ModernRelay/omnigraph-ts). -- **Python SDK** — coming soon. +# Boot the server from the cluster dir — storage resolves through cluster.yaml +omnigraph-server --cluster company-brain --bind 0.0.0.0:8080 +``` + +See the [cluster guide](docs/user/clusters/index.md) for the day-2 loop +(edit → plan → apply → restart), approval gates for destructive changes, drift +inspection, and recovery; the [deployment guide](docs/user/deployment.md) for +containers, AWS/Railway, auth, and the full `AWS_*` contract. + +## Query and mutate + +Point the CLI at a running server and a graph. Stored queries and mutations run +**by name** from the catalog; branch and merge run across the whole graph, so a +fleet of agents can write in isolation and have changes reviewed before they +land on `main`. 
+ +```bash +# Stored query / mutation, parameters as JSON +omnigraph query search_docs --server https://graph.internal:8080 --graph knowledge --params '{"q":"AI safety"}' +omnigraph mutate add_person --server https://graph.internal:8080 --graph knowledge --params '{"name":"Mina","team":"Research"}' + +# An agent enriches on its own branch; you review, then merge +omnigraph branch create --from main agent/ingest-42 --server https://graph.internal:8080 --graph knowledge +omnigraph branch merge agent/ingest-42 --into main --server https://graph.internal:8080 --graph knowledge +``` + +Name the server (and a default graph) once in `~/.omnigraph/config.yaml` — with +operator identity and credentials — and the `--server`/`--graph` flags drop +away: `omnigraph query search_docs --params '{"q":"…"}'`. See the +[CLI reference](docs/user/cli/reference.md). + +## Security & governance + +- **Engine-wide enforcement** — every write path goes through the same Cedar gate, so the HTTP server, the CLI, and the embedded SDK obey identical rules. +- **Declared in the cluster** — a policy bundle is bound to graphs (or the whole server) via `policies:` → `applies_to`. +- **Scoped** — rules apply per graph, per branch, or server-wide. +- **No plaintext tokens** — bearer tokens are hashed at startup and compared in constant time. +- **Forge-proof identity** — the actor is resolved server-side from the token; clients can't set it. + +See the [policy guide](docs/user/operations/policy.md). 
+ +## Clients & SDKs + +| Client | Use it for | Where | +|---|---|---| +| **TypeScript SDK** | typed access from Node / TS | [`@modernrelay/omnigraph`](https://www.npmjs.com/package/@modernrelay/omnigraph) · [source](https://github.com/ModernRelay/omnigraph-ts) | +| **MCP server** | bridge Omnigraph to LLM hosts (Claude, Cursor, …) | [`@modernrelay/omnigraph-mcp`](https://www.npmjs.com/package/@modernrelay/omnigraph-mcp) | +| **HTTP / OpenAPI** | any language — the wire contract | the server's OpenAPI spec | +| **Python SDK** | typed access from Python | *coming soon* | + +Both npm packages are versioned in lockstep with `omnigraph-server`. + +## Local quick test (no server) + +1-min setup to try it: an **embedded, local file-backed graph** — no server, no +object store. For dev and experiments; production is the deployed cluster above. + +```bash +cat > schema.pg <<'PG' +node Signal { slug: String @key, title: String } +node Pattern { slug: String @key, name: String } +edge Indicates: Signal -> Pattern +PG +printf '%s\n' \ + '{"type":"Signal","data":{"slug":"s1","title":"OSS model adoption surging"}}' \ + '{"type":"Pattern","data":{"slug":"p1","name":"adoption"}}' \ + '{"edge":"Indicates","from":"s1","to":"p1"}' > data.jsonl + +omnigraph init --schema schema.pg ./graph.omni +omnigraph load --data data.jsonl --mode overwrite --store ./graph.omni + +# "What pattern does signal s1 indicate?" 
+omnigraph query --store ./graph.omni \ + -e 'query indicates() { match { $s: Signal { slug: "s1" } $s indicates $p } return { $p.name } }' +# → adoption +``` ## Docs -- [Install guide](docs/user/install.md) -- [Deployment guide](docs/user/deployment.md) +- [Cluster guide](docs/user/clusters/index.md) · [Deployment guide](docs/user/deployment.md) · [CLI reference](docs/user/cli/reference.md) +- [Schema](docs/user/schema/index.md) · [Queries](docs/user/queries/index.md) · [Search](docs/user/search/index.md) · [Policy](docs/user/operations/policy.md) ## Build And Test ```bash cargo build --workspace -cargo check --workspace -cargo test --workspace +cargo test --workspace ``` Notes: @@ -211,8 +228,8 @@ Notes: - `crates/omnigraph-policy`: Cedar policy compilation and enforcement - `crates/omnigraph-api-types`: shared HTTP wire DTOs used by both the server and the CLI - `crates/omnigraph-cluster`: cluster config validation, planning, and apply (the control plane) -- `crates/omnigraph-server`: Axum HTTP server — cluster-first, serving N graphs under `/graphs/{id}/…` -- `crates/omnigraph-cli`: CLI for graph lifecycle (init/load), query/mutate, branch/commit/merge, schema/lint, snapshot/export, cluster control, policy/queries, profiles, and maintenance (optimize/repair/cleanup) +- `crates/omnigraph-server`: Axum HTTP server — cluster-first, runs N graphs under `/graphs/{id}/…` +- `crates/omnigraph-cli`: CLI for graph lifecycle, query/mutate, branch/commit/merge, schema/lint, snapshot/export, cluster control, policy/queries, profiles, and maintenance ## Contributing From ccd13eca7c07670e50acf3137c80ee1daac28a5c Mon Sep 17 00:00:00 2001 From: Andrew Altshuler Date: Wed, 17 Jun 2026 00:45:28 +0300 Subject: [PATCH 02/13] docs(readme): make Key capabilities a table (#270) Match the "What you can build" section's scannable two-column format. 
Co-authored-by: Claude Opus 4.8
---
 README.md | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index e3876f6..6574347 100644
--- a/README.md
+++ b/README.md
@@ -13,12 +13,14 @@ parallel isolated branches, and every change is reviewed and merged safely.

 ## Key capabilities

-- **A graph server you run, declared as code** — a `cluster.yaml` declares graphs, schemas, stored queries, embedding providers, and policies. `cluster apply` converges it; `omnigraph-server` boots from it and brings every graph online at `/graphs/{id}/…`.
-- **Built for fleets of agents** — hundreds of agents enrich the graph on **parallel isolated branches**; changes are reviewed and merged safely, Git-style, across the whole graph.
-- **Multimodal retrieval for context assembly** — graph traversal + vector ANN + full-text + Reciprocal Rank Fusion in **one** query runtime.
-- **Security as code** — Cedar policy enforced **server-side on every mutation**, per-graph and server-wide; bearer auth; actor/audit tracking.
-- **Runs on your infrastructure** — any S3-compatible object store: **on-prem via RustFS / MinIO**, or AWS S3 / R2 / GCS. VPC, on-prem, hybrid — your data never leaves your store.
-- **Open, versioned storage** — [`Lance`](https://github.com/lance-format/lance) columnar format: branchable, time-travelable, with native blob-as-data (docs, images, video).
+| Capability | What it gives you |
+|---|---|
+| **Declared as code** | A `cluster.yaml` declares graphs, schemas, stored queries, embedding providers, and policies; `cluster apply` converges it and `omnigraph-server` brings every graph online at `/graphs/{id}/…`. |
+| **Built for fleets of agents** | Hundreds of agents enrich the graph on **parallel isolated branches**; changes are reviewed and merged safely, Git-style, across the whole graph. |
+| **Multimodal retrieval** | Graph traversal + vector ANN + full-text + Reciprocal Rank Fusion in **one** query runtime, for context assembly. |
+| **Security as code** | Cedar policy enforced **server-side on every mutation**, per-graph and server-wide; bearer auth; actor/audit tracking. |
+| **Runs on your infrastructure** | Any S3-compatible object store — **on-prem via RustFS / MinIO**, or AWS S3 / R2 / GCS. VPC, on-prem, hybrid; your data never leaves your store. |
+| **Open, versioned storage** | [`Lance`](https://github.com/lance-format/lance) columnar format: branchable, time-travelable, with native blob-as-data (docs, images, video). |

 ## What you can build

From b55ca02131e47e279dced30740cc0ebc6f27e8fa Mon Sep 17 00:00:00 2001
From: Andrew Altshuler
Date: Wed, 17 Jun 2026 00:58:28 +0300
Subject: [PATCH 03/13] docs(readme): one sentence per line in the intro (#271)

Adjacent source lines collapse into run-on text when rendered. Add hard
line breaks so the headline, subhead, and each intro sentence land on their
own line.

Co-authored-by: Claude Opus 4.8
---
 README.md | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 6574347..6cead9b 100644
--- a/README.md
+++ b/README.md
@@ -4,12 +4,11 @@
 [![Rust](https://img.shields.io/badge/rust-stable-orange.svg)](rust-toolchain.toml)
 [![Crates.io](https://img.shields.io/crates/v/omnigraph-cli.svg)](https://crates.io/crates/omnigraph-cli)

-**Lakehouse graph db for context assembly & multi-agent coordination**
+**Lakehouse graph db for context assembly & multi-agent coordination**\
 Multimodal retrieval, Git-style branching, object storage-native

-Omnigraph is the operational state and coordination layer for fleets of agents.
-Run it as a server, declared as code; hundreds of agents operate and enrich the graph on
-parallel isolated branches, and every change is reviewed and merged safely.
+Omnigraph is the operational state and coordination layer for fleets of agents.\
+Run it as a server, declared as code; hundreds of agents operate and enrich the graph on parallel isolated branches, and every change is reviewed and merged safely.

 ## Key capabilities

From b6131393b714a2612b5336b5b6528cb1a8ad53c8 Mon Sep 17 00:00:00 2001
From: Andrew Altshuler
Date: Wed, 17 Jun 2026 02:36:14 +0300
Subject: [PATCH 04/13] =?UTF-8?q?docs(readme):=20drop=20em-dashes,=20Curso?=
 =?UTF-8?q?r=E2=86=92Codex,=20rename=20agent=20section=20(#274)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* docs(readme): drop em-dashes, Cursor→Codex, rename agent section

- Replace all 20 em-dashes with context-appropriate punctuation (colons,
  semicolons, parens, commas) — removes the AI-slop tell.
- Cursor → Codex (the agent-host examples and the MCP host list).
- "Drive it with an AI agent" → "Set it up with an AI agent".

Co-Authored-By: Claude Opus 4.8

* docs(readme): wordmark header + simplify query examples

- Add the compact wordmark header (light/dark SVG, subtitle, nav row,
  restyled badges) from the header-redesign work; bring the wordmark assets
  with it.
- Rewrite the Query and mutate examples to lead with the short,
  config-default form (no repeated --server/--graph) and aliases — showing
  how simple it is, not crazy-long lines. The verbose
  --server/--graph/--store form is demoted to a one-line "ad-hoc target"
  note.

Co-Authored-By: Claude Opus 4.8

---------

Co-authored-by: Claude Opus 4.8
---
 README.md                          | 103 ++++++++++---------
 assets/omnigraph-wordmark-dark.svg | 152 +++++++++++++++++++++++++++++
 assets/omnigraph-wordmark.svg      | 152 +++++++++++++++++++++++++++++
 3 files changed, 363 insertions(+), 44 deletions(-)
 create mode 100644 assets/omnigraph-wordmark-dark.svg
 create mode 100644 assets/omnigraph-wordmark.svg

diff --git a/README.md b/README.md
index 6cead9b..deaea8b 100644
--- a/README.md
+++ b/README.md
@@ -1,11 +1,29 @@
-# Omnigraph
+

+ + + OMNIGRAPH + +

-[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE) -[![Rust](https://img.shields.io/badge/rust-stable-orange.svg)](rust-toolchain.toml) -[![Crates.io](https://img.shields.io/crates/v/omnigraph-cli.svg)](https://crates.io/crates/omnigraph-cli) +

+ Lakehouse graph database for context assembly & multi-agent coordination
+ Multimodal retrieval · Git-style branching · object-storage native +

-**Lakehouse graph db for context assembly & multi-agent coordination**\ -Multimodal retrieval, Git-style branching, object storage-native +

+ Quickstart  ·  + Docs  ·  + Cookbooks  ·  + CLI +

+ +

+ License: MIT + crates.io + Rust +

+ +
Omnigraph is the operational state and coordination layer for fleets of agents.\ Run it as a server, declared as code; hundreds of agents operate and enrich the graph on parallel isolated branches, and every change is reviewed and merged safely. @@ -18,7 +36,7 @@ Run it as a server, declared as code; hundreds of agents operate and enrich the | **Built for fleets of agents** | Hundreds of agents enrich the graph on **parallel isolated branches**; changes are reviewed and merged safely, Git-style, across the whole graph. | | **Multimodal retrieval** | Graph traversal + vector ANN + full-text + Reciprocal Rank Fusion in **one** query runtime, for context assembly. | | **Security as code** | Cedar policy enforced **server-side on every mutation**, per-graph and server-wide; bearer auth; actor/audit tracking. | -| **Runs on your infrastructure** | Any S3-compatible object store — **on-prem via RustFS / MinIO**, or AWS S3 / R2 / GCS. VPC, on-prem, hybrid; your data never leaves your store. | +| **Runs on your infrastructure** | Any S3-compatible object store: **on-prem via RustFS / MinIO**, or AWS S3 / R2 / GCS. VPC, on-prem, hybrid; your data never leaves your store. | | **Open, versioned storage** | [`Lance`](https://github.com/lance-format/lance) columnar format: branchable, time-travelable, with native blob-as-data (docs, images, video). 
| ## What you can build @@ -26,7 +44,7 @@ Run it as a server, declared as code; hundreds of agents operate and enrich the | Use case | What it's for | |---|---| | **Company brain** | Org knowledge unified into one graph every agent can query | -| **Agentic memory** | Durable, versioned memory — a branch per agent or per task, merged on review | +| **Agentic memory** | Durable, versioned memory: a branch per agent or per task, merged on review | | **Context graph** | Decision traces and codified tribal knowledge for retrieval | | **Dev graph** | Issues & dependency model that coding agents read and write | | **R&D / ML data layer** | Experiments and trials written into branches, versioned for training & eval | @@ -45,26 +63,26 @@ brew tap ModernRelay/tap brew install ModernRelay/tap/omnigraph ``` -## Drive it with an AI agent +## Set it up with an AI agent -Omnigraph is built to be run by coding agents — two ways in. +Omnigraph is built to be run by coding agents. Two ways in: **Teach your agent the playbook.** This repo ships the -[**`omnigraph` agent skill**](skills/omnigraph): the operational playbook — -cluster mode, the two config surfaces, schema evolution, query linting, data -writes, branches, Cedar policy, and the common gotchas. +[**`omnigraph` agent skill**](skills/omnigraph): the operational playbook +covering cluster mode, the two config surfaces, schema evolution, query linting, +data writes, branches, Cedar policy, and the common gotchas. ```bash npx skills add ModernRelay/omnigraph@omnigraph ``` **Or have an agent set it up from scratch.** Paste this into Claude Code, -Cursor, or any agent that can read a URL and run a shell command: +Codex, or any agent that can read a URL and run a shell command: ```text Help me set up Omnigraph -1. Read the docs at https://github.com/ModernRelay/omnigraph — start with +1. Read the docs at https://github.com/ModernRelay/omnigraph, starting with docs/user/clusters/index.md, then docs/user/deployment.md. 2. 
Skim the starter graphs and seed data in the cookbooks: https://github.com/ModernRelay/omnigraph-cookbooks @@ -80,7 +98,7 @@ is the fastest way to see Omnigraph shaped to a real domain. ## Deploy -A deployment is a **cluster** — a **multigraph** config directory that declares +A deployment is a **cluster**: a **multigraph** config directory that declares its graphs, schemas, stored queries, and policies as code. You manage it **Terraform-style**: `cluster plan` previews the diff, `cluster apply` converges it. `omnigraph-server` then boots from the cluster and brings every graph online @@ -92,7 +110,7 @@ at `/graphs/{id}/…`, each behind its own policy. company-brain/ ├── cluster.yaml ├── people.pg # schema for the "knowledge" graph -├── queries/ # stored queries — the .gq files ARE the declaration +├── queries/ # stored queries: the .gq files ARE the declaration │ └── people.gq └── base.policy.yaml # a Cedar policy bundle ``` @@ -113,20 +131,20 @@ policies: applies_to: [knowledge] # graph-bound; use [cluster] for server-level ``` -**2. Stand up your object store.** On-prem, run RustFS (or MinIO) — Omnigraph +**2. Stand up your object store.** On-prem, run RustFS (or MinIO); Omnigraph writes [Lance](https://github.com/lance-format/lance) to it over the standard S3 API. In the cloud, point the same `AWS_*` env at S3 / R2 / GCS instead. **3. Converge and run.** `apply` creates each graph, applies its schema, and publishes queries and policies into the content-addressed catalog. It is -idempotent — re-running is always safe. +idempotent; re-running is always safe. 
```bash omnigraph cluster validate # parse + typecheck everything omnigraph cluster plan # preview what apply would do omnigraph cluster apply # converge -# Boot the server from the cluster dir — storage resolves through cluster.yaml +# Boot the server from the cluster dir; storage resolves through cluster.yaml omnigraph-server --cluster company-brain --bind 0.0.0.0:8080 ``` @@ -137,33 +155,30 @@ containers, AWS/Railway, auth, and the full `AWS_*` contract. ## Query and mutate -Point the CLI at a running server and a graph. Stored queries and mutations run -**by name** from the catalog; branch and merge run across the whole graph, so a -fleet of agents can write in isolation and have changes reviewed before they -land on `main`. +Set a default server and graph once in `~/.omnigraph/config.yaml`, and the +everyday commands stay short. Stored queries and mutations run **by name**: ```bash -# Stored query / mutation, parameters as JSON -omnigraph query search_docs --server https://graph.internal:8080 --graph knowledge --params '{"q":"AI safety"}' -omnigraph mutate add_person --server https://graph.internal:8080 --graph knowledge --params '{"name":"Mina","team":"Research"}' +omnigraph query search_docs --params '{"q":"AI safety"}' +omnigraph mutate add_person --params '{"name":"Mina"}' -# An agent enriches on its own branch; you review, then merge -omnigraph branch create --from main agent/ingest-42 --server https://graph.internal:8080 --graph knowledge -omnigraph branch merge agent/ingest-42 --into main --server https://graph.internal:8080 --graph knowledge +# Branch, review, merge across the whole graph; agents write in isolation +omnigraph branch create --from main agent/ingest-42 +omnigraph branch merge agent/ingest-42 --into main ``` -Name the server (and a default graph) once in `~/.omnigraph/config.yaml` — with -operator identity and credentials — and the `--server`/`--graph` flags drop -away: `omnigraph query search_docs --params '{"q":"…"}'`. 
See the -[CLI reference](docs/user/cli/reference.md). +An **alias** is shorter still: bind a server, graph, and stored query to one +name, then `omnigraph alias triage` runs it. For an ad-hoc target, any command +still takes `--server --graph ` (or `--store ` for a local +graph). See the [CLI reference](docs/user/cli/reference.md). ## Security & governance -- **Engine-wide enforcement** — every write path goes through the same Cedar gate, so the HTTP server, the CLI, and the embedded SDK obey identical rules. -- **Declared in the cluster** — a policy bundle is bound to graphs (or the whole server) via `policies:` → `applies_to`. -- **Scoped** — rules apply per graph, per branch, or server-wide. -- **No plaintext tokens** — bearer tokens are hashed at startup and compared in constant time. -- **Forge-proof identity** — the actor is resolved server-side from the token; clients can't set it. +- **Engine-wide enforcement:** every write path goes through the same Cedar gate, so the HTTP server, the CLI, and the embedded SDK obey identical rules. +- **Declared in the cluster:** a policy bundle is bound to graphs (or the whole server) via `policies:` → `applies_to`. +- **Scoped:** rules apply per graph, per branch, or server-wide. +- **No plaintext tokens:** bearer tokens are hashed at startup and compared in constant time. +- **Forge-proof identity:** the actor is resolved server-side from the token; clients can't set it. See the [policy guide](docs/user/operations/policy.md). @@ -172,16 +187,16 @@ See the [policy guide](docs/user/operations/policy.md). 
| Client | Use it for | Where | |---|---|---| | **TypeScript SDK** | typed access from Node / TS | [`@modernrelay/omnigraph`](https://www.npmjs.com/package/@modernrelay/omnigraph) · [source](https://github.com/ModernRelay/omnigraph-ts) | -| **MCP server** | bridge Omnigraph to LLM hosts (Claude, Cursor, …) | [`@modernrelay/omnigraph-mcp`](https://www.npmjs.com/package/@modernrelay/omnigraph-mcp) | -| **HTTP / OpenAPI** | any language — the wire contract | the server's OpenAPI spec | +| **MCP server** | bridge Omnigraph to LLM hosts (Claude, Codex, …) | [`@modernrelay/omnigraph-mcp`](https://www.npmjs.com/package/@modernrelay/omnigraph-mcp) | +| **HTTP / OpenAPI** | any language, the wire contract | the server's OpenAPI spec | | **Python SDK** | typed access from Python | *coming soon* | Both npm packages are versioned in lockstep with `omnigraph-server`. ## Local quick test (no server) -1-min setup to try it: an **embedded, local file-backed graph** — no server, no -object store. For dev and experiments; production is the deployed cluster above. +1-min setup to try it: an **embedded, local file-backed graph** (no server, no +object store). For dev and experiments; production is the deployed cluster above. 
```bash cat > schema.pg <<'PG' @@ -229,7 +244,7 @@ Notes: - `crates/omnigraph-policy`: Cedar policy compilation and enforcement - `crates/omnigraph-api-types`: shared HTTP wire DTOs used by both the server and the CLI - `crates/omnigraph-cluster`: cluster config validation, planning, and apply (the control plane) -- `crates/omnigraph-server`: Axum HTTP server — cluster-first, runs N graphs under `/graphs/{id}/…` +- `crates/omnigraph-server`: Axum HTTP server, cluster-first, runs N graphs under `/graphs/{id}/…` - `crates/omnigraph-cli`: CLI for graph lifecycle, query/mutate, branch/commit/merge, schema/lint, snapshot/export, cluster control, policy/queries, profiles, and maintenance ## Contributing diff --git a/assets/omnigraph-wordmark-dark.svg b/assets/omnigraph-wordmark-dark.svg new file mode 100644 index 0000000..47b2033 --- /dev/null +++ b/assets/omnigraph-wordmark-dark.svg @@ -0,0 +1,152 @@ [152 added lines of SVG markup, content not preserved] diff --git a/assets/omnigraph-wordmark.svg b/assets/omnigraph-wordmark.svg new file mode 100644 index 0000000..45778dc --- /dev/null +++ b/assets/omnigraph-wordmark.svg @@ -0,0 +1,152 @@ [152 added lines of SVG markup, content not preserved] From d0e06a6ff6cc549e55c1bf1f19e61138b7ecfeb7 Mon Sep 17 00:00:00 2001 From: Andrew Altshuler Date: Wed, 17 Jun 2026 02:58:47 +0300 Subject: [PATCH 05/13] =?UTF-8?q?docs:=20audit=20pass=20=E2=80=94=20drop?= =?UTF-8?q?=20pre-0.7.0=20release=20notes;=20scrub=20RFC=20refs=20from=20u?=
=?UTF-8?q?ser=20docs=20(#272)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * docs: audit pass — drop pre-0.7.0 release notes; scrub RFC refs from user docs - Delete the pre-0.7.0 release-notes archive (v0.2.0 … v0.6.2); keep v0.7.0. - Rewrite every inline "RFC-0NN" citation in docs/user/** into durable plain language (the behavior is the contract, not the planning doc): cli/index.md, cli/reference.md, clusters/index.md, operations/{maintenance, policy,server}.md. Updated the in-page "Scopes & profiles" anchor to match the de-RFC'd heading. No sub-0.7.0 version caveats or stale Lance-version refs were present in docs/user/**. Dev docs, AGENTS.md, and instruction files are out of scope for this pass. Co-Authored-By: Claude Opus 4.8 * docs: second alignment pass — drop residual pre-cluster-only framing - cli/reference.md: rewrite the server-scope graph-resolution rule — an omnigraph-server is always cluster-backed, so GET /graphs always answers and --graph is required; the bare-URL path is only the fallback for an unavailable/non-omnigraph endpoint (was "a single-graph / flat server … uses its bare URL as before"). - embeddings.md: "Direct single-graph serving" → "Direct (--store) access" (there is no single-graph serving mode under cluster-only). - clusters/{config,index}.md: drop the removed --target flag from the "--cluster cannot combine with …" clauses. Verified: no Linear tickets, no RFC refs, no single-graph-as-current, no --target-as-combinable in docs/user/**. 
Co-Authored-By: Claude Opus 4.8 --------- Co-authored-by: Claude Opus 4.8 --- docs/releases/v0.2.0.md | 86 -------------- docs/releases/v0.2.1.md | 59 ---------- docs/releases/v0.2.2.md | 29 ----- docs/releases/v0.3.0.md | 49 -------- docs/releases/v0.3.1.md | 19 ---- docs/releases/v0.4.0.md | 88 -------------- docs/releases/v0.4.1.md | 142 ----------------------- docs/releases/v0.4.2.md | 115 ------------------- docs/releases/v0.5.0.md | 171 ---------------------------- docs/releases/v0.6.0.md | 141 ----------------------- docs/releases/v0.6.1.md | 28 ----- docs/releases/v0.6.2.md | 69 ----------- docs/user/cli/index.md | 2 +- docs/user/cli/reference.md | 29 ++--- docs/user/clusters/config.md | 2 +- docs/user/clusters/index.md | 4 +- docs/user/operations/maintenance.md | 2 +- docs/user/operations/policy.md | 2 +- docs/user/operations/server.md | 2 +- docs/user/search/embeddings.md | 2 +- 20 files changed, 23 insertions(+), 1018 deletions(-) delete mode 100644 docs/releases/v0.2.0.md delete mode 100644 docs/releases/v0.2.1.md delete mode 100644 docs/releases/v0.2.2.md delete mode 100644 docs/releases/v0.3.0.md delete mode 100644 docs/releases/v0.3.1.md delete mode 100644 docs/releases/v0.4.0.md delete mode 100644 docs/releases/v0.4.1.md delete mode 100644 docs/releases/v0.4.2.md delete mode 100644 docs/releases/v0.5.0.md delete mode 100644 docs/releases/v0.6.0.md delete mode 100644 docs/releases/v0.6.1.md delete mode 100644 docs/releases/v0.6.2.md diff --git a/docs/releases/v0.2.0.md b/docs/releases/v0.2.0.md deleted file mode 100644 index 7872ecf..0000000 --- a/docs/releases/v0.2.0.md +++ /dev/null @@ -1,86 +0,0 @@ -# Omnigraph v0.2.0 - -Omnigraph v0.2.0 focuses on day-to-day operability: safer schema evolution, more capable mutation queries, better local and remote ergonomics, and a documented HTTP surface for clients and tooling. 
- -This release is especially relevant if you are running Omnigraph locally on RustFS or using the CLI and server together as a graph application backend. - -## Highlights - -### Schema planning and apply - -Schema changes can now move from planning to execution with first-class CLI and server support. - -- Added `omnigraph schema apply --schema ...` alongside `schema plan` -- Added `POST /schema/apply` on the server -- Added policy support for schema application through the `schema_apply` action -- Persisted accepted schema updates as part of a supported apply flow - -This makes schema evolution an actual product capability instead of a plan-only diagnostic. - -### Safer schema apply on live repos - -After the initial schema-apply rollout, the apply path was hardened to avoid clobbering concurrent writes and to preserve indexes during table rewrites. - -- Blocks writes while schema apply is in progress -- Verifies source heads before publishing rewritten tables -- Rebuilds the full expected index set after rewrite operations -- Keeps schema apply constrained to repos whose only branch is `main` - -The result is a much more defensible v1 schema migration path. - -### Multi-statement mutations - -Mutation queries can now contain multiple sequential statements that execute atomically within one run. - -Example: - -```gq -query add_and_link($name: String, $age: I32, $friend: String) { - insert Person { name: $name, age: $age } - insert Knows { from: $name, to: $friend } -} -``` - -This is a meaningful step toward richer write-side workflows without forcing multiple client round trips. - -### OpenAPI support - -The server now publishes an OpenAPI document at `/openapi.json`. 
- -- Added schema-backed endpoint documentation for the Omnigraph HTTP API -- Documented request and response types for the current server surface -- Made the published spec reflect runtime auth mode, so open local deployments are documented correctly - -This makes Omnigraph easier to integrate with generated clients, inspection tools, and API consumers that want a machine-readable contract. - -### CLI and export ergonomics - -Several rough edges in the CLI were fixed. - -- Export now streams instead of buffering the full snapshot in memory first -- Load summaries now report actual loaded row counts -- Alias handling no longer steals legitimate first arguments -- `commit show` matches the documented `--uri` usage -- Remote and local usage are more consistent for common admin flows - -## Additional Improvements - -- RustFS CI is now scoped to relevant changes instead of burning time on unrelated pull requests -- README and install docs were tightened around public binary install behavior -- The local RustFS bootstrap remains aligned with the rolling `edge` binary channel - -## Upgrade Notes - -- If you use local or remote schema administration, prefer `schema plan` before `schema apply` -- `schema apply` is intentionally conservative in v1 and rejects repos with non-`main` branches -- If policy is enabled, make sure admin actors are allowed to perform `schema_apply` -- If you rely on published binaries, this release is the point where stable installers can pick up schema apply and the newer CLI/runtime behavior without using `edge` - -## Included Changes - -- PR #2: CLI ergonomics and streamed export output -- PR #5: schema apply command and policy support -- PR #7: schema apply concurrency and index-preservation hardening -- PR #4: multi-statement mutations -- PR #1: OpenAPI generation and auth-aware `/openapi.json` -- PR #8: RustFS CI scoping improvements diff --git a/docs/releases/v0.2.1.md b/docs/releases/v0.2.1.md deleted file mode 100644 index b840885..0000000 
--- a/docs/releases/v0.2.1.md +++ /dev/null @@ -1,59 +0,0 @@ -# Omnigraph v0.2.1 - -Omnigraph v0.2.1 is a focused follow-up release on top of v0.2.0. It adds query linting, improves query execution correctness, hardens the local RustFS bootstrap flow, and cleans up project config naming. - -## Highlights - -### Query lint and check - -The CLI now ships a first-class query validation surface: - -- `omnigraph query lint` -- `omnigraph query check` - -These commands validate `.gq` files against either an explicit schema file or a local/S3-backed repo schema, emit structured results, and support both human-readable and JSON output. - -### Query execution fixes and aggregate support - -This release includes several improvements in the query engine: - -- aggregate execution support for read queries -- nullable query parameters now accept omission and explicit null for nullable params -- traversal planning and join alignment are more robust for traversal-introduced bindings - -Together, these changes make complex read queries more dependable and easier to author. - -### Better local RustFS startup - -The local RustFS bootstrap is more resilient: - -- detects dirty/stale repo prefixes before blindly reinitializing -- makes bootstrap recovery clearer for persisted local RustFS state -- ships a more generic demo fixture instead of user-specific seed content - -This reduces the most common failure mode in local-first setup. - -### Config terminology cleanup - -`omnigraph.yaml` now uses graph-oriented naming: - -- `graphs:` instead of `targets:` -- `cli.graph` / `server.graph` instead of `target` - -This removes one of the more confusing overloaded terms in the CLI/server config model. 
- -## Included Changes - -- PR #15: query lint and query check commands -- PR #6: aggregate execution support -- PR #3: nullable query parameter fixes -- PR #16: traversal planning and join-alignment fixes -- PR #13: local RustFS bootstrap recovery hardening -- PR #14: generic bootstrap fixture -- PR #17: config rename from targets to graphs - -## Upgrade Notes - -- If you maintain `.gq` files in-repo, add `omnigraph query lint` to your local validation workflow -- Existing configs must use `graphs:` / `graph:` after this release -- Local RustFS users should prefer the current bootstrap script from `main` or this release rather than older cached copies diff --git a/docs/releases/v0.2.2.md b/docs/releases/v0.2.2.md deleted file mode 100644 index 88d086e..0000000 --- a/docs/releases/v0.2.2.md +++ /dev/null @@ -1,29 +0,0 @@ -# Omnigraph v0.2.2 - -Omnigraph v0.2.2 is a packaging follow-up to v0.2.1. It keeps the CLI and server surface the same, but renames the published runtime crate from `omnigraph` to `omnigraph-engine` so the full crate set can be published cleanly to crates.io. - -## Highlights - -### Published runtime crate rename - -The runtime package is now published as: - -- `omnigraph-engine` - -The in-code Rust library name remains `omnigraph`, so internal imports and code paths stay stable. CLI users are unaffected. - -### Crates.io metadata cleanup - -All published crates now ship repository, homepage, and documentation metadata so the crates.io pages are complete and the release pipeline no longer emits missing-package-metadata warnings. 
- -## Included Changes - -- rename runtime package from `omnigraph` to `omnigraph-engine` -- bump `omnigraph-engine`, `omnigraph-compiler`, `omnigraph-server`, and `omnigraph-cli` to `0.2.2` -- update dependent manifests and CI package references to the new runtime package name - -## Upgrade Notes - -- Rust consumers should depend on `omnigraph-engine` on crates.io -- Code that imports the library can continue using `omnigraph` as the crate name -- The `omnigraph` CLI binary name is unchanged diff --git a/docs/releases/v0.3.0.md b/docs/releases/v0.3.0.md deleted file mode 100644 index 4c900a7..0000000 --- a/docs/releases/v0.3.0.md +++ /dev/null @@ -1,49 +0,0 @@ -# Omnigraph v0.3.0 - -Omnigraph v0.3.0 is a feature and security release. It adds an AWS deployment path for the server, hardens bearer-token authentication, introduces a schema inspection endpoint, and ships the CodeBuild-driven image packaging pipeline. - -## Highlights - -### AWS deployment path - -A new `aws` Cargo feature enables an AWS-native bearer-token backend. When compiled with `--features aws` and pointed at an AWS Secrets Manager secret ARN via `OMNIGRAPH_SERVER_BEARER_TOKENS_AWS_SECRET`, the server fetches and parses bearer tokens directly from Secrets Manager at startup. The token loading path is abstracted behind a `TokenSource` trait so additional backends are easy to add. - -A manually-dispatched Package workflow builds two variants of the server image (default and `--features aws`) via AWS CodeBuild, tags them by source SHA in ECR, and records the digests for downstream deploy automation. - -### Bearer auth hardening - -Bearer tokens are now hashed (SHA-256) at rest inside the server and compared using constant-time equality (`subtle::ConstantTimeEq`). The authenticated actor id is resolved server-side from the hash match — requests can no longer assert their own actor id by setting a header. 
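The bearer-auth hardening described above (tokens hashed with SHA-256, constant-time comparison, actor resolved server-side) can be illustrated with a minimal Python sketch. This is not the server's Rust implementation: `TokenTable` and `resolve_actor` are hypothetical names, and `hmac.compare_digest` stands in for the `subtle::ConstantTimeEq` comparison the notes mention.

```python
import hashlib
import hmac
from typing import Dict, Optional


def hash_token(token: str) -> bytes:
    """Return the SHA-256 digest of a bearer token."""
    return hashlib.sha256(token.encode("utf-8")).digest()


class TokenTable:
    """Toy token store (hypothetical): keeps digests only, never plaintext."""

    def __init__(self, tokens_by_actor: Dict[str, str]) -> None:
        # Hash once at startup; the plaintext values are not retained.
        self._digests = {
            actor: hash_token(token) for actor, token in tokens_by_actor.items()
        }

    def resolve_actor(self, presented_token: str) -> Optional[str]:
        """Resolve the actor from the token alone; the client never names itself."""
        presented = hash_token(presented_token)
        matched = None
        # Visit every entry so the loop length does not depend on which one matches.
        for actor, digest in self._digests.items():
            if hmac.compare_digest(presented, digest):
                matched = actor
        return matched
```

Because the actor id is derived only from the digest match, a request header claiming a different identity has nothing to attach to in this model.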
- -### Schema inspection API - -A new `GET /schema` endpoint and matching CLI `schema get` command return the active graph schema as JSON. A static OpenAPI spec is published at `openapi.json` and kept in sync with the server via a CI job. - -### Stricter run-branch hygiene - -Internal `__run__…` branches, used for short-lived write staging, are now filtered out of user-visible branch listings and are deleted on every terminal state transition instead of accumulating over time. - -## Breaking changes - -### Schema state is now required - -The server refuses to open a repo that lacks persisted schema state (`_schema.pg`, `_schema.ir.json`, `__schema_state.json`) or that has non-main public branches left over from earlier versions. Existing repos created with 0.2.x need to be reinitialized (or have their schema state written explicitly) before they can be opened with 0.3.0. - -## Included Changes - -- Add `aws` feature + `SecretsManagerTokenSource` backend -- Extract `TokenSource` trait for bearer token loading -- Harden bearer auth: constant-time compare, SHA-256 hashed at rest, server-authoritative actor id -- Add manually-dispatched Package workflow for CodeBuild image builds (default + aws variants) -- Add `GET /schema` endpoint and `schema get` CLI command -- Ship static `openapi.json` spec with CI auto-sync -- Filter and delete ephemeral `__run__` branches -- Switch Dockerfile base to ECR Public (avoid Docker Hub rate limits) -- Raise `LANCE_MEM_POOL_SIZE` default to 1 GB for stable parallel tests -- Automate Homebrew tap updates on release tags -- Documentation for the AWS build variant and bearer-token sources - -## Upgrade Notes - -- Repos created with 0.2.x must be reinitialized (or have their schema state generated) before they can be opened with 0.3.0 -- Deployments using AWS Secrets Manager for bearer tokens must build the server with `--features aws` and set `OMNIGRAPH_SERVER_BEARER_TOKENS_AWS_SECRET` to the secret ARN -- The default token source (env 
var or JSON file) continues to work unchanged diff --git a/docs/releases/v0.3.1.md b/docs/releases/v0.3.1.md deleted file mode 100644 index 1f5d7dc..0000000 --- a/docs/releases/v0.3.1.md +++ /dev/null @@ -1,19 +0,0 @@ -# Omnigraph v0.3.1 - -Omnigraph v0.3.1 is a performance and operability point release. - -## Highlights - -- **Parallel per-type load writes**: the bulk loader writes to each node/edge table concurrently rather than serially, materially reducing wall-clock time on multi-table loads. -- **`omnigraph optimize` and `omnigraph cleanup` CLI commands**: previously only available via the engine API. `optimize` runs Lance `compact_files()` across every node/edge table; `cleanup` runs Lance `cleanup_old_versions()` with a `--keep`/`--older-than` policy and requires `--confirm` for the destructive form. -- **Dst-id deduplication during edge expand hydration**: avoids redundant lookups when the same destination id appears multiple times in an `Expand` step (#45). - -## Included Changes - -- Parallel per-type load writes (#46) -- `omnigraph optimize` / `cleanup` CLI commands and runtime APIs (#46) -- Dedupe dst ids before hydrating nodes in `execute_expand` (#45) - -## Upgrade Notes - -No breaking changes. Existing v0.3.0 repos can be opened directly with v0.3.1. diff --git a/docs/releases/v0.4.0.md b/docs/releases/v0.4.0.md deleted file mode 100644 index d3a8244..0000000 --- a/docs/releases/v0.4.0.md +++ /dev/null @@ -1,88 +0,0 @@ -# Omnigraph v0.4.0 - -Omnigraph v0.4.0 demotes the Run state machine to commit metadata via the -publisher's CAS, fixing a write-cancellation hole and reducing the engine's -surface area. - -## Highlights - -- **Direct-to-target writes**: `mutate_as` and `load` write - directly to the target tables and call - `ManifestBatchPublisher::publish` once at the end with - `expected_table_versions`. No more `__run__` staging branches, no - more `RunRecord` state machine. 
Cross-table OCC is enforced inside the - publisher's row-level CAS on `__manifest`. -- **Cancellation safety by construction**: a dropped mutation future - leaves no graph-level state — only orphaned Lance fragments, reclaimed - by `omnigraph cleanup`. The "zombie run" cascade documented in - `.context/zombie-run-investigation.md` is gone. -- **Read-your-writes inside multi-statement mutations**: a `.gq` query - that inserts and then references a row in the same statement now sees - its own writes via an in-process `MutationStaging` cache, even though - no manifest commit happens between ops. -- **Structured conflict surface**: concurrent writers race through the - publisher's CAS; the loser surfaces as - `ManifestConflictDetails::ExpectedVersionMismatch { table_key, - expected, actual }`. The HTTP server maps this to **409 Conflict** with - a structured `manifest_conflict` body so clients can detect-and-retry - without parsing the message. - -## Removed - -This is a breaking release. Pre-0.4.0 / no SLA. - -- `omnigraph::db::{RunRecord, RunStatus, RunId}` types and the - `_graph_runs.lance` / `_graph_run_actors.lance` Lance datasets. -- Engine APIs `begin_run`, `begin_run_as`, `publish_run`, - `publish_run_as`, `abort_run`, `fail_run`, `terminate_run`, - `list_runs`, `get_run`. -- HTTP endpoints: `GET /runs`, `GET /runs/{run_id}`, `POST - /runs/{run_id}/publish`, `POST /runs/{run_id}/abort`. The - `RunListOutput` and `RunOutput` schemas are removed from the OpenAPI - document. -- CLI subcommands: `omnigraph run list`, `omnigraph run show`, `omnigraph - run publish`, `omnigraph run abort`. Use `omnigraph commit list` - reading the commit graph for audit history. -- Cedar policy actions `run_publish` and `run_abort`. Existing - `policy.yaml` files referencing these actions will fail validation — - remove the rules; the `change` action covers the equivalent gating. 
- -## Behavior changes - -- `mutate_as` / `load` are now **atomic per query, single publish at the - end**. A failed mutation leaves the target unchanged with no - intermediate manifest commits. -- The `OmniError::manifest_conflict` shape produced by concurrent - writers is now `ExpectedVersionMismatch` (was `MergeConflict::DivergentUpdate` - via the run merge path). Clients that match on the conflict body must - switch to inspecting `manifest_conflict.table_key/expected/actual`. - -## Known limitation - -A multi-statement mutation that writes a Lance fragment in op-N and then -fails in op-N+1 leaves the touched table with Lance HEAD ahead of the -manifest. The next mutation against that table fails with -`ExpectedVersionMismatch`. Most validation runs before any Lance write, -so single-statement mutations are unaffected; the narrow path is -multi-statement queries with late-op failures. Tracked as a follow-up; -see [docs/dev/writes.md](../dev/writes.md#mid-query-partial-failure-closed-by-mr-794) -for the workaround. - -## Upgrade notes - -- **Stale `__run__*` branches and `_graph_runs.lance`** in legacy v0.3.x - repos are *inert* — the engine no longer reads them — but they remain - on disk until production cleanup. This release deliberately does not touch - legacy bytes. -- The `is_internal_run_branch` predicate is kept as a defense-in-depth - guard against users naming a branch `__run__*`. It will be removed in - a follow-up cleanup. -- External scripts hitting `/runs/*` will now receive 404. Migrate them - to `/commits` for audit history; mutation status is implied by the - HTTP response on `/change` itself. 
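The compare-and-swap publish and the `ExpectedVersionMismatch` conflict described in these notes can be modeled with a small in-memory toy. It is a sketch under assumptions, not the engine's publisher: `ToyManifest` and `conflict_body` are hypothetical, and the exact JSON layout of the `manifest_conflict` body is assumed from the field names given above.

```python
import threading
from typing import Dict


class ExpectedVersionMismatch(Exception):
    """Raised when a writer's expected table version is stale."""

    def __init__(self, table_key: str, expected: int, actual: int) -> None:
        super().__init__(f"{table_key}: expected v{expected}, found v{actual}")
        self.table_key = table_key
        self.expected = expected
        self.actual = actual


class ToyManifest:
    """Hypothetical stand-in for a manifest guarded by per-table versions."""

    def __init__(self) -> None:
        self._versions: Dict[str, int] = {}
        self._lock = threading.Lock()

    def version(self, table_key: str) -> int:
        return self._versions.get(table_key, 0)

    def publish(self, expected_table_versions: Dict[str, int]) -> None:
        with self._lock:
            # Check every table first so a conflict leaves the manifest unchanged.
            for key, expected in expected_table_versions.items():
                actual = self.version(key)
                if actual != expected:
                    raise ExpectedVersionMismatch(key, expected, actual)
            # All checks passed: advance every touched table together.
            for key in expected_table_versions:
                self._versions[key] = self.version(key) + 1


def conflict_body(err: ExpectedVersionMismatch) -> dict:
    """Assumed shape of a 409 response body; field names come from the notes."""
    return {
        "manifest_conflict": {
            "table_key": err.table_key,
            "expected": err.expected,
            "actual": err.actual,
        }
    }
```

A client that loses the race can read `table_key`, `expected`, and `actual` from the body, refresh its view of that table, and retry without parsing an error message.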
- -## Included Changes - -- Demote Run: write directly to target via publisher -- `ManifestBatchPublisher::publish` accepts per-table - `expected_table_versions` diff --git a/docs/releases/v0.4.1.md b/docs/releases/v0.4.1.md deleted file mode 100644 index 4983015..0000000 --- a/docs/releases/v0.4.1.md +++ /dev/null @@ -1,142 +0,0 @@ -# Omnigraph v0.4.1 - -Omnigraph v0.4.1 closes the multi-statement-mutation atomicity gap that -v0.4.0 documented as a known limitation. Inserts and updates now route -through an in-memory `MutationStaging` accumulator and commit via Lance's -two-phase distributed-write API at end-of-query. A failed mid-query op -no longer leaves Lance HEAD drifted on the touched table — the next -mutation proceeds normally. - -## Highlights - -- **Staged-write rewire**: `mutate_as` and `load` (Append / - Merge modes) accumulate insert/update batches into - `MutationStaging.pending` per touched table. No Lance HEAD advance - happens during op execution; one `stage_*` + `commit_staged` per - table runs at end-of-query, then `ManifestBatchPublisher::publish` - commits the manifest atomically. **For op-execution failures** - (validation errors, missing endpoints, parse-time D₂ rejection), Lance - HEAD on every staged table is untouched and the next mutation - proceeds normally. A narrowed residual remains at the - finalize→publisher boundary (multi-table `commit_staged` is not - atomic with the manifest commit) — see [docs/dev/writes.md](../dev/writes.md) - "Finalize → publisher residual" for details. -- **D₂ parse-time rule**: a single mutation query is either - insert/update-only or delete-only. Mixed → rejected with a clear - error directing the caller to split into two queries. Lance 4.0.0 - has no public two-phase delete; deletes still inline-commit, and D₂ - keeps that path safe. 
-- **Read-your-writes via DataFusion `MemTable`**: read sites in - multi-statement mutations consume `TableStore::scan_with_pending`, - which Lance-scans the committed snapshot at the captured - `expected_version` and unions with a DataFusion `MemTable` over the - pending batches. Replaces the previous "reopen at staged Lance - version" pattern. -- **Coordinator swap-restore eliminated** from `mutate_with_current_actor`. - Branch is threaded explicitly through the per-op execution path - (`execute_named_mutation`, `execute_insert`, `execute_update`, - `execute_delete*`, `validate_edge_insert_endpoints`, - `ensure_node_id_exists`). The `swap_coordinator_for_branch` / - `restore_coordinator` API and `CoordinatorRestoreGuard` are removed - from `mutation.rs`. (`merge.rs` keeps its own swap pattern; that's - a separate workflow.) -- **`docs/dev/invariants.md` mutation atomicity / read-your-writes status** - flips from `aspirational/open` to `upheld for inserts/updates`. The within-query read-your-writes - guarantee is now load-bearing for the publisher CAS contract. - -## Behavior changes - -- A failed multi-statement mutation no longer surfaces - `ExpectedVersionMismatch` on the *next* mutation against the same - table. The next call proceeds normally — Lance HEAD on staged - tables is unchanged. -- Mixed insert/update + delete in one query is rejected at parse - time. Existing test queries that mixed both must be split. -- `MutationStaging`'s shape changed: `pending: HashMap` - + `inline_committed: HashMap` replaces the - previous `latest: HashMap`. This is an internal - type; no public API impact. - -## Residual / out of scope - -- **`LoadMode::Overwrite`** keeps the legacy inline-commit path - (truncate-then-append doesn't fit the staged shape). A mid-overwrite - failure can still drift Lance HEAD on a partially-truncated table; - the next overwrite replaces it. Operator-driven, rare. -- **Delete-only multi-statement mutations** still inline-commit per op. 
- D₂ keeps inserts/updates from coexisting with deletes, so the - inline path remains atomic per op but not per query for delete-only - cascades. Closing this requires Lance to expose - `DeleteJob::execute_uncommitted`; tracked upstream with Lance. -- **`schema_apply`, `branch_merge_internal`, `ensure_indices`** still - use Lance's inline-commit APIs. The two-phase pattern is in - `mutate_as` and `load` only; hoisting it to a storage-trait invariant - covering all writers remains future work. - -## Tests added - -- `tests/writes.rs::partial_failure_leaves_target_queryable_and_unblocks_next_mutation` - (replaces the old `partial_failure_observably_rolls_back_but_blocks_next_mutation_on_same_table`) -- `tests/writes.rs::mutation_rejects_mixed_insert_and_delete_at_parse_time` -- `tests/writes.rs::mixed_insert_and_update_on_same_person_coalesces_to_one_merge` -- `tests/writes.rs::multiple_appends_to_same_edge_coalesce_to_one_append` -- `tests/writes.rs::multi_statement_inserts_publish_exactly_once` -- `tests/writes.rs::load_with_bad_edge_reference_unblocks_next_load` -- `tests/writes.rs::load_with_cardinality_violation_unblocks_next_load` - -## Files changed - -- `crates/omnigraph/src/exec/staging.rs` (NEW) — `MutationStaging`, - `PendingTable`, `PendingMode`, `StagedTablePath`, - `dedupe_merge_batches_by_id`. -- `crates/omnigraph/src/exec/mutation.rs` — D₂ check; per-op - rewires (`execute_insert`, `execute_update`, `execute_delete*`); - branch threading; coordinator-swap removal; helper - `validate_edge_cardinality_with_pending`; helper - `concat_match_batches_to_schema`; `apply_assignments` updated to - copy unassigned blob columns from full-schema scans. -- `crates/omnigraph/src/loader/mod.rs` — `load_jsonl_reader` split: - staged path for Append/Merge, legacy inline-commit path for - Overwrite. Helpers `collect_node_ids_with_pending` and - `validate_edge_cardinality_with_pending_loader`. 
-- `crates/omnigraph/src/table_store.rs` — `scan_with_pending`, - `count_rows_with_pending` (DataFusion `MemTable`-backed union with - Lance scan). -- `Cargo.toml` (workspace) + `crates/omnigraph/Cargo.toml` — added - `datafusion = "52"` direct dep (transitively pulled by Lance - already; required for `MemTable`). -- `docs/dev/writes.md` — removed "Known limitation" section; documented - the new accumulator + D₂ + LoadMode::Overwrite residual. -- `docs/dev/invariants.md` — mutation atomicity / read-your-writes status - flipped to `upheld for inserts/updates`. -- `docs/dev/architecture.md` — added "Mutation atomicity — in-memory - accumulator" subsection; refreshed the engine + state - diagrams to drop `RunRegistry` and add `MutationStaging`. -- `docs/dev/execution.md` — rewrote the mutation flow sequence diagram - for the staged-write path; updated the `LoadMode` table to call - out per-mode commit semantics; rewrote `load` vs `ingest`. -- `docs/user/query-language.md` — documented the D₂ parse-time rule. -- `docs/user/errors.md` — added the D₂ `BadRequest` rejection path. -- `docs/user/storage.md` — dropped the live `_graph_runs.lance` reference - from the layout diagram and prose. -- `docs/user/branches-commits.md` — moved `__run__` to a legacy note; - removed `publish_run` from the publish-trigger list. -- `docs/user/audit.md` — current `_as` API list refreshed; legacy - `RunRecord.actor_id` moved to a historical note. -- `docs/user/constants.md` — marked the run registry / branch-prefix rows - as legacy. -- `docs/user/cli.md` — replaced the legacy `omnigraph run *` quickstart - block with `omnigraph commit list/show`. -- `docs/dev/testing.md` — extended the `writes.rs` row to cover the new - staged-write contract tests; added the `staged_writes.rs` row. -- `AGENTS.md` (CLAUDE.md symlink) — updated the atomic-per-query - description and the L2 capability matrix row. 
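As a rough illustration of the staged-write shape these notes describe, the sketch below models a per-query accumulator in plain Python. Reads union the committed snapshot with pending rows (read-your-writes), and nothing reaches the committed table until one end-of-query commit. The `Staging` class, `run_query`, and their method names are hypothetical stand-ins, not the engine's `MutationStaging` API, and last-write-wins merging by `id` is an assumption made for the example.

```python
class Staging:
    """Toy per-query accumulator; hypothetical, not the engine's MutationStaging."""

    def __init__(self, committed):
        # committed: dict of table name -> list of row dicts (the pinned snapshot)
        self.committed = committed
        self.pending = {}  # table name -> list of staged row dicts

    def stage(self, table, row):
        # Buffer the write in memory; the committed table is untouched.
        self.pending.setdefault(table, []).append(row)

    def scan_with_pending(self, table):
        # Union committed rows with pending rows, later rows winning by id
        # (assumed last-write-wins merge semantics).
        merged = {}
        for row in self.committed.get(table, []) + self.pending.get(table, []):
            merged[row["id"]] = row
        return list(merged.values())

    def commit(self):
        # Publish every touched table once, at end of query.
        for table in self.pending:
            self.committed[table] = self.scan_with_pending(table)
        self.pending.clear()


def run_query(committed, statements):
    """Apply all statements or none: a failure discards the staging buffer."""
    staging = Staging(committed)
    for table, row in statements:
        if "id" not in row:
            # Raising here leaves `committed` exactly as it was.
            raise ValueError("row is missing an id")
        staging.stage(table, row)
    staging.commit()
```

Because a failing statement raises before `commit()` runs, the committed rows are never touched, which mirrors the "next mutation proceeds normally" behavior described above in miniature.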
- -## Included Changes - -- Rewire `mutate_as` and `load` via in-memory `MutationStaging` + - `stage_*` / `commit_staged` per touched table at end-of-query. -- (The storage substrate shipped in v0.4.0's PR #67 — `StagedWrite`, - `stage_append`, `stage_merge_insert`, `commit_staged`, - `scan_with_staged`, `count_rows_with_staged` — and is the substrate - this release builds on.) diff --git a/docs/releases/v0.4.2.md b/docs/releases/v0.4.2.md deleted file mode 100644 index bc45716..0000000 --- a/docs/releases/v0.4.2.md +++ /dev/null @@ -1,115 +0,0 @@ -# Omnigraph v0.4.2 - -Omnigraph v0.4.2 is a concurrency, admission-control, and release-hygiene -release. It removes the server-global write lock, lets disjoint writers make -progress concurrently, adds per-actor admission limits, hardens branch and -mutation races with snapshot-isolation fences, and documents the release in -public open-source terms. - -## Highlights - -- **Unlocked server engine handle**: the HTTP server now holds the engine behind - a shared handle instead of a server-global write lock. Concurrent handlers can - call engine APIs directly while the engine serializes only the resources that - actually conflict. -- **Engine-owned writer queues**: same `(table, branch)` writers are serialized - by per-table writer queues inside the engine, while disjoint table/branch - writes can run concurrently. This narrows contention without relying on route - handlers to know storage-level ordering rules. -- **Per-actor admission control**: mutating HTTP handlers are gated by a - `WorkloadController` with per-actor in-flight request and estimated-byte - budgets. Rejections use HTTP 429 with `code: too_many_requests` and a - `Retry-After` header, so noisy actors back off without blocking unrelated - actors. -- **Admission coverage for all mutating handlers**: `/change`, `/ingest`, - `/schema/apply`, branch create/delete, and branch merge now flow through the - admission controller. 
Read-only endpoints are not admission-gated. -- **Op-kind-aware version checks**: mutation commit-time drift checks distinguish - append-like inserts from strict update/delete work. Inserts remain permissive - enough for safe concurrent append patterns; updates and deletes get stricter - stale-view rejection. -- **Read-time drift checks for strict mutations**: staged mutations compare the - manifest pin captured when the query opened against the manifest snapshot - captured under table-queue ownership. If a concurrent writer moved the table - after the query read, the stale writer returns a structured - `manifest_conflict` 409 instead of staging work computed against an old - snapshot. -- **Inline-delete recovery coverage**: delete-only mutations still use Lance's - inline delete path, but their recovery sidecar is now written before the - manifest-version rejection path can return. If a delete moves Lance HEAD and a - concurrent manifest update makes the query stale, the next read-write open can - roll the residual back rather than leaving a head-ahead-of-manifest table. -- **Branch-operation race hardening**: branch creation and branch merge avoid - coordinator swap-restore races that could expose the wrong active branch to - concurrent work. Concurrent branch merges are serialized by a merge mutex. -- **Branch-merge target revalidation**: merges re-check target table versions - after acquiring target write queues. A stale merge plan returns a structured - conflict instead of overwriting concurrent target-branch changes or adopting a - source table over newly appended target rows. -- **Schema refresh deadlock fix**: recovery refresh releases the write guard - before schema reload, preventing a refresh/schema-apply deadlock. -- **Lean admission API**: removed the unused global rewrite admission pool, - `service_unavailable` error variant, related 503 documentation, and benchmark - flag. 
The public server surface now reflects only admission behavior that is - wired to handlers. -- **Open-source release hygiene**: this release adds guidance for public-facing - documentation, release notes, and version bumps. Release docs now avoid - private issue tracker references and use stable public descriptions instead. - -## Behavior changes - -- Disjoint mutating HTTP requests can now make progress concurrently instead of - queueing behind one process-wide engine write lock. -- Mutating handlers may return HTTP 429 when an actor exceeds per-actor in-flight - or estimated-byte budgets. Clients should respect `Retry-After` and retry - later. -- Concurrent update/delete and merge races now return structured - `manifest_conflict` 409 responses in more stale-view cases instead of relying - on later publisher-CAS detection or allowing a stale plan to proceed. -- Concurrent branch merge × change on the same target branch may return either - success or a clean 409 conflict, depending on which operation wins the queue. -- `OMNIGRAPH_GLOBAL_REWRITE_MAX` is no longer recognized. Remove it from - deployment manifests; use the per-actor in-flight and byte-budget admission - settings for the currently wired server controls. - -## Upgrade Notes - -- No repository migration is required. Existing v0.4.1 repos can be opened - directly with v0.4.2. -- Clients should treat `manifest_conflict` 409 responses as retryable stale-view - conflicts. This was already the documented contract, but this release uses it - in more concurrent-write paths. -- Clients should handle HTTP 429 from every mutating endpoint, not only - `/change`. Honor the `Retry-After` header. -- Operators should remove stale references to global rewrite admission and 503 - rewrite-pool exhaustion from local runbooks. -- If you maintain public docs or release notes, use public identifiers and - user-facing descriptions rather than private tracker IDs. 
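The client guidance above (retry on `manifest_conflict` 409, honor `Retry-After` on 429) can be sketched as one small decision function. This is a minimal sketch, not an official client: `retry_delay` is a hypothetical helper, the backoff constants and the 1-second fallback are assumptions, and only the delta-seconds form of `Retry-After` is parsed.

```python
def retry_delay(status, code, retry_after, attempt):
    """Return seconds to wait before retrying, or None if not retryable.

    status: HTTP status code; code: the error body's `code` field;
    retry_after: raw Retry-After header value or None; attempt: 0-based retry count.
    """
    if status == 429:
        # Admission rejection: honor Retry-After when it is delta-seconds.
        try:
            return max(0.0, float(retry_after))
        except (TypeError, ValueError):
            return 1.0  # assumed fallback when the header is absent or an HTTP-date
    if status == 409 and code == "manifest_conflict":
        # Stale-view conflict: retry soon, with capped exponential backoff (assumed values).
        return min(0.1 * (2 ** attempt), 2.0)
    return None  # anything else is treated as non-retryable in this sketch
```

On a 409 retry the caller would normally re-run the whole query so it is planned against a fresh snapshot, rather than resend a plan computed from the stale view.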
- -## Tests added or strengthened - -- Regression tests for update read-your-writes under in-process concurrency. -- HTTP tests for same-key insert snapshots, disjoint `/change` concurrency, and - `/ingest` admission 429 + `Retry-After`. -- Branch-operation regression tests for branch-create swap-restore races, - concurrent `/change` + branch-merge interleavings, branch-merge swap-restore - races, branch-op matrix coverage, and post-reopen consistency. -- Failpoint-backed regression coverage for inline-delete recovery sidecar - creation before version-mismatch rejection. -- Admission tests use injectable `WorkloadController` state instead of mutating - process environment. - -## Included Changes - -- Shared server engine state and per-actor admission on mutating endpoints. -- Per-(table, branch) writer queues and op-kind-aware manifest drift checks. -- Strict read-time version checks for updates/deletes. -- Branch create/merge race hardening and branch-merge target snapshot - revalidation under queue ownership. -- Retry-after support for admission rejections and OpenAPI updates for reachable - 429 responses. -- Actor-isolation benchmark harness updates for the current admission controller. -- Removal of the unwired global rewrite admission / 503 server surface. -- Version bump to `0.4.2` across workspace crates, `Cargo.lock`, and - `openapi.json`. -- Public release-note cleanup and new OSS best-practice guidance in `AGENTS.md`. diff --git a/docs/releases/v0.5.0.md b/docs/releases/v0.5.0.md deleted file mode 100644 index 16e284e..0000000 --- a/docs/releases/v0.5.0.md +++ /dev/null @@ -1,171 +0,0 @@ -# Omnigraph v0.5.0 - -Omnigraph v0.5.0 is a substrate, security, and migration-safety release. 
It -jumps the storage substrate from Lance 4 to Lance 6.0.1 (DataFusion 52 → 53, -Arrow 57 → 58), introduces engine-wide Cedar policy enforcement on every -authoring path, and ships a structured schema-lint v1 chassis with -code-tagged diagnostics, soft drops, and an explicit `--allow-data-loss` -flag for destructive migrations. - -## Highlights - -- **Lance 6.0.1 substrate**: bump from Lance 4.0.0 → 6.0.1, DataFusion 52 → - 53, Arrow 57 → 58. New optimizer rules (vectorized `IN`-list eq kernel, - `PhysicalExprSimplifier`, push-limit-into-hash-join, CASE-NULL shortcut) - reach predicates that flow through the engine. `lance-tokenizer` replaces - tantivy internally; FTS behavior preserved. -- **Cedar policy engine**: a new `omnigraph-policy` crate wires - `Omnigraph::enforce(action, scope, actor)` into every `_as` writer - (`mutate_as`, `load_as`, `apply_schema_as`, `branch_create_as`, - `branch_merge_as`, `branch_delete_as`, plus the load and change - variants). The HTTP server defaults to deny-all when no Cedar policy is - configured; a YAML policy file is required to enable writes. Actor - identity comes only from signed token claims — clients cannot set actor - identity directly. -- **Schema lint v1 chassis**: diagnostics now carry stable codes of the form - `OG-XXX-NNN` instead of free-form messages. `omnigraph schema plan` and - `apply` understand soft drops on properties and types — destructive drops - require the new `--allow-data-loss` flag (Hard mode) at the CLI and an - equivalent JSON flag over HTTP. -- **Structured filter pushdown**: query-language predicates lower to - DataFusion `Expr` and push down through Lance's `Scanner::filter_expr` - instead of being flattened to SQL strings. This unlocks `CompOp::Contains` - pushdown (via `array_has`), which previously fell through to in-memory - post-scan filtering, and lets the DataFusion 53 optimizer rules above act - on our predicates. 
-- **HTTP `allow_data_loss` parity**: the destructive-drop guard now exists - on both the CLI (`--allow-data-loss`) and HTTP (`allow_data_loss: true` in - the schema-apply request body). -- **Inline query strings on CLI and HTTP**: `omnigraph read` / - `omnigraph mutate` and the corresponding HTTP endpoints accept inline - `.gq` source, not just a file path. Easier ad-hoc queries, clearer - request logs. -- **Browser CORS layer**: optional CORS layer on `omnigraph-server` for - browser-based UIs, gated by `OMNIGRAPH_CORS_ORIGINS`. -- **Merge-insert dup-rowid fix**: Lance's `MergeInsertBuilder` could surface - spurious `"Ambiguous merge inserts"` errors on sequential merges against - rows previously rewritten by `merge_insert`. The engine now opts into - `SourceDedupeBehavior::FirstSeen` with a `check_batch_unique_by_keys` - fail-fast precondition that guarantees source-side dedup happens before - Lance sees the batch. -- **Branch-merge error-path recovery**: a branch merge that failed - mid-flight could leave the in-process coordinator pointing at a stale - active branch. The error path now restores the prior coordinator, - matching the success path's invariant. -- **Branch merge with blob columns**: external blob URIs are now - materialized correctly during branch merge instead of being dropped or - pointing at the source branch. -- **Lance API surface guards**: a new test file - (`crates/omnigraph/tests/lance_surface_guards.rs`) pins eight specific - Lance API surfaces (`LanceError::TooMuchWriteContention`, - `ManifestLocation` fields, `MergeInsertBuilder` return shape, - `WriteParams::default`, `compact_files` signature, etc.) so the next - Lance bump fails compile or runtime on any silent drift rather than - producing wrong-state recovery in production. - -## Behavior changes - -- **On-disk format unchanged**: existing v0.4.2 datasets open unchanged. - The Lance file format pin stays at V2_2 (required by Lance's blob v2 - feature). 
-- **`omnigraph-server` defaults to deny-all under `--policy`**: starting a - server with the policy feature enabled but no Cedar YAML policy - configured rejects every write. Operators must supply a policy file to - authorize anything. -- **Schema-lint diagnostics carry stable codes**: messages now lead with - `OG-XXX-NNN`. CI parsers or tooling that keyed off the v0.4.2 free-form - text need to switch to code-based matching. -- **Destructive schema drops require `--allow-data-loss`**: dropping a - property or type returns a structured diagnostic by default. - `omnigraph schema apply --allow-data-loss` (CLI) or - `{"allow_data_loss": true}` (HTTP) opts into Hard mode. -- **`HashJoinExec` null-aware semantics on anti-join**: a side effect of - the DataFusion 53 bump — `NOT IN` semantics under null-valued anti-join - columns are now correct per SQL standard. Queries that depended on the - prior behavior would have been incorrect. - -## Upgrade Notes - -### Migration - -- No data migration. v0.4.2 repos open directly on v0.5.0. - -### Clients - -- HTTP and SDK clients should switch any string-matching schema-lint - parsing to code-based matching against the `OG-XXX-NNN` prefix. -- Clients exercising destructive schema drops (`DropProperty`, `DropType`) - must add the `allow_data_loss` request field (HTTP) or - `--allow-data-loss` flag (CLI). Default is soft-drop-or-reject. -- Clients consuming `mutate_as` / `load_as` / `apply_schema_as` / branch - authoring APIs now flow through the policy enforcer. Anything bypassing - authorization on v0.4.2 will be rejected on v0.5.0 once a policy is - configured. - -### Operators - -- Configure a Cedar policy YAML for production servers before enabling - writes; deny-all is the new default. The `omnigraph policy validate` / - `test` / `explain` CLI commands are unchanged. 
-- Bearer tokens continue to be the actor-identity source; review the - signed-token-claim-only invariant in `docs/dev/invariants.md` if you've - built custom authentication. -- If your local CI uses RustFS for S3-compatible storage testing, our CI - pins `rustfs/rustfs:1.0.0-beta.3` (the last known-good tag before the - upstream credentials-policy change). Mirror the pin or set - `RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true` for the new image - versions. - -## Tests added or strengthened - -- `crates/omnigraph/tests/lance_surface_guards.rs` — 8 named guards pinning - Lance API surfaces against silent drift on future bumps. -- `crates/omnigraph/tests/policy_engine_chassis.rs` — engine-level policy - enforcement coverage; complements the existing HTTP policy tests. -- Policy chassis e2e gap-fills — branch-merge, branch-create, branch-delete - policy paths now have explicit end-to-end tests over HTTP and CLI. -- Merge-pair truth table — exhaustive op-variant matrix for three-way - merge across `noop`, `addNode`, `removeNode`, `addEdge`, `removeEdge`, - `setProperty`, `dropProperty`, `addLabel`, `removeLabel`; the build - fails to compile when a new op variant is added without dispositioning - every pairing. -- Merge-insert: regression for the dup-rowid bug class on the load surface - (`load_merge_repeated_against_overlapping_keys_succeeds`), the update - surface (`second_sequential_update_on_same_row_succeeds`), and the - upstream-Lance-gap canary - (`load_merge_window_2_documents_upstream_lance_gap`). -- Maintenance + destructive-migration coverage — `omnigraph optimize` / - `cleanup` boundary cases, plus schema-apply soft-drop and Hard-mode - paths. -- Stable-row-id preservation across `stage_overwrite` — pins the invariant - that staged overwrites carry stable row IDs through to the committed - fragment set. 
-- `CompOp::Contains` pushdown regression - (`ir_filter_with_list_contains_pushes_down`) — pins the new structured - Expr pushdown path that retired the in-memory fallback. - -## Included Changes - -- Lance 4 → 6.0.1, DataFusion 52 → 53, Arrow 57 → 58 substrate upgrade. -- `omnigraph-policy` crate with engine-wide Cedar enforcement and - signed-token-claim-only actor identity. -- Schema-lint v1 chassis with `OG-XXX-NNN` codes, soft `DropProperty` / - `DropType` semantics, and `--allow-data-loss` for Hard mode. -- HTTP `allow_data_loss` request field parity with the CLI flag. -- Structured DataFusion `Expr` filter pushdown via - `Scanner::filter_expr`, with `CompOp::Contains` lowered through - `array_has`. -- Inline `.gq` source acceptance on CLI and HTTP read/mutate endpoints. -- Optional CORS layer on `omnigraph-server` for browser UIs. -- Bug fixes: merge-insert dup-rowid (FirstSeen + uniqueness precondition), - branch-merge coordinator restore on error, blob-column materialization - during branch merge. -- New Lance API surface-guard test file as the canary for future Lance - bumps. -- Recovery-sidecar coverage extended across the four write paths - (`MutationStaging::finalize`, `schema_apply`, `branch_merge`, - `ensure_indices`) with failpoint regression tests. -- CI: pinned `rustfs/rustfs:1.0.0-beta.3` after the upstream `:latest` - introduced a credentials-policy change. -- Version bump to `0.5.0` across workspace crates, `Cargo.lock`, - `openapi.json`, and the `AGENTS.md` surveyed version. diff --git a/docs/releases/v0.6.0.md b/docs/releases/v0.6.0.md deleted file mode 100644 index 7984056..0000000 --- a/docs/releases/v0.6.0.md +++ /dev/null @@ -1,141 +0,0 @@ -# Omnigraph v0.6.0 - -Three pieces of work land in this release: - -1. The **graph terminology rename** (renamed `Repo` → `Graph` across the Cedar resource model, policy API, and query-lint schema source). -2. 
**Multi-graph server mode** — one `omnigraph-server` process can now serve 1–10 graphs concurrently behind cluster routes (`/graphs/{graph_id}/...`), with per-graph and server-level Cedar policy, read-only `GET /graphs` enumeration, and CLI parity (`omnigraph graphs list`). -3. **Inline + canonical-named queries and mutations.** New `POST /query` and `POST /mutate` endpoints pair with the CLI's new `-e/--query-string` flag for ad-hoc execution without a temp file. `POST /read` and `POST /change` continue serving indefinitely as deprecated aliases that carry RFC 9745 `Deprecation: true` and RFC 8288 `Link: ; rel="successor-version"` response headers, plus `deprecated: true` in `openapi.json`. Same canonicalization on the CLI: `omnigraph query`, `omnigraph mutate`, and top-level `omnigraph lint` / `omnigraph check` replace `omnigraph read`, `omnigraph change`, and the nested `omnigraph query lint` / `omnigraph query check`. Every deprecated spelling remains a `visible_alias` that warns to stderr once per invocation. - -Runtime add/remove (`POST /graphs`, `DELETE /graphs/{id}`, `omnigraph graphs create`) is **not** in v0.6.0. Operators add or remove graphs by editing `omnigraph.yaml` and restarting. The first cut of `POST /graphs` shipped behind an atomic-YAML-rewrite design that we pulled before release once its concurrency guarantees were challenged (flock-on-renamed-inode race, duplicate-check outside the critical section, and an init-cleanup path that could destroy an existing graph's schema on re-init). The correct fix is a Lance-style cluster catalog (reserve → init → publish with recovery sidecars); that work is deferred. - -## Breaking Changes - -### Graph terminology rename - -- Renamed the Cedar resource entity from `Omnigraph::Repo` to `Omnigraph::Graph`. -- Renamed policy API terminology from `repo_id` to `graph_id` on `PolicyCompiler::compile` (and on the new `PolicyEngine::load_graph` / `PolicyEngine::load_server` loaders described below). 
-- Renamed query-lint schema source JSON from `"repo"` to `"graph"` for `schema_source.kind`. - -### Multi-graph server mode - -- **Multi-graph deployments lose flat routes.** Single-graph invocation (`omnigraph-server `) is unchanged — same flat `/snapshot`, `/read`, `/branches`, etc. Multi-graph deployments serve those routes under `/graphs/{graph_id}/...`; bare flat paths return 404 in multi mode. -- **`ServerConfig` shape change** (programmatic embedders only): `ServerConfig { uri, policy_file }` is replaced by `ServerConfig { mode: ServerConfigMode }`, where `ServerConfigMode = Single { uri, policy_file } | Multi { graphs, config_path, server_policy_file }`. Callers that use `load_server_settings` are unaffected; callers that construct `ServerConfig` directly need to wrap their fields in `ServerConfigMode::Single`. -- **`AppState`'s routing surface** is `AppState::routing() -> &GraphRouting`, where `GraphRouting = Single { handle } | Multi { registry, config_path }`. The previous `AppState::uri()`, `AppState::mode()`, `AppState::registry()` accessors and the `ServerMode` enum are gone — embedders read `state.routing()` and match on the arm they need. Per-graph URIs live on `handle.uri`. -- **`AppState::new_multi`** is the new multi-graph constructor. Single-mode `new_*` / `open_*` constructors are unchanged. -- **`AuthenticatedActor(Arc)` → `ResolvedActor { actor_id, tenant_id, scopes, source }`** (programmatic embedders only). The struct shape changes, but the HTTP contract — bearer auth and the bearer-derived-actor-identity guarantee — is unchanged. Cluster-mode call sites construct with `tenant_id: None`, `scopes: vec![Scope::Full]`, `source: AuthSource::Static`. The new fields are forward-compat seams for future multi-tenant and OAuth deployments; they're inert in this release. 
-- **`PolicyEngine::load(path, graph_id)` removed** in favor of two kind-typed loaders: `PolicyEngine::load_graph(path, graph_id)` for per-graph policies and `PolicyEngine::load_server(path)` for server-level policies. Each loader rejects rules whose action `resource_kind()` doesn't match the engine kind — operators who put a `graph_list` rule in a per-graph file (or a `read` rule in a server file) now get a load-time error instead of a silently-never-matching rule. -- **`PolicyRequest::actor_id` field removed.** Actor identity is now a separate parameter on `PolicyEngine::authorize(actor_id, &request)`. The type system enforces the server-authoritative-actor invariant: actor identity is always sourced from the bearer-token match resolved at the auth boundary; handlers cannot smuggle identity through the request body. -- **`Omnigraph::init` is strict by default.** Initialization at a URI that already holds schema files now errors with `OmniError::AlreadyInitialized` instead of silently overwriting. Operators who actually want to overwrite use `InitOptions { force: true }` (CLI: `omnigraph init --force`). Closes the destructive-cleanup footgun where a failed re-init would delete an existing graph's schema files. -- **Top-level `policy.file` is rejected in multi-graph server mode.** It remains valid for single-graph / CLI-local policy. Multi-graph deployments must move graph rules to `graphs..policy.file` and server-scoped `graph_list` rules to `server.policy.file`. -- **Open server startup requires explicit opt-in.** A server with no bearer tokens and no policy now refuses to start unless passed `--unauthenticated` or `OMNIGRAPH_UNAUTHENTICATED=1`. -- **Policy requires bearer tokens.** Configuring any policy file without bearer tokens now refuses startup; otherwise every protected request would 401 before Cedar could evaluate it. 
-- **Tokens without policy default-deny non-read actions.** Existing authenticated deployments that relied on writes or admin routes without Cedar policy must add policy rules for those actions. -- **`GET /graphs` requires `server.policy.file` in every runtime state.** Even `--unauthenticated` mode keeps server topology closed until the operator explicitly authorizes `graph_list`. - -### Query / mutation rename - -- **`ChangeRequest` field rename**: `query_source` → `query`, `query_name` → `name`. Both legacy names continue to deserialize via `#[serde(alias = "...")]`, so existing clients sending the old JSON keys keep working. CLI remote calls against `/change` still emit the legacy keys verbatim through the `legacy_change_request_body` helper so a newer CLI talking to an older server keeps working byte-for-byte. -- **CLI `omnigraph query lint` / `omnigraph query check`** are now top-level — canonical name is **`omnigraph lint`**. The three deprecated invocations (`omnigraph query lint`, `omnigraph query check`, and bare `omnigraph check`) remain as argv-level shims that rewrite to `omnigraph lint` and print a one-line stderr deprecation warning. `check` is deliberately **not** a clap `visible_alias` on `lint` — two equivalent canonical names would split agent emissions between them depending on training-data drift, so the deprecation pattern (rewrite + warn) gives one unambiguous canonical name in `omnigraph --help`. - -## New - -- **Multi-graph mode**. Invoke with `omnigraph-server --config omnigraph.yaml` where the YAML has a non-empty `graphs:` map and no single-mode selector (no `server.graph`, no CLI `` or `--target`). At startup the server opens every configured graph in parallel (bounded concurrency, fail-fast). -- **`GET /graphs`**. Lists every registered graph, sorted alphabetically by `graph_id`. Auth-required when bearer tokens are configured; Cedar-gated by `PolicyAction::GraphList` against `Omnigraph::Server::"root"`. Returns 405 in single mode. 
Server-scoped actions require an explicit `server.policy.file` in every runtime state — the management surface is closed by default even in `--unauthenticated` mode so that server topology is never exposed without operator opt-in. -- **CLI `omnigraph graphs list`**. Mirrors the HTTP surface. Rejects local URI targets with a clear message — for remote multi-graph servers only. -- **CLI `omnigraph init --force`**. Bypasses the strict-init preflight when an operator deliberately wants to recover from orphan schema files. Does NOT purge existing Lance datasets; recursive deletion needs `StorageAdapter::delete_prefix` (deferred — see below). -- **Per-graph Cedar policy**. Each entry in the `graphs:` map can carry a `policy.file` path, loaded at startup via `PolicyEngine::load_graph`. Cedar's `Omnigraph::Graph::""` resource is per-graph; the new `Omnigraph::Server::"root"` resource governs server-level actions. -- **Server-level Cedar policy**. `server.policy.file` in the config governs the `graph_list` action on `Omnigraph::Server::"root"`. Required to expose `GET /graphs` in every runtime state — without a server policy the default-deny posture rejects `graph_list`, including in `--unauthenticated` mode. -- **Cedar action vocabulary**: `graph_list` (server-scoped). Runtime `graph_create` / `graph_delete` are reserved but not shipped — see "Deferred." -- **Canonical graph URI identity.** Server startup normalizes graph root URIs before registry insertion and response output, so aliases such as `/tmp/g`, `/tmp/g/`, and `file:///tmp/g` cannot register as distinct graphs that actually share one Lance root. -- **`POST /query`** and **`POST /mutate`**. Canonical inline endpoints. `/query` rejects mutations with a typed 400 (the D2 rule lives at the URL — read-only contract enforced before execution); body uses the clean `{ query, name, params, branch, snapshot }` shape. `/mutate` accepts the same shape for mutations. 
Both available in single mode and per-graph multi mode (`/graphs/{id}/query`, `/graphs/{id}/mutate`). Internal call sites share two helpers (`run_query`, `run_mutate`) that take decoupled args, not request bodies — the seam MR-969's future stored-query handler plugs into. -- **CLI `omnigraph query` / `omnigraph mutate`** as top-level canonical subcommands. Pairs with new top-level **`omnigraph lint` (alias `check`)** so query validation no longer sits under `omnigraph query`. -- **CLI `-e, --query-string `** on both `omnigraph query` and `omnigraph mutate`. 3-way mutex with `--query ` and `--alias ` — exactly one is required. Empty string rejected. Suits ad-hoc exploration, REPL workflows, and agent tool-use without temp files. -- **Three-channel deprecation signal on `/read` and `/change`**: OpenAPI `deprecated: true` on the operation (every codegen flags the generated SDK method), RFC 9745 `Deprecation: true` response header, and RFC 8288 `Link: ; rel="successor-version"` (or ``) response header. Auto-discoverable; no SDK breakage. -- **`omnigraph.yaml` `aliases..command`** now accepts `query` and `mutate` as canonical values alongside the legacy `read` and `change`. The internal `AliasCommand` enum retains the legacy variant names so serialized configs stay byte-stable. - -## Configuration - -`omnigraph.yaml` schema additions (all optional, single-mode unaffected): - -```yaml -server: - bind: 0.0.0.0:8080 - policy: - file: ./server-policy.yaml # server-level Cedar (graph_list) - -graphs: - alpha: - uri: s3://tenant-bucket/alpha - policy: - file: ./policies/alpha.yaml # per-graph Cedar - beta: - uri: s3://tenant-bucket/beta - # no per-graph policy → engine-layer enforcement is a no-op -``` - -## Deferred - -- **`POST /graphs` runtime graph creation** and **CLI `omnigraph graphs create`**. Pulled before release after the YAML-rewrite design's correctness story didn't survive review. 
A future release will add a managed cluster catalog (Lance-backed reserve → init → publish with recovery sidecars) and re-expose runtime creation on top of it. Until then, operators add graphs by editing `omnigraph.yaml` and restarting. -- **`DELETE /graphs/{id}`**. Never shipped in v0.6.0; deferred with the same cluster-catalog work. -- **`StorageAdapter::delete_prefix`**. The substrate primitive a managed catalog would need. Will land alongside runtime mutation. -- **`omnigraph init --force` purging Lance state.** Today `--force` only bypasses the schema-file preflight; recursive deletion of existing Lance datasets needs `delete_prefix`. -- **`X-Actor-Id` service delegation forwarding**. Needs durable both-actor audit on `_graph_commits.lance` — out of scope. -- **Hot policy reload**. Restart is cheap at N≤10 graphs. - -## User Impact - -- **No on-disk migration is required.** Existing `.omni` graphs from v0.5.0 (and earlier) open cleanly under v0.6.0 — Lance datasets, `__manifest`, `_schema.pg`, `_schema.ir.json`, `__schema_state.json`, `_graph_commits.lance`, `_graph_commit_recoveries.lance` all use unchanged formats. No conversion step. -- **Existing single-graph storage upgrades without migration.** Server deployments may need auth/policy config changes: explicitly pass `--unauthenticated` for local open mode, configure tokens when using policy, and add Cedar policy for non-read authenticated actions. -- **Multi-graph adoption is opt-in.** Add a `graphs:` map to `omnigraph.yaml` (and remove `server.graph`) to switch a deployment to multi mode. -- **Cluster routes are breaking for client SDKs targeting multi mode.** Generated clients from previous v0.5.0 OpenAPI specs will hit 404 on flat paths against a multi-mode server. Regenerate against the v0.6.0 `openapi.json`. 
-- **Supported YAML policy authoring is unchanged.** The Cedar `Omnigraph::Graph` and `Omnigraph::Server` entities are internally generated by `compile_policy_source` — operator YAML only references actions and groups. -- **Operators with unsupported raw Cedar policy files** should update `Omnigraph::Repo` resource references to `Omnigraph::Graph`. -- **Endpoint and CLI rename is cosmetic on the client side.** Existing callers on `/read`, `/change`, `omnigraph read`, `omnigraph change`, and `omnigraph query lint` keep working — they pick up the `Deprecation` + `Link` headers (or stderr deprecation warning on the CLI) so SDKs and proxies can surface the successor name automatically. New integrations should target the canonical names. ChangeRequest field names migrate at the caller's pace — both `query_source`/`query_name` and `query`/`name` accepted indefinitely. - -## Migration: single → multi - -```yaml -# Before (v0.5.0 single-mode invocation) -server: - graph: my-graph -graphs: - my-graph: - uri: /var/lib/omnigraph/my-graph -policy: - file: ./policy.yaml -``` - -```yaml -# After (v0.6.0 multi-mode — drop `server.graph` and the top-level `policy`) -server: - policy: - file: ./server-policy.yaml # NEW: governs GET /graphs -graphs: - my-graph: - uri: /var/lib/omnigraph/my-graph - policy: - file: ./policy.yaml # MOVED: was top-level -``` - -Same `omnigraph.yaml` file; restart the server. Clients targeting the old flat routes (`/snapshot`, `/read`, …) must update to `/graphs/my-graph/snapshot`, etc. - -To add a new graph after rollout: stop the server, append a new `graphs.` entry, restart. - -## Documentation - -- Public docs, CLI help, examples, server docs, and test helpers now consistently use "graph" for the OmniGraph data artifact. -- GitHub/source repository terminology remains spelled out as "repository" where needed. 
-- New: `docs/user/cli.md` documents `omnigraph graphs list`; `docs/user/server.md` documents the multi-graph mode and the cluster route convention; `docs/user/policy.md` documents the per-graph vs server-scoped action distinction. -- New: `docs/user/server.md` documents `POST /query` / `POST /mutate` and the three-channel deprecation signal on `/read` / `/change`. `docs/user/cli.md` documents the `-e/--query-string` flag with examples. `docs/user/cli-reference.md` shows the canonical CLI verbs (`query`, `mutate`, `lint`, `check`) with legacy spellings as visible aliases. -- New: `docs/dev/rfc-001-queries-envelope-mcp.md` is the cross-cutting design doc for the inline / stored query work that started landing in this release. It sequences the v0.6.x patch series (request/response envelope hardening) and the v0.7.0 stored-query + MCP work. - -## Test coverage - -- `GraphId` newtype validation, registry race tests, init failpoints (still reachable from `omnigraph init` CLI). -- Mode-inference four-rule matrix, parallel multi-graph startup, cluster routing. -- Cedar `Server` resource refactor, backwards-compat for graph-only policies, kind-alignment rejection (server actions in graph files / vice versa). -- `GET /graphs` enumeration, 405-in-single-mode, 403-in-Open-mode-without-server-policy, Cedar admin/viewer authorization. -- Cluster routes with inner path params (`/branches/{branch}`, `/commits/{commit_id}`) deserialize correctly under axum 0.8 nested routing. -- Policy-requires-tokens startup invariant enforced uniformly across single and multi mode. -- The bearer-auth-derived-actor-identity regression test (client-supplied identity headers are ignored; the server-resolved actor is the only identity Cedar sees) stays green across the entire refactor. 
- diff --git a/docs/releases/v0.6.1.md b/docs/releases/v0.6.1.md deleted file mode 100644 index eb76e1f..0000000 --- a/docs/releases/v0.6.1.md +++ /dev/null @@ -1,28 +0,0 @@ -# Omnigraph v0.6.1 - -v0.6.1 focuses on operational polish after v0.6.0: stored-query registries, safer branch cleanup, more complete release artifacts, and a Lance blob-compaction workaround. - -## Highlights - -- **Stored-query registries.** `omnigraph.yaml` can declare curated `queries:` blocks per graph. Servers load and type-check them at startup, `omnigraph queries validate` checks them offline, `omnigraph queries list` shows exposed queries and typed params, `GET /queries` exposes a typed catalog, and `POST /queries/{name}` invokes a stored query without accepting ad hoc `.gq` source from the client. -- **Stored-query policy gate.** New Cedar action `invoke_query` gates the stored-query invocation surface. Stored mutations are double-gated: `invoke_query` to reach the stored query and `change` for the actual write. -- **Safer branch deletion.** `branch_delete` now treats the manifest as the authority, flips branch visibility atomically, and reclaims per-table/commit-graph forks as derived state. If best-effort reclaim is interrupted, `cleanup` reconciles orphaned forks; reusing a branch name before cleanup reports an actionable error. -- **Legacy `__run__` cleanup (MR-770).** *(Correction: this item shipped in [v0.6.2](v0.6.2.md), not v0.6.1 — the v0.6.1 notes over-claimed it. At the v0.6.1 tag the `__run__` branch-name guard and `run_registry.rs` were still present and no v2→v3 sweep migration existed.)* The guard removal and the one-time v2→v3 `__manifest` migration that sweeps stale `__run__*` staging branches on first read-write open are described in the v0.6.2 release notes. -- **Blob-safe optimize.** `omnigraph optimize` skips tables with `Blob` properties instead of failing the whole sweep on Lance's blob-v2 compaction decode bug. 
Skips are visible in human output, `--json` as `skipped`, `TableOptimizeStats.skipped`, and logs; non-blob tables still compact normally. -- **Deployment improvements.** The container entrypoint now composes `OMNIGRAPH_TARGET_URI` with `OMNIGRAPH_CONFIG`, so operators can keep the graph URI in env while loading policy/query config from a mounted file. The local RustFS bootstrap pins RustFS beta.3 and allows the current insecure local-dev default credentials. -- **Windows release support.** Tagged and edge releases now publish Windows x86_64 archives containing `omnigraph.exe` and `omnigraph-server.exe`, with a PowerShell installer and Windows install docs. -- **Release tooling.** Homebrew formula generation was tightened to produce audit-clean formulas. - -## Compatibility Notes - -- A graph selected by name (`--target` or `server.graph`) now uses `graphs..policy` and `graphs..queries`. Top-level `policy` / `queries` blocks are only for anonymous bare-URI single-graph mode; using them with a named graph now fails loudly with migration guidance. -- `mcp.expose` defaults to `true` for stored-query registry entries. Set `mcp: { expose: false }` for service-only queries that should not appear in the catalog. -- `invoke_query` is graph-scoped, not branch-scoped. Branch/snapshot access remains enforced by the inner `read` / `change` gate. -- **Legacy `__run__` migration.** *(Correction: deferred to [v0.6.2](v0.6.2.md).)* The automatic v2→v3 `__manifest` stamp migration that sweeps stale `__run__*` branches on first read-write open ships in v0.6.2, not v0.6.1; a v0.6.1 binary does not perform it. See the v0.6.2 notes for the migration behavior and the read-only caveat. -- Blob tables are not compacted until the upstream Lance fix lands, so fragment count and deleted-row space on blob tables are not reclaimed by `optimize`. Reads, writes, and query results are unaffected; no on-disk migration is required. 
-- `TableOptimizeStats` is now `#[non_exhaustive]` and gains a `skipped: Option` field (so does the new `SkipReason` enum). This is a source-level change only for downstream code that built this returned result struct by literal — rare, since it is produced by `optimize` and consumed by reading its fields; field access is unaffected, and `#[non_exhaustive]` keeps future additions non-breaking. - -## Docs And Cleanup - -- Public docs were updated for stored queries, policy, server routes, deployment, Windows installation, branch deletion, maintenance, and the `runs` docs rename to `writes`. -- README copy and release documentation were refreshed; older release notes had small typo/wording fixes. diff --git a/docs/releases/v0.6.2.md b/docs/releases/v0.6.2.md deleted file mode 100644 index f97f67b..0000000 --- a/docs/releases/v0.6.2.md +++ /dev/null @@ -1,69 +0,0 @@ -# Omnigraph v0.6.2 - -v0.6.2 is a maintenance-safety release on top of v0.6.1. It tightens the -`optimize` / recovery boundary, adds an explicit repair path for uncovered -manifest/head drift, completes the legacy `__run__` branch cleanup (MR-770), -accepts pretty-printed JSON load input, and updates the project governance and -release automation around those fixes. - -## Highlights - -- **Explicit `omnigraph repair`.** New `repair` CLI support previews uncovered - manifest/head drift by default and reports each table's classification, - action, manifest version, Lance HEAD version, Lance operations, and any - classification error. `--confirm` publishes verified maintenance-only drift; - `--force --confirm` can publish suspicious or unverifiable drift after - operator review. -- **Optimize skips uncovered drift.** `omnigraph optimize` now refuses to - interpret Lance HEAD movement that is ahead of `__manifest` without a recovery - sidecar. Those tables are reported as `skipped: DriftNeedsRepair` and left - untouched until `omnigraph repair` classifies them. 
-- **Optimize publishes compaction.** Successful compaction now publishes the - compacted Lance version back through the graph manifest and is covered by an - `Optimize` recovery sidecar. A crash after Lance compaction but before - manifest publish converges through the normal recovery sweep instead of - leaving hidden drift. -- **Recovery roll-back convergence.** Recovery roll-back now aligns the - manifest-visible version after restoring a table, closing the residual where - Lance HEAD and `__manifest` could stay out of sync after recovery. -- **Legacy `__run__` branch cleanup (MR-770).** Completes the retirement of the - Run state machine (removed in v0.4.0). A one-time v2→v3 `__manifest` - internal-schema migration runs on the first read-write open and deletes any - stale `__run__*` staging branches left by pre-v0.4.0 graphs — they previously - leaked into `branch list` and counted as blocking branches at `schema apply` - time. The migration is idempotent, and the `is_internal_run_branch` guard - (and `run_registry.rs`) is retired now that `__run__*` is an ordinary branch - name. (The earlier v0.6.1 notes described this as shipped in v0.6.1; it - actually landed here in v0.6.2.) -- **Pretty-printed JSON load input.** `load` accepts multi-line JSON objects in - addition to one-object-per-line JSONL, so formatted fixture or export files no - longer need to be minified before import. - -## Operational Notes - -- `repair` requires a clean recovery state. Pending `__recovery` sidecars still - belong to automatic open-time recovery; reopen the graph first, then run - repair if drift remains. -- `repair --confirm` only auto-publishes drift made of Lance maintenance - operations (`Rewrite` and `ReserveFragments`). Semantic operations such as - append, delete, update, and merge are refused unless the operator uses - `--force --confirm`. -- `optimize` remains non-destructive. 
It still skips blob-bearing tables while - OmniGraph is pinned to the Lance version with the blob-v2 compaction issue. -- No manual on-disk migration is required. Existing graphs open under v0.6.2. - Graphs already at internal manifest schema stamp v3 are unchanged; graphs - created before v0.4.0 that still carry the v2 stamp auto-migrate v2→v3 on the - first **read-write** open (the `__run__*` sweep above). The migration is - write-path-only, so a long-lived **read-only** deployment still lists any - stale `__run__*` branch until it is next opened read-write. - -## Docs, Governance, And CI - -- Added issue, discussion, RFC, and pull-request templates plus governance docs - for the external contribution path. -- Regenerated CODEOWNERS tables and adjusted branch-protection docs so code - owners can bypass required PR review where repository rules allow it. -- Trimmed Windows release builds out of per-PR CI and kept Windows packaging on - tag releases. -- Made Homebrew audit diagnostic-only in the release workflow so a flaky audit - cannot block publishing an otherwise valid formula update. diff --git a/docs/user/cli/index.md b/docs/user/cli/index.md index 6f49c42..b00d42b 100644 --- a/docs/user/cli/index.md +++ b/docs/user/cli/index.md @@ -14,7 +14,7 @@ omnigraph mutate insert_person --params '{"name":"Mina","age":28}' `omnigraph query` is the canonical read command (pairs with `POST /query`); `omnigraph mutate` is the canonical write command (pairs with `POST /mutate`). The positional argument is the **stored-query name**, invoked from the served -catalog (RFC-011 D3) — the graph is addressed by scope (`--server` / `--profile` +catalog — the graph is addressed by scope (`--server` / `--profile` / defaults), and the verb asserts the query's kind (`query` rejects a stored mutation, and vice-versa). 
The previous names `omnigraph read` and `omnigraph change` keep working as visible aliases — invocations emit a one-line diff --git a/docs/user/cli/reference.md b/docs/user/cli/reference.md index 1d52e45..9d83ead 100644 --- a/docs/user/cli/reference.md +++ b/docs/user/cli/reference.md @@ -2,7 +2,7 @@ A reference for the `omnigraph` binary's command surface and the per-operator `~/.omnigraph/config.yaml` schema. For a quick-start guide, see [cli.md](index.md). -Top-level command families and subcommands. Graph-targeting commands accept a positional `file://`/`s3://` URI, `--server ` (an operator-defined server from `~/.omnigraph/config.yaml` by name, or a literal `http(s)://` URL, optionally with `--graph ` for multi-graph servers; exclusive with a positional URI), `--store ` (a single graph's storage directly), or `--profile ` / `$OMNIGRAPH_PROFILE` (a named scope bundle; see [Scopes & profiles](#scopes--profiles-rfc-011)); `cluster` commands use `--config `, while `policy` and `queries` read a cluster's applied state via `--cluster `. A remote server is addressed only with `--server` — a positional `http(s)://` URI is rejected. **`query`/`mutate` are the exception**: their positional is a stored-query *name* (RFC-011 D3), not a graph URI, so they address the graph only via `--store`/`--server`/`--profile`/defaults. +Top-level command families and subcommands. Graph-targeting commands accept a positional `file://`/`s3://` URI, `--server ` (an operator-defined server from `~/.omnigraph/config.yaml` by name, or a literal `http(s)://` URL, optionally with `--graph ` for multi-graph servers; exclusive with a positional URI), `--store ` (a single graph's storage directly), or `--profile ` / `$OMNIGRAPH_PROFILE` (a named scope bundle; see [Scopes & profiles](#scopes--profiles)); `cluster` commands use `--config `, while `policy` and `queries` read a cluster's applied state via `--cluster `. 
A remote server is addressed only with `--server` — a positional `http(s)://` URI is rejected. **`query`/`mutate` are the exception**: their positional is a stored-query *name*, not a graph URI, so they address the graph only via `--store`/`--server`/`--profile`/defaults. ## Top-level commands @@ -13,14 +13,14 @@ Top-level command families and subcommands. Graph-targeting commands accept a po | `ingest` | deprecated alias of `load --from ` (defaults: `--from main --mode merge`); prints a one-line warning to stderr | | `query ` (alias: `read`) | run a read query. **Catalog lane** (default): `` is a stored query invoked **by name** from the served catalog (served-only — address with `--server`/`--profile`; the verb asserts the query is a read). **Ad-hoc lane**: with `--query ` or `-e`/`--query-string `, runs that source (the positional `` then selects which query in it). No positional graph URI — address via `--store`/`--server`/`--profile`. `read` is the deprecated previous name (one-line stderr warning) | | `mutate ` (alias: `change`) | run a mutation query; same catalog (by-name, served-only, verb asserts mutation) / ad-hoc (`--query`/`-e`) lanes as `query`. 
`change` is the deprecated previous name (one-line stderr warning) | -| `alias [args]` | invoke an operator alias — a read-only personal binding (under `aliases:` in `~/.omnigraph/config.yaml`) to a stored query on a named server (RFC-011 D4; replaces the removed `--alias` flag; stored mutations are rejected before execution) | +| `alias [args]` | invoke an operator alias — a read-only personal binding (under `aliases:` in `~/.omnigraph/config.yaml`) to a stored query on a named server (replaces the removed `--alias` flag; stored mutations are rejected before execution) | | `snapshot` | print current snapshot (per-table version + row count) | | `export` | dump to JSONL on stdout (`--type T`, `--table K` filters) | | `branch create \| list \| delete \| merge` | branching ops | | `commit list \| show` | inspect commit graph | | `schema plan \| apply \| show (alias: get)` | migrations. `apply` refuses a cluster-managed graph (one whose storage is inside a cluster) and points at `cluster apply` — those graphs evolve through the cluster ledger, not a direct apply | | `lint` (alias: `check`) | offline / graph-backed query validation. Replaces `query lint` / `query check`, which are kept as deprecated argv-level shims that print a one-line warning and rewrite to `omnigraph lint` | -| `cluster validate \| plan \| apply \| approve \| status \| refresh \| import \| force-unlock` | declarative cluster control plane. 
`validate` checks a local `cluster.yaml` folder and referenced schema/query/policy files; `plan` diffs it against local JSON state at `__cluster/state.json`, annotates dispositions, and embeds real schema-migration previews; `apply` converges the cluster — stored-query/policy catalog writes (content-addressed under `__cluster/resources/`), graph creates, schema updates (soft drops only; `--as` records the actor), and graph deletes behind a digest-bound approval from `cluster approve --as ` (`apply`/`approve` default the actor from `~/.omnigraph/config.yaml`'s `operator.actor` when `--as` is omitted); what apply converges is what an `omnigraph-server --cluster ` deployment serves on its next restart (`--cluster` is the server's only boot source — RFC-011 cluster-only); `status` reads the state ledger; `refresh`/`import` explicitly update local JSON state from read-only graph observations; `force-unlock ` manually removes a held local state lock by exact id | +| `cluster validate \| plan \| apply \| approve \| status \| refresh \| import \| force-unlock` | declarative cluster control plane. 
`validate` checks a local `cluster.yaml` folder and referenced schema/query/policy files; `plan` diffs it against local JSON state at `__cluster/state.json`, annotates dispositions, and embeds real schema-migration previews; `apply` converges the cluster — stored-query/policy catalog writes (content-addressed under `__cluster/resources/`), graph creates, schema updates (soft drops only; `--as` records the actor), and graph deletes behind a digest-bound approval from `cluster approve --as ` (`apply`/`approve` default the actor from `~/.omnigraph/config.yaml`'s `operator.actor` when `--as` is omitted); what apply converges is what an `omnigraph-server --cluster ` deployment serves on its next restart (`--cluster` is the server's only boot source — cluster-only); `status` reads the state ledger; `refresh`/`import` explicitly update local JSON state from read-only graph observations; `force-unlock ` manually removes a held local state lock by exact id | | `optimize` | non-destructive Lance compaction (skips tables with `Blob` columns or uncovered drift; `--json` reports `skipped`) | | `repair [--confirm] [--force]` | preview or explicitly publish uncovered manifest/head drift. 
`--confirm` heals verified maintenance drift and exits non-zero if suspicious/unverifiable drift is refused; `--force --confirm` publishes suspicious/unverifiable drift after operator review | | `cleanup --keep N --older-than 7d --confirm` | destructive version GC (`--confirm` to execute; also needs `--yes` against a non-local `s3://` target — see *Write diagnostics & destructive confirmation*) | @@ -52,7 +52,7 @@ To maintain a server-backed graph, run the `direct` verbs from a host with stora ## Write diagnostics & destructive confirmation -Two global flags make writes self-documenting and guard the dangerous ones (RFC-011 Decision 9): +Two global flags make writes self-documenting and guard the dangerous ones: - **Every write echoes its resolved target to stderr** — `omnigraph load → s3://acme/brain/graphs/knowledge.omni (direct, remote)` — so you catch a scope that resolved somewhere unexpected (e.g. *prod*) before it lands. Applies to `load`, `ingest`, `mutate`, `branch create|delete|merge`, `schema apply`, `optimize`, `repair`, `cleanup`. The line is stderr, so `--json` consumers reading stdout are unaffected; suppress it with **`--quiet`**. - **Destructive writes against a non-local scope require confirmation.** `cleanup`, overwrite `load` (`--mode overwrite`), and `branch delete` proceed freely against a local (`file://`) graph, but when the resolved target is **not local** (a served `http(s)://` graph or an `s3://` store/cluster) they require explicit consent: pass **`--yes`** to confirm, an interactive terminal is prompted, and a non-interactive run (no TTY, or `--json`) **refuses with an error** rather than silently destroying. `cleanup` still also requires its existing `--confirm` (preview→execute); `--yes` is the additional non-local consent. 
@@ -79,15 +79,15 @@ servers: # operator-owned endpoints; names key the credentials url: https://graph.example.com # no tokens in this file, ever defaults: output: table # read format default, below --json/--format/alias - server: prod # the everyday SERVED scope when no address is given (RFC-011) + server: prod # the everyday SERVED scope when no address is given # store: file:///data/dev.omni # OR a zero-flag LOCAL default (mutually # # exclusive with `server`); the local-dev # # counterpart of `server` default_graph: knowledge # graph selected in a server/cluster scope -clusters: # admin-only: managed-cluster storage roots (RFC-011). +clusters: # admin-only: managed-cluster storage roots. brain: # the ONLY place a storage root lives in this file. root: s3://acme/clusters/brain -profiles: # named scope bundles (RFC-011); pick with --profile +profiles: # named scope bundles; pick with --profile staging: { server: staging, default_graph: knowledge } # a served scope brain-admin: { cluster: brain, default_graph: knowledge } # a direct cluster scope ``` @@ -96,7 +96,7 @@ Absent file = empty layer. Unknown keys warn and load (a file written for a newer CLI works on an older one). Override the config directory with `$OMNIGRAPH_HOME`. -#### Scopes & profiles (RFC-011) +#### Scopes & profiles A command resolves a **scope** — a server, a cluster, or a store — then selects a graph in it; the served-vs-direct access path is derived from the scope, not @@ -116,14 +116,15 @@ sticky "current" mode. Inspect what is defined with `omnigraph profile list` and `--cluster --graph `. A `--graph` flag overrides the profile's default. - A `server`-bound scope on a maintenance verb, or a `cluster`-bound scope on a data verb, is rejected with a message pointing at the right addressing. 
-- **No graph selected (RFC-011 D7).** When a scope has no `--graph` and no +- **No graph selected.** When a scope has no `--graph` and no `default_graph`, the CLI never silently picks: - **Cluster scope** — exactly **one** applied graph is used automatically; **several** errors and lists the candidates (from the served catalog). - - **Server scope** — a multi-graph server (any non-empty `GET /graphs`, even a - single entry) errors and lists the candidates: you must pass `--graph `. - A single-graph / flat server (405 on `/graphs`), or one whose `/graphs` is - policy-gated or unreachable, uses its bare URL as before. + - **Server scope** — an `omnigraph-server` is always cluster-backed, so its + `GET /graphs` lists the graphs and you must pass `--graph ` (the CLI + lists the candidates if you omit it). It falls back to the bare URL only + when `/graphs` is unavailable: policy-gated, unreachable, or a + non-`omnigraph` endpoint. `--target`, `--cluster-graph`, and the positional-`http(s)://`→remote dispatch have been **removed** (`--graph` is now the one graph selector across server and @@ -158,7 +159,7 @@ aliases: `omnigraph alias triage 2026-06-01` invokes `POST /graphs/spike/queries/weekly_triage` with the keyed -credential. Aliases live in their own `alias` namespace (RFC-011 Decision 4), +credential. Aliases live in their own `alias` namespace, so an alias can never shadow — or be shadowed by — a built-in verb. (The old `--alias ` flag on `query`/`mutate` was removed.) 
diff --git a/docs/user/clusters/config.md b/docs/user/clusters/config.md index 8f8caf4..cd4d772 100644 --- a/docs/user/clusters/config.md +++ b/docs/user/clusters/config.md @@ -390,7 +390,7 @@ omnigraph-server --cluster company-brain --bind 0.0.0.0:8080 ``` `--cluster ` is an **exclusive boot source** (axiom 15): it cannot -combine with a graph URI, `--target`, or `--config`, and in this mode +combine with a graph URI or `--config`, and in this mode `omnigraph.yaml` is never read — not for graphs, not for queries, not for policies. The server serves the **applied revision**: graph roots recorded in `state.json`, stored-query and policy content from the content-addressed diff --git a/docs/user/clusters/index.md b/docs/user/clusters/index.md index c59ff9d..0c2e7d7 100644 --- a/docs/user/clusters/index.md +++ b/docs/user/clusters/index.md @@ -91,7 +91,7 @@ only the URI and credentials, no checkout of the config repo. The ledger and catalog on the bucket are the deployment artifact. `--cluster` is an **exclusive boot source**: it cannot be combined with a -graph URI, `--target`, or `--config`, and `omnigraph.yaml` is never read in +graph URI or `--config`, and `omnigraph.yaml` is never read in this mode. Routing is always multi-graph: ```bash @@ -273,7 +273,7 @@ a cluster are created by `cluster apply`, not by hand. If the cluster has exactly **one** applied graph you can omit `--graph` — it is used automatically. With **several**, omitting `--graph` errors and lists the -candidates (RFC-011 D7); it never picks one for you. +candidates; it never picks one for you. 
Against an **`s3://`-backed cluster** the resolved graph storage is non-local, so a destructive `cleanup` additionally requires **`--yes`** (an interactive prompt diff --git a/docs/user/operations/maintenance.md b/docs/user/operations/maintenance.md index 161e5d6..e2a88eb 100644 --- a/docs/user/operations/maintenance.md +++ b/docs/user/operations/maintenance.md @@ -35,7 +35,7 @@ backstop, so it does as much as it can and converges on re-run. The CLI reports any failed tables; rerun `cleanup` to retry them. - CLI guards with `--confirm`; without it, prints a preview line. -- **Non-local consent (RFC-011 D9).** Against a non-local target (an `s3://` store/cluster), `cleanup` additionally requires `--yes` on top of `--confirm`: a TTY is prompted, and a non-interactive run (no TTY, or `--json`) refuses rather than destroying. A local (`file://`) target needs only `--confirm`. The same `--yes` gate applies to overwrite `load` and `branch delete`; every maintenance run echoes its resolved target to stderr (suppress with `--quiet`). +- **Non-local consent.** Against a non-local target (an `s3://` store/cluster), `cleanup` additionally requires `--yes` on top of `--confirm`: a TTY is prompted, and a non-interactive run (no TTY, or `--json`) refuses rather than destroying. A local (`file://`) target needs only `--confirm`. The same `--yes` gate applies to overwrite `load` and `branch delete`; every maintenance run echoes its resolved target to stderr (suppress with `--quiet`). - **Recovery floor:** `--keep < 3` may garbage-collect versions that crash recovery needs as a rollback target. Default `--keep 10` is safe. - **Orphaned-branch reconciliation:** before the version GC, cleanup reclaims any per-table or commit-graph branch absent from the manifest branch list. These orphans arise when a `branch_delete` flips the manifest authority but a downstream best-effort reclaim does not complete (see [branches-commits.md](../branching/index.md)). 
The reconciler is idempotent (it no-ops once nothing is orphaned), runs regardless of the `keep_versions` / `older_than` values (those gate version GC only), and never reclaims `main` or system-branch forks. Reclaimed forks are logged. diff --git a/docs/user/operations/policy.md b/docs/user/operations/policy.md index c6096d0..54fbea5 100644 --- a/docs/user/operations/policy.md +++ b/docs/user/operations/policy.md @@ -78,7 +78,7 @@ The default actor identity for CLI direct-engine (`--store`) writes is `operator.actor` in `~/.omnigraph/config.yaml`. Override per-invocation with `--as ` — `--as` wins, otherwise `operator.actor`, otherwise no actor. Remote HTTP writes ignore both — they resolve their actor server-side from the -bearer token. (Direct-store access carries no Cedar policy under RFC-011; policy +bearer token. (Direct-store access carries no Cedar policy; policy lives in the cluster/server.) ## CLI diff --git a/docs/user/operations/server.md b/docs/user/operations/server.md index bd14e1e..ced9d0d 100644 --- a/docs/user/operations/server.md +++ b/docs/user/operations/server.md @@ -1,6 +1,6 @@ # HTTP Server (`omnigraph-server`) -Axum 0.8 + tokio + utoipa-generated OpenAPI. **Cluster-only boot** (RFC-011): the server always boots from a cluster (`--cluster `) and serves N graphs (N ≥ 1) under cluster routes. There is no longer a single-graph flat-route mode, no positional `` boot, no `--target`, and no `omnigraph.yaml`-`graphs:`-map boot. All HTTP is nested under `/graphs/{graph_id}/...`; `/healthz` and the management `/graphs` enumeration stay flat. +Axum 0.8 + tokio + utoipa-generated OpenAPI. **Cluster-only boot**: the server always boots from a cluster (`--cluster `) and serves N graphs (N ≥ 1) under cluster routes. There is no longer a single-graph flat-route mode, no positional `` boot, no `--target`, and no `omnigraph.yaml`-`graphs:`-map boot. 
All HTTP is nested under `/graphs/{graph_id}/...`; `/healthz` and the management `/graphs` enumeration stay flat. ## Boot diff --git a/docs/user/search/embeddings.md b/docs/user/search/embeddings.md index e69d928..11f3540 100644 --- a/docs/user/search/embeddings.md +++ b/docs/user/search/embeddings.md @@ -42,7 +42,7 @@ boots from the applied cluster ledger, so `cluster validate`, `plan`, and needs no key. Vector dimensions stay schema-driven by the target `Vector(N)` column. -Direct single-graph serving, embedded callers, and the offline +Direct (`--store`) access, embedded callers, and the offline `omnigraph embed` pipeline use environment configuration unless they inject an `EmbeddingConfig` directly. From 5243c048aa2b464818a22a2b4f51f23c2bda819a Mon Sep 17 00:00:00 2001 From: Ragnor Comerford Date: Wed, 17 Jun 2026 13:25:20 +0200 Subject: [PATCH 06/13] perf(engine): remove the per-query metadata re-derivation tax on warm reads (#268) * test(engine): add read-path IO instrumentation seam for warm-read cost tests Prerequisite seam for the query-latency fixes. Adds crates/omnigraph/src/instrumentation.rs: - CountingStorageAdapter: a StorageAdapter decorator counting per-method reads (read_text/exists/read_text_versioned/list_dir), for the schema-contract reads on the query path. - A per-query task-local (QueryIoProbes) carrying Lance WrappingObjectStore wrappers per open category plus a probe counter, delivered via with_query_io_probes. open_dataset_tracked attaches the wrapper so the open itself is counted (ObjectStoreParams.object_store_wrapper). Wires the wrappers into the manifest open (open_manifest_dataset) and the commit-graph opens (CommitGraph::open/open_at_branch). Production leaves the task-local unset, so nothing attaches. Makes Omnigraph::open_with_storage public so tests can inject the counting adapter. lance-io is a dev-dependency (IOTracker named only in tests). No runtime behavior change. 
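A minimal sketch of the counting-decorator idea behind the instrumentation seam, for readers of this commit. The `Storage` trait, `MemStorage`, and `CountingStorage` below are hypothetical, simplified stand-ins; the real `StorageAdapter` trait and `CountingStorageAdapter` in `crates/omnigraph/src/instrumentation.rs` are not reproduced here and likely differ (async methods, more operations).

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical, simplified storage trait (assumption: not the engine's real one).
trait Storage {
    fn read_text(&self, path: &str) -> Option<String>;
    fn exists(&self, path: &str) -> bool;
}

// In-memory backend used only to make the sketch runnable.
struct MemStorage {
    files: HashMap<String, String>,
}

impl Storage for MemStorage {
    fn read_text(&self, path: &str) -> Option<String> {
        self.files.get(path).cloned()
    }
    fn exists(&self, path: &str) -> bool {
        self.files.contains_key(path)
    }
}

// Decorator: forwards every call to `inner` and counts it per method.
struct CountingStorage<S> {
    inner: S,
    read_text_calls: AtomicUsize,
    exists_calls: AtomicUsize,
}

impl<S: Storage> CountingStorage<S> {
    fn new(inner: S) -> Self {
        Self {
            inner,
            read_text_calls: AtomicUsize::new(0),
            exists_calls: AtomicUsize::new(0),
        }
    }

    // Returns (read_text calls, exists calls) observed so far.
    fn counts(&self) -> (usize, usize) {
        (
            self.read_text_calls.load(Ordering::Relaxed),
            self.exists_calls.load(Ordering::Relaxed),
        )
    }
}

impl<S: Storage> Storage for CountingStorage<S> {
    fn read_text(&self, path: &str) -> Option<String> {
        self.read_text_calls.fetch_add(1, Ordering::Relaxed);
        self.inner.read_text(path)
    }
    fn exists(&self, path: &str) -> bool {
        self.exists_calls.fetch_add(1, Ordering::Relaxed);
        self.inner.exists(path)
    }
}

fn main() {
    let mut files = HashMap::new();
    files.insert("_schema.pg".to_string(), "node Person".to_string());
    let storage = CountingStorage::new(MemStorage { files });

    let _ = storage.read_text("_schema.pg");
    let _ = storage.exists("_schema.pg");
    let _ = storage.exists("missing");

    // One read and two existence probes were forwarded and counted.
    assert_eq!(storage.counts(), (1, 2));
}
```

A cost-budget test in this style would wrap the backend, run one warm query, and assert on the counters rather than on wall-clock time.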
* test(engine): warm same-branch read should reuse the coordinator (red) Cost-budget test using Lance IOTracker at the object-store boundary (the LanceDB IO-counted-test pattern). On a 20-commit-deep graph, a warm same-branch query re-opens a fresh coordinator, which opens both the commit graph and __manifest. Asserts the read opens the commit graph zero times and performs exactly one cheap version probe; today it does neither (it scans the commit graph on re-open and never probes). The freshness guard already passes. Adds the commit_many helper for history-depth fixtures. Red half of the Fix 1 red->green pair; turns green with the next commit. * perf(engine): same-branch reads reuse the warm coordinator (Fix 1) query()/resolved_target re-opened a fresh GraphCoordinator from storage on every read (full __manifest scan + two commit-graph scans), so a warm read's cost grew with commit history (invariant 15) though the data was unchanged. resolved_target now serves same-branch reads from the warm in-memory coordinator, gated by a cheap version probe (latest_version_id, one object-store op) instead of a full re-open: - fresh (probe == cached version): return the in-memory snapshot under the read lock, with a synthetic (branch, version) id and no commit-graph access (reads pin the snapshot by manifest version, not the commit DAG; invariant 2). - stale: take the write lock, re-probe (double-checked; tokio RwLock has no read->write upgrade), then refresh_manifest_only (no commit-graph scan), preserving strong consistency for external writers (invariant 6). Cross-branch and snapshot targets keep the existing cold-resolve path. Adds ManifestCoordinator/GraphCoordinator::probe_latest_version and GraphCoordinator::refresh_manifest_only. Nothing on the read path needs a real commit ULID (only RuntimeCache keys on the id, where synthetic is consistent), per a caller audit. 
A warm same-branch read on a 20-commit graph now does zero commit-graph opens and exactly one probe (down from a deep commit-graph scan) and still observes external commits. The residual per-table __manifest scans are removed later by Fix 2. * test(engine): warm query should validate the schema contract once (red) ensure_schema_state_valid runs twice per query (query()/run_query_at AND resolved_target/snapshot_at_version), each reading 3 contract files + 2 existence probes. A warm query thus does 6 read_text + 4 exists where one validation (3 + 2) suffices, measured via CountingStorageAdapter. Adds a drift guard (schema_source_drift_is_caught_on_read) that already passes. Red half of the finding-A red->green pair. * perf(engine): validate the schema contract once per query (finding A) ensure_schema_state_valid ran on every query AND again inside resolved_target / snapshot_at_version, so each query validated the schema contract twice (~10 storage ops). Removes the redundant query()/ run_query_at() calls; the validation inside resolved_target / snapshot_at_version still runs, so drift is detected exactly as before. A source-only fast path was rejected: a long-lived handle must detect external drift of the schema source, IR, OR state on its next operation (lifecycle::long_lived_handle_rejects_schema_*), which a source-only compare would miss. So the only safe latency win is not validating twice. A warm query now does one validation (3 read_text + 2 exists) instead of two (6 + 4). * test(engine): warm + multi-table reads should do zero manifest scans (red) After Fix 1 a warm same-branch read still scans __manifest ~44 times at 20-commit depth: not from resolution (Fix 1 removed that) but from the per-table open path, which routes through the Lance namespace and full-scans __manifest twice per touched table (describe_table + describe_table_version). 
Tightens the warm test to assert manifest read_iops == 0 and adds a multi-table (traversal) test asserting the same, pinning the "2 tables = 2x" tax. Red half of the Fix 2 red->green pair. * perf(engine): open touched tables by location+version, not via the namespace (Fix 2) SubTableEntry::open routed every read-path table open through DatasetBuilder::from_namespace(BranchManifestNamespace), whose describe_table full-scans __manifest and, with managed_versioning, makes Lance scan again (describe_table_version) -- two full __manifest scans per touched table. That was the residual that made warm-read manifest IO grow with history and the '2 tables = 2x' multi-table tax. The resolved Snapshot already holds each table's path/version/branch, so open directly: from_uri(table_uri_for_path(root, path, branch)).with_version(v). The branch-qualified location is the dataset that physically holds the version (main: {path}; branch: {path}/tree/{branch}, Lance native-branch storage), and with_version resolves it within THAT dataset's _versions. 0 namespace calls + 1 HEAD via the native ConditionalPutCommitHandler. The read namespace (BranchManifestNamespace) is now unused in production (writes use StagedTableNamespace), so it, its constructor, and the helpers only it used (to_namespace_version, publish_requests, their imports) are gated #[cfg(test)] -- retained to validate the namespace contract in unit tests. Removes the dead open_table_at_version_from_manifest. Warm same-branch + multi-table reads now scan __manifest zero times; branch + time-travel reads stay correct (branching.rs, point_in_time.rs, 2 lib regression tests); production-lib warnings unchanged (baseline). 
* test(engine): cost-budget coverage for branch-warm and stale-refresh reads (matrix) Extends the read-path cost-budget tests across more of the morphological matrix: - warm_branch_read_does_no_manifest_scans: a warm read on a non-main branch (handle synced to it) scans __manifest zero times, exercising Fix 2's branch-owned-table open (tree/{branch} + with_version) on Fix 1's warm path -- the cell that regressed when the open used with_branch against the base. - stale_read_refreshes_manifest_only: an external commit makes the next read take the stale path, which re-reads the manifest (read_iops > 0) but never scans the commit graph (refresh_manifest_only), pinning Fix 1's manifest-only refresh. Cold paths (cross-branch, time-travel) stay behavior-covered (branching.rs, point_in_time.rs) and are cold by design (Fix 1 warm-paths only same-branch), so there is no manifest==0 contract to assert there. * test(engine): same-branch write after external commit must not fork the commit DAG (red) * fix(engine): refresh commit-graph head before append to prevent same-branch DAG fork A same-branch write that follows an external commit committed a fresh manifest version (commit_all rebases the pin from a fresh coordinator) but appended off the coordinator's stale in-memory commit-graph head, forking the commit DAG (the new commit and the external commit shared a parent). Pre-existing for non-strict inserts; widened to strict ops by Fix 1's refresh_manifest_only freshening the read-time pin. record_graph_commit now refreshes the commit-graph head from storage before append_commit, so the parent is the true current head. record_merge_commit is unaffected (it passes explicit parents). * perf(engine): hold open Dataset handles + share one Session per graph (Fix 3) A warm same-branch read still re-opened every touched table per query (the "never warms up" residual after Fix 1+2). 
A per-graph held-handle cache keyed by (table_path, branch, version) now serves repeat reads with zero table opens, and one shared lance::Session per graph warms metadata/index caches across opens. Validated against LanceDB upstream (rust/lancedb/src/table/dataset.rs DatasetConsistencyWrapper): hold an Arc and reuse it for 0-IO warm reads; one Session per connection threaded into opens; writers never serve from the read cache; time-travel bypasses. One adaptation: omnigraph keys by version (snapshot-pins-version model) where LanceDB keys per-table+HEAD, reusing the in-repo GraphIndexCache LRU template. - ReadCaches (session + TableHandleCache) injected onto live-Branch-read snapshots in resolved_target; Snapshot::open serves from the cache or opens once with the session on a miss (via the instrumented open_table_dataset). - Writes (resolved_branch_target -> open HEAD) and time-travel / Snapshot-id reads bypass the cache. Version-in-key makes a write a new key (old handle ages out via LRU); invalidate_all at branch-switch/refresh is hygiene only. - Cost tests: a 2nd identical warm read does 0 table opens; a write re-opens only the changed table at its new version. Full engine suite green. * test(engine): forbid raw data opens in the read/exec layer (P2 guard) Extend the forbidden-API guard with Dataset::open / DatasetBuilder::from_uri / from_namespace so the read/exec layer (exec/, loader/, changes/, db/omnigraph/) cannot bypass Snapshot::open and the held-handle cache (Fix 3). The instrumented opener (instrumentation.rs) is allow-listed; two legitimate non-read opens (a test editing __manifest, Hard-drop version GC) carry sentinels. The storage/manifest layers stay allow-listed. Lean P2 scope, per LanceDB-upstream + minimize-liability: the data-read boundary already exists (SubTableEntry::open); this guard pins it so a future read cannot open around the cache. Centralizing all internal opens behind one opener is deferred. 
* docs(dev): invariant 15 (one source of truth, cheaply derived) + cost-budget testing Records the principle behind the query-latency work: Lance and the manifest are the source of truth, everything else a derived view held warm and refreshed by a cheap probe; the two failure modes (a drifting parallel copy, and cold re-derivation whose cost grows with history) are deny-listed. Adds the cost-budget testing discipline (assert a warm read's open/IO count is flat at commit-history depth, the LanceDB IO-counted pattern) and the warm_read_cost.rs row. Updates the read-path-re-derivation known gap to reflect what Fix 1/2/3 + finding A close, and adds the commit-graph-parent-under-concurrency gap. * fix(engine): branch-incarnation identity + unified invalidation + shared LruMap (PR #268 review) Phase 6 A-D, correct-by-design responses to the Codex/Greptile P2 review comments. A: warm-read freshness and the table-handle cache key use the manifest incarnation (e_tag, manifest-timestamp fallback, then version), so a deleted+recreated non-main branch reusing a version number cannot be served stale; main stays version-cheap, non-main loads latest_manifest; a detected stale refresh also invalidates read caches; two regression tests force the version collision. B: unify the two cache invalidations into Omnigraph::invalidate_read_caches() at the four sites. C: assert the stale path's probe count. D: shared LruMap behind both caches with unconditional eviction, plus a unit test. Full engine suite green; multi-process lineage fork and O(history) write refresh remain known gaps for Phase 6E/7. 
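Illustration of the Fix 1 freshness gate described above, as a minimal sketch rather than the engine code. It assumes a std `RwLock` in place of the tokio lock and a hypothetical `FakeStore` / `WarmCoordinator` pair standing in for the object store and the coordinator; only the shape (cheap probe, read-lock fast path, write-lock re-probe, manifest-only refresh) follows the commit text. The probe counts it yields are properties of this sketch, not measured engine numbers.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

/// Hypothetical stand-in for the object store: holds the latest committed
/// manifest version and counts cheap probes versus full manifest re-reads.
pub struct FakeStore {
    latest: AtomicU64,
    pub probes: AtomicU64,
    pub refreshes: AtomicU64,
}

impl FakeStore {
    pub fn new(version: u64) -> Self {
        Self {
            latest: AtomicU64::new(version),
            probes: AtomicU64::new(0),
            refreshes: AtomicU64::new(0),
        }
    }

    /// Cheap probe: models one object-store op, no row scan.
    pub fn probe_latest_version(&self) -> u64 {
        self.probes.fetch_add(1, Ordering::SeqCst);
        self.latest.load(Ordering::SeqCst)
    }

    /// Expensive path: models re-reading the manifest.
    pub fn load_manifest(&self) -> u64 {
        self.refreshes.fetch_add(1, Ordering::SeqCst);
        self.latest.load(Ordering::SeqCst)
    }

    /// Models an external writer advancing the branch.
    pub fn external_commit(&self) {
        self.latest.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical warm coordinator: caches the manifest version it holds.
pub struct WarmCoordinator {
    cached: RwLock<u64>,
}

impl WarmCoordinator {
    pub fn new(version: u64) -> Self {
        Self { cached: RwLock::new(version) }
    }

    /// Resolve a same-branch read, returning the version the read is pinned to.
    pub fn resolve(&self, store: &FakeStore) -> u64 {
        let latest = store.probe_latest_version();
        {
            // Fresh: serve from the in-memory state under the read lock.
            let cached = self.cached.read().unwrap();
            if *cached == latest {
                return *cached;
            }
        } // read guard dropped here; there is no read->write upgrade
        // Stale: take the write lock and re-probe, since another reader
        // may have refreshed while this one waited for the lock.
        let mut cached = self.cached.write().unwrap();
        let latest = store.probe_latest_version();
        if *cached != latest {
            *cached = store.load_manifest();
        }
        *cached
    }
}
```

In this sketch a fresh read costs one probe and no refresh; a read after `external_commit` costs two probes and one refresh, and the following read is back to a single probe.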
--- AGENTS.md | 4 + Cargo.lock | 1 + crates/omnigraph/Cargo.toml | 1 + crates/omnigraph/src/db/commit_graph.rs | 38 +- crates/omnigraph/src/db/graph_coordinator.rs | 61 +- crates/omnigraph/src/db/manifest.rs | 138 ++- crates/omnigraph/src/db/manifest/layout.rs | 9 +- crates/omnigraph/src/db/manifest/metadata.rs | 2 +- crates/omnigraph/src/db/manifest/namespace.rs | 38 +- crates/omnigraph/src/db/manifest/publisher.rs | 6 +- crates/omnigraph/src/db/manifest/recovery.rs | 7 +- crates/omnigraph/src/db/manifest/tests.rs | 11 +- crates/omnigraph/src/db/omnigraph.rs | 146 ++- crates/omnigraph/src/db/omnigraph/optimize.rs | 1 + .../src/db/omnigraph/schema_apply.rs | 7 +- .../omnigraph/src/db/omnigraph/table_ops.rs | 5 +- crates/omnigraph/src/exec/query.rs | 4 +- crates/omnigraph/src/instrumentation.rs | 231 +++++ crates/omnigraph/src/lib.rs | 1 + crates/omnigraph/src/runtime_cache.rs | 220 ++++- crates/omnigraph/tests/branching.rs | 168 ++++ crates/omnigraph/tests/forbidden_apis.rs | 9 + crates/omnigraph/tests/helpers/mod.rs | 15 + crates/omnigraph/tests/warm_read_cost.rs | 833 ++++++++++++++++++ docs/dev/invariants.md | 57 +- docs/dev/testing.md | 12 +- 26 files changed, 1918 insertions(+), 107 deletions(-) create mode 100644 crates/omnigraph/src/instrumentation.rs create mode 100644 crates/omnigraph/tests/warm_read_cost.rs diff --git a/AGENTS.md b/AGENTS.md index 378de88..e8cd035 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -125,6 +125,8 @@ This is a decision lens, not a code-size rule. It cuts both ways. Sometimes the When evaluating a design, ask: *"what does this look like after 5 more changes like it?"* If the answer is "this converges to one shape", cost is bounded. If it's "this forks every time", the option is mortgaging the future for present convenience — pick differently. +The same lens has a structural corollary: **one source of truth, cheaply derived.** Lance and the manifest are the source of truth; everything else is a derived view. 
Maintaining a parallel copy invites drift that compounds over time, and re-deriving a view from the full source on every call makes its cost grow with history. Both are liabilities integrated over time, so both are ruled out the same way: hold a warm derived view and refresh it with a cheap probe, never shadow the source or rebuild from it cold. Invariant 15 in [docs/dev/invariants.md](docs/dev/invariants.md) states this; invariants 1 (respect the substrate) and 7 (indexes are derived state) are instances. + ### Tiebreakers when liability alone is silent - **Correctness > simplicity > performance.** Lexicographic — give up performance for simpler code; give up simplicity for correct code; never give up correctness. The deny-list ("no silent failures," "no acks before durable persistence," "no reads of partial commits") is this rule's hard floor. @@ -145,6 +147,7 @@ These are architectural rules that need to be in scope on every change. They're 5. **Reads always see the current index state for the branch they're reading.** Indexes track the branch head, not historical snapshots. If you change index lifecycle, preserve this guarantee. 6. **Stable type IDs survive renames.** Schema migration relies on identity that's stable across rename — don't mint new IDs on rename. 7. **Logical contract over physical state.** Physical state (index coverage, fragment layout, compaction versions, staged writes) is derived and rebuildable; it must never fail a logical operation. Check preconditions against logical state and let reconciliation converge the physical state idempotently — genuine logical conflicts still fail loudly. This is the rule rules 1–6 instantiate; full statement and applications in [docs/dev/invariants.md](docs/dev/invariants.md). +8. **One source of truth, cheaply derived.** Lance and the manifest are the source of truth; runtime state is a derived view of them. 
Don't maintain a parallel copy that can drift, and don't re-derive a view from cold storage on every call (that makes cost grow with history). Hold it warm, refresh with a cheap probe. ### Deny-list (fast-pass review filter — full reasoning in [docs/dev/invariants.md](docs/dev/invariants.md)) @@ -166,6 +169,7 @@ If a proposal fits one of these, the burden is on the proposer to justify why th - Cloud-only correctness fixes — correctness is always OSS. - Forking the codebase for Cloud — trait-extension only. - Hand-rolling something Lance already does — check the spec first. +- Shadowing the source of truth with a maintained parallel copy, or re-deriving a derived view from cold storage per call (cost then scales with history). Hold it warm and refresh cheaply. - Mutating in place state that should be immutable (Lance fragments, index segments) — new segments instead. - Silent failures — OOM, timeout, partial result must all be surfaced and bounded. - Shipping observable behavior as if it weren't part of the contract — output ordering, error-message text, timestamp precision, default-flag values, latency profile. Per Hyrum's Law, every observable behavior gets depended on once shipped; don't expose what you don't want to commit to. 
diff --git a/Cargo.lock b/Cargo.lock index 2419e9f..3963da1 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -4941,6 +4941,7 @@ dependencies = [ "lance-datafusion", "lance-file", "lance-index", + "lance-io", "lance-linalg", "lance-namespace", "lance-namespace-impls", diff --git a/crates/omnigraph/Cargo.toml b/crates/omnigraph/Cargo.toml index 7ee9bda..55d3008 100644 --- a/crates/omnigraph/Cargo.toml +++ b/crates/omnigraph/Cargo.toml @@ -55,5 +55,6 @@ arc-swap = { workspace = true } omnigraph-compiler = { path = "../omnigraph-compiler", version = "0.7.0" } tokio = { workspace = true } lance-namespace-impls = { workspace = true } +lance-io = "7.0.0" serial_test = "3" proptest = "1" diff --git a/crates/omnigraph/src/db/commit_graph.rs b/crates/omnigraph/src/db/commit_graph.rs index 3d90e54..572bdf5 100644 --- a/crates/omnigraph/src/db/commit_graph.rs +++ b/crates/omnigraph/src/db/commit_graph.rs @@ -79,10 +79,14 @@ impl CommitGraph { pub async fn open(root_uri: &str) -> Result { let root = root_uri.trim_end_matches('/'); - let dataset = Dataset::open(&graph_commits_uri(root)) - .await - .map_err(|e| OmniError::Lance(e.to_string()))?; - let actor_dataset = Dataset::open(&graph_commit_actors_uri(root)).await.ok(); + let wrapper = crate::instrumentation::commit_graph_wrapper(); + let dataset = + crate::instrumentation::open_dataset_tracked(&graph_commits_uri(root), wrapper.clone()) + .await?; + let actor_dataset = + crate::instrumentation::open_dataset_tracked(&graph_commit_actors_uri(root), wrapper) + .await + .ok(); let actor_by_commit_id = match &actor_dataset { Some(dataset) => load_commit_actor_cache(dataset).await?, None => HashMap::new(), @@ -101,14 +105,18 @@ impl CommitGraph { pub async fn open_at_branch(root_uri: &str, branch: &str) -> Result { let root = root_uri.trim_end_matches('/'); - let dataset = Dataset::open(&graph_commits_uri(root)) - .await - .map_err(|e| OmniError::Lance(e.to_string()))?; + let wrapper = crate::instrumentation::commit_graph_wrapper(); + 
let dataset = + crate::instrumentation::open_dataset_tracked(&graph_commits_uri(root), wrapper.clone()) + .await?; let dataset = dataset .checkout_branch(branch) .await .map_err(|e| OmniError::Lance(e.to_string()))?; - let actor_dataset = Dataset::open(&graph_commit_actors_uri(root)).await.ok(); + let actor_dataset = + crate::instrumentation::open_dataset_tracked(&graph_commit_actors_uri(root), wrapper) + .await + .ok(); let actor_by_commit_id = match &actor_dataset { Some(dataset) => load_commit_actor_cache(dataset).await?, None => HashMap::new(), @@ -127,9 +135,12 @@ impl CommitGraph { pub async fn refresh(&mut self) -> Result<()> { let root = self.root_uri.clone(); - self.dataset = Dataset::open(&graph_commits_uri(&root)) - .await - .map_err(|e| OmniError::Lance(e.to_string()))?; + let wrapper = crate::instrumentation::commit_graph_wrapper(); + self.dataset = crate::instrumentation::open_dataset_tracked( + &graph_commits_uri(&root), + wrapper.clone(), + ) + .await?; if let Some(branch) = &self.active_branch { self.dataset = self .dataset @@ -137,7 +148,10 @@ impl CommitGraph { .await .map_err(|e| OmniError::Lance(e.to_string()))?; } - self.actor_dataset = Dataset::open(&graph_commit_actors_uri(&root)).await.ok(); + self.actor_dataset = + crate::instrumentation::open_dataset_tracked(&graph_commit_actors_uri(&root), wrapper) + .await + .ok(); self.actor_by_commit_id = match &self.actor_dataset { Some(dataset) => load_commit_actor_cache(dataset).await?, None => HashMap::new(), diff --git a/crates/omnigraph/src/db/graph_coordinator.rs b/crates/omnigraph/src/db/graph_coordinator.rs index dfe2767..b9bcb11 100644 --- a/crates/omnigraph/src/db/graph_coordinator.rs +++ b/crates/omnigraph/src/db/graph_coordinator.rs @@ -10,7 +10,9 @@ use crate::storage::{StorageAdapter, join_uri, normalize_root_uri}; use super::commit_graph::{CommitGraph, GraphCommit}; use super::is_internal_system_branch; -use super::manifest::{ManifestChange, ManifestCoordinator, Snapshot, 
SubTableUpdate}; +use super::manifest::{ + ManifestChange, ManifestCoordinator, ManifestIncarnation, Snapshot, SubTableUpdate, +}; const GRAPH_COMMITS_DIR: &str = "_graph_commits.lance"; @@ -26,10 +28,11 @@ impl SnapshotId { &self.0 } - pub(crate) fn synthetic(branch: Option<&str>, version: u64) -> Self { - match branch { - Some(branch) => Self(format!("manifest:{}:v{}", branch, version)), - None => Self(format!("manifest:main:v{}", version)), + pub(crate) fn synthetic(branch: Option<&str>, version: u64, e_tag: Option<&str>) -> Self { + let branch = branch.unwrap_or("main"); + match e_tag { + Some(e_tag) => Self(format!("manifest:{}:v{}:etag:{}", branch, version, e_tag)), + None => Self(format!("manifest:{}:v{}", branch, version)), } } } @@ -166,6 +169,10 @@ impl GraphCoordinator { self.manifest.version() } + pub(crate) fn manifest_incarnation(&self) -> ManifestIncarnation { + self.manifest.incarnation() + } + pub fn snapshot(&self) -> Snapshot { self.manifest.snapshot() } @@ -182,6 +189,19 @@ impl GraphCoordinator { Ok(()) } + pub(crate) async fn probe_latest_incarnation(&self) -> Result { + crate::instrumentation::record_probe(); + self.manifest.probe_latest_incarnation().await + } + + /// Refresh only the manifest (not the commit graph). The read path uses this + /// on a stale same-branch probe: a read pins its snapshot by manifest version + /// and never needs the commit graph, so a full `refresh` (which also scans + /// the commit graph) would be wasted IO. + pub async fn refresh_manifest_only(&mut self) -> Result<()> { + self.manifest.refresh().await + } + pub async fn branch_list(&self) -> Result> { self.manifest.list_branches().await.map(|branches| { branches @@ -315,10 +335,13 @@ impl GraphCoordinator { None => GraphCoordinator::open(self.root_uri(), Arc::clone(&self.storage)).await?, }; - Ok(other - .head_commit_id() - .await? 
- .unwrap_or_else(|| SnapshotId::synthetic(other.current_branch(), other.version()))) + Ok(other.head_commit_id().await?.unwrap_or_else(|| { + SnapshotId::synthetic( + other.current_branch(), + other.version(), + other.manifest_incarnation().e_tag.as_deref(), + ) + })) } pub async fn resolve_target(&self, target: &ReadTarget) -> Result { @@ -339,7 +362,11 @@ impl GraphCoordinator { } }; let snapshot_id = other.head_commit_id().await?.unwrap_or_else(|| { - SnapshotId::synthetic(other.current_branch(), other.version()) + SnapshotId::synthetic( + other.current_branch(), + other.version(), + other.manifest_incarnation().e_tag.as_deref(), + ) }); Ok(ResolvedTarget { requested: target.clone(), @@ -509,9 +536,23 @@ impl GraphCoordinator { return Ok(SnapshotId::synthetic( current_branch.as_deref(), manifest_version, + self.manifest_incarnation().e_tag.as_deref(), )); }; failpoints::maybe_fail("graph_publish.before_commit_append")?; + // Refresh the commit-graph head from storage before selecting the + // parent. `append_commit` parents the new commit on the IN-MEMORY head + // (`head_commit_id`, zero storage read), but the manifest was just + // committed against a freshly rebased pin (`commit_all` opens a fresh + // coordinator) while THIS coordinator's cached head may be stale because + // an external writer advanced the branch. Without this refresh a + // same-branch write after an external commit appends off the stale head + // and FORKS the commit DAG (the new commit and the external commit + // sharing a parent). Refreshing makes the parent the true current head; + // the just-committed manifest version has no commit-graph row yet, so the + // fresh head is exactly the prior commit. (record_merge_commit is + // unaffected — it passes explicit parents, never the cached head.) 
+ commit_graph.refresh().await?; let graph_commit_id = commit_graph .append_commit(current_branch.as_deref(), manifest_version, actor_id) .await?; diff --git a/crates/omnigraph/src/db/manifest.rs b/crates/omnigraph/src/db/manifest.rs index f130523..ce91513 100644 --- a/crates/omnigraph/src/db/manifest.rs +++ b/crates/omnigraph/src/db/manifest.rs @@ -24,11 +24,10 @@ mod recovery; mod state; use graph::{init_manifest_graph, open_manifest_graph, snapshot_state_at}; -use layout::{manifest_uri, open_manifest_dataset, type_name_hash}; +use layout::{manifest_uri, open_manifest_dataset, table_uri_for_path, type_name_hash}; pub(crate) use metadata::TableVersionMetadata; #[cfg(test)] use metadata::{OMNIGRAPH_ROW_COUNT_KEY, table_version_metadata_for_state}; -use namespace::open_table_at_version_from_manifest; pub(crate) use namespace::open_table_head_for_write; #[cfg(test)] use namespace::{branch_manifest_namespace, staged_table_namespace}; @@ -74,16 +73,51 @@ pub struct Snapshot { root_uri: String, version: u64, entries: HashMap, + /// Per-graph read caches (shared `Session` + held-handle cache), injected by + /// `Omnigraph::resolved_target` for live Branch reads so table opens reuse + /// handles (0 IO on a warm repeat) and one `Session`. `None` for write-prelude + /// snapshots, time-travel / Snapshot-id reads, and directly-built test + /// snapshots, which fall back to a plain open. + read_caches: Option>, } impl Snapshot { - /// Open a sub-table dataset at its pinned version. + /// Open a sub-table dataset at its pinned version. With read caches present + /// (live Branch reads), reuse a held handle through the cache (0 open IO on a + /// warm repeat) and the shared `Session`; otherwise plain-open (Fix 2). 
pub async fn open(&self, table_key: &str) -> Result { let entry = self .entries .get(table_key) .ok_or_else(|| OmniError::manifest(format!("no manifest entry for {}", table_key)))?; - entry.open(&self.root_uri).await + match &self.read_caches { + Some(caches) => { + let location = table_uri_for_path( + &self.root_uri, + &entry.table_path, + entry.table_branch.as_deref(), + ); + caches + .handles + .get_or_open( + &entry.table_path, + entry.table_branch.as_deref(), + entry.table_version, + entry.version_metadata.e_tag(), + &location, + Some(&caches.session), + ) + .await + } + None => entry.open(&self.root_uri).await, + } + } + + /// Attach per-graph read caches (shared `Session` + handle cache) so this + /// snapshot's table opens reuse handles and the session. Set by + /// `Omnigraph::resolved_target` for live Branch reads only. + pub(crate) fn set_read_caches(&mut self, caches: Arc) { + self.read_caches = Some(caches); } /// Manifest version this snapshot was taken from. @@ -101,6 +135,31 @@ impl Snapshot { } } +#[derive(Debug, Clone, PartialEq, Eq)] +pub(crate) struct ManifestIncarnation { + pub(crate) version: u64, + pub(crate) e_tag: Option, + timestamp_nanos: Option, +} + +impl ManifestIncarnation { + pub(crate) fn matches(&self, held: &Self) -> bool { + if self.version != held.version { + return false; + } + match (&self.e_tag, &held.e_tag) { + (Some(latest), Some(current)) => latest == current, + _ => match (self.timestamp_nanos, held.timestamp_nanos) { + (Some(latest), Some(current)) => latest == current, + // Some object stores can omit both e_tag and manifest timestamp + // from the reachable API. In that narrow case the version-number + // probe is the strongest available identity. 
+ _ => true, + }, + } + } +} + impl SubTableUpdate { pub(crate) fn to_create_table_version_request(&self) -> CreateTableVersionRequest { self.version_metadata.to_create_table_version_request( @@ -132,14 +191,28 @@ pub(crate) enum ManifestChange { } impl SubTableEntry { + /// Open this sub-table at its pinned version directly by location (Fix 2), + /// without the Lance namespace — which would full-scan `__manifest` twice per + /// open (`describe_table` + `describe_table_version`). The resolved Snapshot + /// already holds the path, version, and branch. Branches are Lance native + /// branches stored at `{table_path}/tree/{branch}`, so the open targets that + /// branch-qualified location and pins the version with `with_version`. pub(crate) async fn open(&self, root_uri: &str) -> Result { - open_table_at_version_from_manifest( - root_uri, - &self.table_key, - self.table_branch.as_deref(), - self.table_version, - ) - .await + // The branch-qualified location is the dataset that physically holds this + // version: main at `{table_path}`, a branch at + // `{table_path}/tree/{branch}` (Lance native-branch storage). `with_version` + // then resolves the version within THAT dataset's `_versions` — a branch + // version lives under `tree/{branch}/_versions`, not the base. This + // matches the physical layout the namespace path resolved, without the + // per-open `__manifest` scan. + let location = table_uri_for_path(root_uri, &self.table_path, self.table_branch.as_deref()); + // Route through the instrumented data-table opener (Fix 3). With no + // session this is exactly the Fix-2 `from_uri(location).with_version`. + // This is the uncached fallback (a snapshot with no read caches); the + // cached path (`Snapshot::open` → handle cache) calls the same opener on + // a miss with the shared session, so both paths count on the per-query + // `table_wrapper`.
+ crate::instrumentation::open_table_dataset(&location, self.table_version, None).await } } @@ -223,6 +296,7 @@ impl ManifestCoordinator { .into_iter() .map(|entry| (entry.table_key.clone(), entry)) .collect(), + read_caches: None, } } @@ -359,6 +433,48 @@ impl ManifestCoordinator { self.dataset.version().version } + /// Latest committed manifest version on disk (one object-store op, no row + /// scan). The freshness probe for warm reuse: compare against `version()` + /// (the held handle's pinned version) to decide whether to refresh. + pub async fn probe_latest_version(&self) -> Result { + self.dataset + .latest_version_id() + .await + .map_err(|e| OmniError::Lance(e.to_string())) + } + + pub(crate) fn incarnation(&self) -> ManifestIncarnation { + ManifestIncarnation { + version: self.version(), + e_tag: self.dataset.manifest_location().e_tag.clone(), + timestamp_nanos: Some(self.dataset.manifest().timestamp_nanos), + } + } + + /// Latest committed manifest identity. Main cannot be deleted/recreated, so + /// the cheap version-number probe is sufficient there. Non-main Lance + /// branches can be deleted and recreated with the same version number, so + /// load the latest manifest location and compare its e_tag / timestamp too. 
+ pub(crate) async fn probe_latest_incarnation(&self) -> Result { + if self.active_branch.is_none() { + return Ok(ManifestIncarnation { + version: self.probe_latest_version().await?, + e_tag: self.dataset.manifest_location().e_tag.clone(), + timestamp_nanos: Some(self.dataset.manifest().timestamp_nanos), + }); + } + let (manifest, location) = self + .dataset + .latest_manifest() + .await + .map_err(|e| OmniError::Lance(e.to_string()))?; + Ok(ManifestIncarnation { + version: manifest.version, + e_tag: location.e_tag, + timestamp_nanos: Some(manifest.timestamp_nanos), + }) + } + pub fn active_branch(&self) -> Option<&str> { self.active_branch.as_deref() } diff --git a/crates/omnigraph/src/db/manifest/layout.rs b/crates/omnigraph/src/db/manifest/layout.rs index 9cfde9a..08fe043 100644 --- a/crates/omnigraph/src/db/manifest/layout.rs +++ b/crates/omnigraph/src/db/manifest/layout.rs @@ -20,9 +20,12 @@ pub(super) fn manifest_uri(root: &str) -> String { } pub(super) async fn open_manifest_dataset(root_uri: &str, branch: Option<&str>) -> Result { - let dataset = Dataset::open(&manifest_uri(root_uri.trim_end_matches('/'))) - .await - .map_err(|e| OmniError::Lance(e.to_string()))?; + let uri = manifest_uri(root_uri.trim_end_matches('/')); + let dataset = crate::instrumentation::open_dataset_tracked( + &uri, + crate::instrumentation::manifest_wrapper(), + ) + .await?; match branch { Some(branch) if branch != "main" => dataset .checkout_branch(branch) diff --git a/crates/omnigraph/src/db/manifest/metadata.rs b/crates/omnigraph/src/db/manifest/metadata.rs index 0bf14b6..7cd6436 100644 --- a/crates/omnigraph/src/db/manifest/metadata.rs +++ b/crates/omnigraph/src/db/manifest/metadata.rs @@ -111,7 +111,6 @@ impl TableVersionMetadata { self.manifest_size } - #[cfg(test)] pub(crate) fn e_tag(&self) -> Option<&str> { self.e_tag.as_deref() } @@ -138,6 +137,7 @@ impl TableVersionMetadata { request } + #[cfg(test)] pub(super) fn to_namespace_version(&self, version: u64) -> TableVersion 
{ self.to_namespace_version_with_details(version, None, None) } diff --git a/crates/omnigraph/src/db/manifest/namespace.rs b/crates/omnigraph/src/db/manifest/namespace.rs index 5e907ba..0d567e0 100644 --- a/crates/omnigraph/src/db/manifest/namespace.rs +++ b/crates/omnigraph/src/db/manifest/namespace.rs @@ -16,21 +16,30 @@ use object_store::{ use crate::error::{OmniError, Result}; -use super::layout::{ - namespace_internal_error, open_manifest_dataset, table_id_to_key, table_uri_for_path, -}; -use super::metadata::{ - TableVersionMetadata, namespace_version_metadata, parse_namespace_version_request, -}; +use super::layout::{namespace_internal_error, table_uri_for_path}; +#[cfg(test)] +use super::layout::{open_manifest_dataset, table_id_to_key}; +use super::metadata::TableVersionMetadata; +#[cfg(test)] +use super::metadata::{namespace_version_metadata, parse_namespace_version_request}; +#[cfg(test)] use super::publisher::GraphNamespacePublisher; +// The read namespace (BranchManifestNamespace) is test-only since Fix 2: reads +// open sub-tables directly by location+version (SubTableEntry::open), so nothing +// in production routes a read through the Lance namespace. The writes path uses +// StagedTableNamespace. These items are retained to validate the namespace +// contract in unit tests. 
+#[cfg(test)] use super::state::{ManifestState, SubTableEntry, read_manifest_entries, read_manifest_state}; +#[cfg(test)] #[derive(Debug, Clone)] struct BranchManifestNamespace { root_uri: String, branch: Option<String>, } +#[cfg(test)] impl BranchManifestNamespace { fn new(root_uri: &str, branch: Option<&str>) -> Self { Self { @@ -137,6 +146,7 @@ impl StagedTableNamespace { } } +#[cfg(test)] pub(crate) fn branch_manifest_namespace( root_uri: &str, branch: Option<&str>, @@ -175,21 +185,7 @@ async fn load_table_from_namespace( .map_err(|e| OmniError::Lance(e.to_string())) } -pub(crate) async fn open_table_at_version_from_manifest( - root_uri: &str, - table_key: &str, - branch: Option<&str>, - version: u64, -) -> Result<Dataset> { - load_table_from_namespace( - branch_manifest_namespace(root_uri, branch), - table_key, - branch, - Some(version), - ) - .await -} - +#[cfg(test)] #[async_trait] impl LanceNamespace for BranchManifestNamespace { fn namespace_id(&self) -> String { diff --git a/crates/omnigraph/src/db/manifest/publisher.rs b/crates/omnigraph/src/db/manifest/publisher.rs index 288f4be..ba1166d 100644 --- a/crates/omnigraph/src/db/manifest/publisher.rs +++ b/crates/omnigraph/src/db/manifest/publisher.rs @@ -24,10 +24,13 @@ use lance::Dataset; use lance::Error as LanceError; use lance::dataset::{MergeInsertBuilder, WhenMatched, WhenNotMatched}; use lance_namespace::NamespaceError; +#[cfg(test)] use lance_namespace::models::CreateTableVersionRequest; use crate::error::{OmniError, Result}; +#[cfg(test)] +use super::SubTableUpdate; use super::layout::{open_manifest_dataset, tombstone_object_id, version_object_id}; use super::metadata::parse_namespace_version_request; use super::migrations::migrate_internal_schema; @@ -37,7 +40,7 @@ use super::state::{ }; use super::{ ManifestChange, OBJECT_TYPE_TABLE, OBJECT_TYPE_TABLE_TOMBSTONE, OBJECT_TYPE_TABLE_VERSION, - SubTableEntry, SubTableUpdate, TableRegistration, TableTombstone, + SubTableEntry, TableRegistration, TableTombstone, }; /// 
Bound on the publisher-level retry loop that wraps Lance's row-level CAS @@ -396,6 +399,7 @@ impl GraphNamespacePublisher { Ok(Arc::try_unwrap(new_dataset).unwrap_or_else(|arc| (*arc).clone())) } + #[cfg(test)] pub(super) async fn publish_requests( &self, requests: &[CreateTableVersionRequest], diff --git a/crates/omnigraph/src/db/manifest/recovery.rs b/crates/omnigraph/src/db/manifest/recovery.rs index 968d3f4..4b0f870 100644 --- a/crates/omnigraph/src/db/manifest/recovery.rs +++ b/crates/omnigraph/src/db/manifest/recovery.rs @@ -830,7 +830,12 @@ pub(crate) async fn recover_manifest_drift( // write-entry heal: a deferred sidecar whose branch was // deleted would otherwise fail every ReadWrite open. coordinator.refresh().await?; - if !coordinator.all_branches().await?.iter().any(|name| name == b) { + if !coordinator + .all_branches() + .await? + .iter() + .any(|name| name == b) + { discard_orphaned_branch_sidecar( root_uri, storage.as_ref(), diff --git a/crates/omnigraph/src/db/manifest/tests.rs b/crates/omnigraph/src/db/manifest/tests.rs index 0e00505..3888bd4 100644 --- a/crates/omnigraph/src/db/manifest/tests.rs +++ b/crates/omnigraph/src/db/manifest/tests.rs @@ -1531,7 +1531,11 @@ async fn test_v2_to_v3_sweeps_legacy_run_branches_on_write_open() { .await .unwrap(); let post = open_manifest_dataset(uri, None).await.unwrap(); - assert_eq!(super::migrations::read_stamp(&post), 2, "stamp rewound to v2"); + assert_eq!( + super::migrations::read_stamp(&post), + 2, + "stamp rewound to v2" + ); } // A no-op publish forces the open-for-write path, which runs the migration. 
@@ -1556,7 +1560,10 @@ async fn test_v2_to_v3_sweeps_legacy_run_branches_on_write_open() { !after.iter().any(|b| b.starts_with("__run__")), "legacy run branch must be swept; got {after:?}", ); - assert!(after.iter().any(|b| b == "feature"), "user branch must survive"); + assert!( + after.iter().any(|b| b == "feature"), + "user branch must survive" + ); assert!(after.iter().any(|b| b == "main"), "main must survive"); // Idempotent: a second write-open finds the stamp at current and does not diff --git a/crates/omnigraph/src/db/omnigraph.rs b/crates/omnigraph/src/db/omnigraph.rs index 48be274..e1d7acf 100644 --- a/crates/omnigraph/src/db/omnigraph.rs +++ b/crates/omnigraph/src/db/omnigraph.rs @@ -106,6 +106,12 @@ pub struct Omnigraph { coordinator: Arc>, table_store: TableStore, runtime_cache: RuntimeCache, + /// Per-graph read caches: one shared Lance `Session` plus the held-`Dataset` + /// handle cache, handed to live-Branch-read snapshots (via + /// `resolved_target`) so table opens reuse handles (0 IO on a warm repeat) + /// and one session. Invalidated alongside `runtime_cache` on branch switch / + /// refresh — hygiene only; version-in-key carries correctness. + read_caches: Arc, /// Read-heavy on every query, written only by `apply_schema`. ArcSwap /// gives atomic pointer swap with zero-cost reads (`load()` returns a /// `Guard>`), so concurrent queries on different actors @@ -327,6 +333,14 @@ impl Omnigraph { coordinator: Arc::new(tokio::sync::RwLock::new(coordinator)), table_store: TableStore::new(&root), runtime_cache: RuntimeCache::default(), + // One shared Session per graph (LanceDB's one-session-per-connection + // model) plus the held-handle cache, created once and reused across + // reads. Session::default() caps are lazy (6 GiB index / 1 GiB + // metadata); multi-graph cap/sharing is a deferred follow-up. 
+ read_caches: Arc::new(crate::runtime_cache::ReadCaches { + session: Arc::new(lance::session::Session::default()), + handles: Arc::new(crate::runtime_cache::TableHandleCache::default()), + }), catalog: Arc::new(ArcSwap::from_pointee(catalog)), schema_source: Arc::new(ArcSwap::from_pointee(schema_source.to_string())), write_queue: Arc::new(crate::db::write_queue::WriteQueueManager::new()), @@ -351,12 +365,10 @@ impl Omnigraph { Self::open_with_storage_and_mode(uri, storage_for_uri(uri)?, OpenMode::ReadOnly).await } - /// `open_with_storage` retained for existing callers (init/test paths). - /// Defaults to `OpenMode::ReadWrite`. - pub(crate) async fn open_with_storage( - uri: &str, - storage: Arc, - ) -> Result { + /// Open with a caller-supplied [`StorageAdapter`]. Used by init/test paths + /// and by embedding/test consumers that wrap storage (e.g. a counting + /// decorator for IO-budget tests). Defaults to `OpenMode::ReadWrite`. + pub async fn open_with_storage(uri: &str, storage: Arc) -> Result { Self::open_with_storage_and_mode(uri, storage, OpenMode::ReadWrite).await } @@ -428,6 +440,14 @@ impl Omnigraph { coordinator: Arc::new(tokio::sync::RwLock::new(coordinator)), table_store: TableStore::new(&root), runtime_cache: RuntimeCache::default(), + // One shared Session per graph (LanceDB's one-session-per-connection + // model) plus the held-handle cache, created once and reused across + // reads. Session::default() caps are lazy (6 GiB index / 1 GiB + // metadata); multi-graph cap/sharing is a deferred follow-up. 
+ read_caches: Arc::new(crate::runtime_cache::ReadCaches { + session: Arc::new(lance::session::Session::default()), + handles: Arc::new(crate::runtime_cache::TableHandleCache::default()), + }), catalog: Arc::new(ArcSwap::from_pointee(catalog)), schema_source: Arc::new(ArcSwap::from_pointee(schema_source)), write_queue: Arc::new(crate::db::write_queue::WriteQueueManager::new()), @@ -539,6 +559,12 @@ impl Omnigraph { } pub(crate) async fn ensure_schema_state_valid(&self) -> Result<()> { + // Full per-call validation is intentional: a long-lived handle must + // detect external drift of the schema source, IR, OR state on its next + // operation (see lifecycle::long_lived_handle_rejects_schema_* tests). A + // source-only fast path would miss IR/state drift when _schema.pg is + // unchanged, so the only safe latency win is not calling this twice per + // query (finding A removes the redundant caller in exec/query.rs). validate_schema_contract(self.uri(), Arc::clone(&self.storage)).await } @@ -719,10 +745,13 @@ impl Omnigraph { let normalized = normalize_branch_name(branch.unwrap_or("main"))?; let coord = self.coordinator.read().await; if normalized.as_deref() == coord.current_branch() { - let snapshot_id = coord - .head_commit_id() - .await? 
- .unwrap_or_else(|| SnapshotId::synthetic(coord.current_branch(), coord.version())); + let snapshot_id = coord.head_commit_id().await?.unwrap_or_else(|| { + SnapshotId::synthetic( + coord.current_branch(), + coord.version(), + coord.manifest_incarnation().e_tag.as_deref(), + ) + }); return Ok(ResolvedTarget { requested, branch: coord.current_branch().map(str::to_string), @@ -785,10 +814,15 @@ impl Omnigraph { let branch = normalize_branch_name(branch)?; let next = self.open_coordinator_for_branch(branch.as_deref()).await?; *self.coordinator.write().await = next; - self.runtime_cache.invalidate_all().await; + self.invalidate_read_caches().await; Ok(()) } + async fn invalidate_read_caches(&self) { + self.runtime_cache.invalidate_all().await; + self.read_caches.handles.invalidate_all().await; + } + /// Re-read the handle-local coordinator state from storage AND run /// in-process recovery. Closes the Phase B → Phase C residual (e.g. /// `MutationStaging::finalize` crash mid-publish in a long-running @@ -888,7 +922,7 @@ impl Omnigraph { ) .await?; self.reload_schema_if_source_changed().await?; - self.runtime_cache.invalidate_all().await; + self.invalidate_read_caches().await; Ok(()) } @@ -920,7 +954,7 @@ impl Omnigraph { // write that triggered the heal validates against the stale // schema. Same post-heal step as `refresh`. self.reload_schema_if_source_changed().await?; - self.runtime_cache.invalidate_all().await; + self.invalidate_read_caches().await; } Ok(()) } @@ -956,7 +990,7 @@ impl Omnigraph { /// own publish path. 
pub(crate) async fn refresh_coordinator_only(&self) -> Result<()> { self.coordinator.write().await.refresh().await?; - self.runtime_cache.invalidate_all().await; + self.invalidate_read_caches().await; Ok(()) } @@ -974,11 +1008,66 @@ impl Omnigraph { target: impl Into<ReadTarget>, ) -> Result<ResolvedTarget> { self.ensure_schema_state_valid().await?; - self.coordinator - .read() - .await - .resolve_target(&target.into()) - .await + let target = target.into(); + let mut resolved = self.resolve_target_inner(&target).await?; + // Attach the read caches (shared Session + held-handle cache) for live + // Branch reads so table opens reuse handles (0 IO on a warm repeat). + // Snapshot-id reads are deliberately NOT cached: they pin a historical + // version `cleanup` may GC, so bypassing the cache sidesteps the + // cleanup-vs-cached-handle edge. Writes never reach here (they use + // `resolved_branch_target`), so they never receive a pinned handle. + if matches!(target, ReadTarget::Branch(_)) { + resolved + .snapshot + .set_read_caches(Arc::clone(&self.read_caches)); + } + Ok(resolved) + } + + /// Resolve a read target to its snapshot, without attaching read caches. + /// Same-branch reads reuse the warm coordinator, gated by a cheap version + /// probe (invariant 6: strong consistency, never a blind warm read). Reads do + /// not need the commit graph (the manifest version is the visibility + /// authority, invariant 2), so the id is synthetic and no commit-graph scan + /// happens on this path. + async fn resolve_target_inner(&self, target: &ReadTarget) -> Result<ResolvedTarget> { + if let ReadTarget::Branch(branch) = target { + let normalized = normalize_branch_name(branch)?; + { + let coord = self.coordinator.read().await; + if normalized.as_deref() != coord.current_branch() { + // Different branch: cold resolve (opens that branch). 
+ return coord.resolve_target(target).await; + } + let held = coord.manifest_incarnation(); + if coord.probe_latest_incarnation().await?.matches(&held) { + return Ok(warm_resolved_target(&coord, target)); + } + // Stale: refresh under the write lock below. + } + let mut coord = self.coordinator.write().await; + if normalized.as_deref() == coord.current_branch() { + // Re-check after taking the write lock; another writer may have + // refreshed (tokio RwLock has no read->write upgrade). + let held = coord.manifest_incarnation(); + let mut refreshed = false; + if !coord.probe_latest_incarnation().await?.matches(&held) { + coord.refresh_manifest_only().await?; + refreshed = true; + } + let resolved = warm_resolved_target(&coord, target); + drop(coord); + if refreshed { + self.invalidate_read_caches().await; + } + return Ok(resolved); + } + // Branch changed while waiting for the write lock: cold resolve. + return coord.resolve_target(target).await; + } + + // Snapshot target: resolve through the commit graph as before. + self.coordinator.read().await.resolve_target(target).await } // ─── Change detection ──────────────────────────────────────────────── @@ -1673,6 +1762,24 @@ pub(crate) fn normalize_branch_name(branch: &str) -> Result<Option<String>> { Ok(Some(branch.to_string())) } +/// Build a `ResolvedTarget` from the warm coordinator without opening the commit +/// graph. The live branch snapshot is pinned by the manifest incarnation, so the +/// id is synthetic `(branch, version, e_tag when available)`; nothing on the read +/// path needs a real commit ULID (only `RuntimeCache` keys on the id, where +/// synthetic is consistent). 
+fn warm_resolved_target(coord: &GraphCoordinator, requested: &ReadTarget) -> ResolvedTarget { + ResolvedTarget { + requested: requested.clone(), + branch: coord.current_branch().map(str::to_string), + snapshot_id: SnapshotId::synthetic( + coord.current_branch(), + coord.version(), + coord.manifest_incarnation().e_tag.as_deref(), + ), + snapshot: coord.snapshot(), + } +} + pub(crate) fn ensure_public_branch_ref(branch: &str, operation: &str) -> Result<()> { if is_internal_system_branch(branch) { return Err(OmniError::manifest(format!( @@ -2523,6 +2630,7 @@ edge WorksAt: Person -> Company db.branch_create("__run__legacy").await.unwrap(); drop(db); { + // forbidden-api-allow: test synthesizes a legacy graph by editing __manifest directly. let mut ds = lance::Dataset::open(&format!("{}/__manifest", uri)) .await .unwrap(); diff --git a/crates/omnigraph/src/db/omnigraph/optimize.rs b/crates/omnigraph/src/db/omnigraph/optimize.rs index 9181822..498f9ae 100644 --- a/crates/omnigraph/src/db/omnigraph/optimize.rs +++ b/crates/omnigraph/src/db/omnigraph/optimize.rs @@ -937,6 +937,7 @@ mod tests { for type_name in ["Person", "Company"] { let table_uri = node_table_uri(uri, type_name); + // forbidden-api-allow: test synthesizes a branch ref directly on the Lance dataset. 
let mut ds = lance::Dataset::open(&table_uri).await.unwrap(); let base = ds.version().version; ds.create_branch("feature", base, None).await.unwrap(); diff --git a/crates/omnigraph/src/db/omnigraph/schema_apply.rs b/crates/omnigraph/src/db/omnigraph/schema_apply.rs index 48f8099..d013eb2 100644 --- a/crates/omnigraph/src/db/omnigraph/schema_apply.rs +++ b/crates/omnigraph/src/db/omnigraph/schema_apply.rs @@ -447,8 +447,7 @@ where && sidecar_registrations.is_empty() && sidecar_tombstones.is_empty()); if writes_sidecar { - schema_apply_queue_keys - .push(crate::db::manifest::schema_apply_serial_queue_key()); + schema_apply_queue_keys.push(crate::db::manifest::schema_apply_serial_queue_key()); } let _schema_apply_queue_guards = db .write_queue() @@ -530,8 +529,7 @@ where .await?; let table_path = table_path_for_table_key(target_table_key)?; let dataset_uri = db.storage().dataset_uri(&table_path); - let target_ds = - SnapshotHandle::new(TableStore::write_dataset(&dataset_uri, batch).await?); + let target_ds = SnapshotHandle::new(TableStore::write_dataset(&dataset_uri, batch).await?); // Indexes on the renamed table are reconciled later (iss-848). let state = db.storage().table_state(&dataset_uri, &target_ds).await?; table_registrations.insert(target_table_key.clone(), table_path); @@ -750,6 +748,7 @@ where async fn cleanup_dataset_old_versions(db: &Omnigraph, full_uri: &str) -> Result<()> { use chrono::Utc; use lance::dataset::cleanup::CleanupPolicy; + // forbidden-api-allow: maintenance (Hard-drop version GC) opens the dataset to run cleanup_old_versions. 
let ds = lance::Dataset::open(full_uri) .await .map_err(|e| OmniError::Lance(e.to_string()))?; diff --git a/crates/omnigraph/src/db/omnigraph/table_ops.rs b/crates/omnigraph/src/db/omnigraph/table_ops.rs index d30acff..c325931 100644 --- a/crates/omnigraph/src/db/omnigraph/table_ops.rs +++ b/crates/omnigraph/src/db/omnigraph/table_ops.rs @@ -1097,7 +1097,8 @@ async fn prepare_updates_for_commit( // have null embeddings) is deferred and logged inside // build_indices; a later ensure_indices/optimize materializes it. // The load/mutate/merge commit must not fail on it. - let _pending = build_indices_on_dataset(db, &prepared_update.table_key, &mut ds).await?; + let _pending = + build_indices_on_dataset(db, &prepared_update.table_key, &mut ds).await?; let state = db.storage().table_state(&full_path, &ds).await?; prepared_update.table_version = state.version; prepared_update.row_count = state.row_count; @@ -1350,6 +1351,7 @@ mod classify_fork_ref_tests { // the manifest's `feature` snapshot still places on main. let person = node_path(&db, "feature", "node:Person").await; { + // forbidden-api-allow: test synthesizes a branch ref directly on the Lance dataset. let mut ds = lance::Dataset::open(&person).await.unwrap(); let v = ds.version().version; ds.create_branch("feature", v, None).await.unwrap(); @@ -1362,6 +1364,7 @@ mod classify_fork_ref_tests { // Orphan (ghost): a ref for a branch the manifest does not have at all. { + // forbidden-api-allow: test synthesizes a branch ref directly on the Lance dataset. 
let mut ds = lance::Dataset::open(&person).await.unwrap(); let v = ds.version().version; ds.create_branch("ghost", v, None).await.unwrap(); diff --git a/crates/omnigraph/src/exec/query.rs b/crates/omnigraph/src/exec/query.rs index b12e26b..e922075 100644 --- a/crates/omnigraph/src/exec/query.rs +++ b/crates/omnigraph/src/exec/query.rs @@ -35,7 +35,7 @@ impl Omnigraph { query_name: &str, params: &ParamMap, ) -> Result { - self.ensure_schema_state_valid().await?; + // resolved_target validates the schema contract; no redundant call here. let resolved = self.resolved_target(target).await?; let catalog = self.catalog(); @@ -80,7 +80,7 @@ impl Omnigraph { query_name: &str, params: &ParamMap, ) -> Result { - self.ensure_schema_state_valid().await?; + // snapshot_at_version validates the schema contract; no redundant call here. let snapshot = self.snapshot_at_version(version).await?; let catalog = self.catalog(); diff --git a/crates/omnigraph/src/instrumentation.rs b/crates/omnigraph/src/instrumentation.rs new file mode 100644 index 0000000..98249c0 --- /dev/null +++ b/crates/omnigraph/src/instrumentation.rs @@ -0,0 +1,231 @@ +//! Read-path cost instrumentation (test seam). +//! +//! Two boundary instruments let cost-budget tests assert that a warm read does +//! no redundant IO, the way LanceDB's IO-counted tests do (see +//! `docs/dev/testing.md`, "Cost-budget tests"): +//! +//! - **Lance object store** — a per-query [`WrappingObjectStore`] attached to the +//! datasets a query opens, so a test counts real `read_iops`. Delivered through +//! a task-local ([`QueryIoProbes`]) set by the test; production leaves it unset, +//! so the open helpers attach nothing (one unset-`Option` check per open). +//! - **omnigraph `StorageAdapter`** — [`CountingStorageAdapter`], a decorator that +//! counts per-method calls (the schema-contract reads on the query path). +//! +//! Nothing here changes runtime behavior: the wrappers only observe, and the +//! decorator delegates every call. 
`IOTracker` (the concrete counter) lives in +//! tests via the `lance-io` dev-dependency; this module stays generic over the +//! `lance::io`-re-exported trait, so it adds no production dependency. + +use std::sync::Arc; +use std::sync::atomic::{AtomicU64, Ordering}; + +use async_trait::async_trait; +use lance::Dataset; +use lance::dataset::builder::DatasetBuilder; +use lance::io::{ObjectStoreParams, WrappingObjectStore}; + +use crate::error::{OmniError, Result}; +use crate::storage::StorageAdapter; + +/// Per-query IO probes, installed for a query's task via [`with_query_io_probes`]. +/// +/// Each wrapper is attached (when present) to the datasets that category opens, +/// so a test reads `read_iops` off its own `IOTracker` handle. `probe_count` +/// records calls to the version probe (which runs on the coordinator's already-open +/// handle, so it is counted by invocation rather than by the per-query wrappers). +#[derive(Clone, Default)] +pub struct QueryIoProbes { + pub manifest_wrapper: Option<Arc<dyn WrappingObjectStore>>, + pub commit_graph_wrapper: Option<Arc<dyn WrappingObjectStore>>, + /// Attached to the per-table data opens a query performs (the cache-miss + /// path in `SubTableEntry::open`). Lets a cost test assert how many tables + /// a query actually opened — N on a cold read, 0 on a warm repeat once the + /// handle cache (Fix 3) serves them. + pub table_wrapper: Option<Arc<dyn WrappingObjectStore>>, + pub probe_count: Arc<AtomicU64>, +} + +tokio::task_local! { + static QUERY_IO_PROBES: QueryIoProbes; +} + +/// Run `fut` with per-query IO probes installed. Test-only entry point; nothing +/// in production sets the probes, so the accessors below return `None`/no-op. 
+pub async fn with_query_io_probes<F>(probes: QueryIoProbes, fut: F) -> F::Output +where + F: std::future::Future, +{ + QUERY_IO_PROBES.scope(probes, fut).await +} + +fn current<R>(f: impl FnOnce(&QueryIoProbes) -> R) -> Option<R> { + QUERY_IO_PROBES.try_with(f).ok() +} + +pub(crate) fn manifest_wrapper() -> Option<Arc<dyn WrappingObjectStore>> { + current(|p| p.manifest_wrapper.clone()).flatten() +} + +pub(crate) fn commit_graph_wrapper() -> Option<Arc<dyn WrappingObjectStore>> { + current(|p| p.commit_graph_wrapper.clone()).flatten() +} + +pub(crate) fn table_wrapper() -> Option<Arc<dyn WrappingObjectStore>> { + current(|p| p.table_wrapper.clone()).flatten() +} + +/// Record one version-probe invocation against the active per-query probes. +/// No-op when no probes are installed (production). +pub(crate) fn record_probe() { + let _ = current(|p| p.probe_count.fetch_add(1, Ordering::Relaxed)); +} + +/// Open a Lance dataset at `uri`, attaching `wrapper` (for IO counting) when +/// present. With no wrapper this is exactly `Dataset::open(uri)`. The wrapper is +/// set via `ObjectStoreParams` on the builder so the open itself is counted +/// (`Dataset::with_object_store_wrappers` only wraps an already-open store). +pub(crate) async fn open_dataset_tracked( + uri: &str, + wrapper: Option<Arc<dyn WrappingObjectStore>>, +) -> Result<Dataset> { + let result = match wrapper { + None => Dataset::open(uri).await, + Some(wrapper) => { + DatasetBuilder::from_uri(uri) + .with_store_params(ObjectStoreParams { + object_store_wrapper: Some(wrapper), + ..Default::default() + }) + .load() + .await + } + }; + result.map_err(|e| OmniError::Lance(e.to_string())) +} + +/// Open a data-table dataset at `location` pinned to `version` — the cache-miss +/// path of the data-read boundary (`SubTableEntry::open`). Attaches the shared +/// per-graph `Session` (warms metadata/index caches across opens, LanceDB's +/// one-session-per-connection pattern) and the per-query `table_wrapper` (for IO +/// counting) when present. With neither, this is exactly the Fix-2 +/// `from_uri(location).with_version(version)` open. 
+pub(crate) async fn open_table_dataset( + location: &str, + version: u64, + session: Option<&Arc<lance::session::Session>>, +) -> Result<Dataset> { + let mut builder = DatasetBuilder::from_uri(location).with_version(version); + if let Some(session) = session { + builder = builder.with_session(session.clone()); + } + if let Some(wrapper) = table_wrapper() { + builder = builder.with_store_params(ObjectStoreParams { + object_store_wrapper: Some(wrapper), + ..Default::default() + }); + } + builder + .load() + .await + .map_err(|e| OmniError::Lance(e.to_string())) +} + +/// Per-method read counts for [`CountingStorageAdapter`]. +#[derive(Debug, Default)] +pub struct StorageReadCounts { + pub read_text: AtomicU64, + pub exists: AtomicU64, + pub read_text_versioned: AtomicU64, + pub list_dir: AtomicU64, +} + +impl StorageReadCounts { + pub fn read_text(&self) -> u64 { + self.read_text.load(Ordering::Relaxed) + } + pub fn exists(&self) -> u64 { + self.exists.load(Ordering::Relaxed) + } + pub fn read_text_versioned(&self) -> u64 { + self.read_text_versioned.load(Ordering::Relaxed) + } + pub fn list_dir(&self) -> u64 { + self.list_dir.load(Ordering::Relaxed) + } +} + +/// Boundary decorator over a [`StorageAdapter`] that counts read-facing calls. +/// Reads delegate after incrementing; writes delegate unchanged. Construct with +/// [`CountingStorageAdapter::new`] and open an engine via +/// `Omnigraph::open_with_storage` to count its non-Lance storage IO. +#[derive(Debug)] +pub struct CountingStorageAdapter { + inner: Arc<dyn StorageAdapter>, + counts: Arc<StorageReadCounts>, +} + +impl CountingStorageAdapter { + /// Wrap `inner`, returning the adapter and a shared handle to its counts. 
+ pub fn new(inner: Arc) -> (Arc, Arc) { + let counts = Arc::new(StorageReadCounts::default()); + let adapter: Arc = Arc::new(Self { + inner, + counts: Arc::clone(&counts), + }); + (adapter, counts) + } +} + +#[async_trait] +impl StorageAdapter for CountingStorageAdapter { + async fn read_text(&self, uri: &str) -> Result { + self.counts.read_text.fetch_add(1, Ordering::Relaxed); + self.inner.read_text(uri).await + } + + async fn write_text(&self, uri: &str, contents: &str) -> Result<()> { + self.inner.write_text(uri, contents).await + } + + async fn write_text_if_absent(&self, uri: &str, contents: &str) -> Result { + self.inner.write_text_if_absent(uri, contents).await + } + + async fn exists(&self, uri: &str) -> Result { + self.counts.exists.fetch_add(1, Ordering::Relaxed); + self.inner.exists(uri).await + } + + async fn rename_text(&self, from_uri: &str, to_uri: &str) -> Result<()> { + self.inner.rename_text(from_uri, to_uri).await + } + + async fn delete(&self, uri: &str) -> Result<()> { + self.inner.delete(uri).await + } + + async fn list_dir(&self, dir_uri: &str) -> Result> { + self.counts.list_dir.fetch_add(1, Ordering::Relaxed); + self.inner.list_dir(dir_uri).await + } + + async fn read_text_versioned(&self, uri: &str) -> Result<(String, String)> { + self.counts.read_text_versioned.fetch_add(1, Ordering::Relaxed); + self.inner.read_text_versioned(uri).await + } + + async fn write_text_if_match( + &self, + uri: &str, + contents: &str, + expected_version: &str, + ) -> Result> { + self.inner + .write_text_if_match(uri, contents, expected_version) + .await + } + + async fn delete_prefix(&self, prefix_uri: &str) -> Result<()> { + self.inner.delete_prefix(prefix_uri).await + } +} diff --git a/crates/omnigraph/src/lib.rs b/crates/omnigraph/src/lib.rs index ff0b3d6..7dd7135 100644 --- a/crates/omnigraph/src/lib.rs +++ b/crates/omnigraph/src/lib.rs @@ -14,6 +14,7 @@ pub mod error; mod exec; pub mod failpoints; pub mod graph_index; +pub mod instrumentation; pub mod 
loader; pub mod runtime_cache; pub mod storage; diff --git a/crates/omnigraph/src/runtime_cache.rs b/crates/omnigraph/src/runtime_cache.rs index 84b562a..e85a90a 100644 --- a/crates/omnigraph/src/runtime_cache.rs +++ b/crates/omnigraph/src/runtime_cache.rs @@ -1,6 +1,9 @@ use std::collections::{HashMap, VecDeque}; +use std::hash::Hash; use std::sync::Arc; +use lance::Dataset; +use lance::session::Session; use omnigraph_compiler::catalog::Catalog; use tokio::sync::Mutex; @@ -26,17 +29,15 @@ pub struct RuntimeCache { graph_indices: Mutex<GraphIndexCache>, } -#[derive(Debug, Default)] +#[derive(Debug)] struct GraphIndexCache { - entries: HashMap<GraphIndexCacheKey, Arc<GraphIndex>>, - lru: VecDeque<GraphIndexCacheKey>, + entries: LruMap<GraphIndexCacheKey, Arc<GraphIndex>>, } impl RuntimeCache { pub async fn invalidate_all(&self) { let mut cache = self.graph_indices.lock().await; - cache.entries.clear(); - cache.lru.clear(); + cache.entries.invalidate_all(); } pub async fn graph_index( @@ -48,7 +49,6 @@ impl RuntimeCache { { let mut cache = self.graph_indices.lock().await; if let Some(index) = cache.entries.get(&key).cloned() { - cache.touch(key.clone()); return Ok(index); } } @@ -62,7 +62,6 @@ impl RuntimeCache { let index = Arc::new(GraphIndex::build(&resolved.snapshot, &edge_types).await?); let mut cache = self.graph_indices.lock().await; if let Some(existing) = cache.entries.get(&key).cloned() { - cache.touch(key); return Ok(existing); } cache.insert(key, Arc::clone(&index)); @@ -72,24 +71,86 @@ impl RuntimeCache { impl GraphIndexCache { fn insert(&mut self, key: GraphIndexCacheKey, value: Arc<GraphIndex>) { - self.entries.insert(key.clone(), value); - self.touch(key); - while self.entries.len() > 8 { - let Some(oldest) = self.lru.pop_front() else { - break; - }; - if self.entries.remove(&oldest).is_some() { - break; - } + self.entries.insert(key, value); + } + + #[cfg(test)] + fn touch(&mut self, key: GraphIndexCacheKey) { + self.entries.touch(key); + } +} + +#[derive(Debug)] +struct LruMap<K, V> +where + K: Clone + Eq + Hash, +{ + entries: HashMap<K, V>, + lru: VecDeque<K>, + cap: usize, +} + 
+impl<K, V> LruMap<K, V> +where + K: Clone + Eq + Hash, +{ + fn new(cap: usize) -> Self { + Self { + entries: HashMap::new(), + lru: VecDeque::new(), + cap, } } - fn touch(&mut self, key: GraphIndexCacheKey) { + fn get(&mut self, key: &K) -> Option<&V> { + if self.entries.contains_key(key) { + self.touch(key.clone()); + self.entries.get(key) + } else { + None + } + } + + fn insert(&mut self, key: K, value: V) { + self.entries.insert(key.clone(), value); + self.touch(key); + while self.entries.len() > self.cap { + let Some(oldest) = self.lru.pop_front() else { + break; + }; + self.entries.remove(&oldest); + } + } + + fn invalidate_all(&mut self) { + self.entries.clear(); + self.lru.clear(); + } + + #[cfg(test)] + fn contains_key(&self, key: &K) -> bool { + self.entries.contains_key(key) + } + + #[cfg(test)] + fn len(&self) -> usize { + self.entries.len() + } + + fn touch(&mut self, key: K) { self.lru.retain(|existing| existing != &key); self.lru.push_back(key); } } +impl Default for GraphIndexCache { + fn default() -> Self { + Self { + entries: LruMap::new(8), + } + } +} + fn graph_index_cache_key(resolved: &ResolvedTarget, catalog: &Catalog) -> GraphIndexCacheKey { let mut edge_tables: Vec = catalog .edge_types @@ -114,6 +175,114 @@ fn graph_index_cache_key(resolved: &ResolvedTarget, catalog: &Catalog) -> GraphI } } +/// Max held `Dataset` handles. A handle holds only Arcs (object store + manifest), +/// never table data, so this is cheap; it bounds how many `(table, branch, +/// version, e_tag)` cells stay warm. One graph's live table set across a couple +/// of branches at the current version fits comfortably, with headroom for the +/// recently-superseded versions left by writes until they age out. 
+const TABLE_HANDLE_CACHE_CAP: usize = 64; + +#[derive(Debug, Clone, PartialEq, Eq, Hash)] +struct TableHandleKey { + table_path: String, + table_branch: Option<String>, + version: u64, + e_tag: Option<String>, +} + +/// Held open-`Dataset` handles keyed by `(table_path, branch, version, e_tag)` — the +/// version-keyed analogue of LanceDB's `DatasetConsistencyWrapper` +/// (`rust/lancedb/src/table/dataset.rs`). A warm read reuses a held handle with +/// zero open IO (a cheap `Dataset` clone); a miss opens once at the location with +/// the shared `Session`. Version plus e_tag are in the key, so a write (or a +/// delete/recreate that reuses a version number on object stores with e_tags) is +/// simply a new key. A same-branch manifest refresh clears this cache as the +/// fallback for e_tag-less table locations. Only read-path Data opens use this — +/// writes open HEAD directly and never receive a pinned handle. +#[derive(Default)] +pub struct TableHandleCache { + inner: Mutex<TableHandleCacheInner>, +} + +struct TableHandleCacheInner { + entries: LruMap<TableHandleKey, Dataset>, +} + +impl TableHandleCache { + /// Drop all held handles. Correctness never requires this (version-in-key); + /// it is memory hygiene, called from the same hooks that clear the graph + /// index cache (branch switch / refresh). + pub async fn invalidate_all(&self) { + let mut inner = self.inner.lock().await; + inner.entries.invalidate_all(); + } + + /// Return the dataset for `(table_path, branch, version, e_tag)`, reusing a + /// held handle (0 open IO) or opening it once at `location` with the shared + /// `session` on a miss. 
+ pub async fn get_or_open( + &self, + table_path: &str, + table_branch: Option<&str>, + version: u64, + e_tag: Option<&str>, + location: &str, + session: Option<&Arc<Session>>, + ) -> Result<Dataset> { + let key = TableHandleKey { + table_path: table_path.to_string(), + table_branch: table_branch.map(str::to_string), + version, + e_tag: e_tag.map(str::to_string), + }; + { + let mut inner = self.inner.lock().await; + if let Some(ds) = inner.entries.get(&key).cloned() { + return Ok(ds); + } + } + // Miss: open without holding the lock (the open is async IO). A concurrent + // double-miss opens twice and one wins the insert — correct (the dataset + // at a version is immutable) and rare. + let ds = crate::instrumentation::open_table_dataset(location, version, session).await?; + let mut inner = self.inner.lock().await; + if let Some(existing) = inner.entries.get(&key).cloned() { + return Ok(existing); + } + inner.insert(key, ds.clone()); + Ok(ds) + } +} + +impl TableHandleCacheInner { + fn insert(&mut self, key: TableHandleKey, value: Dataset) { + self.entries.insert(key, value); + } +} + +impl Default for TableHandleCacheInner { + fn default() -> Self { + Self { + entries: LruMap::new(TABLE_HANDLE_CACHE_CAP), + } + } +} + +/// Per-graph read caches handed to a resolved `Snapshot` so its table opens reuse +/// one shared `Session` (LanceDB's one-session-per-connection pattern) and the +/// held-handle cache. Manual `Debug` because `lance::session::Session` is not +/// `Debug`; this lets `Snapshot` keep its `#[derive(Debug)]`.
+pub struct ReadCaches { + pub session: Arc<Session>, + pub handles: Arc<TableHandleCache>, +} + +impl std::fmt::Debug for ReadCaches { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + f.debug_struct("ReadCaches").finish_non_exhaustive() + } +} + #[cfg(test)] mod tests { use std::sync::Arc; @@ -156,4 +325,21 @@ mod tests { assert!(cache.entries.contains_key(&key(0))); assert!(!cache.entries.contains_key(&key(1))); } + + #[test] + fn lru_map_evicts_oldest_and_touch_refreshes_order() { + let mut map = LruMap::new(2); + map.insert("a", 1); + map.insert("b", 2); + + assert_eq!(map.get(&"a"), Some(&1)); + map.insert("c", 3); + + assert!(map.contains_key(&"a")); + assert!(!map.contains_key(&"b")); + assert!(map.contains_key(&"c")); + + map.invalidate_all(); + assert_eq!(map.len(), 0); + } } diff --git a/crates/omnigraph/tests/branching.rs b/crates/omnigraph/tests/branching.rs index 108702c..bd98c9c 100644 --- a/crates/omnigraph/tests/branching.rs +++ b/crates/omnigraph/tests/branching.rs @@ -548,6 +548,174 @@ async fn branch_merge_records_single_latest_commit_with_two_parents() { ); } +// ── P1: commit-DAG coherence on same-branch writes after an external commit ── +// +// `append_commit` takes a new commit's parent from the coordinator's in-memory +// head (commit_graph head_commit, zero storage read), but `commit_all` rebases +// the MANIFEST from a fresh coordinator. So after an external writer advances +// the branch, a same-branch write on a non-refreshed handle commits a fresh +// manifest version yet appends off the stale head — forking the commit DAG (the +// new commit and the external commit share a parent). Data is unaffected (the +// manifest is the visibility authority); only commit history is malformed. +// P1 refreshes the commit-graph head before the append, so the parent is the +// true current head. These two tests are RED before that fix, GREEN after.
+ +/// Non-strict insert: the fork is pre-existing (commit_all rebases the manifest +/// regardless of the stale head), independent of Fix 1. +#[tokio::test] +async fn same_branch_insert_after_external_commit_is_linear() { + let dir = tempfile::tempdir().unwrap(); + let uri = dir.path().to_str().unwrap(); + + // Handle A: a long-lived writer whose coordinator head stays pinned at the + // load commit (C0) — it never refreshes before its own write below. + let mut a = init_and_load(&dir).await; + let c0 = CommitGraph::open(uri) + .await + .unwrap() + .head_commit() + .await + .unwrap() + .unwrap(); + + // External writer B advances main: commit C1, parent C0. + let mut b = Omnigraph::open(uri).await.unwrap(); + mutate_main( + &mut b, + MUTATION_QUERIES, + "insert_person", + &mixed_params(&[("$name", "ext_b")], &[("$age", 30)]), + ) + .await + .unwrap(); + let c1 = CommitGraph::open(uri) + .await + .unwrap() + .head_commit() + .await + .unwrap() + .unwrap(); + assert_eq!( + c1.parent_commit_id.as_deref(), + Some(c0.graph_commit_id.as_str()), + "sanity: B's commit C1 should descend from C0" + ); + + // A writes to main WITHOUT refreshing. A's coordinator still thinks the head + // is C0, so a pre-fix append parents the new commit on C0 instead of C1. 
+ mutate_main( + &mut a, + MUTATION_QUERIES, + "insert_person", + &mixed_params(&[("$name", "local_a")], &[("$age", 40)]), + ) + .await + .unwrap(); + + let commits = CommitGraph::open(uri) + .await + .unwrap() + .load_commits() + .await + .unwrap(); + let latest = commits.iter().max_by_key(|c| c.manifest_version).unwrap(); + assert_eq!( + latest.parent_commit_id.as_deref(), + Some(c1.graph_commit_id.as_str()), + "A's same-branch write after an external commit must append off the true \ + head C1, not the stale head C0 (commit-DAG fork)" + ); + let c0_children = commits + .iter() + .filter(|c| c.parent_commit_id.as_deref() == Some(c0.graph_commit_id.as_str())) + .count(); + assert_eq!(c0_children, 1, "C0 must have exactly one child; two is the fork"); +} + +/// Strict update after a read: Fix 1's `refresh_manifest_only` makes the read +/// freshen the read-time pin, defeating the strict 409 that used to force a +/// coherent refresh — so the same stale-head append forks strict ops too. +#[tokio::test] +async fn same_branch_update_after_external_commit_and_read_is_linear() { + let dir = tempfile::tempdir().unwrap(); + let uri = dir.path().to_str().unwrap(); + + // A inserts the row it will later update; this is A's own commit (Ca), so + // A's coordinator head is Ca. + let mut a = init_and_load(&dir).await; + mutate_main( + &mut a, + MUTATION_QUERIES, + "insert_person", + &mixed_params(&[("$name", "target")], &[("$age", 40)]), + ) + .await + .unwrap(); + let ca = CommitGraph::open(uri) + .await + .unwrap() + .head_commit() + .await + .unwrap() + .unwrap(); + + // External writer B advances main: commit Cb, parent Ca. 
+ let mut b = Omnigraph::open(uri).await.unwrap(); + mutate_main( + &mut b, + MUTATION_QUERIES, + "insert_person", + &mixed_params(&[("$name", "ext_b")], &[("$age", 30)]), + ) + .await + .unwrap(); + let cb = CommitGraph::open(uri) + .await + .unwrap() + .head_commit() + .await + .unwrap() + .unwrap(); + assert_eq!(cb.parent_commit_id.as_deref(), Some(ca.graph_commit_id.as_str())); + + // A reads main: the stale-probe path refreshes A's MANIFEST (via + // refresh_manifest_only) but not its commit-graph head, freshening the + // read-time pin so the strict update below skips its 409. + query_main(&mut a, TEST_QUERIES, "total_people", &params(&[])) + .await + .unwrap(); + + // Strict update, no explicit refresh: pre-fix it appends off the stale head + // Ca instead of Cb. + mutate_main( + &mut a, + MUTATION_QUERIES, + "set_age", + &mixed_params(&[("$name", "target")], &[("$age", 99)]), + ) + .await + .unwrap(); + + let commits = CommitGraph::open(uri) + .await + .unwrap() + .load_commits() + .await + .unwrap(); + let latest = commits.iter().max_by_key(|c| c.manifest_version).unwrap(); + assert_eq!( + latest.parent_commit_id.as_deref(), + Some(cb.graph_commit_id.as_str()), + "a strict update after an external commit and a local read must append \ + off the true head Cb, not the stale head Ca" + ); + let ca_children = commits + .iter() + .filter(|c| c.parent_commit_id.as_deref() == Some(ca.graph_commit_id.as_str())) + .count(); + assert_eq!(ca_children, 1, "Ca must have exactly one child; two is the fork"); +} + #[tokio::test] async fn branch_merge_records_actor_on_latest_commit() { let dir = tempfile::tempdir().unwrap(); diff --git a/crates/omnigraph/tests/forbidden_apis.rs b/crates/omnigraph/tests/forbidden_apis.rs index e079464..667e8c5 100644 --- a/crates/omnigraph/tests/forbidden_apis.rs +++ b/crates/omnigraph/tests/forbidden_apis.rs @@ -71,6 +71,14 @@ const FORBIDDEN_PATTERNS: &[&str] = &[ "Dataset::drop_columns", "Dataset::truncate_table", "Dataset::restore", + //
Raw dataset OPENS — all reads must route through `Snapshot::open` (the + // held-handle cache + shared Session, Fix 3). Only the instrumented opener + // (`instrumentation.rs`) and the storage/manifest layers (allow-listed below) + // open datasets directly; forbidding these in the read/exec layer keeps a + // future read from silently bypassing the cache. + "Dataset::open", + "DatasetBuilder::from_uri", + "DatasetBuilder::from_namespace", // Lance-specific method names that don't clash with our `TableStore` // wrappers (we use `merge_insert_batch{,es}`, `add_columns_to_*`, // etc. — never the bare Lance names). Engine code that writes @@ -106,6 +114,7 @@ const ALLOW_LIST_FILES: &[&str] = &[ "commit_graph.rs", // Maintains `_graph_commits.lance` system table. "graph_coordinator.rs", // Drives the manifest publisher / branch coordinator. "recovery_audit.rs", // Maintains `_graph_commit_recoveries.lance` (recovery audit trail). + "instrumentation.rs", // The instrumented dataset opener (open_dataset_tracked / open_table_dataset). ]; /// Directories exempt from the guard. Files under these paths may use diff --git a/crates/omnigraph/tests/helpers/mod.rs b/crates/omnigraph/tests/helpers/mod.rs index 6476e1a..e690839 100644 --- a/crates/omnigraph/tests/helpers/mod.rs +++ b/crates/omnigraph/tests/helpers/mod.rs @@ -166,6 +166,21 @@ pub async fn mutate_branch( db.mutate(branch, query_source, query_name, params).await } +/// Advance the manifest version `n` times (one commit per insert), building +/// deep commit history for cost-budget tests (history depth, not row count). 
+pub async fn commit_many(db: &mut Omnigraph, n: usize) { + for i in 0..n { + mutate_main( + db, + MUTATION_QUERIES, + "insert_person", + &mixed_params(&[("$name", &format!("commit_many_{i}"))], &[("$age", 30)]), + ) + .await + .unwrap(); + } +} + pub async fn snapshot_main(db: &Omnigraph) -> Result { db.snapshot_of(ReadTarget::branch("main")).await } diff --git a/crates/omnigraph/tests/warm_read_cost.rs b/crates/omnigraph/tests/warm_read_cost.rs new file mode 100644 index 0000000..d7fc52a --- /dev/null +++ b/crates/omnigraph/tests/warm_read_cost.rs @@ -0,0 +1,833 @@ +//! Cost-budget tests for the warm read path (Fix 1): a warm same-branch read +//! must perform no manifest or commit-graph opens, measured with Lance's +//! `IOTracker` at the object-store boundary (the LanceDB IO-counted-test +//! pattern; see docs/dev/testing.md). Guards invariant 15 (read cost bounded by +//! work, not history) for snapshot resolution, and invariant 6 (a warm reader +//! still observes external commits). + +mod helpers; + +use std::sync::Arc; +use std::sync::atomic::{AtomicU64, Ordering}; + +use arrow_array::{Array, StringArray}; +use lance::io::WrappingObjectStore; +use lance_io::utils::tracking_store::IOTracker; +use omnigraph::db::{Omnigraph, ReadTarget}; +use omnigraph::instrumentation::{QueryIoProbes, with_query_io_probes}; +use omnigraph_compiler::result::QueryResult; + +use helpers::{ + MUTATION_QUERIES, TEST_QUERIES, commit_many, count_rows, init_and_load, mixed_params, + mutate_branch, mutate_main, params, +}; + +/// IO probes plus the tracker handles to read `read_iops` after the query. +/// Returns `(probes, manifest, commit_graph, table, probe_count)` — `table` +/// counts per-table data opens (the cache-miss path), so a cost test can assert +/// N opens on a cold read and 0 on a warm repeat (Fix 3). 
+fn probes() -> ( + QueryIoProbes, + IOTracker, + IOTracker, + IOTracker, + Arc<AtomicU64>, +) { + let manifest = IOTracker::default(); + let commit_graph = IOTracker::default(); + let table = IOTracker::default(); + let probe_count = Arc::new(AtomicU64::new(0)); + let probes = QueryIoProbes { + manifest_wrapper: Some(Arc::new(manifest.clone()) as Arc<dyn WrappingObjectStore>), + commit_graph_wrapper: Some(Arc::new(commit_graph.clone()) as Arc<dyn WrappingObjectStore>), + table_wrapper: Some(Arc::new(table.clone()) as Arc<dyn WrappingObjectStore>), + probe_count: Arc::clone(&probe_count), + }; + (probes, manifest, commit_graph, table, probe_count) +} + +fn first_column_strings(result: &QueryResult) -> Vec<String> { + if result.num_rows() == 0 { + return Vec::new(); + } + let batch = result.concat_batches().unwrap(); + let values = batch + .column(0) + .as_any() + .downcast_ref::<StringArray>() + .unwrap(); + let mut out = (0..values.len()) + .filter(|&row| !values.is_null(row)) + .map(|row| values.value(row).to_string()) + .collect::<Vec<_>>(); + out.sort(); + out +} + +/// A warm same-branch read must not re-open or scan `__manifest`, and must not +/// open the commit graph, even at commit-history depth. The only manifest IO is +/// the version probe (counted by invocation). Fails before Fix 1, where the read +/// path re-opens a fresh coordinator and scans both internal tables. +#[tokio::test] +async fn warm_same_branch_read_does_no_resolution_opens() { + let dir = tempfile::tempdir().unwrap(); + let mut db = init_and_load(&dir).await; + // Deep history: warm-read resolution cost must be flat in commit count. + commit_many(&mut db, 20).await; + + let (probes_in, manifest, commit_graph, _table, probe_count) = probes(); + with_query_io_probes( + probes_in, + db.query( + ReadTarget::branch("main"), + TEST_QUERIES, + "total_people", + &params(&[]), + ), + ) + .await + .unwrap(); + + // A warm same-branch read opens nothing from the internal tables, even at + // commit-history depth. Fix 1 reuses the coordinator (no re-open: 0 + // commit-graph opens, exactly 1 cheap version probe).
Fix 2 opens the touched + // data table by location+version instead of via the namespace, so the + // per-table __manifest scan is gone too. Pre-fix, each of these is a deep scan + // of an internal table that grows with commit count. + assert_eq!( + manifest.stats().read_iops, + 0, + "warm same-branch read must not scan __manifest (resolution or per-table)" + ); + assert_eq!( + commit_graph.stats().read_iops, + 0, + "warm same-branch read must not open the commit graph (no coordinator re-open)" + ); + assert_eq!( + probe_count.load(Ordering::Relaxed), + 1, + "warm same-branch read performs exactly one version probe" + ); +} + +/// A multi-table query (a traversal touching Person, WorksAt, and Company) scans +/// `__manifest` zero times. Fix 2 opens every touched table by location+version, +/// so manifest IO no longer scales with the number of tables — pre-Fix-2 each +/// table cost two full `__manifest` scans (`describe_table` + +/// `describe_table_version`), which is the "2 tables = 2×" multi-table tax. +#[tokio::test] +async fn multi_table_query_does_no_manifest_scans() { + let dir = tempfile::tempdir().unwrap(); + let db = init_and_load(&dir).await; + + let (probes_in, manifest, _commit_graph, _table, _probe) = probes(); + with_query_io_probes( + probes_in, + db.query( + ReadTarget::branch("main"), + TEST_QUERIES, + "age_stats", + &params(&[]), + ), + ) + .await + .unwrap(); + + assert_eq!( + manifest.stats().read_iops, + 0, + "a multi-table read must not scan __manifest once per touched table" + ); +} + +/// A warm reader must observe a commit made through another handle (invariant 6, +/// strong consistency): the version probe detects the advance and refreshes. +/// Passes before and after Fix 1 (today's cold re-read is always fresh); a +/// regression guard so the warm-reuse fast path never serves a stale read.
+#[tokio::test] +async fn external_commit_observed_by_warm_reader() { + let dir = tempfile::tempdir().unwrap(); + let mut writer = init_and_load(&dir).await; + let uri = dir.path().to_str().unwrap(); + let reader = Omnigraph::open(uri).await.unwrap(); + + let before = count_rows(&reader, "node:Person").await; + + // External commit through a separate handle. + mutate_main( + &mut writer, + MUTATION_QUERIES, + "insert_person", + &mixed_params(&[("$name", "ext_new_person")], &[("$age", 41)]), + ) + .await + .unwrap(); + + let after = count_rows(&reader, "node:Person").await; + assert_eq!( + after, + before + 1, + "warm reader must observe an external commit" + ); +} + +// ── Finding A: drop the redundant per-query schema validation ───────────────── +// +// Every query runs `ensure_schema_state_valid`. It ran TWICE per query (once in +// query()/run_query_at, once again in resolved_target/snapshot_at_version), each +// reading 3 contract files + 2 existence probes (~10 storage ops). Finding A +// removes the redundant caller, so validation runs once. (A cheaper source-only +// probe was rejected: the codebase requires per-call detection of IR/state drift +// on long-lived handles -- lifecycle::long_lived_handle_rejects_schema_ir_drift +// -- which a source-only compare would miss.) Measured at the StorageAdapter +// boundary with the counting decorator. + +/// A warm query validates the schema contract exactly once (3 reads + 2 exists), +/// not twice. Fails before finding A, where query() and resolved_target each +/// validate (6 read_text + 4 exists). +#[tokio::test] +async fn warm_query_validates_schema_contract_once() { + use omnigraph::instrumentation::CountingStorageAdapter; + use omnigraph::storage::storage_for_uri; + + let dir = tempfile::tempdir().unwrap(); + // Init through the standard path, then re-open behind a counting adapter to + // measure the per-query schema-contract storage reads (delta around the + // query excludes open-time reads). 
+ let _ = init_and_load(&dir).await; + let uri = dir.path().to_str().unwrap(); + let (adapter, counts) = CountingStorageAdapter::new(storage_for_uri(uri).unwrap()); + let db = Omnigraph::open_with_storage(uri, adapter).await.unwrap(); + + let before_read_text = counts.read_text(); + let before_exists = counts.exists(); + db.query( + ReadTarget::branch("main"), + TEST_QUERIES, + "total_people", + &params(&[]), + ) + .await + .unwrap(); + + assert_eq!( + counts.read_text() - before_read_text, + 3, + "warm query should validate the schema contract once (3 reads), not twice" + ); + assert_eq!( + counts.exists() - before_exists, + 2, + "warm query should probe contract-file existence once (2 probes), not twice" + ); +} + +/// The cheap source-compare must still detect that the on-disk schema source has +/// drifted from the validated contract and fail the read, rather than serving the +/// stale-but-cached schema. Passes before and after finding A (regression guard +/// for the documented weaker per-query guard). +#[tokio::test] +async fn schema_source_drift_is_caught_on_read() { + let dir = tempfile::tempdir().unwrap(); + let _writer = init_and_load(&dir).await; + let uri = dir.path().to_str().unwrap(); + let reader = Omnigraph::open(uri).await.unwrap(); + + // Drift the on-disk schema source behind the reader's back. + std::fs::write( + dir.path().join("_schema.pg"), + "this is not a valid schema {{{", + ) + .unwrap(); + + let result = reader + .query( + ReadTarget::branch("main"), + TEST_QUERIES, + "total_people", + &params(&[]), + ) + .await; + assert!( + result.is_err(), + "a query must fail when the on-disk schema source has drifted from the validated contract" + ); +} + +// ── Morphological-matrix coverage: branch-warm + stale-refresh cells ────────── + +/// A WARM read on a non-main branch (handle synced to that branch) also scans +/// `__manifest` zero times.
Exercises Fix 2's branch-owned-table open +/// (`{table_path}/tree/{branch}` + with_version) on Fix 1's warm path — the cell +/// that regressed when the open used `with_branch` against the base. +#[tokio::test] +async fn warm_branch_read_does_no_manifest_scans() { + let dir = tempfile::tempdir().unwrap(); + let db = init_and_load(&dir).await; + db.branch_create("feature").await.unwrap(); + // Write to the branch so its tables are branch-owned (under tree/feature). + db.mutate( + "feature", + MUTATION_QUERIES, + "insert_person", + &mixed_params(&[("$name", "Eve")], &[("$age", 22)]), + ) + .await + .unwrap(); + // Bind the handle's coordinator to the branch so reads of it take the warm path. + db.sync_branch("feature").await.unwrap(); + + let (probes_in, manifest, commit_graph, _table, probe_count) = probes(); + with_query_io_probes( + probes_in, + db.query( + ReadTarget::branch("feature"), + TEST_QUERIES, + "total_people", + &params(&[]), + ), + ) + .await + .unwrap(); + + assert_eq!( + manifest.stats().read_iops, + 0, + "warm branch read must not scan __manifest (branch-owned table opened by location)" + ); + assert_eq!( + commit_graph.stats().read_iops, + 0, + "warm branch read must not open the commit graph" + ); + assert_eq!( + probe_count.load(Ordering::Relaxed), + 1, + "warm branch read performs exactly one version probe" + ); +} + +/// A non-main branch can be deleted and recreated at the same Lance version +/// number. Warm branch freshness therefore needs the manifest incarnation, not +/// just `version()`, or a reader pinned to the old incarnation can serve stale +/// rows from the deleted branch. This is the correctness guard for Phase 6A.
+#[tokio::test] +async fn warm_read_on_recreated_branch_observes_new_incarnation() { + let dir = tempfile::tempdir().unwrap(); + let mut writer = init_and_load(&dir).await; + let uri = dir.path().to_str().unwrap(); + let reader = Omnigraph::open(uri).await.unwrap(); + + writer.branch_create("feature").await.unwrap(); + mutate_branch( + &mut writer, + "feature", + MUTATION_QUERIES, + "insert_person", + &mixed_params(&[("$name", "Eve")], &[("$age", 22)]), + ) + .await + .unwrap(); + + reader.sync_branch("feature").await.unwrap(); + let old_feature = reader + .query( + ReadTarget::branch("feature"), + TEST_QUERIES, + "get_person", + &params(&[("$name", "Eve")]), + ) + .await + .unwrap(); + assert_eq!( + old_feature.num_rows(), + 1, + "test setup: old feature branch must contain Eve" + ); + let old_version = reader + .version_of(ReadTarget::branch("feature")) + .await + .unwrap(); + + writer.branch_delete("feature").await.unwrap(); + mutate_main( + &mut writer, + MUTATION_QUERIES, + "insert_person", + &mixed_params(&[("$name", "MainOnly")], &[("$age", 44)]), + ) + .await + .unwrap(); + writer.branch_create("feature").await.unwrap(); + let new_version = writer + .version_of(ReadTarget::branch("feature")) + .await + .unwrap(); + assert_eq!( + new_version, old_version, + "test setup must exercise branch incarnation reuse at one Lance version" + ); + + let (probes_in, manifest, commit_graph, _table, probe_count) = probes(); + let new_feature = with_query_io_probes( + probes_in, + reader.query( + ReadTarget::branch("feature"), + TEST_QUERIES, + "get_person", + &params(&[("$name", "MainOnly")]), + ), + ) + .await + .unwrap(); + + assert_eq!( + new_feature.num_rows(), + 1, + "warm reader must refresh to the recreated branch incarnation" + ); + assert!( + manifest.stats().read_iops > 0, + "recreated branch must re-read the manifest after the incarnation probe" + ); + assert_eq!( + commit_graph.stats().read_iops, + 0, + "same-branch incarnation refresh must be manifest-only" + ); +
 assert_eq!( + probe_count.load(Ordering::Relaxed), + 2, + "stale same-branch read probes once under the read lock and once under the write lock" + ); +} + +/// Recreated non-main branches can reuse the same branch-owned table version. +/// This forces the held table-handle cache to distinguish incarnations by the +/// per-table Lance manifest e_tag, not just `(table_path, branch, version)`. +#[tokio::test] +async fn recreated_branch_owned_table_handle_uses_table_etag() { + let dir = tempfile::tempdir().unwrap(); + let mut writer = init_and_load(&dir).await; + let uri = dir.path().to_str().unwrap(); + let reader = Omnigraph::open(uri).await.unwrap(); + + writer.branch_create("feature").await.unwrap(); + mutate_branch( + &mut writer, + "feature", + MUTATION_QUERIES, + "insert_person", + &mixed_params(&[("$name", "OldOnly")], &[("$age", 31)]), + ) + .await + .unwrap(); + + reader.sync_branch("feature").await.unwrap(); + let old_person = reader + .query( + ReadTarget::branch("feature"), + TEST_QUERIES, + "get_person", + &params(&[("$name", "OldOnly")]), + ) + .await + .unwrap(); + assert_eq!(old_person.num_rows(), 1); + let old_entry = reader + .snapshot_of(ReadTarget::branch("feature")) + .await + .unwrap() + .entry("node:Person") + .unwrap() + .clone(); + assert_eq!(old_entry.table_branch.as_deref(), Some("feature")); + + writer.branch_delete("feature").await.unwrap(); + writer.branch_create("feature").await.unwrap(); + mutate_branch( + &mut writer, + "feature", + MUTATION_QUERIES, + "insert_person", + &mixed_params(&[("$name", "NewOnly")], &[("$age", 32)]), + ) + .await + .unwrap(); + let new_entry = writer + .snapshot_of(ReadTarget::branch("feature")) + .await + .unwrap() + .entry("node:Person") + .unwrap() + .clone(); + assert_eq!(new_entry.table_path, old_entry.table_path); + assert_eq!(new_entry.table_branch, old_entry.table_branch); + assert_eq!( + new_entry.table_version, old_entry.table_version, + "test setup must force table handle identity to differ only by
e_tag" + ); + + let (probes_in, manifest, commit_graph, table, probe_count) = probes(); + let new_person = with_query_io_probes( + probes_in, + reader.query( + ReadTarget::branch("feature"), + TEST_QUERIES, + "get_person", + &params(&[("$name", "NewOnly")]), + ), + ) + .await + .unwrap(); + assert_eq!( + new_person.num_rows(), + 1, + "warm reader must open the recreated branch-owned table incarnation" + ); + assert!( + table.stats().read_iops > 0, + "table e_tag must force a held-handle cache miss for the recreated table" + ); + assert!( + manifest.stats().read_iops > 0, + "recreated branch must refresh the manifest" + ); + assert_eq!( + commit_graph.stats().read_iops, + 0, + "same-branch table-incarnation refresh must be manifest-only" + ); + assert_eq!( + probe_count.load(Ordering::Relaxed), + 2, + "stale same-branch read probes once under each lock" + ); + + let stale_old_person = reader + .query( + ReadTarget::branch("feature"), + TEST_QUERIES, + "get_person", + &params(&[("$name", "OldOnly")]), + ) + .await + .unwrap(); + assert_eq!( + stale_old_person.num_rows(), + 0, + "old branch-owned table contents must not leak after branch recreation" + ); +} + +/// The graph-index cache is keyed by synthetic snapshot id plus edge-table +/// state. A recreated branch can reuse the same edge table `(branch, version)`, +/// so the synthetic snapshot id must carry the manifest incarnation or traversal +/// can reuse stale topology.
+#[tokio::test] +async fn recreated_branch_traversal_uses_graph_index_incarnation() { + let dir = tempfile::tempdir().unwrap(); + let mut writer = init_and_load(&dir).await; + let uri = dir.path().to_str().unwrap(); + let reader = Omnigraph::open(uri).await.unwrap(); + + writer.branch_create("feature").await.unwrap(); + mutate_branch( + &mut writer, + "feature", + MUTATION_QUERIES, + "insert_person_and_friend", + &mixed_params( + &[("$name", "OldWalker"), ("$friend", "Alice")], + &[("$age", 41)], + ), + ) + .await + .unwrap(); + + reader.sync_branch("feature").await.unwrap(); + let old_friends = reader + .query( + ReadTarget::branch("feature"), + TEST_QUERIES, + "friends_of", + &params(&[("$name", "OldWalker")]), + ) + .await + .unwrap(); + assert_eq!(first_column_strings(&old_friends), vec!["Alice"]); + let old_edge_entry = reader + .snapshot_of(ReadTarget::branch("feature")) + .await + .unwrap() + .entry("edge:Knows") + .unwrap() + .clone(); + assert_eq!(old_edge_entry.table_branch.as_deref(), Some("feature")); + + writer.branch_delete("feature").await.unwrap(); + writer.branch_create("feature").await.unwrap(); + mutate_branch( + &mut writer, + "feature", + MUTATION_QUERIES, + "insert_person_and_friend", + &mixed_params( + &[("$name", "NewWalker"), ("$friend", "Bob")], + &[("$age", 42)], + ), + ) + .await + .unwrap(); + let new_edge_entry = writer + .snapshot_of(ReadTarget::branch("feature")) + .await + .unwrap() + .entry("edge:Knows") + .unwrap() + .clone(); + assert_eq!(new_edge_entry.table_path, old_edge_entry.table_path); + assert_eq!(new_edge_entry.table_branch, old_edge_entry.table_branch); + assert_eq!( + new_edge_entry.table_version, old_edge_entry.table_version, + "test setup must force graph-index identity to differ only by snapshot incarnation" + ); + + let (probes_in, manifest, commit_graph, _table, probe_count) = probes(); + let new_friends = with_query_io_probes( + probes_in, + reader.query( + ReadTarget::branch("feature"), + TEST_QUERIES, +
 "friends_of", + &params(&[("$name", "NewWalker")]), + ), + ) + .await + .unwrap(); + assert_eq!( + first_column_strings(&new_friends), + vec!["Bob"], + "traversal must use the recreated branch's topology, not stale cached graph index" + ); + assert!( + manifest.stats().read_iops > 0, + "recreated branch traversal must refresh the manifest" + ); + assert_eq!( + commit_graph.stats().read_iops, + 0, + "same-branch traversal incarnation refresh must be manifest-only" + ); + assert_eq!( + probe_count.load(Ordering::Relaxed), + 2, + "stale same-branch read probes once under each lock" + ); + + let stale_old_friends = reader + .query( + ReadTarget::branch("feature"), + TEST_QUERIES, + "friends_of", + &params(&[("$name", "OldWalker")]), + ) + .await + .unwrap(); + assert_eq!( + first_column_strings(&stale_old_friends), + Vec::<String>::new(), + "old branch topology must not leak after branch recreation" + ); +} + +/// When an external writer advances the manifest, the reader's next query takes +/// the STALE path: it re-reads the manifest (read_iops > 0) but never scans the +/// commit graph (`refresh_manifest_only`), unlike a full coordinator refresh. +/// Pins Fix 1's manifest-only refresh. +#[tokio::test] +async fn stale_read_refreshes_manifest_only() { + let dir = tempfile::tempdir().unwrap(); + let mut writer = init_and_load(&dir).await; + let uri = dir.path().to_str().unwrap(); + let reader = Omnigraph::open(uri).await.unwrap(); + // Establish the reader's warm coordinator. + reader + .query( + ReadTarget::branch("main"), + TEST_QUERIES, + "total_people", + &params(&[]), + ) + .await + .unwrap(); + + // External commit advances the on-disk manifest behind the reader.
+ mutate_main( + &mut writer, + MUTATION_QUERIES, + "insert_person", + &mixed_params(&[("$name", "Frank")], &[("$age", 33)]), + ) + .await + .unwrap(); + + let (probes_in, manifest, commit_graph, _table, probe_count) = probes(); + with_query_io_probes( + probes_in, + reader.query( + ReadTarget::branch("main"), + TEST_QUERIES, + "total_people", + &params(&[]), + ), + ) + .await + .unwrap(); + + assert!( + manifest.stats().read_iops > 0, + "stale read must re-read the manifest" + ); + assert_eq!( + commit_graph.stats().read_iops, + 0, + "stale refresh must be manifest-only (no commit-graph scan)" + ); + assert_eq!( + probe_count.load(Ordering::Relaxed), + 2, + "stale same-branch read probes once under the read lock and once under the write lock" + ); +} + +// ── Fix 3: held-handle cache — warm repeat reads stop re-opening tables ──────── +// +// After Fix 1+2 a warm same-branch read still re-opened every touched table per +// query (the "never warms up" residual). Fix 3 holds the open `Dataset` per +// `(table, branch, version, e_tag)` (the version-keyed analogue of LanceDB's +// `DatasetConsistencyWrapper`) and shares one `Session` per graph, so a second +// identical warm read reuses the handle with zero table opens. + +/// Headline: a second identical warm same-branch read does ZERO table opens +/// (the cold first read opens; the warm repeat serves from the held-handle +/// cache). Fails before Fix 3, where every read re-opens the table. +#[tokio::test] +async fn repeat_warm_read_reuses_table_handles() { + let dir = tempfile::tempdir().unwrap(); + let mut db = init_and_load(&dir).await; + // Deep history: the win must hold regardless of commit count. + commit_many(&mut db, 10).await; + + // Cold first read: opens the touched table.
+ let (p1, _m1, _c1, table1, _pr1) = probes(); + with_query_io_probes( + p1, + db.query( + ReadTarget::branch("main"), + TEST_QUERIES, + "total_people", + &params(&[]), + ), + ) + .await + .unwrap(); + assert!( + table1.stats().read_iops > 0, + "the cold first read must open the table" + ); + + // Warm repeat: the held handle is reused, so no open happens through this + // query's table wrapper. + let (p2, manifest2, commit_graph2, table2, probe2) = probes(); + with_query_io_probes( + p2, + db.query( + ReadTarget::branch("main"), + TEST_QUERIES, + "total_people", + &params(&[]), + ), + ) + .await + .unwrap(); + assert_eq!( + table2.stats().read_iops, + 0, + "a warm repeat read must reuse the held handle (0 table opens)" + ); + assert_eq!( + manifest2.stats().read_iops, + 0, + "warm repeat read: 0 manifest opens" + ); + assert_eq!( + commit_graph2.stats().read_iops, + 0, + "warm repeat read: 0 commit-graph opens" + ); + assert_eq!( + probe2.load(Ordering::Relaxed), + 1, + "warm repeat read: exactly one version probe" + ); +} + +/// A write advances the table's version, so the next read misses the +/// version-keyed cache and re-opens — never serving a stale handle (invariant 6 +/// for the cached path). Passes with or without the cache; a correctness guard +/// that the cache cannot serve pre-write data. +#[tokio::test] +async fn write_invalidates_table_cache_for_changed_table() { + let dir = tempfile::tempdir().unwrap(); + let mut db = init_and_load(&dir).await; + + let before = count_rows(&db, "node:Person").await; + + // Warm the cache for Person. + db.query( + ReadTarget::branch("main"), + TEST_QUERIES, + "total_people", + &params(&[]), + ) + .await + .unwrap(); + + // Write Person: its version advances, so the cached (table, branch, version) + // key is now superseded.
+ mutate_main( + &mut db, + MUTATION_QUERIES, + "insert_person", + &mixed_params(&[("$name", "cache_miss_one")], &[("$age", 50)]), + ) + .await + .unwrap(); + + // The next read re-opens Person at the new version (cache miss). + let (p, _m, _c, table, _pr) = probes(); + with_query_io_probes( + p, + db.query( + ReadTarget::branch("main"), + TEST_QUERIES, + "total_people", + &params(&[]), + ), + ) + .await + .unwrap(); + assert!( + table.stats().read_iops > 0, + "a read after a write to the table must re-open it (version-keyed miss)" + ); + + let after = count_rows(&db, "node:Person").await; + assert_eq!( + after, + before + 1, + "the post-write read observes the new row (no stale handle served)" + ); +} diff --git a/docs/dev/invariants.md b/docs/dev/invariants.md index 2fa87d1..eb6821a 100644 --- a/docs/dev/invariants.md +++ b/docs/dev/invariants.md @@ -53,7 +53,13 @@ converge the physical state. versioning, fragments, branches, compaction, cleanup, and index primitives. DataFusion should own relational execution where it fits. Do not add custom WALs, transaction managers, buffer pools, page formats, or local clones of - substrate behavior. Read [lance.md](lance.md) before guessing. + substrate behavior. Read [lance.md](lance.md) before guessing. Respecting the + substrate also means *using* it idiomatically, not only refraining from + rebuilding it: reuse long-lived handles instead of re-opening per call, + resolve latest state through the substrate's cheap primitive instead of + re-scanning, and share its caches/session. Re-deriving per call what the + substrate keeps warm is a substrate violation even when no code is + reimplemented. 2. **Graph visibility is manifest-atomic.** Lance commits are per dataset. OmniGraph's graph-level atomicity comes from publishing one manifest update @@ -126,6 +132,18 @@ converge the physical state. a substitute for missing lower-level assertions. Read [testing.md](testing.md) before adding tests. +15.
**One source of truth, cheaply derived.** Lance and the manifest are the + source of truth. Everything the engine needs at runtime is a derived view of + them: read or projected on demand, held warm, refreshed by a cheap probe. Two + failure modes are forbidden. A *parallel copy* the engine maintains can drift + from the source, and that divergence compounds over time. *Cold + re-derivation* rebuilds the view from the full source on every call, so its + cost grows with history. Invariants 1 and 7, and the deny-list "state that + drifts" and "manifest-derivable reconciler" items, are instances; so is + bounding a read's cost to its working set rather than the commit count. This + is the structural face of "engineering is programming integrated over time": + both failure modes are liabilities that compound as the system grows. + ## Current Truth Matrix | Area | Current state | Source | @@ -252,6 +270,37 @@ them explicit. - **Resource bounds:** some operations still lack enforced per-query memory or time budgets. New long-running work should add explicit bounds rather than widening the gap. +- **Read-path re-derivation (largely closed by the query-latency work):** + snapshot resolution used to re-open a fresh coordinator per read (a full + `__manifest` re-scan plus two commit-graph scans), open each table through the + namespace (two more `__manifest` scans per table), validate the schema twice, + and share no Lance `Session`. That was an O(commits) cost that never warmed up. + Fix 1 (warm coordinator reuse behind a `latest_version_id` probe), Fix 2 (open + tables by location+version), finding A (validate once), and Fix 3 (a held + `Dataset`-handle cache keyed by `(table, branch, version, e_tag when Lance + exposes it)` plus one shared `Session` per graph) remove that tax: a warm + same-branch read does one probe, one schema read, and zero opens on a repeat. 
+ Non-main branch freshness compares the manifest incarnation (`version` plus + manifest-location e_tag when available, otherwise Lance manifest timestamp), + because Lance branch names can be deleted/recreated at the same version number; + the manifest e_tag is carried into synthetic snapshot ids when available, and + a detected same-branch manifest refresh clears read caches as the fallback for + e_tag-less table locations/topology. Remaining: the internal metadata tables + (`__manifest`, `_graph_commits`) are still not compacted, so the probe and + refresh cost still grows with fragment count on a long-lived graph (the + `optimize`-covers-internal-tables follow-up); the commit graph is not yet + reconcilable from the manifest; and the traversal id-map is still rebuilt. +- **Commit-graph parent under concurrency:** `record_graph_commit` now refreshes + the commit-graph head from storage before appending, so a same-branch write + after an external commit no longer forks the commit DAG by parenting off a + stale cached head (the single-process fork, pre-existing for non-strict + inserts and widened to strict ops by Fix 1's `refresh_manifest_only`, is now + closed). Residual: two processes writing disjoint tables can still pass their + per-table manifest CAS and append off the same parent (a refresh-then-append + TOCTOU). The convergent fix is reconcile-from-manifest (parent = the commit at + the manifest version the publisher CAS'd against; `manifest_version` is on + every commit row), composing with the manifest-to-commit-graph atomicity gap; + it needs commit-graph append ordering or a Lance append-CAS to fully close. ## Deny-list @@ -277,6 +326,10 @@ case is exceptional. - Cost-blind plan choice when statistics are available or required. - Hidden statistics for behavior that affects planning or operator choice. - Hash-map iteration order in result ordering, plan choice, or migration output. 
+- Cold re-derivation on the hot path: rebuilding from the full source what could + be held warm and refreshed cheaply, so cost scales with history rather than the + working set (the cost face of invariant 15; "state that drifts" above is its + shadow-copy face). - String-flattened SQL/filter generation when a structured pushdown API is available. - Eager multi-hop cross-product materialization when factorization fits. @@ -313,6 +366,8 @@ Use this as yes/no/NA for any non-trivial design or PR: - Are stats/capabilities exposed when behavior depends on them? - Are existing known gaps left no worse and documented if touched? - Does the test live at the same boundary as the change? +- Is this operation's cost bounded with respect to history and scale, or does it + re-derive warm state from cold storage per call? - Does the change avoid every deny-list pattern, or justify the exception? ## Maintenance Policy diff --git a/docs/dev/testing.md b/docs/dev/testing.md index 8d6a305..6a62580 100644 --- a/docs/dev/testing.md +++ b/docs/dev/testing.md @@ -23,8 +23,9 @@ The engine's `tests/` is the principal coverage surface; most graph-shaped behav | `merge_truth_table.rs` | Merge-pair truth table (MR-786): all 9×9 `(left_op, right_op)` cells from `{noop, addNode, removeNode, addEdge, removeEdge, setProperty, dropProperty, addLabel, removeLabel}`. Adding a new op to `OpVariant` forces a compile error in `build_case` until the new row + column are dispositioned. 36 executable cells run through real `branch_merge` with a structured oracle (`MergeOutcome` / `MergeConflictKind` + graph-state assert); 45 cells involving `dropProperty`/`addLabel`/`removeLabel` are recorded as `Unsupported` until the mutation grammar grows. 
| | `writes.rs` | Direct-publish writes: cancellation, non-strict insert/merge rebase under the per-table queue, strict stale-write conflicts, multi-statement atomicity, MR-794 staged-write rewire (D₂ rejection, insert+update coalesce, multi-append coalesce, partial-failure recovery, load RI/cardinality recovery) | | `staged_writes.rs` | TableStore staged-write primitives (`stage_append`, `stage_merge_insert`, `commit_staged`, `scan_with_staged`, `count_rows_with_staged`) — primitive-level only; engine code uses the in-memory `MutationStaging` accumulator instead | -| `forbidden_apis.rs` | Defense-in-depth source-walk guard: engine code (`exec/`, `db/omnigraph/`, `loader/`, `changes/`) must not reach around the sealed storage trait to Lance inline-commit APIs; `// forbidden-api-allow: ` sentinel exempts reviewed lines | +| `forbidden_apis.rs` | Defense-in-depth source-walk guard: engine code (`exec/`, `db/omnigraph/`, `loader/`, `changes/`) must not reach around the sealed storage trait to Lance inline-commit APIs, nor open datasets directly (`Dataset::open` / `DatasetBuilder::from_uri`/`from_namespace`) — reads route through `Snapshot::open` and the held-handle cache; `// forbidden-api-allow: ` sentinel exempts reviewed lines | | `lance_surface_guards.rs` | Pins the Lance API surfaces omnigraph depends on (named runtime + compile-only guards; see [lance.md](lance.md)) — the first smoke check on any Lance version bump; e.g. 
`compact_files_still_fails_on_blob_columns` turns red when the upstream blob-compaction fix lands | +| `warm_read_cost.rs` | Cost-budget tests for the warm read path (query-latency work), measured at the object-store boundary with Lance `IOTracker` (the LanceDB IO-counted pattern): a warm same-branch read does 0 manifest opens, 0 commit-graph opens, 1 version probe, validates the schema once (Fix 1 / finding A / Fix 2 at commit-history depth); stale same-branch reads perform exactly 2 probes and refresh manifest-only; recreated non-main branches with the same Lance version refresh by incarnation; recreated branch-owned table handles are distinguished by table e_tag or refresh-time cache clearing; recreated traversal topology is protected by synthetic snapshot-id incarnation or refresh-time cache clearing; a warm *repeat* read does 0 table opens via the held-handle cache and a write re-opens only the changed table at its new version/e_tag (Fix 3/6A). See "Cost-budget tests" below | | `lifecycle.rs` | Graph lifecycle, schema state | | `point_in_time.rs` | Snapshots, time travel (`snapshot_at_version`, `entity_at`) | | `changes.rs` | `diff_between` / `diff_commits` | @@ -125,5 +126,14 @@ When you pick up any change, walk through this: 6. **For substrate-touching changes** (Lance behavior), reach for `failpoints` or fixture-driven scenarios, not stubbed-out mocks. 7. **For server / API changes**, confirm the OpenAPI regeneration happens in `openapi.rs` and that the diff lands in `openapi.json`. 8. **Verify your change makes an existing test fail before it makes the new one pass.** If you can break the code without breaking a test, your coverage gap is the problem to fix first. +9. **Bound hot-path cost at history depth.** If the change touches a read or open path, add or extend a test that asserts a *bounded* cost (e.g. 
a warm same-branch read performs zero `Dataset::open`, or a fixed object-op count) against a fixture with realistic *commit-history depth*, not just realistic row counts. Cost that scales with history is invisible on a shallow fixture and only bites in production. See "Cost-budget tests" below. + +## Cost-budget tests: bound hot-path cost at history depth + +Correctness bugs fail loudly in tests; cost-scaling bugs pass every test and degrade silently in production. The engine read path historically had no cost assertion, and fixtures carry shallow commit history, so an O(commits)-per-query cost stayed green in CI and only surfaced on a long-lived graph (read snapshot resolution re-scanned the internal manifest and commit-graph tables on every query, and those tables were never compacted). Guard against the class: + +- **Assert a cost budget, not just a result.** For a read/open path, assert the number of `Dataset::open` calls (or object-store ops) a warm query performs, and that it does not grow with commit count. The reference is LanceDB's IO-counted tests, which assert a cached read costs 0-1 IO and carry a named regression test against "a list call on every subsequent query." +- **Test at history depth.** Build a fixture with many *commits* (not many rows) and assert warm-read cost is flat across depths. A shallow fixture cannot catch an O(commits) cost. +- This is the testing companion to invariant 15 in [docs/dev/invariants.md](invariants.md) (hot-path cost is bounded by work, not history). When in doubt, re-read [docs/dev/invariants.md](invariants.md) — quality gates apply to every change. 
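Reviewer sketch (not part of the patch): the "Cost-budget tests" guidance above can be illustrated without Lance or OmniGraph at all. The block below is a minimal, standard-library-only model under stated assumptions: `CountingStore`, `Handle`, `HandleCache`, `opens_cold_then_warm`, and `read_after_write` are hypothetical names invented for this illustration, the "open counter" merely stands in for what the real tests measure with Lance `IOTracker`, and the cache key is simplified to `(table, version)` rather than the `(table, branch, version, e_tag)` key the patch describes.

```rust
use std::cell::Cell;
use std::collections::HashMap;

/// Hypothetical stand-in for an object store: it only tracks a table
/// version and counts how many expensive "opens" were performed.
struct CountingStore {
    version: u64,
    opens: Cell<usize>,
}

impl CountingStore {
    fn new() -> Self {
        CountingStore { version: 0, opens: Cell::new(0) }
    }

    /// Simulates one commit: the table version advances by one.
    fn commit(&mut self) {
        self.version += 1;
    }

    /// Cheap probe for the latest version; deliberately not counted as an open.
    fn latest_version(&self) -> u64 {
        self.version
    }

    /// Expensive open of the table at a version; every call is counted.
    fn open(&self, version: u64) -> Handle {
        self.opens.set(self.opens.get() + 1);
        Handle { version }
    }
}

/// Hypothetical open-table handle; it remembers the version it was opened at.
#[derive(Clone, Copy)]
struct Handle {
    version: u64,
}

/// Version-keyed held-handle cache (simplified key: table name + version).
struct HandleCache {
    held: HashMap<(String, u64), Handle>,
}

impl HandleCache {
    fn new() -> Self {
        HandleCache { held: HashMap::new() }
    }

    /// One probe per read; the store is opened only when the key is missing.
    fn read(&mut self, store: &CountingStore, table: &str) -> Handle {
        let version = store.latest_version();
        *self
            .held
            .entry((table.to_string(), version))
            .or_insert_with(|| store.open(version))
    }
}

/// Builds `depth` commits of history, then returns the number of opens done
/// by the cold first read and by the warm repeat read.
fn opens_cold_then_warm(depth: u64) -> (usize, usize) {
    let mut store = CountingStore::new();
    for _ in 0..depth {
        store.commit();
    }
    let mut cache = HandleCache::new();

    let before = store.opens.get();
    cache.read(&store, "node:Person");
    let cold = store.opens.get() - before;

    let before = store.opens.get();
    cache.read(&store, "node:Person");
    let warm = store.opens.get() - before;

    (cold, warm)
}

/// Warms the cache, commits once, and returns the opens done by the next
/// read together with the version of the handle that read received.
fn read_after_write() -> (usize, u64) {
    let mut store = CountingStore::new();
    let mut cache = HandleCache::new();
    cache.read(&store, "node:Person");

    store.commit();

    let before = store.opens.get();
    let handle = cache.read(&store, "node:Person");
    (store.opens.get() - before, handle.version)
}
```

In a test one would assert the budget at more than one history depth, for example that `opens_cold_then_warm(1)` and `opens_cold_then_warm(1_000)` both report one cold open and zero warm opens, and that `read_after_write()` reports exactly one open at the new version. The point of the shape is that the assertion is on the counter, not on the query result, so a regression that re-opens per read turns the test red even when the returned rows stay correct.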
From 4590c91f9d7f53e86fdb461eab99803b5e833548 Mon Sep 17 00:00:00 2001 From: aaltshuler Date: Wed, 17 Jun 2026 23:44:24 +0300 Subject: [PATCH 07/13] rename compiler NanoError and fix cluster config warnings --- crates/omnigraph-cli/src/main.rs | 4 +- crates/omnigraph-cli/tests/cli_cluster.rs | 4 + crates/omnigraph-cluster/src/lib.rs | 4 +- crates/omnigraph-cluster/src/store.rs | 26 +++ crates/omnigraph-cluster/src/sweep.rs | 19 +- crates/omnigraph-cluster/src/tests.rs | 61 ++++++ crates/omnigraph-compiler/src/catalog/mod.rs | 12 +- .../src/catalog/schema_ir.rs | 12 +- crates/omnigraph-compiler/src/error.rs | 82 ++++++- crates/omnigraph-compiler/src/ir/lower.rs | 6 +- crates/omnigraph-compiler/src/query/parser.rs | 185 ++++++++-------- .../omnigraph-compiler/src/query/typecheck.rs | 206 +++++++++--------- crates/omnigraph-compiler/src/query_input.rs | 18 +- crates/omnigraph-compiler/src/result.rs | 6 +- .../omnigraph-compiler/src/schema/parser.rs | 183 ++++++++-------- crates/omnigraph/src/error.rs | 2 +- docs/user/operations/errors.md | 2 +- 17 files changed, 499 insertions(+), 333 deletions(-) diff --git a/crates/omnigraph-cli/src/main.rs b/crates/omnigraph-cli/src/main.rs index bb3b062..fa6f4db 100644 --- a/crates/omnigraph-cli/src/main.rs +++ b/crates/omnigraph-cli/src/main.rs @@ -1050,7 +1050,7 @@ async fn main() -> Result<()> { // The actor attributes graph-moving operations (sidecars, // audit entries, engine schema-apply commits). Cluster FACTS // stay unlayered; the operator's identity resolves --as flag - // first, then the per-operator omnigraph.yaml `cli.actor`. + // first, then per-operator config `operator.actor`. let actor = resolve_cluster_actor(cli.as_actor.as_deref())?; let output = apply_config_dir_with_options(config, ApplyOptions { actor }).await; finish_cluster_apply(&output, json)?; @@ -1062,7 +1062,7 @@ async fn main() -> Result<()> { } => { let Some(approver) = resolve_cluster_actor(cli.as_actor.as_deref())? 
else { bail!( - "`cluster approve` requires an approver: pass the global --as flag or set `cli.actor` in your omnigraph.yaml — an approval without an approver is meaningless" + "`cluster approve` requires an approver: pass the global --as flag or set `operator.actor` in ~/.omnigraph/config.yaml — an approval without an approver is meaningless" ); }; let output = approve_config_dir(config, &resource, &approver).await; diff --git a/crates/omnigraph-cli/tests/cli_cluster.rs b/crates/omnigraph-cli/tests/cli_cluster.rs index e35a54d..d2b6d13 100644 --- a/crates/omnigraph-cli/tests/cli_cluster.rs +++ b/crates/omnigraph-cli/tests/cli_cluster.rs @@ -796,6 +796,10 @@ fn cluster_approve_uses_operator_actor_fallback() { ); let stderr = String::from_utf8_lossy(&output.stderr); assert!(stderr.contains("--as"), "{stderr}"); + assert!(stderr.contains("operator.actor"), "{stderr}"); + assert!(stderr.contains("config.yaml"), "{stderr}"); + assert!(!stderr.contains("cli.actor"), "{stderr}"); + assert!(!stderr.contains("omnigraph.yaml"), "{stderr}"); } #[test] diff --git a/crates/omnigraph-cluster/src/lib.rs b/crates/omnigraph-cluster/src/lib.rs index 1c4e4fc..0dad78c 100644 --- a/crates/omnigraph-cluster/src/lib.rs +++ b/crates/omnigraph-cluster/src/lib.rs @@ -160,7 +160,7 @@ pub async fn plan_config_dir(config_dir: impl AsRef) -> PlanOutput { // Plan is read-only: pending sidecars are reported, never acted on // (RFC-004 open question 3 keeps read-only commands warn-only). 
- warn_pending_recovery_sidecars(&desired.config_dir, &mut diagnostics); + warn_pending_recovery_sidecars(&backend, &mut diagnostics).await; let mut prior_resources = BTreeMap::new(); let mut prior_state: Option = None; @@ -1260,7 +1260,7 @@ pub async fn status_config_dir(config_dir: impl AsRef) -> StatusOutput { backend .observe_lock(&mut observations, &mut diagnostics) .await; - warn_pending_recovery_sidecars(&parsed.config_dir, &mut diagnostics); + warn_pending_recovery_sidecars(&backend, &mut diagnostics).await; let mut resource_digests = BTreeMap::new(); let mut resource_statuses = BTreeMap::new(); diff --git a/crates/omnigraph-cluster/src/store.rs b/crates/omnigraph-cluster/src/store.rs index 5129397..9a2e748 100644 --- a/crates/omnigraph-cluster/src/store.rs +++ b/crates/omnigraph-cluster/src/store.rs @@ -321,6 +321,32 @@ impl ClusterStore { // ---- recovery sidecars ---- + pub(crate) async fn list_recovery_sidecar_locations( + &self, + diagnostics: &mut Vec, + ) -> Vec { + let dir_uri = self.uri(CLUSTER_RECOVERIES_DIR); + let mut uris = match self.adapter.list_dir(&dir_uri).await { + Ok(uris) => uris, + Err(err) => { + diagnostics.push(Diagnostic::warning( + "recovery_sidecar_read_error", + CLUSTER_RECOVERIES_DIR, + format!("could not list '{CLUSTER_RECOVERIES_DIR}': {err}"), + )); + return Vec::new(); + } + }; + uris.retain(|uri| uri.ends_with(".json")); + uris.sort(); + uris.into_iter() + .map(|uri| match uri.rsplit('/').next() { + Some(name) => format!("{}/{name}", self.display(CLUSTER_RECOVERIES_DIR)), + None => uri, + }) + .collect() + } + pub(crate) async fn list_recovery_sidecars( &self, diagnostics: &mut Vec, diff --git a/crates/omnigraph-cluster/src/sweep.rs b/crates/omnigraph-cluster/src/sweep.rs index 6539cae..27e6e9c 100644 --- a/crates/omnigraph-cluster/src/sweep.rs +++ b/crates/omnigraph-cluster/src/sweep.rs @@ -427,21 +427,14 @@ pub(crate) async fn mark_approvals_consumed(backend: &ClusterStore, approval_ids } /// Read-only commands report 
pending sidecars without acting on them. -pub(crate) fn warn_pending_recovery_sidecars(config_dir: &Path, diagnostics: &mut Vec) { - let recoveries_dir = config_dir.join(CLUSTER_RECOVERIES_DIR); - let Ok(entries) = fs::read_dir(&recoveries_dir) else { - return; - }; - let mut names: Vec = entries - .flatten() - .filter(|entry| entry.path().extension().is_some_and(|ext| ext == "json")) - .map(|entry| entry.file_name().to_string_lossy().into_owned()) - .collect(); - names.sort(); - for name in names { +pub(crate) async fn warn_pending_recovery_sidecars( + backend: &ClusterStore, + diagnostics: &mut Vec, +) { + for location in backend.list_recovery_sidecar_locations(diagnostics).await { diagnostics.push(Diagnostic::warning( "cluster_recovery_pending", - format!("{CLUSTER_RECOVERIES_DIR}/{name}"), + location, "a recovery sidecar from an interrupted apply is pending; the next state-mutating command will classify it", )); } diff --git a/crates/omnigraph-cluster/src/tests.rs b/crates/omnigraph-cluster/src/tests.rs index ac448cf..536e904 100644 --- a/crates/omnigraph-cluster/src/tests.rs +++ b/crates/omnigraph-cluster/src/tests.rs @@ -3375,6 +3375,67 @@ policies: ); } + #[tokio::test] + async fn read_only_commands_warn_on_pending_recovery_sidecar_in_storage_root() { + let dir = fixture(); + let storage = tempfile::tempdir().unwrap(); + let storage_path = storage.path().to_string_lossy().to_string(); + let mut config = fs::read_to_string(dir.path().join(CLUSTER_CONFIG_FILE)).unwrap(); + config = config.replace( + "version: 1\n", + &format!("version: 1\nstorage: {storage_path}\n"), + ); + fs::write(dir.path().join(CLUSTER_CONFIG_FILE), config).unwrap(); + + let desired = validate_config_dir(dir.path()); + assert!(desired.ok, "{:?}", desired.diagnostics); + let schema_digest = desired + .resource_digests + .get("schema.knowledge") + .unwrap() + .clone(); + let graph_composite = graph_digest( + "knowledge", + Some(&schema_digest), + Some(&BTreeMap::new()), + None, + None, + ); 
+ write_state_resources( + storage.path(), + &[ + ("graph.knowledge", graph_composite.as_str()), + ("schema.knowledge", schema_digest.as_str()), + ], + ); + write_create_sidecar(storage.path(), "knowledge", "irrelevant", "01STORAGE"); + + let status = status_config_dir(dir.path()).await; + assert!(status.ok, "{:?}", status.diagnostics); + assert!( + status + .diagnostics + .iter() + .any(|diagnostic| diagnostic.code == "cluster_recovery_pending" + && diagnostic.path.contains("01STORAGE.json")), + "{:?}", + status.diagnostics + ); + + let plan = plan_config_dir(dir.path()).await; + assert!(plan.ok, "{:?}", plan.diagnostics); + assert!( + plan.diagnostics + .iter() + .any(|diagnostic| diagnostic.code == "cluster_recovery_pending" + && diagnostic.path.contains("01STORAGE.json")), + "{:?}", + plan.diagnostics + ); + + assert!(!dir.path().join(CLUSTER_RECOVERIES_DIR).exists()); + } + #[tokio::test] async fn plan_annotates_apply_dispositions() { let dir = fixture(); diff --git a/crates/omnigraph-compiler/src/catalog/mod.rs b/crates/omnigraph-compiler/src/catalog/mod.rs index 93f8d89..2287c3b 100644 --- a/crates/omnigraph-compiler/src/catalog/mod.rs +++ b/crates/omnigraph-compiler/src/catalog/mod.rs @@ -6,7 +6,7 @@ use std::sync::Arc; use arrow_schema::{DataType, Field, Schema, SchemaRef}; -use crate::error::{NanoError, Result}; +use crate::error::{CompilerError, Result}; use crate::schema::ast::{Cardinality, Constraint, ConstraintBound, SchemaDecl, SchemaFile}; use crate::types::{PropType, ScalarType}; @@ -151,7 +151,7 @@ pub fn build_catalog(schema: &SchemaFile) -> Result { for decl in &schema.declarations { if let SchemaDecl::Node(node) = decl { if node_types.contains_key(&node.name) { - return Err(NanoError::Catalog(format!( + return Err(CompilerError::Catalog(format!( "duplicate node type: {}", node.name ))); @@ -250,19 +250,19 @@ pub fn build_catalog(schema: &SchemaFile) -> Result { for decl in &schema.declarations { if let SchemaDecl::Edge(edge) = decl { if 
edge_types.contains_key(&edge.name) { - return Err(NanoError::Catalog(format!( + return Err(CompilerError::Catalog(format!( "duplicate edge type: {}", edge.name ))); } if !node_types.contains_key(&edge.from_type) { - return Err(NanoError::Catalog(format!( + return Err(CompilerError::Catalog(format!( "edge {} references unknown source type: {}", edge.name, edge.from_type ))); } if !node_types.contains_key(&edge.to_type) { - return Err(NanoError::Catalog(format!( + return Err(CompilerError::Catalog(format!( "edge {} references unknown target type: {}", edge.name, edge.to_type ))); @@ -302,7 +302,7 @@ pub fn build_catalog(schema: &SchemaFile) -> Result { if let Some(existing) = edge_name_index.get(&normalized_name) && existing != &edge.name { - return Err(NanoError::Catalog(format!( + return Err(CompilerError::Catalog(format!( "edge name collision after case folding: '{}' conflicts with '{}'", edge.name, existing ))); diff --git a/crates/omnigraph-compiler/src/catalog/schema_ir.rs b/crates/omnigraph-compiler/src/catalog/schema_ir.rs index d90539e..4a56ffa 100644 --- a/crates/omnigraph-compiler/src/catalog/schema_ir.rs +++ b/crates/omnigraph-compiler/src/catalog/schema_ir.rs @@ -4,7 +4,7 @@ use serde::{Deserialize, Serialize}; use sha2::{Digest, Sha256}; use crate::catalog::{Catalog, build_catalog}; -use crate::error::{NanoError, Result}; +use crate::error::{CompilerError, Result}; use crate::schema::ast::{Annotation, Cardinality, Constraint, PropDecl, SchemaDecl, SchemaFile}; use crate::types::PropType; @@ -119,7 +119,7 @@ pub fn build_schema_ir(schema: &SchemaFile) -> Result { pub fn build_catalog_from_ir(ir: &SchemaIR) -> Result { if ir.ir_version != SCHEMA_IR_VERSION { - return Err(NanoError::Catalog(format!( + return Err(CompilerError::Catalog(format!( "unsupported schema ir_version {} (expected {})", ir.ir_version, SCHEMA_IR_VERSION ))); @@ -167,12 +167,12 @@ pub fn build_catalog_from_ir(ir: &SchemaIR) -> Result { pub fn schema_ir_json(ir: &SchemaIR) -> Result { 
serde_json::to_string(ir) - .map_err(|err| NanoError::Catalog(format!("serialize schema ir error: {}", err))) + .map_err(|err| CompilerError::Catalog(format!("serialize schema ir error: {}", err))) } pub fn schema_ir_pretty_json(ir: &SchemaIR) -> Result { serde_json::to_string_pretty(ir) - .map_err(|err| NanoError::Catalog(format!("serialize schema ir error: {}", err))) + .map_err(|err| CompilerError::Catalog(format!("serialize schema ir error: {}", err))) } pub fn schema_ir_hash(ir: &SchemaIR) -> Result { @@ -228,7 +228,7 @@ fn canonical_properties( .map(|property| { let prop_id = stable_prop_id(&owner_key, &property.name); if let Some(previous) = seen_prop_ids.insert(prop_id, property.name.clone()) { - return Err(NanoError::Catalog(format!( + return Err(CompilerError::Catalog(format!( "property id collision on {}: '{}' and '{}' both hash to {}", owner_name, previous, property.name, prop_id ))); @@ -308,7 +308,7 @@ fn check_type_id_collision( name: &str, ) -> Result<()> { if let Some(previous) = seen_type_ids.insert(type_id, name.to_string()) { - return Err(NanoError::Catalog(format!( + return Err(CompilerError::Catalog(format!( "type id collision: '{}' and '{}' both hash to {}", previous, name, type_id ))); diff --git a/crates/omnigraph-compiler/src/error.rs b/crates/omnigraph-compiler/src/error.rs index ea48759..cbf5c4d 100644 --- a/crates/omnigraph-compiler/src/error.rs +++ b/crates/omnigraph-compiler/src/error.rs @@ -55,7 +55,7 @@ pub fn decode_string_literal(raw: &str) -> Result { let escaped = chars .next() - .ok_or_else(|| NanoError::Parse("unterminated escape sequence".to_string()))?; + .ok_or_else(|| CompilerError::Parse("unterminated escape sequence".to_string()))?; match escaped { '"' => decoded.push('"'), '\\' => decoded.push('\\'), @@ -63,7 +63,7 @@ pub fn decode_string_literal(raw: &str) -> Result { 'r' => decoded.push('\r'), 't' => decoded.push('\t'), other => { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( 
"unsupported escape sequence: \\{}", other ))); @@ -75,7 +75,7 @@ pub fn decode_string_literal(raw: &str) -> Result { } #[derive(Debug, Error)] -pub enum NanoError { +pub enum CompilerError { #[error("parse error: {0}")] Parse(String), @@ -118,11 +118,16 @@ pub enum NanoError { Manifest(String), } -pub type Result = std::result::Result; +#[deprecated(note = "use CompilerError")] +pub type NanoError = CompilerError; + +pub type Result = std::result::Result; #[cfg(test)] mod tests { - use super::{SourceSpan, decode_string_literal, render_span}; + use std::path::Path; + + use super::{CompilerError, SourceSpan, decode_string_literal, render_span}; #[test] fn source_span_preserves_zero_width() { @@ -143,4 +148,71 @@ mod tests { let decoded = decode_string_literal("\"a\\n\\r\\t\\\\\\\"b\"").unwrap(); assert_eq!(decoded, "a\n\r\t\\\"b"); } + + #[test] + fn compiler_error_parse_display_is_stable() { + let err = CompilerError::Parse("bad token".to_string()); + assert_eq!(err.to_string(), "parse error: bad token"); + } + + #[allow(deprecated)] + #[test] + fn legacy_nano_error_alias_constructs_variants() { + let err = super::NanoError::Parse("bad token".to_string()); + assert_eq!(err.to_string(), "parse error: bad token"); + } + + #[test] + fn legacy_name_is_confined_to_alias_and_compatibility_test() { + let legacy_name = ["Nano", "Error"].concat(); + let workspace_root = Path::new(env!("CARGO_MANIFEST_DIR")) + .parent() + .and_then(Path::parent) + .expect("compiler crate should live under crates/"); + let allowed_file = workspace_root.join("crates/omnigraph-compiler/src/error.rs"); + let mut offenders = Vec::new(); + + visit_rs_files(&workspace_root.join("crates"), &mut |path| { + let text = std::fs::read_to_string(path).expect("source file should be readable"); + let count = text.matches(&legacy_name).count(); + if path == allowed_file { + if count != 2 { + offenders.push(format!( + "{} contains {count} legacy-name occurrences; expected exactly 2", + 
display_path(workspace_root, path) + )); + } + } else if count > 0 { + offenders.push(format!( + "{} contains {count} legacy-name occurrence(s)", + display_path(workspace_root, path) + )); + } + }); + + assert!( + offenders.is_empty(), + "legacy compiler error name should stay compatibility-only:\n{}", + offenders.join("\n") + ); + } + + fn visit_rs_files(dir: &Path, visit: &mut impl FnMut(&Path)) { + for entry in std::fs::read_dir(dir).expect("source directory should be readable") { + let entry = entry.expect("source entry should be readable"); + let path = entry.path(); + if path.is_dir() { + visit_rs_files(&path, visit); + } else if path.extension().and_then(|ext| ext.to_str()) == Some("rs") { + visit(&path); + } + } + } + + fn display_path(root: &Path, path: &Path) -> String { + path.strip_prefix(root) + .unwrap_or(path) + .to_string_lossy() + .into_owned() + } } diff --git a/crates/omnigraph-compiler/src/ir/lower.rs b/crates/omnigraph-compiler/src/ir/lower.rs index 6999d69..9427e27 100644 --- a/crates/omnigraph-compiler/src/ir/lower.rs +++ b/crates/omnigraph-compiler/src/ir/lower.rs @@ -14,7 +14,7 @@ pub fn lower_query( type_ctx: &TypeContext, ) -> Result { if !query.mutations.is_empty() { - return Err(crate::error::NanoError::Plan( + return Err(crate::error::CompilerError::Plan( "cannot lower mutation query with read-query lowerer".to_string(), )); } @@ -62,7 +62,7 @@ pub fn lower_query( pub fn lower_mutation_query(query: &QueryDecl) -> Result { if query.mutations.is_empty() { - return Err(crate::error::NanoError::Plan( + return Err(crate::error::CompilerError::Plan( "query does not contain a mutation body".to_string(), )); } @@ -261,7 +261,7 @@ fn lower_clauses( let edge = catalog .lookup_edge_by_name(&traversal.edge_name) .ok_or_else(|| { - crate::error::NanoError::Plan(format!( + crate::error::CompilerError::Plan(format!( "lowering traversal referenced missing edge '{}' after typecheck", traversal.edge_name )) diff --git 
a/crates/omnigraph-compiler/src/query/parser.rs b/crates/omnigraph-compiler/src/query/parser.rs index 4ba8476..3284876 100644 --- a/crates/omnigraph-compiler/src/query/parser.rs +++ b/crates/omnigraph-compiler/src/query/parser.rs @@ -3,7 +3,7 @@ use pest::error::InputLocation; use pest_derive::Parser; use crate::error::{ - NanoError, ParseDiagnostic, Result, SourceSpan, decode_string_literal, render_span, + CompilerError, ParseDiagnostic, Result, SourceSpan, decode_string_literal, render_span, }; use super::ast::*; @@ -13,7 +13,7 @@ use super::ast::*; struct QueryParser; pub fn parse_query(input: &str) -> Result { - parse_query_diagnostic(input).map_err(|e| NanoError::Parse(e.to_string())) + parse_query_diagnostic(input).map_err(|e| CompilerError::Parse(e.to_string())) } pub fn parse_query_diagnostic(input: &str) -> std::result::Result { @@ -24,7 +24,7 @@ pub fn parse_query_diagnostic(input: &str) -> std::result::Result) -> ParseDiagnostic { ParseDiagnostic::new(err.to_string(), span) } -fn nano_error_to_diagnostic(err: NanoError) -> ParseDiagnostic { +fn compiler_error_to_diagnostic(err: CompilerError) -> ParseDiagnostic { ParseDiagnostic::new(err.to_string(), None) } @@ -71,7 +71,7 @@ fn parse_query_decl(pair: pest::iterators::Pair) -> Result { match annotation_name { "description" => { if description.replace(value).is_some() { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "query `{}` cannot include duplicate @description annotations", name ))); @@ -79,14 +79,14 @@ fn parse_query_decl(pair: pest::iterators::Pair) -> Result { } "instruction" => { if instruction.replace(value).is_some() { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "query `{}` cannot include duplicate @instruction annotations", name ))); } } other => { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "unsupported query annotation: @{}", other ))); @@ -94,10 +94,9 @@ fn parse_query_decl(pair: 
pest::iterators::Pair) -> Result { } } Rule::query_body => { - let body = item - .into_inner() - .next() - .ok_or_else(|| NanoError::Parse("query body cannot be empty".to_string()))?; + let body = item.into_inner().next().ok_or_else(|| { + CompilerError::Parse("query body cannot be empty".to_string()) + })?; match body.as_rule() { Rule::read_query_body => { for section in body.into_inner() { @@ -127,7 +126,7 @@ fn parse_query_decl(pair: pest::iterators::Pair) -> Result { let int_pair = section.into_inner().next().unwrap(); limit = Some(int_pair.as_str().parse::().map_err(|e| { - NanoError::Parse(format!("invalid limit: {}", e)) + CompilerError::Parse(format!("invalid limit: {}", e)) })?); } _ => {} @@ -138,7 +137,7 @@ fn parse_query_decl(pair: pest::iterators::Pair) -> Result { for mutation_pair in body.into_inner() { if let Rule::mutation_stmt = mutation_pair.as_rule() { let stmt = mutation_pair.into_inner().next().ok_or_else(|| { - NanoError::Parse( + CompilerError::Parse( "mutation statement cannot be empty".to_string(), ) })?; @@ -170,14 +169,14 @@ fn parse_query_annotation(pair: pest::iterators::Pair) -> Result<(&'static let inner = pair .into_inner() .next() - .ok_or_else(|| NanoError::Parse("query annotation cannot be empty".to_string()))?; + .ok_or_else(|| CompilerError::Parse("query annotation cannot be empty".to_string()))?; match inner.as_rule() { Rule::description_annotation => { let value = inner .into_inner() .next() .ok_or_else(|| { - NanoError::Parse("@description requires a string literal".to_string()) + CompilerError::Parse("@description requires a string literal".to_string()) }) .map(|value| parse_string_lit(value.as_str()))??; Ok(("description", value)) @@ -187,12 +186,12 @@ fn parse_query_annotation(pair: pest::iterators::Pair) -> Result<(&'static .into_inner() .next() .ok_or_else(|| { - NanoError::Parse("@instruction requires a string literal".to_string()) + CompilerError::Parse("@instruction requires a string literal".to_string()) }) 
.map(|value| parse_string_lit(value.as_str()))??; Ok(("instruction", value)) } - other => Err(NanoError::Parse(format!( + other => Err(CompilerError::Parse(format!( "unexpected query annotation rule: {:?}", other ))), @@ -208,30 +207,29 @@ fn parse_param(pair: pest::iterators::Pair) -> Result { let mut type_inner = type_ref.into_inner(); let core = type_inner .next() - .ok_or_else(|| NanoError::Parse("parameter type is missing".to_string()))?; - let base = match core.as_rule() { - Rule::base_type => core.as_str().to_string(), - Rule::list_type => { - let inner = core - .into_inner() - .next() - .ok_or_else(|| NanoError::Parse("list type missing item type".to_string()))?; - format!("[{}]", inner.as_str().trim()) - } - Rule::vector_type => { - let vector = core - .into_inner() - .next() - .ok_or_else(|| NanoError::Parse("Vector type missing dimension".to_string()))?; - format!("Vector({})", vector.as_str().trim()) - } - other => { - return Err(NanoError::Parse(format!( - "unexpected param type rule: {:?}", - other - ))); - } - }; + .ok_or_else(|| CompilerError::Parse("parameter type is missing".to_string()))?; + let base = + match core.as_rule() { + Rule::base_type => core.as_str().to_string(), + Rule::list_type => { + let inner = core.into_inner().next().ok_or_else(|| { + CompilerError::Parse("list type missing item type".to_string()) + })?; + format!("[{}]", inner.as_str().trim()) + } + Rule::vector_type => { + let vector = core.into_inner().next().ok_or_else(|| { + CompilerError::Parse("Vector type missing dimension".to_string()) + })?; + format!("Vector({})", vector.as_str().trim()) + } + other => { + return Err(CompilerError::Parse(format!( + "unexpected param type rule: {:?}", + other + ))); + } + }; Ok(Param { name, @@ -256,7 +254,7 @@ fn parse_clause(pair: pest::iterators::Pair) -> Result { } Ok(Clause::Negation(clauses)) } - _ => Err(NanoError::Parse(format!( + _ => Err(CompilerError::Parse(format!( "unexpected clause rule: {:?}", inner.as_rule() ))), @@ 
-267,13 +265,13 @@ fn parse_text_search_clause(pair: pest::iterators::Pair) -> Result let inner = pair .into_inner() .next() - .ok_or_else(|| NanoError::Parse("text search clause cannot be empty".to_string()))?; + .ok_or_else(|| CompilerError::Parse("text search clause cannot be empty".to_string()))?; let expr = match inner.as_rule() { Rule::search_call => parse_search_call(inner)?, Rule::fuzzy_call => parse_fuzzy_call(inner)?, Rule::match_text_call => parse_match_text_call(inner)?, other => { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "unexpected text search clause rule: {:?}", other ))); @@ -325,7 +323,7 @@ fn parse_mutation_stmt(pair: pest::iterators::Pair) -> Result { Rule::insert_stmt => parse_insert_mutation(pair).map(Mutation::Insert), Rule::update_stmt => parse_update_mutation(pair).map(Mutation::Update), Rule::delete_stmt => parse_delete_mutation(pair).map(Mutation::Delete), - other => Err(NanoError::Parse(format!( + other => Err(CompilerError::Parse(format!( "unexpected mutation statement rule: {:?}", other ))), @@ -363,7 +361,7 @@ fn parse_update_mutation(pair: pest::iterators::Pair) -> Result) -> Result) -> Result { } Rule::now_call => Ok(MatchValue::Now), Rule::literal => Ok(MatchValue::Literal(parse_literal(value_inner)?)), - _ => Err(NanoError::Parse(format!( + _ => Err(CompilerError::Parse(format!( "unexpected match value: {:?}", value_inner.as_rule() ))), @@ -436,9 +436,9 @@ fn parse_traversal(pair: pest::iterators::Pair) -> Result { let (min, max) = parse_traversal_bounds(next)?; min_hops = min; max_hops = max; - inner - .next() - .ok_or_else(|| NanoError::Parse("traversal missing destination variable".to_string()))? + inner.next().ok_or_else(|| { + CompilerError::Parse("traversal missing destination variable".to_string()) + })? 
} else { next }; @@ -459,16 +459,16 @@ fn parse_traversal_bounds(pair: pest::iterators::Pair) -> Result<(u32, Opt let mut inner = pair.into_inner(); let min = inner .next() - .ok_or_else(|| NanoError::Parse("traversal bound missing min hop".to_string()))? + .ok_or_else(|| CompilerError::Parse("traversal bound missing min hop".to_string()))? .as_str() .parse::() - .map_err(|e| NanoError::Parse(format!("invalid traversal min bound: {}", e)))?; + .map_err(|e| CompilerError::Parse(format!("invalid traversal min bound: {}", e)))?; let max = inner .next() .map(|p| { p.as_str() .parse::() - .map_err(|e| NanoError::Parse(format!("invalid traversal max bound: {}", e))) + .map_err(|e| CompilerError::Parse(format!("invalid traversal max bound: {}", e))) }) .transpose()?; Ok((min, max)) @@ -507,7 +507,12 @@ fn parse_expr(pair: pest::iterators::Pair) -> Result { "avg" => AggFunc::Avg, "min" => AggFunc::Min, "max" => AggFunc::Max, - other => return Err(NanoError::Parse(format!("unknown aggregate: {}", other))), + other => { + return Err(CompilerError::Parse(format!( + "unknown aggregate: {}", + other + ))); + } }; let arg = parse_expr(parts.next().unwrap())?; Ok(Expr::Aggregate { @@ -522,7 +527,7 @@ fn parse_expr(pair: pest::iterators::Pair) -> Result { Rule::bm25_call => parse_bm25_call(inner), Rule::rrf_call => parse_rrf_call(inner), Rule::ident => Ok(Expr::AliasRef(inner.as_str().to_string())), - _ => Err(NanoError::Parse(format!( + _ => Err(CompilerError::Parse(format!( "unexpected expr rule: {:?}", inner.as_rule() ))), @@ -533,12 +538,12 @@ fn parse_search_call(pair: pest::iterators::Pair) -> Result { let mut args = pair.into_inner(); let field = args .next() - .ok_or_else(|| NanoError::Parse("search() missing field argument".to_string()))?; + .ok_or_else(|| CompilerError::Parse("search() missing field argument".to_string()))?; let query = args .next() - .ok_or_else(|| NanoError::Parse("search() missing query argument".to_string()))?; + .ok_or_else(|| 
CompilerError::Parse("search() missing query argument".to_string()))?; if args.next().is_some() { - return Err(NanoError::Parse( + return Err(CompilerError::Parse( "search() accepts exactly 2 arguments".to_string(), )); } @@ -552,13 +557,13 @@ fn parse_fuzzy_call(pair: pest::iterators::Pair) -> Result { let mut args = pair.into_inner(); let field = args .next() - .ok_or_else(|| NanoError::Parse("fuzzy() missing field argument".to_string()))?; + .ok_or_else(|| CompilerError::Parse("fuzzy() missing field argument".to_string()))?; let query = args .next() - .ok_or_else(|| NanoError::Parse("fuzzy() missing query argument".to_string()))?; + .ok_or_else(|| CompilerError::Parse("fuzzy() missing query argument".to_string()))?; let max_edits = args.next().map(parse_expr).transpose()?.map(Box::new); if args.next().is_some() { - return Err(NanoError::Parse( + return Err(CompilerError::Parse( "fuzzy() accepts at most 3 arguments".to_string(), )); } @@ -573,12 +578,12 @@ fn parse_match_text_call(pair: pest::iterators::Pair) -> Result { let mut args = pair.into_inner(); let field = args .next() - .ok_or_else(|| NanoError::Parse("match_text() missing field argument".to_string()))?; + .ok_or_else(|| CompilerError::Parse("match_text() missing field argument".to_string()))?; let query = args .next() - .ok_or_else(|| NanoError::Parse("match_text() missing query argument".to_string()))?; + .ok_or_else(|| CompilerError::Parse("match_text() missing query argument".to_string()))?; if args.next().is_some() { - return Err(NanoError::Parse( + return Err(CompilerError::Parse( "match_text() accepts exactly 2 arguments".to_string(), )); } @@ -592,12 +597,12 @@ fn parse_bm25_call(pair: pest::iterators::Pair) -> Result { let mut args = pair.into_inner(); let field = args .next() - .ok_or_else(|| NanoError::Parse("bm25() missing field argument".to_string()))?; + .ok_or_else(|| CompilerError::Parse("bm25() missing field argument".to_string()))?; let query = args .next() - .ok_or_else(|| 
NanoError::Parse("bm25() missing query argument".to_string()))?; + .ok_or_else(|| CompilerError::Parse("bm25() missing query argument".to_string()))?; if args.next().is_some() { - return Err(NanoError::Parse( + return Err(CompilerError::Parse( "bm25() accepts exactly 2 arguments".to_string(), )); } @@ -611,14 +616,14 @@ fn parse_rank_expr(pair: pest::iterators::Pair) -> Result { let inner = if pair.as_rule() == Rule::rank_expr { pair.into_inner() .next() - .ok_or_else(|| NanoError::Parse("rank expression cannot be empty".to_string()))? + .ok_or_else(|| CompilerError::Parse("rank expression cannot be empty".to_string()))? } else { pair }; match inner.as_rule() { Rule::nearest_ordering => parse_nearest_ordering(inner), Rule::bm25_call => parse_bm25_call(inner), - other => Err(NanoError::Parse(format!( + other => Err(CompilerError::Parse(format!( "rrf() rank expression must be nearest(...) or bm25(...), got {:?}", other ))), @@ -629,13 +634,13 @@ fn parse_rrf_call(pair: pest::iterators::Pair) -> Result { let mut args = pair.into_inner(); let primary = args .next() - .ok_or_else(|| NanoError::Parse("rrf() missing primary rank expression".to_string()))?; - let secondary = args - .next() - .ok_or_else(|| NanoError::Parse("rrf() missing secondary rank expression".to_string()))?; + .ok_or_else(|| CompilerError::Parse("rrf() missing primary rank expression".to_string()))?; + let secondary = args.next().ok_or_else(|| { + CompilerError::Parse("rrf() missing secondary rank expression".to_string()) + })?; let k = args.next().map(parse_expr).transpose()?.map(Box::new); if args.next().is_some() { - return Err(NanoError::Parse( + return Err(CompilerError::Parse( "rrf() accepts at most 3 arguments".to_string(), )); } @@ -654,7 +659,7 @@ fn parse_comp_op(pair: pest::iterators::Pair) -> Result { "<" => Ok(CompOp::Lt), ">=" => Ok(CompOp::Ge), "<=" => Ok(CompOp::Le), - other => Err(NanoError::Parse(format!("unknown operator: {}", other))), + other => 
Err(CompilerError::Parse(format!("unknown operator: {}", other))), } } @@ -673,14 +678,14 @@ fn parse_literal(pair: pest::iterators::Pair) -> Result { let n: i64 = inner .as_str() .parse() - .map_err(|e| NanoError::Parse(format!("invalid integer: {}", e)))?; + .map_err(|e| CompilerError::Parse(format!("invalid integer: {}", e)))?; Ok(Literal::Integer(n)) } Rule::float_lit => { let f: f64 = inner .as_str() .parse() - .map_err(|e| NanoError::Parse(format!("invalid float: {}", e)))?; + .map_err(|e| CompilerError::Parse(format!("invalid float: {}", e)))?; Ok(Literal::Float(f)) } Rule::bool_lit => { @@ -688,7 +693,7 @@ fn parse_literal(pair: pest::iterators::Pair) -> Result { "true" => true, "false" => false, other => { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "invalid boolean literal: {}", other ))); @@ -701,7 +706,9 @@ fn parse_literal(pair: pest::iterators::Pair) -> Result { .into_inner() .next() .map(|s| parse_string_lit(s.as_str())) - .ok_or_else(|| NanoError::Parse("date literal requires a string".to_string()))?; + .ok_or_else(|| { + CompilerError::Parse("date literal requires a string".to_string()) + })?; Ok(Literal::Date(date_str?)) } Rule::datetime_lit => { @@ -710,7 +717,7 @@ fn parse_literal(pair: pest::iterators::Pair) -> Result { .next() .map(|s| parse_string_lit(s.as_str())) .ok_or_else(|| { - NanoError::Parse("datetime literal requires a string".to_string()) + CompilerError::Parse("datetime literal requires a string".to_string()) })?; Ok(Literal::DateTime(dt_str?)) } @@ -723,7 +730,7 @@ fn parse_literal(pair: pest::iterators::Pair) -> Result { } Ok(Literal::List(items)) } - _ => Err(NanoError::Parse(format!( + _ => Err(CompilerError::Parse(format!( "unexpected literal: {:?}", inner.as_rule() ))), @@ -746,14 +753,14 @@ fn parse_ordering(pair: pest::iterators::Pair) -> Result { let mut inner = pair.into_inner(); let first = inner .next() - .ok_or_else(|| NanoError::Parse("ordering cannot be empty".to_string()))?; + 
.ok_or_else(|| CompilerError::Parse("ordering cannot be empty".to_string()))?; let (expr, descending) = match first.as_rule() { Rule::nearest_ordering => (parse_nearest_ordering(first)?, false), Rule::expr => { let expr = parse_expr(first)?; let direction = inner.next().map(|p| p.as_str().to_string()); if matches!(expr, Expr::Nearest { .. }) && direction.is_some() { - return Err(NanoError::Parse( + return Err(CompilerError::Parse( "nearest() ordering does not accept asc/desc modifiers".to_string(), )); } @@ -761,7 +768,7 @@ fn parse_ordering(pair: pest::iterators::Pair) -> Result { (expr, descending) } other => { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "unexpected ordering rule: {:?}", other ))); @@ -775,22 +782,22 @@ fn parse_nearest_ordering(pair: pest::iterators::Pair) -> Result { let mut inner = pair.into_inner(); let prop = inner .next() - .ok_or_else(|| NanoError::Parse("nearest() missing property".to_string()))?; + .ok_or_else(|| CompilerError::Parse("nearest() missing property".to_string()))?; let mut prop_parts = prop.into_inner(); let var = prop_parts .next() - .ok_or_else(|| NanoError::Parse("nearest() missing variable".to_string()))? + .ok_or_else(|| CompilerError::Parse("nearest() missing variable".to_string()))? .as_str(); let variable = var.strip_prefix('$').unwrap_or(var).to_string(); let property = prop_parts .next() - .ok_or_else(|| NanoError::Parse("nearest() missing property name".to_string()))? + .ok_or_else(|| CompilerError::Parse("nearest() missing property name".to_string()))? 
.as_str() .to_string(); let query = inner .next() - .ok_or_else(|| NanoError::Parse("nearest() missing query expression".to_string()))?; + .ok_or_else(|| CompilerError::Parse("nearest() missing query expression".to_string()))?; Ok(Expr::Nearest { variable, property, diff --git a/crates/omnigraph-compiler/src/query/typecheck.rs b/crates/omnigraph-compiler/src/query/typecheck.rs index b2c235a..2ac1604 100644 --- a/crates/omnigraph-compiler/src/query/typecheck.rs +++ b/crates/omnigraph-compiler/src/query/typecheck.rs @@ -4,7 +4,7 @@ use std::sync::Arc; use arrow_schema::{DataType, Field, Schema, SchemaRef}; use crate::catalog::Catalog; -use crate::error::{NanoError, Result}; +use crate::error::{CompilerError, Result}; use crate::types::{Direction, PropType, ScalarType}; use super::ast::*; @@ -82,7 +82,7 @@ pub fn typecheck_query_decl(catalog: &Catalog, query: &QueryDecl) -> Result Result { if !query.mutations.is_empty() { - return Err(NanoError::Type( + return Err(CompilerError::Type( "mutation query cannot be typechecked with read-query API".to_string(), )); } @@ -115,14 +115,14 @@ fn parse_declared_param_types(params: &[Param]) -> Result Result Result Result {} _ => { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T9: non-aggregate expressions in an aggregate query must be \ property accesses or variables" .to_string(), @@ -221,7 +221,7 @@ fn typecheck_mutation(catalog: &Catalog, mutation: &Mutation, params: &[Param]) match mutation { Mutation::Insert(insert) => { if insert.assignments.is_empty() { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T10: insert mutation requires at least one assignment".to_string(), )); } @@ -235,7 +235,7 @@ fn typecheck_mutation(catalog: &Catalog, mutation: &Mutation, params: &[Param]) .properties .get(&assignment.property) .ok_or_else(|| { - NanoError::Type(format!( + CompilerError::Type(format!( "T11: type `{}` has no property `{}`", insert.type_name, assignment.property )) @@ -265,13 +265,13 @@ fn 
typecheck_mutation(catalog: &Catalog, mutation: &Mutation, params: &[Param]) if assigned_props.contains(embed.source.as_str()) { continue; } - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T12: insert for `{}` must provide non-nullable property `{}` or @embed source `{}`", insert.type_name, prop_name, embed.source ))); } - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T12: insert for `{}` must provide non-nullable property `{}`", insert.type_name, prop_name ))); @@ -308,7 +308,7 @@ fn typecheck_mutation(catalog: &Catalog, mutation: &Mutation, params: &[Param]) .properties .get(&assignment.property) .ok_or_else(|| { - NanoError::Type(format!( + CompilerError::Type(format!( "T11: type `{}` has no property `{}`", insert.type_name, assignment.property )) @@ -324,13 +324,13 @@ fn typecheck_mutation(catalog: &Catalog, mutation: &Mutation, params: &[Param]) } if !has_from { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T12: insert for `{}` must provide required endpoint `from`", insert.type_name ))); } if !has_to { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T12: insert for `{}` must provide required endpoint `to`", insert.type_name ))); @@ -341,7 +341,7 @@ fn typecheck_mutation(catalog: &Catalog, mutation: &Mutation, params: &[Param]) continue; } if !insert.assignments.iter().any(|a| &a.property == prop_name) { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T12: insert for `{}` must provide non-nullable property `{}`", insert.type_name, prop_name ))); @@ -350,7 +350,7 @@ fn typecheck_mutation(catalog: &Catalog, mutation: &Mutation, params: &[Param]) return Ok(insert.type_name.clone()); } - Err(NanoError::Type(format!( + Err(CompilerError::Type(format!( "T10: unknown node/edge type `{}`", insert.type_name ))) @@ -359,19 +359,19 @@ fn typecheck_mutation(catalog: &Catalog, mutation: &Mutation, params: 
&[Param]) let node_type = if let Some(node_type) = catalog.node_types.get(&update.type_name) { node_type } else if catalog.edge_types.contains_key(&update.type_name) { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T16: update mutation for edge type `{}` is not supported", update.type_name ))); } else { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T10: unknown node/edge type `{}`", update.type_name ))); }; if update.assignments.is_empty() { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T10: update mutation requires at least one assignment".to_string(), )); } @@ -383,7 +383,7 @@ fn typecheck_mutation(catalog: &Catalog, mutation: &Mutation, params: &[Param]) .properties .get(&assignment.property) .ok_or_else(|| { - NanoError::Type(format!( + CompilerError::Type(format!( "T11: type `{}` has no property `{}`", update.type_name, assignment.property )) @@ -422,7 +422,7 @@ fn typecheck_mutation(catalog: &Catalog, mutation: &Mutation, params: &[Param]) )?; Ok(delete.type_name.clone()) } else { - Err(NanoError::Type(format!( + Err(CompilerError::Type(format!( "T10: unknown node/edge type `{}`", delete.type_name ))) @@ -435,7 +435,7 @@ fn ensure_no_duplicate_assignment_names(assignments: &[MutationAssignment]) -> R let mut seen = std::collections::HashSet::new(); for assignment in assignments { if !seen.insert(&assignment.property) { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T13: duplicate assignment for property `{}`", assignment.property ))); @@ -454,13 +454,13 @@ fn typecheck_mutation_predicate( .properties .get(&predicate.property) .ok_or_else(|| { - NanoError::Type(format!( + CompilerError::Type(format!( "T11: type `{}` has no property `{}`", type_name, predicate.property )) })?; if matches!(prop_type.scalar, ScalarType::Blob) { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T11: blob property `{}` cannot be 
used in WHERE predicates", predicate.property ))); @@ -493,7 +493,7 @@ fn typecheck_edge_mutation_predicate( .properties .get(&predicate.property) .ok_or_else(|| { - NanoError::Type(format!( + CompilerError::Type(format!( "T11: type `{}` has no property `{}`", type_name, predicate.property )) @@ -517,7 +517,7 @@ fn check_match_value_type( MatchValue::Literal(lit) => check_literal_type(lit, expected, property), MatchValue::Variable(v) => { let Some(actual) = params.get(v) else { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T14: mutation variable `${}` must be a declared query parameter", v ))); @@ -528,7 +528,7 @@ fn check_match_value_type( && matches!(actual.scalar, ScalarType::String) && !actual.list); if !compatible { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T7: cannot assign/compare {} with {} for property `{}`", actual.display_name(), expected.display_name(), @@ -543,7 +543,7 @@ fn check_match_value_type( fn check_now_match_value_type(expected: &PropType, property: &str) -> Result<()> { if expected.list || expected.scalar != ScalarType::DateTime { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T7: cannot assign/compare DateTime with {} for property `{}`", expected.display_name(), property @@ -597,7 +597,7 @@ fn typecheck_clauses( } } if !has_outer { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T9: negation block must reference at least one outer-bound variable" .to_string(), )); @@ -616,7 +616,7 @@ fn typecheck_binding( ) -> Result<()> { // T1: binding type must exist in catalog if !catalog.node_types.contains_key(&binding.type_name) { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T1: unknown node type `{}`", binding.type_name ))); @@ -627,14 +627,14 @@ fn typecheck_binding( // T2 + T3: property match fields must exist and have correct types for pm in &binding.prop_matches { let prop = 
node_type.properties.get(&pm.prop_name).ok_or_else(|| { - NanoError::Type(format!( + CompilerError::Type(format!( "T2: type `{}` has no property `{}`", binding.type_name, pm.prop_name )) })?; if matches!(prop.scalar, ScalarType::Blob) { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T3: blob property `{}.{}` cannot be used in match patterns", binding.type_name, pm.prop_name ))); @@ -658,7 +658,7 @@ fn typecheck_binding( if let Some(existing) = ctx.bindings.get(&binding.variable) && existing.type_name != binding.type_name { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "variable `${}` already bound to type `{}`, cannot rebind to `{}`", binding.variable, existing.type_name, binding.type_name ))); @@ -680,7 +680,7 @@ fn check_binding_literal_type(lit: &Literal, expected: &PropType, property: &str if expected.list { let lit_type = literal_type(lit)?; if lit_type.list { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T3: list equality is not supported for property `{}`; use a scalar value to match list membership", property ))); @@ -688,7 +688,7 @@ fn check_binding_literal_type(lit: &Literal, expected: &PropType, property: &str let expected_member = PropType::scalar(expected.scalar, expected.nullable); if !types_compatible(&lit_type, &expected_member) { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T3: property `{}` has type {} but membership match got {}", property, expected.display_name(), @@ -708,7 +708,7 @@ fn check_binding_variable_type( ) -> Result<()> { if expected.list { if actual.list { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T7: list equality is not supported for property `{}`; use a scalar parameter for membership matching", property ))); @@ -716,7 +716,7 @@ fn check_binding_variable_type( let expected_member = PropType::scalar(expected.scalar, expected.nullable); if 
!types_compatible(actual, &expected_member) { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T7: cannot compare {} membership against {} for property `{}`", actual.display_name(), expected.display_name(), @@ -727,7 +727,7 @@ fn check_binding_variable_type( } if !types_compatible(actual, expected) { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T7: cannot assign/compare {} with {} for property `{}`", actual.display_name(), expected.display_name(), @@ -746,23 +746,23 @@ fn typecheck_traversal( let edge = catalog .lookup_edge_by_name(&traversal.edge_name) .ok_or_else(|| { - NanoError::Type(format!("T4: unknown edge type `{}`", traversal.edge_name)) + CompilerError::Type(format!("T4: unknown edge type `{}`", traversal.edge_name)) })?; if traversal.min_hops == 0 { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T15: traversal min hop bound must be >= 1".to_string(), )); } if let Some(max_hops) = traversal.max_hops { if max_hops < traversal.min_hops { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T15: invalid traversal bounds {{{},{}}}; max must be >= min", traversal.min_hops, max_hops ))); } } else { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T15: unbounded traversal is disabled; use bounded traversal {min,max}".to_string(), )); } @@ -784,7 +784,7 @@ fn typecheck_traversal( // dst should be edge.from_type bind_traversal_endpoint(ctx, &traversal.dst, &edge.from_type, edge)?; } else { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T5: variable `${}` has type `{}`, which is not an endpoint of edge `{}: {} -> {}`", traversal.src, src_bv.type_name, edge.name, edge.from_type, edge.to_type ))); @@ -798,7 +798,7 @@ fn typecheck_traversal( direction = Direction::In; bind_traversal_endpoint(ctx, &traversal.src, &edge.to_type, edge)?; } else { - return Err(NanoError::Type(format!( + return 
Err(CompilerError::Type(format!( "T5: variable `${}` has type `{}`, which is not an endpoint of edge `{}: {} -> {}`", traversal.dst, dst_bv.type_name, edge.name, edge.from_type, edge.to_type ))); @@ -833,7 +833,7 @@ fn bind_traversal_endpoint( } if let Some(existing) = ctx.bindings.get(var) { if existing.type_name != expected_type { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T5: variable `${}` has type `{}` but edge `{}` expects `{}`", var, existing.type_name, edge.name, expected_type ))); @@ -863,27 +863,27 @@ fn typecheck_filter( if let (ResolvedType::Scalar(l), ResolvedType::Scalar(r)) = (&left_type, &right_type) { if filter.op == CompOp::Contains { if !l.list { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T7: contains requires a list property on the left, got {}", l.display_name() ))); } if r.list { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T7: contains requires a scalar right operand".to_string(), )); } if matches!(l.scalar, ScalarType::Vector(_)) || matches!(r.scalar, ScalarType::Vector(_)) { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T7: vector membership filters are not supported".to_string(), )); } let expected_member = PropType::scalar(l.scalar, l.nullable); if !types_compatible(&expected_member, r) { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T7: cannot test membership of {} in {}", r.display_name(), l.display_name() @@ -894,29 +894,29 @@ fn typecheck_filter( // T7: check type compatibility if l.list || r.list { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T7: list comparisons in filters are not supported; use `contains` for list membership".to_string(), )); } if matches!(l.scalar, ScalarType::Vector(_)) || matches!(r.scalar, ScalarType::Vector(_)) { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T7: vector comparisons in filters are not supported".to_string(), 
)); } if matches!(l.scalar, ScalarType::Blob) || matches!(r.scalar, ScalarType::Blob) { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T7: blob comparisons in filters are not supported".to_string(), )); } if !types_compatible(l, r) { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T7: cannot compare {} with {}", l.display_name(), r.display_name() ))); } } else { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T7: filter comparisons require scalar operands, got {} and {}", left_type.display_name(), right_type.display_name() @@ -940,15 +940,15 @@ fn resolve_expr_type( Expr::PropAccess { variable, property } => { // T6: variable must be bound and property must exist let bv = ctx.bindings.get(variable).ok_or_else(|| { - NanoError::Type(format!("T6: variable `${}` is not bound", variable)) + CompilerError::Type(format!("T6: variable `${}` is not bound", variable)) })?; let node_type = catalog.node_types.get(&bv.type_name).ok_or_else(|| { - NanoError::Type(format!("T6: type `{}` not found in catalog", bv.type_name)) + CompilerError::Type(format!("T6: type `{}` not found in catalog", bv.type_name)) })?; let prop = node_type.properties.get(property).ok_or_else(|| { - NanoError::Type(format!( + CompilerError::Type(format!( "T6: type `{}` has no property `{}`", bv.type_name, property )) @@ -962,19 +962,19 @@ fn resolve_expr_type( query, } => { let node_binding = ctx.bindings.get(variable).ok_or_else(|| { - NanoError::Type(format!("T15: variable `${}` is not bound", variable)) + CompilerError::Type(format!("T15: variable `${}` is not bound", variable)) })?; let node_type = catalog .node_types .get(&node_binding.type_name) .ok_or_else(|| { - NanoError::Type(format!( + CompilerError::Type(format!( "T15: type `{}` not found in catalog", node_binding.type_name )) })?; let prop_type = node_type.properties.get(property).ok_or_else(|| { - NanoError::Type(format!( + CompilerError::Type(format!( "T15: 
type `{}` has no property `{}`", node_binding.type_name, property )) @@ -982,7 +982,7 @@ fn resolve_expr_type( let vector_dim = match prop_type.scalar { ScalarType::Vector(dim) => dim, _ => { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T15: nearest requires a Vector property, got {}.{}: {}", node_binding.type_name, property, @@ -991,7 +991,7 @@ fn resolve_expr_type( } }; if prop_type.list { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T15: nearest does not support list-wrapped vectors".to_string(), )); } @@ -1000,7 +1000,7 @@ fn resolve_expr_type( && let Some(dim) = numeric_vector_literal_dim(lit) { if dim != vector_dim { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T15: nearest vector dimension mismatch: property is Vector({}), query literal has {} elements", vector_dim, dim ))); @@ -1019,7 +1019,7 @@ fn resolve_expr_type( _ => unreachable!(), }; if qdim != vector_dim { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T15: nearest vector dimension mismatch: property is Vector({}), query is Vector({})", vector_dim, qdim ))); @@ -1029,14 +1029,14 @@ fn resolve_expr_type( // query-time string embedding is supported by the runtime executor } ResolvedType::Scalar(s) => { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T15: nearest query must be Vector({}) or String, got {}", vector_dim, s.display_name() ))); } _ => { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T15: nearest query must be a scalar expression".to_string(), )); } @@ -1052,13 +1052,13 @@ fn resolve_expr_type( match field_type { ResolvedType::Scalar(s) if s.scalar == ScalarType::String && !s.list => {} ResolvedType::Scalar(s) => { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T19: search field must be String, got {}", s.display_name() ))); } _ => { - return Err(NanoError::Type( + return 
Err(CompilerError::Type( "T19: search field must be a scalar String expression".to_string(), )); } @@ -1068,13 +1068,13 @@ fn resolve_expr_type( match query_type { ResolvedType::Scalar(s) if s.scalar == ScalarType::String && !s.list => {} ResolvedType::Scalar(s) => { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T19: search query must be String, got {}", s.display_name() ))); } _ => { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T19: search query must be a scalar String expression".to_string(), )); } @@ -1094,13 +1094,13 @@ fn resolve_expr_type( match field_type { ResolvedType::Scalar(s) if s.scalar == ScalarType::String && !s.list => {} ResolvedType::Scalar(s) => { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T19: fuzzy field must be String, got {}", s.display_name() ))); } _ => { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T19: fuzzy field must be a scalar String expression".to_string(), )); } @@ -1110,13 +1110,13 @@ fn resolve_expr_type( match query_type { ResolvedType::Scalar(s) if s.scalar == ScalarType::String && !s.list => {} ResolvedType::Scalar(s) => { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T19: fuzzy query must be String, got {}", s.display_name() ))); } _ => { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T19: fuzzy query must be a scalar String expression".to_string(), )); } @@ -1135,13 +1135,13 @@ fn resolve_expr_type( | ScalarType::U64 ) => {} ResolvedType::Scalar(s) => { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T19: fuzzy max_edits must be an integer scalar, got {}", s.display_name() ))); } _ => { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T19: fuzzy max_edits must be an integer scalar expression".to_string(), )); } @@ -1158,13 +1158,13 @@ fn resolve_expr_type( match field_type { ResolvedType::Scalar(s) if s.scalar == 
ScalarType::String && !s.list => {} ResolvedType::Scalar(s) => { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T20: match_text field must be String, got {}", s.display_name() ))); } _ => { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T20: match_text field must be a scalar String expression".to_string(), )); } @@ -1174,13 +1174,13 @@ fn resolve_expr_type( match query_type { ResolvedType::Scalar(s) if s.scalar == ScalarType::String && !s.list => {} ResolvedType::Scalar(s) => { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T20: match_text query must be String, got {}", s.display_name() ))); } _ => { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T20: match_text query must be a scalar String expression".to_string(), )); } @@ -1196,13 +1196,13 @@ fn resolve_expr_type( match field_type { ResolvedType::Scalar(s) if s.scalar == ScalarType::String && !s.list => {} ResolvedType::Scalar(s) => { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T20: bm25 field must be String, got {}", s.display_name() ))); } _ => { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T20: bm25 field must be a scalar String expression".to_string(), )); } @@ -1212,13 +1212,13 @@ fn resolve_expr_type( match query_type { ResolvedType::Scalar(s) if s.scalar == ScalarType::String && !s.list => {} ResolvedType::Scalar(s) => { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T20: bm25 query must be String, got {}", s.display_name() ))); } _ => { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T20: bm25 query must be a scalar String expression".to_string(), )); } @@ -1235,12 +1235,12 @@ fn resolve_expr_type( k, } => { if !matches!(primary.as_ref(), Expr::Nearest { .. } | Expr::Bm25 { .. }) { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T21: rrf primary expression must be nearest(...) 
or bm25(...)".to_string(), )); } if !matches!(secondary.as_ref(), Expr::Nearest { .. } | Expr::Bm25 { .. }) { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T21: rrf secondary expression must be nearest(...) or bm25(...)".to_string(), )); } @@ -1252,13 +1252,13 @@ fn resolve_expr_type( match ty { ResolvedType::Scalar(s) if s.scalar == ScalarType::F64 && !s.list => {} ResolvedType::Scalar(s) => { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T21: rrf rank expressions must evaluate to F64, got {}", s.display_name() ))); } _ => { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T21: rrf rank expressions must be scalar numeric expressions" .to_string(), )); @@ -1279,13 +1279,13 @@ fn resolve_expr_type( | ScalarType::U64 ) => {} ResolvedType::Scalar(s) => { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T21: rrf k must be an integer scalar, got {}", s.display_name() ))); } _ => { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T21: rrf k must be an integer scalar expression".to_string(), )); } @@ -1293,7 +1293,7 @@ fn resolve_expr_type( if let Expr::Literal(Literal::Integer(v)) = k_expr.as_ref() && *v <= 0 { - return Err(NanoError::Type( + return Err(CompilerError::Type( "T21: rrf k must be greater than 0".to_string(), )); } @@ -1311,7 +1311,7 @@ fn resolve_expr_type( } else if let Some(bv) = ctx.bindings.get(name) { Ok(ResolvedType::Node(bv.type_name.clone())) } else { - Err(NanoError::Type(format!( + Err(CompilerError::Type(format!( "variable `${}` is not bound", name ))) @@ -1327,7 +1327,7 @@ fn resolve_expr_type( if let ResolvedType::Scalar(s) = &arg_type && (s.list || !s.scalar.is_numeric()) { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T8: {} requires numeric type, got {}", func, s.display_name() @@ -1338,7 +1338,7 @@ fn resolve_expr_type( if let ResolvedType::Scalar(s) = &arg_type && (s.list || 
(!s.scalar.is_numeric() && s.scalar != ScalarType::String)) { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T8: {} requires numeric or string type, got {}", func, s.display_name() @@ -1420,7 +1420,7 @@ fn resolved_type_to_field_shape( ResolvedType::Scalar(prop_type) => Ok((prop_type.to_arrow(), prop_type.nullable)), ResolvedType::Node(type_name) => { let node_type = catalog.node_types.get(type_name).ok_or_else(|| { - NanoError::Type(format!("type `{}` not found in catalog", type_name)) + CompilerError::Type(format!("type `{}` not found in catalog", type_name)) })?; let fields: Vec = node_type .arrow_schema @@ -1450,14 +1450,14 @@ fn literal_type(lit: &Literal) -> Result { } let first = literal_type(&items[0])?; if first.list { - return Err(NanoError::Type( + return Err(CompilerError::Type( "nested list literals are not supported".to_string(), )); } for item in items.iter().skip(1) { let item_type = literal_type(item)?; if item_type.list || !types_compatible(&first, &item_type) { - return Err(NanoError::Type( + return Err(CompilerError::Type( "list literal elements must share a compatible scalar type".to_string(), )); } @@ -1473,7 +1473,7 @@ fn check_literal_type(lit: &Literal, expected: &PropType, prop_name: &str) -> Re return if expected.nullable { Ok(()) } else { - Err(NanoError::Type(format!( + Err(CompilerError::Type(format!( "T3: property `{}` is non-nullable but got null", prop_name ))) @@ -1487,7 +1487,7 @@ fn check_literal_type(lit: &Literal, expected: &PropType, prop_name: &str) -> Re if actual_dim == expected_dim { return Ok(()); } - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T3: property `{}` has type Vector({}) but got vector literal with {} elements", prop_name, expected_dim, actual_dim ))); @@ -1495,7 +1495,7 @@ fn check_literal_type(lit: &Literal, expected: &PropType, prop_name: &str) -> Re let lit_type = literal_type(lit)?; if !types_compatible(&lit_type, expected) { - return 
Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T3: property `{}` has type {} but got {}", prop_name, expected.display_name(), @@ -1507,7 +1507,7 @@ fn check_literal_type(lit: &Literal, expected: &PropType, prop_name: &str) -> Re match lit { Literal::String(v) => { if !allowed.contains(v) { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T3: property `{}` expects one of [{}], got '{}'", prop_name, allowed.join(", "), @@ -1520,7 +1520,7 @@ fn check_literal_type(lit: &Literal, expected: &PropType, prop_name: &str) -> Re match item { Literal::String(v) if allowed.contains(v) => {} Literal::String(v) => { - return Err(NanoError::Type(format!( + return Err(CompilerError::Type(format!( "T3: property `{}` expects one of [{}], got '{}'", prop_name, allowed.join(", "), diff --git a/crates/omnigraph-compiler/src/query_input.rs b/crates/omnigraph-compiler/src/query_input.rs index b85decf..b641f3e 100644 --- a/crates/omnigraph-compiler/src/query_input.rs +++ b/crates/omnigraph-compiler/src/query_input.rs @@ -3,7 +3,7 @@ use std::fmt; use serde_json::Value; -use crate::error::NanoError; +use crate::error::CompilerError; use crate::ir::ParamMap; use crate::json_output::{JS_MAX_SAFE_INTEGER_U64, is_js_safe_integer_i64}; use crate::query::ast::{Literal, Param, QueryDecl}; @@ -17,7 +17,7 @@ pub enum JsonParamMode { #[derive(Debug)] pub enum RunInputError { - Core(NanoError), + Core(CompilerError), Message(String), } @@ -45,8 +45,8 @@ impl Error for RunInputError { } } -impl From<NanoError> for RunInputError { - fn from(value: NanoError) -> Self { +impl From<CompilerError> for RunInputError { + fn from(value: CompilerError) -> Self { Self::Core(value) } } @@ -120,7 +120,7 @@ impl ToParam for i64 { impl ToParam for isize { fn to_param(self) -> crate::error::Result { let value = i64::try_from(self).map_err(|_| { - NanoError::Execution(format!( + CompilerError::Execution(format!( "param value {} exceeds current engine range for numeric literals (max {})", 
self, i64::MAX @@ -151,7 +151,7 @@ impl ToParam for u32 { impl ToParam for u64 { fn to_param(self) -> crate::error::Result { let value = i64::try_from(self).map_err(|_| { - NanoError::Execution(format!( + CompilerError::Execution(format!( "param value {} exceeds current engine range for numeric literals (max {})", self, i64::MAX @@ -164,7 +164,7 @@ impl ToParam for u64 { impl ToParam for usize { fn to_param(self) -> crate::error::Result { let value = i64::try_from(self).map_err(|_| { - NanoError::Execution(format!( + CompilerError::Execution(format!( "param value {} exceeds current engine range for numeric literals (max {})", self, i64::MAX @@ -177,7 +177,7 @@ impl ToParam for usize { impl ToParam for f32 { fn to_param(self) -> crate::error::Result { if !self.is_finite() { - return Err(NanoError::Execution(format!( + return Err(CompilerError::Execution(format!( "invalid float parameter {}", self ))); @@ -189,7 +189,7 @@ impl ToParam for f32 { impl ToParam for f64 { fn to_param(self) -> crate::error::Result { if !self.is_finite() { - return Err(NanoError::Execution(format!( + return Err(CompilerError::Execution(format!( "invalid float parameter {}", self ))); diff --git a/crates/omnigraph-compiler/src/result.rs b/crates/omnigraph-compiler/src/result.rs index 7de77ac..d92dd1e 100644 --- a/crates/omnigraph-compiler/src/result.rs +++ b/crates/omnigraph-compiler/src/result.rs @@ -5,7 +5,7 @@ use arrow_ipc::writer::StreamWriter; use arrow_schema::{DataType, Field, Schema, SchemaRef}; use serde::de::DeserializeOwned; -use crate::error::{NanoError, Result}; +use crate::error::{CompilerError, Result}; use crate::json_output::{record_batches_to_json_rows, record_batches_to_rust_json_rows}; #[derive(Debug, Clone, Copy, Default)] @@ -47,7 +47,7 @@ impl QueryResult { } arrow_select::concat::concat_batches(&self.schema, &self.batches) - .map_err(|err| NanoError::Execution(err.to_string())) + .map_err(|err| CompilerError::Execution(err.to_string())) } pub fn to_sdk_json(&self) -> 
serde_json::Value { @@ -60,7 +60,7 @@ impl QueryResult { pub fn deserialize(&self) -> Result { serde_json::from_value(self.to_rust_json()).map_err(|err| { - NanoError::Execution(format!("failed to deserialize query result: {}", err)) + CompilerError::Execution(format!("failed to deserialize query result: {}", err)) }) } diff --git a/crates/omnigraph-compiler/src/schema/parser.rs b/crates/omnigraph-compiler/src/schema/parser.rs index c5f4355..6e34e53 100644 --- a/crates/omnigraph-compiler/src/schema/parser.rs +++ b/crates/omnigraph-compiler/src/schema/parser.rs @@ -5,7 +5,7 @@ use pest::error::InputLocation; use pest_derive::Parser; use crate::error::{ - NanoError, ParseDiagnostic, Result, SourceSpan, decode_string_literal, render_span, + CompilerError, ParseDiagnostic, Result, SourceSpan, decode_string_literal, render_span, }; use crate::types::{PropType, ScalarType}; @@ -16,7 +16,7 @@ use super::ast::*; struct SchemaParser; pub fn parse_schema(input: &str) -> Result { - parse_schema_diagnostic(input).map_err(|e| NanoError::Parse(e.to_string())) + parse_schema_diagnostic(input).map_err(|e| CompilerError::Parse(e.to_string())) } pub fn parse_schema_diagnostic(input: &str) -> std::result::Result { @@ -27,7 +27,8 @@ pub fn parse_schema_diagnostic(input: &str) -> std::result::Result std::result::Result = interfaces.iter().collect(); for decl in &mut declarations { if let SchemaDecl::Node(node) = decl { - resolve_interfaces(node, &iface_refs).map_err(nano_error_to_diagnostic)?; + resolve_interfaces(node, &iface_refs).map_err(compiler_error_to_diagnostic)?; } } let schema = SchemaFile { declarations }; - validate_schema_annotations(&schema).map_err(nano_error_to_diagnostic)?; - validate_constraints(&schema).map_err(nano_error_to_diagnostic)?; + validate_schema_annotations(&schema).map_err(compiler_error_to_diagnostic)?; + validate_constraints(&schema).map_err(compiler_error_to_diagnostic)?; Ok(schema) } @@ -64,7 +65,7 @@ fn pest_error_to_diagnostic(err: 
pest::error::Error) -> ParseDiagnostic { ParseDiagnostic::new(err.to_string(), span) } -fn nano_error_to_diagnostic(err: NanoError) -> ParseDiagnostic { +fn compiler_error_to_diagnostic(err: CompilerError) -> ParseDiagnostic { ParseDiagnostic::new(err.to_string(), None) } @@ -74,7 +75,7 @@ fn parse_schema_decl(pair: pest::iterators::Pair) -> Result { Rule::interface_decl => Ok(SchemaDecl::Interface(parse_interface_decl(inner)?)), Rule::node_decl => Ok(SchemaDecl::Node(parse_node_decl(inner)?)), Rule::edge_decl => Ok(SchemaDecl::Edge(parse_edge_decl(inner)?)), - _ => Err(NanoError::Parse(format!( + _ => Err(CompilerError::Parse(format!( "unexpected rule: {:?}", inner.as_rule() ))), @@ -180,21 +181,20 @@ fn parse_cardinality(pair: pest::iterators::Pair) -> Result { let min_str = inner.next().unwrap().as_str(); let min = min_str .parse::() - .map_err(|_| NanoError::Parse(format!("invalid cardinality min: {}", min_str)))?; - let max = if let Some(max_pair) = inner.next() { - let max_str = max_pair.as_str(); - Some( - max_str - .parse::() - .map_err(|_| NanoError::Parse(format!("invalid cardinality max: {}", max_str)))?, - ) - } else { - None - }; + .map_err(|_| CompilerError::Parse(format!("invalid cardinality min: {}", min_str)))?; + let max = + if let Some(max_pair) = inner.next() { + let max_str = max_pair.as_str(); + Some(max_str.parse::().map_err(|_| { + CompilerError::Parse(format!("invalid cardinality max: {}", max_str)) + })?) 
+ } else { + None + }; if let Some(max_val) = max { if min > max_val { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "cardinality min ({}) exceeds max ({})", min, max_val ))); @@ -219,7 +219,7 @@ fn parse_body_constraint(pair: pest::iterators::Pair) -> Result>>()?; if names.is_empty() { - return Err(NanoError::Parse( + return Err(CompilerError::Parse( "@key constraint requires at least one property name".to_string(), )); } @@ -228,7 +228,7 @@ fn parse_body_constraint(pair: pest::iterators::Pair) -> Result { let names = extract_ident_list_from_args(args)?; if names.is_empty() { - return Err(NanoError::Parse( + return Err(CompilerError::Parse( "@unique constraint requires at least one property name".to_string(), )); } @@ -237,7 +237,7 @@ fn parse_body_constraint(pair: pest::iterators::Pair) -> Result { let names = extract_ident_list_from_args(args)?; if names.is_empty() { - return Err(NanoError::Parse( + return Err(CompilerError::Parse( "@index constraint requires at least one property name".to_string(), )); } @@ -246,7 +246,7 @@ fn parse_body_constraint(pair: pest::iterators::Pair) -> Result { // @range(prop, min..max) if args.len() < 2 { - return Err(NanoError::Parse( + return Err(CompilerError::Parse( "@range requires property name and bounds: @range(prop, min..max)".to_string(), )); } @@ -258,7 +258,7 @@ fn parse_body_constraint(pair: pest::iterators::Pair) -> Result { // @check(prop, "regex") if args.len() < 2 { - return Err(NanoError::Parse( + return Err(CompilerError::Parse( "@check requires property name and pattern: @check(prop, \"regex\")" .to_string(), )); @@ -267,7 +267,10 @@ fn parse_body_constraint(pair: pest::iterators::Pair) -> Result Err(NanoError::Parse(format!("unknown constraint: @{}", other))), + other => Err(CompilerError::Parse(format!( + "unknown constraint: @{}", + other + ))), } } @@ -281,7 +284,7 @@ fn extract_ident_from_constraint_arg(pair: pest::iterators::Pair) -> Resul return 
Ok(inner.as_str().to_string()); } } - Err(NanoError::Parse( + Err(CompilerError::Parse( "expected property name in constraint".to_string(), )) } @@ -309,7 +312,7 @@ fn extract_string_from_constraint_arg(pair: &pest::iterators::Pair) -> Res } find_string(pair)? - .ok_or_else(|| NanoError::Parse("expected string argument in constraint".to_string())) + .ok_or_else(|| CompilerError::Parse("expected string argument in constraint".to_string())) } fn extract_range_bounds( @@ -327,7 +330,9 @@ fn extract_range_bounds( } } found.ok_or_else(|| { - NanoError::Parse("expected range bounds (min..max) in @range constraint".to_string()) + CompilerError::Parse( + "expected range bounds (min..max) in @range constraint".to_string(), + ) })? }; @@ -378,7 +383,7 @@ fn parse_constraint_bound(pair: &pest::iterators::Pair) -> Result Res for iface_name in &node.implements { let iface = interface_map.get(iface_name.as_str()).ok_or_else(|| { - NanoError::Parse(format!( + CompilerError::Parse(format!( "node {} implements unknown interface '{}'", node.name, iface_name )) @@ -421,7 +426,7 @@ fn resolve_interfaces(node: &mut NodeDecl, interfaces: &[&InterfaceDecl]) -> Res if let Some(existing) = node.properties.iter().find(|p| p.name == iface_prop.name) { // Property exists — verify type compatibility if existing.prop_type != iface_prop.prop_type { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "node {} property '{}' has type {} but interface {} declares it as {}", node.name, iface_prop.name, @@ -472,36 +477,35 @@ fn parse_type_ref(pair: pest::iterators::Pair) -> Result { let mut inner = pair .into_inner() .next() - .ok_or_else(|| NanoError::Parse("type reference is missing core type".to_string()))?; + .ok_or_else(|| CompilerError::Parse("type reference is missing core type".to_string()))?; if inner.as_rule() == Rule::core_type { - inner = inner - .into_inner() - .next() - .ok_or_else(|| NanoError::Parse("type reference is missing core type".to_string()))?; + 
inner = inner.into_inner().next().ok_or_else(|| { + CompilerError::Parse("type reference is missing core type".to_string()) + })?; } match inner.as_rule() { Rule::base_type => { let scalar = ScalarType::from_str_name(inner.as_str()) - .ok_or_else(|| NanoError::Parse(format!("unknown type: {}", inner.as_str())))?; + .ok_or_else(|| CompilerError::Parse(format!("unknown type: {}", inner.as_str())))?; Ok(PropType::scalar(scalar, nullable)) } Rule::vector_type => { let dim_text = inner .into_inner() .next() - .ok_or_else(|| NanoError::Parse("Vector type missing dimension".to_string()))? + .ok_or_else(|| CompilerError::Parse("Vector type missing dimension".to_string()))? .as_str(); let dim = dim_text .parse::() - .map_err(|e| NanoError::Parse(format!("invalid Vector dimension: {}", e)))?; + .map_err(|e| CompilerError::Parse(format!("invalid Vector dimension: {}", e)))?; if dim == 0 { - return Err(NanoError::Parse( + return Err(CompilerError::Parse( "Vector dimension must be greater than zero".to_string(), )); } if dim > i32::MAX as u32 { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "Vector dimension {} exceeds maximum supported {}", dim, i32::MAX @@ -510,15 +514,14 @@ fn parse_type_ref(pair: pest::iterators::Pair) -> Result { Ok(PropType::scalar(ScalarType::Vector(dim), nullable)) } Rule::list_type => { - let element = inner - .into_inner() - .next() - .ok_or_else(|| NanoError::Parse("list type missing element type".to_string()))?; + let element = inner.into_inner().next().ok_or_else(|| { + CompilerError::Parse("list type missing element type".to_string()) + })?; let scalar = ScalarType::from_str_name(element.as_str()).ok_or_else(|| { - NanoError::Parse(format!("unknown list element type: {}", element.as_str())) + CompilerError::Parse(format!("unknown list element type: {}", element.as_str())) })?; if matches!(scalar, ScalarType::Blob) { - return Err(NanoError::Parse( + return Err(CompilerError::Parse( "list of Blob is not 
supported".to_string(), )); } @@ -532,7 +535,7 @@ fn parse_type_ref(pair: pest::iterators::Pair) -> Result { } } if values.is_empty() { - return Err(NanoError::Parse( + return Err(CompilerError::Parse( "enum type must include at least one value".to_string(), )); } @@ -540,13 +543,13 @@ fn parse_type_ref(pair: pest::iterators::Pair) -> Result { dedup.sort(); dedup.dedup(); if dedup.len() != values.len() { - return Err(NanoError::Parse( + return Err(CompilerError::Parse( "enum type cannot include duplicate values".to_string(), )); } Ok(PropType::enum_type(values, nullable)) } - other => Err(NanoError::Parse(format!( + other => Err(CompilerError::Parse(format!( "unexpected type rule: {:?}", other ))), @@ -595,19 +598,19 @@ fn validate_string_annotation( continue; } if seen { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "{} declares @{} multiple times", target, annotation ))); } let value = ann.value.as_deref().ok_or_else(|| { - NanoError::Parse(format!( + CompilerError::Parse(format!( "@{} on {} requires a non-empty value", annotation, target )) })?; if value.trim().is_empty() { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@{} on {} requires a non-empty value", annotation, target ))); @@ -631,7 +634,7 @@ fn validate_schema_annotations(schema: &SchemaFile) -> Result<()> { || ann.name == "index" || ann.name == "embed" { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@{} is only supported on node properties or as body constraint (node {})", ann.name, node.name ))); @@ -660,7 +663,7 @@ fn validate_schema_annotations(schema: &SchemaFile) -> Result<()> { || ann.name == "index" || ann.name == "embed" { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@{} is not supported on edges (edge {})", ann.name, edge.name ))); @@ -714,13 +717,13 @@ fn validate_property_annotations( || ann.name == "index" || ann.name == "embed") { - return 
Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@{} is not supported on list property {}.{}", ann.name, type_name, prop.name ))); } if is_vector && (ann.name == "key" || ann.name == "unique") { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@{} is not supported on vector property {}.{}", ann.name, type_name, prop.name ))); @@ -731,13 +734,13 @@ fn validate_property_annotations( || ann.name == "index" || ann.name == "embed") { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@{} is not supported on blob property {}.{}", ann.name, type_name, prop.name ))); } if ann.name == "instruction" { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@instruction is only supported on node and edge types (property {}.{})", type_name, prop.name ))); @@ -745,7 +748,7 @@ fn validate_property_annotations( // Edge-specific restrictions if is_edge && (ann.name == "key" || ann.name == "embed") { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@{} is not supported on edge properties (edge {}.{})", ann.name, type_name, prop.name ))); @@ -755,13 +758,13 @@ fn validate_property_annotations( match ann.name.as_str() { "key" => { if ann.value.is_some() { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@key on {}.{} does not accept a value", type_name, prop.name ))); } if key_seen { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "property {}.{} declares @key multiple times", type_name, prop.name ))); @@ -770,13 +773,13 @@ fn validate_property_annotations( } "unique" => { if ann.value.is_some() { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@unique on {}.{} does not accept a value", type_name, prop.name ))); } if unique_seen { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "property {}.{} 
declares @unique multiple times", type_name, prop.name ))); @@ -785,13 +788,13 @@ fn validate_property_annotations( } "index" => { if ann.value.is_some() { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@index on {}.{} does not accept a value", type_name, prop.name ))); } if index_seen { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "property {}.{} declares @index multiple times", type_name, prop.name ))); @@ -800,7 +803,7 @@ fn validate_property_annotations( } "embed" => { if embed_seen { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "property {}.{} declares @embed multiple times", type_name, prop.name ))); @@ -808,20 +811,20 @@ fn validate_property_annotations( embed_seen = true; if !is_vector { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@embed is only supported on vector properties ({}.{})", type_name, prop.name ))); } let source_prop = ann.value.as_deref().ok_or_else(|| { - NanoError::Parse(format!( + CompilerError::Parse(format!( "@embed on {}.{} requires a source property name", type_name, prop.name )) })?; if source_prop.trim().is_empty() { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@embed on {}.{} requires a non-empty source property name", type_name, prop.name ))); @@ -831,14 +834,14 @@ fn validate_property_annotations( .iter() .find(|p| p.name == source_prop) .ok_or_else(|| { - NanoError::Parse(format!( + CompilerError::Parse(format!( "@embed on {}.{} references unknown source property {}", type_name, prop.name, source_prop )) })?; if source_decl.prop_type.list || source_decl.prop_type.scalar != ScalarType::String { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@embed source property {}.{} must be String", type_name, source_prop ))); @@ -848,7 +851,7 @@ fn validate_property_annotations( // a typo can't be silently ignored (it would 
never validate). for key in ann.kwargs.keys() { if key != "model" { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@embed on {}.{} has unknown argument '{}=' (only 'model' is supported)", type_name, prop.name, key ))); @@ -893,45 +896,45 @@ fn validate_type_constraints( match constraint { Constraint::Key(cols) => { if is_edge { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@key constraint is not supported on edges (edge {})", type_name ))); } key_count += 1; if key_count > 1 { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "node type {} has multiple @key constraints; only one is supported", type_name ))); } for col in cols { let prop = prop_names.get(col.as_str()).ok_or_else(|| { - NanoError::Parse(format!( + CompilerError::Parse(format!( "@key on {} references unknown property '{}'", type_name, col )) })?; if prop.prop_type.nullable { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@key property {}.{} cannot be nullable", type_name, col ))); } if prop.prop_type.list { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@key is not supported on list property {}.{}", type_name, col ))); } if matches!(prop.prop_type.scalar, ScalarType::Vector(_)) { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@key is not supported on vector property {}.{}", type_name, col ))); } if matches!(prop.prop_type.scalar, ScalarType::Blob) { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@key is not supported on blob property {}.{}", type_name, col ))); @@ -945,7 +948,7 @@ fn validate_type_constraints( continue; } if !prop_names.contains_key(col.as_str()) { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@unique on {} references unknown property '{}'", type_name, col ))); @@ -958,13 +961,13 @@ fn 
validate_type_constraints( continue; } let prop = prop_names.get(col.as_str()).ok_or_else(|| { - NanoError::Parse(format!( + CompilerError::Parse(format!( "@index on {} references unknown property '{}'", type_name, col )) })?; if matches!(prop.prop_type.scalar, ScalarType::Blob) { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@index is not supported on blob property {}.{}", type_name, col ))); @@ -973,19 +976,19 @@ fn validate_type_constraints( } Constraint::Range { property, .. } => { if is_edge { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@range constraint is not supported on edges (edge {})", type_name ))); } let prop = prop_names.get(property.as_str()).ok_or_else(|| { - NanoError::Parse(format!( + CompilerError::Parse(format!( "@range on {} references unknown property '{}'", type_name, property )) })?; if !prop.prop_type.scalar.is_numeric() { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@range on {}.{} requires a numeric type, got {}", type_name, property, @@ -995,19 +998,19 @@ fn validate_type_constraints( } Constraint::Check { property, .. 
} => { if is_edge { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@check constraint is not supported on edges (edge {})", type_name ))); } let prop = prop_names.get(property.as_str()).ok_or_else(|| { - NanoError::Parse(format!( + CompilerError::Parse(format!( "@check on {} references unknown property '{}'", type_name, property )) })?; if prop.prop_type.scalar != ScalarType::String { - return Err(NanoError::Parse(format!( + return Err(CompilerError::Parse(format!( "@check on {}.{} requires String type, got {}", type_name, property, diff --git a/crates/omnigraph/src/error.rs b/crates/omnigraph/src/error.rs index 11f4da0..a24f153 100644 --- a/crates/omnigraph/src/error.rs +++ b/crates/omnigraph/src/error.rs @@ -74,7 +74,7 @@ pub enum MergeConflictKind { #[derive(Debug, Error)] pub enum OmniError { #[error("{0}")] - Compiler(#[from] omnigraph_compiler::error::NanoError), + Compiler(#[from] omnigraph_compiler::error::CompilerError), #[error("storage: {0}")] Lance(String), #[error("query: {0}")] diff --git a/docs/user/operations/errors.md b/docs/user/operations/errors.md index 48f1fc9..85b4fde 100644 --- a/docs/user/operations/errors.md +++ b/docs/user/operations/errors.md @@ -12,7 +12,7 @@ - **D₂ parse-time rejection**: a single mutation query that mixes inserts/updates with deletes errors out *before any I/O* with kind `BadRequest`. Message: `mutation '' on the same query mixes inserts/updates and deletes; split into separate mutations: (1) inserts and updates, then (2) deletes`. See [query-language.md](../queries/index.md) for the rule. - `MergeConflicts(Vec)` -Compiler-side `NanoError` covers parse / catalog / type / storage / plan / execution / arrow / lance / IO / manifest / unique-constraint, each with structured spans (`SourceSpan { start, end }`) for ariadne-style diagnostics. 
+Compiler-side `CompilerError` covers parse / catalog / type / storage / plan / execution / arrow / lance / IO / manifest / unique-constraint, each with structured spans (`SourceSpan { start, end }`) for ariadne-style diagnostics. The legacy `NanoError` name remains as a deprecated compatibility alias. ## Result serialization (`omnigraph_compiler::result::QueryResult`) From f118b6740268b969840befe25fc7695d30de515a Mon Sep 17 00:00:00 2001 From: aaltshuler Date: Thu, 18 Jun 2026 00:02:21 +0300 Subject: [PATCH 08/13] address compiler error review comments --- crates/omnigraph-cluster/src/store.rs | 6 +++--- crates/omnigraph-cluster/src/tests.rs | 29 ++++++++++++++++++++++++++ crates/omnigraph-compiler/src/error.rs | 8 ++++++- 3 files changed, 39 insertions(+), 4 deletions(-) diff --git a/crates/omnigraph-cluster/src/store.rs b/crates/omnigraph-cluster/src/store.rs index 9a2e748..c19a95d 100644 --- a/crates/omnigraph-cluster/src/store.rs +++ b/crates/omnigraph-cluster/src/store.rs @@ -340,9 +340,9 @@ impl ClusterStore { uris.retain(|uri| uri.ends_with(".json")); uris.sort(); uris.into_iter() - .map(|uri| match uri.rsplit('/').next() { - Some(name) => format!("{}/{name}", self.display(CLUSTER_RECOVERIES_DIR)), - None => uri, + .map(|uri| { + let name = uri.rsplit_once('/').map_or(uri.as_str(), |(_, name)| name); + format!("{}/{name}", self.display(CLUSTER_RECOVERIES_DIR)) }) .collect() } diff --git a/crates/omnigraph-cluster/src/tests.rs b/crates/omnigraph-cluster/src/tests.rs index 536e904..b14b46e 100644 --- a/crates/omnigraph-cluster/src/tests.rs +++ b/crates/omnigraph-cluster/src/tests.rs @@ -3375,6 +3375,35 @@ policies: ); } + #[tokio::test] + async fn read_only_commands_ignore_missing_recovery_sidecar_dir() { + let dir = fixture(); + write_applyable_state(dir.path()); + assert!(!dir.path().join(CLUSTER_RECOVERIES_DIR).exists()); + + let status = status_config_dir(dir.path()).await; + assert!(status.ok, "{:?}", status.diagnostics); + assert!( + 
!status.diagnostics.iter().any(|diagnostic| matches!( + diagnostic.code.as_str(), + "recovery_sidecar_read_error" | "cluster_recovery_pending" + )), + "{:?}", + status.diagnostics + ); + + let plan = plan_config_dir(dir.path()).await; + assert!(plan.ok, "{:?}", plan.diagnostics); + assert!( + !plan.diagnostics.iter().any(|diagnostic| matches!( + diagnostic.code.as_str(), + "recovery_sidecar_read_error" | "cluster_recovery_pending" + )), + "{:?}", + plan.diagnostics + ); + } + #[tokio::test] async fn read_only_commands_warn_on_pending_recovery_sidecar_in_storage_root() { let dir = fixture(); diff --git a/crates/omnigraph-compiler/src/error.rs b/crates/omnigraph-compiler/src/error.rs index cbf5c4d..0c642c2 100644 --- a/crates/omnigraph-compiler/src/error.rs +++ b/crates/omnigraph-compiler/src/error.rs @@ -172,7 +172,7 @@ mod tests { let allowed_file = workspace_root.join("crates/omnigraph-compiler/src/error.rs"); let mut offenders = Vec::new(); - visit_rs_files(&workspace_root.join("crates"), &mut |path| { + visit_rs_files(workspace_root, &mut |path| { let text = std::fs::read_to_string(path).expect("source file should be readable"); let count = text.matches(&legacy_name).count(); if path == allowed_file { @@ -202,6 +202,12 @@ mod tests { let entry = entry.expect("source entry should be readable"); let path = entry.path(); if path.is_dir() { + if matches!( + path.file_name().and_then(|name| name.to_str()), + Some(".git" | "target") + ) { + continue; + } visit_rs_files(&path, visit); } else if path.extension().and_then(|ext| ext.to_str()) == Some("rs") { visit(&path); From f2c512ae2616e02dddeb9c2ab6a57b842bed2f26 Mon Sep 17 00:00:00 2001 From: aaltshuler Date: Thu, 18 Jun 2026 02:38:02 +0300 Subject: [PATCH 09/13] chore: remove CODEOWNERS chassis and the code-owner review gate The repo is a 2-person team where both maintainers own every path, so the CODEOWNERS machinery (generated CODEOWNERS, roles yml, render script, the two drift/hand-edit CI jobs) gated nothing 
real while adding friction: every PR showed "Review required" and own-PRs merged only via admin/bypass override. Remove the whole chassis and drop the review gate: - delete .github/CODEOWNERS, codeowners-roles.yml, render-codeowners.py, the CODEOWNERS workflow, and docs/dev/codeowners.md - branch-protection.json: drop the two CODEOWNERS required status checks, set require_code_owner_reviews=false and required_approving_review_count=0 (CI checks are the gate; maintainers merge their own PRs once green) - scrub CODEOWNERS references from AGENTS.md, docs indexes, branch-protection and ci docs, GOVERNANCE.md, and CONTRIBUTING.md The policy change is inert until an admin runs scripts/apply-branch-protection.sh. Co-Authored-By: Claude Opus 4.8 (1M context) --- .github/CODEOWNERS | 18 --- .github/branch-protection.json | 20 +-- .github/codeowners-roles.yml | 56 -------- .github/scripts/render-codeowners.py | 205 --------------------------- .github/workflows/codeowners.yml | 110 -------------- AGENTS.md | 1 - CONTRIBUTING.md | 4 +- GOVERNANCE.md | 17 ++- docs/dev/branch-protection.md | 19 ++- docs/dev/ci.md | 2 +- docs/dev/codeowners.md | 58 -------- docs/dev/index.md | 1 - 12 files changed, 26 insertions(+), 485 deletions(-) delete mode 100644 .github/CODEOWNERS delete mode 100644 .github/codeowners-roles.yml delete mode 100755 .github/scripts/render-codeowners.py delete mode 100644 .github/workflows/codeowners.yml delete mode 100644 docs/dev/codeowners.md diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS deleted file mode 100644 index ce8510c..0000000 --- a/.github/CODEOWNERS +++ /dev/null @@ -1,18 +0,0 @@ -# AUTOGENERATED from .github/codeowners-roles.yml. Do not edit by hand. -# -# To change role membership or path assignments: -# 1. Edit .github/codeowners-roles.yml -# 2. Run `python3 .github/scripts/render-codeowners.py` -# 3. 
Commit both files together -# -# CI fails if this file drifts from its source, and rejects PRs that -# edit this file directly without also editing the yml. - -* @aaltshuler @ragnorc - -crates/** @aaltshuler @ragnorc -docs/** @aaltshuler @ragnorc -README.md @aaltshuler @ragnorc -AGENTS.md @aaltshuler @ragnorc -CLAUDE.md @aaltshuler @ragnorc -SECURITY.md @aaltshuler @ragnorc diff --git a/.github/branch-protection.json b/.github/branch-protection.json index aa1ab19..fa1d57f 100644 --- a/.github/branch-protection.json +++ b/.github/branch-protection.json @@ -1,27 +1,19 @@ { - "_comment": "Branch protection policy for main. Applied via scripts/apply-branch-protection.sh. See docs/branch-protection.md for rationale. NOTE: bypass_pull_request_allowances.users must mirror the engineering owners in .github/codeowners-roles.yml — code owners merge their own PRs without a second review; non-owners still need a code-owner approval. (render-codeowners.py does NOT generate this list; keep it in sync by hand.)", + "_comment": "Branch protection policy for main. Applied via scripts/apply-branch-protection.sh. See docs/dev/branch-protection.md for rationale. CODEOWNERS was removed (2-person team where both maintainers own everything, so code-owner review added friction without value). Review is no longer code-owner-scoped and no approvals are required; the required CI status checks are the gate. 
Maintainers merge their own PRs once checks pass.", "required_status_checks": { "strict": true, "contexts": [ "Classify Changes", "Check AGENTS.md Links", - "Test omnigraph-server --features aws", - "CODEOWNERS matches source", - "CODEOWNERS not hand-edited" + "Test omnigraph-server --features aws" ] }, "enforce_admins": false, "required_pull_request_reviews": { - "dismissal_restrictions": {}, - "dismiss_stale_reviews": true, - "require_code_owner_reviews": true, - "required_approving_review_count": 1, - "require_last_push_approval": false, - "bypass_pull_request_allowances": { - "users": ["ragnorc", "aaltshuler"], - "teams": [], - "apps": [] - } + "dismiss_stale_reviews": false, + "require_code_owner_reviews": false, + "required_approving_review_count": 0, + "require_last_push_approval": false }, "restrictions": null, "required_linear_history": true, diff --git a/.github/codeowners-roles.yml b/.github/codeowners-roles.yml deleted file mode 100644 index 65f2400..0000000 --- a/.github/codeowners-roles.yml +++ /dev/null @@ -1,56 +0,0 @@ -# Source of truth for .github/CODEOWNERS. -# -# How to change role membership or path assignments: -# 1. Edit this file. -# 2. Run `python3 .github/scripts/render-codeowners.py` to regenerate -# .github/CODEOWNERS. -# 3. Commit both files in the same PR. -# -# CI fails on drift between this source and the generated CODEOWNERS -# (see .github/workflows/codeowners.yml). CI also rejects direct edits -# to .github/CODEOWNERS that don't accompany a change here. -# -# Why a generator instead of editing CODEOWNERS directly? -# The yml is the audit trail: `git log .github/codeowners-roles.yml` -# shows every role change with a reviewable diff and a merge commit. -# The rendered CODEOWNERS is what GitHub reads at PR time. - -roles: - engineering: - description: > - All production code under crates/**. Engine, CLI, server, - compiler. 
- members: - - aaltshuler - - ragnorc - - docs: - description: > - Documentation under docs/**, plus repo-level docs (README.md, - AGENTS.md, CLAUDE.md symlink, SECURITY.md). - members: - - aaltshuler - - ragnorc - -# Path → role mapping. GitHub CODEOWNERS uses "last match wins" -# semantics — when multiple patterns match a file, only the last -# matching pattern's owners apply. The generator handles this by -# emitting `default` as the first `*` line and the specific patterns -# below afterward, so specific paths override the catch-all. -# -# Within this list, order matters only between overlapping specific -# patterns (the later one wins). Today nothing overlaps; future -# additions should keep more-specific patterns later. -paths: - "crates/**": [engineering] - "docs/**": [docs] - "README.md": [docs] - "AGENTS.md": [docs] - "CLAUDE.md": [docs] - "SECURITY.md": [docs] - -# Catch-all for paths not explicitly mapped (.github/, scripts/, -# Cargo.toml, Cargo.lock, openapi.json, LICENSE, etc.). Defaults to -# engineering — every change to repo infrastructure needs the -# engineering owner's review. -default: [engineering] diff --git a/.github/scripts/render-codeowners.py b/.github/scripts/render-codeowners.py deleted file mode 100755 index 5e96545..0000000 --- a/.github/scripts/render-codeowners.py +++ /dev/null @@ -1,205 +0,0 @@ -#!/usr/bin/env python3 -"""Render .github/CODEOWNERS and the ownership tables in -docs/dev/codeowners.md from .github/codeowners-roles.yml. - -The yml is the source of truth. This script expands the role-based yml -into (1) the flat path→owners format GitHub expects in -`.github/CODEOWNERS`, and (2) the "who owns what" markdown tables spliced -between the generated-region markers in `docs/dev/codeowners.md`. Both are -derived artifacts; CI re-renders them on every PR (see -.github/workflows/codeowners.yml) and auto-commits the result on same-repo -PRs, so the source of truth and the human-readable view never drift. 
- -Usage: - python3 .github/scripts/render-codeowners.py - -Exits non-zero on: - - Missing PyYAML. - - Unknown role referenced in `paths` or `default`. - - Role with no members (a role must always resolve to at least - one owner; otherwise CODEOWNERS would assign nobody and GitHub - would silently fall back to "no required reviewer", which - defeats the purpose). - - Missing generated-region markers in docs/dev/codeowners.md. -""" - -from __future__ import annotations - -import sys -from pathlib import Path - -try: - import yaml -except ImportError: - sys.exit( - "error: PyYAML is required. Install with `pip install pyyaml` " - "or `python3 -m pip install pyyaml`." - ) - -REPO_ROOT = Path(__file__).resolve().parents[2] -SOURCE = REPO_ROOT / ".github" / "codeowners-roles.yml" -OUTPUT = REPO_ROOT / ".github" / "CODEOWNERS" -DOCS = REPO_ROOT / "docs" / "dev" / "codeowners.md" - -# The "who owns what" tables in docs/dev/codeowners.md are spliced between -# these markers so the human-readable view never drifts from the source of -# truth. Edit codeowners-roles.yml and re-render — never the table by hand. -DOCS_BEGIN = "" -DOCS_END = "" - -BANNER = """\ -# AUTOGENERATED from .github/codeowners-roles.yml. Do not edit by hand. -# -# To change role membership or path assignments: -# 1. Edit .github/codeowners-roles.yml -# 2. Run `python3 .github/scripts/render-codeowners.py` -# 3. Commit both files together -# -# CI fails if this file drifts from its source, and rejects PRs that -# edit this file directly without also editing the yml. -""" - - -def resolve(role_name: str, roles: dict) -> list[str]: - role = roles.get(role_name) - if role is None: - sys.exit( - f"error: unknown role '{role_name}'. " - f"Known roles: {sorted(roles.keys())}" - ) - members = role.get("members") or [] - if not members: - sys.exit( - f"error: role '{role_name}' has no members. " - f"A role must resolve to at least one owner." 
- ) - return members - - -def owners_for(role_names: list[str], roles: dict) -> list[str]: - """Return @-prefixed GitHub handles, deduped, preserving order.""" - seen: list[str] = [] - for role_name in role_names: - for member in resolve(role_name, roles): - handle = f"@{member}" - if handle not in seen: - seen.append(handle) - return seen - - -def _oneline(text: str) -> str: - """Collapse a folded/multi-line YAML description into one cell of text.""" - return " ".join((text or "").split()) - - -def ownership_tables(spec: dict, roles: dict) -> str: - """Render the human-readable "who owns what" markdown — a path→owners - table (the operative view at PR time, in last-match-wins order with the - catch-all first) plus a role→members table. Spliced into the docs between - the markers so it is always current with the source of truth.""" - out: list[str] = [] - - out.append("**Path → owners** (GitHub applies *last match wins*; the `*` " - "catch-all is listed first and is overridden by the specific " - "patterns below it):") - out.append("") - out.append("| Path | Owners | Role(s) |") - out.append("|---|---|---|") - if "default" in spec: - owners = " ".join(owners_for(spec["default"], roles)) - out.append(f"| `*` | {owners} | {', '.join(spec['default'])} |") - for pattern, role_names in (spec.get("paths") or {}).items(): - owners = " ".join(owners_for(role_names, roles)) - out.append(f"| `{pattern}` | {owners} | {', '.join(role_names)} |") - out.append("") - - out.append("**Roles**:") - out.append("") - out.append("| Role | Members | Description |") - out.append("|---|---|---|") - for name, role in roles.items(): - members = " ".join(f"@{m}" for m in (role.get("members") or [])) - out.append(f"| `{name}` | {members} | {_oneline(role.get('description', ''))} |") - out.append("") - - return "\n".join(out) - - -def splice_docs(table_md: str) -> None: - """Replace the region between DOCS_BEGIN/DOCS_END in the docs file with the - freshly generated tables, leaving surrounding 
prose untouched.""" - if not DOCS.exists(): - sys.exit(f"error: docs file not found: {DOCS}") - text = DOCS.read_text() - if DOCS_BEGIN not in text or DOCS_END not in text: - sys.exit( - f"error: ownership markers not found in {DOCS.relative_to(REPO_ROOT)}. " - f"Add the lines:\n {DOCS_BEGIN}\n {DOCS_END}\n" - f"around the generated table region." - ) - head, rest = text.split(DOCS_BEGIN, 1) - _, tail = rest.split(DOCS_END, 1) - new = f"{head}{DOCS_BEGIN}\n\n{table_md}\n{DOCS_END}{tail}" - DOCS.write_text(new) - - -def main() -> int: - if not SOURCE.exists(): - sys.exit(f"error: source file not found: {SOURCE}") - spec = yaml.safe_load(SOURCE.read_text()) - - roles = spec.get("roles") or {} - if not roles: - sys.exit("error: codeowners-roles.yml declares no roles") - - paths = spec.get("paths") or {} - if not paths: - sys.exit("error: codeowners-roles.yml declares no paths") - - lines: list[str] = [BANNER] - - # Pad the path column for alignment. Width is the longest pattern - # plus a small margin. - width = max(len(p) for p in paths) + 2 - - # GitHub CODEOWNERS uses "last match wins" semantics. Emit the - # default catch-all `*` FIRST so specific patterns below override - # it for the paths they cover. If we emitted `*` last, every file - # would resolve to the default owners regardless of more-specific - # rules — which would silently nullify any role distinction. - if "default" in spec: - default_owners = owners_for(spec["default"], roles) - lines.append(f"{'*':<{width}} {' '.join(default_owners)}") - lines.append("") - - for pattern, role_names in paths.items(): - owners = owners_for(role_names, roles) - lines.append(f"{pattern:<{width}} {' '.join(owners)}") - - lines.append("") # trailing newline so the file ends cleanly - - rendered = "\n".join(lines) - - # Regression check: the catch-all `*` line (if any) must precede - # every specific-path line. Failure here means the generator is - # silently nullifying specific rules. 
- if "default" in spec: - non_comment = [ln for ln in rendered.splitlines() if ln and not ln.startswith("#")] - first_pattern = non_comment[0].split()[0] if non_comment else None - if first_pattern != "*": - sys.exit( - f"error: generator invariant violated — first emitted pattern is " - f"{first_pattern!r}, expected '*'. CODEOWNERS uses last-match-wins; " - f"the catch-all must come first." - ) - - OUTPUT.write_text(rendered) - print(f"wrote {OUTPUT.relative_to(REPO_ROOT)}") - - splice_docs(ownership_tables(spec, roles)) - print(f"updated {DOCS.relative_to(REPO_ROOT)}") - return 0 - - -if __name__ == "__main__": - sys.exit(main()) diff --git a/.github/workflows/codeowners.yml b/.github/workflows/codeowners.yml deleted file mode 100644 index 75b3515..0000000 --- a/.github/workflows/codeowners.yml +++ /dev/null @@ -1,110 +0,0 @@ -name: CODEOWNERS - -# Runs on EVERY pull request (no paths filter). The two jobs below are -# required status checks on `main`; a path-filtered required check never -# reports for PRs outside the filter and leaves them permanently "pending" -# (the trap that forced admin-override merges). Always-run + cheap -# short-circuit is what keeps them honest. -on: - pull_request: - workflow_dispatch: - -# `drift` auto-commits the regenerated artifacts back to same-repo PR -# branches, so it needs write access. -permissions: - contents: write - -jobs: - # NOTE: the job `name:` values below ("CODEOWNERS matches source" / - # "CODEOWNERS not hand-edited") ARE the status-check contexts that - # .github/branch-protection.json must list verbatim. Renaming a job here - # is a branch-protection change — update the JSON and re-apply. 
- drift: - name: CODEOWNERS matches source - runs-on: ubuntu-latest - steps: - - uses: actions/checkout@v5.0.1 - - - name: Set up Python - uses: actions/setup-python@v5.4.0 - with: - python-version: '3.13' - - - name: Install PyYAML - run: pip install pyyaml - - - name: Re-render CODEOWNERS + ownership docs - run: python3 .github/scripts/render-codeowners.py - - # Same-repo PR: push the regenerated artifacts back so contributors - # never have to run the script locally. Mirrors the openapi.json - # auto-commit in ci.yml (separate shallow clone of the head branch so - # the pushed commit carries only the regenerated files). - - name: Commit regenerated artifacts to PR branch - if: | - github.event_name == 'pull_request' && - github.event.pull_request.head.repo.full_name == github.repository - env: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - run: | - if git diff --quiet -- .github/CODEOWNERS docs/dev/codeowners.md; then - echo "CODEOWNERS and ownership docs already in sync." - exit 0 - fi - tmp=$(mktemp -d) - git clone --depth 1 --branch "${{ github.head_ref }}" \ - "https://x-access-token:${GITHUB_TOKEN}@github.com/${{ github.repository }}.git" \ - "$tmp" - cp .github/CODEOWNERS "$tmp/.github/CODEOWNERS" - cp docs/dev/codeowners.md "$tmp/docs/dev/codeowners.md" - cd "$tmp" - if git diff --quiet -- .github/CODEOWNERS docs/dev/codeowners.md; then - echo "Head branch already matches; nothing to push." - exit 0 - fi - git config user.name "github-actions[bot]" - git config user.email "41898282+github-actions[bot]@users.noreply.github.com" - git add .github/CODEOWNERS docs/dev/codeowners.md - git commit -m "chore: regenerate CODEOWNERS + ownership docs" - git push - - # Fork PR / workflow_dispatch: cannot push back, so enforce drift - # strictly. The contributor runs the script and commits the result. 
- - name: Verify in sync (forks / manual runs) - if: | - !(github.event_name == 'pull_request' && - github.event.pull_request.head.repo.full_name == github.repository) - run: | - if ! git diff --quiet -- .github/CODEOWNERS docs/dev/codeowners.md; then - echo "::error::Generated CODEOWNERS / ownership docs are out of sync with .github/codeowners-roles.yml." - echo "::error::Run \`python3 .github/scripts/render-codeowners.py\` and commit the result." - echo "--- diff ---" - git --no-pager diff -- .github/CODEOWNERS docs/dev/codeowners.md - exit 1 - fi - echo "Generated artifacts are in sync with their source." - - noedit: - name: CODEOWNERS not hand-edited - runs-on: ubuntu-latest - steps: - - uses: actions/checkout@v5.0.1 - with: - # Need history so we can diff against the PR base. - fetch-depth: 0 - - - name: Reject hand-edits to generated file - # Only meaningful for PRs (needs a base to diff against). - if: github.event_name == 'pull_request' - run: | - base="origin/${{ github.base_ref }}" - git fetch origin "${{ github.base_ref }}" --quiet - changed=$(git diff --name-only "$base" HEAD) - edited_generated=$(echo "$changed" | grep -E '^\.github/CODEOWNERS$' || true) - edited_source=$(echo "$changed" | grep -E '^\.github/codeowners-roles\.yml$' || true) - if [ -n "$edited_generated" ] && [ -z "$edited_source" ]; then - echo "::error::This PR edits .github/CODEOWNERS but not its source .github/codeowners-roles.yml." - echo "::error::Edit the yml and regenerate via \`python3 .github/scripts/render-codeowners.py\`." - exit 1 - fi - echo "CODEOWNERS edits accompany source edits (or no CODEOWNERS edits in this PR)." 
diff --git a/AGENTS.md b/AGENTS.md index e8cd035..1772f77 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -102,7 +102,6 @@ Full diagram and concurrency model: [docs/dev/architecture.md](docs/dev/architec | Install (binary / Homebrew / source / channels) | [docs/user/install.md](docs/user/install.md) | | Deployment (binary / container / S3-local testing / auth / build variants) | [docs/user/deployment.md](docs/user/deployment.md) | | CI / release workflows | [docs/dev/ci.md](docs/dev/ci.md) | -| Code ownership (CODEOWNERS source of truth, roles, regeneration) | [docs/dev/codeowners.md](docs/dev/codeowners.md) | | Branch protection policy (declarative, applied via `scripts/apply-branch-protection.sh`) | [docs/dev/branch-protection.md](docs/dev/branch-protection.md) | | Constants & tunables cheat sheet | [docs/user/reference/constants.md](docs/user/reference/constants.md) | | Per-version release notes | [docs/releases/](docs/releases/) | diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 2d77ef0..1029b2f 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -22,8 +22,8 @@ that turns out to be non-trivial will be redirected — that's about process, no the merit of the change. > **Maintainers (ModernRelay team)** follow a separate internal process and are -> not bound by the intake rules above. Everyone is bound by review, CODEOWNERS, -> branch protection, and CI. +> not bound by the intake rules above. Everyone is bound by review, branch +> protection, and CI. ## Development diff --git a/GOVERNANCE.md b/GOVERNANCE.md index 5878f1f..2768e5b 100644 --- a/GOVERNANCE.md +++ b/GOVERNANCE.md @@ -9,19 +9,18 @@ to happen before code lands?* > Discussions, RFCs, and pull requests from people outside the ModernRelay > team. **Maintainers operate under a separate internal process** and are not > bound by the intake gates below. 
Everyone, maintainer or not, is still bound -> by the universal gates: branch protection on `main` and CODEOWNERS review -> (see [docs/dev/branch-protection.md](docs/dev/branch-protection.md) and -> [docs/dev/codeowners.md](docs/dev/codeowners.md)). +> by the universal gates: branch protection on `main` and CI +> (see [docs/dev/branch-protection.md](docs/dev/branch-protection.md)). ## Roles | Role | Who | Authority | |---|---|---| -| **Maintainer** | The code owners in [`.github/CODEOWNERS`](.github/CODEOWNERS) (generated from [`.github/codeowners-roles.yml`](.github/codeowners-roles.yml)) | Validate issues, accept/reject RFCs, review and merge PRs, set direction. Final decision authority. | +| **Maintainer** | The ModernRelay team (repository admins) | Validate issues, accept/reject RFCs, review and merge PRs, set direction. Final decision authority. | | **Contributor** | Anyone else | Report problems (Issues), propose ideas (Discussions), author RFCs, and open pull requests. | -Decision authority rests with the maintainers. CODEOWNERS is the single source -of truth for who that is; this document does not duplicate the list. +Decision authority rests with the maintainers (the ModernRelay team holding +repository-admin access). ## The three channels @@ -51,7 +50,7 @@ contribution. Pull request ◀──────────┴──────────│── merged == accepted │ (links the issue or the accepted RFC) ◀───────┘ (implementation PRs reference it) │ │ - review + CODEOWNERS + branch protection + review + branch protection + CI ▼ merged ``` @@ -91,8 +90,8 @@ where it's reviewable. ## What maintainers do *not* gate Maintainers' own changes do not pass through the intake gates above — the team -runs a separate internal process. The universal gates (review, CODEOWNERS, -branch protection, CI) apply to everyone. Enforcement of the intake rules is, to +runs a separate internal process. The universal gates (review, branch +protection, CI) apply to everyone. 
Enforcement of the intake rules is, to start, **by convention and review** (PR template + labels); an automated check keyed to author association may be added later if volume warrants. diff --git a/docs/dev/branch-protection.md b/docs/dev/branch-protection.md index 1d1c094..d3a9f6b 100644 --- a/docs/dev/branch-protection.md +++ b/docs/dev/branch-protection.md @@ -8,10 +8,9 @@ This page explains what the policy says and how to change it. | Setting | Value | Why | |---|---|---| -| **Required status checks (strict)** | `Classify Changes`, `Check AGENTS.md Links`, `Test omnigraph-server --features aws`, `CODEOWNERS matches source`, `CODEOWNERS not hand-edited` | Every PR must pass the AWS-feature build/test, AGENTS.md link integrity, and the CODEOWNERS hygiene checks. **`Test Workspace` is deliberately NOT required** — it runs only on push to `main` (post-merge), tags, and manual `workflow_dispatch`, to keep PR turnaround fast (it was the ~15min+ slow gate). It is therefore *not* listed here: a required check that never reports on PRs (the `test` job is `if: github.event_name != 'pull_request'`) would leave every PR permanently pending — the same job-never-reports trap the CODEOWNERS contexts call out below. The trade-off (a regression lands on `main` and is caught by the post-merge run, so `main` can briefly go red) and its mitigations are documented in [ci.md](ci.md). The two CODEOWNERS contexts must equal the job `name:` values in `.github/workflows/codeowners.yml` **verbatim** — a context naming a job that never reports (the old `CODEOWNERS / drift` used the job *id*, and the job was path-filtered) leaves every PR permanently pending and forces admin overrides. `strict: true` requires the branch to be up-to-date with `main` before merge. | -| **Required approving reviews** | `1` | At least one reviewer. With a 2-person team, going higher would block all merges when one person is unavailable. 
| -| **Require code-owner reviews** | `true` | The reviewer must be a code owner per `.github/CODEOWNERS`. This is what makes the codeowners chassis enforced. | -| **Dismiss stale reviews on new commits** | `true` | A push after approval invalidates the prior review. Prevents the "approve, then sneak in unreviewed changes" pattern. | +| **Required status checks (strict)** | `Classify Changes`, `Check AGENTS.md Links`, `Test omnigraph-server --features aws` | Every PR must pass the AWS-feature build/test and AGENTS.md link integrity. **`Test Workspace` is deliberately NOT required** — it runs only on push to `main` (post-merge), tags, and manual `workflow_dispatch`, to keep PR turnaround fast (it was the ~15min+ slow gate). It is therefore *not* listed here: a required check that never reports on PRs (the `test` job is `if: github.event_name != 'pull_request'`) would leave every PR permanently pending — the job-never-reports trap. The trade-off (a regression lands on `main` and is caught by the post-merge run, so `main` can briefly go red) and its mitigations are documented in [ci.md](ci.md). Each required context must equal a job `name:` that actually reports on PRs **verbatim** — a context naming a job that never reports leaves every PR permanently pending and forces admin overrides. `strict: true` requires the branch to be up-to-date with `main` before merge. | +| **Required approving reviews** | `0` | No human-review gate. With a 2-person team where both maintainers own everything, requiring an approval meant every PR needed the *other* person (or an admin/bypass override) — friction with no real review value. CI checks are the gate; maintainers merge their own PRs once checks pass. Raise this to `1` if an outside-contributor flow ever needs a review gate. | +| **Require code-owner reviews** | `false` | CODEOWNERS was removed entirely (see the git history of `.github/`); there is no code-owner review requirement. 
| | **Require linear history** | `true` | No merge commits — squash or rebase only. Matches recent practice. | | **Disallow force pushes** | `true` | No history rewrites on `main`. | | **Disallow branch deletions** | `true` | `main` cannot be deleted. | @@ -57,7 +56,7 @@ Outputs the live policy. Compare against `.github/branch-protection.json` to det - **Audit trail**: `git log .github/branch-protection.json` shows every change with a reviewable diff and a merge commit. - **Disaster recovery**: if branch protection is accidentally removed or weakened via the UI, the JSON is the canonical recovery point. -- **Consistency**: pairs with `.github/codeowners-roles.yml` (the CODEOWNERS source of truth). Repository policy lives in the repository. +- **Consistency**: repository policy lives in the repository, reviewed like code. ## What this gates @@ -65,11 +64,11 @@ After branch protection is applied, every PR targeting `main` must: 1. Pass all listed status checks. 2. Be up-to-date with `main` (rebase or merge-from-main). -3. Have at least one approving review from a code owner for the touched paths. -4. Have all review conversations resolved. -5. Be squash- or rebase-merged (no merge commits). +3. Have all review conversations resolved. +4. Be squash- or rebase-merged (no merge commits). -Even repository admins are subject to these rules. +No human approval is required (`required_approving_review_count: 0`). Repository +admins can override the gates (`enforce_admins: false`). ## Subsequent hardening (not in this PR) @@ -77,7 +76,7 @@ The branch-protection policy is the foundation. Future hardening adds: - **Required signed commits** (`required_signatures: true`) — once maintainers enroll GPG/SSH signing. - **Tag protection** for `v*` tags via `repos/.../tags/protection`. -- **Required reviewers from specific teams** for high-leverage paths (e.g., `docs/dev/invariants.md`) via CODEOWNERS tier expansion + the N-unique-approvers CI workaround. 
+- **Required reviewers from specific teams** for high-leverage paths (e.g., `docs/dev/invariants.md`) via a GitHub ruleset's path-scoped required-review rule, if a review gate is ever reintroduced. - **More required CI checks**: `cargo deny`, `cargo audit`, `cargo fmt --check`, `cargo clippy -D warnings`, CodeQL, secret scanning, schema-lint (MR-946). See the hardening playbook for the full plan. diff --git a/docs/dev/ci.md b/docs/dev/ci.md index 2e80f40..6cc4e1f 100644 --- a/docs/dev/ci.md +++ b/docs/dev/ci.md @@ -3,7 +3,7 @@ `.github/workflows/`: - **ci.yml**: text-only changes skip; otherwise `cargo test --workspace --locked` on ubuntu-latest with protobuf compiler. OpenAPI-drift check that auto-commits the regenerated `openapi.json` for same-repository PRs. Also runs the AGENTS.md cross-link integrity check (`scripts/check-agents-md.sh`). - - **`Test Workspace` does not run on pull requests.** The job is gated `if: github.event_name != 'pull_request'`, so the full workspace + failpoints suite runs only on push to `main` (post-merge), on `v*` tags, and on manual `workflow_dispatch`. This was a deliberate PR-latency trade-off — it was the slowest gate (~15min warm, up to the 75min cold ceiling). `RustFS S3 Integration` `needs: test`, so it is push-/dispatch-only for the same reason. The fast PR gates remain: `Classify Changes`, `Check AGENTS.md Links`, `Test omnigraph-server --features aws`, and the two CODEOWNERS checks. `Test Workspace` is correspondingly **not** in the required-check list (`.github/branch-protection.json`); see [branch-protection.md](branch-protection.md). + - **`Test Workspace` does not run on pull requests.** The job is gated `if: github.event_name != 'pull_request'`, so the full workspace + failpoints suite runs only on push to `main` (post-merge), on `v*` tags, and on manual `workflow_dispatch`. This was a deliberate PR-latency trade-off — it was the slowest gate (~15min warm, up to the 75min cold ceiling). 
`RustFS S3 Integration` `needs: test`, so it is push-/dispatch-only for the same reason. The fast PR gates remain: `Classify Changes`, `Check AGENTS.md Links`, and `Test omnigraph-server --features aws`. `Test Workspace` is correspondingly **not** in the required-check list (`.github/branch-protection.json`); see [branch-protection.md](branch-protection.md). - **Consequences to internalize:** (1) a regression that the suite would catch now lands on `main` and turns the post-merge run red, rather than being blocked pre-merge — `main` can briefly break, so run `cargo test --workspace --locked` locally before merging anything non-trivial, or trigger this workflow on your branch via the Actions "Run workflow" button. (2) `openapi.json` is no longer auto-regenerated on PRs (that step is inside the `test` job); for server/API changes, regenerate it locally with `OMNIGRAPH_UPDATE_OPENAPI=1 cargo test -p omnigraph-server --test openapi` and commit it, or the strict drift check fails the post-merge `main` run. - **Applying this policy:** removing `Test Workspace` from the JSON is inert until an admin runs `./scripts/apply-branch-protection.sh`. **Run it immediately after this change merges** — until then GitHub still requires a `Test Workspace` context that no longer reports on PRs, which leaves every open PR permanently pending (the job-never-reports trap). - **AWS feature build job**: `cargo build/test -p omnigraph-server --features aws` on ubuntu-latest. diff --git a/docs/dev/codeowners.md b/docs/dev/codeowners.md deleted file mode 100644 index 707f4f4..0000000 --- a/docs/dev/codeowners.md +++ /dev/null @@ -1,58 +0,0 @@ -# Code ownership - -`.github/CODEOWNERS` is **generated** — not hand-edited. The source of truth is `.github/codeowners-roles.yml`, expanded by `.github/scripts/render-codeowners.py`. CI rejects drift between the two and rejects direct edits to `CODEOWNERS` that don't accompany a yml change. 
- -This setup gives every role change a reviewable PR and a permanent in-repository audit trail (`git log .github/codeowners-roles.yml`). - -## Who owns what - -The tables below are **generated** from `.github/codeowners-roles.yml` by `.github/scripts/render-codeowners.py` (the same render that produces `.github/CODEOWNERS`). They are the always-current "who owns what at this commit" view — don't edit them by hand; edit the yml and re-render. - - - -**Path → owners** (GitHub applies *last match wins*; the `*` catch-all is listed first and is overridden by the specific patterns below it): - -| Path | Owners | Role(s) | -|---|---|---| -| `*` | @aaltshuler @ragnorc | engineering | -| `crates/**` | @aaltshuler @ragnorc | engineering | -| `docs/**` | @aaltshuler @ragnorc | docs | -| `README.md` | @aaltshuler @ragnorc | docs | -| `AGENTS.md` | @aaltshuler @ragnorc | docs | -| `CLAUDE.md` | @aaltshuler @ragnorc | docs | -| `SECURITY.md` | @aaltshuler @ragnorc | docs | - -**Roles**: - -| Role | Members | Description | -|---|---|---| -| `engineering` | @aaltshuler @ragnorc | All production code under crates/**. Engine, CLI, server, compiler. | -| `docs` | @aaltshuler @ragnorc | Documentation under docs/**, plus repo-level docs (README.md, AGENTS.md, CLAUDE.md symlink, SECURITY.md). | - - - -GitHub treats multiple owners on a CODEOWNERS line as **"any one of them satisfies the review requirement"**. To require N distinct approvers on a specific path, layer a CI check on top (not currently configured). - -## How to change role membership or path mappings - -1. Edit `.github/codeowners-roles.yml`. -2. Open a PR. **CI re-renders for you**: the `CODEOWNERS` workflow regenerates `.github/CODEOWNERS` and the ownership tables above and auto-commits them back to your PR branch on same-repository PRs — you don't have to run the script locally (though you can: `python3 .github/scripts/render-codeowners.py`, requires PyYAML). 
- -On a fork (where CI can't push back), the workflow instead fails with the diff so you can run the script and commit it yourself. - -CI fails the PR if: -- a fork PR left a generated artifact out of sync, or -- `CODEOWNERS` was edited without a corresponding yml change (the `CODEOWNERS not hand-edited` check). - -## How to add a new role - -1. Add a new entry to `roles:` in the yml with a `description` and `members` list. -2. Reference the role from `paths:` (or `default:`). -3. Regenerate + commit as above. - -## Why a generator, not direct CODEOWNERS edits? - -- **Audit trail**: `git log .github/codeowners-roles.yml` is the canonical record of every role change. The rendered `CODEOWNERS` is a derived artifact. -- **Roles are first-class**: paths reference roles, not raw handles. Renaming a person or rotating a role updates one place, not every path. -- **Future extension**: scheduled rotation (weekly on-call, quarterly leads) plugs into the same yml without changing the path mappings. Not enabled today. -- **Consistency with the product**: omnigraph itself enforces auditable Cedar policy. The repository's code-owner policy follows the same "policy as reviewed code" pattern. diff --git a/docs/dev/index.md b/docs/dev/index.md index a0a6afb..1fc0b77 100644 --- a/docs/dev/index.md +++ b/docs/dev/index.md @@ -28,7 +28,6 @@ constraints. 
User-facing behavior should still be documented through | Three-way merge implementation and conflicts | [merge.md](merge.md) | | Diff/change-feed implementation | [changes.md](../user/branching/changes.md) | | Branch protection policy | [branch-protection.md](branch-protection.md) | -| CODEOWNERS source of truth | [codeowners.md](codeowners.md) | ## Language, Runtime, And Boundaries From 7168ee0ed0fbbbb8bb38a1e41add1b6c77d7e791 Mon Sep 17 00:00:00 2001 From: Ragnor Comerford Date: Fri, 19 Jun 2026 00:15:06 +0200 Subject: [PATCH 10/13] fix(engine): stop branch-merge fast-forward OOM on embedding tables (#277) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * fix(engine): stop branch-merge fast-forward OOM on embedding tables A branch→main fast-forward merge of a forked, embedding-bearing table re-derived the whole branch row-by-row: it lumped new + changed rows into one Lance `merge_insert`, i.e. a full-outer hash join over the entire delta that exhausts the DataFusion memory pool (8k rows × 3072-dim → `Resources exhausted: 188MB HashJoinInput, 100MB pool`), so the merge hung/failed instead of completing. Fix the data path on existing, substrate-supported primitives: - Adopt-with-delta split: new rows → `stage_append` (a streaming `Operation::Append`, no hash join), only genuinely-changed rows → a bounded `stage_merge_insert`, deletes inline. New `AdoptDelta` / `compute_adopt_delta` / `publish_adopted_delta` replace the combined `compute_source_delta` path; the three-way merge path is untouched. - Stream the append via `stage_append_stream` → `execute_uncommitted_stream` (the substrate-blessed bulk-append path), removing the `Vec`+`concat` full-delta materialization. Blob-aware via `scan_stream_for_rewrite`. Exposed on the sealed `TableStorage` trait. - Lazy row-signature: stop stringifying every row's embedding eagerly; compute the signature only for the `(Some,Some)` changed-candidate arm. 
- Index coverage is reconciler-owned: the adopt path no longer rebuilds vector/FTS indexes inline; `optimize`/`ensure_indices` folds the new rows in (reads stay correct via brute-force tail). Post-merge index-coverage contract documented in docs/user/branching/merge.md. - Recovery pin: new `CandidateTableState::AdoptWithDelta` is classified and pinned so the append's HEAD advance is sidecar-covered (invariant 5); the `BranchMerge` sidecar's loose classification covers the two-commit shape. The regression gate is structural, not a brittle size threshold: task-local write probes assert an append-only fast-forward merge does 0 `stage_merge_insert` (the OOM hash join), appends via `stage_append`, and streams (0 whole-delta materialization). Plus functional correctness, blob round-trip, index-defer, and a Phase-B failpoint recovery test. Residual: the classify-time staging round-trip is still O(N) in memory (architecturally required for the all-or-nothing multi-table publish); bounding it fully is the fragment-adopt follow-up. * test(engine): partial branch-merge Phase B must roll back (RED regression) A branch-merge per-table publish is a multi-commit sequence — adopt: append → upsert → delete; three-way: merge_insert → delete → index — each step advancing Lance HEAD before the single manifest publish. Add four failpoint sites at those windows and four regression tests (mixed delta: a fresh id, a modified base id, a removed base id) asserting that a crash mid-sequence rolls the whole merge BACK on the next open and a re-run re-applies the full delta. RED against current code: the loose `BranchMerge` classification rolls any `lance_head > manifest_pinned` forward, so the partial is published and the merge recorded — the rolled-back-to-base assertion fails with the partial state visible (e.g. bob appended, dave not deleted). The fix lands next. The failpoint sites are no-ops unless the `failpoints` feature activates them. 
* fix(engine): roll back partial branch-merge Phase B (recovery WAL confirmation) A branch-merge publishes each table with several Lance commits (adopt: append → upsert → delete; three-way: merge_insert → delete → index), then one manifest publish makes them atomic. Recovery classified `BranchMerge` loosely: any `lance_head > manifest_pinned` with a matching CAS pin rolled *forward* to the observed HEAD. So a crash mid-sequence published a partial delta (e.g. the append without its sibling upsert/delete) and recorded the merge as complete — silent data loss; a re-merge sees "already up to date" and never repairs it. Fix: make the recovery sidecar a two-phase WAL for `BranchMerge`. After the whole per-table publish loop completes, stamp each pin's `confirmed_version` with its exact achieved Lance version (a second sidecar write), then publish the manifest. Recovery now: - rolls FORWARD only to a pin's `confirmed_version` (set ⇒ Phase B finished); - rolls BACK (`TableClassification::IncompletePhaseB`) when the HEAD moved but no confirmation was recorded ⇒ a partial publish ⇒ all-or-nothing restore to the manifest pin, so a re-run re-applies the full delta. Scope: `BranchMerge` only. Other loose writers (`SchemaApply`, `EnsureIndices`, `Optimize`) keep the loose roll-forward — their drift is derived state (index coverage, compaction) a partial roll-forward never corrupts, so confirmation would be cost without benefit. This is the write-ahead intent record + idempotent roll-forward that the fast-forward-main commit model requires to be crash-atomic across N tables; version-recorded (not phase-count-derived), so it survives later changes to the per-table commit sequence. Regression tests (failpoints): four partial-window crashes — adopt after-append / after-upsert, three-way after-merge / after-delete — each with a mixed delta (new id, modified id, removed id) now roll the whole merge back; the existing complete-Phase-B tests still roll forward. 
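The recovery rule described above can be sketched as a small standalone model. This is a simplified illustration under stated assumptions, not the code in `recovery.rs`: `Pin`, `Decision`, and `classify_confirmed` are hypothetical names, and the two distinct roll-back classifications (a partial Phase B versus an unexpected foreign HEAD) are collapsed into a single `RollBack` outcome.

```rust
/// Hypothetical, simplified stand-in for one table's pin in a
/// confirmation-aware `BranchMerge` sidecar.
#[derive(Debug, Clone, Copy)]
pub struct Pin {
    /// Manifest-pinned version when the writer started (CAS expectation).
    pub expected_version: u64,
    /// Exact Lance HEAD recorded after the whole per-table publish finished.
    /// `None` means the writer never reached the confirmation write.
    pub confirmed_version: Option<u64>,
}

/// Hypothetical recovery outcome for one table.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// HEAD equals the manifest pin: nothing to recover.
    NoMovement,
    /// Publish exactly the confirmed version.
    RollForward(u64),
    /// Restore the table to the manifest pin; the merge is re-run later.
    RollBack,
    /// HEAD is behind the manifest pin, which should be impossible.
    InvariantViolation,
}

pub fn classify_confirmed(pin: Pin, lance_head: u64, manifest_pinned: u64) -> Decision {
    if lance_head < manifest_pinned {
        return Decision::InvariantViolation;
    }
    if lance_head == manifest_pinned {
        return Decision::NoMovement;
    }
    // HEAD moved past the manifest pin. Roll forward only when the writer
    // recorded a confirmation that matches the observed HEAD and its CAS
    // expectation still matches the manifest.
    match pin.confirmed_version {
        Some(confirmed)
            if confirmed == lance_head && pin.expected_version == manifest_pinned =>
        {
            Decision::RollForward(confirmed)
        }
        // Missing confirmation (partial publish) or a mismatched one
        // (unexpected HEAD): both restore the manifest pin in this sketch.
        _ => Decision::RollBack,
    }
}
```

In this model a crash after the append but before the confirmation write leaves `confirmed_version` as `None`, so the table is restored to the manifest pin and a re-run applies the full delta.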
* fix(engine): scope merge index docs to fast-forward; record append probe after write Two PR-review fixes: - docs(merge): the "a merge does not build indexes inline" note only holds for the fast-forward / adopt path (deferred to the reconciler). The three-way `Merged` path still rebuilds indexes inline in its publish, so a Merged-outcome merge of an embedding table pays the build up front. Scope the doc so a Merged-outcome user isn't surprised or led to skip a post-merge optimize. - `stage_append` recorded its instrumentation probe before the fallible `execute_uncommitted`, so a failed staging write left the call/row counters inflated — and diverged from `stage_append_stream`, which records after the transaction is built. Record after the write succeeds. * fix(engine): record stage_merge_insert / vector-index probes after write too The prior commit moved `stage_append`'s instrumentation probe to after the write, but left the two sibling write primitives with the identical ordering bug: `stage_merge_insert` recorded before `execute_uncommitted`, and `create_vector_index` before the index build. A failed write on either would inflate the probe counter. Move both to record only after the write succeeds, so all write-primitive probes share one rule (record-after-success) — closing the class rather than the single instance the review flagged. * docs(engine): mark the fragment-adopt excision boundary in the merge code Comment the transitional row-level merge code so a future fragment-adopt implementation (Lance branch-merge/rebase #7263 + UUID branch paths #7185) knows exactly what it deletes and what it keeps: - `AdoptDelta` / `compute_adopt_delta` / `publish_adopted_delta` — the row-level re-derivation; removed wholesale when a fast-forward merge becomes a fragment graft (adopt the source table version's fragments + indexes by reference). - `stage_append_stream` — its only caller is that merge append; dead with it unless re-adopted as a general bulk-append path. 
- `confirm_sidecar_phase_b` — the boundary marker: this SURVIVES. The recovery sidecar is the cross-table WAL a fast-forward-main commit still needs; only the within-table multi-commit reason for `IncompletePhaseB` narrows once each table is a single graft commit. Keep the sidecar; only simplify the classifier. Comments only; no behavior change. * test(engine): pre-upgrade v1 branch-merge sidecar must roll forward (RED) Phase-B confirmation made the recovery classifier strict for every BranchMerge sidecar — including ones written by a binary that predates confirmation. A pre-upgrade crash in the Phase-B→C gap can leave such a sidecar over a COMPLETED merge; the new classifier reads its absent confirmed_version as a partial and rolls it back, silently discarding the finished merge (greptile P1 / Cursor High). This regression test synthesizes that sidecar realistically: crash after Phase B (real sidecar + advanced Lance HEAD), downgrade the on-disk JSON to the pre-confirmation v1 shape (schema_version=1, strip confirmed_version), reopen. RED: the merge rolls back, `bob` is discarded (left ["alice"], want ["alice","bob"]). The versioning fix lands next. * fix(engine): version the recovery sidecar; read pre-confirmation merges as loose Phase-B confirmation changed how a BranchMerge sidecar's absent confirmed_version is interpreted (roll forward → roll back) without versioning the artifact, so the new classifier silently discarded completed pre-upgrade merges (greptile P1 / Cursor High). A capability flag would not fix the symmetric direction — keeping schema_version=1, an OLD binary reading a NEW sidecar sails through its already-shipped strict gate, ignores the unknown flag, and applies loose semantics to a new partial → the same data loss on downgrade. Use the versioning system instead. - Bump SIDECAR_SCHEMA_VERSION 1 → 2; add a fixed CONFIRMATION_SCHEMA_VERSION = 2 (the generation at which confirmation shipped — pinned, so a later v3 keeps v2 confirmation-aware). 
- Make the read gate version-aware (`parse_sidecar`): refuse only versions NEWER than this binary; accept and interpret older ones with their original semantics — no operator toil draining pre-upgrade sidecars. Rename `SidecarSchemaError.supported_version` → `max_supported_version` and reword. - Dispatch classification by version: the strict BranchMerge confirmation path is gated on `schema_version >= CONFIRMATION_SCHEMA_VERSION`; a v1 BranchMerge sidecar falls through to the existing loose roll-forward. Thread `sidecar.schema_version` from `process_sidecar`. This is bidirectionally safe: a new binary interprets v1 (loose) and v2 (strict) and refuses the future; an old binary's `!= 1` gate already refuses v2, so it never misreads a new sidecar. The flag was an additive-field pattern misapplied to a semantics change; versioning is the correct mechanism. Honest residual (any approach): an old *partial* sidecar still rolls forward — v1 carries no confirmation, so partialness is undetectable in it. The fix stops us from interpreting old sidecars under new rules; it can't retrofit information they never had. * fix(engine): harden recovery — mode resolver, loud divergence check, publish classified version Three correct-by-design fixes from the holistic review of the recovery path, all in recovery.rs (each closes a class, not an instance): A. Resolve the classification mode from `(kind, schema_version)` once, instead of a kind×version match accreting fall-through guards in `classify_table`. New `ClassificationMode { Strict, Loose, Confirmed }` + an exhaustive `SidecarKind::classification_mode` — adding a writer kind or version floor is now one arm in the resolver (the compiler forces it), not a guard threaded through the classifier. No behavior change; existing classify/decide tests are the guard. B. 
`confirm_sidecar_phase_b` now errors loudly when a pinned table has no achieved version in the publish `updates`, instead of silently skipping it (which left the pin unconfirmed → `IncompletePhaseB` → a silent rollback of a COMPLETE merge). Guards the implicit `pins ⊆ updates` invariant against a future divergence between the two filters (invariants 9/13). + a unit test. C. Recovery roll-forward publishes the version classification OBSERVED (`state.lance_head`), not a fresh HEAD re-read at publish time. For a Confirmed pin classify already validated `lance_head == confirmed_version`, so this publishes the recorded WAL intent by construction and closes the classify→publish re-derivation/TOCTOU for every writer (invariant 15). `push_table_update_at_head` → `push_table_update(target_version: Option<u64>)`: roll-forward pins the classified version; roll-back keeps `None` (publishes the restore commit it just made). In-scope behavior is preserved, so the existing roll-forward integration tests are the guard; the drift-hardening is correct-by-construction (deterministic mid-sweep drift injection isn't feasible — a sync failpoint can't do an async Lance write).
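The version gate and the mode resolver from the commits above can be sketched together in a condensed form. This is a hedged illustration, not the code in the diff: it merges the read gate and the resolver into one hypothetical function, `resolve_mode`, uses a plain `String` error as a placeholder, and names its enums `Kind` and `Mode` to keep them distinct from the real `SidecarKind` and `ClassificationMode`.

```rust
/// Highest sidecar schema version this (hypothetical) binary understands.
const SIDECAR_SCHEMA_VERSION: u32 = 2;
/// Version at which Phase-B confirmation shipped. Fixed, not derived from the
/// maximum, so a later bump still treats v2 sidecars as confirmation-aware.
const CONFIRMATION_SCHEMA_VERSION: u32 = 2;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Mutation,
    Load,
    BranchMerge,
    SchemaApply,
    EnsureIndices,
    Optimize,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Strict,
    Loose,
    Confirmed,
}

/// Refuse only versions newer than this binary; interpret older versions with
/// the semantics they were written under.
pub fn resolve_mode(kind: Kind, schema_version: u32) -> Result<Mode, String> {
    if schema_version > SIDECAR_SCHEMA_VERSION {
        return Err(format!(
            "sidecar schema_version={} is newer than the maximum supported ({})",
            schema_version, SIDECAR_SCHEMA_VERSION
        ));
    }
    // Exhaustive match: a new `Kind` variant fails to compile until its
    // recovery semantics are declared here.
    Ok(match kind {
        Kind::Mutation | Kind::Load => Mode::Strict,
        Kind::BranchMerge => {
            if schema_version >= CONFIRMATION_SCHEMA_VERSION {
                Mode::Confirmed
            } else {
                // Pre-confirmation sidecar: keep the original loose roll-forward.
                Mode::Loose
            }
        }
        Kind::SchemaApply | Kind::EnsureIndices | Kind::Optimize => Mode::Loose,
    })
}
```

Under these assumptions a v1 `BranchMerge` sidecar resolves to `Mode::Loose` and a v2 one to `Mode::Confirmed`, while any sidecar at v3 or above is rejected before classification.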
--- crates/omnigraph/src/db/manifest.rs | 8 +- crates/omnigraph/src/db/manifest/recovery.rs | 493 +++++++++++---- crates/omnigraph/src/db/omnigraph/optimize.rs | 3 + .../src/db/omnigraph/schema_apply.rs | 3 + .../omnigraph/src/db/omnigraph/table_ops.rs | 6 + crates/omnigraph/src/exec/merge.rs | 559 ++++++++++++++---- crates/omnigraph/src/exec/staging.rs | 4 + crates/omnigraph/src/instrumentation.rs | 89 +++ crates/omnigraph/src/storage_layer.rs | 21 + crates/omnigraph/src/table_store.rs | 104 +++- crates/omnigraph/tests/failpoints.rs | 354 ++++++++++- crates/omnigraph/tests/merge_fast_forward.rs | 213 +++++++ crates/omnigraph/tests/recovery.rs | 12 +- docs/dev/writes.md | 16 +- docs/user/branching/merge.md | 19 + 15 files changed, 1670 insertions(+), 234 deletions(-) create mode 100644 crates/omnigraph/tests/merge_fast_forward.rs diff --git a/crates/omnigraph/src/db/manifest.rs b/crates/omnigraph/src/db/manifest.rs index ce91513..19f25a3 100644 --- a/crates/omnigraph/src/db/manifest.rs +++ b/crates/omnigraph/src/db/manifest.rs @@ -33,10 +33,10 @@ pub(crate) use namespace::open_table_head_for_write; use namespace::{branch_manifest_namespace, staged_table_namespace}; use publisher::{GraphNamespacePublisher, ManifestBatchPublisher}; pub(crate) use recovery::{ - RecoveryMode, RecoverySidecarHandle, SidecarKind, SidecarTablePin, SidecarTableRegistration, - SidecarTombstone, delete_sidecar, has_schema_apply_sidecar, heal_pending_sidecars_roll_forward, - list_sidecars, new_sidecar, recover_manifest_drift, schema_apply_serial_queue_key, - write_sidecar, + RecoveryMode, RecoverySidecar, RecoverySidecarHandle, SidecarKind, SidecarTablePin, + SidecarTableRegistration, SidecarTombstone, confirm_sidecar_phase_b, delete_sidecar, + has_schema_apply_sidecar, heal_pending_sidecars_roll_forward, list_sidecars, new_sidecar, + recover_manifest_drift, schema_apply_serial_queue_key, write_sidecar, }; pub use state::SubTableEntry; #[cfg(test)] diff --git 
a/crates/omnigraph/src/db/manifest/recovery.rs b/crates/omnigraph/src/db/manifest/recovery.rs index 4b0f870..d21e0fd 100644 --- a/crates/omnigraph/src/db/manifest/recovery.rs +++ b/crates/omnigraph/src/db/manifest/recovery.rs @@ -62,10 +62,26 @@ pub(crate) const RECOVERY_ACTOR: &str = "omnigraph:recovery"; /// Subdirectory under the graph root holding sidecar files. pub(crate) const RECOVERY_DIR_NAME: &str = "__recovery"; -/// Current sidecar JSON shape version. Bumping this is a breaking change: -/// older binaries will refuse to interpret newer sidecars (intentional — -/// see [`SidecarSchemaError`]). -pub(crate) const SIDECAR_SCHEMA_VERSION: u32 = 1; +/// Max sidecar JSON shape/semantics version this binary writes and understands. +/// The reader accepts every version `<= ` this and refuses only versions ABOVE +/// it (an older binary cannot guess semantics a newer writer baked in — see +/// [`SidecarSchemaError`] and [`parse_sidecar`]). Bump this whenever a change +/// alters how an existing field is *interpreted* (not merely adds an optional +/// one), and add a fixed `*_SCHEMA_VERSION` floor like the one below so older +/// generations keep their original semantics. +/// +/// v1 → v2: Phase-B confirmation. A `BranchMerge` sidecar at v2 carries +/// `confirmed_version` and is classified strictly (unconfirmed ⇒ partial ⇒ roll +/// back); at v1 it predates confirmation and keeps the loose roll-forward. The +/// reader must distinguish the two, so this is a real version bump, not an +/// additive field. +pub(crate) const SIDECAR_SCHEMA_VERSION: u32 = 2; + +/// The version at which Phase-B confirmation shipped. A `BranchMerge` sidecar is +/// confirmation-aware (strict classification) iff `schema_version >=` this. +/// FIXED at 2 — NOT derived from [`SIDECAR_SCHEMA_VERSION`] — so a future bump to +/// v3+ still treats v2 sidecars as confirmation-aware. 
+pub(crate) const CONFIRMATION_SCHEMA_VERSION: u32 = 2; /// Selects which recovery actions are allowed in a sweep. /// @@ -115,6 +131,54 @@ pub(crate) enum SidecarKind { Optimize, } +/// Which recovery-classification semantics a sidecar's tables use. Resolved once +/// from `(writer_kind, schema_version)` — see [`SidecarKind::classification_mode`] +/// — so [`classify_table`] dispatches on the mode instead of re-deriving it from +/// a kind×version match. Adding a writer kind or a version floor is then one arm +/// in the resolver, not a guard threaded through `classify_table`. +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub(crate) enum ClassificationMode { + /// Exactly one `commit_staged` per table (`Mutation`, `Load`): require + /// `lance_head == manifest_pinned + 1` and the pin to match. + Strict, + /// N ≥ 1 commits per table whose drift is content-preserving / derived + /// state (`SchemaApply`, `EnsureIndices`, `Optimize`, and pre-confirmation + /// `BranchMerge`): any `lance_head > manifest_pinned` rolls forward. + Loose, + /// Multi-commit publish of *distinct logical rows* with a recorded + /// `confirmed_version` (`BranchMerge` at `schema_version >= + /// CONFIRMATION_SCHEMA_VERSION`): roll forward ONLY to the confirmed + /// version; an unconfirmed moved HEAD is a partial publish and rolls back. + Confirmed, +} + +impl SidecarKind { + /// Resolve the classification mode for this writer at a given sidecar + /// `schema_version`. Exhaustive over `SidecarKind`, so adding a variant is a + /// compile error here until its recovery semantics are declared. + pub(crate) fn classification_mode(self, schema_version: u32) -> ClassificationMode { + match self { + SidecarKind::Mutation | SidecarKind::Load => ClassificationMode::Strict, + // BranchMerge gained two-phase confirmation at + // `CONFIRMATION_SCHEMA_VERSION`. 
A sidecar written before that + // carries no `confirmed_version` and must keep the prior loose + // roll-forward — classifying it strictly would misread a *completed* + // pre-upgrade merge as a partial and roll it back. (The read gate + // already refused any version newer than this binary.) + SidecarKind::BranchMerge => { + if schema_version >= CONFIRMATION_SCHEMA_VERSION { + ClassificationMode::Confirmed + } else { + ClassificationMode::Loose + } + } + SidecarKind::SchemaApply | SidecarKind::EnsureIndices | SidecarKind::Optimize => { + ClassificationMode::Loose + } + } + } +} + /// One table's contribution to a sidecar's intended commit. The classifier /// uses these to decide per-table state at recovery time. #[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] @@ -126,8 +190,22 @@ pub(crate) struct SidecarTablePin { /// Manifest-pinned version at writer start (CAS expectation). pub expected_version: u64, /// Lance HEAD that the writer's `commit_staged` would produce - /// (typically `expected_version + 1`). + /// (typically `expected_version + 1`). For multi-commit writers this is + /// only a *lower bound* — see `confirmed_version`. pub post_commit_pin: u64, + /// Phase-B confirmation: the exact Lance HEAD this table reached once the + /// writer's *entire* multi-commit publish for it finished, recorded by a + /// second sidecar write immediately before the manifest publish (Phase C). + /// `None` means Phase B did not complete (the writer crashed mid-publish), + /// so the on-disk drift is a *partial* commit and recovery must roll the + /// whole operation BACK rather than publish an incomplete state. Only the + /// `BranchMerge` writer records this today (its per-table publish is + /// append → upsert → delete, several HEAD advances that the manifest + /// publish makes atomic); other writers leave it `None` and keep their + /// existing loose roll-forward. Backward-compatible: absent on older + /// sidecars. 
+ #[serde(default, skip_serializing_if = "Option::is_none")] + pub confirmed_version: Option<u64>, /// Lance branch ref this table lives on (mirrors /// `SubTableEntry::table_branch`). Required for the recovery sweep /// to open the dataset at the correct ref — `Dataset::open(path)` @@ -218,25 +296,27 @@ pub(crate) struct RecoverySidecarHandle { pub(crate) sidecar_uri: String, } -/// Error returned when the sidecar's `schema_version` is unknown to this -/// binary. We refuse-and-error rather than read-and-warn: an old binary -/// cannot guess what semantics a newer writer baked into a future shape. -/// Operator action is required (typically: upgrade the binary). +/// Error returned when the sidecar's `schema_version` is NEWER than this binary +/// understands. We refuse-and-error rather than read-and-warn: an old binary +/// cannot guess what semantics a newer writer baked into a future shape. (Older +/// versions are accepted and interpreted with their original semantics — see +/// [`parse_sidecar`].) Operator action is required (typically: upgrade the +/// binary). #[derive(Debug)] pub(crate) struct SidecarSchemaError { pub sidecar_uri: String, pub found_version: u32, - pub supported_version: u32, + pub max_supported_version: u32, } impl std::fmt::Display for SidecarSchemaError { fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { write!( f, - "recovery sidecar at '{}' declares schema_version={}, but this \ - binary supports only schema_version={}; refusing to interpret \ + "recovery sidecar at '{}' declares schema_version={}, newer than the \ + maximum this binary supports (schema_version={}); refusing to interpret \ — upgrade omnigraph or remove the sidecar with operator review", - self.sidecar_uri, self.found_version, self.supported_version, + self.sidecar_uri, self.found_version, self.max_supported_version, ) } } @@ -271,6 +351,14 @@ pub(crate) enum TableClassification { /// previous restore attempt or an external mutation.
Roll back to /// the manifest pin. UnexpectedMultistep, + /// A confirmation-using writer (`BranchMerge`) advanced this table's HEAD + /// (`lance_head > manifest_pinned`) but the sidecar carries no + /// `confirmed_version` — its multi-commit publish crashed mid-flight, so + /// the drift is a *partial* commit (e.g. an append without its sibling + /// upsert/delete). Roll back to the manifest pin; the whole operation is + /// re-run from scratch. Distinct from `UnexpectedMultistep` so the audit + /// records a partial Phase B, not a foreign mutation. + IncompletePhaseB, /// `lance_head < manifest_pinned`. Should be impossible: the manifest /// pin can only advance after a successful Lance commit. Surface /// loudly and abort recovery. @@ -341,6 +429,58 @@ pub(crate) async fn write_sidecar( }) } +/// Phase-B confirmation: stamp each pin with the exact Lance HEAD its publish +/// reached, then re-write the sidecar in place (same object). Called once, after +/// the writer's whole multi-commit publish completed and before the manifest +/// publish (Phase C). Recovery then rolls forward ONLY to these confirmed +/// versions; a sidecar still missing them is a partial Phase B that rolls back. +/// +/// Overwriting the same object is atomic (same contract as [`write_sidecar`]): +/// a torn rewrite is never observed, so recovery reads either the pre-confirm +/// sidecar (→ roll back, safe) or the confirmed one (→ roll forward). A failure +/// here leaves the pre-confirm sidecar, so the operation rolls back — correct. +/// +/// SURVIVES the fragment-adopt work (unlike the row-level merge it currently +/// serves — see `AdoptDelta` in `exec/merge.rs`). The recovery sidecar is the +/// cross-table write-ahead log that makes a fast-forward-main commit +/// all-or-nothing across N tables, which a fragment graft still needs. 
What +/// narrows is the *within-table* reason for confirmation: once each table's +/// merge is a single graft commit, the multi-step partial window shrinks to one +/// commit, so the `BranchMerge` arm of `classify_table` could fold back into the +/// strict single-commit path and `IncompletePhaseB` retire. Do NOT delete this +/// with the row path — keep the sidecar; only simplify the classifier. +pub(crate) async fn confirm_sidecar_phase_b( + root_uri: &str, + storage: &dyn StorageAdapter, + sidecar: &mut RecoverySidecar, + confirmed_versions: &HashMap<String, u64>, +) -> Result<()> { + // Failpoint: models a storage failure on the confirmation write — the + // pre-confirm sidecar stays on disk, so recovery rolls the operation back. + crate::failpoints::maybe_fail("recovery.sidecar_confirm")?; + for pin in &mut sidecar.tables { + // Every pinned table MUST have an achieved version. A miss means the + // pin set and the publish `updates` diverged — fail loudly at the + // producer rather than leave the pin unconfirmed, which recovery would + // read as a partial Phase B and silently roll the whole (complete) merge + // back. Today the two are kept in lockstep by construction; this guards + // the invariant against a future edit to either filter. + let version = confirmed_versions.get(&pin.table_key).ok_or_else(|| { + OmniError::manifest_internal(format!( + "confirm_sidecar_phase_b: no achieved version for pinned table '{}' \ + (pins and publish updates diverged)", + pin.table_key + )) + })?; + pin.confirmed_version = Some(*version); + } + let uri = sidecar_uri(root_uri, &sidecar.operation_id); + let json = serde_json::to_string_pretty(sidecar).map_err(|err| { + OmniError::manifest_internal(format!("failed to serialize recovery sidecar: {}", err)) + })?; + storage.write_text(&uri, &json).await +} + /// Delete a sidecar after Phase C succeeded. Idempotent (safe to retry).
pub(crate) async fn delete_sidecar( handle: &RecoverySidecarHandle, @@ -408,11 +548,15 @@ pub(crate) fn parse_sidecar(sidecar_uri: &str, body: &str) -> Result SIDECAR_SCHEMA_VERSION { return Err(SidecarSchemaError { sidecar_uri: sidecar_uri.to_string(), found_version: peek.schema_version, - supported_version: SIDECAR_SCHEMA_VERSION, + max_supported_version: SIDECAR_SCHEMA_VERSION, } .into()); } @@ -427,26 +571,38 @@ pub(crate) fn parse_sidecar(sidecar_uri: &str, body: &str) -> Result manifest_pinned` is ambiguous — it may be a *complete* +/// publish or a *partial* one crashed mid-sequence. The writer resolves +/// the ambiguity by recording the exact achieved version +/// (`confirmed_version`) only after the whole publish finished. So roll +/// forward ONLY to that confirmed version; a missing confirmation is a +/// partial commit (`IncompletePhaseB`) and rolls back. This is the safe +/// form of the loose match for writers where a partial would publish an +/// incomplete delta. /// - **Strict** (`Mutation`, `Load`): exactly one `commit_staged` per /// table, so `lance_head == manifest_pinned + 1` AND /// `post_commit_pin == lance_head` is required. -/// - **Loose** (`SchemaApply`, `EnsureIndices`, `BranchMerge`, -/// `Optimize`): the writer advances the Lance HEAD by N ≥ 1 commits -/// per table (one per index built + one for the overwrite, etc.; -/// merge tables run merge_insert + delete_where + index rebuilds; -/// `Optimize` runs `compact_files`, which commits reserve-fragments + -/// rewrite) and the exact N is hard to compute at sidecar-write time. -/// The loose match accepts +/// - **Loose** (`SchemaApply`, `EnsureIndices`, `Optimize`): the writer +/// advances the Lance HEAD by N ≥ 1 commits per table (one per index +/// built + one for the overwrite, etc.; `Optimize` runs `compact_files`, +/// which commits reserve-fragments + rewrite) and the exact N is hard to +/// compute at sidecar-write time. 
The loose match accepts /// any `lance_head > manifest_pinned` as `RolledPastExpected` when /// `pin.expected_version == manifest_pinned` (the writer's CAS -/// target matches what the manifest currently shows). The risk this -/// admits — an external agent advancing HEAD between sidecar write -/// and recovery — is out of scope for the single-coordinator model. +/// target matches what the manifest currently shows). This is safe for +/// these writers because their drift is derived state (index coverage, +/// compaction) the reconciler reproduces — a partial roll-forward loses +/// no logical rows. The risk it admits — an external agent advancing HEAD +/// between sidecar write and recovery — is out of scope for the +/// single-coordinator model. pub(crate) fn classify_table( pin: &SidecarTablePin, lance_head: u64, manifest_pinned: u64, kind: SidecarKind, + schema_version: u32, ) -> TableClassification { use TableClassification::*; if lance_head < manifest_pinned { @@ -457,27 +613,49 @@ pub(crate) fn classify_table( if lance_head == manifest_pinned { return NoMovement; } - // lance_head > manifest_pinned - let strict = matches!(kind, SidecarKind::Mutation | SidecarKind::Load); - if strict { - if lance_head == manifest_pinned + 1 { - if pin.expected_version == manifest_pinned && pin.post_commit_pin == lance_head { - RolledPastExpected - } else { - UnexpectedAtP1 + // lance_head > manifest_pinned. The "which semantics" decision is resolved + // once from (kind, schema_version); dispatch on it. + match kind.classification_mode(schema_version) { + ClassificationMode::Confirmed => { + // Two-phase confirmation: roll forward only to the exact version the + // writer recorded after its whole multi-commit publish completed. No + // confirmation ⇒ the publish crashed mid-sequence ⇒ partial ⇒ roll + // back. A confirmation that doesn't match the observed HEAD means a + // foreign writer advanced the table — don't roll a surprise forward. 
+ match pin.confirmed_version { + Some(confirmed) + if lance_head == confirmed && pin.expected_version == manifest_pinned => + { + RolledPastExpected + } + Some(_) => UnexpectedMultistep, + None => IncompletePhaseB, } - } else { - // lance_head > manifest_pinned + 1 - UnexpectedMultistep } - } else { - // Loose match for multi-commit writers (SchemaApply, EnsureIndices). - if pin.expected_version == manifest_pinned { - RolledPastExpected - } else if lance_head == manifest_pinned + 1 { - UnexpectedAtP1 - } else { - UnexpectedMultistep + ClassificationMode::Strict => { + if lance_head == manifest_pinned + 1 { + if pin.expected_version == manifest_pinned && pin.post_commit_pin == lance_head { + RolledPastExpected + } else { + UnexpectedAtP1 + } + } else { + // lance_head > manifest_pinned + 1 + UnexpectedMultistep + } + } + ClassificationMode::Loose => { + // Multi-commit writers whose drift is content-preserving / derived + // state (and pre-confirmation BranchMerge sidecars): any + // `lance_head > manifest_pinned` rolls forward when the CAS target + // matches what the manifest currently shows. 
+ if pin.expected_version == manifest_pinned { + RolledPastExpected + } else if lance_head == manifest_pinned + 1 { + UnexpectedAtP1 + } else { + UnexpectedMultistep + } } } } @@ -496,7 +674,7 @@ pub(crate) fn decide(classifications: &[TableClassification]) -> SidecarDecision } if classifications .iter() - .any(|c| matches!(c, NoMovement | UnexpectedAtP1 | UnexpectedMultistep)) + .any(|c| matches!(c, NoMovement | UnexpectedAtP1 | UnexpectedMultistep | IncompletePhaseB)) { return RollBack; } @@ -891,7 +1069,13 @@ async fn process_sidecar( .map(|e| e.table_version) .unwrap_or(0); states.push(ClassifiedTable { - classification: classify_table(pin, lance_head, manifest_pinned, sidecar.writer_kind), + classification: classify_table( + pin, + lance_head, + manifest_pinned, + sidecar.writer_kind, + sidecar.schema_version, + ), manifest_pinned, lance_head, }); @@ -1028,7 +1212,7 @@ async fn process_sidecar( Phase C did not land)" ); let (new_manifest_version, published_versions) = - roll_forward_all(root_uri, sidecar, snapshot).await?; + roll_forward_all(root_uri, sidecar, &states, snapshot).await?; // `to_version` records the ACTUAL Lance HEAD published for // each table (not pin.post_commit_pin, which is a lower bound // for loose-match writers like SchemaApply / EnsureIndices / @@ -1112,6 +1296,7 @@ async fn roll_back_sidecar( TableClassification::RolledPastExpected | TableClassification::UnexpectedAtP1 | TableClassification::UnexpectedMultistep + | TableClassification::IncompletePhaseB ) { restore_table_to_version( &pin.table_path, @@ -1119,14 +1304,17 @@ async fn roll_back_sidecar( state.manifest_pinned, ) .await?; - // Publish the post-restore HEAD, CAS against the current (unmoved) - // manifest pin — the same helper roll-forward uses. - push_table_update_at_head( + // Publish the post-restore HEAD (the restore commit we just made), + // CAS against the current (unmoved) manifest pin — the same helper + // roll-forward uses. 
`None` target: there is no prior observation to + // pin to; the version to publish is the HEAD the restore produced. + push_table_update( root_uri, &pin.table_key, &pin.table_path, pin.table_branch.as_deref(), state.manifest_pinned, + None, &mut updates, &mut expected, ) @@ -1227,6 +1415,7 @@ async fn record_audit_recovery_rollforward( async fn roll_forward_all( root_uri: &str, sidecar: &RecoverySidecar, + states: &[ClassifiedTable], snapshot: &Snapshot, ) -> Result<(u64, HashMap)> { let total_changes = @@ -1236,22 +1425,25 @@ async fn roll_forward_all( let mut published_versions: HashMap = HashMap::with_capacity(sidecar.tables.len() + sidecar.additional_registrations.len()); - for pin in &sidecar.tables { - // Publish to the table's CURRENT Lance HEAD on the pin's branch (not the - // sidecar's `post_commit_pin`, a lower bound for loose-match writers that - // run multiple commit_staged calls per table). CAS against the pin's - // pre-write `expected_version`. - let head_version = push_table_update_at_head( + for (pin, state) in sidecar.tables.iter().zip(states.iter()) { + // Publish the version classification OBSERVED (`state.lance_head`), not a + // fresh HEAD re-read. For a `Confirmed` pin classify already validated + // `lance_head == confirmed_version`, so this publishes the recorded WAL + // intent by construction; for loose/strict pins it's the multi-commit + // HEAD classify saw. Single observation, no classify→publish TOCTOU. CAS + // against the pin's pre-write `expected_version`. 
+ let published = push_table_update( root_uri, &pin.table_key, &pin.table_path, pin.table_branch.as_deref(), pin.expected_version, + Some(state.lance_head), &mut updates, &mut expected, ) .await?; - published_versions.insert(pin.table_key.clone(), head_version); + published_versions.insert(pin.table_key.clone(), published); } // SchemaApply-only: register added tables (and renamed targets) and @@ -1351,45 +1543,61 @@ async fn roll_forward_all( /// version the table was just restored to). The HEAD is read AFTER any restore /// in the same single-threaded sweep, so no concurrent writer can have advanced /// it. -async fn push_table_update_at_head( +/// Stage a manifest `Update` for one table. +/// +/// `target_version` selects WHICH Lance version's state to publish: +/// - `Some(v)` — pin the dataset at version `v` and publish it. Roll-FORWARD +/// passes the version classification observed (and, for a `Confirmed` pin, +/// validated equals `confirmed_version`), so recovery publishes the version it +/// *decided* on rather than re-reading a HEAD a concurrent writer may have +/// advanced since classification — one observation, used for both the decision +/// and the publish (invariant 15). +/// - `None` — publish the dataset's current HEAD. Roll-BACK uses this: it just +/// created the restore commit, so HEAD *is* the version to publish. 
+async fn push_table_update( root_uri: &str, table_key: &str, table_path: &str, branch: Option<&str>, expected_version: u64, + target_version: Option, updates: &mut Vec, expected: &mut HashMap, ) -> Result { - let head_ds = Dataset::open(table_path) + let ds = Dataset::open(table_path) .await .map_err(|e| OmniError::Lance(e.to_string()))?; - let head_ds = match branch { - Some(b) if b != "main" => head_ds + let ds = match branch { + Some(b) if b != "main" => ds .checkout_branch(b) .await .map_err(|e| OmniError::Lance(e.to_string()))?, - _ => head_ds, + _ => ds, }; - let head_version = head_ds.version().version; - let row_count = head_ds + let ds = match target_version { + Some(v) => ds + .checkout_version(v) + .await + .map_err(|e| OmniError::Lance(e.to_string()))?, + None => ds, + }; + let published_version = ds.version().version; + let row_count = ds .count_rows(None) .await .map_err(|e| OmniError::Lance(e.to_string()))? as u64; let table_relative_path = super::table_path_for_table_key(table_key)?; - let version_metadata = super::metadata::TableVersionMetadata::from_dataset( - root_uri, - &table_relative_path, - &head_ds, - )?; + let version_metadata = + super::metadata::TableVersionMetadata::from_dataset(root_uri, &table_relative_path, &ds)?; updates.push(ManifestChange::Update(SubTableUpdate { table_key: table_key.to_string(), - table_version: head_version, + table_version: published_version, table_branch: branch.map(str::to_string), row_count, version_metadata, })); expected.insert(table_key.to_string(), expected_version); - Ok(head_version) + Ok(published_version) } /// Append the audit row describing this recovery action. 
@@ -1573,6 +1781,7 @@ mod tests { table_path: table_path.to_string(), expected_version: expected, post_commit_pin: post, + confirmed_version: None, table_branch: None, } } @@ -1597,30 +1806,39 @@ mod tests { } #[test] - fn parse_sidecar_refuses_unknown_schema_version() { - let body = r#"{ - "schema_version": 99, - "operation_id": "01H000000000000000000000XX", - "started_at": "0", - "branch": null, - "actor_id": null, - "writer_kind": "Mutation", - "tables": [] - }"#; - let err = parse_sidecar("file:///tmp/__recovery/x.json", body).unwrap_err(); + fn parse_sidecar_refuses_future_but_accepts_older_schema_version() { + let body = |version: u32| { + format!( + r#"{{ + "schema_version": {version}, + "operation_id": "01H000000000000000000000XX", + "started_at": "0", + "branch": null, + "actor_id": null, + "writer_kind": "BranchMerge", + "tables": [] + }}"# + ) + }; + // A version NEWER than this binary's max → refuse (can't guess the future). + let err = parse_sidecar("file:///tmp/__recovery/x.json", &body(99)).unwrap_err(); let msg = err.to_string(); assert!( - msg.contains("schema_version=99") && msg.contains("supports only schema_version=1"), - "expected SidecarSchemaError mentioning the version mismatch, got: {}", - msg, + msg.contains("schema_version=99") && msg.contains("newer than the maximum"), + "expected a future-version refusal, got: {msg}", ); + // An OLDER version (pre-confirmation v1) → accept and interpret with its + // original semantics; never refuse a version we were built to understand. 
+ let parsed = parse_sidecar("file:///tmp/__recovery/x.json", &body(1)) + .expect("a v1 (older) sidecar must parse, not be refused"); + assert_eq!(parsed.schema_version, 1); } #[test] fn classify_no_movement_when_head_equals_pinned() { let pin = make_pin("node:Person", "irrelevant", 5, 6); assert_eq!( - classify_table(&pin, 5, 5, SidecarKind::Mutation), + classify_table(&pin, 5, 5, SidecarKind::Mutation, SIDECAR_SCHEMA_VERSION), TableClassification::NoMovement, ); } @@ -1629,7 +1847,7 @@ mod tests { fn classify_rolled_past_expected_when_sidecar_matches_strict() { let pin = make_pin("node:Person", "irrelevant", 5, 6); assert_eq!( - classify_table(&pin, 6, 5, SidecarKind::Mutation), + classify_table(&pin, 6, 5, SidecarKind::Mutation, SIDECAR_SCHEMA_VERSION), TableClassification::RolledPastExpected, ); } @@ -1639,7 +1857,7 @@ mod tests { // Same +1 drift but post_commit_pin says it should be 7, not 6. let pin = make_pin("node:Person", "irrelevant", 5, 7); assert_eq!( - classify_table(&pin, 6, 5, SidecarKind::Mutation), + classify_table(&pin, 6, 5, SidecarKind::Mutation, SIDECAR_SCHEMA_VERSION), TableClassification::UnexpectedAtP1, ); } @@ -1648,7 +1866,7 @@ mod tests { fn classify_unexpected_multistep_when_head_jumped_more_than_one_strict() { let pin = make_pin("node:Person", "irrelevant", 5, 6); assert_eq!( - classify_table(&pin, 8, 5, SidecarKind::Mutation), + classify_table(&pin, 8, 5, SidecarKind::Mutation, SIDECAR_SCHEMA_VERSION), TableClassification::UnexpectedMultistep, ); } @@ -1657,7 +1875,7 @@ mod tests { fn classify_invariant_violation_when_head_below_pinned() { let pin = make_pin("node:Person", "irrelevant", 5, 6); assert_eq!( - classify_table(&pin, 3, 5, SidecarKind::Mutation), + classify_table(&pin, 3, 5, SidecarKind::Mutation, SIDECAR_SCHEMA_VERSION), TableClassification::InvariantViolation { observed: 3 }, ); } @@ -1673,7 +1891,7 @@ mod tests { // built two indices). Strict would say UnexpectedMultistep; loose // accepts it as RolledPastExpected. 
assert_eq!( - classify_table(&pin, 8, 5, SidecarKind::SchemaApply), + classify_table(&pin, 8, 5, SidecarKind::SchemaApply, SIDECAR_SCHEMA_VERSION), TableClassification::RolledPastExpected, ); } @@ -1682,7 +1900,7 @@ mod tests { fn classify_loose_match_accepts_multi_commit_drift_for_ensure_indices() { let pin = make_pin("node:Person", "irrelevant", 5, 6); assert_eq!( - classify_table(&pin, 9, 5, SidecarKind::EnsureIndices), + classify_table(&pin, 9, 5, SidecarKind::EnsureIndices, SIDECAR_SCHEMA_VERSION), TableClassification::RolledPastExpected, ); } @@ -1691,7 +1909,7 @@ mod tests { fn classify_loose_match_no_movement_unchanged() { let pin = make_pin("node:Person", "irrelevant", 5, 6); assert_eq!( - classify_table(&pin, 5, 5, SidecarKind::SchemaApply), + classify_table(&pin, 5, 5, SidecarKind::SchemaApply, SIDECAR_SCHEMA_VERSION), TableClassification::NoMovement, ); } @@ -1700,31 +1918,65 @@ mod tests { fn classify_loose_match_invariant_violation_unchanged() { let pin = make_pin("node:Person", "irrelevant", 5, 6); assert_eq!( - classify_table(&pin, 3, 5, SidecarKind::SchemaApply), + classify_table(&pin, 3, 5, SidecarKind::SchemaApply, SIDECAR_SCHEMA_VERSION), TableClassification::InvariantViolation { observed: 3 }, ); } - /// BranchMerge must be loose-matched, not strict: while the strict - /// classifier expects exactly one `commit_staged` per table, - /// `publish_rewritten_merge_table` runs multiple per table - /// (merge_insert + delete_where + index rebuilds — the comment in - /// `merge.rs` explicitly says so). Strict classification would roll - /// back valid completed Phase B work as `UnexpectedMultistep`. + /// BranchMerge advances each table by several commits + /// (adopt: append + upsert + delete; three-way: merge_insert + delete + + /// index), so a bare "HEAD moved" is ambiguous between a complete and a + /// partial publish.
At a confirmation-aware version the two-phase + /// confirmation resolves it: roll forward ONLY to the recorded + /// `confirmed_version`; an unconfirmed moved HEAD is a partial publish + /// (`IncompletePhaseB` ⇒ roll back), and a confirmed version that doesn't + /// match the observed HEAD is a foreign advance (`UnexpectedMultistep` ⇒ + /// roll back). A *pre-confirmation* (v1) sidecar carries no confirmation and + /// must keep the original loose roll-forward — reading it as strict would + /// roll a completed pre-upgrade merge back (silent discard). #[test] - fn classify_loose_match_accepts_multi_commit_drift_for_branch_merge() { - let pin = make_pin("node:Person", "irrelevant", 5, 6); + fn classify_branch_merge_requires_phase_b_confirmation() { + // Unconfirmed multi-commit drift at a confirmation-aware version → + // partial Phase B → roll back. + let unconfirmed = make_pin("node:Person", "irrelevant", 5, 6); assert_eq!( - classify_table(&pin, 8, 5, SidecarKind::BranchMerge), + classify_table(&unconfirmed, 8, 5, SidecarKind::BranchMerge, SIDECAR_SCHEMA_VERSION), + TableClassification::IncompletePhaseB, + ); + // Backward-compat: the SAME unconfirmed pin in a PRE-confirmation (v1) + // sidecar → loose roll-forward (the regression fix — a completed + // pre-upgrade merge must not be discarded). + assert_eq!( + classify_table( + &unconfirmed, + 8, + 5, + SidecarKind::BranchMerge, + CONFIRMATION_SCHEMA_VERSION - 1, + ), TableClassification::RolledPastExpected, ); + // Confirmed to the observed HEAD → complete Phase B → roll forward. + let confirmed = SidecarTablePin { + confirmed_version: Some(8), + ..make_pin("node:Person", "irrelevant", 5, 6) + }; + assert_eq!( + classify_table(&confirmed, 8, 5, SidecarKind::BranchMerge, SIDECAR_SCHEMA_VERSION), + TableClassification::RolledPastExpected, + ); + // Confirmed, but HEAD drifted past it (foreign writer) → roll back. 
+ assert_eq!( + classify_table(&confirmed, 9, 5, SidecarKind::BranchMerge, SIDECAR_SCHEMA_VERSION), + TableClassification::UnexpectedMultistep, + ); } #[test] fn classify_loose_match_branch_merge_no_movement_unchanged() { let pin = make_pin("node:Person", "irrelevant", 5, 6); assert_eq!( - classify_table(&pin, 5, 5, SidecarKind::BranchMerge), + classify_table(&pin, 5, 5, SidecarKind::BranchMerge, SIDECAR_SCHEMA_VERSION), TableClassification::NoMovement, ); } @@ -1733,7 +1985,7 @@ mod tests { fn classify_loose_match_branch_merge_invariant_violation_unchanged() { let pin = make_pin("node:Person", "irrelevant", 5, 6); assert_eq!( - classify_table(&pin, 3, 5, SidecarKind::BranchMerge), + classify_table(&pin, 3, 5, SidecarKind::BranchMerge, SIDECAR_SCHEMA_VERSION), TableClassification::InvariantViolation { observed: 3 }, ); } @@ -1888,6 +2140,37 @@ mod tests { assert!(after.is_empty()); } + #[tokio::test] + async fn confirm_sidecar_phase_b_errors_when_pin_missing_from_updates() { + // A pinned table with no achieved version in the publish `updates` must + // be a loud producer error, NOT a silent skip that leaves the pin + // unconfirmed (which recovery would read as a partial Phase B and roll + // the whole complete merge back). Guards the implicit `pins ⊆ updates` + // invariant against a future divergence between the two filters. + let dir = tempfile::tempdir().unwrap(); + let storage = ObjectStorageAdapter::local(); + let mut sidecar = new_sidecar( + SidecarKind::BranchMerge, + Some("main".to_string()), + None, + vec![make_pin("node:Person", "file:///tmp/x.lance", 5, 6)], + ); + // The confirmed-versions map does NOT cover the pinned table. 
+ let confirmed: HashMap = HashMap::new(); + let err = confirm_sidecar_phase_b( + dir.path().to_str().unwrap(), + &storage, + &mut sidecar, + &confirmed, + ) + .await + .expect_err("a pinned table with no achieved version must be a loud error"); + assert!( + err.to_string().contains("pins and publish updates diverged"), + "expected a pin/updates divergence error, got: {err}", + ); + } + #[tokio::test] async fn list_sidecars_skips_non_json_files() { let dir = tempfile::tempdir().unwrap(); diff --git a/crates/omnigraph/src/db/omnigraph/optimize.rs b/crates/omnigraph/src/db/omnigraph/optimize.rs index 498f9ae..9a0a17f 100644 --- a/crates/omnigraph/src/db/omnigraph/optimize.rs +++ b/crates/omnigraph/src/db/omnigraph/optimize.rs @@ -420,6 +420,9 @@ async fn optimize_one_table( // Lower bound — compaction commits N≥1 versions (reserve + rewrite); // the classifier loose-matches SidecarKind::Optimize. post_commit_pin: expected_version + 1, + // Optimize uses the loose match (drift is derived state), not + // BranchMerge's Phase-B confirmation — left None. + confirmed_version: None, table_branch: None, }], ); diff --git a/crates/omnigraph/src/db/omnigraph/schema_apply.rs b/crates/omnigraph/src/db/omnigraph/schema_apply.rs index d013eb2..3089641 100644 --- a/crates/omnigraph/src/db/omnigraph/schema_apply.rs +++ b/crates/omnigraph/src/db/omnigraph/schema_apply.rs @@ -362,6 +362,9 @@ where table_path: db.storage().dataset_uri(&entry.table_path), expected_version: entry.table_version, post_commit_pin: entry.table_version + 1, + // SchemaApply uses the loose match, not BranchMerge's Phase-B + // confirmation — left None. 
+ confirmed_version: None, table_branch: entry.table_branch.clone(), }) }) diff --git a/crates/omnigraph/src/db/omnigraph/table_ops.rs b/crates/omnigraph/src/db/omnigraph/table_ops.rs index c325931..ed5d082 100644 --- a/crates/omnigraph/src/db/omnigraph/table_ops.rs +++ b/crates/omnigraph/src/db/omnigraph/table_ops.rs @@ -125,6 +125,9 @@ pub(super) async fn ensure_indices_for_branch( table_path: full_path, expected_version: entry.table_version, post_commit_pin: entry.table_version + 1, + // EnsureIndices uses the loose match (index coverage is derived + // state), not BranchMerge's Phase-B confirmation — left None. + confirmed_version: None, // Use active_branch (where commits actually land), NOT // entry.table_branch (where the table currently lives). // open_owned_dataset_for_branch_write forks a feature @@ -150,6 +153,9 @@ pub(super) async fn ensure_indices_for_branch( table_path: full_path, expected_version: entry.table_version, post_commit_pin: entry.table_version + 1, + // EnsureIndices uses the loose match (index coverage is derived + // state), not BranchMerge's Phase-B confirmation — left None. + confirmed_version: None, // Use active_branch (where commits actually land), NOT // entry.table_branch (where the table currently lives). // open_owned_dataset_for_branch_write forks a feature diff --git a/crates/omnigraph/src/exec/merge.rs b/crates/omnigraph/src/exec/merge.rs index 5d0be74..600fdf1 100644 --- a/crates/omnigraph/src/exec/merge.rs +++ b/crates/omnigraph/src/exec/merge.rs @@ -5,7 +5,14 @@ const MERGE_STAGE_DIR_ENV: &str = "OMNIGRAPH_MERGE_STAGING_DIR"; #[derive(Debug)] enum CandidateTableState { + /// Adopt the source's table state via a pointer switch or a branch fork — + /// no data HEAD advance, so nothing to pin for recovery. AdoptSourceState, + /// Adopt the source's state by applying a non-empty delta onto the target's + /// lineage (append new + upsert changed + delete removed). 
The delta is + /// pre-computed at classification so this candidate can be recovery-pinned: + /// its publish advances Lance HEAD before the manifest commit. + AdoptWithDelta(AdoptDelta), RewriteMerged(StagedMergeResult), } @@ -22,6 +29,38 @@ struct StagedMergeResult { deleted_ids: Vec, } +/// Delta for an adopted-source merge (the fast-forward / target-owns path): +/// the new + changed rows to apply onto the target's base lineage, plus the ids +/// removed on source. Distinct from [`StagedMergeResult`] (the three-way path), +/// which also carries a `full_staged` table for validation — the adopt path +/// validates against the source snapshot directly (`candidate_dataset`), so it +/// needs no `full_staged` and never builds it. +/// +/// TRANSITIONAL — fragment-adopt excision point. This whole row-level adopt +/// (`AdoptDelta`, [`compute_adopt_delta`], [`publish_adopted_delta`], and the +/// streaming append it drives) re-derives the source branch row-by-row because +/// today's Lance offers no fragment-level branch merge. When Lance ships +/// branch-merge/rebase ([#7263]) + UUID branch paths ([#7185]), a fast-forward +/// merge becomes a *fragment graft* — adopt the source table version's +/// fragments (and their already-built indexes) by reference, no rows scanned, +/// re-appended, upserted, or deleted. At that point this struct and its two +/// functions are removed wholesale; the merge collapses to ~one ref/metadata +/// op per table. Keep them self-contained so that excision stays a clean delete. +/// +/// [#7263]: https://github.com/lance-format/lance/issues/7263 +/// [#7185]: https://github.com/lance-format/lance/issues/7185 +#[derive(Debug)] +struct AdoptDelta { + /// New-on-source rows → `stage_append` (a streaming `Operation::Append`, no + /// hash join). The connector's dominant case and the OOM fix: appending new + /// rows never buffers the whole delta in a full-outer hash join. 
+ appends: Option, + /// Changed-on-source rows → `stage_merge_insert` (a hash join bounded to the + /// genuinely-changed set, not the whole delta). + upserts: Option, + deleted_ids: Vec, +} + #[derive(Debug, Clone)] struct CursorRow { id: String, @@ -31,24 +70,48 @@ struct CursorRow { row_index: usize, } +impl CursorRow { + /// Compute this row's signature on demand. Used by the lazy adopt cursor, + /// where `signature` is left empty; the value is identical to the eager + /// `signature` field the three-way cursor populates. + fn compute_signature(&self) -> Result { + row_signature(&self.batch, self.row_index) + } +} + struct OrderedTableCursor { stream: Option>>, dataset: Option, current_batch: Option, current_row: usize, peeked: Option, + /// When false, `next_row` leaves `CursorRow::signature` empty and callers + /// compute it on demand via `CursorRow::compute_signature`. The adopt path + /// uses this: new/deleted rows never need a signature comparison and would + /// otherwise eagerly stringify their embedding for nothing. + eager_signatures: bool, } impl OrderedTableCursor { async fn from_snapshot(snapshot: &Snapshot, table_key: &str) -> Result { + Self::open(snapshot, table_key, true).await + } + + /// Like `from_snapshot` but leaves row signatures uncomputed (callers use + /// `CursorRow::compute_signature` on demand). See `eager_signatures`. 
+ async fn from_snapshot_lazy(snapshot: &Snapshot, table_key: &str) -> Result { + Self::open(snapshot, table_key, false).await + } + + async fn open(snapshot: &Snapshot, table_key: &str, eager_signatures: bool) -> Result { let dataset = match snapshot.entry(table_key) { Some(_) => Some(snapshot.open(table_key).await?), None => None, }; - Self::from_dataset(dataset).await + Self::from_dataset(dataset, eager_signatures).await } - async fn from_dataset(dataset: Option) -> Result { + async fn from_dataset(dataset: Option, eager_signatures: bool) -> Result { let stream = if let Some(ds) = &dataset { Some(Box::pin( crate::table_store::TableStore::scan_stream_with( @@ -71,6 +134,7 @@ impl OrderedTableCursor { current_batch: None, current_row: 0, peeked: None, + eager_signatures, }) } @@ -97,9 +161,14 @@ impl OrderedTableCursor { let dataset = self.dataset.clone().ok_or_else(|| { OmniError::manifest("cursor row missing source dataset".to_string()) })?; + let signature = if self.eager_signatures { + row_signature(batch, row_index)? + } else { + String::new() + }; return Ok(Some(CursorRow { id: row_id_at(batch, row_index)?, - signature: row_signature(batch, row_index)?, + signature, dataset, batch: batch.clone(), row_index, @@ -258,20 +327,30 @@ fn sanitize_table_key(table_key: &str) -> String { } /// Computes the delta between base and source for an adopted-source merge. -/// Returns the changed/new rows (for merge_insert) and deleted IDs (for delete). -async fn compute_source_delta( +/// Returns the new + changed rows and the ids deleted on source. +/// +/// Unchanged rows are dropped: the adopt path validates against the source +/// snapshot directly (`candidate_dataset`), so no `full_staged` table is built +/// — saving the O(rows) temp write that `compute_source_delta` used to produce +/// and then discard. 
+/// +/// TRANSITIONAL — removed by the fragment-adopt work (see [`AdoptDelta`]): a +/// fragment graft adopts the source's fragments by reference, so there is no +/// row-level delta to compute. +async fn compute_adopt_delta( table_key: &str, catalog: &Catalog, base_snapshot: &Snapshot, source_snapshot: &Snapshot, -) -> Result> { +) -> Result> { let schema = schema_for_table_key(catalog, table_key)?; - let mut full_writer = - StagedTableWriter::new(&format!("{}_adopt_full", table_key), schema.clone())?; - let mut delta_writer = StagedTableWriter::new(&format!("{}_adopt_delta", table_key), schema)?; + let mut append_writer = + StagedTableWriter::new(&format!("{}_adopt_append", table_key), schema.clone())?; + let mut upsert_writer = + StagedTableWriter::new(&format!("{}_adopt_upsert", table_key), schema)?; let mut deleted_ids: Vec = Vec::new(); - let mut base = OrderedTableCursor::from_snapshot(base_snapshot, table_key).await?; - let mut source = OrderedTableCursor::from_snapshot(source_snapshot, table_key).await?; + let mut base = OrderedTableCursor::from_snapshot_lazy(base_snapshot, table_key).await?; + let mut source = OrderedTableCursor::from_snapshot_lazy(source_snapshot, table_key).await?; let mut needs_update = false; @@ -297,9 +376,6 @@ async fn compute_source_delta( None }; - let base_sig = base_row.as_ref().map(|r| r.signature.as_str()); - let source_sig = source_row.as_ref().map(|r| r.signature.as_str()); - match (&base_row, &source_row) { (Some(_), None) => { // Deleted on source @@ -307,20 +383,21 @@ async fn compute_source_delta( needs_update = true; } (None, Some(src)) => { - // New on source - full_writer.push_row(src).await?; - delta_writer.push_row(src).await?; + // New on source → append (streaming, no hash join). No signature + // needed — a new id is absent from base by construction. 
+ append_writer.push_row(src).await?; needs_update = true; } - (Some(_), Some(src)) if source_sig != base_sig => { - // Changed on source - full_writer.push_row(src).await?; - delta_writer.push_row(src).await?; - needs_update = true; - } - (Some(base), Some(_)) => { - // Unchanged — write to full (for validation), skip delta - full_writer.push_row(base).await?; + (Some(base), Some(src)) => { + // Present on both — compute signatures lazily (the only case + // that needs them) to tell a changed row from an unchanged one. + // New/deleted rows above skip the embedding stringify entirely. + if src.compute_signature()? != base.compute_signature()? { + // Changed on source → upsert. + upsert_writer.push_row(src).await?; + needs_update = true; + } + // else unchanged — already on the target's base lineage; drop. } (None, None) => unreachable!(), } @@ -330,15 +407,20 @@ async fn compute_source_delta( return Ok(None); } - let delta_staged = if delta_writer.row_count > 0 { - Some(delta_writer.finish().await?) + let appends = if append_writer.row_count > 0 { + Some(append_writer.finish().await?) + } else { + None + }; + let upserts = if upsert_writer.row_count > 0 { + Some(upsert_writer.finish().await?) 
} else { None }; - Ok(Some(StagedMergeResult { - full_staged: full_writer.finish().await?, - delta_staged, + Ok(Some(AdoptDelta { + appends, + upserts, deleted_ids, })) } @@ -651,10 +733,12 @@ async fn candidate_dataset( ) -> Result> { if let Some(candidate) = candidates.get(table_key) { return match candidate { - CandidateTableState::AdoptSourceState => match source_snapshot.entry(table_key) { - Some(_) => Ok(Some(source_snapshot.open(table_key).await?)), - None => Ok(None), - }, + CandidateTableState::AdoptSourceState | CandidateTableState::AdoptWithDelta(_) => { + match source_snapshot.entry(table_key) { + Some(_) => Ok(Some(source_snapshot.open(table_key).await?)), + None => Ok(None), + } + } CandidateTableState::RewriteMerged(staged) => { Ok(Some(staged.full_staged.dataset.clone())) } @@ -840,13 +924,62 @@ fn row_id_at(batch: &RecordBatch, row: usize) -> Result { Ok(ids.value(row).to_string()) } -async fn publish_adopted_source_state( +/// Classify a table whose target state equals base (the adopt / fast-forward +/// case). Returns [`CandidateTableState::AdoptWithDelta`] — with the delta +/// pre-computed so it can be recovery-pinned — when the adopt applies a +/// non-empty delta onto the target's lineage (a HEAD-advancing publish via +/// [`publish_adopted_delta`]); otherwise [`CandidateTableState::AdoptSourceState`] +/// (a pointer switch or fork, which does not advance the data HEAD). +/// +/// The HEAD-advancing subcases mirror [`publish_adopted_source_state`]: source +/// on a branch with the target either on main or owning the table. Computing the +/// delta here (rather than inside the publish) is what closes the recovery gap — +/// the classifier knows whether the publish will move Lance HEAD. 
+async fn classify_adopt( target_db: &Omnigraph, catalog: &Catalog, base_snapshot: &Snapshot, source_snapshot: &Snapshot, target_snapshot: &Snapshot, table_key: &str, +) -> Result { + let Some(source_entry) = source_snapshot.entry(table_key) else { + return Ok(CandidateTableState::AdoptSourceState); + }; + let target_entry = target_snapshot.entry(table_key); + let target_active = target_db.active_branch().await; + let advances_head = match ( + target_active.as_deref(), + source_entry.table_branch.as_deref(), + ) { + // Source on a branch, target on main — delta applied onto main's lineage. + (None, Some(_)) => true, + // Both on branches, target owns this table — delta applied onto it. + (Some(target_branch), Some(_)) => { + target_entry.and_then(|e| e.table_branch.as_deref()) == Some(target_branch) + } + // Source on main (pointer switch) or target doesn't own (fork): no advance. + _ => false, + }; + if !advances_head { + return Ok(CandidateTableState::AdoptSourceState); + } + match compute_adopt_delta(table_key, catalog, base_snapshot, source_snapshot).await? { + Some(delta) => Ok(CandidateTableState::AdoptWithDelta(delta)), + None => Ok(CandidateTableState::AdoptSourceState), + } +} + +/// Adopt the source's table state without applying a row delta: a pointer +/// switch (source/target share lineage) or a branch fork. The HEAD-advancing +/// delta case is classified [`CandidateTableState::AdoptWithDelta`] and +/// published by [`publish_adopted_delta`], so reaching the branch-bearing arms +/// here means the delta was empty. 
+async fn publish_adopted_source_state( + target_db: &Omnigraph, + source_snapshot: &Snapshot, + target_snapshot: &Snapshot, + table_key: &str, ) -> Result { let source_entry = source_snapshot .entry(table_key) @@ -875,44 +1008,31 @@ async fn publish_adopted_source_state( row_count: source_entry.row_count, version_metadata: source_entry.version_metadata.clone(), }), - // Source on branch, target on main — apply delta to preserve version metadata - (None, Some(_source_branch)) => { - let delta = - compute_source_delta(table_key, catalog, base_snapshot, source_snapshot).await?; - match delta { - Some(staged) => publish_rewritten_merge_table(target_db, table_key, &staged).await, - None => Ok(crate::db::SubTableUpdate { - table_key: table_key.to_string(), - table_version: target_entry - .map(|e| e.table_version) - .unwrap_or(source_entry.table_version), - table_branch: None, - row_count: source_entry.row_count, - version_metadata: target_entry - .map(|entry| entry.version_metadata.clone()) - .unwrap_or_else(|| source_entry.version_metadata.clone()), - }), - } - } + // Source on branch, target on main, empty delta — adopt source's + // version by a pointer switch (the non-empty case is `AdoptWithDelta`). 
+ (None, Some(_source_branch)) => Ok(crate::db::SubTableUpdate { + table_key: table_key.to_string(), + table_version: target_entry + .map(|e| e.table_version) + .unwrap_or(source_entry.table_version), + table_branch: None, + row_count: source_entry.row_count, + version_metadata: target_entry + .map(|entry| entry.version_metadata.clone()) + .unwrap_or_else(|| source_entry.version_metadata.clone()), + }), // Both on branches (Some(target_branch), Some(source_branch)) => { if target_entry.and_then(|entry| entry.table_branch.as_deref()) == Some(target_branch) { - // Target already owns this table — apply delta onto its lineage - let delta = - compute_source_delta(table_key, catalog, base_snapshot, source_snapshot) - .await?; - match delta { - Some(staged) => { - publish_rewritten_merge_table(target_db, table_key, &staged).await - } - None => Ok(crate::db::SubTableUpdate { - table_key: table_key.to_string(), - table_version: target_entry.unwrap().table_version, - table_branch: Some(target_branch.to_string()), - row_count: source_entry.row_count, - version_metadata: target_entry.unwrap().version_metadata.clone(), - }), - } + // Target already owns this table, empty delta — pointer switch + // onto its own lineage (the non-empty case is `AdoptWithDelta`). + Ok(crate::db::SubTableUpdate { + table_key: table_key.to_string(), + table_version: target_entry.unwrap().table_version, + table_branch: Some(target_branch.to_string()), + row_count: source_entry.row_count, + version_metadata: target_entry.unwrap().version_metadata.clone(), + }) } else { // Target doesn't own this table yet — fork from source state. // This creates the target branch on the sub-table dataset. @@ -1000,6 +1120,13 @@ async fn publish_rewritten_merge_table( } } + // Failpoint: crash after the Phase 1 merge_insert commit, before the delete. 
+ // Models a partial Phase B on the three-way path — the merged constructive + // rows are on Lance HEAD but the delete has not committed and the + // achieved-version intent has not been recorded, so recovery must roll BACK. + // See tests/failpoints.rs::branch_merge_rewrite_partial_after_merge_rolls_back. + crate::failpoints::maybe_fail("branch_merge.rewrite_after_merge_pre_delete")?; + // Phase 2: delete removed rows via deletion vectors. // // INLINE-COMMIT RESIDUAL: lance-6.0.1 does not expose a public @@ -1023,6 +1150,14 @@ async fn publish_rewritten_merge_table( current_ds = new_ds; } + // Failpoint: crash after the Phase 2 delete commit, before the index build. + // Models a partial Phase B on the three-way path — constructive rows + + // deletes are on Lance HEAD but the achieved-version intent has not been + // recorded, so recovery must roll BACK (the index is reconciler-owned derived + // state, but the merge itself never reached its commit boundary). See + // tests/failpoints.rs::branch_merge_rewrite_partial_after_delete_rolls_back. + crate::failpoints::maybe_fail("branch_merge.rewrite_after_delete_pre_index")?; + // Phase 3: rebuild indices. // // `build_indices_on_dataset` uses `stage_create_btree_index` / @@ -1054,6 +1189,160 @@ async fn publish_rewritten_merge_table( }) } +/// Scan a staged temp table and concat its non-empty batches into the single +/// batch that `stage_append` / `stage_merge_insert` consume. Returns `None` when +/// the table has no rows (both staged primitives reject an empty batch). +async fn scan_staged_combined( + target_db: &Omnigraph, + table: &StagedTable, +) -> Result> { + crate::instrumentation::record_scan_staged_combined(); + let snapshot = SnapshotHandle::new(table.dataset.clone()); + let batches: Vec = target_db + .storage() + .scan_batches_for_rewrite(&snapshot) + .await? 
+ .into_iter() + .filter(|batch| batch.num_rows() > 0) + .collect(); + if batches.is_empty() { + return Ok(None); + } + let combined = if batches.len() == 1 { + batches.into_iter().next().unwrap() + } else { + let schema = batches[0].schema(); + arrow_select::concat::concat_batches(&schema, &batches) + .map_err(|e| OmniError::Lance(e.to_string()))? + }; + Ok(Some(combined)) +} + +/// Apply an [`AdoptDelta`] onto the target's base lineage (the fast-forward / +/// target-owns path). Kept separate from `publish_rewritten_merge_table` (the +/// three-way path) because the two paths diverge: commit 3 splits this Phase 1 +/// into append (new) + merge_insert (changed), and commit 6 makes its index +/// coverage incremental — neither of which the three-way path takes. +/// +/// `open_for_mutation(Merge)` opens the target's own table lineage (active +/// branch is the merge target after the caller's swap), so every write lands on +/// the target and survives source-branch deletion — GC-safe. +/// +/// TRANSITIONAL — removed by the fragment-adopt work (see [`AdoptDelta`]): the +/// multi-commit append → upsert → delete publish here (the source of the +/// partial-Phase-B recovery window the sidecar confirmation guards) collapses to +/// a single fragment-graft commit per table, so this whole function goes away. +async fn publish_adopted_delta( + target_db: &Omnigraph, + table_key: &str, + delta: &AdoptDelta, +) -> Result { + let (ds, full_path, table_branch) = target_db + .open_for_mutation(table_key, crate::db::MutationOpKind::Merge) + .await?; + let mut current_ds = ds; + + // Phase 1a: append the NEW rows. `stage_append_stream` is a streaming + // `Operation::Append` — no hash join — so it never buffers the delta and + // cannot exhaust the DataFusion memory pool (the OOM fix). 
It streams the + // staged rows straight into the target (Lance rolls fragments at + // `max_rows_per_file`), so memory is bounded regardless of how many rows the + // connector appended — never the whole set in one batch. New ids are absent + // from base by construction (the ordered walk only classifies a row + // `(None, Some)` when base lacks it), so they never collide on `id`. Routed + // through the staged primitive so a failure between writing fragments and + // committing leaves no Lance-HEAD drift. `appends` is `Some` only when the + // staged table is non-empty (`compute_adopt_delta`). + if let Some(append_table) = &delta.appends { + let source = SnapshotHandle::new(append_table.dataset.clone()); + let staged = target_db + .storage() + .stage_append_stream(&current_ds, &source, &[]) + .await?; + current_ds = target_db + .storage() + .commit_staged(current_ds, staged) + .await?; + } + + // Failpoint: crash after the Phase 1a append commit, before the upsert. + // Models a partial Phase B — appends are on Lance HEAD but the upserts/deletes + // have not committed and the achieved-version intent has not been recorded, so + // recovery must roll BACK (not publish the appends-only state). See + // tests/failpoints.rs::branch_merge_adopt_partial_after_append_rolls_back. + crate::failpoints::maybe_fail("branch_merge.adopt_after_append_pre_upsert")?; + + // Phase 1b: upsert the CHANGED rows. The merge_insert hash join is now + // bounded to the genuinely-changed set, not the whole delta. It runs against + // the committed view that already includes the appends; the changed ids are + // disjoint from the appended ids (each id is classified into exactly one of + // new / changed / deleted / unchanged in the single ordered walk), so the + // join never collides with an appended row. + if let Some(upsert_table) = &delta.upserts { + if let Some(combined) = scan_staged_combined(target_db, upsert_table).await?
{ + let staged_merge = target_db + .storage() + .stage_merge_insert( + current_ds.clone(), + combined, + vec!["id".to_string()], + lance::dataset::WhenMatched::UpdateAll, + lance::dataset::WhenNotMatched::InsertAll, + ) + .await?; + current_ds = target_db + .storage() + .commit_staged(current_ds, staged_merge) + .await?; + } + } + + // Failpoint: crash after the Phase 1b upsert commit, before the delete. + // Models a partial Phase B — appends + upserts on Lance HEAD but the delete + // has not committed and the achieved-version intent has not been recorded, so + // recovery must roll BACK. See + // tests/failpoints.rs::branch_merge_adopt_partial_after_upsert_rolls_back. + crate::failpoints::maybe_fail("branch_merge.adopt_after_upsert_pre_delete")?; + + // Phase 2: delete removed rows via deletion vectors (inline-commit residual, + // same as the three-way path until Lance ships a public two-phase delete). + if !delta.deleted_ids.is_empty() { + let escaped: Vec = delta + .deleted_ids + .iter() + .map(|id| format!("'{}'", id.replace('\'', "''"))) + .collect(); + let filter = format!("id IN ({})", escaped.join(", ")); + let (new_ds, _) = target_db + .storage_inline_residual() + .delete_where(&full_path, current_ds, &filter) + .await?; + current_ds = new_ds; + } + + // Phase 4: index coverage is reconciler-owned on the adopt path. Unlike the + // three-way `RewriteMerged` path, this does NOT build indices inline: the + // appended/upserted rows are left uncovered (reads stay correct via + // brute-force — indexes are derived state, invariant 7) and + // `optimize` / `ensure_indices` folds them in. This keeps even the first + // merge into a freshly schema-applied (unindexed) table fast — no inline IVF + // retrain on the publish path — and is the row-level approximation of Layer + // 2's fragment-adopt, where the source branch's already-built indices carry + // over by reference. See docs/user/branching/merge.md. 
+ let final_state = target_db + .storage() + .table_state(&full_path, &current_ds) + .await?; + + Ok(crate::db::SubTableUpdate { + table_key: table_key.to_string(), + table_version: final_state.version, + table_branch, + row_count: final_state.row_count, + version_metadata: final_state.version_metadata, + }) +} + impl Omnigraph { pub async fn branch_merge(&self, source: &str, target: &str) -> Result { self.branch_merge_as(source, target, None).await @@ -1262,7 +1551,16 @@ impl Omnigraph { continue; } if same_manifest_state(base_entry, target_entry) { - candidates.insert(table_key.clone(), CandidateTableState::AdoptSourceState); + let candidate = classify_adopt( + self, + &self.catalog(), + base_snapshot, + source_snapshot, + &target_snapshot, + table_key, + ) + .await?; + candidates.insert(table_key.clone(), candidate); continue; } @@ -1290,31 +1588,24 @@ impl Omnigraph { validate_merge_candidates(self, source_snapshot, &target_snapshot, &candidates).await?; // Recovery sidecar: protect the per-table commit_staged loop. - // Pin only `RewriteMerged` candidates because they always - // advance Lance HEAD through `publish_rewritten_merge_table` - // (which runs stage_merge_insert + delete_where + index - // rebuilds — multiple commit_staged calls per table; loose - // classification handles the multi-step drift). + // Pin `RewriteMerged` and `AdoptWithDelta` candidates — both advance + // Lance HEAD before the manifest publish (RewriteMerged via + // publish_rewritten_merge_table; AdoptWithDelta via publish_adopted_delta: + // stage_append + stage_merge_insert + delete_where + index — multiple + // commit_staged calls per table, which the loose classification handles + // as multi-step drift). // // `AdoptSourceState` candidates are NOT pinned: their publish - // path is `publish_adopted_source_state`, whose subcases mostly - // don't advance Lance HEAD (pure manifest pointer switch, or - // fork via `fork_dataset_from_entry_state` which only adds a - // Lance branch ref).
If those subcases were pinned, recovery - // would classify them as NoMovement and the all-or-nothing - // decision would force a rollback that destroys legitimately- - // committed work on sibling RewriteMerged tables. + // (`publish_adopted_source_state`) is a pure pointer switch or a fork + // (`fork_dataset_from_entry_state` only adds a Lance branch ref), neither + // of which advances the data HEAD. Pinning them would classify as + // NoMovement and force an all-or-nothing rollback that destroys sibling + // tables' committed work. // - // Residual: two `AdoptSourceState` subcases (when source has a - // table_branch AND the source delta is non-empty) internally - // call `publish_rewritten_merge_table` and DO advance HEAD. - // Those are not covered by this sidecar — if they fail mid- - // commit, the residual persists until the next ReadWrite open - // detects it via a subsequent ExpectedVersionMismatch from a - // later writer that touches the same table. Closing this gap - // requires pre-computing source deltas during candidate - // classification (a structural change to `CandidateTableState`) - // and is left as follow-up work. + // The former gap — adopt subcases that applied a non-empty delta advanced + // HEAD unpinned — is closed: `classify_adopt` pre-computes the delta, so a + // HEAD-advancing adopt is `AdoptWithDelta` (pinned here) and an empty-delta + // adopt stays `AdoptSourceState`. // Acquire per-(table_key, target_branch) queues for every table // touched by the merge plan. Sorted-order acquisition prevents // lock-order inversion against concurrent multi-table writers. 
@@ -1334,6 +1625,7 @@ impl Omnigraph { candidates.get(*table_key), Some(CandidateTableState::RewriteMerged(_)) | Some(CandidateTableState::AdoptSourceState) + | Some(CandidateTableState::AdoptWithDelta(_)) ) }) .map(|table_key| (table_key.clone(), active_branch_for_keys.clone())) @@ -1347,7 +1639,9 @@ impl Omnigraph { }; if !matches!( candidate, - CandidateTableState::RewriteMerged(_) | CandidateTableState::AdoptSourceState + CandidateTableState::RewriteMerged(_) + | CandidateTableState::AdoptSourceState + | CandidateTableState::AdoptWithDelta(_) ) { continue; } @@ -1368,7 +1662,10 @@ impl Omnigraph { .iter() .filter_map(|table_key| { let candidate = candidates.get(table_key)?; - if !matches!(candidate, CandidateTableState::RewriteMerged(_)) { + if !matches!( + candidate, + CandidateTableState::RewriteMerged(_) | CandidateTableState::AdoptWithDelta(_) + ) { return None; } let entry = target_snapshot.entry(table_key)?; @@ -1377,6 +1674,11 @@ impl Omnigraph { table_path: self.storage().dataset_uri(&entry.table_path), expected_version: entry.table_version, post_commit_pin: entry.table_version + 1, + // Stamped after the whole per-table publish completes + // (Phase-B confirmation, just before the manifest publish). + // Until then `None` marks an unfinished publish that + // recovery must roll back, not roll forward. + confirmed_version: None, // Use the merge target branch (where commits actually // land), NOT entry.table_branch (where the table // currently lives). publish_rewritten_merge_table calls @@ -1393,7 +1695,14 @@ impl Omnigraph { }) }) .collect(); - let recovery_handle = if recovery_pins.is_empty() { + // Keep the sidecar alongside its handle: after the per-table publish + // loop completes (Phase B), we re-write it with each table's confirmed + // version before the manifest publish, so recovery can tell a finished + // publish (roll forward) from a partial one (roll back). 
+ let mut recovery: Option<( + crate::db::manifest::RecoverySidecar, + crate::db::manifest::RecoverySidecarHandle, + )> = if recovery_pins.is_empty() { None } else { // Use the merge target branch directly, NOT a heuristic @@ -1418,14 +1727,13 @@ impl Omnigraph { // this, future merges between the same pair lose // already-up-to-date detection and merge-base correctness. sidecar.merge_source_commit_id = Some(source_head_commit_id.to_string()); - Some( - crate::db::manifest::write_sidecar( - self.root_uri(), - self.storage_adapter(), - &sidecar, - ) - .await?, + let handle = crate::db::manifest::write_sidecar( + self.root_uri(), + self.storage_adapter(), + &sidecar, ) + .await?; + Some((sidecar, handle)) }; let mut updates = Vec::new(); @@ -1436,15 +1744,11 @@ impl Omnigraph { }; let update = match candidate_state { CandidateTableState::AdoptSourceState => { - publish_adopted_source_state( - self, - &self.catalog(), - base_snapshot, - source_snapshot, - &target_snapshot, - table_key, - ) - .await? + publish_adopted_source_state(self, source_snapshot, &target_snapshot, table_key) + .await? + } + CandidateTableState::AdoptWithDelta(delta) => { + publish_adopted_delta(self, table_key, delta).await? } CandidateTableState::RewriteMerged(staged) => { publish_rewritten_merge_table(self, table_key, staged).await? @@ -1456,10 +1760,33 @@ impl Omnigraph { updates.push(update); } + // Phase-B confirmation: every table's publish finished, so stamp the + // sidecar with each table's exact achieved version before the manifest + // publish. This is the commit point of the recovery WAL: a crash from + // here on rolls FORWARD to these versions, while a crash anywhere in the + // publish loop above left the sidecar unconfirmed and rolls BACK. The + // `updates` carry the real per-table final versions (multiple + // commit_staged calls per table, so not derivable from `post_commit_pin` + // alone). A failure here leaves the unconfirmed sidecar → roll back. 
+ if let Some((sidecar, _)) = recovery.as_mut() { + let confirmed_versions: std::collections::HashMap = updates + .iter() + .map(|u| (u.table_key.clone(), u.table_version)) + .collect(); + crate::db::manifest::confirm_sidecar_phase_b( + self.root_uri(), + self.storage_adapter(), + sidecar, + &confirmed_versions, + ) + .await?; + } + // Failpoint: pin the per-writer Phase B → Phase C residual for // branch_merge. Lance HEAD has advanced on every touched table - // (publish_*) but the manifest publish below hasn't run. Used - // by `tests/failpoints.rs::branch_merge_phase_b_failure_recovered_on_next_open`. + // (publish_*) AND the sidecar is confirmed, but the manifest publish + // below hasn't run — so recovery rolls FORWARD. Used by + // `tests/failpoints.rs::branch_merge_phase_b_failure_recovered_on_next_open`. crate::failpoints::maybe_fail("branch_merge.post_phase_b_pre_manifest_commit")?; let manifest_version = if updates.is_empty() { @@ -1471,7 +1798,7 @@ impl Omnigraph { // Recovery sidecar lifecycle: delete after manifest publish. // Best-effort cleanup; the merge already landed durably so // failing the user here is undesirable. - if let Some(handle) = recovery_handle { + if let Some((_, handle)) = recovery { if let Err(err) = crate::db::manifest::delete_sidecar(&handle, self.storage_adapter()).await { diff --git a/crates/omnigraph/src/exec/staging.rs b/crates/omnigraph/src/exec/staging.rs index 464ec34..31d5ce8 100644 --- a/crates/omnigraph/src/exec/staging.rs +++ b/crates/omnigraph/src/exec/staging.rs @@ -712,6 +712,9 @@ impl StagedMutation { table_path: entry.path.full_path.clone(), expected_version: entry.expected_version, post_commit_pin: entry.expected_version + 1, + // Mutation/Load use strict single-commit classification, not + // BranchMerge's Phase-B confirmation — left None. 
+ confirmed_version: None, table_branch: entry.path.table_branch.clone(), }); } @@ -738,6 +741,7 @@ impl StagedMutation { // can advance HEAD by more than one version (e.g., // when Lance internally compacts deletion vectors). post_commit_pin: update.table_version, + confirmed_version: None, table_branch: path.table_branch.clone(), }); } diff --git a/crates/omnigraph/src/instrumentation.rs b/crates/omnigraph/src/instrumentation.rs index 98249c0..de5b7d3 100644 --- a/crates/omnigraph/src/instrumentation.rs +++ b/crates/omnigraph/src/instrumentation.rs @@ -80,6 +80,95 @@ pub(crate) fn record_probe() { let _ = current(|p| p.probe_count.fetch_add(1, Ordering::Relaxed)); } +/// Per-operation staged-write counts, installed for a task via +/// [`with_merge_write_probes`]. Lets a cost-budget test assert WHICH staged-write +/// primitive an operation invokes — e.g. that an append-only fast-forward merge +/// routes new rows through `stage_append` and does **zero** `stage_merge_insert` +/// (the full-outer hash join). Counts the publish-path primitives only; +/// merge-staging temp tables use `append_or_create_batch`, not these. +#[derive(Clone, Default)] +pub struct MergeWriteProbes { + pub stage_append_calls: Arc, + pub stage_append_rows: Arc, + pub stage_merge_insert_calls: Arc, + pub stage_merge_insert_rows: Arc, + /// Inline vector-index (IVF) builds. The fast-forward adopt path defers + /// index coverage to the reconciler, so an adopt merge must do 0 of these. + pub create_vector_index_calls: Arc, + /// Times the merge materialized a staged delta into one in-memory batch + /// (`scan_staged_combined`). The append path streams instead, so an + /// append-only fast-forward merge must do 0 of these. 
+ pub scan_staged_combined_calls: Arc, +} + +impl MergeWriteProbes { + pub fn stage_append_calls(&self) -> u64 { + self.stage_append_calls.load(Ordering::Relaxed) + } + pub fn stage_append_rows(&self) -> u64 { + self.stage_append_rows.load(Ordering::Relaxed) + } + pub fn stage_merge_insert_calls(&self) -> u64 { + self.stage_merge_insert_calls.load(Ordering::Relaxed) + } + pub fn stage_merge_insert_rows(&self) -> u64 { + self.stage_merge_insert_rows.load(Ordering::Relaxed) + } + pub fn create_vector_index_calls(&self) -> u64 { + self.create_vector_index_calls.load(Ordering::Relaxed) + } + pub fn scan_staged_combined_calls(&self) -> u64 { + self.scan_staged_combined_calls.load(Ordering::Relaxed) + } +} + +tokio::task_local! { + static MERGE_WRITE_PROBES: MergeWriteProbes; +} + +/// Run `fut` with staged-write probes installed. Test-only entry point; nothing +/// in production sets the probes, so `record_stage_*` below are no-ops. +pub async fn with_merge_write_probes(probes: MergeWriteProbes, fut: F) -> F::Output +where + F: std::future::Future, +{ + MERGE_WRITE_PROBES.scope(probes, fut).await +} + +/// Record one `stage_append` of `rows` rows against the active probes. No-op in +/// production (no probes installed). +pub(crate) fn record_stage_append(rows: u64) { + let _ = MERGE_WRITE_PROBES.try_with(|p| { + p.stage_append_calls.fetch_add(1, Ordering::Relaxed); + p.stage_append_rows.fetch_add(rows, Ordering::Relaxed); + }); +} + +/// Record one `stage_merge_insert` of `rows` rows against the active probes. +/// No-op in production (no probes installed). +pub(crate) fn record_stage_merge_insert(rows: u64) { + let _ = MERGE_WRITE_PROBES.try_with(|p| { + p.stage_merge_insert_calls.fetch_add(1, Ordering::Relaxed); + p.stage_merge_insert_rows.fetch_add(rows, Ordering::Relaxed); + }); +} + +/// Record one inline vector-index build against the active probes. No-op in +/// production (no probes installed). 
+pub(crate) fn record_create_vector_index() { + let _ = MERGE_WRITE_PROBES.try_with(|p| { + p.create_vector_index_calls.fetch_add(1, Ordering::Relaxed); + }); +} + +/// Record one `scan_staged_combined` materialization against the active probes. +/// No-op in production (no probes installed). +pub(crate) fn record_scan_staged_combined() { + let _ = MERGE_WRITE_PROBES.try_with(|p| { + p.scan_staged_combined_calls.fetch_add(1, Ordering::Relaxed); + }); +} + /// Open a Lance dataset at `uri`, attaching `wrapper` (for IO counting) when /// present. With no wrapper this is exactly `Dataset::open(uri)`. The wrapper is /// set via `ObjectStoreParams` on the builder so the open itself is counted diff --git a/crates/omnigraph/src/storage_layer.rs b/crates/omnigraph/src/storage_layer.rs index 7c7685d..3ea9647 100644 --- a/crates/omnigraph/src/storage_layer.rs +++ b/crates/omnigraph/src/storage_layer.rs @@ -353,6 +353,15 @@ pub trait TableStorage: sealed::Sealed + Send + Sync + Debug { prior_stages: &[StagedHandle], ) -> Result; + /// Append `source`'s rows into `snapshot`'s table, streaming so the whole + /// row set is never materialized in memory (see `TableStore::stage_append_stream`). 
+ async fn stage_append_stream( + &self, + snapshot: &SnapshotHandle, + source: &SnapshotHandle, + prior_stages: &[StagedHandle], + ) -> Result; + async fn stage_merge_insert( &self, snapshot: SnapshotHandle, @@ -684,6 +693,18 @@ impl TableStorage for TableStore { .map(StagedHandle::new) } + async fn stage_append_stream( + &self, + snapshot: &SnapshotHandle, + source: &SnapshotHandle, + prior_stages: &[StagedHandle], + ) -> Result { + let staged_writes = staged_handles_as_writes(prior_stages); + TableStore::stage_append_stream(self, snapshot.dataset(), source.dataset(), &staged_writes) + .await + .map(StagedHandle::new) + } + async fn stage_merge_insert( &self, snapshot: SnapshotHandle, diff --git a/crates/omnigraph/src/table_store.rs b/crates/omnigraph/src/table_store.rs index 5c99b01..0325e1e 100644 --- a/crates/omnigraph/src/table_store.rs +++ b/crates/omnigraph/src/table_store.rs @@ -2,6 +2,7 @@ use arrow_array::{ Array, ArrayRef, RecordBatch, StringArray, StructArray, UInt8Array, UInt32Array, UInt64Array, }; use arrow_schema::SchemaRef; +use datafusion::physical_plan::SendableRecordBatchStream; use futures::TryStreamExt; use lance::Dataset; use lance::blob::BlobArrayBuilder; @@ -362,6 +363,29 @@ impl TableStore { Ok(materialized) } + /// Streaming, blob-aware sibling of [`Self::scan_batches_for_rewrite`]. + /// Yields the dataset's rows lazily as a `SendableRecordBatchStream` so a + /// downstream writer (`stage_append_stream`) never materializes the whole + /// table in memory. Blob columns still need per-row rebuild, so those tables + /// fall back to the materialized path and are re-streamed from the `Vec` + /// (rare — only tables with a `Blob` property; bounded-memory blob streaming + /// is a follow-up). The non-blob path is a true lazy scan. 
+ pub async fn scan_stream_for_rewrite(&self, ds: &Dataset) -> Result { + let has_blob_columns = ds.schema().fields_pre_order().any(|field| field.is_blob()); + if has_blob_columns { + let arrow_schema: SchemaRef = Arc::new(ds.schema().into()); + let batches = self.scan_batches_for_rewrite(ds).await?; + let reader = arrow_array::RecordBatchIterator::new( + batches.into_iter().map(Ok), + arrow_schema, + ); + return Ok(lance_datafusion::utils::reader_to_stream(Box::new(reader))); + } + // Non-blob: a true lazy scan. `DatasetRecordBatchStream` converts to the + // `SendableRecordBatchStream` that `execute_uncommitted_stream` consumes. + Ok(Self::scan_stream(ds, None, None, None, false).await?.into()) + } + pub(crate) async fn materialize_blob_batch( ds: &Dataset, batch: RecordBatch, @@ -919,6 +943,7 @@ impl TableStore { "stage_append called with empty batch".to_string(), )); } + let appended_rows = batch.num_rows() as u64; let params = WriteParams { mode: WriteMode::Append, allow_external_blob_outside_bases: true, @@ -931,6 +956,9 @@ impl TableStore { .execute_uncommitted(vec![batch]) .await .map_err(|e| OmniError::Lance(e.to_string()))?; + // Record only after the staging write succeeds, so a failed write does + // not inflate the probe (matches `stage_append_stream`'s ordering). + crate::instrumentation::record_stage_append(appended_rows); let mut new_fragments = match &transaction.operation { Operation::Append { fragments } => fragments.clone(), Operation::Overwrite { fragments, .. } => fragments.clone(), @@ -971,6 +999,71 @@ impl TableStore { }) } + /// Streaming variant of [`Self::stage_append`]: appends the rows of `source` + /// into `ds` without materializing them in memory. It scans `source` lazily + /// (`scan_stream_for_rewrite`) and hands the stream to Lance's + /// `execute_uncommitted_stream`, which rolls fragments at `max_rows_per_file` + /// — bounded memory, one Append transaction. 
This is the substrate-blessed + /// bulk-append path (the same one LanceDB's `Table::add` uses). Identical + /// fragment-id / stable-row-id staging as `stage_append`. + /// + /// TRANSITIONAL caller — its only caller is the row-level merge append + /// (`publish_adopted_delta`, see `AdoptDelta`), which the fragment-adopt work + /// (Lance #7263/#7185) removes: a fragment graft re-appends no rows. This + /// primitive and `scan_stream_for_rewrite` are then dead unless re-adopted as + /// a general bulk-append path (the `Table::add` shape makes that plausible). + pub async fn stage_append_stream( + &self, + ds: &Dataset, + source: &Dataset, + prior_stages: &[StagedWrite], + ) -> Result { + let stream = self.scan_stream_for_rewrite(source).await?; + let params = WriteParams { + mode: WriteMode::Append, + allow_external_blob_outside_bases: true, + auto_cleanup: None, + skip_auto_cleanup: true, + ..Default::default() + }; + let transaction = InsertBuilder::new(Arc::new(ds.clone())) + .with_params(&params) + .execute_uncommitted_stream(stream) + .await + .map_err(|e| OmniError::Lance(e.to_string()))?; + let mut new_fragments = match &transaction.operation { + Operation::Append { fragments } => fragments.clone(), + Operation::Overwrite { fragments, .. } => fragments.clone(), + other => { + return Err(OmniError::manifest_internal(format!( + "stage_append_stream: unexpected Lance operation {:?}", + std::mem::discriminant(other) + ))); + } + }; + let appended_rows: u64 = new_fragments + .iter() + .filter_map(|f| f.physical_rows) + .map(|r| r as u64) + .sum(); + crate::instrumentation::record_stage_append(appended_rows); + // Same commit-time fragment-id / row-id renumbering as `stage_append`.
+ let next_id_base = ds.manifest.max_fragment_id.unwrap_or(0) as u64 + + 1 + + prior_stages_fragment_count(prior_stages); + assign_fragment_ids(&mut new_fragments, next_id_base); + if ds.manifest.uses_stable_row_ids() { + let prior_rows = prior_stages_row_count(prior_stages)?; + let start_row_id = ds.manifest.next_row_id + prior_rows; + assign_row_id_meta(&mut new_fragments, start_row_id)?; + } + Ok(StagedWrite { + transaction, + new_fragments, + removed_fragment_ids: Vec::new(), + }) + } + /// Stage a merge_insert (upsert): write fragment files describing the /// merge result, return the uncommitted transaction plus the new /// fragments. The transaction's `Operation::Update` carries the @@ -1012,6 +1105,7 @@ impl TableStore { "stage_merge_insert called with empty batch".to_string(), )); } + let merged_rows = batch.num_rows() as u64; // Precondition for the FirstSeen workaround below: every call path that // reaches stage_merge_insert (load, MutationStaging::finalize, @@ -1052,6 +1146,9 @@ impl TableStore { .execute_uncommitted(stream) .await .map_err(|e| OmniError::Lance(e.to_string()))?; + // Record only after the staging write succeeds, so a failed write does + // not inflate the probe (matches `stage_append`/`stage_append_stream`). + crate::instrumentation::record_stage_merge_insert(merged_rows); // Operation::Update { removed_fragment_ids, updated_fragments, new_fragments, .. } — // `new_fragments` are the freshly inserted rows; `updated_fragments` // are rewrites of existing fragments that include both retained and @@ -1541,8 +1638,11 @@ impl TableStore { ds.create_index_builder(&[column], IndexType::Vector, &params) .replace(true) .await - .map(|_| ()) - .map_err(|e| OmniError::Lance(e.to_string())) + .map_err(|e| OmniError::Lance(e.to_string()))?; + // Record only after the index build succeeds, so a failed build does not + // inflate the probe (matches the `stage_*` probes).
+ crate::instrumentation::record_create_vector_index(); + Ok(()) } pub async fn create_empty_dataset(dataset_uri: &str, schema: &SchemaRef) -> Result { diff --git a/crates/omnigraph/tests/failpoints.rs b/crates/omnigraph/tests/failpoints.rs index 38a60ae..9d65bc1 100644 --- a/crates/omnigraph/tests/failpoints.rs +++ b/crates/omnigraph/tests/failpoints.rs @@ -8,12 +8,15 @@ use omnigraph::db::Omnigraph; use omnigraph::error::{ManifestErrorKind, OmniError}; use omnigraph::failpoints::ScopedFailPoint; use omnigraph::loader::LoadMode; +use serial_test::serial; use helpers::recovery::{ FollowUpMutation, RecoveryExpectation, TableExpectation, assert_post_recovery_invariants, branch_head_commit_id, single_sidecar_operation_id, }; -use helpers::{MUTATION_QUERIES, mixed_params, mutate_main, version_main}; +use helpers::{ + MUTATION_QUERIES, collect_column_strings, mixed_params, mutate_main, read_table, version_main, +}; const SCHEMA_V1: &str = "node Person { name: String @key }\n"; const SCHEMA_V2_ADDED_TYPE: &str = @@ -3176,6 +3179,7 @@ async fn optimize_phase_b_failure_recovered_on_next_open() { } #[tokio::test] +#[serial(branch_merge_phase_b)] async fn branch_merge_phase_b_failure_recovered_on_next_open() { use omnigraph::loader::{LoadMode, load_jsonl}; @@ -3337,6 +3341,352 @@ async fn branch_merge_phase_b_failure_recovered_on_next_open() { drop(db); } +/// AdoptWithDelta recovery (the gap closure): a fast-forward merge — main has +/// NOT advanced since the branch forked, so the touched table is classified +/// `AdoptWithDelta`, not `RewriteMerged` — that fails after Phase B must still +/// recover on the next open. Before the recovery-pin closure this drifted +/// silently: the adopt path advanced Lance HEAD but was unpinned, so the sweep +/// found no sidecar and the merge was lost. 
+#[tokio::test] +#[serial(branch_merge_phase_b)] +async fn branch_merge_adopt_with_delta_phase_b_failure_recovered_on_next_open() { + use omnigraph::loader::{LoadMode, load_jsonl}; + + let _scenario = FailScenario::setup(); + let dir = tempfile::tempdir().unwrap(); + let uri = dir.path().to_str().unwrap().to_string(); + + // Seed main, branch off, mutate ONLY the branch. main stays at base, so the + // merge is a fast-forward and Person classifies `AdoptWithDelta` (forked + // source, target == base, non-empty delta) — NOT `RewriteMerged`. + { + let mut db = Omnigraph::init(&uri, helpers::TEST_SCHEMA).await.unwrap(); + load_jsonl( + &mut db, + r#"{"type":"Person","data":{"name":"alice","age":30}} +"#, + LoadMode::Append, + ) + .await + .unwrap(); + db.branch_create("feature").await.unwrap(); + db.mutate( + "feature", + MUTATION_QUERIES, + "insert_person", + &mixed_params(&[("$name", "Bob")], &[("$age", 40)]), + ) + .await + .unwrap(); + // main intentionally NOT mutated → fast-forward → AdoptWithDelta. + } + + let pre_failure_version = { + let db = Omnigraph::open(&uri).await.unwrap(); + version_main(&db).await.unwrap() + }; + + // Fail after the per-table publish loop, before commit_manifest_updates. + { + let db = Omnigraph::open(&uri).await.unwrap(); + let _failpoint = + ScopedFailPoint::new("branch_merge.post_phase_b_pre_manifest_commit", "return"); + let err = db.branch_merge("feature", "main").await.unwrap_err(); + assert!( + err.to_string().contains( + "injected failpoint triggered: branch_merge.post_phase_b_pre_manifest_commit" + ), + "unexpected error: {err}" + ); + + // The gap closure: an AdoptWithDelta merge must persist a sidecar. 
+ let recovery_dir = dir.path().join("__recovery"); + let sidecars: Vec<_> = std::fs::read_dir(&recovery_dir) + .unwrap() + .filter_map(|e| e.ok()) + .collect(); + assert_eq!( + sidecars.len(), + 1, + "AdoptWithDelta merge must persist exactly one recovery sidecar (the closed gap)" + ); + } + + // Reopen → the recovery sweep rolls the AdoptWithDelta merge forward. + let db = Omnigraph::open(&uri).await.unwrap(); + let recovery_dir = dir.path().join("__recovery"); + if recovery_dir.exists() { + let remaining: Vec<_> = std::fs::read_dir(&recovery_dir) + .unwrap() + .filter_map(|e| e.ok()) + .collect(); + assert!( + remaining.is_empty(), + "sidecar must be deleted post-recovery; remaining: {remaining:?}" + ); + } + + let post_recovery_version = version_main(&db).await.unwrap(); + assert!( + post_recovery_version > pre_failure_version, + "manifest must advance post-recovery; pre={pre_failure_version} post={post_recovery_version}" + ); + let names = collect_column_strings(&read_table(&db, "node:Person").await, "name"); + assert!( + names.contains(&"Bob".to_string()), + "recovered AdoptWithDelta merge must include Bob; have {names:?}" + ); + drop(db); +} + +/// Which branch-merge publish path a partial-Phase-B test exercises. +enum MergeScenario { + /// main stays at base → the touched table is `AdoptWithDelta` + /// (`publish_adopted_delta`: append → upsert → delete). + Adopt, + /// main advances past base → the touched table is `RewriteMerged` + /// (`publish_rewritten_merge_table`: merge_insert → delete → index). + Rewrite, +} + +async fn sorted_person_names(db: &Omnigraph) -> Vec<String> { + let mut names = collect_column_strings(&read_table(db, "node:Person").await, "name"); + names.sort(); + names +} + +/// THE recovery-atomicity regression gate. A branch merge whose per-table publish +/// is a multi-commit sequence (append → upsert → delete, or merge_insert → delete +/// → index) advances Lance HEAD step by step before the manifest publish.
If the +/// process dies *mid*-sequence — after some commits but before the achieved-version +/// intent is recorded — recovery must roll the whole merge **back**, not publish +/// the partial and record the merge as complete. +/// +/// The delta is deliberately MIXED — a fresh id (`bob`, append), a modified base id +/// (`carol`, upsert) and a removed base id (`dave`, delete) — so every partial +/// window leaves real work undone. Proof of rollback: after recovery the target is +/// back at its base name-set, and a *re-run* of the merge re-applies the full delta +/// (the partial was not silently recorded as "already merged"). +/// +/// RED before the fix: the loose `BranchMerge` classification rolls any +/// `lance_head > manifest_pinned` forward, so the partial is published (e.g. `bob` +/// present, `dave` kept) and the merge recorded — the first assert (back at base) +/// fails. GREEN after: `achieved_version == None` → `IncompletePhaseB` → roll back. +async fn assert_partial_merge_rolls_back(scenario: MergeScenario, failpoint: &str) { + use omnigraph::loader::load_jsonl; + + let _scenario = FailScenario::setup(); + let dir = tempfile::tempdir().unwrap(); + let uri = dir.path().to_str().unwrap().to_string(); + + // Seed main {alice, carol, dave}; on `feature` add bob (append), bump carol + // (upsert), remove dave (delete). For Rewrite, also move main past base so the + // table classifies RewriteMerged instead of a fast-forward AdoptWithDelta. 
+ { + let mut db = Omnigraph::init(&uri, helpers::TEST_SCHEMA).await.unwrap(); + load_jsonl( + &mut db, + "{\"type\":\"Person\",\"data\":{\"name\":\"alice\",\"age\":30}}\n\ + {\"type\":\"Person\",\"data\":{\"name\":\"carol\",\"age\":50}}\n\ + {\"type\":\"Person\",\"data\":{\"name\":\"dave\",\"age\":60}}\n", + LoadMode::Append, + ) + .await + .unwrap(); + db.branch_create("feature").await.unwrap(); + db.mutate( + "feature", + MUTATION_QUERIES, + "insert_person", + &mixed_params(&[("$name", "bob")], &[("$age", 40)]), + ) + .await + .unwrap(); + db.mutate( + "feature", + MUTATION_QUERIES, + "set_age", + &mixed_params(&[("$name", "carol")], &[("$age", 55)]), + ) + .await + .unwrap(); + db.mutate( + "feature", + MUTATION_QUERIES, + "remove_person", + &mixed_params(&[("$name", "dave")], &[]), + ) + .await + .unwrap(); + if matches!(scenario, MergeScenario::Rewrite) { + db.mutate( + "main", + MUTATION_QUERIES, + "set_age", + &mixed_params(&[("$name", "alice")], &[("$age", 35)]), + ) + .await + .unwrap(); + } + } + + // Crash mid-Phase-B at the injected window. + { + let db = Omnigraph::open(&uri).await.unwrap(); + let _fp = ScopedFailPoint::new(failpoint, "return"); + let err = db.branch_merge("feature", "main").await.unwrap_err(); + assert!( + err.to_string().contains(failpoint), + "expected the injected failpoint {failpoint}, got: {err}" + ); + } + + // Reopen → the open-time sweep must ROLL BACK to base (the merge never reached + // its commit boundary), and a re-run must then apply the FULL delta. 
+ { + let db = Omnigraph::open(&uri).await.unwrap(); + assert_eq!( + sorted_person_names(&db).await, + vec!["alice", "carol", "dave"], + "partial Phase B at {failpoint} must roll back to base \ + (no bob, dave kept, carol's upsert reverted); the merge must NOT be recorded", + ); + db.branch_merge("feature", "main").await.unwrap(); + assert_eq!( + sorted_person_names(&db).await, + vec!["alice", "bob", "carol"], + "re-merge after rollback must re-apply the full delta \ + (bob added, dave removed) — proof the partial was not silently recorded", + ); + } +} + +#[tokio::test] +#[serial(branch_merge_phase_b)] +async fn branch_merge_adopt_partial_after_append_rolls_back() { + assert_partial_merge_rolls_back( + MergeScenario::Adopt, + "branch_merge.adopt_after_append_pre_upsert", + ) + .await; +} + +#[tokio::test] +#[serial(branch_merge_phase_b)] +async fn branch_merge_adopt_partial_after_upsert_rolls_back() { + assert_partial_merge_rolls_back( + MergeScenario::Adopt, + "branch_merge.adopt_after_upsert_pre_delete", + ) + .await; +} + +#[tokio::test] +#[serial(branch_merge_phase_b)] +async fn branch_merge_rewrite_partial_after_merge_rolls_back() { + assert_partial_merge_rolls_back( + MergeScenario::Rewrite, + "branch_merge.rewrite_after_merge_pre_delete", + ) + .await; +} + +#[tokio::test] +#[serial(branch_merge_phase_b)] +async fn branch_merge_rewrite_partial_after_delete_rolls_back() { + assert_partial_merge_rolls_back( + MergeScenario::Rewrite, + "branch_merge.rewrite_after_delete_pre_index", + ) + .await; +} + +/// Backward-compat: a `BranchMerge` sidecar written by a *pre-confirmation* +/// binary (schema_version 1, no `confirmed_version`) must NOT be misread as a +/// partial Phase B and rolled back. A pre-upgrade crash in the Phase-B→C gap can +/// leave such a sidecar over a *completed* merge; rolling it back would silently +/// discard a finished merge with no operator signal — the regression greptile / +/// Cursor flagged. 
+/// +/// We synthesize the pre-upgrade sidecar realistically: crash after Phase B (a +/// real sidecar + advanced Lance HEAD), then downgrade the on-disk JSON to the +/// v1 shape (`schema_version` = 1, strip every pin's `confirmed_version`) before +/// reopening — exactly what an old binary would have left. +/// +/// RED before the versioning fix: a v1 sidecar with no `confirmed_version` +/// classifies `IncompletePhaseB` → rolls back → `bob` is discarded. GREEN after: +/// the version-aware classifier reads v1 as the old loose generation → rolls +/// forward → `bob` preserved. +#[tokio::test] +#[serial(branch_merge_phase_b)] +async fn pre_upgrade_v1_branch_merge_sidecar_rolls_forward_not_back() { + use omnigraph::loader::{LoadMode, load_jsonl}; + + let _scenario = FailScenario::setup(); + let dir = tempfile::tempdir().unwrap(); + let uri = dir.path().to_str().unwrap().to_string(); + + // main {alice}; feature adds bob → a fast-forward AdoptWithDelta merge, which + // writes a recovery sidecar. + { + let mut db = Omnigraph::init(&uri, helpers::TEST_SCHEMA).await.unwrap(); + load_jsonl( + &mut db, + "{\"type\":\"Person\",\"data\":{\"name\":\"alice\",\"age\":30}}\n", + LoadMode::Append, + ) + .await + .unwrap(); + db.branch_create("feature").await.unwrap(); + db.mutate( + "feature", + MUTATION_QUERIES, + "insert_person", + &mixed_params(&[("$name", "bob")], &[("$age", 40)]), + ) + .await + .unwrap(); + } + + // Crash after Phase B (Lance HEAD advanced, manifest not published) → a real + // sidecar lands on disk. + { + let db = Omnigraph::open(&uri).await.unwrap(); + let _fp = ScopedFailPoint::new("branch_merge.post_phase_b_pre_manifest_commit", "return"); + db.branch_merge("feature", "main").await.unwrap_err(); + } + + // Downgrade the sidecar to the pre-confirmation v1 shape an old binary writes. 
+ { + let recovery_dir = std::path::Path::new(&uri).join("__recovery"); + let path = std::fs::read_dir(&recovery_dir) + .unwrap() + .filter_map(Result::ok) + .map(|e| e.path()) + .find(|p| p.extension().is_some_and(|x| x == "json")) + .expect("a recovery sidecar must exist after the post-Phase-B crash"); + let mut v: serde_json::Value = + serde_json::from_str(&std::fs::read_to_string(&path).unwrap()).unwrap(); + v["schema_version"] = serde_json::json!(1); + for table in v["tables"].as_array_mut().unwrap() { + table.as_object_mut().unwrap().remove("confirmed_version"); + } + std::fs::write(&path, serde_json::to_string_pretty(&v).unwrap()).unwrap(); + } + + // Reopen → the pre-upgrade completed merge must roll FORWARD (bob kept), not + // be silently discarded. + { + let db = Omnigraph::open(&uri).await.unwrap(); + assert_eq!( + sorted_person_names(&db).await, + vec!["alice", "bob"], + "a pre-confirmation (v1) BranchMerge sidecar over a completed merge must roll \ + forward, not be misread as a partial and rolled back", + ); + } +} + /// Branch-axis variant of the branch_merge recovery test: target is a /// non-main branch. Catches the branch-specific commit-graph head bug /// (D2) — without `CommitGraph::open_at_branch`, the recovery sweep @@ -3344,6 +3694,7 @@ async fn branch_merge_phase_b_failure_recovered_on_next_open() { /// target, and future merges between the same pair would lose /// already-up-to-date detection. #[tokio::test] +#[serial(branch_merge_phase_b)] async fn branch_merge_phase_b_failure_recovered_on_non_main_target() { use omnigraph::loader::{LoadMode, load_jsonl}; @@ -3468,6 +3819,7 @@ async fn branch_merge_phase_b_failure_recovered_on_non_main_target() { /// keeps RewriteMerged tables on active_branch), the contract assertion /// catches a regression that reverts to `entry.table_branch.clone()`. 
#[tokio::test] +#[serial(branch_merge_phase_b)] async fn branch_merge_sidecar_pins_table_branch_to_active_branch() { use omnigraph::loader::{LoadMode, load_jsonl}; diff --git a/crates/omnigraph/tests/merge_fast_forward.rs b/crates/omnigraph/tests/merge_fast_forward.rs new file mode 100644 index 0000000..185f45d --- /dev/null +++ b/crates/omnigraph/tests/merge_fast_forward.rs @@ -0,0 +1,213 @@ +//! Fast-forward branch-merge cost + correctness. +//! +//! The data-path fix routes *new* rows of an adopted-source merge through +//! `stage_append` (a streaming `Operation::Append`) instead of lumping new + +//! changed rows into one `stage_merge_insert` (a full-outer hash join that +//! buffers the whole delta and exhausts the DataFusion memory pool on +//! embedding-bearing tables). +//! +//! The regression gate here is *structural*, not a brittle size threshold: it +//! asserts WHICH staged-write primitive the merge invokes, via the task-local +//! write probes in `omnigraph::instrumentation`. That is deterministic and +//! machine-independent — it cannot flake on a bigger memory pool. + +// Wrapping `branch_merge` in `with_merge_write_probes` (a task-local scope) +// nests the already-deep merge future one layer deeper, overflowing rustc's +// default 128 layout-query depth. Bump it for this test crate. +#![recursion_limit = "512"] + +mod helpers; + +use omnigraph::db::{MergeOutcome, Omnigraph}; +use omnigraph::instrumentation::{MergeWriteProbes, with_merge_write_probes}; + +use helpers::*; + +/// Insert `n` brand-new persons (fresh ids) onto `branch`, forking the Person +/// table onto it. All rows are "new on source" — none collide with base ids. +async fn append_new_persons(db: &mut Omnigraph, branch: &str, n: usize) { + for i in 0..n { + mutate_branch( + db, + branch, + MUTATION_QUERIES, + "insert_person", + &mixed_params(&[("$name", &format!("ff_new_{i}"))], &[("$age", 30)]), + ) + .await + .unwrap(); + } +} + +/// THE cost-budget gate. 
An append-only fast-forward merge must append the new +/// rows and run **zero** `stage_merge_insert` (the full-outer hash join that is +/// the OOM). RED today (new + changed are lumped into one `stage_merge_insert`); +/// GREEN once the adopt path splits new→`stage_append`, changed→`stage_merge_insert`. +#[tokio::test] +async fn append_only_fast_forward_merge_does_no_merge_insert() { + let dir = tempfile::tempdir().unwrap(); + let uri = dir.path().to_str().unwrap(); + let main = init_and_load(&dir).await; + main.branch_create("feature").await.unwrap(); + + let mut feature = Omnigraph::open(uri).await.unwrap(); + append_new_persons(&mut feature, "feature", 5).await; + + let probes = MergeWriteProbes::default(); + let outcome = + with_merge_write_probes(probes.clone(), main.branch_merge("feature", "main")) + .await + .unwrap(); + assert_eq!(outcome, MergeOutcome::FastForward); + + assert_eq!( + probes.stage_merge_insert_calls(), + 0, + "append-only fast-forward merge must do 0 stage_merge_insert (the OOM hash join); did {}", + probes.stage_merge_insert_calls(), + ); + assert!( + probes.stage_append_calls() >= 1, + "append-only fast-forward merge must append the new rows via stage_append; did {}", + probes.stage_append_calls(), + ); + assert_eq!( + probes.scan_staged_combined_calls(), + 0, + "append-only merge must stream the append (stage_append_stream), not materialize the \ + whole delta into one batch via scan_staged_combined; did {}", + probes.scan_staged_combined_calls(), + ); +} + +/// Functional correctness: a fast-forward merge of an append-only branch leaves +/// main equal to the source branch. Independent of the cost-budget gate. 
+#[tokio::test] +async fn fast_forward_merge_yields_source_state() { + let dir = tempfile::tempdir().unwrap(); + let uri = dir.path().to_str().unwrap(); + let main = init_and_load(&dir).await; + let base_count = count_rows(&main, "node:Person").await; + + main.branch_create("feature").await.unwrap(); + let mut feature = Omnigraph::open(uri).await.unwrap(); + append_new_persons(&mut feature, "feature", 5).await; + let source_count = count_rows_branch(&feature, "feature", "node:Person").await; + assert_eq!(source_count, base_count + 5); + + let outcome = main.branch_merge("feature", "main").await.unwrap(); + assert_eq!(outcome, MergeOutcome::FastForward); + + // main now equals source: the 5 new persons are present, the base rows kept. + assert_eq!(count_rows(&main, "node:Person").await, source_count); + let names = collect_column_strings(&read_table(&main, "node:Person").await, "name"); + for i in 0..5 { + assert!( + names.contains(&format!("ff_new_{i}")), + "merged main missing new person ff_new_{i}; have {names:?}" + ); + } +} + +const VEC_SCHEMA: &str = "node Chunk {\n slug: String @key\n embedding: Vector(8) @index\n}\n"; + +/// Commit 6 behavior: the fast-forward adopt path does NOT build indices inline +/// — index coverage is reconciler-owned (`optimize`/`ensure_indices`). A merge +/// into a freshly-initialized (unindexed) vector table must perform **0** inline +/// vector-index (IVF) builds; reads stay correct via brute-force until +/// `optimize` covers the new rows. RED before the change (the publish path built +/// the IVF inline); GREEN after. +#[tokio::test] +async fn fast_forward_merge_defers_vector_index_to_reconciler() { + use omnigraph::loader::LoadMode; + + let dir = tempfile::tempdir().unwrap(); + let uri = dir.path().to_str().unwrap(); + // Empty Chunk table → no vector index at init (KMeans can't train on 0 rows). 
+ let main = Omnigraph::init(uri, VEC_SCHEMA).await.unwrap(); + main.branch_create("feature").await.unwrap(); + + // Load embedding-bearing chunks onto the branch. The branch builds its own + // index here (outside the probe scope) — irrelevant to the merge's cost. + let mut rows = String::new(); + for i in 0..24 { + let v: Vec<String> = (0..8).map(|j| format!("{}.0", (i + j) % 5)).collect(); + rows.push_str(&format!( + "{{\"type\":\"Chunk\",\"data\":{{\"slug\":\"c{i}\",\"embedding\":[{}]}}}}\n", + v.join(",") + )); + } + let feature = Omnigraph::open(uri).await.unwrap(); + feature.load("feature", &rows, LoadMode::Merge).await.unwrap(); + + // Merge, counting inline vector-index builds the publish path performs. + let probes = MergeWriteProbes::default(); + let outcome = with_merge_write_probes(probes.clone(), main.branch_merge("feature", "main")) + .await + .unwrap(); + assert_eq!(outcome, MergeOutcome::FastForward); + + assert_eq!( + probes.create_vector_index_calls(), + 0, + "fast-forward adopt merge must defer vector-index coverage to the reconciler \ + (0 inline IVF builds); did {}", + probes.create_vector_index_calls(), + ); + // Correctness: the rows landed on main (reads brute-force until optimize). + assert_eq!(count_rows(&main, "node:Chunk").await, 24); +} + +const BLOB_SCHEMA: &str = "node Document {\n title: String @key\n content: Blob?\n note: String?\n}\n"; +const BLOB_INSERT: &str = r#" +query insert_doc($title: String, $content: Blob, $note: String) { + insert Document { title: $title, content: $content, note: $note } +} +"#; + +/// A fast-forward merge of a branch with a Blob column exercises the blob +/// fallback in `scan_stream_for_rewrite` (materialize → re-stream) through the +/// streaming append. main is NOT mutated, so Document is `AdoptWithDelta` (the +/// adopt/append path), not `RewriteMerged`. The blob bytes must survive the +/// materialize → stream → append round-trip.
+#[tokio::test] +async fn fast_forward_merge_streams_blob_columns() { + use omnigraph::loader::{LoadMode, load_jsonl}; + + let dir = tempfile::tempdir().unwrap(); + let uri = dir.path().to_str().unwrap(); + let mut main = Omnigraph::init(uri, BLOB_SCHEMA).await.unwrap(); + load_jsonl( + &mut main, + "{\"type\":\"Document\",\"data\":{\"title\":\"seed\",\"content\":\"base64:U2VlZA==\",\"note\":\"base\"}}", + LoadMode::Overwrite, + ) + .await + .unwrap(); + main.branch_create("feature").await.unwrap(); + + // Only the branch is mutated → fast-forward → adopt/append path. + let mut feature = Omnigraph::open(uri).await.unwrap(); + mutate_branch( + &mut feature, + "feature", + BLOB_INSERT, + "insert_doc", + &params(&[ + ("$title", "readme"), + ("$content", "base64:SGVsbG8="), + ("$note", "branch"), + ]), + ) + .await + .unwrap(); + + let outcome = main.branch_merge("feature", "main").await.unwrap(); + assert_eq!(outcome, MergeOutcome::FastForward); + + // The appended blob row's bytes survive the streaming append; the base row stays intact. + let readme = main.read_blob("Document", "readme", "content").await.unwrap(); + assert_eq!(&readme.read().await.unwrap()[..], b"Hello"); + let seed = main.read_blob("Document", "seed", "content").await.unwrap(); + assert_eq!(&seed.read().await.unwrap()[..], b"Seed"); +} diff --git a/crates/omnigraph/tests/recovery.rs b/crates/omnigraph/tests/recovery.rs index b5ca58f..ed47811 100644 --- a/crates/omnigraph/tests/recovery.rs +++ b/crates/omnigraph/tests/recovery.rs @@ -104,8 +104,10 @@ async fn recovery_refuses_unknown_schema_version_on_open() { let _db = Omnigraph::init(uri, TEST_SCHEMA).await.unwrap(); drop(_db); - // A sidecar from a hypothetical future writer; the older binary must - // refuse to interpret it (resolved-decisions §3 in the design doc).
+ // A sidecar from a hypothetical future writer (version NEWER than this + // binary's max); the reader must refuse to interpret it — it cannot guess + // semantics a newer writer baked in. (Older versions are accepted and + // interpreted with their original semantics; see `parse_sidecar`.) let sidecar_json = r#"{ "schema_version": 99, "operation_id": "01H000000000000000000000ZZ", @@ -120,11 +122,11 @@ async fn recovery_refuses_unknown_schema_version_on_open() { let err = Omnigraph::open(uri) .await .err() - .expect("expected open to fail because of unknown sidecar schema_version"); + .expect("expected open to fail because of a future sidecar schema_version"); let msg = err.to_string(); assert!( - msg.contains("schema_version=99") && msg.contains("supports only schema_version=1"), - "expected SidecarSchemaError mentioning the version mismatch, got: {}", + msg.contains("schema_version=99") && msg.contains("newer than the maximum"), + "expected a future-version refusal, got: {}", msg, ); // Sidecar must still be on disk — we don't auto-delete unparseable files. diff --git a/docs/dev/writes.md b/docs/dev/writes.md index 01c166e..c4e174c 100644 --- a/docs/dev/writes.md +++ b/docs/dev/writes.md @@ -178,6 +178,17 @@ are left at `Lance HEAD = manifest_pinned + 1`. post_commit_pin)` it intends to commit + the writer kind + actor_id. 2. **Phase B**: writer's per-table `commit_staged` loop runs. + - **Phase-B confirmation (`BranchMerge` only)**: a `BranchMerge` writer + advances each table's HEAD by *several* commits (append → upsert → + delete), so a bare "HEAD moved" is ambiguous — it could be a complete + publish or one crashed mid-sequence. After the whole per-table loop + finishes, the writer re-writes the sidecar stamping each pin's + `confirmed_version` with the exact achieved version, then proceeds to + Phase C. 
This is the commit point of the recovery WAL: a crash *after* + confirmation rolls forward to those versions; a crash *during* Phase B + (sidecar still unconfirmed) rolls back. Other writers don't confirm — + their drift is derived state (index coverage, compaction) that a partial + roll-forward never corrupts. 3. **Phase C**: publisher commits the manifest. 4. **Phase D**: writer deletes the sidecar. @@ -197,7 +208,10 @@ recovery sweep in `crates/omnigraph/src/db/manifest/recovery.rs`: - For each sidecar in `__recovery/`, compare every named table's Lance HEAD to the manifest pin. Classify per the all-or-nothing decision tree (RolledPastExpected / NoMovement / UnexpectedAtP1 / - UnexpectedMultistep / InvariantViolation). + UnexpectedMultistep / IncompletePhaseB / InvariantViolation). For a + `BranchMerge` sidecar, a moved HEAD with no `confirmed_version` classifies + as `IncompletePhaseB` (a partial multi-commit publish) and forces roll-back; + with a `confirmed_version`, roll-forward targets exactly that version. - If any table is `InvariantViolation` (Lance HEAD < manifest pinned — should be impossible), **abort** with a loud error and leave the sidecar on disk for operator review. diff --git a/docs/user/branching/merge.md b/docs/user/branching/merge.md index fde2fab..cb54ed6 100644 --- a/docs/user/branching/merge.md +++ b/docs/user/branching/merge.md @@ -22,6 +22,25 @@ A merge resolves to one of three outcomes: simply advances to the source. - **Merged** — both sides diverged; a new merge commit is created with two parents. +## Indexes after a merge + +A **fast-forward** merge (the common case — the target had no conflicting +changes, so the source's rows are adopted) does not build or rebuild indexes on +the rows it brings into the target. Newly merged rows (and any index a table does +not yet have) are covered the next time `optimize` runs — indexes are derived +state, and reads stay correct in the meantime via brute-force scan over the +not-yet-covered rows. 
This keeps a fast-forward merge fast (it never pays an +inline vector/FTS rebuild on the publish path), at the cost of brute-force search +latency on freshly merged rows until the next `optimize`. + +A **three-way** merge (the `Merged` outcome — both branches changed the table and +the rows were reconciled) still rebuilds the table's indexes inline today, as part +of the publish. So a Merged-outcome merge of an embedding-bearing table pays the +index-build cost up front. + +Either way, run `omnigraph optimize` after a large merge to restore (or, for the +fast-forward path, establish) full index coverage. + ## Conflicts When both branches changed the same data incompatibly, the merge fails with a From 7fd23c54a39de8e112ba1f690282f4dd842f329a Mon Sep 17 00:00:00 2001 From: Andrew Altshuler Date: Fri, 19 Jun 2026 03:34:15 +0300 Subject: [PATCH 11/13] fix(cluster): stop cluster-apply crash-loops from the recovery-sidecar trap (#284) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * fix(cluster): stop cluster-apply crash-loops from the recovery-sidecar trap A `cluster apply` carrying a schema change against a graph that has non-main branches, or an unsupported "needs backfill" migration, armed a recovery sidecar *before* calling the engine, then left it behind when the engine rejected the apply pre-movement. The server refuses to boot while any sidecar is pending, and re-running apply re-armed a fresh sidecar — an unescapable crash loop. None of the engine rejections are bugs; the trap is in the apply/serve choreography. Three coordinated changes: 1. Preview before arming the sidecar. `cluster apply` now runs `preview_schema_apply_with_options` before `write_recovery_sidecar`, so parser/planner rejections (non-main branches, unsupported plan) fail loudly without leaving recovery work behind. 
The post-preview engine error path now deletes the sidecar when the live schema still matches the recorded digest (nothing moved), and keeps it only on real mid-movement failure — both branches covered by new engine-failpoint tests (cluster failpoints now enable omnigraph/failpoints). 2. Per-graph quarantine at serve time instead of whole-cluster refusal. A graph-attributed pending sidecar, an unopenable graph root, a query parse failure, or an unresolvable embedding provider now quarantines just that graph (logged loudly at every boot layer) while healthy graphs serve; `/graphs` lists only ready graphs and quarantined routes 404. Cluster-global problems (missing/unreadable state, malformed or unattributable sidecars, shared-catalog or cluster-policy errors, zero healthy graphs) stay fail-fast. `--require-all-graphs` / OMNIGRAPH_REQUIRE_ALL_GRAPHS=1 restores all-or-nothing boot. 3. Backfill embedding-provider profile metadata on apply. Mirrors the existing policy-binding backfill: a pre-5A ledger missing `embedding_profile` is now detected as a metadata-only change and backfilled by a no-op apply, instead of bricking serve with `embedding_provider_profile_missing` forever. Tests: trap (no sidecar after a rejected apply), both digest-cleanup branches, per-graph quarantine (cluster + server), embedding backfill. Co-Authored-By: Claude Opus 4.8 (1M context) * docs: resilient cluster boot + recovery-sidecar trap fix Amend RFC-005 D4 readiness posture (cluster-global fail-fast vs graph-local quarantine; deviation #5 for --require-all-graphs), add the v0.7.0 release note, and update the user cluster/server/deployment docs and the OMNIGRAPH_REQUIRE_ALL_GRAPHS env var. 
Co-Authored-By: Claude Opus 4.8 (1M context) * fix(cluster): surface sidecar-cleanup failures; document severity promotion Address Greptile review on PR #284: - The pre-movement sidecar cleanup fast-path discarded `delete_object`'s result, so a transient delete failure left the graph quarantined with no signal. Add `try_delete_object` (Result-returning) and emit a `recovery_sidecar_cleanup_failed` warning diagnostic on failure; the fire-and-forget `delete_object` now delegates to it. - Document why the serve-time loop promotes every `list_recovery_sidecars` diagnostic to a cluster-fatal error (the listing only emits genuine read/parse/version failures, as warnings, whose blast radius serving cannot prove) and note the promote-by-code path if that ever changes. Co-Authored-By: Claude Opus 4.8 (1M context) --------- Co-authored-by: Claude Opus 4.8 (1M context) --- crates/omnigraph-cluster/Cargo.toml | 4 +- crates/omnigraph-cluster/src/diff.rs | 44 ++++ crates/omnigraph-cluster/src/lib.rs | 150 ++++++++++---- crates/omnigraph-cluster/src/serve.rs | 98 +++++++-- crates/omnigraph-cluster/src/store.rs | 9 +- crates/omnigraph-cluster/src/tests.rs | 206 ++++++++++++++++++- crates/omnigraph-cluster/src/types.rs | 11 + crates/omnigraph-cluster/tests/failpoints.rs | 127 ++++++++++-- crates/omnigraph-server/src/lib.rs | 126 ++++++++---- crates/omnigraph-server/src/main.rs | 14 +- crates/omnigraph-server/src/settings.rs | 196 ++++++++++++++---- crates/omnigraph-server/tests/multi_graph.rs | 132 +++++++++++- crates/omnigraph-server/tests/s3.rs | 7 +- crates/omnigraph-server/tests/support/mod.rs | 11 +- docker/entrypoint.sh | 9 +- docs/dev/rfc-005-server-cluster-boot.md | 21 +- docs/releases/v0.7.0.md | 6 + docs/user/clusters/config.md | 43 ++-- docs/user/clusters/index.md | 10 +- docs/user/deployment.md | 1 + docs/user/operations/server.md | 21 +- 21 files changed, 1043 insertions(+), 203 deletions(-) diff --git a/crates/omnigraph-cluster/Cargo.toml 
b/crates/omnigraph-cluster/Cargo.toml index 05a9308..f0a3a22 100644 --- a/crates/omnigraph-cluster/Cargo.toml +++ b/crates/omnigraph-cluster/Cargo.toml @@ -10,8 +10,8 @@ documentation = "https://docs.rs/omnigraph-cluster" [features] # Fault-injection hooks for the apply protocol (crash-mid-apply, CAS-race -# tests). Deliberately does NOT enable omnigraph/failpoints. -failpoints = ["dep:fail", "fail/failpoints"] +# tests), including cluster/engine boundary failures. +failpoints = ["dep:fail", "fail/failpoints", "omnigraph/failpoints"] [dependencies] omnigraph-compiler = { path = "../omnigraph-compiler", version = "0.7.0" } diff --git a/crates/omnigraph-cluster/src/diff.rs b/crates/omnigraph-cluster/src/diff.rs index 516a86e..ce29a45 100644 --- a/crates/omnigraph-cluster/src/diff.rs +++ b/crates/omnigraph-cluster/src/diff.rs @@ -18,6 +18,7 @@ pub(crate) fn diff_resources( disposition: None, reason: None, binding_change: false, + metadata_change: None, migration: None, }), Some(before) if before != after => changes.push(PlanChange { @@ -28,6 +29,7 @@ pub(crate) fn diff_resources( disposition: None, reason: None, binding_change: false, + metadata_change: None, migration: None, }), Some(_) => {} @@ -43,6 +45,7 @@ pub(crate) fn diff_resources( disposition: None, reason: None, binding_change: false, + metadata_change: None, migration: None, }); } @@ -82,6 +85,47 @@ pub(crate) fn append_policy_binding_changes( disposition: None, reason: None, binding_change: true, + metadata_change: Some(PlanMetadataChange::PolicyBindings), + migration: None, + }); + } + changes.sort_by(|a, b| a.resource.cmp(&b.resource)); +} + +/// Metadata-only embedding provider changes: the provider digest is unchanged +/// but the applied state predates storing the profile body needed by +/// config-free serving. This mirrors policy binding backfill instead of +/// hiding a serving-time failure behind a no-op plan. 
+pub(crate) fn append_embedding_profile_changes( + changes: &mut Vec, + prior_state: Option<&ClusterState>, + desired: &DesiredCluster, +) { + let Some(state) = prior_state else { + return; // no state: provider Creates carry profiles already + }; + for (address, desired_profile) in &desired.embedding_providers { + if changes + .iter() + .any(|change| change.resource.as_str() == address.as_str()) + { + continue; // content change already covers it + } + let Some(entry) = state.applied_revision.resources.get(address) else { + continue; // not applied yet: the Create covers it + }; + if entry.embedding_profile.as_ref() == Some(desired_profile) { + continue; + } + changes.push(PlanChange { + resource: address.clone(), + operation: PlanOperation::Update, + before_digest: Some(entry.digest.clone()), + after_digest: Some(entry.digest.clone()), + disposition: None, + reason: None, + binding_change: false, + metadata_change: Some(PlanMetadataChange::EmbeddingProfile), migration: None, }); } diff --git a/crates/omnigraph-cluster/src/lib.rs b/crates/omnigraph-cluster/src/lib.rs index 0dad78c..bed27c8 100644 --- a/crates/omnigraph-cluster/src/lib.rs +++ b/crates/omnigraph-cluster/src/lib.rs @@ -33,9 +33,9 @@ use config::{ validate_id, validate_query_source, }; use diff::{ - FailedGraphOrigin, ResourceKind, append_policy_binding_changes, approved_resources, - classify_changes, compute_approvals, compute_blast_radius, demote_dependents_of_failed_graphs, - diff_resources, resource_kind, + FailedGraphOrigin, ResourceKind, append_embedding_profile_changes, + append_policy_binding_changes, approved_resources, classify_changes, compute_approvals, + compute_blast_radius, demote_dependents_of_failed_graphs, diff_resources, resource_kind, }; pub use serve::{ ServingGraph, ServingPolicy, ServingQuery, ServingSnapshot, cluster_graph_ids, @@ -183,6 +183,7 @@ pub async fn plan_config_dir(config_dir: impl AsRef) -> PlanOutput { }; if !has_errors(&diagnostics) { 
append_policy_binding_changes(&mut changes, prior_state.as_ref(), &desired); + append_embedding_profile_changes(&mut changes, prior_state.as_ref(), &desired); } // Plan previews dispositions without sweeping; a pending recovery is // surfaced as the cluster_recovery_pending warning above instead. @@ -404,6 +405,7 @@ pub async fn apply_config_dir_with_options( let prior_resources = state_resource_digests(&state); let mut changes = diff_resources(&prior_resources, &desired.resource_digests); append_policy_binding_changes(&mut changes, Some(&state), &desired); + append_embedding_profile_changes(&mut changes, Some(&state), &desired); let approval_artifacts = backend.list_approval_artifacts(&mut diagnostics).await; let approved = approved_resources( &approval_artifacts, @@ -639,42 +641,9 @@ pub async fn apply_config_dir_with_options( continue; } }; - let observed_manifest_version = match db.snapshot_of(ReadTarget::branch("main")).await { - Ok(snapshot) => Some(snapshot.version()), - Err(_) => None, - }; - let mut sidecar = RecoverySidecar { - schema_version: 1, - operation_id: Ulid::new().to_string(), - started_at: now_rfc3339(), - actor: options.actor.clone(), - kind: RecoverySidecarKind::SchemaApply, - graph_id: graph_id.clone(), - graph_uri: graph_uri.clone(), - observed_manifest_version, - expected_manifest_version: None, - desired_schema_digest: desired_graph.schema_digest.clone(), - state_cas_base: expected_cas.clone(), - approval_id: None, - }; - let sidecar_path = match backend.write_recovery_sidecar(&sidecar).await { - Ok(path) => path, - Err(diagnostic) => { - diagnostics.push(diagnostic); - failed_graphs.insert(graph_id.clone(), FailedGraphOrigin::SchemaApply); - graph_moving_aborted = true; - continue; - } - }; - if let Err(diagnostic) = failpoints::maybe_fail("cluster_apply.before_schema_apply") { - // Simulated crash before the engine call: the sidecar stays; the - // sweep retires it next run (ledger still consistent with live). 
- diagnostics.push(diagnostic); - failed_graphs.insert(graph_id.clone(), FailedGraphOrigin::SchemaApply); - graph_moving_aborted = true; - continue; - } - // Re-read + digest-verify the desired schema source under the lock. + // Re-read + digest-verify the desired schema source before the + // cluster sidecar exists. Parser/planner rejections cannot have + // moved graph state, so they must not leave recovery work behind. let schema_source = source_paths .get(schema_address(graph_id).as_str()) .ok_or_else(|| { @@ -708,12 +677,64 @@ pub async fn apply_config_dir_with_options( Ok(source) => source, Err(diagnostic) => { diagnostics.push(diagnostic); - backend.delete_object(&sidecar_path).await; // nothing moved failed_graphs.insert(graph_id.clone(), FailedGraphOrigin::SchemaApply); graph_moving_aborted = true; continue; } }; + if let Err(err) = db + .preview_schema_apply_with_options(&schema_source, SchemaApplyOptions::default()) + .await + { + diagnostics.push(Diagnostic::error( + "schema_apply_failed", + schema_address(graph_id), + format!("schema apply is not supported on '{graph_uri}': {err}"), + )); + failed_graphs.insert(graph_id.clone(), FailedGraphOrigin::SchemaApply); + graph_moving_aborted = true; + continue; + } + let observed_manifest_version = match db.snapshot_of(ReadTarget::branch("main")).await { + Ok(snapshot) => Some(snapshot.version()), + Err(_) => None, + }; + let recorded_schema_digest = state + .applied_revision + .resources + .get(&schema_address(graph_id)) + .map(|entry| entry.digest.clone()); + let mut sidecar = RecoverySidecar { + schema_version: 1, + operation_id: Ulid::new().to_string(), + started_at: now_rfc3339(), + actor: options.actor.clone(), + kind: RecoverySidecarKind::SchemaApply, + graph_id: graph_id.clone(), + graph_uri: graph_uri.clone(), + observed_manifest_version, + expected_manifest_version: None, + desired_schema_digest: desired_graph.schema_digest.clone(), + state_cas_base: expected_cas.clone(), + approval_id: None, + }; + 
let sidecar_path = match backend.write_recovery_sidecar(&sidecar).await { + Ok(path) => path, + Err(diagnostic) => { + diagnostics.push(diagnostic); + failed_graphs.insert(graph_id.clone(), FailedGraphOrigin::SchemaApply); + graph_moving_aborted = true; + continue; + } + }; + if let Err(diagnostic) = failpoints::maybe_fail("cluster_apply.before_schema_apply") { + // Simulated crash before the engine call: the sidecar stays; the + // sweep retires it next run (ledger still consistent with live). + diagnostics.push(diagnostic); + failed_graphs.insert(graph_id.clone(), FailedGraphOrigin::SchemaApply); + graph_moving_aborted = true; + continue; + } // Soft drops only: allow_data_loss stays false until the approval // artifacts of stage 4C exist (RFC-004 §D4). match db @@ -736,8 +757,29 @@ pub async fn apply_config_dir_with_options( schema_address(graph_id), format!("schema apply failed on '{graph_uri}': {err}"), )); - // Sidecar stays; the sweep retires it (live digest unchanged - // == ledger consistent) or flags real movement. + if live_schema_matches_recorded_digest( + &graph_uri, + recorded_schema_digest.as_deref(), + observed_manifest_version, + ) + .await + { + // Pre-movement rejection: nothing moved, so retire the + // sidecar eagerly. A delete failure leaves it safe (the + // graph is quarantined until the next sweep), but surface + // it so an operator isn't left debugging a silent stick. 
+ if let Err(err) = backend.try_delete_object(&sidecar_path).await { + diagnostics.push(Diagnostic::warning( + "recovery_sidecar_cleanup_failed", + sidecar_path.clone(), + format!( + "could not delete the stale recovery sidecar after a pre-movement \ + schema-apply rejection; graph `{graph_id}` stays quarantined until \ + a state-mutating cluster command sweeps it: {err}" + ), + )); + } + } failed_graphs.insert(graph_id.clone(), FailedGraphOrigin::SchemaApply); graph_moving_aborted = true; continue; @@ -1022,6 +1064,7 @@ pub async fn apply_config_dir_with_options( &desired.resource_digests, ); append_policy_binding_changes(&mut residual, Some(&new_state), &desired); + append_embedding_profile_changes(&mut residual, Some(&new_state), &desired); let converged = residual.is_empty(); if converged { new_state.applied_revision.config_digest = Some(desired.config_digest.clone()); @@ -1939,6 +1982,29 @@ fn embedding_provider_digest(profile: &EmbeddingProviderConfig) -> String { sha256_hex(input.as_bytes()) } +async fn live_schema_matches_recorded_digest( + graph_uri: &str, + recorded_schema_digest: Option<&str>, + observed_manifest_version: Option, +) -> bool { + let Some(recorded_schema_digest) = recorded_schema_digest else { + return false; + }; + let Some(observed_manifest_version) = observed_manifest_version else { + return false; + }; + let Ok(db) = Omnigraph::open_read_only(graph_uri).await else { + return false; + }; + let Ok(snapshot) = db.snapshot_of(ReadTarget::branch("main")).await else { + return false; + }; + if snapshot.version() != observed_manifest_version { + return false; + } + sha256_hex(db.schema_source().as_bytes()) == recorded_schema_digest +} + fn desired_config_digest( raw: &RawClusterConfig, resource_digests: &BTreeMap, diff --git a/crates/omnigraph-cluster/src/serve.rs b/crates/omnigraph-cluster/src/serve.rs index 6f89e2d..54d3017 100644 --- a/crates/omnigraph-cluster/src/serve.rs +++ b/crates/omnigraph-cluster/src/serve.rs @@ -37,11 +37,14 @@ pub 
struct ServingSnapshot { pub graphs: Vec, pub queries: Vec, pub policies: Vec, + pub diagnostics: Vec, } /// Read the applied revision as a serving snapshot — the read-only loader for -/// the Phase-5 server boot. All-or-nothing per RFC-005 §D4: every readiness -/// failure is collected and the whole snapshot refused; no partial serving. +/// the Phase-5 server boot. Cluster-global readiness failures are still +/// all-or-nothing, but graph-attributed pending recovery sidecars quarantine +/// only that graph so healthy graphs can continue serving. This loader never +/// runs a recovery sweep. /// Takes no lock: the state file is replaced atomically, so this reads a /// consistent point-in-time ledger. pub async fn read_serving_snapshot( @@ -190,19 +193,44 @@ async fn read_snapshot_with_store( backend: ClusterStore, ) -> Result> { let mut diagnostics: Vec = Vec::new(); + let mut startup_diagnostics: Vec = Vec::new(); + let mut quarantined_graphs: BTreeSet = BTreeSet::new(); - // A ledger a sweep is about to rewrite must not start serving. + // Do not sweep at serve time. Valid graph-attributed sidecars quarantine + // that graph; malformed/unattributable sidecars remain cluster-fatal + // because serving cannot prove their blast radius. + let sidecar_diag_start = diagnostics.len(); let sidecars = backend.list_recovery_sidecars(&mut diagnostics).await; - if !sidecars.is_empty() { - diagnostics.push(Diagnostic::error( + // Every diagnostic `list_recovery_sidecars` appends is a genuine + // read/parse/version failure (emitted as a warning by `store::list_json_dir`) + // whose blast radius serving cannot prove — promote each to a cluster-fatal + // error. This depends on that listing only ever emitting failure diagnostics; + // if it grows a benign/informational one, promote by code instead. 
+ for diagnostic in diagnostics.iter_mut().skip(sidecar_diag_start) { + diagnostic.severity = DiagnosticSeverity::Error; + } + for (path, sidecar) in sidecars { + if sidecar.graph_id.trim().is_empty() { + diagnostics.push(Diagnostic::error( + "cluster_recovery_unattributed", + path, + "recovery sidecar has no graph id; run a state-mutating cluster command to sweep it before serving", + )); + continue; + } + quarantined_graphs.insert(sidecar.graph_id.clone()); + startup_diagnostics.push(Diagnostic::warning( "cluster_recovery_pending", - CLUSTER_RECOVERIES_DIR, + graph_address(&sidecar.graph_id), format!( - "{} interrupted operation(s) await recovery; run any state-mutating cluster command (e.g. `cluster apply`) to sweep, then retry", - sidecars.len() + "graph `{}` is quarantined because interrupted operation `{}` awaits recovery; run any state-mutating cluster command (e.g. `cluster apply`) to sweep", + sidecar.graph_id, sidecar.operation_id ), )); } + if has_errors(&diagnostics) { + return Err(diagnostics); + } let mut observations = backend.observations(); let state = match backend.read_state(&mut observations).await { @@ -223,14 +251,29 @@ async fn read_snapshot_with_store( } }; let Some(state) = state else { + diagnostics.extend(startup_diagnostics); return Err(diagnostics); }; + let required_embedding_providers: BTreeSet = state + .applied_revision + .resources + .iter() + .filter_map(|(address, entry)| match resource_kind(address) { + ResourceKind::Graph(graph_id) if !quarantined_graphs.contains(&graph_id) => { + entry.embedding_provider.clone() + } + _ => None, + }) + .collect(); let mut embedding_profiles: BTreeMap = BTreeMap::new(); for (address, entry) in &state.applied_revision.resources { if !matches!(resource_kind(address), ResourceKind::EmbeddingProvider(_)) { continue; } + if !required_embedding_providers.contains(address) { + continue; + } let Some(profile) = entry.embedding_profile.clone() else { diagnostics.push(Diagnostic::error( 
"embedding_provider_profile_missing", @@ -256,9 +299,14 @@ async fn read_snapshot_with_store( let mut graphs = Vec::new(); let mut queries = Vec::new(); let mut policies = Vec::new(); + let mut saw_applied_graph = false; for (address, entry) in &state.applied_revision.resources { match resource_kind(address) { ResourceKind::Graph(graph_id) => { + saw_applied_graph = true; + if quarantined_graphs.contains(&graph_id) { + continue; + } let embedding = match entry.embedding_provider.as_deref() { Some(provider_address) => match resource_kind(provider_address) { ResourceKind::EmbeddingProvider(_) => { @@ -300,6 +348,9 @@ async fn read_snapshot_with_store( let ResourceKind::Query { graph, name } = &kind else { unreachable!() }; + if quarantined_graphs.contains(graph) { + continue; + } match backend .read_verified_payload(&kind, &entry.digest, address) .await @@ -324,6 +375,17 @@ async fn read_snapshot_with_store( )); continue; }; + let applies_to: Vec = applies_to + .into_iter() + .filter(|binding| { + binding + .strip_prefix("graph.") + .is_none_or(|graph| !quarantined_graphs.contains(graph)) + }) + .collect(); + if applies_to.is_empty() { + continue; + } match backend .read_verified_payload(&kind, &entry.digest, address) .await @@ -342,19 +404,29 @@ async fn read_snapshot_with_store( } if graphs.is_empty() { - diagnostics.push(Diagnostic::error( - "cluster_empty", - CLUSTER_STATE_FILE, - "the applied revision records no graphs; apply a cluster with at least one graph before serving from it", - )); + if saw_applied_graph && !quarantined_graphs.is_empty() { + diagnostics.push(Diagnostic::error( + "cluster_no_healthy_graphs", + CLUSTER_RECOVERIES_DIR, + "all applied graphs are quarantined by pending recovery sidecars; run any state-mutating cluster command (e.g. 
`cluster apply`) to sweep, then retry", + )); + } else { + diagnostics.push(Diagnostic::error( + "cluster_empty", + CLUSTER_STATE_FILE, + "the applied revision records no graphs; apply a cluster with at least one graph before serving from it", + )); + } } if has_errors(&diagnostics) { + diagnostics.extend(startup_diagnostics); return Err(diagnostics); } Ok(ServingSnapshot { graphs, queries, policies, + diagnostics: startup_diagnostics, }) } diff --git a/crates/omnigraph-cluster/src/store.rs b/crates/omnigraph-cluster/src/store.rs index c19a95d..a156d78 100644 --- a/crates/omnigraph-cluster/src/store.rs +++ b/crates/omnigraph-cluster/src/store.rs @@ -250,7 +250,14 @@ impl ClusterStore { /// Best-effort object removal (sidecar retirement after a CAS lands, /// lock cleanup) — failures are recoverable by the next sweep. pub(crate) async fn delete_object(&self, uri: &str) { - let _ = self.adapter.delete(uri).await; + let _ = self.try_delete_object(uri).await; + } + + /// Like `delete_object` but surfaces the failure, so a caller that depends + /// on the deletion (e.g. the pre-movement sidecar cleanup fast-path) can + /// report it as a diagnostic instead of silently leaving stale state. + pub(crate) async fn try_delete_object(&self, uri: &str) -> Result<(), String> { + self.adapter.delete(uri).await.map_err(|err| err.to_string()) } /// Recursive prefix delete for graph roots (approved deletes). 
Idempotent; diff --git a/crates/omnigraph-cluster/src/tests.rs b/crates/omnigraph-cluster/src/tests.rs index b14b46e..7eae69f 100644 --- a/crates/omnigraph-cluster/src/tests.rs +++ b/crates/omnigraph-cluster/src/tests.rs @@ -1174,6 +1174,19 @@ graphs: .unwrap() } + fn recovery_sidecars(config_dir: &Path) -> Vec { + let dir = config_dir.join(CLUSTER_RECOVERIES_DIR); + if !dir.exists() { + return Vec::new(); + } + let mut sidecars: Vec<_> = fs::read_dir(dir) + .unwrap() + .map(|entry| entry.unwrap().path()) + .collect(); + sidecars.sort(); + sidecars + } + fn query_payload_path(config_dir: &Path, digest: &str) -> std::path::PathBuf { config_dir .join(CLUSTER_RESOURCES_DIR) @@ -1586,8 +1599,17 @@ graphs: state["applied_revision"]["resources"]["schema.knowledge"]["digest"], desired.resource_digests["schema.knowledge"] ); - // Second run: the sweep retires the stale sidecar (ledger consistent) - // and the run fails just as loudly — idempotent loudness. + let db = Omnigraph::open_read_only(&derived_graph_uri(dir.path(), "knowledge")) + .await + .unwrap(); + assert_eq!(db.schema_source().as_str(), SCHEMA); + assert!( + recovery_sidecars(dir.path()).is_empty(), + "{:?}", + recovery_sidecars(dir.path()) + ); + // Second run fails just as loudly and still leaves no sidecar because + // the engine preview rejects before graph state can move. 
let second = apply_config_dir(dir.path()).await; assert!(!second.ok); assert!( @@ -1596,6 +1618,45 @@ graphs: .iter() .any(|diagnostic| diagnostic.code == "schema_apply_failed") ); + assert!( + recovery_sidecars(dir.path()).is_empty(), + "{:?}", + recovery_sidecars(dir.path()) + ); + } + + #[tokio::test] + async fn apply_schema_update_blocked_by_non_main_branch_leaves_no_sidecar() { + let dir = fixture(); + init_derived_graph(dir.path()).await; + write_applyable_state(dir.path()); + let graph_uri = derived_graph_uri(dir.path(), "knowledge"); + let db = Omnigraph::open(&graph_uri).await.unwrap(); + db.branch_create("feature").await.unwrap(); + drop(db); + let before_state = read_state_json(dir.path()); + fs::write(dir.path().join("people.pg"), SCHEMA_V2).unwrap(); + + let out = apply_config_dir(dir.path()).await; + assert!(!out.ok); + assert!(out.diagnostics.iter().any(|diagnostic| { + diagnostic.code == "schema_apply_failed" + && diagnostic + .message + .contains("schema apply requires a graph with only main") + })); + assert!( + recovery_sidecars(dir.path()).is_empty(), + "{:?}", + recovery_sidecars(dir.path()) + ); + let after_state = read_state_json(dir.path()); + assert_eq!( + after_state["applied_revision"]["resources"], + before_state["applied_revision"]["resources"] + ); + let reopened = Omnigraph::open_read_only(&graph_uri).await.unwrap(); + assert_eq!(reopened.schema_source().as_str(), SCHEMA); } #[tokio::test] @@ -2964,6 +3025,10 @@ policies: .find(|change| change.resource == "policy.base") .expect("binding change must be visible in plan"); assert!(change.binding_change); + assert_eq!( + change.metadata_change, + Some(PlanMetadataChange::PolicyBindings) + ); assert_eq!(change.operation, PlanOperation::Update); assert_eq!(change.before_digest, change.after_digest); @@ -3002,9 +3067,9 @@ policies: let plan = plan_config_dir(dir.path()).await; assert!( - plan.changes - .iter() - .any(|change| change.resource == "policy.base" && change.binding_change), + 
plan.changes.iter().any(|change| change.resource == "policy.base" + && change.binding_change + && change.metadata_change == Some(PlanMetadataChange::PolicyBindings)), "{plan:?}" ); let out = apply_config_dir(dir.path()).await; @@ -3016,6 +3081,52 @@ policies: ); } + #[tokio::test] + async fn pre_5a_state_backfills_embedding_profile() { + let dir = fixture(); + init_derived_graph(dir.path()).await; + write_mock_embedding_cluster(dir.path(), "recorded-x"); + write_applyable_state(dir.path()); + let converge = apply_config_dir(dir.path()).await; + assert!(converge.converged, "{converge:?}"); + + let mut state = read_state_json(dir.path()); + state["applied_revision"]["resources"]["provider.embedding.default"] + .as_object_mut() + .unwrap() + .remove("embedding_profile"); + fs::write( + dir.path().join(CLUSTER_STATE_FILE), + serde_json::to_string_pretty(&state).unwrap(), + ) + .unwrap(); + + let plan = plan_config_dir(dir.path()).await; + let change = plan + .changes + .iter() + .find(|change| change.resource == "provider.embedding.default") + .expect("embedding profile backfill must be visible in plan"); + assert_eq!(change.operation, PlanOperation::Update); + assert_eq!(change.before_digest, change.after_digest); + assert_eq!( + change.metadata_change, + Some(PlanMetadataChange::EmbeddingProfile) + ); + + let out = apply_config_dir(dir.path()).await; + assert!(out.ok && out.converged, "{out:?}"); + let healed = read_state_json(dir.path()); + assert_eq!( + healed["applied_revision"]["resources"]["provider.embedding.default"] + ["embedding_profile"]["model"], + serde_json::json!("recorded-x") + ); + let snapshot = read_serving_snapshot(dir.path()).await.unwrap(); + let profile = snapshot.graphs[0].embedding.as_ref().unwrap(); + assert_eq!(profile.model.as_deref(), Some("recorded-x")); + } + #[tokio::test] async fn bindings_survive_refresh() { let dir = fixture(); @@ -3189,9 +3300,92 @@ policies: let err = read_serving_snapshot(dir.path()).await.unwrap_err(); assert!( - 
err.iter().any(|diagnostic| diagnostic.code == "cluster_recovery_pending"), + err.iter() + .any(|diagnostic| diagnostic.code == "cluster_no_healthy_graphs"), "{err:?}" ); + assert!( + err.iter().any(|diagnostic| { + diagnostic.code == "cluster_recovery_pending" + && diagnostic.path == "graph.knowledge" + }), + "{err:?}" + ); + } + + #[tokio::test] + async fn serving_snapshot_quarantines_one_graph_with_pending_recovery() { + let dir = fixture(); + fs::write( + dir.path().join(CLUSTER_CONFIG_FILE), + r#" +version: 1 +metadata: + name: test +state: + backend: cluster + lock: true +graphs: + knowledge: + schema: ./people.pg + archive: + schema: ./people.pg +"#, + ) + .unwrap(); + let graph_dir = dir.path().join(CLUSTER_GRAPHS_DIR); + fs::create_dir_all(&graph_dir).unwrap(); + Omnigraph::init( + graph_dir.join("knowledge.omni").to_string_lossy().as_ref(), + SCHEMA, + ) + .await + .unwrap(); + Omnigraph::init( + graph_dir.join("archive.omni").to_string_lossy().as_ref(), + SCHEMA, + ) + .await + .unwrap(); + let desired = validate_config_dir(dir.path()); + assert!(desired.ok, "{:?}", desired.diagnostics); + let schema_digest = desired.resource_digests["schema.knowledge"].clone(); + let empty_queries = BTreeMap::new(); + let knowledge_digest = graph_digest( + "knowledge", + Some(&schema_digest), + Some(&empty_queries), + None, + None, + ); + let archive_digest = graph_digest( + "archive", + Some(&schema_digest), + Some(&empty_queries), + None, + None, + ); + write_state_resources( + dir.path(), + &[ + ("graph.knowledge", knowledge_digest.as_str()), + ("schema.knowledge", schema_digest.as_str()), + ("graph.archive", archive_digest.as_str()), + ("schema.archive", schema_digest.as_str()), + ], + ); + write_schema_apply_sidecar(dir.path(), "knowledge", "whatever", "01SERVE2"); + + let snapshot = read_serving_snapshot(dir.path()).await.unwrap(); + assert_eq!(snapshot.graphs.len(), 1); + assert_eq!(snapshot.graphs[0].graph_id, "archive"); + assert!(snapshot.queries.is_empty()); 
+ assert!(snapshot.policies.is_empty()); + assert!(snapshot.diagnostics.iter().any(|diagnostic| { + diagnostic.code == "cluster_recovery_pending" + && diagnostic.path == "graph.knowledge" + && diagnostic.severity == DiagnosticSeverity::Warning + })); } #[tokio::test] diff --git a/crates/omnigraph-cluster/src/types.rs b/crates/omnigraph-cluster/src/types.rs index 97ad406..7687575 100644 --- a/crates/omnigraph-cluster/src/types.rs +++ b/crates/omnigraph-cluster/src/types.rs @@ -176,6 +176,10 @@ pub struct PlanChange { /// pre-5A backfill case). #[serde(default, skip_serializing_if = "std::ops::Not::not")] pub binding_change: bool, + /// Metadata-only updates whose resource content digest is unchanged but + /// whose applied ledger metadata needs to converge. + #[serde(skip_serializing_if = "Option::is_none")] + pub metadata_change: Option, /// For schema updates: the engine's migration plan against the live /// graph (RFC-004 §D7's data-aware preview). Absent when the preview is /// unavailable (warning `schema_preview_unavailable`). 
@@ -183,6 +187,13 @@ pub struct PlanChange { pub migration: Option, } +#[derive(Debug, Clone, Copy, Serialize, PartialEq, Eq)] +#[serde(rename_all = "snake_case")] +pub enum PlanMetadataChange { + PolicyBindings, + EmbeddingProfile, +} + #[derive(Debug, Clone, Serialize, PartialEq, Eq)] pub struct BlastRadius { pub resource: String, diff --git a/crates/omnigraph-cluster/tests/failpoints.rs b/crates/omnigraph-cluster/tests/failpoints.rs index 5cdf2d4..51997ce 100644 --- a/crates/omnigraph-cluster/tests/failpoints.rs +++ b/crates/omnigraph-cluster/tests/failpoints.rs @@ -13,8 +13,9 @@ use std::fs; use std::path::{Path, PathBuf}; use fail::FailScenario; -use omnigraph_cluster::failpoints::ScopedFailPoint; use omnigraph::db::Omnigraph; +use omnigraph::failpoints::ScopedFailPoint as EngineScopedFailPoint; +use omnigraph_cluster::failpoints::ScopedFailPoint; use omnigraph_cluster::{ ApplyOptions, apply_config_dir, apply_config_dir_with_options, approve_config_dir, validate_config_dir, @@ -178,13 +179,12 @@ async fn apply_cas_race_surfaces_state_cas_mismatch() { // after apply read it but before apply writes. RAII-guarded so a panic // inside apply cannot leak the callback into the global registry. 
let race_path = state_path(dir.path()); - let failpoint = - ScopedFailPoint::with_callback("cluster_apply.before_state_write", move || { - let mut state: serde_json::Value = - serde_json::from_str(&fs::read_to_string(&race_path).unwrap()).unwrap(); - state["state_revision"] = serde_json::json!(99); - fs::write(&race_path, serde_json::to_string_pretty(&state).unwrap()).unwrap(); - }); + let failpoint = ScopedFailPoint::with_callback("cluster_apply.before_state_write", move || { + let mut state: serde_json::Value = + serde_json::from_str(&fs::read_to_string(&race_path).unwrap()).unwrap(); + state["state_revision"] = serde_json::json!(99); + fs::write(&race_path, serde_json::to_string_pretty(&state).unwrap()).unwrap(); + }); let out = apply_config_dir(dir.path()).await; drop(failpoint); @@ -336,10 +336,9 @@ async fn create_crash_after_init_rolls_state_forward() { ); assert!(recovered.converged); assert!(recovery_sidecars(dir.path()).is_empty()); - let state: serde_json::Value = serde_json::from_str( - &fs::read_to_string(dir.path().join("__cluster/state.json")).unwrap(), - ) - .unwrap(); + let state: serde_json::Value = + serde_json::from_str(&fs::read_to_string(dir.path().join("__cluster/state.json")).unwrap()) + .unwrap(); assert!( state["recovery_records"] .as_object() @@ -422,6 +421,105 @@ async fn schema_crash_before_apply_recovers_via_sweep() { scenario.teardown(); } +/// Engine apply fails after cluster preview and sidecar creation, but before +/// the graph manifest moves. The defensive cleanup proof should remove the +/// cluster sidecar immediately so a pre-movement error cannot brick boot. 
+#[tokio::test] +async fn schema_apply_error_before_graph_movement_removes_sidecar() { + let scenario = FailScenario::setup(); + let dir = fixture(); + converge_with_live_graph(dir.path()).await; + let pre_digest = live_schema_digest(dir.path()).await; + fs::write(dir.path().join("people.pg"), SCHEMA_V2).unwrap(); + + { + let _failpoint = EngineScopedFailPoint::new("schema_apply.before_staging_write", "return"); + let out = apply_config_dir(dir.path()).await; + assert!(!out.ok); + assert!( + out.diagnostics + .iter() + .any(|diagnostic| diagnostic.code == "schema_apply_failed"), + "{:?}", + out.diagnostics + ); + assert_eq!(live_schema_digest(dir.path()).await, pre_digest); + assert!( + recovery_sidecars(dir.path()).is_empty(), + "{:?}", + recovery_sidecars(dir.path()) + ); + } + + let recovered = apply_config_dir(dir.path()).await; + assert!(recovered.ok && recovered.converged, "{recovered:?}"); + assert!(recovery_sidecars(dir.path()).is_empty()); + assert_ne!(live_schema_digest(dir.path()).await, pre_digest); + scenario.teardown(); +} + +/// Engine apply fails after the graph manifest moved. The cluster cannot +/// prove this is a pre-movement failure, so the sidecar must survive for +/// explicit recovery/quarantine instead of being cleaned up defensively. 
+#[tokio::test] +async fn schema_apply_error_after_graph_movement_keeps_sidecar() { + let scenario = FailScenario::setup(); + let dir = fixture(); + converge_with_live_graph(dir.path()).await; + let pre_digest = live_schema_digest(dir.path()).await; + fs::write(dir.path().join("people.pg"), SCHEMA_V2).unwrap(); + let desired = validate_config_dir(dir.path()); + let v2_digest = desired.resource_digests["schema.knowledge"].clone(); + + { + let _failpoint = EngineScopedFailPoint::new("schema_apply.after_manifest_commit", "return"); + let out = apply_config_dir(dir.path()).await; + assert!(!out.ok); + assert!( + out.diagnostics + .iter() + .any(|diagnostic| diagnostic.code == "schema_apply_failed"), + "{:?}", + out.diagnostics + ); + // Read-only opens do not run engine schema-state recovery, so the + // schema file still reads as the old digest even though the manifest + // has moved. The cluster sidecar must remain because movement was + // detected by the fallback manifest-version proof. 
+ assert_eq!(live_schema_digest(dir.path()).await, pre_digest); + let sidecars = recovery_sidecars(dir.path()); + assert_eq!(sidecars.len(), 1, "{sidecars:?}"); + let sidecar: serde_json::Value = + serde_json::from_str(&fs::read_to_string(&sidecars[0]).unwrap()).unwrap(); + assert_eq!(sidecar["kind"], "schema_apply"); + assert!(sidecar["expected_manifest_version"].is_null(), "{sidecar}"); + } + + let uri = dir.path().join("graphs/knowledge.omni"); + let db = Omnigraph::open(uri.to_string_lossy().as_ref()) + .await + .unwrap(); + assert_eq!( + db.schema_source().as_str(), + SCHEMA_V2, + "read-write open should complete engine schema-state recovery" + ); + drop(db); + assert_eq!(live_schema_digest(dir.path()).await, v2_digest); + + let recovered = apply_config_dir(dir.path()).await; + assert!(recovered.ok, "{:?}", recovered.diagnostics); + assert!( + recovered + .diagnostics + .iter() + .any(|diagnostic| diagnostic.code == "cluster_recovery_rolled_forward") + ); + assert!(recovered.converged); + assert!(recovery_sidecars(dir.path()).is_empty()); + scenario.teardown(); +} + /// Crash after the engine schema apply, before the state CAS: the manifest /// moved, the ledger is stale, nothing acknowledged; the next run's sweep /// rolls the ledger forward with an audit entry and the run converges. 
@@ -447,7 +545,10 @@ async fn schema_crash_after_apply_rolls_state_forward() { assert_eq!(sidecars.len(), 1); let sidecar: serde_json::Value = serde_json::from_str(&fs::read_to_string(&sidecars[0]).unwrap()).unwrap(); - assert!(sidecar["expected_manifest_version"].is_number(), "{sidecar}"); + assert!( + sidecar["expected_manifest_version"].is_number(), + "{sidecar}" + ); } let recovered = apply_config_dir(dir.path()).await; diff --git a/crates/omnigraph-server/src/lib.rs b/crates/omnigraph-server/src/lib.rs index 5451b05..fbc37d2 100644 --- a/crates/omnigraph-server/src/lib.rs +++ b/crates/omnigraph-server/src/lib.rs @@ -1,9 +1,9 @@ pub mod api; mod handlers; mod settings; -pub use settings::{load_server_settings, classify_server_runtime_state, ServerRuntimeState}; -use settings::*; use handlers::*; +use settings::*; +pub use settings::{ServerRuntimeState, classify_server_runtime_state, load_server_settings}; pub mod auth; pub mod graph_id; pub mod identity; @@ -29,10 +29,10 @@ use api::{ BranchCreateOutput, BranchCreateRequest, BranchDeleteOutput, BranchListOutput, BranchMergeOutput, BranchMergeRequest, ChangeOutput, ChangeRequest, CommitListOutput, CommitListQuery, ErrorCode, ErrorOutput, ExportRequest, GraphInfo, GraphListResponse, - HealthOutput, IngestOutput, IngestRequest, InvokeStoredQueryRequest, - InvokeStoredQueryResponse, QueriesCatalogOutput, QueryRequest, ReadOutput, ReadRequest, - SchemaApplyOutput, SchemaApplyRequest, SchemaOutput, SnapshotQuery, ingest_output, - schema_apply_output, snapshot_payload, + HealthOutput, IngestOutput, IngestRequest, InvokeStoredQueryRequest, InvokeStoredQueryResponse, + QueriesCatalogOutput, QueryRequest, ReadOutput, ReadRequest, SchemaApplyOutput, + SchemaApplyRequest, SchemaOutput, SnapshotQuery, ingest_output, schema_apply_output, + snapshot_payload, }; pub use auth::{AWS_SECRET_ENV, EnvOrFileTokenSource, TokenSource, resolve_token_source}; use axum::body::{Body, Bytes}; @@ -166,6 +166,10 @@ pub struct ServerConfig { 
/// who set up auth and forgot the policy file would otherwise ship /// the illusion of protection. pub allow_unauthenticated: bool, + /// Operator opt-in for fail-fast cluster boot. By default, graph-local + /// startup failures quarantine that graph and healthy graphs still serve. + /// When true, any quarantined or failed graph aborts startup. + pub require_all_graphs: bool, } /// What `load_server_settings` produces. RFC-011 cluster-only: the @@ -303,7 +307,14 @@ impl AppState { ) -> Self { let bearer_tokens = hash_bearer_tokens(bearer_tokens); let per_graph_policy = policy_engine.map(Arc::new); - Self::build_single_mode(uri, db, bearer_tokens, per_graph_policy, Arc::new(workload), None) + Self::build_single_mode( + uri, + db, + bearer_tokens, + per_graph_policy, + Arc::new(workload), + None, + ) } /// Like `new_single`, but attaches a pre-validated stored-query @@ -420,13 +431,8 @@ impl AppState { bearer_tokens: Vec<(String, String)>, policy_file: Option<&PathBuf>, ) -> Result { - Self::open_single_with_queries( - uri, - bearer_tokens, - policy_file, - QueryRegistry::default(), - ) - .await + Self::open_single_with_queries(uri, bearer_tokens, policy_file, QueryRegistry::default()) + .await } /// Single-mode boot with a stored-query registry: open the engine, @@ -509,8 +515,7 @@ impl AppState { // reserved id `default` — both the registry key and the URL // segment (`/graphs/default/...`). let uri = normalize_root_uri(&uri).unwrap_or(uri); - let graph_id = - GraphId::try_from("default").expect("'default' is a valid GraphId"); + let graph_id = GraphId::try_from("default").expect("'default' is a valid GraphId"); let key = GraphKey::cluster(graph_id); let handle = Arc::new(GraphHandle { key, @@ -889,15 +894,21 @@ pub fn build_app(state: AppState) -> Router { // flagged and their responses include RFC 9745 Deprecation + // RFC 8288 Link headers. Suppress the call-site warning for the // route registration itself. 
- .route("/read", post({ - #[allow(deprecated)] - server_read - })) + .route( + "/read", + post({ + #[allow(deprecated)] + server_read + }), + ) .route("/query", post(server_query)) - .route("/change", post({ - #[allow(deprecated)] - server_change - })) + .route( + "/change", + post({ + #[allow(deprecated)] + server_change + }), + ) .route("/mutate", post(server_mutate)) .route("/queries", get(server_list_queries)) .route("/queries/{name}", post(server_invoke_query)) @@ -1013,7 +1024,14 @@ pub async fn serve(config: ServerConfig) -> Result<()> { config = %config_path.display(), "serving omnigraph" ); - open_multi_graph_state(graphs, tokens, server_policy.as_ref(), config_path).await? + open_multi_graph_state( + graphs, + tokens, + server_policy.as_ref(), + config_path, + config.require_all_graphs, + ) + .await? } }; @@ -1033,9 +1051,9 @@ fn load_graph_policy(source: &PolicySource, graph_id: &str) -> Result, server_policy_source: Option<&PolicySource>, config_path: PathBuf, + require_all_graphs: bool, ) -> Result { - use futures::{StreamExt, TryStreamExt}; + use futures::StreamExt; if graphs.is_empty() { bail!("multi-graph mode requires at least one graph in the `graphs:` map"); @@ -1058,21 +1077,48 @@ pub async fn open_multi_graph_state( // `Omnigraph::Server::"root"` entity at evaluation time. let server_policy = match server_policy_source { Some(PolicySource::File(path)) => Some(PolicyEngine::load_server(path)?), - Some(PolicySource::Inline(source)) => { - Some(PolicyEngine::load_server_from_source(source)?) - } + Some(PolicySource::Inline(source)) => Some(PolicyEngine::load_server_from_source(source)?), None => None, }; - // `try_collect` propagates the first error eagerly, dropping every - // in-flight open. `buffer_unordered + collect::>` would drain - // the stream before checking errors — incorrect for the docstring's - // "fail-fast" claim and wasteful on S3-backed graphs. 
- let handles: Vec> = futures::stream::iter(graphs.into_iter()) - .map(|cfg| async move { open_single_graph(cfg).await }) + let configured_graphs = graphs.len(); + let results = futures::stream::iter(graphs.into_iter()) + .map(|cfg| async move { + let graph_id = cfg.graph_id.clone(); + open_single_graph(cfg).await.map_err(|err| (graph_id, err)) + }) .buffer_unordered(4) - .try_collect() - .await?; + .collect::>() + .await; + let mut handles = Vec::new(); + let mut failed = 0usize; + for result in results { + match result { + Ok(handle) => handles.push(handle), + Err((graph_id, err)) => { + failed += 1; + warn!( + graph_id = %graph_id, + error = %err, + "graph quarantined during startup" + ); + } + } + } + if require_all_graphs && failed > 0 { + bail!( + "strict multi-graph startup requires every graph to open ({} configured, {} failed)", + configured_graphs, + failed + ); + } + if handles.is_empty() { + bail!( + "no healthy graphs opened from multi-graph startup config ({} configured, {} failed)", + configured_graphs, + failed + ); + } let workload = workload::WorkloadController::from_env(); let state = AppState::new_multi(handles, tokens, server_policy, workload, Some(config_path)) diff --git a/crates/omnigraph-server/src/main.rs b/crates/omnigraph-server/src/main.rs index 482c9af..c45b77f 100644 --- a/crates/omnigraph-server/src/main.rs +++ b/crates/omnigraph-server/src/main.rs @@ -22,6 +22,11 @@ struct Cli { /// Equivalent to setting `OMNIGRAPH_UNAUTHENTICATED=1`. #[arg(long)] unauthenticated: bool, + /// Fail startup if any applied graph is quarantined or fails to open. + /// By default, graph-local failures are logged and healthy graphs still + /// serve. Equivalent to setting `OMNIGRAPH_REQUIRE_ALL_GRAPHS=1`. 
+ #[arg(long)] + require_all_graphs: bool, } #[tokio::main] @@ -30,7 +35,12 @@ async fn main() -> Result<()> { init_tracing(); let cli = Cli::parse(); - let settings: ServerConfig = - load_server_settings(cli.cluster.as_ref(), cli.bind, cli.unauthenticated).await?; + let settings: ServerConfig = load_server_settings( + cli.cluster.as_ref(), + cli.bind, + cli.unauthenticated, + cli.require_all_graphs, + ) + .await?; serve(settings).await } diff --git a/crates/omnigraph-server/src/settings.rs b/crates/omnigraph-server/src/settings.rs index bb6febd..ae28205 100644 --- a/crates/omnigraph-server/src/settings.rs +++ b/crates/omnigraph-server/src/settings.rs @@ -12,6 +12,7 @@ pub(crate) async fn load_cluster_settings( cluster_dir: &PathBuf, cli_bind: Option, cli_allow_unauthenticated: bool, + cli_require_all_graphs: bool, ) -> Result { // `--cluster` accepts either a config directory (the ledger location is // resolved through cluster.yaml's `storage:` key) or a storage-root URI @@ -28,11 +29,45 @@ pub(crate) async fn load_cluster_settings( .map_err(|diagnostics| { let details = diagnostics .iter() - .map(|diagnostic| format!("[{}] {}: {}", diagnostic.code, diagnostic.path, diagnostic.message)) + .map(|diagnostic| { + format!( + "[{}] {}: {}", + diagnostic.code, diagnostic.path, diagnostic.message + ) + }) .collect::>() .join("\n "); - eyre!("the cluster at '{}' is not ready to serve:\n {details}", cluster_dir.display()) + eyre!( + "the cluster at '{}' is not ready to serve:\n {details}", + cluster_dir.display() + ) })?; + for diagnostic in &snapshot.diagnostics { + warn!( + code = %diagnostic.code, + path = %diagnostic.path, + message = %diagnostic.message, + "cluster startup diagnostic" + ); + } + let env_require_all_graphs = env_flag("OMNIGRAPH_REQUIRE_ALL_GRAPHS"); + let require_all_graphs = cli_require_all_graphs || env_require_all_graphs; + if require_all_graphs && !snapshot.diagnostics.is_empty() { + let details = snapshot + .diagnostics + .iter() + 
.map(|diagnostic| { + format!( + "[{}] {}: {}", + diagnostic.code, diagnostic.path, diagnostic.message + ) + }) + .collect::>() + .join("\n "); + bail!( + "strict cluster boot requires every applied graph to be ready; startup diagnostics:\n {details}" + ); + } // Bindings -> Cedar slots. The serving pipeline loads one bundle per // graph plus one server-level bundle; stacked bundles per scope are a @@ -69,6 +104,7 @@ pub(crate) async fn load_cluster_settings( } let mut graphs = Vec::new(); + let mut skipped_graphs = Vec::new(); for graph in &snapshot.graphs { let specs: Vec = snapshot .queries @@ -84,40 +120,75 @@ pub(crate) async fn load_cluster_settings( tool_name: None, }) .collect(); - let registry = QueryRegistry::from_specs(specs).map_err(|errors| { - let details = errors - .iter() - .map(|error| error.to_string()) - .collect::>() - .join("\n "); - eyre!( - "stored queries in the applied revision failed to parse:\n {details}\nrun `cluster refresh` then `cluster apply`, and restart" - ) - })?; + let registry = match QueryRegistry::from_specs(specs) { + Ok(registry) => registry, + Err(errors) => { + let details = errors + .iter() + .map(|error| error.to_string()) + .collect::>() + .join("\n "); + warn!( + graph_id = %graph.graph_id, + errors = %details, + "graph quarantined because stored queries failed to parse" + ); + skipped_graphs.push(format!( + "{}: stored queries failed to parse: {details}", + graph.graph_id + )); + continue; + } + }; + let embedding = match graph + .embedding + .as_ref() + .map(|profile| { + profile.resolve().map_err(|err| { + eyre!("embedding provider for graph '{}': {err}", graph.graph_id) + }) + }) + .transpose() + { + Ok(embedding) => embedding, + Err(err) => { + warn!( + graph_id = %graph.graph_id, + error = %err, + "graph quarantined because embedding provider configuration failed" + ); + skipped_graphs.push(format!("{}: {err}", graph.graph_id)); + continue; + } + }; graphs.push(GraphStartupConfig { graph_id: 
graph.graph_id.clone(), uri: graph.root.to_string_lossy().to_string(), policy: graph_policies.get(&graph.graph_id).cloned(), - embedding: graph - .embedding - .as_ref() - .map(|profile| { - profile.resolve().map_err(|err| { - eyre!("embedding provider for graph '{}': {err}", graph.graph_id) - }) - }) - .transpose()?, + embedding, queries: registry, }); } + if graphs.is_empty() { + let skipped = skipped_graphs.join(", "); + bail!( + "the cluster at '{}' has no healthy graphs to serve{}", + cluster_dir.display(), + if skipped.is_empty() { + String::new() + } else { + format!(" (quarantined: {skipped})") + } + ); + } + if require_all_graphs && !skipped_graphs.is_empty() { + bail!( + "strict cluster boot requires every graph to build startup settings (quarantined: {})", + skipped_graphs.join(", ") + ); + } - let env_unauth = std::env::var("OMNIGRAPH_UNAUTHENTICATED") - .ok() - .map(|v| { - let trimmed = v.trim(); - !trimmed.is_empty() && trimmed != "0" && !trimmed.eq_ignore_ascii_case("false") - }) - .unwrap_or(false); + let env_unauth = env_flag("OMNIGRAPH_UNAUTHENTICATED"); Ok(ServerConfig { mode: ServerConfigMode::Multi { @@ -127,6 +198,7 @@ pub(crate) async fn load_cluster_settings( }, bind: cli_bind.unwrap_or_else(|| "127.0.0.1:8080".to_string()), allow_unauthenticated: cli_allow_unauthenticated || env_unauth, + require_all_graphs, }) } @@ -138,6 +210,7 @@ pub async fn load_server_settings( cli_cluster: Option<&PathBuf>, cli_bind: Option, cli_allow_unauthenticated: bool, + cli_require_all_graphs: bool, ) -> Result { let Some(cluster_dir) = cli_cluster else { bail!( @@ -147,7 +220,23 @@ pub async fn load_server_settings( was removed in RFC-011." 
); }; - load_cluster_settings(cluster_dir, cli_bind, cli_allow_unauthenticated).await + load_cluster_settings( + cluster_dir, + cli_bind, + cli_allow_unauthenticated, + cli_require_all_graphs, + ) + .await +} + +fn env_flag(name: &str) -> bool { + std::env::var(name) + .ok() + .map(|v| { + let trimmed = v.trim(); + !trimmed.is_empty() && trimmed != "0" && !trimmed.eq_ignore_ascii_case("false") + }) + .unwrap_or(false) } /// MR-723 server runtime state, classified from the three-state matrix @@ -240,7 +329,9 @@ pub(crate) fn read_bearer_tokens_file(path: &str) -> Result) -> Result> { +pub(crate) fn validate_bearer_tokens( + entries: Vec<(String, String)>, +) -> Result> { let mut seen_actors = HashSet::new(); let mut seen_tokens = HashSet::new(); let mut normalized = Vec::with_capacity(entries.len()); @@ -301,11 +392,18 @@ mod tests { /// as 404 without also masking a 401/500. Pins each outcome. #[test] fn authorize_splits_decision_from_operational_error() { - use super::{Authz, PolicyAction, PolicyCompiler, PolicyConfig, PolicyRequest, ResolvedActor, authorize}; + use super::{ + Authz, PolicyAction, PolicyCompiler, PolicyConfig, PolicyRequest, ResolvedActor, + authorize, + }; use std::sync::Arc; fn req(action: PolicyAction) -> PolicyRequest { - PolicyRequest { action, branch: None, target_branch: None } + PolicyRequest { + action, + branch: None, + target_branch: None, + } } let actor = ResolvedActor::cluster_static(Arc::from("act-alice")); @@ -345,7 +443,11 @@ mod tests { authorize( Some(&actor), Some(&engine), - PolicyRequest { action: PolicyAction::Read, branch: Some("main".to_string()), target_branch: None }, + PolicyRequest { + action: PolicyAction::Read, + branch: Some("main".to_string()), + target_branch: None + }, ) .unwrap(), Authz::Allowed @@ -354,11 +456,17 @@ mod tests { match authorize( Some(&actor), Some(&engine), - PolicyRequest { action: PolicyAction::Change, branch: Some("main".to_string()), target_branch: None }, + PolicyRequest { + action: 
PolicyAction::Change, + branch: Some("main".to_string()), + target_branch: None, + }, ) .unwrap() { - Authz::Denied(message) => assert!(!message.is_empty(), "a deny carries its decision message"), + Authz::Denied(message) => { + assert!(!message.is_empty(), "a deny carries its decision message") + } Authz::Allowed => panic!("change must be denied: only read is allowed"), } // Policy installed but no actor → operational failure (`Err`), NOT a @@ -397,8 +505,7 @@ mod tests { }; // Empty registry → nothing attached, no error. - let empty = - super::validate_and_attach(QueryRegistry::default(), &catalog, "g").unwrap(); + let empty = super::validate_and_attach(QueryRegistry::default(), &catalog, "g").unwrap(); assert!(empty.is_none()); // A query that type-checks → attached. @@ -407,7 +514,11 @@ mod tests { "query find_user() { match { $u: User } return { $u.name } }", )]) .unwrap(); - assert!(super::validate_and_attach(ok, &catalog, "g").unwrap().is_some()); + assert!( + super::validate_and_attach(ok, &catalog, "g") + .unwrap() + .is_some() + ); // A query referencing a type the schema lacks → boot refusal that // names both the graph label and the offending query. @@ -420,7 +531,10 @@ mod tests { let msg = err.to_string(); assert!(msg.contains("graph-x"), "labels the graph: {msg}"); assert!(msg.contains("ghost"), "names the query: {msg}"); - assert!(msg.contains("schema check"), "mentions the schema check: {msg}"); + assert!( + msg.contains("schema check"), + "mentions the schema check: {msg}" + ); } #[test] @@ -451,7 +565,7 @@ mod tests { async fn server_settings_require_cluster_boot_source() { // RFC-011 cluster-only: with no --cluster the server refuses to // start and names the cluster-required remedy. 
- let error = super::load_server_settings(None, None, false) + let error = super::load_server_settings(None, None, false, false) .await .unwrap_err(); assert!( @@ -534,6 +648,7 @@ mod tests { }, bind: "127.0.0.1:0".to_string(), allow_unauthenticated: false, + require_all_graphs: false, }; let result = serve(config).await; let err = result @@ -586,6 +701,7 @@ mod tests { }, bind: "127.0.0.1:0".to_string(), allow_unauthenticated: false, + require_all_graphs: false, }; let result = serve(config).await; let err = diff --git a/crates/omnigraph-server/tests/multi_graph.rs b/crates/omnigraph-server/tests/multi_graph.rs index 617cc66..5679aef 100644 --- a/crates/omnigraph-server/tests/multi_graph.rs +++ b/crates/omnigraph-server/tests/multi_graph.rs @@ -13,7 +13,6 @@ use serde_json::Value; use serial_test::serial; use tower::ServiceExt; - mod support; use support::*; @@ -414,7 +413,7 @@ async fn cluster_boot_serves_applied_state() { assert!(server_policy.is_none()); let state = - omnigraph_server::open_multi_graph_state(graphs, Vec::new(), None, config_path) + omnigraph_server::open_multi_graph_state(graphs, Vec::new(), None, config_path, false) .await .unwrap(); let app = build_app(state); @@ -424,7 +423,10 @@ async fn cluster_boot_serves_applied_state() { // GET /graphs refuses even in cluster mode. 
let (status, body) = json_response( &app, - Request::builder().uri("/graphs").body(Body::empty()).unwrap(), + Request::builder() + .uri("/graphs") + .body(Body::empty()) + .unwrap(), ) .await; assert_eq!(status, StatusCode::FORBIDDEN, "{body}"); @@ -460,6 +462,115 @@ async fn cluster_boot_serves_applied_state() { assert_eq!(status, StatusCode::OK, "{body}"); } +#[tokio::test] +async fn cluster_boot_quarantines_graph_open_failures() { + let temp = tempfile::tempdir().unwrap(); + let schema = "\nnode Person {\n name: String @key\n}\n"; + let good_uri = temp.path().join("good.omni"); + Omnigraph::init(good_uri.to_string_lossy().as_ref(), schema) + .await + .unwrap(); + let bad_uri = temp.path().join("missing.omni"); + let server_policy = omnigraph_server::PolicySource::Inline( + r#" +version: 1 +kind: server +groups: + admins: [act-admin] +rules: + - id: admins-list-graphs + allow: + actors: { group: admins } + actions: [graph_list] +"# + .to_string(), + ); + let graphs = vec![ + omnigraph_server::GraphStartupConfig { + graph_id: "broken".to_string(), + uri: bad_uri.to_string_lossy().to_string(), + policy: None, + embedding: None, + queries: stored_query_registry(&[]), + }, + omnigraph_server::GraphStartupConfig { + graph_id: "good".to_string(), + uri: good_uri.to_string_lossy().to_string(), + policy: None, + embedding: None, + queries: stored_query_registry(&[]), + }, + ]; + let strict_err = match omnigraph_server::open_multi_graph_state( + graphs.clone(), + vec![("act-admin".to_string(), "admin-token".to_string())], + Some(&server_policy), + temp.path().join("cluster.yaml"), + true, + ) + .await + { + Ok(_) => panic!("strict startup should reject a failed graph open"), + Err(err) => err, + }; + assert!( + strict_err + .to_string() + .contains("strict multi-graph startup requires every graph to open"), + "{strict_err}" + ); + let state = omnigraph_server::open_multi_graph_state( + graphs, + vec![("act-admin".to_string(), "admin-token".to_string())], + 
Some(&server_policy), + temp.path().join("cluster.yaml"), + false, + ) + .await + .unwrap(); + let mut ready: Vec<_> = state + .routing() + .registry + .list() + .iter() + .map(|handle| handle.key.graph_id.as_str().to_string()) + .collect(); + ready.sort(); + assert_eq!(ready, vec!["good"]); + let app = build_app(state); + + let (status, body) = json_response( + &app, + Request::builder() + .uri("/graphs") + .header("authorization", "Bearer admin-token") + .body(Body::empty()) + .unwrap(), + ) + .await; + assert_eq!(status, StatusCode::OK, "{body}"); + assert_eq!( + body["graphs"] + .as_array() + .unwrap() + .iter() + .map(|graph| graph["graph_id"].as_str().unwrap()) + .collect::>(), + vec!["good"] + ); + + let (status, body) = json_response( + &app, + Request::builder() + .uri("/graphs/broken/queries") + .header("authorization", "Bearer admin-token") + .body(Body::empty()) + .unwrap(), + ) + .await; + assert_eq!(status, StatusCode::NOT_FOUND, "{body}"); +} + #[tokio::test(flavor = "multi_thread")] #[serial] async fn cluster_boot_injects_embedding_provider_config() { @@ -555,6 +666,7 @@ graphs: Vec::new(), server_policy.as_ref(), config_path, + false, ) .await .unwrap(); @@ -665,7 +777,10 @@ async fn cluster_boot_wires_policy_bindings_into_cedar_slots() { .unwrap(); fs::write( temp.path().join("cluster.policy.yaml"), - permit_all_policy_yaml(&["default"]).replace("protected_branches: [main]\n", "protected_branches: [main]\nkind: server\n"), + permit_all_policy_yaml(&["default"]).replace( + "protected_branches: [main]\n", + "protected_branches: [main]\nkind: server\n", + ), ) .unwrap(); fs::write( @@ -719,7 +834,7 @@ graphs: async fn cluster_boot_refusals() { // RFC-011 cluster-only: with no --cluster, boot refuses with the // cluster-required remedy. 
- let err = omnigraph_server::load_server_settings(None, None, true) + let err = omnigraph_server::load_server_settings(None, None, true, false) .await .unwrap_err(); assert!(err.to_string().contains("boots from a cluster"), "{err}"); @@ -729,7 +844,12 @@ async fn cluster_boot_refusals() { // Tampered catalog blob refuses boot with the remedy. let blob_dir = dir.join("__cluster/resources/query/knowledge/find_person"); - let blob = fs::read_dir(&blob_dir).unwrap().next().unwrap().unwrap().path(); + let blob = fs::read_dir(&blob_dir) + .unwrap() + .next() + .unwrap() + .unwrap() + .path(); fs::write(&blob, "tampered").unwrap(); let err = cluster_settings(&dir).await.unwrap_err(); assert!( diff --git a/crates/omnigraph-server/tests/s3.rs b/crates/omnigraph-server/tests/s3.rs index 99bf98d..793d79d 100644 --- a/crates/omnigraph-server/tests/s3.rs +++ b/crates/omnigraph-server/tests/s3.rs @@ -11,7 +11,6 @@ use omnigraph_server::api::ReadRequest; use omnigraph_server::{AppState, build_app}; use serde_json::json; - mod support; use support::*; @@ -137,6 +136,7 @@ async fn server_boots_cluster_from_bare_storage_uri_and_serves_query() { Some(&std::path::PathBuf::from(&root)), None, true, + false, ) .await .unwrap(); @@ -153,6 +153,7 @@ async fn server_boots_cluster_from_bare_storage_uri_and_serves_query() { Vec::new(), server_policy.as_ref(), config_path, + false, ) .await .unwrap(); @@ -170,7 +171,9 @@ async fn server_boots_cluster_from_bare_storage_uri_and_serves_query() { .await .unwrap(); assert_eq!(response.status(), StatusCode::OK); - let bytes = axum::body::to_bytes(response.into_body(), usize::MAX).await.unwrap(); + let bytes = axum::body::to_bytes(response.into_body(), usize::MAX) + .await + .unwrap(); let value: serde_json::Value = serde_json::from_slice(&bytes).unwrap(); assert_eq!(value["rows"][0]["p.name"], "Ada", "{value}"); } diff --git a/crates/omnigraph-server/tests/support/mod.rs b/crates/omnigraph-server/tests/support/mod.rs index 157c58e..694db46 100644 
--- a/crates/omnigraph-server/tests/support/mod.rs +++ b/crates/omnigraph-server/tests/support/mod.rs @@ -15,15 +15,12 @@ use omnigraph::db::{Omnigraph, ReadTarget}; use omnigraph::error::OmniError; use omnigraph::loader::{LoadMode, load_jsonl}; use omnigraph_policy::{PolicyChecker, PolicyEngine}; -use omnigraph_server::api::{ - BranchCreateRequest, BranchMergeRequest, ChangeRequest, ReadRequest, -}; +use omnigraph_server::api::{BranchCreateRequest, BranchMergeRequest, ChangeRequest, ReadRequest}; use omnigraph_server::queries::{QueryRegistry, RegistrySpec}; use omnigraph_server::{AppState, build_app}; use serde_json::{Value, json}; use tower::ServiceExt; - pub const MUTATION_QUERIES: &str = r#" query insert_person($name: String, $age: I32) { insert Person { name: $name, age: $age } @@ -1198,6 +1195,8 @@ graphs: temp } -pub async fn cluster_settings(dir: &Path) -> color_eyre::eyre::Result { - omnigraph_server::load_server_settings(Some(&dir.to_path_buf()), None, true).await +pub async fn cluster_settings( + dir: &Path, +) -> color_eyre::eyre::Result { + omnigraph_server::load_server_settings(Some(&dir.to_path_buf()), None, true, false).await } diff --git a/docker/entrypoint.sh b/docker/entrypoint.sh index 98587aa..79d7de7 100644 --- a/docker/entrypoint.sh +++ b/docker/entrypoint.sh @@ -17,7 +17,12 @@ if [ -n "${OMNIGRAPH_CLUSTER:-}" ]; then echo "OMNIGRAPH_CLUSTER is an exclusive boot source; unset OMNIGRAPH_TARGET_URI/OMNIGRAPH_CONFIG/OMNIGRAPH_TARGET" >&2 exit 64 fi - exec "$SERVER_BIN" --cluster "${OMNIGRAPH_CLUSTER}" --bind "${bind}" + set -- --cluster "${OMNIGRAPH_CLUSTER}" --bind "${bind}" + case "${OMNIGRAPH_REQUIRE_ALL_GRAPHS:-}" in + ""|0|false|FALSE) ;; + *) set -- "$@" --require-all-graphs ;; + esac + exec "$SERVER_BIN" "$@" fi # URI comes from the env var (the positional arg wins over any config @@ -46,6 +51,8 @@ omnigraph-server container startup requires one of: Optional: - OMNIGRAPH_BIND (default: 0.0.0.0:8080) + - OMNIGRAPH_REQUIRE_ALL_GRAPHS 
(cluster mode: fail startup unless every + applied graph is healthy) - OMNIGRAPH_TARGET (used with OMNIGRAPH_CONFIG) - OMNIGRAPH_CONFIG (may also accompany OMNIGRAPH_TARGET_URI to add a policy file; the URI still comes from OMNIGRAPH_TARGET_URI) diff --git a/docs/dev/rfc-005-server-cluster-boot.md b/docs/dev/rfc-005-server-cluster-boot.md index 85df875..6c57bba 100644 --- a/docs/dev/rfc-005-server-cluster-boot.md +++ b/docs/dev/rfc-005-server-cluster-boot.md @@ -1,7 +1,7 @@ # RFC: Server Boots from Cluster State — Phase 5 of the Cluster Control Plane **Status:** Landed (5A policy bindings #175; 5B/5C the `--cluster` boot mode — one PR) -**Implementation deviations:** (1) cluster mode reuses `ServerConfigMode::Multi` (a new settings *source*, not a new enum variant; `config_path` carries the cluster dir). (2) Stored queries load via `QueryRegistry::from_specs` from verified blob *content*, not blob paths. (3) More than one policy bundle binding a single scope is a boot error (the serving pipeline holds one bundle per graph + one server-level; stacking is a later slice). (4) `GET /graphs` keeps its closed-by-default contract — without a cluster-bound bundle there is no server-level Cedar engine, so enumeration refuses. +**Implementation deviations:** (1) cluster mode reuses `ServerConfigMode::Multi` (a new settings *source*, not a new enum variant; `config_path` carries the cluster dir). (2) Stored queries load via `QueryRegistry::from_specs` from verified blob *content*, not blob paths. (3) More than one policy bundle binding a single scope is a boot error (the serving pipeline holds one bundle per graph + one server-level; stacking is a later slice). (4) `GET /graphs` keeps its closed-by-default contract — without a cluster-bound bundle there is no server-level Cedar engine, so enumeration refuses. 
(5) Graph-attributed startup failures quarantine that graph by default; operators can restore all-or-nothing boot with `--require-all-graphs` / `OMNIGRAPH_REQUIRE_ALL_GRAPHS=1`. **Date:** 2026-06-10 **Builds on:** Phase 4 complete ([rfc-004-cluster-graph-schema-apply.md](rfc-004-cluster-graph-schema-apply.md), Landed): `cluster apply` converges graphs, schemas, stored queries, and policies into the cluster catalog. Normative context: [cluster-config-specs.md](cluster-config-specs.md) (the migration model's "window 2"), [cluster-axioms.md](cluster-axioms.md) (axiom 15), [cluster-config-implementation-spec.md](cluster-config-implementation-spec.md) (Phase 5 rollout, Compatibility Stance #7–#9, exit criterion 7). **Target release:** unversioned (phased — see Sequencing). @@ -46,8 +46,8 @@ Mode inference gains rule 0: `--cluster ` → **Cluster mode**, which is al `load_server_settings` grows a cluster branch that reads, in order: -1. `__cluster/state.json` — **missing state is a boot error** ("run `cluster import` + `cluster apply` first"). Pending recovery sidecars under `__cluster/recoveries/` are also a boot error (`cluster_recovery_pending`): a server must not start serving a ledger that a sweep is about to rewrite. -2. **Graph set** = state's `graph.` resources (tombstoned graphs are absent by construction). Each graph's URI is the derived root `/graphs/.omni`. A recorded graph whose root does not open is a boot error — same fail-fast posture as today's bad URI. +1. `__cluster/state.json` — **missing state is a boot error** ("run `cluster import` + `cluster apply` first"). Invalid or unattributable recovery sidecars under `__cluster/recoveries/` are also a boot error: a server must not start if it cannot prove the blast radius. Valid graph-attributed sidecars quarantine that graph by default and are logged as `cluster_recovery_pending`; `--require-all-graphs` promotes them back to a boot error. +2. 
**Graph set** = state's `graph.` resources (tombstoned graphs are absent by construction). Each graph's URI is the derived root `/graphs/.omni`. A recorded graph whose root does not open quarantines that graph by default; `--require-all-graphs` restores the original fail-fast posture. 3. **Stored queries** = state's `query..` entries, content loaded from the catalog blob at the recorded digest. Blob-missing or digest-mismatched is a boot error (the catalog verification semantics from Stage 3B, applied at boot). Queries type-check at engine open exactly as today (`validate_and_attach` — unchanged). 4. **Policies** = state's `policy.` entries, content from catalog blobs, bindings from the applied metadata of D3: bundles bound to `cluster` load as the server-level Cedar engine (`PolicyEngine::load_server`); bundles bound to graphs load per-graph (`PolicyEngine::load_graph`) and install via `with_policy` — the existing two-gate structure, unchanged. 5. `cluster.yaml` is parsed **only** to validate that the directory is a cluster root (and for nothing else — explicitly not for resource content; a divergence between desired config and applied state is *served as applied*, visible via `cluster plan`). @@ -76,16 +76,19 @@ State's `StateResource` records only a digest. To make the ledger serving-suffic ### D4. Readiness and failure posture -Boot is fail-fast, matching the server's existing stance (bad policy YAML refuses boot): +Cluster-global failures are fail-fast, matching the server's existing stance (bad policy YAML refuses boot). Graph-local failures quarantine the affected graph by default so a single bad graph cannot crash-loop an otherwise healthy cluster. Operators who prefer the original all-or-nothing contract pass `--require-all-graphs` or set `OMNIGRAPH_REQUIRE_ALL_GRAPHS=1`, which promotes every graph-local quarantine/open/settings failure to a boot error. 
| Condition | Behavior | |---|---| | `state.json` missing / unparseable / unsupported version | boot error | -| pending recovery sidecars | boot error (run any state-mutating cluster command to sweep) | -| recorded graph root missing or unopenable | boot error | +| invalid/unreadable/unattributable recovery sidecars | boot error (run any state-mutating cluster command to sweep or inspect) | +| valid graph-attributed recovery sidecars | quarantine that graph; strict mode boot error | +| recorded graph root missing or unopenable | quarantine that graph; strict mode boot error | | query/policy blob missing or digest-mismatched | boot error (run `cluster refresh` + `apply` to self-heal, then restart) | | policy entry without `applies_to` metadata | boot error ("re-run cluster apply", D3) | -| stored query fails type-check against the live schema | boot error (existing `validate_and_attach` behavior) | +| stored query fails parse/type-check against the live schema | quarantine that graph; strict mode boot error | +| embedding provider configuration for one graph cannot resolve | quarantine that graph; strict mode boot error | +| every applied graph is quarantined or fails startup | boot error (`cluster_no_healthy_graphs`) | | state lock held | **not** an error — boot takes no lock; it reads a point-in-time snapshot of an immutable-once-written state file (the CAS discipline means a concurrent apply produces a *new* file atomically; the server reads whichever was current at open) | ### D5. The `mcp.expose` bridge in cluster mode @@ -109,7 +112,7 @@ Rollback is the same switch in reverse — nothing in cluster mode mutates `omni - *Axiom 5*: the server serves deployed reality (applied digests), never desired intent; D3 keeps the ledger the single serving source. - *Axiom 12*: boot reads without the lock but relies on the atomic-replace write discipline; it never writes state. 
- *Axiom 14 / Stance #9*: the expose-all bridge is named, scoped to cluster mode, and carries its Phase 6 sunset. -- *Loud failures (deny-list)*: every degraded condition is a typed boot error with a remedy; no partial serving, no silent fallback to the yaml. +- *Loud failures (deny-list)*: every degraded condition is either a typed cluster-global boot error with a remedy or an explicit graph quarantine logged at startup; no silent fallback to the yaml. `--require-all-graphs` is the opt-in all-or-nothing mode for operators who treat any degraded graph as fatal. - *Respect the boundaries*: `omnigraph-cluster` stays free of HTTP; the server reads the catalog through a small read-only loader (either a `pub` read surface on `omnigraph-cluster` or a thin module in the server consuming the documented file formats — implementation picks the one that keeps `omnigraph-cluster` dependency-light; the state/blob formats are already a documented contract). ## Sequencing @@ -117,7 +120,7 @@ Rollback is the same switch in reverse — nothing in cluster mode mutates `omni | Slice | Scope | Gate | |---|---|---| | **5A: serving metadata in state** | `applies_to` recorded on policy resources at apply + sweep roll-forward; additive state schema; `status`/plan surfacing | In-crate tests: metadata written/rolled-forward; old state parses; re-apply backfills | -| **5B: `--cluster` boot mode** | Flag + mode inference rule 0; catalog loader (state → `GraphStartupConfig`s + registries + policy engines); readiness table; OpenAPI regen if surface shifts | Server tests: boot from a converged fixture dir, serve `/graphs/{id}/query` + stored queries + Cedar gates; every D4 row refuses boot; e2e: `cluster apply` then serve — "applied means serving" | +| **5B: `--cluster` boot mode** | Flag + mode inference rule 0; catalog loader (state → `GraphStartupConfig`s + registries + policy engines); readiness table; OpenAPI regen if surface shifts | Server tests: boot from a converged fixture dir, serve 
`/graphs/{id}/query` + stored queries + Cedar gates; D4 cluster-global rows refuse boot; graph-local rows quarantine by default and refuse under `--require-all-graphs`; e2e: `cluster apply` then serve — "applied means serving" | | **5C: docs + caveat retirement** | `cluster-config.md` mode-switch section; `server.md`/`deployment.md`; retire the "not serving" caveats for cluster-mode deployments; migration guide (D6) | `check-agents-md.sh`; doc accuracy review | ## Exit-criteria coverage diff --git a/docs/releases/v0.7.0.md b/docs/releases/v0.7.0.md index b4ad903..24cefdf 100644 --- a/docs/releases/v0.7.0.md +++ b/docs/releases/v0.7.0.md @@ -36,6 +36,12 @@ get faster and self-healing, and text embedding becomes provider-independent. single-graph flat-route mode, positional-`` boot, and `omnigraph.yaml` `graphs:`-map boot are gone — add or remove graphs with `cluster apply` and restart. +- **Resilient cluster boot with strict opt-out.** Graph-attributed startup + failures now quarantine that graph and let healthy graphs serve; `/graphs` + lists only ready graphs, and quarantined graph routes return 404. Cluster- + global failures still refuse boot, and `--require-all-graphs` (or + `OMNIGRAPH_REQUIRE_ALL_GRAPHS=1`) restores fail-fast all-or-nothing startup + for operators who prefer any degraded graph to abort the process. - **One storage substrate + recovery liveness.** The cluster storage backend and the engine both go through one `StorageAdapter` (versioned read, conditional replace/CAS, prefix delete), exercised by a storage fault-injection matrix. diff --git a/docs/user/clusters/config.md b/docs/user/clusters/config.md index cd4d772..04811ec 100644 --- a/docs/user/clusters/config.md +++ b/docs/user/clusters/config.md @@ -231,9 +231,11 @@ Policy entries additionally record their applied `applies_to` bindings as normalized typed refs — the state ledger is serving-sufficient for the future server-boot stage. 
A change to `applies_to` alone (the policy file digest unchanged) appears in the plan as an Update marked `binding_change` -(human output: `[bindings]`), applies like any catalog change, and counts -toward convergence; ledgers written before this field existed are backfilled -by the next apply. +(human output: `[bindings]`), and as `metadata_change: policy_bindings` in +structured output. Embedding provider entries similarly carry their resolved +profile in the ledger; pre-profile ledgers are backfilled by an Update with +`metadata_change: embedding_profile`. These metadata-only updates apply like +catalog changes and count toward convergence. Each plan change carries a `disposition` field — an honest preview of what `cluster apply` will do with it in this stage: `applied` (executes), `derived` @@ -322,7 +324,9 @@ cluster apply until the approval-artifact stage. Unsupported migrations (e.g. changing a property's type), engine lock contention, or graphs with user branches fail loudly as `schema_apply_failed` with the engine's message; dependent changes are demoted to `blocked` and graph-moving work stops for -the run. +the run. These pre-movement failures are checked before the cluster schema +recovery sidecar is created, so they do not leave stale recovery files behind +or brick later server boot. `cluster plan` previews schema updates with the engine's real migration plan: each schema change carries a `migration` field (`supported` + typed steps), @@ -402,20 +406,29 @@ drift is visible. Routing is always multi-graph (`/graphs/{id}/...`). Bearer tokens and the bind address stay process-level (flags/env) — they are per-replica facts, not cluster facts. 
-Boot is fail-fast: missing or unreadable state, pending recovery sidecars, -missing/tampered catalog blobs, policy entries without binding metadata -(pre-binding ledgers — re-run `cluster apply`), an empty graph set, more than -one policy bundle binding a single scope (split or merge bundles; stacked -scopes are a later stage), unopenable graph roots, and stored queries that no -longer type-check all refuse startup with a remedy. A held state lock is -*not* an error — boot reads the atomically-replaced state file without +Boot is fail-fast for cluster-global readiness failures: missing or +unreadable state, invalid/unattributable recovery sidecars, +missing/tampered shared catalog blobs, policy entries without binding +metadata (pre-binding ledgers — re-run `cluster apply`), an empty graph set, +more than one policy bundle binding a single scope (split or merge bundles; +stacked scopes are a later stage), cluster policy problems, or zero healthy +graphs. Valid graph-attributed recovery sidecars, unopenable graph roots, and +stored queries that no longer type-check quarantine that graph instead; the +server logs startup diagnostics, skips the graph's queries and graph-only +policy bindings, and serves any remaining healthy graphs. A held state lock +is *not* an error — boot reads the atomically-replaced state file without locking. +Use `omnigraph-server --require-all-graphs` (or +`OMNIGRAPH_REQUIRE_ALL_GRAPHS=1`) when degraded serving is not acceptable; it +promotes every graph-local quarantine or startup failure back to a boot error. + Serving is static per process: the server reads the applied revision once at -startup, so picking up newly applied state means restarting it. Stored -queries are all listed in `GET /queries` in cluster mode (the cluster -registry has no expose flag; exposure becomes a policy decision in a later -phase). +startup, so picking up newly applied state means restarting it. 
`GET /graphs` +lists only ready/served graphs; quarantined graphs are omitted and their +routes return 404. Stored queries are all listed in `GET /queries` in cluster +mode (the cluster registry has no expose flag; exposure becomes a policy +decision in a later phase). ## Status diff --git a/docs/user/clusters/index.md b/docs/user/clusters/index.md index 0c2e7d7..089fd4b 100644 --- a/docs/user/clusters/index.md +++ b/docs/user/clusters/index.md @@ -221,7 +221,8 @@ applied revision is not safely servable. Each refusal names its remedy: | Boot error | Meaning | Remedy | |---|---|---| | `cluster_state_missing` | no ledger | `cluster import`, then `apply` | -| `cluster_recovery_pending` | interrupted operation awaiting sweep | run `cluster apply` (or any state-mutating command), restart | +| `cluster_recovery_pending` | graph was quarantined because an interrupted operation awaits sweep | run `cluster apply` (or any state-mutating command), restart | +| `cluster_no_healthy_graphs` | every applied graph is quarantined or failed startup | sweep/fix the graph-specific failures, then restart | | `catalog_payload_missing` / `…_digest_mismatch` | catalog blob lost or tampered | `cluster refresh`, then `apply`, restart | | `policy_bindings_missing` | ledger predates binding metadata | re-run `cluster apply` (backfills), restart | | `cluster_empty` | applied revision has no graphs | apply a cluster with ≥1 graph | @@ -231,6 +232,13 @@ A held *state lock* is deliberately **not** a boot error — the server reads the atomically-replaced ledger without locking, so serving never contends with an in-flight apply. +When at least one graph is healthy, graph-attributed recovery sidecars and +graph-local startup failures do not block the whole server. The affected +graph is skipped, its graph-only policy bindings and queries are omitted, +and `/graphs` lists only the ready graphs. 
Pass +`omnigraph-server --require-all-graphs` or set +`OMNIGRAPH_REQUIRE_ALL_GRAPHS=1` to make any such quarantine fail startup. + ## 6. Deployment patterns - **Replicas**: any number of `--cluster` servers can serve the same config diff --git a/docs/user/deployment.md b/docs/user/deployment.md index a0d8e9f..1772b9a 100644 --- a/docs/user/deployment.md +++ b/docs/user/deployment.md @@ -208,6 +208,7 @@ When no positional args are given, the image entrypoint |---|---| | `OMNIGRAPH_CLUSTER` | Cluster boot source — a config directory or a storage-root URI, forwarded as `--cluster`. The only boot source. | | `OMNIGRAPH_BIND` | Listen address (default `0.0.0.0:8080`). | +| `OMNIGRAPH_REQUIRE_ALL_GRAPHS` | When truthy, forwarded as `--require-all-graphs`: any graph-local quarantine or startup failure aborts cluster boot instead of serving the healthy subset. | Per-graph and server-level Cedar policy come from the cluster's applied revision (authored in `cluster.yaml` and published with `cluster apply`), diff --git a/docs/user/operations/server.md b/docs/user/operations/server.md index ced9d0d..18032e9 100644 --- a/docs/user/operations/server.md +++ b/docs/user/operations/server.md @@ -15,11 +15,24 @@ omnigraph-server --cluster --bind 0.0.0.0:8080 startup configs (id, URI, optional per-graph policy, stored-query registry) plus an optional server-level policy, then opens every configured graph in parallel at startup (bounded concurrency = 4, -fail-fast on the first open error). Routing is always multi-graph — +quarantining graph-specific open failures). Routing is always multi-graph — requests to bare flat protected paths (`/read`, `/snapshot`, …) return 404; the served surface is `/graphs/{graph_id}/...`. See [cluster-config.md](../clusters/config.md#serving-from-the-cluster-the-mode-switch) -for what is read and the fail-fast readiness rules. +for what is read and the readiness rules. 
+ +Readiness is fail-fast for cluster-global problems: missing or unreadable +state, invalid/unattributable recovery sidecars, unreadable shared catalog +payloads, cluster policy errors, or zero healthy graphs. Graph-attributed +pending recovery sidecars and graph-specific startup failures quarantine +that graph instead; the server logs startup diagnostics and serves the +remaining healthy graphs. `GET /graphs` enumerates ready/served graphs only, +so quarantined graphs are absent and their routes return 404. + +Operators who want the original all-or-nothing boot contract can pass +`--require-all-graphs` or set `OMNIGRAPH_REQUIRE_ALL_GRAPHS=1`. In that mode, +any graph quarantine, graph-open failure, stored-query startup failure, or +embedding-provider resolution failure aborts startup. A scheme-qualified argument (`s3://…`) reads the ledger straight from the storage root, with no local config directory. `--bind`, @@ -27,7 +40,7 @@ storage root, with no local config directory. `--bind`, ### Stored-query validation at startup -If a graph declares a `queries:` registry (see [cli-reference](../cli/reference.md)), the server **loads and type-checks every stored query against that graph's live schema at startup** and **refuses to boot** if any query references a type or property the schema lacks — the same fail-loud posture as a malformed policy file, so schema drift surfaces at the deploy boundary rather than at invocation. Two MCP-exposed queries claiming the same tool name is likewise a boot error. Non-blocking advisories (e.g. an MCP-exposed query with a vector parameter an agent cannot supply) are logged. Validate offline before deploying with `omnigraph queries validate`. Discover the exposed queries as a typed tool catalog with `GET /queries`, and invoke one over HTTP with `POST /queries/{name}` (both below). 
+If a graph declares a `queries:` registry (see [cli-reference](../cli/reference.md)), the server **loads and type-checks every stored query against that graph's live schema at startup**. Query parse/type failures quarantine that graph; if no graph remains healthy, startup refuses. Two MCP-exposed queries claiming the same tool name are likewise graph-local startup failures. Non-blocking advisories (e.g. an MCP-exposed query with a vector parameter an agent cannot supply) are logged. Validate offline before deploying with `omnigraph queries validate`. Discover the exposed queries as a typed tool catalog with `GET /queries`, and invoke one over HTTP with `POST /queries/{name}` (both below). ## Endpoint inventory @@ -61,7 +74,7 @@ Server-level management endpoints: | Method | Path | Auth | Action | |---|---|---|---| -| GET | `/graphs` | bearer + `graph_list` on `Server::"root"` | list registered graphs | +| GET | `/graphs` | bearer + `graph_list` on `Server::"root"` | list ready/served graphs | ### Stored-query catalog (`GET /queries`) From 3feb23af054c72d66f0392805ca61cda54eee155 Mon Sep 17 00:00:00 2001 From: Andrew Altshuler Date: Fri, 19 Jun 2026 14:26:50 +0300 Subject: [PATCH 12/13] feat(cli): surface stored-query @description/@instruction in `queries list` (#280) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * test: e2e coverage for @description/@instruction surfaces Add end-to-end tests pinning the two annotation surfaces as they exist today, at their real boundaries: - engine (lifecycle.rs): schema-level @description (node/edge/property) and @instruction (node/edge) persist verbatim into the on-disk _schema.ir.json through Omnigraph::init; property-level @instruction aborts init and writes no schema IR. - server (stored_queries.rs): query-level @description/@instruction on a stored query surface as typed QueryCatalogEntry fields over GET /queries, and a query declaring neither omits both fields. 
No behavior change — these document the current contract.

Co-Authored-By: Claude Opus 4.8 (1M context)

* feat(cli): surface stored-query @description/@instruction in `queries list`

A stored query's @description/@instruction are its catalog metadata — what it does and how to invoke it. The HTTP GET /queries catalog already carries them, but `omnigraph queries list` dropped both fields in human and --json output even though they were available on the registry entry.

Carry description/instruction on QueriesListItem (Option, skipped when None) and copy them from the query decl. Human output prints an indented `description:` / `instruction:` line per query when present; --json includes the fields when present and omits them otherwise — matching the HTTP catalog shape documented in docs/user/operations/server.md.

Tests (cli_queries.rs): a query with both annotations surfaces them in human + --json; a query with neither prints no annotation lines and omits both JSON fields.

Co-Authored-By: Claude Opus 4.8 (1M context)

* docs(cli): document `queries list` output incl. description/instruction

Per AGENTS.md maintenance Rule 1, document the user-visible `queries list` output alongside the field addition. The `queries` command family had no row in the CLI reference top-level table; add one covering `list` (human + --json shapes, with description/instruction shown only when declared, matching the HTTP GET /queries catalog) and `validate`.

Addresses the Greptile P2 review finding on PR #280.

Co-Authored-By: Claude Opus 4.8 (1M context)

* fix(cli): indent multiline stored-query annotations in `queries list`

A `@description`/`@instruction` value can be multiline (GQ string literals admit newlines), which made the human `queries list` output break back to the left margin on continuation lines. Indent continuation lines to align under the first via a `print_query_annotation` helper.

Addresses review feedback from @martin-g on PR #280.
Co-Authored-By: Claude Opus 4.8 (1M context) --------- Co-authored-by: Claude Opus 4.8 (1M context) Co-authored-by: Ragnor Comerford --- crates/omnigraph-cli/src/helpers.rs | 27 ++++ crates/omnigraph-cli/src/output.rs | 7 ++ crates/omnigraph-cli/tests/cli_queries.rs | 119 ++++++++++++++++++ .../omnigraph-server/tests/stored_queries.rs | 41 ++++++ crates/omnigraph/tests/lifecycle.rs | 105 ++++++++++++++++ docs/user/cli/reference.md | 1 + 6 files changed, 300 insertions(+) diff --git a/crates/omnigraph-cli/src/helpers.rs b/crates/omnigraph-cli/src/helpers.rs index 971ca30..be00808 100644 --- a/crates/omnigraph-cli/src/helpers.rs +++ b/crates/omnigraph-cli/src/helpers.rs @@ -875,6 +875,25 @@ pub(crate) async fn execute_queries_validate( Ok(()) } +/// Print a stored-query annotation under its `queries list` entry. A +/// `@description`/`@instruction` value may be multiline (GQ string literals +/// admit newlines); continuation lines are indented to align under the first +/// so the catalog stays readable instead of breaking the left margin. +fn print_query_annotation(label: &str, value: &str) { + let prefix = format!(" {label}: "); + let continuation = " ".repeat(prefix.len()); + let mut lines = value.split('\n'); + match lines.next() { + Some(first) => { + println!("{prefix}{first}"); + for line in lines { + println!("{continuation}{line}"); + } + } + None => println!("{prefix}"), + } +} + /// `queries list --cluster ` (RFC-011): list the catalog's stored queries. /// With `--graph`, scope to one graph. 
pub(crate) async fn execute_queries_list( @@ -893,6 +912,8 @@ pub(crate) async fn execute_queries_list( mcp_expose: q.expose, tool_name: q.tool_name.clone(), mutation: q.is_mutation(), + description: q.decl.description.clone(), + instruction: q.decl.instruction.clone(), params: q .decl .params @@ -933,6 +954,12 @@ pub(crate) async fn execute_queries_list( String::new() }; println!("{kind} {}({params}){mcp}", q.name); + if let Some(description) = &q.description { + print_query_annotation("description", description); + } + if let Some(instruction) = &q.instruction { + print_query_annotation("instruction", instruction); + } } } Ok(()) diff --git a/crates/omnigraph-cli/src/output.rs b/crates/omnigraph-cli/src/output.rs index d6903f4..80de625 100644 --- a/crates/omnigraph-cli/src/output.rs +++ b/crates/omnigraph-cli/src/output.rs @@ -849,6 +849,13 @@ pub(crate) struct QueriesListItem { pub(crate) mcp_expose: bool, pub(crate) tool_name: Option, pub(crate) mutation: bool, + /// `@description` from the query declaration — what the query is for. + /// Carried so the CLI catalog matches the HTTP `GET /queries` surface. + #[serde(skip_serializing_if = "Option::is_none")] + pub(crate) description: Option, + /// `@instruction` from the query declaration — how/when to invoke it. + #[serde(skip_serializing_if = "Option::is_none")] + pub(crate) instruction: Option, pub(crate) params: Vec, } diff --git a/crates/omnigraph-cli/tests/cli_queries.rs b/crates/omnigraph-cli/tests/cli_queries.rs index 92f7879..b51018e 100644 --- a/crates/omnigraph-cli/tests/cli_queries.rs +++ b/crates/omnigraph-cli/tests/cli_queries.rs @@ -231,6 +231,125 @@ fn queries_list_prints_registered_query() { ); } +#[test] +fn queries_list_surfaces_description_and_instruction() { + // `@description`/`@instruction` are the whole point of a stored query in a + // catalog — they tell an agent/operator what it does and how to invoke it. 
+ // The CLI catalog must surface them in both human and --json output, to + // match the HTTP `GET /queries` surface. + let cluster = converged_cluster_with_query( + "described.gq", + "query described($name: String) \ + @description(\"Find a person by exact name.\") \ + @instruction(\"Use for exact lookups; prefer search for fuzzy matches.\") \ + { match { $p: Person { name: $name } } return { $p.age } }", + " described:\n file: ./described.gq\n", + ); + + // Human output. + let output = output_success( + cli().arg("queries").arg("list").arg("--cluster").arg(cluster.path()), + ); + let stdout = stdout_string(&output); + assert!( + stdout.contains("description: Find a person by exact name."), + "human list must show @description; stdout:\n{stdout}" + ); + assert!( + stdout.contains("instruction: Use for exact lookups; prefer search for fuzzy matches."), + "human list must show @instruction; stdout:\n{stdout}" + ); + + // --json output. + let output = output_success( + cli() + .arg("queries") + .arg("list") + .arg("--cluster") + .arg(cluster.path()) + .arg("--json"), + ); + let body: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap(); + let entry = body["queries"] + .as_array() + .unwrap() + .iter() + .find(|q| q["name"] == "described") + .unwrap(); + assert_eq!(entry["description"], "Find a person by exact name."); + assert_eq!( + entry["instruction"], + "Use for exact lookups; prefer search for fuzzy matches." + ); +} + +#[test] +fn queries_list_indents_multiline_annotation_continuation() { + // GQ string literals admit newlines, so a `@description`/`@instruction` + // can be multiline. Human output must indent continuation lines to align + // under the first rather than breaking back to the left margin. 
+ let cluster = converged_cluster_with_query( + "multi.gq", + "query multi($name: String) \ + @description(\"line one\\nline two\") \ + { match { $p: Person { name: $name } } return { $p.age } }", + " multi:\n file: ./multi.gq\n", + ); + let output = output_success( + cli().arg("queries").arg("list").arg("--cluster").arg(cluster.path()), + ); + let stdout = stdout_string(&output); + // " description: " is 17 chars wide; the continuation aligns under it. + assert!( + stdout.contains(" description: line one\n line two"), + "multiline annotation must indent the continuation; stdout:\n{stdout}" + ); +} + +#[test] +fn queries_list_omits_annotations_when_absent() { + // The other half of the contract: a query that declares neither annotation + // prints no extra lines and omits both JSON fields entirely. This keeps the + // catalog clean rather than echoing empty `description:`/`instruction:`. + let cluster = converged_cluster_with_query( + "bare.gq", + "query bare() { match { $p: Person } return { $p.name } }", + " bare:\n file: ./bare.gq\n", + ); + + // Human output: the query is listed, but no annotation lines. + let output = output_success( + cli().arg("queries").arg("list").arg("--cluster").arg(cluster.path()), + ); + let stdout = stdout_string(&output); + assert!(stdout.contains("bare()"), "stdout:\n{stdout}"); + assert!( + !stdout.contains("description:") && !stdout.contains("instruction:"), + "a query without annotations prints no annotation lines; stdout:\n{stdout}" + ); + + // --json output: both fields omitted (not present as null). 
+ let output = output_success( + cli() + .arg("queries") + .arg("list") + .arg("--cluster") + .arg(cluster.path()) + .arg("--json"), + ); + let body: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap(); + let entry = body["queries"] + .as_array() + .unwrap() + .iter() + .find(|q| q["name"] == "bare") + .unwrap(); + assert!( + entry.get("description").is_none() && entry.get("instruction").is_none(), + "a query without annotations omits both JSON fields: {entry}" + ); +} + #[test] fn queries_validate_requires_a_cluster() { // RFC-011: with no --cluster (and no cluster profile), the command errors diff --git a/crates/omnigraph-server/tests/stored_queries.rs b/crates/omnigraph-server/tests/stored_queries.rs index 02553a7..00b0229 100644 --- a/crates/omnigraph-server/tests/stored_queries.rs +++ b/crates/omnigraph-server/tests/stored_queries.rs @@ -369,6 +369,47 @@ async fn list_queries_is_read_gated_so_a_non_invoker_can_list() { ); } +#[tokio::test(flavor = "multi_thread")] +async fn list_queries_surfaces_query_description_and_instruction() { + // E2e for the query-level `.gq` surface: `@description`/`@instruction` on + // a stored query declaration are carried through to clients via the typed + // `QueryCatalogEntry` fields over `GET /queries`. A query without them + // omits both fields (serde `skip_serializing_if = "Option::is_none"`). 
+ let described = "query described($name: String) \ + @description(\"Find a person by exact name.\") \ + @instruction(\"Use for exact lookups; prefer search for fuzzy matches.\") \ + { match { $p: Person { name: $name } } return { $p.age } }"; + let (_temp, app) = app_with_stored_queries( + &[ + ("described", described, true), + ("bare", "query bare() { match { $p: Person } return { $p.name } }", true), + ], + &[("act-invoke", "t-invoke")], + INVOKE_POLICY_YAML, + ) + .await; + let (status, body) = json_response(&app, get_request(&g("/queries"), "t-invoke")).await; + assert_eq!(status, StatusCode::OK, "body: {body}"); + let entries = body["queries"].as_array().unwrap(); + + let described = entries.iter().find(|q| q["name"] == "described").unwrap(); + assert_eq!( + described["description"], "Find a person by exact name.", + "query @description surfaces over GET /queries: {described}" + ); + assert_eq!( + described["instruction"], + "Use for exact lookups; prefer search for fuzzy matches.", + "query @instruction surfaces over GET /queries: {described}" + ); + + let bare = entries.iter().find(|q| q["name"] == "bare").unwrap(); + assert!( + bare.get("description").is_none() && bare.get("instruction").is_none(), + "a query without the annotations omits both fields: {bare}" + ); +} + #[tokio::test(flavor = "multi_thread")] async fn list_queries_is_empty_when_no_registry() { let (_temp, app) = app_for_loaded_graph_with_auth("demo-token").await; diff --git a/crates/omnigraph/tests/lifecycle.rs b/crates/omnigraph/tests/lifecycle.rs index a56a80c..9488e12 100644 --- a/crates/omnigraph/tests/lifecycle.rs +++ b/crates/omnigraph/tests/lifecycle.rs @@ -304,3 +304,108 @@ async fn init_with_force_recovers_from_orphan_schema_files() { "force-recovered graph must have full schema state written" ); } + +/// E2e for the schema-level `.pg` surface: `@description` (node / edge / +/// property) and `@instruction` (node / edge only) parse, validate, and +/// persist verbatim into the 
on-disk `_schema.ir.json` through `Omnigraph::init` +/// — the contract that surfaces them in catalog metadata for tooling. +#[tokio::test] +async fn schema_annotations_persist_into_ir_json_on_init() { + let dir = tempfile::tempdir().unwrap(); + let uri = dir.path().to_str().unwrap(); + + let schema = r#" +node Task @description("Tracked work item") @instruction("Prefer querying by slug") { + slug: String @key @description("Stable external identifier") +} + +edge DependsOn: Task -> Task @description("Hard dependency") @instruction("Use only for blockers") +"#; + + Omnigraph::init(uri, schema).await.unwrap(); + + let ir_json = fs::read_to_string(dir.path().join("_schema.ir.json")).unwrap(); + let ir: serde_json::Value = serde_json::from_str(&ir_json).unwrap(); + + // Helper: collect the {name -> value} map of annotations that carry a + // string value. Value-less annotations (e.g. `@key`, which also desugars + // to a constraint) are skipped — they aren't what this test asserts. + let anns = |v: &serde_json::Value| -> std::collections::BTreeMap { + v["annotations"] + .as_array() + .unwrap() + .iter() + .filter_map(|a| { + Some(( + a["name"].as_str()?.to_string(), + a["value"].as_str()?.to_string(), + )) + }) + .collect() + }; + + let node = ir["nodes"] + .as_array() + .unwrap() + .iter() + .find(|n| n["name"] == "Task") + .unwrap(); + let node_anns = anns(node); + assert_eq!(node_anns.get("description").map(String::as_str), Some("Tracked work item")); + assert_eq!( + node_anns.get("instruction").map(String::as_str), + Some("Prefer querying by slug"), + "node @instruction persists into _schema.ir.json" + ); + + let prop = node["properties"] + .as_array() + .unwrap() + .iter() + .find(|p| p["name"] == "slug") + .unwrap(); + assert_eq!( + anns(prop).get("description").map(String::as_str), + Some("Stable external identifier"), + "property @description persists into _schema.ir.json" + ); + + let edge = ir["edges"] + .as_array() + .unwrap() + .iter() + .find(|e| e["name"] 
== "DependsOn") + .unwrap(); + let edge_anns = anns(edge); + assert_eq!(edge_anns.get("description").map(String::as_str), Some("Hard dependency")); + assert_eq!(edge_anns.get("instruction").map(String::as_str), Some("Use only for blockers")); +} + +/// `@instruction` is rejected on a property at compile time, so init aborts +/// before any graph state is written (mirrors the parser-level rejection from +/// the full engine boundary). +#[tokio::test] +async fn init_rejects_instruction_on_property() { + let dir = tempfile::tempdir().unwrap(); + let uri = dir.path().to_str().unwrap(); + + let schema = r#" +node Task { + slug: String @key @instruction("bad") +} +"#; + + // `Omnigraph` is not `Debug`, so match rather than `unwrap_err`. + let err = match Omnigraph::init(uri, schema).await { + Ok(_) => panic!("property-level @instruction must abort init"), + Err(err) => err, + }; + assert!( + err.to_string().contains("@instruction is only supported on node and edge types"), + "property-level @instruction must abort init: {err}" + ); + assert!( + !dir.path().join("_schema.ir.json").exists(), + "rejected init must not persist a schema IR" + ); +} diff --git a/docs/user/cli/reference.md b/docs/user/cli/reference.md index 9d83ead..1709226 100644 --- a/docs/user/cli/reference.md +++ b/docs/user/cli/reference.md @@ -26,6 +26,7 @@ Top-level command families and subcommands. Graph-targeting commands accept a po | `cleanup --keep N --older-than 7d --confirm` | destructive version GC (`--confirm` to execute; also needs `--yes` against a non-local `s3://` target — see *Write diagnostics & destructive confirmation*) | | `embed` | offline JSONL embedding pipeline | | `policy validate \| test \| explain` | Cedar tooling against a cluster's applied policies (`--cluster `; `--graph ` picks a graph's bundle when several apply). 
`test` takes `--tests `; `explain` takes `--actor`/`--action`/`--branch`/`--target-branch` | +| `queries list \| validate` | inspect a cluster's applied stored-query registry (`--cluster `; `--graph ` to scope one graph). `list` prints each query's kind (read/mutation), name, typed params, and `[mcp: …]` exposure; a query's `@description`/`@instruction` are shown as indented `description:` / `instruction:` lines when declared (omitted otherwise). `--json` emits `{name, mcp_expose, tool_name, mutation, params}` plus `description`/`instruction` **only when present** — matching the HTTP `GET /queries` catalog ([server.md](../operations/server.md)). `validate` type-checks the registry and exits non-zero on a broken query | | `profile list \| show []` | read-only inspection of `~/.omnigraph/config.yaml` profiles. `list` shows each profile's binding (server/cluster/store) + default graph and marks the `$OMNIGRAPH_PROFILE`-active one; JSON keeps `binding` and adds `scope_kind`, `target`, `valid`, and `error`; `show` resolves one profile's scope (endpoint + default graph), defaulting to the active profile, else the flat operator defaults | | `version` / `-v` | print `omnigraph 0.3.x` | From 57348cf7fa16fe30e9ebeb3b0d76f65ce69a5bfe Mon Sep 17 00:00:00 2001 From: Andrew Altshuler Date: Fri, 19 Jun 2026 18:42:56 +0300 Subject: [PATCH 13/13] fix(engine): preserve identifier case in filter pushdown (#283) (#285) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * test(engine): regression tests for #283 camelCase property filters Red against current code. A query (or chained mutation) that filters on a camelCase schema field lints and plans cleanly but fails at run time with "No field named reponame" because the identifier's case is destroyed at the engine->Lance boundary. 
Coverage added:
- query.rs unit: ir_filter_to_expr on a camelCase property must emit an Expr::Column named `repoName`, not `reponame` (red); plus a green coercion guard that a camelCase int column still gets a coerced literal.
- mutation.rs unit: predicate_to_sql must emit the column UNQUOTED and case-preserved (green guard documenting the committed-scan contract).
- literal_filters.rs e2e: a camelCase @index field with an inline-binding pushdown filter returns the seeded row (red — read pushdown).
- writes.rs e2e: an update+delete on a camelCase predicate, and a chained update that re-reads the pending side of scan_with_pending by the same camelCase predicate (red — pending MemTable scan).

Co-Authored-By: Claude Opus 4.8 (1M context)
Claude-Session: https://claude.ai/code/session_01FQ1Hf4eXLsJmeLUkTYBEw7

* fix(engine): preserve identifier case in filter pushdown (#283)

Two engine->Lance boundaries lowercased camelCase column identifiers, breaking any filter on a camelCase schema field even though the IR, compiler, projection, and in-memory filtering all preserve case.

Read pushdown (exec/query.rs, ir_expr_to_expr): build the column reference with datafusion::prelude::ident() instead of col(). col() routes through SQL identifier normalization and lowercases an unquoted identifier (`repoName` -> `reponame`); ident() builds an unqualified, case-preserved Column. Property refs here are always bare column names, so there is no qualified-name handling to lose. No-op for the lowercase columns that work today.

Pending mutation scan (table_store.rs, scan_pending_batches): the committed-scan consumer (Lance Scanner::filter(&str)) preserves an unquoted identifier's case but treats a double-quoted "col" as a string literal, so predicate_to_sql must keep the column unquoted. The pending side splices that same unquoted predicate into a DataFusion `SELECT ... WHERE`, which would lowercase it.
Make that path case-preserving by disabling sql_parser.enable_ident_normalization on its SessionContext rather than quoting (quoting would match zero committed rows). predicate_to_sql gains only a clarifying comment; its emitted string is unchanged.

Full engine suite green (579 tests).

Co-Authored-By: Claude Opus 4.8 (1M context)
Claude-Session: https://claude.ai/code/session_01FQ1Hf4eXLsJmeLUkTYBEw7

* docs(dev): case study for #283 camelCase filter bug

Record the root cause, the two-boundary fix (read pushdown col→ident; pending mutation scan ident-normalization off), and why the obvious symmetric "quote the column" fix is wrong (Lance reads a double-quoted column as a string literal and silently matches zero committed rows). Linked from a new "Case Studies" section in the dev index so the link check passes.

Co-Authored-By: Claude Opus 4.8 (1M context)
Claude-Session: https://claude.ai/code/session_01FQ1Hf4eXLsJmeLUkTYBEw7

---------

Co-authored-by: Claude Opus 4.8 (1M context)
---
 crates/omnigraph/src/exec/mutation.rs | 32 ++++
 crates/omnigraph/src/exec/query.rs | 65 ++++++-
 crates/omnigraph/src/table_store.rs | 10 +-
 crates/omnigraph/tests/literal_filters.rs | 26 +++
 crates/omnigraph/tests/writes.rs | 67 +++++++
 docs/dev/bug-case-fix.md | 217 ++++++++++++++++++++++
 docs/dev/index.md | 10 +
 7 files changed, 424 insertions(+), 3 deletions(-)
 create mode 100644 docs/dev/bug-case-fix.md

diff --git a/crates/omnigraph/src/exec/mutation.rs b/crates/omnigraph/src/exec/mutation.rs
index 9fcff45..fbd0751 100644
--- a/crates/omnigraph/src/exec/mutation.rs
+++ b/crates/omnigraph/src/exec/mutation.rs
@@ -477,6 +477,12 @@ fn predicate_to_sql(
         }
     };

+    // #283: emit the column UNQUOTED. Lance's `Scanner::filter(&str)` (the
+    // committed-scan consumer) preserves an unquoted identifier's case but
+    // treats a double-quoted `"col"` as a string literal, so quoting here
+    // would silently match zero committed rows.
The pending-batch MemTable
+    // query is instead made case-preserving by disabling DataFusion identifier
+    // normalization on its `SessionContext` (see `scan_pending_batches`).
     Ok(format!("{} {} {}", column, op, value_sql))
 }

@@ -1477,3 +1483,29 @@ fn enrich_mutation_params(params: &ParamMap) -> Result {
     }
     Ok(resolved)
 }
+
+#[cfg(test)]
+mod predicate_sql_tests {
+    use super::*;
+
+    // #283: a camelCase column in a mutation predicate must be emitted
+    // UNQUOTED and case-preserved. The committed-scan consumer, Lance's
+    // `Scanner::filter(&str)`, preserves an unquoted identifier's case but
+    // treats a double-quoted `"col"` as a string literal (which silently
+    // matches zero rows), so the predicate string must not quote the column.
+    // The pending MemTable path stays case-preserving by disabling DataFusion
+    // identifier normalization on its context, not by quoting here.
+    #[test]
+    fn predicate_to_sql_preserves_camelcase_column_unquoted() {
+        let predicate = IRMutationPredicate {
+            property: "repoName".to_string(),
+            op: CompOp::Eq,
+            value: IRExpr::Literal(Literal::String("acme".into())),
+        };
+        let sql = predicate_to_sql(&predicate, &ParamMap::new(), false).unwrap();
+        assert_eq!(
+            sql, "repoName = 'acme'",
+            "column must be unquoted and case-preserved, got {sql}"
+        );
+    }
+}
diff --git a/crates/omnigraph/src/exec/query.rs b/crates/omnigraph/src/exec/query.rs
index e922075..23e1434 100644
--- a/crates/omnigraph/src/exec/query.rs
+++ b/crates/omnigraph/src/exec/query.rs
@@ -2149,9 +2149,13 @@ pub(super) fn ir_expr_to_expr(
     params: &ParamMap,
     target: Option<&arrow_schema::DataType>,
 ) -> Option {
-    use datafusion::prelude::col;
+    use datafusion::prelude::ident;
     match expr {
-        IRExpr::PropAccess { property, .. } => Some(col(property)),
+        // #283: `ident()` preserves the identifier's case.
`col()` would route
+        // through SQL identifier normalization and lowercase an unquoted
+        // camelCase column (`repoName` → `reponame`), which then fails to
+        // resolve against the case-sensitive Lance/Arrow schema.
+        IRExpr::PropAccess { property, .. } => Some(ident(property)),
         IRExpr::Literal(l) => literal_to_expr_coerced(l, target),
         IRExpr::Param(name) => params
             .get(name)
@@ -2656,4 +2660,61 @@ mod literal_lowering_tests {
             "reversed-operand literal must coerce to the Int32 column type, got {expr:?}"
         );
     }
+
+    // Name of the left operand's column in a binary comparison `col OP lit`.
+    fn binary_left_column_name(e: &Expr) -> Option {
+        match e {
+            Expr::BinaryExpr(b) => match b.left.as_ref() {
+                Expr::Column(c) => Some(c.name.clone()),
+                _ => None,
+            },
+            _ => None,
+        }
+    }
+
+    // #283: a camelCase property must reach the scan as its exact column name,
+    // not a SQL-normalized (lowercased) one. `col()` lowercases unquoted
+    // identifiers; the pushed-down column ref must stay `repoName`.
+    #[test]
+    fn ir_filter_preserves_camelcase_column_name() {
+        use arrow_schema::{DataType, Field};
+        let schema = arrow_schema::Schema::new(vec![Field::new("repoName", DataType::Utf8, true)]);
+        let filter = IRFilter {
+            left: IRExpr::PropAccess {
+                variable: "d".into(),
+                property: "repoName".into(),
+            },
+            op: CompOp::Eq,
+            right: IRExpr::Literal(Literal::String("acme".into())),
+        };
+        let expr = ir_filter_to_expr(&filter, &ParamMap::new(), Some(&schema)).unwrap();
+        assert_eq!(
+            binary_left_column_name(&expr).as_deref(),
+            Some("repoName"),
+            "camelCase column must be preserved (not lowercased to `reponame`), got {expr:?}"
+        );
+    }
+
+    // Index preservation: a camelCase numeric column still coerces its literal
+    // (so the scalar BTREE stays eligible) — the col→ident fix must not disturb
+    // the coercion path (which resolves the column type via field_with_name).
+    #[test]
+    fn ir_filter_coerces_literal_for_camelcase_int_column() {
+        use arrow_schema::{DataType, Field};
+        let schema =
+            arrow_schema::Schema::new(vec![Field::new("itemCount", DataType::Int32, true)]);
+        let filter = IRFilter {
+            left: IRExpr::PropAccess {
+                variable: "m".into(),
+                property: "itemCount".into(),
+            },
+            op: CompOp::Eq,
+            right: IRExpr::Literal(Literal::Integer(2)),
+        };
+        let expr = ir_filter_to_expr(&filter, &ParamMap::new(), Some(&schema)).unwrap();
+        assert!(
+            binary_has_int32_literal(&expr),
+            "camelCase int column must keep its coerced Int32 literal (BTREE-eligible), got {expr:?}"
+        );
+    }
 }
diff --git a/crates/omnigraph/src/table_store.rs b/crates/omnigraph/src/table_store.rs
index 0325e1e..511508f 100644
--- a/crates/omnigraph/src/table_store.rs
+++ b/crates/omnigraph/src/table_store.rs
@@ -1883,7 +1883,15 @@ async fn scan_pending_batches(
     filter: Option<&str>,
 ) -> Result> {
     let schema = pending_schema.unwrap_or_else(|| pending_batches[0].schema());
-    let ctx = datafusion::execution::context::SessionContext::new();
+    // #283: disable SQL identifier normalization so an unquoted camelCase
+    // column in `filter` (e.g. `repoName = 'acme'`, emitted unquoted by
+    // `predicate_to_sql` because the committed Lance scan needs it unquoted)
+    // is matched case-preserving against the case-sensitive MemTable schema.
+    // Without this, DataFusion lowercases `repoName` → `reponame` and fails to
+    // resolve. Quoted identifiers (the projection list below) are unaffected.
+    let mut config = datafusion::execution::context::SessionConfig::new();
+    config.options_mut().sql_parser.enable_ident_normalization = false;
+    let ctx = datafusion::execution::context::SessionContext::new_with_config(config);
     let mem = datafusion::datasource::MemTable::try_new(schema, vec![pending_batches.to_vec()])
         .map_err(|e| OmniError::Lance(e.to_string()))?;
     ctx.register_table("pending", Arc::new(mem))
diff --git a/crates/omnigraph/tests/literal_filters.rs b/crates/omnigraph/tests/literal_filters.rs
index d486f28..9fb480a 100644
--- a/crates/omnigraph/tests/literal_filters.rs
+++ b/crates/omnigraph/tests/literal_filters.rs
@@ -145,3 +145,29 @@ query seen_eq() { match { $m: Metric { seen: datetime("2024-06-01T12:00:00Z") }
     assert_eq!(sorted_metric_names(&mut db, q, "born_eq").await, vec!["m1"]);
     assert_eq!(sorted_metric_names(&mut db, q, "seen_eq").await, vec!["m1"]);
 }
+
+// #283: a property-match on a camelCase `@index` field must execute, not fail
+// with "No field named reponame" at the Lance scan. Exercises the pushdown arm
+// (inline binding `Doc { repoName: $r }`) end-to-end.
+const CC_SCHEMA: &str = r#"
+node Doc {
+    slug: String @key
+    repoName: String @index
+}
+"#;
+const CC_DATA: &str = r#"{"type":"Doc","data":{"slug":"d1","repoName":"acme"}}
+{"type":"Doc","data":{"slug":"d2","repoName":"globex"}}"#;
+
+#[tokio::test]
+async fn camelcase_property_filter_executes() {
+    let dir = tempfile::tempdir().unwrap();
+    let uri = dir.path().to_str().unwrap();
+    let mut db = Omnigraph::init(uri, CC_SCHEMA).await.unwrap();
+    load_jsonl(&mut db, CC_DATA, LoadMode::Overwrite).await.unwrap();
+
+    let q = r#"query by_repo($r: String) { match { $d: Doc { repoName: $r } } return { $d.slug } }"#;
+    let r = query_main(&mut db, q, "by_repo", &params(&[("$r", "acme")]))
+        .await
+        .expect("camelCase property filter must execute, not fail at the Lance scan");
+    assert_eq!(r.num_rows(), 1, "expected exactly the d1 row for repoName=acme");
+}
diff --git a/crates/omnigraph/tests/writes.rs b/crates/omnigraph/tests/writes.rs
index 8120940..9cb8689 100644
--- a/crates/omnigraph/tests/writes.rs
+++ b/crates/omnigraph/tests/writes.rs
@@ -1646,3 +1646,70 @@ async fn branch_cascade_delete_forks_node_and_edges_under_held_queues() {
         "main must be untouched by the branch delete"
     );
 }
+
+// #283: a mutation predicate (`where camelField = ...`) on a camelCase column
+// must execute, not fail at the Lance scan with "No field named ...". Covers
+// both `update` (committed scan via scan_with_pending) and `delete`
+// (delete_where), which share the same emitted SQL filter string.
+const CC_SCHEMA: &str = r#"
+node Doc {
+    slug: String @key
+    repoName: String @index
+    status: String?
+}
+"#;
+const CC_DATA: &str = r#"{"type":"Doc","data":{"slug":"d1","repoName":"acme","status":"open"}}
+{"type":"Doc","data":{"slug":"d2","repoName":"globex","status":"open"}}"#;
+
+#[tokio::test]
+async fn camelcase_mutation_predicate_updates_and_deletes() {
+    let dir = tempfile::tempdir().unwrap();
+    let uri = dir.path().to_str().unwrap();
+    let mut db = Omnigraph::init(uri, CC_SCHEMA).await.unwrap();
+    load_jsonl(&mut db, CC_DATA, LoadMode::Overwrite).await.unwrap();
+
+    let m = r#"
+query set_status($repo: String, $st: String) { update Doc set { status: $st } where repoName = $repo }
+query del($repo: String) { delete Doc where repoName = $repo }
+"#;
+
+    let upd = db
+        .mutate("main", m, "set_status", &params(&[("$repo", "acme"), ("$st", "closed")]))
+        .await
+        .expect("update with a camelCase predicate must execute");
+    assert_eq!(upd.affected_nodes, 1, "exactly the acme Doc should update");
+
+    let del = db
+        .mutate("main", m, "del", &params(&[("$repo", "globex")]))
+        .await
+        .expect("delete with a camelCase predicate must execute");
+    assert_eq!(del.affected_nodes, 1, "exactly the globex Doc should delete");
+
+    assert_eq!(count_rows(&db, "node:Doc").await, 1, "one Doc (acme) should remain");
+}
+
+// #283 (pending side): a chained mutation whose 2nd op filters a camelCase
+// column must read op-1's staged rows through the pending DataFusion `MemTable`
+// (`SELECT … WHERE {filter}` via ctx.sql), which lowercases unquoted idents.
+// This is the path the single update/delete above does NOT exercise.
+#[tokio::test]
+async fn camelcase_chained_mutation_reads_pending_by_camelcase() {
+    let dir = tempfile::tempdir().unwrap();
+    let uri = dir.path().to_str().unwrap();
+    let mut db = Omnigraph::init(uri, CC_SCHEMA).await.unwrap();
+    load_jsonl(&mut db, CC_DATA, LoadMode::Overwrite).await.unwrap();
+
+    // op-1 stages a status change to the acme Doc; op-2 re-filters the same
+    // camelCase column, so it must match op-1's pending row.
+    let m = r#"
+query chain($repo: String) {
+    update Doc set { status: "stage1" } where repoName = $repo
+    update Doc set { status: "stage2" } where repoName = $repo
+}
+"#;
+    let r = db
+        .mutate("main", m, "chain", &params(&[("$repo", "acme")]))
+        .await
+        .expect("chained camelCase mutation must read the pending row, not fail at the MemTable SELECT");
+    assert_eq!(r.affected_nodes, 2, "both ops should touch the acme Doc (read-your-writes)");
+}
diff --git a/docs/dev/bug-case-fix.md b/docs/dev/bug-case-fix.md
new file mode 100644
index 0000000..d5d596e
--- /dev/null
+++ b/docs/dev/bug-case-fix.md
@@ -0,0 +1,217 @@
+# Bug case study: camelCase property filters lowercased at runtime
+
+**Issue:** [#283](https://github.com/ModernRelay/omnigraph/issues/283) (mirrored
+in the dev-graph as `iss-990`)
+**Reported on:** 0.7.0 (release binary)
+**Status of code:** present on `v0.7.0`; fixed on branch `fix/iss-283-camelcase-filter` (read pushdown + pending mutation scan)
+**Severity:** correctness — a valid, lint-clean query fails at run time.
+
+## Symptom
+
+A read query that filters on a **camelCase** schema field lints and plans
+cleanly but fails when it executes:
+
+```text
+No field named reponame. Column names are case sensitive.
+```
+
+Minimal repro:
+
+```pg
+node SourceDocument {
+    repoName: String @index
+}
+```
+
+```gq
+query find($repoName: String) {
+    match { $d: SourceDocument { repoName: $repoName } }
+    return { $d.repoName }
+}
+```
+
+`omnigraph lint` passes; running the query errors. The operator workaround is to
+rename the field to all-lowercase (`repo`), which is why this looked like a
+schema-design quirk rather than an engine bug.
+
+## Root cause
+
+The filter-pushdown path builds the Lance scan predicate's column reference with
+`datafusion::prelude::col(property)`:
+
+- **Site:** `crates/omnigraph/src/exec/query.rs` — `ir_expr_to_expr`:
+  ```rust
+  IRExpr::PropAccess { property, ..
} => Some(col(property)),
+  ```
+- `col(&str)` runs DataFusion's SQL **identifier normalization**
+  (`Column::from_qualified_name` → `parse_identifiers_normalized(.., false)`),
+  which **lowercases unquoted identifiers**. So `col("repoName")` resolves to a
+  column named `reponame`.
+- Lance stores columns **case-preserved** (`repoName`) and resolves them
+  case-sensitively, so the scan can't find `reponame` and errors.
+
+The IR is not at fault: the parser and lowering preserve the original case
+(`property: pm.prop_name.clone()`), which is exactly why the compiler resolves
+`repoName` and **lint passes**. The case is destroyed only at the
+engine → Lance boundary.
+
+There is a **second** boundary with the same root cause but a *different*
+parser: the pending-batch scan in `table_store.rs::scan_pending_batches` splices
+the mutation predicate string into a DataFusion `SELECT … WHERE {filter}` over a
+`MemTable`, and DataFusion's SQL parser lowercases the unquoted column the same
+way (`repoName` → `reponame`). See **Part 2** of the fix — it surfaces only on a
+*chained* mutation that re-reads the pending side, which is why a single
+update/delete on a camelCase predicate looked fine.
+
+### Why the rest of the engine is unaffected
+
+The two pushdown sites above were the offenders; the remaining paths already
+treat column names case-sensitively and handle camelCase correctly:
+
+- **Projection / return** uses the real Arrow field name (`f.name()`).
+- **In-memory filtering** (the fallback for non-pushable predicates) looks the
+  column up by the preserved property name against the batch schema.
+- **The committed Lance mutation scan** (`Scanner::filter(&str)`) preserves an
+  unquoted identifier's case, so committed-row matching on a camelCase predicate
+  already worked.
+
+So the read bug surfaces for predicates that *are* pushed down (e.g. an equality
+on a scalar camelCase column), and the mutation bug only for the pending-side
+re-scan of a chained mutation.
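The root-cause mechanism can be illustrated with a std-only toy model. This is a minimal sketch, not DataFusion's or Lance's code: `normalize_unquoted` and `resolve` are hypothetical helpers that only mimic "lowercase an unquoted identifier" and "case-sensitive schema lookup" as described in the root-cause section.

```rust
// Toy model only: hypothetical helpers, not DataFusion or Lance APIs.

/// Mimics SQL-style normalization: an unquoted identifier is lowercased.
fn normalize_unquoted(ident: &str) -> String {
    ident.to_lowercase()
}

/// Case-sensitive lookup, standing in for a case-preserving schema.
fn resolve<'a>(schema: &'a [&'a str], name: &str) -> Option<&'a str> {
    schema.iter().copied().find(|field| *field == name)
}

fn main() {
    let schema = ["slug", "repoName"];

    // Normalizing first (the `col()`-like path) loses the case, so the
    // case-sensitive lookup misses the column.
    let normalized = normalize_unquoted("repoName");
    assert_eq!(normalized, "reponame");
    assert_eq!(resolve(&schema, &normalized), None);

    // Passing the name through untouched (the `ident()`-like path) resolves.
    assert_eq!(resolve(&schema, "repoName"), Some("repoName"));

    // All-lowercase names resolve either way, which is why they never failed.
    assert_eq!(resolve(&schema, &normalize_unquoted("slug")), Some("slug"));
}
```

The last assertion is the reason the bug stayed hidden: normalization is a no-op for names that are already lowercase.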
+
+### Why it slipped through
+
+The `ir_filter_to_expr` unit tests only use the all-lowercase field `count`, so
+no test exercised a camelCase property. Nothing in CI compared the emitted
+column name against the schema's casing.
+
+## Fix
+
+There are **two** engine→Lance boundaries that lose case, and they need
+**different** fixes because the two consumers disagree on quoting semantics.
+
+### Part 1 — read pushdown (`exec/query.rs`, `ir_expr_to_expr`)
+
+Use DataFusion's case-preserving column constructor, `ident()`, instead of
+`col()`:
+
+```rust
+IRExpr::PropAccess { property, .. } => Some(datafusion::prelude::ident(property)),
+```
+
+`ident()` builds `Expr::Column(Column::new_unqualified(property))` with no SQL
+parse and no normalization, so the case is preserved. Property references here
+are always bare column names (the variable is dropped via `..`), so there is no
+qualified-name (`a.b`) handling to lose.
+
+This is the right layer and the right shape:
+
+- It is a **no-op for the lowercase columns that work today** (`slug`, `id`,
+  `status`, …) — lowercasing those was already a no-op — so there is no
+  regression risk for the common case.
+- It makes pushdown **consistent** with projection and in-memory filtering,
+  which already use case-preserved names.
+- It also restores **index use** for camelCase columns: today such a filter
+  errors before the BTREE is even considered.
+
+### Part 2 — pending mutation scan (`table_store.rs`, `scan_pending_batches`)
+
+`update`/`delete` predicates lower through `predicate_to_sql(..)` into a single
+**SQL string** (`format!("{} {} {}", column, op, value_sql)`). That one string
+is consumed by **two** different parsers, and *they disagree on what quoting
+means*:
+
+- The **committed** side passes the string to Lance's `Scanner::filter(&str)`.
+  Lance **preserves an unquoted identifier's case** (so unquoted camelCase
+  *already works* on the committed scan) but treats a double-quoted `"col"` as a
+  **string literal** — `"repoName" = 'acme'` parses as `'repoName' = 'acme'`,
+  a constant-false predicate that silently matches **zero** committed rows.
+- The **pending** side splices the same string into a DataFusion
+  `SELECT … FROM pending WHERE {filter}` over a `MemTable`. DataFusion's SQL
+  parser **lowercases** an unquoted identifier (`repoName` → `reponame`) and
+  fails to resolve against the case-sensitive `MemTable` schema.
+
+So no single quoting choice for the column satisfies both: quoting fixes the
+pending side but breaks the committed side, and vice versa. The fix keeps the
+predicate **unquoted** (what the committed Lance scan needs) and makes the
+*pending* context case-preserving instead, by disabling SQL identifier
+normalization on its `SessionContext`:
+
+```rust
+let mut config = SessionConfig::new();
+config.options_mut().sql_parser.enable_ident_normalization = false;
+let ctx = SessionContext::new_with_config(config);
+```
+
+`predicate_to_sql` itself never lowercased anything (it copies the preserved
+property name), so its emitted string is unchanged — it gains only a comment
+recording the unquoted contract. The projection list in the same function is
+already double-quoted and is unaffected (quoted identifiers are case-preserved
+under either normalization setting).
+
+Rejected alternatives: banning/normalizing camelCase at the compiler (a real
+usability regression — camelCase fields are legitimate), lowercasing column
+names in storage (a breaking on-disk change), merely making lint *warn* (a
+band-aid that leaves the runtime broken), or **quoting the column in
+`predicate_to_sql`** (empirically breaks 7 existing lowercase-column mutation
+tests because Lance reads `"col"` as a string literal — see Part 2).
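The quoting disagreement in Part 2 can be laid out as a small truth table. The following is a std-only sketch under stated assumptions: `committed_side` and `pending_side` are hypothetical stand-ins that encode only the behaviors this write-up attributes to Lance's `Scanner::filter(&str)` and to DataFusion's SQL parser. They do not call either library.

```rust
// Toy model only: hypothetical stand-ins for how each consumer reads the
// column token of the predicate string.

#[derive(Debug, PartialEq)]
enum Token {
    Column(String),
    StringLiteral(String),
}

/// Committed side as described above: an unquoted name keeps its case,
/// a double-quoted one is read as a string literal.
fn committed_side(token: &str) -> Token {
    match token.strip_prefix('"').and_then(|t| t.strip_suffix('"')) {
        Some(inner) => Token::StringLiteral(inner.to_string()),
        None => Token::Column(token.to_string()),
    }
}

/// Pending side as described above: a double-quoted name keeps its case,
/// an unquoted one is lowercased unless normalization is switched off.
fn pending_side(token: &str, normalize: bool) -> Token {
    match token.strip_prefix('"').and_then(|t| t.strip_suffix('"')) {
        Some(inner) => Token::Column(inner.to_string()),
        None if normalize => Token::Column(token.to_lowercase()),
        None => Token::Column(token.to_string()),
    }
}

fn main() {
    let want = Token::Column("repoName".to_string());

    // Quoted: the pending side resolves, the committed side sees a literal.
    assert_eq!(pending_side("\"repoName\"", true), want);
    assert_ne!(committed_side("\"repoName\""), want);

    // Unquoted, normalization on: committed resolves, pending is lowercased.
    assert_eq!(committed_side("repoName"), want);
    assert_ne!(pending_side("repoName", true), want);

    // Unquoted, normalization off: both sides agree on `repoName`.
    assert_eq!(pending_side("repoName", false), want);
}
```

In this model the only combination where both sides see the column `repoName` is the unquoted token with normalization disabled on the pending side, which is the shape of the Part 2 fix.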
+
+## Scope and caveats
+
+- **Not Windows-specific.** The original report's environment was Windows, but
+  the cause is platform-independent.
+- **The mutation path was only *partially* broken, and not where first
+  assumed.** The committed side of `scan_with_pending(..)` (Lance
+  `Scanner::filter(&str)`) and `delete`'s `delete_where(..)` / `Dataset::delete`
+  preserve an unquoted identifier's case, so a *single* `update`/`delete` on a
+  camelCase predicate already worked. Only the **pending** side — the in-memory
+  `MemTable` re-scan that a *chained* mutation hits — lowercased the column.
+  This was confirmed empirically: a single update+delete on `repoName` passes
+  unfixed; a chained update that re-reads the pending side fails with
+  `No field named reponame`. The fix is Part 2 above (disable identifier
+  normalization on the pending `SessionContext`), **not** quoting the column.
+  The eventual MR-A migration (`delete_where` → Lance 7
+  `DeleteBuilder::execute_uncommitted`, structured `Expr`) is the longer-term
+  shape but is out of scope here.
+- **Check the coercion lookup.** Adjacent to the fix, the literal-coercion step
+  (`prop_data_type(.., schema)`, which keeps the BTREE usable) also resolves the
+  column by name. Confirm it uses the preserved name; if it mishandles case a
+  camelCase filter would resolve but lose its index — a silent perf regression,
+  not a crash.
+- **Do not use `col(r#""repoName""#)` as the general read-path fix.** Quoting
+  would preserve this one name, but it routes through SQL identifier parsing and
+  changes qualified-name semantics. The IR property here is already a bare
+  column name, so `ident(property)` / `Column::new_unqualified(property)` is the
+  precise structured expression.
+- **Do not "fix" the mutation string by quoting the column.** It is tempting to
+  reuse a `quote_ident` helper symmetric with `literal_to_sql`'s value escaping,
+  but the column quote-rules differ between the two consumers of the predicate
+  string: Lance's `Scanner::filter(&str)` reads `"col"` as a *string literal*
+  (silently matching nothing), while DataFusion's `ctx.sql` reads it as a
+  case-preserved identifier. Because the committed Lance scan already preserves
+  the *unquoted* identifier's case, the column must stay unquoted and the
+  pending DataFusion context must be told not to normalize — not the reverse.
+
+## Validation (test-first)
+
+1. **Red:** add an `ir_filter_to_expr` test asserting the emitted
+   `Expr::Column` name for a camelCase property is `repoName`, not `reponame`.
+   Fails on current code.
+2. **Green:** apply the `col` → `ident` change (Part 1) and the pending-context
+   `enable_ident_normalization = false` change (Part 2).
+3. **End-to-end:** a camelCase `@index` field with
+   `match { T { camelField: $x } }` returns the row (the unit test alone can't
+   catch an engine↔Lance boundary regression).
+4. **Mutation parity:** with the same camelCase field, cover:
+   - `update T where camelField == $x set otherField = ...` updates the intended
+     row.
+   - `delete T where camelField == $x` deletes the intended row and cascades as
+     expected.
+   - A chained update that hits the pending side of `scan_with_pending` still
+     works, so both the committed Lance scan and pending DataFusion `MemTable`
+     predicate paths are case-preserving.
+5. **Index preservation:** keep or add a plan/trace assertion that the
+   camelCase `@index` equality predicate still reaches the scalar-index path.
+   A result-only test can pass while silently falling back to a full scan.
+6. Run the full engine suite (`cargo test -p omnigraph-engine`) — in particular
+   the existing BTREE index-eligibility tests, which `ident()` must not disturb.
diff --git a/docs/dev/index.md b/docs/dev/index.md
index 1fc0b77..91f108b 100644
--- a/docs/dev/index.md
+++ b/docs/dev/index.md
@@ -62,6 +62,16 @@ The `docs/rfcs/` track is the **public, externally-authorable** RFC process. The
 maintainer/internal RFCs below (`rfc-00N-*.md`) are a separate, team-owned
 track; don't conflate the two.

+## Case Studies
+
+Worked write-ups of specific bugs — root cause, fix, and the reasoning that
+ruled out the tempting-but-wrong alternatives. Read these for the debugging
+pattern, not just the outcome.
+
+| Area | Read |
+|---|---|
+| camelCase property filters lowercased at runtime (#283) — two engine→Lance boundaries, two different fixes | [bug-case-fix.md](bug-case-fix.md) |
+
 ## Active Implementation Plans

 Working documents for in-flight feature work. Removed when the work lands.