diff --git a/docs/releases/v0.7.0.md b/docs/releases/v0.7.0.md index 4048041..b4ad903 100644 --- a/docs/releases/v0.7.0.md +++ b/docs/releases/v0.7.0.md @@ -1,90 +1,294 @@ # Omnigraph v0.7.0 -v0.7.0 takes the cluster control plane to object storage and overhauls the -configuration architecture around two single-owner surfaces. A cluster — -state ledger, content-addressed catalog, and graph data — can now live -entirely on an S3-compatible bucket, and a server can boot from that bucket -with no local files at all. Operator identity, credentials, and personal -aliases move to a home-level config; the legacy combined `omnigraph.yaml` -enters a guided, staged deprecation. +v0.7.0 is three large arcs in one release. **Operations:** the cluster control +plane moves to object storage and the configuration architecture collapses to two +single-owner surfaces — a cluster can live entirely on an S3-compatible bucket, a +server boots from it with no local files, and the legacy combined `omnigraph.yaml` +is **removed**. **CLI:** the command-line surface is unified and made honest — +embedded and remote runs are one execution path, `load` becomes the single +bulk-write command, every command declares the **capability** it needs (and +rejects flags that don't apply), and the server boots only from a cluster. +**Engine & substrate:** Lance moves to 7.x, traversal/index/recovery internals +get faster and self-healing, and text embedding becomes provider-independent. ## Highlights -- **Clusters on object storage (`storage:`).** `cluster.yaml` gains an - optional `storage: s3://bucket/prefix` root. Every stored byte — the - state ledger, lock, recovery sidecars, approval artifacts, catalog blobs, - and the derived graph roots (`/graphs/.omni`) — flows - through one storage layer, so `file://` (the default, byte-compatible - with existing clusters) and `s3://` are a single code path. The ledger's - compare-and-swap uses S3 conditional writes (`If-Match` / - `If-None-Match`), verified against AWS semantics, RustFS, and - Tigris-backed stores; the state lock is genuinely cross-machine on - object storage. -- **Config-free serving: `--cluster s3://bucket/prefix`.** The server - accepts a bare storage-root URI and boots from the applied revision on - the bucket — the ledger and catalog are the whole deployment artifact. - Policy bundles serve as digest-verified *content* from the catalog - (never re-read from disk), closing the last gap for fully remote - clusters. The preferred container shape becomes **bucket, no volume** - (see `docs/user/deployment.md`). -- **Per-operator configuration (`~/.omnigraph/`).** A home-level config - carries operator identity (`operator.actor`, the new last hop of the - `--as` chain), output defaults, named servers, and personal aliases. - `$OMNIGRAPH_HOME` relocates it; `$OMNIGRAPH_CONFIG` now stands in for - `--config` in both binaries. -- **Credentials keyed by server name.** `omnigraph login ` stores - a bearer token in `~/.omnigraph/credentials` (created `0600`; over- - permissive files are refused). Token resolution for a request whose URL - matches an operator-defined server: `OMNIGRAPH_TOKEN_` env → the - credentials file → the legacy `bearer_token_env` chain unchanged. A - token is only ever sent to the server it is keyed to. -- **Operator targeting and aliases.** `--server ` (with `--graph - ` for multi-graph servers) addresses operator-defined endpoints on - every remote-capable command. Operator aliases are pure *bindings* — - personal name → (server, graph, stored-query name, default params) — - invoking catalog-owned stored queries; they carry no query content. -- **`omnigraph.yaml` deprecation begins.** Loading the legacy file prints - a per-key notice naming each present key's new home - (`OMNIGRAPH_SUPPRESS_YAML_DEPRECATION=1` to silence in CI). - `omnigraph config migrate` proposes — and with `--write`, applies — the - split: team half to a ready-to-review `cluster.yaml`, personal half - merged into the operator config (existing entries always win). - `omnigraph init` no longer scaffolds the file. Migrated teams can set - `OMNIGRAPH_NO_LEGACY_CONFIG=1` to turn any legacy-file load into a hard - error. The file itself keeps working until its removal at the next - major version. +### Clusters & storage on object storage -## Breaking / behavior changes +- **Clusters on object storage (`storage:`).** `cluster.yaml` gains an optional + `storage: s3://bucket/prefix` root. Every stored byte — state ledger, lock, + recovery sidecars, approval artifacts, catalog blobs, and the derived graph + roots (`/graphs/.omni`) — flows through one storage layer, so + `file://` (the default, byte-compatible with existing clusters) and `s3://` + are a single code path. The ledger's compare-and-swap uses S3 conditional + writes (`If-Match`/`If-None-Match`), verified against AWS, RustFS, and other + S3-compatible stores; the state lock is genuinely cross-machine on object + storage. +- **Config-free serving: `--cluster s3://bucket/prefix`.** The server accepts a + bare storage-root URI and boots from the applied revision on the bucket — the + ledger and catalog are the whole deployment artifact. Policy bundles serve as + digest-verified *content* from the catalog (never re-read from disk). The + preferred container shape becomes **bucket, no volume** (see + `docs/user/deployment.md`). +- **Cluster-only server.** `omnigraph-server` boots **only** from `--cluster + ` and serves N graphs (N ≥ 1) under cluster routes + (`/graphs/{id}/…`, plus a read-only `GET /graphs` enumeration). The old + single-graph flat-route mode, positional-`` boot, and `omnigraph.yaml` + `graphs:`-map boot are gone — add or remove graphs with `cluster apply` and + restart. +- **One storage substrate + recovery liveness.** The cluster storage backend and + the engine both go through one `StorageAdapter` (versioned read, conditional + replace/CAS, prefix delete), exercised by a storage fault-injection matrix. + A long-lived server now heals a recoverable write on its *next write* rather + than only at restart. -- `omnigraph init` no longer writes an `omnigraph.yaml` into the working - directory. Start cluster configs from the documentation templates, or - run `omnigraph config migrate` against an existing legacy file. -- Loading a legacy `omnigraph.yaml` now emits a deprecation block on - stderr (suppressible; see above). Output on stdout is unchanged. -- `ServingPolicy` (cluster crate API) carries verified policy *content* - instead of a blob path; `read_serving_snapshot` and several cluster - command entry points are now async. +### Configuration: two single-owner surfaces + +The legacy combined `omnigraph.yaml` is **removed**. Configuration now lives in +two surfaces with single owners, plus a zero-config tier: + +- **Cluster config (`cluster.yaml` + checkout, team-owned)** declares what the + system *is*: graphs, schemas, stored queries, policies, storage. A server boots + from it via `--cluster`. +- **Per-operator config (`~/.omnigraph/config.yaml`, person-owned)** declares who + *you* are: `operator.actor` (the last hop of the `--as` chain), output + defaults, named servers + clusters, profiles, aliases, and a default scope. + `$OMNIGRAPH_HOME` relocates it. +- **Credentials keyed by server name.** `omnigraph login ` stores a + bearer token in `~/.omnigraph/credentials` (created `0600`; over-permissive + files refused). Resolution for a request whose URL matches an operator-defined + server: `OMNIGRAPH_TOKEN_` env → the credentials file → the default + `OMNIGRAPH_BEARER_TOKEN`. A token is only ever sent to the server it is keyed to. +- **Operator targeting and aliases.** `--server ` (with `--graph ` for + multi-graph servers) addresses operator-defined endpoints. Operator aliases are + pure, **read-only** *bindings* — personal name → (server, graph, stored-query + name, default params) — invoking catalog-owned stored queries; they carry no + query content and a binding to a stored mutation is rejected. +- **Default scopes.** `defaults.server` (served) or `defaults.store` (a zero-flag + *local* default — mutually exclusive with `server`) supply the no-flag scope, + with an optional `default_graph`. `--profile ` / `$OMNIGRAPH_PROFILE` + selects a named scope bundle wholesale; `omnigraph profile list` / + `profile show []` inspect what's defined (read-only). + +### Unified, capability-aware CLI + +- **One bulk-write command: `omnigraph load`.** `load` is now the single data-write + command and works against remote graphs (over HTTP with the same bearer/actor + resolution as every other remote command) — previously the only data command + forced to open storage directly. `--mode overwrite|append|merge` is **required** + (overwrite is destructive, so there is no default); `--from ` opts into + fork-if-missing for `--branch`. `omnigraph ingest` becomes a **deprecated + alias** (`--from main --mode merge` defaults; one-line stderr warning). +- **No implicit branch forks.** Loading into a branch that does not exist is an + **error** unless `--from ` is given — a typo'd branch name no longer + silently forks `main` and lands your data there. Same rule on the server. +- **One execution path, embedded ≡ remote.** Every CLI verb runs through one + `GraphClient` with two implementations (embedded engine, HTTP) sharing a single + wire-DTO crate (`omnigraph-api-types`). An executable parity matrix runs every + verb against both and asserts identical results, so local and remote no longer + drift. +- **Declared capabilities + honest addressing.** Every command declares the + **capability** it needs — `any` (run against a graph, served or embedded), + `served` (needs a server), `direct` (direct storage access), `control` + (manage/inspect a cluster), or `local` (no graph) — and the CLI enforces it. + Wrong-capability addressing now fails loudly with a declared message (e.g. + `--server` on `optimize`) instead of being silently ignored, and a maintenance + verb pointed at a remote target is rejected. `omnigraph --help` groups commands + by capability with a legend. +- **Address cluster graphs for maintenance.** `optimize` / `repair` / `cleanup` + accept `--cluster --graph ` (`--cluster` is a cluster directory, + storage-root URI, or a `clusters:` name from `~/.omnigraph/config.yaml`), + resolving the graph's storage URI from the served cluster state (no need to + hand-type `/graphs/.omni`). `--graph` is the single graph selector + across server and cluster scopes. Conversely, `omnigraph init` **refuses** a + cluster-managed path and points at `cluster apply` — graphs in a cluster are + created with ledger/recovery/approvals, not by hand. `schema apply` refuses a + cluster-managed graph for the same reason (and the server rejects a cluster- + backed schema apply with `409`, pointing at `cluster apply`). +- **Write diagnostics + destructive-write safety (RFC-011 Decision 9).** Every + write (`load`, `mutate`, `branch create|delete|merge`, `schema apply`, + `optimize`, `repair`, `cleanup`) echoes its resolved target + access path to + stderr — e.g. `omnigraph load → s3://…/knowledge.omni (direct, remote)` — + suppressible with the global `--quiet`. Destructive writes against a + **non-local** scope (`cleanup`, overwrite `load`, `branch delete` against an + `http(s)://` server or `s3://` store/cluster) require explicit consent: the + global `--yes`, an interactive TTY prompt, or — for a non-interactive / + `--json` run — a hard refusal instead of silently proceeding. Local (`file://`) + writes are unaffected. +- **Route alignment: canonical `POST /load`.** The server gains a canonical + `POST /load`; `POST /ingest` is now a deprecated alias that emits RFC 9745 + `Deprecation: true` + RFC 8288 `Link: ; rel="successor-version"` + headers (a sibling-relative reference that resolves under `/graphs/{id}/…`). + The CLI's `load` targets `/load`. +- **Operator aliases get their own namespace (`omnigraph alias `).** A + personal binding to a stored query on a named server is invoked as + `omnigraph alias [args]` (RFC-011 Decision 4), so an alias can never + shadow — or be shadowed by — a built-in verb. `alias` rejects global scope + flags (`--server`/`--graph`/`--store`/`--cluster`/`--profile`/`--as`) its + binding already owns. +- **No-graph addressing lists candidates (RFC-011 Decision 7).** When a scope + has no `--graph` and no `default_graph`, the CLI never silently picks. A + **cluster** scope with exactly one applied graph uses it automatically and + otherwise **lists the candidates** (from the served catalog). A multi-graph + **server** lists the candidates (from `GET /graphs`) and requires `--graph `. +- **Invoke stored queries by name (RFC-011 Decision 3).** `omnigraph query + ` / `mutate ` invoke a stored query **by name** from the served + catalog — `omnigraph query find_people` instead of `--query find.gq --name + find_people`. The verb asserts the query's kind (an `expect_mutation` flag on + `POST /queries/{name}`: `query ` is rejected with `'' is a + mutation — use omnigraph mutate `, and vice-versa). `.gq` files become + the explicit ad-hoc lane (`-e` / `--query`), with the positional selecting + which query in the source. + +### Engine & substrate + +- **Lance 6.0.1 → 7.0.0.** The columnar substrate is bumped to Lance 7.x with + correct-by-design alignment: the unenforced primary key is immutable once set, + `WriteParams::auto_cleanup` is disabled so version GC stays operator-owned, and + the native namespace/`object_store` 0.13 surface is pinned by surface-guard + tests. No on-disk format change for existing graphs. +- **Indexed graph traversal.** `Expand` can run over a BTREE-indexed path, + asserted semantically equal to the CSR traversal it accelerates. +- **Scalar index coverage + filter literal coercion.** Closes index-coverage gaps + and coerces filter literals correctly, cutting query latency on indexed scans. +- **Index materialization is derived state.** `schema apply` records + `@index`/`@key` *intent* and builds nothing (index-only changes touch no table + data); `load`/`mutate` build inline through one chokepoint but **defer** an + untrainable Vector column as *pending* instead of aborting; `optimize` is the + reconciler that materializes declared-but-missing indexes and folds appended + fragments back into existing ones. +- **Recovery liveness + one storage substrate.** Writers heal a recoverable + write on the *next write* (not only at the next read-write open); a storage + fault-injection matrix exercises the sidecar lifecycle; the cluster and engine + share one `StorageAdapter` over `object_store`. +- **Branch-fork self-heal.** Manifest-unreferenced branch forks are reclaimed + (eager best-effort + a `cleanup` reconciler backstop), so a failed branch-delete + reclaim no longer wedges a reused branch name. +- **Composite `@unique(a, b)`.** Enforced as a true composite key, with one shared + keying function for intake and branch-merge that fails loudly on an un-keyable + column type rather than silently exempting it. + +### Embeddings: provider-independent (RFC-012) + +- **One client, any provider.** Text embedding moves to a single + provider-independent `EmbeddingConfig` behind a sealed `Provider` enum: + **OpenAI-compatible** (the **OpenRouter** default gateway — one key for many + models — plus OpenAI-direct and self-hosted endpoints), native **Gemini**, and + a deterministic **Mock**. One client serves both the query path and the offline + `omnigraph embed` CLI, with a per-query deadline and `tracing` observability. + The dead, uncallable compiler-crate OpenAI client (and its `reqwest`/`tokio` + deps) was removed. +- **Same-space guarantee.** `@embed("source", model="…")` records the embedding + identity (model) in the schema IR so it travels with the data; a string + `nearest()` whose resolved embedder model differs from the recorded one is + **rejected with a typed error** instead of silently ranking across vector + spaces. (`@embed` still does no ingest-time embedding — deferred to a later + phase.) + +## Breaking & behavior changes + +- **`omnigraph.yaml` is removed.** The CLI and server no longer read it at all; + the `OmnigraphConfig` type, `omnigraph config migrate`, and the deprecation + env vars (`OMNIGRAPH_NO_LEGACY_CONFIG`, `OMNIGRAPH_SUPPRESS_YAML_DEPRECATION`, + `OMNIGRAPH_CONFIG`) are gone. Configure via a team `cluster.yaml` and a + per-operator `~/.omnigraph/config.yaml` (see Upgrade notes). +- **`omnigraph-server` boots only from `--cluster`.** The positional-`` + single-graph boot and the `omnigraph.yaml` `graphs:`-map boot are removed; all + HTTP is under `/graphs/{id}/…` (with flat `/healthz` and the `/graphs` + enumeration). Upgrade deployments to `omnigraph-server --cluster `. +- **Default embedding provider flips to OpenRouter.** Embedding is no longer + hardwired to Gemini: the default provider is **OpenAI-compatible via + OpenRouter**, `OMNIGRAPH_GEMINI_BASE_URL` is dropped, and Gemini-direct users + must set `OMNIGRAPH_EMBED_PROVIDER=gemini`. A `nearest("string")` query whose + resolved model differs from a property's recorded `@embed(model=…)` is now a + typed error rather than silent cross-space ranking. +- **`query --alias ` is removed.** Invoke operator aliases via + `omnigraph alias [args]`. +- **`query`/`mutate` no longer take a positional graph URI, `--uri`, or + `--name`** (RFC-011 D3). The positional is now the query name; address the + graph with `--store` (local) / `--server` / `--profile`, and select a query + within an ad-hoc `--query`/`-e` source with the positional (replacing + `--name`). By-name catalog invocation is **served-only** (a bare `--store` has + no catalog — use `-e`/`--query` there). Scripts using + `query --query f.gq --name q` become + `query --store --query f.gq q`. +- **Legacy data-plane addressing removed** (#238): `--target`, the positional + `http(s)://`→remote dispatch, and `--as` on a served write (the actor is + resolved server-side from the bearer token) no longer exist. +- **`omnigraph load` replaces direct-storage-only loading; `--mode` is required.** + Scripts calling `load` without `--mode` must add one (`overwrite|append|merge`). +- **`omnigraph ingest` is deprecated** (still works; one-line stderr warning). + Use `load --from --mode `. +- **Loading into a missing branch is now an error without `--from`** (CLI and + `POST /load`/`POST /ingest`): a missing branch returns 404 / fails, never an + implicit fork. Pass `--from ` (CLI) or the request `from` field (HTTP) to + fork-if-missing. This affects any workflow that relied on auto-forking. +- **Scope flags that can't apply now error instead of being silently ignored.** + `--server` on any direct/control/session command, `--cluster` outside the + cluster-scoped verbs, and `--graph` where no multi-graph scope applies all fail + with a declared message. `--graph` is the single graph selector and is + **accepted** on `optimize` / `repair` / `cleanup` when paired with `--cluster` + (replacing the removed `--cluster-graph`). +- **`schema apply` is refused against a cluster-managed graph.** The CLI signposts + `omnigraph cluster apply`; a cluster-backed server returns `409 Conflict` + (after the Cedar gate, so an unauthorized actor still gets `403`). Cluster + graphs evolve through `cluster apply`, never a direct apply. +- **Storage-plane error text changed.** A maintenance verb pointed at a remote + target now fails with a declared direct-capability message (replacing the older + "only supported against local graph URIs" wording). Error strings are observable + contract (Hyrum); pin against the new text. +- **Non-local destructive writes now require `--yes` in automation.** A + `cleanup` / overwrite-`load` / `branch delete` against an `http(s)://` or + `s3://` target with `--json` (or any non-TTY context) previously executed; + it now **refuses** unless `--yes` is passed. CI scripts that destroy remote + data must add `--yes`. Local (`file://`) writes are unchanged. +- **`omnigraph init` no longer scaffolds a config file,** and **refuses a + cluster-managed storage path** (`/graphs/.omni` under a cluster) — + create those graphs with `cluster apply`. +- **`POST /ingest` is deprecated** (kept indefinitely as a shim) and returns + `Deprecation`/`Link` headers. **A v0.7 CLI talks to `POST /load`,** which a + pre-0.7 server does not expose — upgrade the server and CLI together, or keep + using `ingest`. +- **`ServingPolicy` (cluster crate API) carries verified policy content instead + of a blob path; `read_serving_snapshot` and several cluster command entry points + are now `async`.** +- **`omnigraph --help` reorders commands** (grouped by capability) and **hides + the deprecated `ingest`** from the listing — `ingest` still runs. Help text is + observable; this is a deliberate output change. ## Upgrade notes - Existing clusters need no migration: an absent `storage:` key keeps the config-directory layout byte-for-byte. -- Existing `omnigraph.yaml` setups keep working through the deprecation - window; `omnigraph config migrate` produces the recommended split. -- Operator setup is three lines: - `mkdir -p ~/.omnigraph`, write `operator.actor` (and `servers:`) into - `~/.omnigraph/config.yaml`, then `echo $TOKEN | omnigraph login `. +- **`omnigraph.yaml` is no longer read.** There is no automated migrate command + in 0.7.0; recreate configuration as a team `cluster.yaml` (graphs, schemas, + stored queries, policies — see `docs/user/clusters/`) plus a per-operator + `~/.omnigraph/config.yaml` (identity, servers, credentials, defaults — see + `docs/user/cli/reference.md`). +- **`omnigraph-server` now requires `--cluster `** — there is no + positional-URI boot. Run `cluster apply` first, then serve the applied revision. +- **Gemini-direct embedding users** set `OMNIGRAPH_EMBED_PROVIDER=gemini` (the + default is now OpenRouter); `OMNIGRAPH_GEMINI_BASE_URL` is removed. +- Audit scripts for two CLI changes: add `--mode` to every `load`, and add + `--from ` anywhere you relied on a missing branch being auto-created. +- Upgrade server and CLI together for the `/load` route (or keep `ingest`). +- Operator setup is three lines: `mkdir -p ~/.omnigraph`, write `operator.actor` + (and `servers:`) into `~/.omnigraph/config.yaml`, then + `echo $TOKEN | omnigraph login `. ## Internals -- The cluster, server, and CLI crates were modularized (the 7.9k-line - cluster `lib.rs` is now eight focused modules; the server and CLI test - monoliths split into per-area suites) — pure code movement, no behavior - change. -- New gated end-to-end suites run the full cluster lifecycle against a - real S3-compatible store in CI, including a lock-release regression and - a config-free server boot from a bare bucket URI. -- The deployment guide gains the bucket-no-volume container recipe for - AWS and Railway, validated against a live Railway deployment - (Railway buckets are S3-compatible and pass the conditional-write - contract test). +- The cluster, server, and CLI crates were modularized (the ~7.9k-line cluster + `lib.rs` into focused modules; the server and CLI test monoliths into per-area + suites) — pure code movement. +- The parity matrix (embedded vs remote) is the new referee for CLI behavior; the + OpenAPI drift test guards `openapi.json`; Lance-surface guard tests pin the + upstream APIs the engine depends on (the first smoke check on a Lance bump). +- Gated end-to-end suites run the full cluster lifecycle against a real + S3-compatible store in CI (lock-release regression, config-free boot from a + bare bucket URI). +- The deployment guide gains the bucket-no-volume container recipe for AWS / + S3-compatible object storage. +- `clap` updated to 4.6.1. CI runs the full workspace suite on `main` post-merge + rather than on every PR (faster PR turnaround; the local + `cargo test --workspace --locked` is the pre-merge gate).