omnigraph/docs/dev/rfc-011-cli-refactoring.md

# RFC-011: CLI refactoring — one addressing & config model
**Status:** Proposed
**Date:** 2026-06-14
**Audience:** CLI/server maintainers
**Builds on:** [rfc-007-operator-config.md](rfc-007-operator-config.md)
(per-operator config, keyed credentials, named servers),
[rfc-008-deprecate-omnigraph-yaml.md](rfc-008-deprecate-omnigraph-yaml.md)
(the legacy file this RFC finishes removing),
[rfc-009-unify-access-paths.md](rfc-009-unify-access-paths.md)
(`GraphClient` — embedded ≡ remote at the execution layer),
[rfc-010-cli-planes-restructure.md](rfc-010-cli-planes-restructure.md)
(declared planes + the wrong-plane guard this RFC subsumes).
**Sequencing:** lands as / after RFC-008 stage 5 (the `omnigraph.yaml` removal).
## Summary
Refactor the CLI around one coherent model once `omnigraph.yaml` is gone. The
shape:
- **One ontology** (store, server, cluster; cluster config vs operator config;
catalog; context; capability) where each term names exactly one concept.
- **Addressing = scope + `--graph`, with the access path *derived*.** A command
resolves a *scope* (operator defaults, an optional named *context*, or one
explicit primitive address — `--store` / `--server` / `--cluster`), selects a
graph inside it with `--graph`, and the **served-vs-direct access path falls out
of the scope's bindings × the verb's capability** — it is never a per-command
toggle and never inferred from a URI scheme.
- **Served is the front door; direct storage is privileged.** The everyday scope
is a *server* (a bearer token, no bucket credentials). Reading or writing a
remote store/cluster directly is an explicit, credentialed, admin/break-glass
act — never the default, never baked into everyday operator config.
- **The CLI is stateless per command.** No `current_context` pointer, no
`USE`-style mode; every command is fully determined by its flags + static
config. You *select* a graph, you do not *switch into* one.
- **Definitions are named; payloads are passed.** Queries (`.gq`) and schema
(`.pg`) live in the catalog and are invoked by name; params and bulk data are
the only per-call inputs.
This removes `--target`, `--cluster-graph`, `--uri` scheme-dispatch, and the
plane guard's "a `--target` that resolves to a remote URL" special case — and it
collapses the four-plane vocabulary, for users, into a single capability rule.
## Motivation: the legacy file pollutes the taxonomy
Today the CLI exposes four overlapping addressing forms but the system has only
three real entities; the mismatch is the whole problem, and `omnigraph.yaml` is
the carrier:
1. **`--target` straddles kinds.** It resolves through the legacy
`omnigraph.yaml` `graphs:` map (`config.rs::resolve_target_uri`), and that
`.uri` can be a **storage location** (`file`/`s3`) *or* a **remote server**
(`http`). One flag, two access paths with different capability and trust
models. The wrong-plane guard's storage-plane remote rejection
(`helpers.rs:467`) exists *only* to compensate for this overload.
2. **Scheme-inferred transport.** `<URI>`/`--uri` has the same disease a level
down: `is_remote_uri` (`helpers.rs:15`) silently picks embedded vs remote from
the scheme. Transport is guessed from a string, not declared.
3. **No single environment concept.** Defaults are smeared across the deprecated
`omnigraph.yaml` (`cli.graph`, `server.graph`) with no clean way to name or
switch environments.
Removing `omnigraph.yaml` is the moment to fix all three at once.
## Ontology
Every term is one concept. The rest of this RFC uses them precisely.
### Entities — the things that exist
- **Graph** — a typed property graph (node/edge types over Lance); the thing you
query and mutate. *Example: the `knowledge` graph.*
- **Store** — the storage location of a **single** graph: its Lance datasets at a
`file://`/`s3://` URI. Addressed directly with `--store`. *Example:
`s3://acme/clusters/brain/graphs/knowledge.omni`.*
- **Cluster** — a storage root holding **many** graphs plus the catalog and
control-plane state (state ledger, approvals, recovery). Managed as-code by the
team. *Example: the `brain` cluster at `s3://acme/clusters/brain`.*
- **Server** — an `omnigraph-server` process serving graphs over HTTP with bearer
auth and Cedar policy; boots from a bare graph or a cluster. *Example: `prod` at
`https://graph.example.com`, serving the `brain` cluster.*
### Config & catalog — the descriptions
- **Cluster config** — `cluster.yaml` in the cluster root, declaring the **desired
state** (graphs, schemas, stored queries, policies, storage), applied with
`cluster apply`. Team-owned; the source of truth for *what the system is*.
- **Catalog** — the **applied** registry the cluster owns in storage: the graphs,
stored queries, and policies `cluster apply` materialized. What a server serves
and what `query <name>` resolves against. *(Cluster config is the spec; the
catalog is the applied result.)*
- **Operator config** — `~/.omnigraph/config.yaml`, your **personal** file:
identity (actor), default graph, named servers/clusters, output prefs, optional
contexts. Declares *who I am*, never what the system is.
- **Context** — an optional named bundle of **defaults inside the operator
config** (one of {cluster, server, store} + a default graph). Config data,
**not state**: selecting one fills in omitted flags for a command; it does not
put you "in" a mode. Chosen per command (`--context <name>`) or per shell
(`OMNIGRAPH_CONTEXT`).
- **Credential** — a bearer token keyed to a **server name**, resolved via
`OMNIGRAPH_TOKEN_<NAME>` or `~/.omnigraph/credentials` (`0600`); sent only to
the server it is keyed to. (Per RFC-007 — the operator config holds endpoints,
never tokens.)
### What you run — definitions vs payloads
- **Schema** — the `.pg` type definitions for a graph; authored as a file, applied
via `schema apply` (or `cluster apply`).
- **Stored query** — a named query in the catalog, the team's reusable contract;
invoked by name. *Example: `find_people`.*
- **Query file (`.gq`)** — an authoring artifact holding `query <name>`
declarations; becomes a stored query when `cluster apply` adopts it. For
authoring/ad-hoc, not everyday invocation.
- **Payload** — the per-call inputs that vary each run: params (`--params`,
positional args) and bulk data (`--data`). Never part of config.
### How a command resolves
- **Scope** — the resolved environment a command addresses: operator defaults, a
named context, or one explicit primitive address.
- **Access path** — **served** (through a server) or **direct** (open storage
in-process). Derived from scope × capability; see "Access path" below.
- **Capability** — what a verb requires: `any`, `served`, `direct`, `control`,
or `local`.
- **Target shape** — whether the verb is **graph-scoped** (selects one graph
inside the scope), **scope-scoped** (operates on the whole server/cluster
scope), or **local** (does not resolve scope or graph).
- **Actor** — the identity a write is attributed to: server-resolved from the
bearer token (served), or `--as` ?? `operator.actor` (direct).
### The relationships that prevent confusion
- **Exactly two config surfaces:** **cluster config** (team) and **operator
config** (personal). Nothing else is "a config."
- A **context is not a third config** — it lives *inside* the operator config, and
it is **defaults, not state**.
- A **catalog is not config** — it is the *applied state* the cluster owns.
- A **store is one graph; a cluster is many graphs** + catalog + control state.
- A **graph is the logical thing**; store/server/cluster are ways to reach it.
- "State" elsewhere is not the context: *graph state* is committed data in Lance;
*cluster state* is the applied control-plane ledger. Neither is operator config.
## Design
### First principles
> Addressing should be 1:1 with the system's real entities; the access path
> (served vs direct) should be **derived**, never inferred from a string or
> toggled per command; the CLI should be **terse by config and stateless per
> command**; and **definitions are named while payloads are passed**.
Every command answers five orthogonal questions — kept orthogonal here:
| Axis | Question | Today | Target |
|---|---|---|---|
| Scope | which environment? | `omnigraph.yaml` defaults / `--target` | operator defaults · `--context` · one primitive |
| Target shape | whole scope or one graph? | implicit in command family | declared per verb |
| Graph | which graph in it? | tangled into the address | `--graph` only for graph-scoped server/cluster verbs |
| Access path | served or direct? | inferred from scheme / target | **derived** from scope × capability |
| Actor | who am I? | `--as` > `cli.actor` (yaml) > `operator.actor` | `--as`/`operator.actor` (direct) · token (served) |
### A scope binds one entity — and served is the default
A scope (a context, the flat defaults, or one primitive flag) binds **exactly one
of** {server, cluster, store}. Server and cluster scopes may contain many graphs
and can carry a `default_graph`; a store scope is already one graph and does not
accept `--graph`. They differ by privilege, and **the everyday default is a
server**:
- **server** → served (the everyday scope). A bearer token, **no storage
credentials**. Data verbs run through it, policy-enforced; maintenance verbs are
unavailable from this scope — there is no server route for them, so you must
name storage explicitly. This is what a normal operator's config binds.
- **cluster** → direct storage to a managed cluster, for **control,
maintenance, and graph-backed validation only** (`cluster *`,
`optimize`/`repair`/`cleanup`/`schema plan`, graph-backed `lint`, and
`queries validate`). Data verbs are **not** run directly against a cluster —
they go served, or `--store` for ad-hoc. **Privileged:** requires bucket
credentials, so it appears only in a maintainer's config or as an explicit
`--cluster` flag — never in an everyday operator's defaults.
- **store** → one graph's storage, direct. A **local file** store is ordinary
local dev; a **remote `s3://`** store is break-glass. No catalog (named queries
do not resolve — the ad-hoc lane).
A scope names **one** thing, so there is no independent `server`+`cluster` pair
that could disagree (the audit's coherence hazard is gone by construction — the
default is just a server). And the storage root lives only where it must:
### Direct storage access is privileged (the storage-root rule)
> The storage root (`s3://…`) is **server-and-admin knowledge, never
> everyday-operator knowledge.** Everyday operator config binds a server (a bearer
> token, no bucket credentials). Direct remote access — opening a cluster root or
> an `s3://` store — is always **explicit and privileged**: you name
> `--cluster`/`--store`, and only someone with bucket credentials can. The CLI
> never opens a remote store from a default scope.
This is the least-privilege posture — revoke a bearer token, don't rotate bucket
keys; only the **server process** and an occasional **maintenance admin** ever
hold storage credentials. It makes "use the server, not raw storage"
**structural**, not advisory: direct access requires credentials a normal operator
does not have *and* a flag they must type. The only storage root in an everyday
setup is the one the **server** boots from; operators never see it. (Local *file*
stores for dev are unaffected — a local file is not the production bucket.)
### Access path is derived, not chosen
The two access paths are genuinely different — not two transports for one thing:
- **Served** (through a server): the server resolves your actor from a token and
enforces Cedar policy at the HTTP boundary. In cluster mode the **catalog and
config** (graph set, stored queries, policy bundles) are pinned to the applied
serving revision and move only on restart; **graph data** is read through the
server's engine handle against the requested branch/snapshot (it is not frozen
at boot, though a long-running server will not observe *out-of-band direct
writes* to storage until its handle refreshes). No storage credentials needed.
- **Direct** (open the Lance storage in-process): a **privileged** path — it needs
your own storage credentials, so only an admin/maintainer (or a local-dev file
store) takes it. Actor self-declared (`--as` ?? `operator.actor`), reads **live
storage HEAD**. There is **no server-side identity/auth gate** — but engine-level
Cedar policy *is* still enforced when the graph selection provides a policy
(enforcement is engine-wide; embedded `_as` writers call the same `enforce`).
"Direct" means "no HTTP boundary," not "unpoliced."
Because they differ in authority, freshness, and availability, a graph reached via
a server and that graph's raw storage are **different things you name
differently** — not one identity you flip. Making the access path a per-command
toggle (`--via`) is the `--target` mistake in new clothes; it is rejected.
> **The access path follows from the scope and the verb.** A **server** scope →
> served (data/catalog). A **cluster** scope → direct control, maintenance, and
> validation. A **store** scope → direct ad-hoc data (no catalog). The verb's
> capability picks which applies and rejects the mismatches.
State the bound plainly: the everyday data path
(`query`/`mutate`/`load`/`branch`/`export`/`commit`) against a served graph
**never needs direct storage access**, and direct access is legitimate only in
bounded places: **bootstrap** (`init`), **storage-native maintenance**
(`optimize`/`repair`/`cleanup`/`schema plan`), **graph-backed validation**
(`lint`), **catalog validation** (`queries validate`), the **control plane**
(`cluster *`), **local dev** with no server, and **break-glass** (recovery, or
checking whether a long-running server's handle lags live HEAD). Everything else
is served. This is what makes "discourage direct storage" enforceable rather
than aspirational.
This list is expected to **shrink**: Decision 11 moves
`optimize`/`cleanup` (and healthy-path `repair`) to server-managed jobs, which
would leave direct access to just standalone/local dev, the control plane, and
break-glass — and remove the last routine reason an admin needs bucket
credentials.
### Capability semantics
The CLI validates through verb capability, not plane jargon:
| Capability | Meaning | Examples |
|---|---|---|
| `any` | graph-scoped data; served via a server scope; direct only against a **store** scope (local dev / break-glass); **errors on a cluster scope** | `query`, `mutate`, `load`, `export`, branch reads, `schema show/apply` |
| `served` | requires an HTTP server; may be graph-scoped or scope-scoped | `graphs list`, `queries list` |
| `direct` | graph-scoped storage-native or graph-backed validation; no server form exists | `init`, `optimize`, `repair`, `cleanup`, `schema plan`, graph-backed `lint` |
| `control` | cluster-scoped catalog/control-plane work; addresses the cluster, not a single raw store | `cluster *`, `queries validate` |
| `local` | does not address a graph or scope | `config`, `context`, `lint --query ... --schema ...` |
`any` does **not** mean "the user picks": the resolver picks from the scope.
Internally the exhaustive `command_plane` match (`planes.rs`) stays as the drift
guard; user-facing errors speak in terms of what the command needs.
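As a rough illustration of the drift-guard idea, the capability table can be encoded as an exhaustive match. This is a sketch, not the real `command_plane` code: the `Verb` enum here is a hypothetical, abbreviated subset, and `capability_of` is an illustrative name.

```rust
/// Capability classes from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Capability {
    Any,
    Served,
    Direct,
    Control,
    Local,
}

/// Hypothetical, abbreviated verb set; the real CLI has more commands.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verb {
    Query,
    Mutate,
    Load,
    GraphsList,
    QueriesList,
    Optimize,
    Repair,
    Cleanup,
    SchemaPlan,
    ClusterApply,
    QueriesValidate,
    Config,
}

/// Exhaustive on purpose: there is no `_` arm, so adding a `Verb` variant
/// without classifying it fails to compile.
pub fn capability_of(verb: Verb) -> Capability {
    match verb {
        Verb::Query | Verb::Mutate | Verb::Load => Capability::Any,
        Verb::GraphsList | Verb::QueriesList => Capability::Served,
        Verb::Optimize | Verb::Repair | Verb::Cleanup | Verb::SchemaPlan => Capability::Direct,
        Verb::ClusterApply | Verb::QueriesValidate => Capability::Control,
        Verb::Config => Capability::Local,
    }
}
```

Leaving out the wildcard arm is the design choice that matters: a new verb cannot ship unclassified.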
### Definitions vs payloads
Queries and schema are **definitions** — contracts that live in the catalog and
are invoked **by name**; params and data are **payloads** passed per call. So the
everyday form is `omnigraph query <name> [params]`, not
`omnigraph query --file find.gq`. A `.gq` path on a routine query is a smell: the
query is not in the catalog yet. Lifecycle: **author a `.gq` → `cluster apply`
adopts it → invoke by name thereafter.**
Named queries resolve through a **server** (which serves the cluster's catalog).
`queries list` is therefore a served catalog read. `queries validate` is a
control/catalog check against the cluster-owned query definitions. A bare
`--store` has **no catalog**, so it is the ad-hoc lane (`-e` / `--file`), and
`--cluster` does not invoke stored queries. So named-query invocation is a
**served** convenience; direct access (`--store`) is always ad-hoc.
| Kind | Examples | How it enters a command |
|---|---|---|
| Definition | stored query, schema | named in the catalog; authored as a file, adopted by `cluster apply` |
| Payload | params, bulk data | passed per call (`--params`, positional args, `--data`) |
| Authoring / ad-hoc | a `.gq` you're writing | `-e '…'`, `--file new.gq`, `lint --query new.gq --schema schema.pg`, `schema apply --schema` |
### Resolution rule
1. If the verb is `local`, reject graph/scope flags and run without resolving a
scope.
2. If a primitive address is supplied (`--store`/`--server`/`--cluster`), use it
and ignore operator-config scope defaults. *(A **named** primitive, such as
`--server prod` or `--cluster brain`, still resolves through the operator-config
registry; a **literal**, such as `--server https://…` or `--store s3://…`,
bypasses it. Per Decision 2: a value containing `://` is a literal, otherwise a
config-name lookup.)*
3. Else if `--context <name>` (or `OMNIGRAPH_CONTEXT`) selects a context, use it.
4. Else use the operator config's flat defaults. Error only if neither resolves.
*(No sticky "current" pointer — each command resolves scope fresh.)*
5. Resolve the graph only for **graph-scoped** verbs. Server/cluster scopes:
exactly one graph in scope → use it; else `default_graph`; else require
`--graph <id>`. Store scopes are already one graph, so `--graph` is rejected.
**Scope-scoped** verbs (`graphs list`, `queries list`, `queries validate`,
and `cluster *`) do not select a graph unless their own resource argument says
otherwise.
6. Derive the access path from capability × scope:
   - `direct` verb → the scope's cluster/store; if the scope is a server, error
     (name storage explicitly — it is privileged).
   - `served` verb → the scope's server; if the scope is a cluster/store, error.
   - `control` verb → the scope's cluster; if the scope is a server/store, error
     (name a cluster explicitly — it is privileged).
   - `any` verb → **served** if the scope is a server; **direct** against a
     **store** scope (ad-hoc); on a **cluster** scope, error — cluster is
     maintenance-only, so use a server for data or `--store` for ad-hoc.
7. Reject mismatches with an error naming the missing axis.
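Steps 6 and 7 amount to a total function over capability × scope. The sketch below is one way to write it, assuming hypothetical type names (`ScopeKind`, `AccessPath`, `derive_access_path`) and illustrative error strings. It also assumes `control` verbs on a cluster scope take the direct path, since they open cluster storage in-process.

```rust
/// The single entity a resolved scope binds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScopeKind {
    Server,
    Cluster,
    Store,
}

/// Capabilities that reach step 6; `local` verbs already returned at step 1.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Capability {
    Any,
    Served,
    Direct,
    Control,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AccessPath {
    Served,
    Direct,
}

/// Step 6: derive the access path. Step 7: on a mismatch, name what is missing.
pub fn derive_access_path(cap: Capability, scope: ScopeKind) -> Result<AccessPath, &'static str> {
    match (cap, scope) {
        // Data and catalog verbs through a server scope.
        (Capability::Any, ScopeKind::Server) | (Capability::Served, ScopeKind::Server) => {
            Ok(AccessPath::Served)
        }
        // Maintenance on a cluster or store, ad-hoc data on a store, and
        // control-plane work on a cluster all open storage in-process.
        (Capability::Direct, ScopeKind::Cluster)
        | (Capability::Direct, ScopeKind::Store)
        | (Capability::Any, ScopeKind::Store)
        | (Capability::Control, ScopeKind::Cluster) => Ok(AccessPath::Direct),
        (Capability::Direct, ScopeKind::Server) => {
            Err("needs direct storage access; name storage with --cluster or --store")
        }
        (Capability::Served, ScopeKind::Cluster) | (Capability::Served, ScopeKind::Store) => {
            Err("needs a server; name one with --server")
        }
        (Capability::Control, ScopeKind::Server) | (Capability::Control, ScopeKind::Store) => {
            Err("needs a cluster; name one with --cluster")
        }
        (Capability::Any, ScopeKind::Cluster) => {
            Err("cluster scopes are maintenance-only; use a server for data or --store for ad-hoc")
        }
    }
}
```

All twelve combinations are spelled out with no wildcard arm, so a new capability or scope kind forces a decision about every pairing.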
Good errors:
```text
scope "prod" has 4 graphs; pass --graph <id> or set default_graph
optimize needs direct storage access; scope "prod" is a server — name storage with --cluster s3://… or --store (requires storage credentials)
graphs list enumerates a server scope; do not pass --graph
--store opens raw storage directly, bypassing any server (no HTTP auth gate, live HEAD); for recovery/inspection
```
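The first error above comes out of step 5. A minimal sketch of graph selection for server/cluster scopes follows, assuming a hypothetical `select_graph` helper and assuming an explicit `--graph` wins over both the single-graph shortcut and `default_graph`. Store scopes, which reject `--graph`, are out of scope here.

```rust
/// Step 5 for server/cluster scopes: pick the graph or explain what is missing.
/// `flag_graph` is the value of `--graph`, if passed.
pub fn select_graph(
    scope_name: &str,
    graphs_in_scope: &[&str],
    default_graph: Option<&str>,
    flag_graph: Option<&str>,
) -> Result<String, String> {
    // Assumption: an explicit flag always wins.
    if let Some(g) = flag_graph {
        return Ok(g.to_string());
    }
    // Exactly one graph in scope: use it.
    if graphs_in_scope.len() == 1 {
        return Ok(graphs_in_scope[0].to_string());
    }
    // Otherwise fall back to the scope's default_graph.
    if let Some(g) = default_graph {
        return Ok(g.to_string());
    }
    // Never auto-pick among several graphs.
    Err(format!(
        "scope \"{}\" has {} graphs; pass --graph <id> or set default_graph",
        scope_name,
        graphs_in_scope.len()
    ))
}
```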
### Config shape (operator config)
`~/.omnigraph/config.yaml` — your personal file; the cluster config
(`cluster.yaml` + catalog) is the separate, team-owned surface. The default-graph
key is `default_graph` everywhere (the per-command flag is `--graph`).
**Everyday operator — binds a server, holds no storage root:**
```yaml
defaults:
server: prod
default_graph: knowledge
output: table
servers:
prod: { url: https://graph.example.com } # token keyed by name (RFC-007); no creds here
staging: { url: https://staging.example.com }
contexts: # optional, only for multiple environments
staging: { server: staging, default_graph: knowledge }
```
A normal operator never has a storage root or bucket credentials. Their default
scope is served; `optimize`/`repair`/`cleanup` error with a pointer to name
storage explicitly.
**Maintainer — opts into a cluster root (and has bucket credentials):**
```yaml
contexts:
brain-admin: { cluster: brain, default_graph: knowledge } # direct; admin/control/maintenance
clusters:
brain: { root: s3://acme/clusters/brain } # the s3:// root lives ONLY here
```
The `clusters:` block — the only place a storage root appears in operator config —
is **admin-only and opt-in**, absent from a normal operator's file. Equivalently,
skip config and name it per command:
`omnigraph optimize --cluster s3://acme/clusters/brain --graph knowledge`. The
cluster stays the source of truth for the managed catalog; tokens live in the
keyed credential store, never in this file.
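The `servers:` and `clusters:` registries are what a named `--server` / `--cluster` value resolves against. The sketch below applies the Decision 2 rule (a value containing `://` is a literal, anything else is a name lookup) to a flattened name-to-URL map. `Address` and `resolve_primitive` are hypothetical names, and the error wording is illustrative.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum Address {
    /// The value contained "://", so it bypasses the registry.
    Literal(String),
    /// The value was a config name; `target` is the URL or root it maps to.
    Named { name: String, target: String },
}

/// Resolve a `--server` or `--cluster` value against one registry
/// (name -> URL or storage root) taken from the operator config.
pub fn resolve_primitive(
    value: &str,
    registry: &HashMap<String, String>,
) -> Result<Address, String> {
    if value.contains("://") {
        return Ok(Address::Literal(value.to_string()));
    }
    match registry.get(value) {
        Some(target) => Ok(Address::Named {
            name: value.to_string(),
            target: target.clone(),
        }),
        // Unknown names are an error, never a guess.
        None => Err(format!(
            "unknown name \"{}\"; add it to the operator config or pass a literal URI",
            value
        )),
    }
}
```

Keeping the name in the `Named` variant matters for servers, because credentials are keyed by server name rather than by URL.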
### Command shape
Assume the everyday flat defaults: server `prod`, default graph `knowledge`.
| Intent | Command | Path |
|---|---|---|
| Run a catalog query | `omnigraph query find_people` | served |
| …with params | `omnigraph query find_people --params '{"title":"Eng"}'` | served |
| Another graph in scope | `omnigraph query find_people --graph archive` | served |
| Write | `omnigraph load --data batch.jsonl --mode append` | served |
| A different environment | `omnigraph --context staging query find_people` | served |
| One-off server, no config | `omnigraph query find_people --server https://graph.example.com --graph knowledge` | served |
| Maintain (admin, explicit storage) | `omnigraph optimize --cluster s3://acme/clusters/brain --graph knowledge` | direct (privileged) |
| Maintain (admin, via admin context) | `omnigraph --context brain-admin optimize --graph knowledge` | direct (privileged) |
| List catalog queries | `omnigraph queries list` | served |
| Validate cluster query catalog | `omnigraph queries validate --cluster s3://acme/clusters/brain` | control (privileged) |
| Offline query lint | `omnigraph lint --query new.gq --schema schema.pg` | local |
| Graph-backed query lint | `omnigraph lint --query new.gq --cluster s3://acme/clusters/brain --graph knowledge` | direct (privileged) |
| Local dev, no server | `omnigraph query -e 'match { … } return { … }' --store graph.omni` | direct (local file) |
| Break-glass: raw storage of a served graph | `omnigraph query --file find.gq --store s3://acme/clusters/brain/graphs/knowledge.omni` | direct (privileged, rare) |
Note what the everyday rows are: **all served.** `optimize` does *not* appear in
the default-scope rows — from a server scope it errors and points you to name
storage (see the resolution rule), so maintenance is always a deliberate,
credentialed act. There is no "force served/direct" row — you never toggle the
path on a configured graph; the only way to reach raw storage is to *name it*
(`--cluster`/`--store`), which makes the privileged bypass unmistakable. Everyday
rows invoke a query **by name**; a `.gq` file appears only where there is no
catalog (bare store, break-glass) via `-e`/`--file`.
## Before / after
**Before** = best available today (legacy `omnigraph.yaml` `--target`, `.gq`
files, `--cluster-graph`, scheme inference). **After** = this model.
| Intent | Before | After |
|---|---|---|
| Run a query | `omnigraph query --target knowledge --query find.gq --name find_people` | `omnigraph query find_people` |
| Another graph | `omnigraph query --target archive --query find.gq --name find_people` | `omnigraph query find_people --graph archive` |
| Load | `omnigraph load --data b.jsonl --mode append --target knowledge` | `omnigraph load --data b.jsonl --mode append` |
| Maintain (admin) | `omnigraph optimize --cluster brain --cluster-graph knowledge` | `omnigraph optimize --cluster s3://acme/clusters/brain --graph knowledge` |
| Another environment | edit `omnigraph.yaml`, or re-address with full URIs | `--context staging …` or `OMNIGRAPH_CONTEXT=staging` |
| One-off remote | `omnigraph query --uri https://… --query find.gq` *(scheme→remote)* | `omnigraph query find_people --server https://… --graph knowledge` |
| Raw storage of a served graph | `omnigraph query s3://…/knowledge.omni --query find.gq` *(looks like a normal query)* | `omnigraph query --file find.gq --store s3://…/knowledge.omni` *(explicit bypass)* |
**Removed:** `--target`; `--cluster-graph` (`--graph` is the graph selector only
for graph-scoped server/cluster verbs); `--uri` http-scheme dispatch; `--via`
(never ships); everyday `--query <file>` (definitions are named);
`omnigraph.yaml` and its `cli.graph`/`server.graph` defaults.
## Server-side corollary
The same ontology applies to `omnigraph-server` boot: with `omnigraph.yaml` gone,
a server boots from a single bare graph URI **or** a cluster (`--cluster <dir|s3>`,
RFC-005), never a `graphs:` map. The store/server/cluster ontology is then
consistent across CLI and server.
## Migration & compatibility
Addressing flags and config keys are observable contract (Hyrum's Law); every removal is
staged and release-noted.
- **`config migrate`** (shipped) maps each legacy `graphs:` entry **by what it
actually is**: `http(s)` URIs → a `server:` (the recommended everyday shape);
`file` URIs → a local `store:`; an `s3://` **graph** URI → an **admin** `store:`
(it is a single graph, not a cluster); an `s3://` **cluster root** (one that
carries cluster state) → an **admin** `cluster:`. Everyday `s3://` graph usage
migrates with a **warning** — prefer serving it via a server rather than
re-establishing direct remote access. It reports dropped keys.
- **Operators move to a server-default scope.** Where a legacy setup pointed
`cli.graph` at an `s3://` graph for everyday use, migration flags it: the
recommended shape is a `server:` scope (bearer token, no bucket creds), with the
`s3://` root kept only in a maintainer's config — not every operator's.
- **`--target`** warns for one release, then errors; **`OMNIGRAPH_NO_LEGACY_CONFIG=1`**
(already the strict switch) becomes the default — loading `omnigraph.yaml` is a
hard error.
- **`--cluster-graph` → `--graph`**: `--cluster-graph` is accepted with a warning
for one release, then removed.
- **`--graph` meaning change**: today `--graph` is "graph id on a multi-graph
server" (paired with `--server`); it generalizes to "select the graph for
graph-scoped verbs in server/cluster scopes." Existing `--server --graph`
usage keeps working (it is a strict superset); release-note the broadened
meaning and the fact that store/scope-scoped verbs reject it.
- **`--uri http://…`** warns, then errors with a pointer to `--server`.
- **`--as` on served paths**: today global `--as` is accepted (a no-op on remote
writes — the server resolves the actor from the token); rejecting it on the
served path is staged — warn for one release, then error.
- **`--alias`** → the `alias` namespace (`omnigraph alias <name>`, Decision 4);
the old `--alias` flag warns for one release, then is removed.
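The "by what it actually is" mapping in the `config migrate` bullet can be sketched as a small classifier. This is not the shipped implementation: `Migrated` and `classify_legacy_uri` are hypothetical names, how a cluster root is detected is left to the caller as a boolean, and the `Unsupported` catch-all is an assumption for schemes the bullet does not mention.

```rust
/// Where a legacy `graphs:` entry lands in the operator config.
#[derive(Debug, PartialEq, Eq)]
pub enum Migrated {
    /// `http(s)` URI: the recommended everyday shape.
    Server,
    /// `file` URI: a local store.
    LocalStore,
    /// `s3://` single-graph URI: admin store, migrated with a warning.
    AdminStore,
    /// `s3://` root that carries cluster state: admin cluster.
    AdminCluster,
    /// Anything else; an assumption, not described in this RFC.
    Unsupported,
}

/// Classify one legacy URI. `carries_cluster_state` is supplied by the caller;
/// how that is detected is outside this sketch.
pub fn classify_legacy_uri(uri: &str, carries_cluster_state: bool) -> Migrated {
    if uri.starts_with("http://") || uri.starts_with("https://") {
        Migrated::Server
    } else if uri.starts_with("file://") {
        Migrated::LocalStore
    } else if uri.starts_with("s3://") {
        if carries_cluster_state {
            Migrated::AdminCluster
        } else {
            Migrated::AdminStore
        }
    } else {
        Migrated::Unsupported
    }
}
```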
## Non-goals
- **No change to the direct/served capability split.** Maintenance stays
storage-direct by design (no server routes for `optimize`/`repair`/`cleanup`);
this RFC only makes the split explicit.
- **No new transport.** Addressing surface, not protocol.
- **No positional sigil grammar** (`@server/graph`, `%cluster/graph`). Considered
and rejected: explicit flags are more discoverable; contexts already give
brevity. Revisit only on demonstrated expert-terseness demand.
## Decisions
The questions this RFC opened are resolved as follows. Two are explicitly
deferred (see below); they do not block the model.
1. **Local-dev path → embedded `--store` scope.** Local dev runs the engine
in-process against a `--store <file>` (or a store-scoped context); `omnigraph
serve` stays available but is not required. Consistent with embedded ≡ remote
(RFC-009).
2. **Primitives are one flag, typed by content.** `--server` and `--cluster`
accept either a config name or a literal URI: a value containing `://` is a
literal (bypasses the registry); otherwise it is a config-name lookup (error if
unknown). `--store` is always a URI. (Replaces the earlier "literal-vs-named"
question — no `--server-url`/`--cluster-root` split.)
3. **Stored invocation: `query <name>` (read) / `mutate <name>` (write), one
catalog namespace.** A name maps to one definition; the verb asserts its kind
and the CLI errors on mismatch (`'apply_labels' is a mutation — use
omnigraph mutate apply_labels`). No `invoke` verb.
4. **Aliases live under an `alias` namespace:** `omnigraph alias <name> [args]`,
never bare top-level. An alias can therefore neither shadow nor be shadowed by a
built-in (current or future) verb.
6. **Context merge: scope wholesale, prefs layered.** The entity binding +
`default_graph` come *wholesale* from the active scope (a context, or flat
defaults if none) — never per-key merged across the entity dimension (that would
yield "server *and* cluster"). Only non-scope preferences (`output`, table
layout) take flat defaults as a base. Precedence: explicit flag > context > flat
defaults.
7. **No default graph → error + list candidates.** A graph-scoped verb with no
`--graph`, no `default_graph`, and >1 graph in scope errors and lists candidates
(served: `GET /graphs`; cluster-direct: catalog enumeration). If enumeration is
policy-gated/unavailable, it says so and asks for `--graph`. Never auto-pick.
9. **Diagnostics & safety.** Writes echo the resolved scope + access path to stderr
(suppress with `--quiet`). Destructive verbs (`cleanup`, overwrite `load`,
`branch delete`) require confirmation when the scope is not local; `--yes` skips
it; **no TTY without `--yes` errors** (never silently proceed). `--json`/CI never
prompt — destructive without `--yes` errors.
10. **Cluster graphs evolve only via `cluster apply`.** `schema apply` (an `any`
verb) targets standalone graphs; against a cluster-managed graph it errors and
points at `cluster apply` (which records ledger/recovery/approvals — RFC-004).
Mirrors `init`'s refusal of a cluster-managed path.
11. **Maintenance moves server-side (committed direction).** `optimize`/`cleanup`
(and healthy-path `repair`) become server/cluster-managed async jobs —
policy-gated, audited, single-coordinator — with `direct` retained only as
break-glass (`repair` when the server is down). Jobs run out-of-band, never inline
in serving: a worker plus async job routes, in the `POST …` / `GET …/{id}` shape of
the bulk-data-plane RFC (`docs/rfcs/0001-bulk-data-plane.md`, PR #219, not yet
merged). `schema plan` is excluded (≈ `cluster plan` in cluster mode). The
**mechanism** (job routes,
worker, scheduling) is a follow-up RFC; until it lands the capability table above
stands, and maintenance is `direct`. When it lands, the maintenance verbs'
capability becomes "served-job + direct break-glass."
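Decision 6's merge rule is easy to get subtly wrong, so here is a sketch under stated assumptions. `Binding`, `ScopeDefaults`, `Profile`, and `resolve` are hypothetical names, `output` stands in for all non-scope preferences, and an explicit primitive flag (which overrides the scope entirely) is assumed to be handled before this point.

```rust
/// A scope binds exactly one entity, so "server and cluster" is unrepresentable.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Binding {
    Server(String),
    Cluster(String),
    Store(String),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ScopeDefaults {
    pub binding: Binding,
    pub default_graph: Option<String>,
}

/// Either the flat defaults or one named context.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Profile {
    pub scope: ScopeDefaults,
    pub output: Option<String>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Resolved {
    pub scope: ScopeDefaults,
    pub output: Option<String>,
}

pub fn resolve(flat: &Profile, context: Option<&Profile>, flag_output: Option<&str>) -> Resolved {
    // Scope is taken wholesale: a context never inherits the flat
    // default_graph, because that graph belongs to a different entity.
    let scope = match context {
        Some(c) => c.scope.clone(),
        None => flat.scope.clone(),
    };
    // Preferences are layered: explicit flag > context > flat defaults.
    let output = flag_output
        .map(|s| s.to_string())
        .or_else(|| context.and_then(|c| c.output.clone()))
        .or_else(|| flat.output.clone());
    Resolved { scope, output }
}
```

With a flat default of server `prod` / graph `knowledge` and a context binding only cluster `brain`, the result has no default graph, which is the point of "never per-key merged".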
## Deferred
Non-blocking; settle when convenient.
- **D5 — combined admin scope.** A scope binds one entity; admins read via a
server scope and maintain via `--cluster`. A `deployments: { … }` object
(server + cluster validated coherent, referenced by a context) is revisited only
if admin ergonomics demand it — and Decision 11 largely removes the need.
- **D8 — the `context` command surface.** `context list` / `context show`
(read-only inspection) are additive diagnostics, shippable anytime; they don't
touch the grammar or resolution. The *no sticky `context use`* constraint holds
regardless — it is a design principle, not a command.
## Safety
Dropping the sticky `current_context` pointer removes the main footgun — a
destructive command silently inheriting a "current" environment from an earlier
session. Because each command resolves scope fresh, what is on the command line is
what runs. Two guards remain (a flat default or `OMNIGRAPH_CONTEXT` can still point
at prod): echo the resolved scope + access path on writes, and require
confirmation (or `--yes`) for destructive verbs when the resolved scope is not
local (Decision 9). The most dangerous direct writes (`cleanup`, overwrite
`load`) are *structurally* rare now — unavailable from the everyday server scope,
and gated behind bucket credentials plus an explicit `--cluster`/`--store` — so a
normal operator's setup mostly cannot issue them by accident.
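
The confirmation guard from Decision 9 reduces to a small decision function. The sketch below is illustrative only: `Gate`, `confirmation_gate`, and `is_interactive` are hypothetical names, and treating "interactive" as "stdin is a terminal and `--json` is off" is an assumption about how the TTY and CI rules would be detected.

```rust
use std::io::IsTerminal;

/// Outcome of the confirmation gate for one command invocation.
#[derive(Debug, PartialEq)]
enum Gate {
    /// Run without asking.
    Proceed,
    /// Ask the operator to confirm on the terminal.
    Prompt,
    /// Fail: confirmation is required but nobody can be asked.
    Refuse,
}

/// Decide whether a command may run, must prompt, or must error.
/// `interactive` must be false for `--json`/CI runs and when there is no TTY.
fn confirmation_gate(
    destructive: bool,
    scope_is_local: bool,
    yes: bool,
    interactive: bool,
) -> Gate {
    // Non-destructive verbs, local scopes, and an explicit --yes all skip the prompt.
    if !destructive || scope_is_local || yes {
        return Gate::Proceed;
    }
    // Destructive + non-local + no --yes: prompt if possible, otherwise error.
    if interactive {
        Gate::Prompt
    } else {
        Gate::Refuse
    }
}

/// Assumption: a run is interactive when stdin is a terminal and --json is off.
fn is_interactive(json: bool) -> bool {
    !json && std::io::stdin().is_terminal()
}
```

The key property is that there is no path from "destructive, non-local, no `--yes`" to `Proceed`; the only non-interactive outcome is an error.
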
## Invariants & deny-list check
- **§10 query semantics first-class / §11 transport at the boundary:** preserved —
addressing resolves CLI-side to a `GraphClient`; no transport concepts leak into
engine crates.
- **§12 no client-set actor:** strengthened — the served path's actor stays
token-resolved and `--as` is rejected there; direct self-declares.
- **Least privilege (security posture):** everyday operators hold a revocable
bearer token, not bucket credentials; only the server process and maintenance
admins hold storage creds. Direct remote access is structural opt-in, not a
default — narrowing the blast radius of a leaked operator config.
- **§6 strong consistency:** both paths are snapshot-isolated per query; this RFC
changes addressing, not isolation.
- **Deny-list (no state that drifts):** contexts and aliases are static config
sugar that resolve to canonical scopes; they declare nothing the cluster or
server doesn't already own. No sticky session state is introduced.
- No Hard Invariant is weakened; the change is CLI surface + config removal.
## Relationship to prior work
The completion of the config/CLI lineage: RFC-007 added the operator config and
keyed credentials; RFC-008 demoted `omnigraph.yaml`; RFC-009 unified execution
behind `GraphClient`; RFC-010 declared the planes. This RFC removes the last
legacy addressing surface so the plane model becomes a clean function of the three
real entities, and folds the planes into a single capability rule. It is adjacent
to the public-track bulk-data-plane RFC (`docs/rfcs/0001-bulk-data-plane.md`,
PR #219, not yet merged), which canonicalizes `load`/`export` verbs; this RFC
canonicalizes how every verb *addresses* a graph.
## Appendix: target CLI taxonomy (end state)
The full command set under this model, organized by **capability** (the new
classifying axis) instead of plane — the end-state counterpart to the
current-taxonomy appendix below. Every command, with its end-state addressing.
```
omnigraph
├─ any — data verbs · served by default (server scope, or --server <url|name>);
│        --graph selects the graph in scope; --store forces ad-hoc direct (no catalog)
│  ├─ query  (alias: read*)                        invoke a stored query by NAME; -e/--file for ad-hoc
│  ├─ mutate (alias: change*)                      invoke a stored mutation by name; -e/--file for ad-hoc
│  ├─ load                                         bulk write — --data, --mode required; --from forks a missing branch
│  ├─ export                                       dump graph data (NDJSON / Arrow)
│  ├─ snapshot                                     current per-table versions
│  ├─ branch { create | list | delete | merge }    merge takes --into <target>
│  ├─ commit { list | show }                       inspect the commit graph
│  └─ schema { show (alias: get) | apply }         cluster graphs evolve via cluster apply (Decision 10)
├─ served — needs a server (errors on a store/cluster scope)
│  ├─ graphs list                                  enumerate the graphs a server serves
│  └─ queries list                                 list stored queries in the served catalog
├─ direct — storage-native, PRIVILEGED · --cluster <root> | --store <uri> + bucket creds; never a server
│  ├─ init                                         bootstrap a graph (--store <uri>); refuses a cluster-managed path
│  ├─ optimize                                     compaction; --graph selects
│  ├─ repair                                       publish uncovered drift; --confirm / --force
│  ├─ cleanup                                      version GC; --keep / --older-than / --confirm
│  ├─ schema plan                                  migration preview (reads storage directly)
│  └─ lint --query <path>                          graph-backed query lint (with --graph on cluster scope)
├─ control — cluster/catalog control, PRIVILEGED · --cluster <dir|s3>
│  ├─ cluster { validate | plan | apply | approve | status | refresh | import | force-unlock }
│        apply/approve take --as <actor>; force-unlock takes <LOCK_ID>
│  └─ queries validate                             validate cluster-owned stored queries against graph schemas
└─ local — no graph
   ├─ policy { validate | test | explain }         offline Cedar tooling
   ├─ context { list | show }                      read-only; NO mutating `use` (no sticky state)
   ├─ alias <name> [args]                          personal shortcut; expands to its bound stored-query call (D4)
   ├─ config { migrate }                           finish the omnigraph.yaml split (RFC-008)
   ├─ login / logout                               per-server bearer credentials
   ├─ embed                                        offline embedding pipeline
   ├─ lint --query <path> --schema <path>          file-only query lint
   └─ version (-v)
```
`*` `read`/`change` remain as deprecated aliases (warn on use); `ingest` and the
`check`→`lint` argv-shim are **removed**. `get` aliases `schema show`.
### Addressing forms (end state)
Three scope forms — one per real entity — plus the graph selector. No `--target`,
no `--cluster-graph`, no `--uri` scheme-dispatch, no `--via`.
| Form | Resolves to | Access | Privilege |
|---|---|---|---|
| **server scope** — operator default, a `--context`, or `--server <url\|name>` | a served endpoint + keyed token | served | everyday (bearer token) |
| **cluster scope** — an admin context, or `--cluster <root>` | a managed cluster's storage + catalog | direct | privileged (bucket creds) |
| **store scope** — `--store <uri>` | one graph's storage (no catalog) | direct | local-dev (file) / break-glass (s3) |
| **`--graph <id>`** | selects the graph for graph-scoped verbs in server/cluster scopes; invalid for store scopes and scope-scoped verbs | — | — |
Resolution: explicit primitive (`--server`/`--cluster`/`--store`) → `--context` /
`OMNIGRAPH_CONTEXT` → operator flat defaults. Access path is then derived from the
scope kind × the verb's capability (see the Resolution rule); it is never inferred
from a URI scheme and never toggled.
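
The two steps, scope resolution and access-path derivation, can be sketched as follows. This is a reading of the taxonomy annotations and the table above, not the authoritative Resolution rule: the type and function names (`Scope`, `Capability`, `Access`, `resolve_scope`, `derive_access`) are hypothetical, and the error strings are placeholders.

```rust
/// The three scope forms, one per real entity (payloads are illustrative).
#[derive(Debug, Clone, PartialEq)]
enum Scope {
    Server(String),
    Cluster(String),
    Store(String),
}

/// The verb's declared capability.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Capability {
    Any,
    Served,
    Direct,
    Control,
    Local,
}

/// The derived access path.
#[derive(Debug, PartialEq)]
enum Access {
    Served,
    Direct,
    NoGraph,
}

/// Precedence: explicit primitive, then context, then operator flat defaults.
fn resolve_scope(
    explicit: Option<Scope>,
    context: Option<Scope>,
    flat_default: Option<Scope>,
) -> Option<Scope> {
    explicit.or(context).or(flat_default)
}

/// Derive access from scope kind x capability; never from a URI scheme.
fn derive_access(scope: Option<&Scope>, cap: Capability) -> Result<Access, String> {
    match (cap, scope) {
        // Local verbs touch no graph, so scope is irrelevant.
        (Capability::Local, _) => Ok(Access::NoGraph),
        (_, None) => Err("no scope resolved".to_string()),
        (Capability::Any, Some(Scope::Server(_))) => Ok(Access::Served),
        (Capability::Any, Some(Scope::Store(_))) => Ok(Access::Direct),
        (Capability::Any, Some(Scope::Cluster(_))) => {
            Err("data verbs reject cluster scope".to_string())
        }
        (Capability::Served, Some(Scope::Server(_))) => Ok(Access::Served),
        (Capability::Served, Some(_)) => Err("this verb needs a server".to_string()),
        (Capability::Direct, Some(Scope::Server(_))) => {
            Err("direct verbs never run against a server".to_string())
        }
        (Capability::Direct, Some(_)) => Ok(Access::Direct),
        (Capability::Control, Some(Scope::Cluster(_))) => Ok(Access::Direct),
        (Capability::Control, Some(_)) => Err("control verbs need --cluster".to_string()),
    }
}
```

Because the match is exhaustive over capability and scope kind, adding a new capability or scope form forces every combination to be decided at compile time.
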
### What moved vs today
| Command(s) | Today (plane) | End state (capability) |
|---|---|---|
| `query`/`mutate`/`load`/`export`/`snapshot`/`branch`/`commit`/`schema show`/`schema apply` | Data | **`any`** (served-default; `--store` ad-hoc) |
| `graphs list` | Data (remote-only) | **`served`** |
| `queries list` | Session | **`served`** (catalog read) |
| `init`/`optimize`/`repair`/`cleanup`/`schema plan`/graph-backed `lint` | Storage | **`direct`** (privileged) |
| `queries validate` | Storage | **`control`** (catalog validation) |
| `cluster *` | Control | **`control`** (unchanged) |
| `policy *`/`embed`/`login`/`logout`/`config`/`version`/offline `lint --query --schema` | Session | **`local`** |
| `ingest`; `--target`; `--cluster-graph`; `--uri http` dispatch | present | **removed** |
| — | — | **added:** `context { list \| show }` (read-only) |
Cross-capability families: `schema` (`plan` is `direct`, `show`/`apply` are
`any`), `queries` (`list` is `served`, `validate` is `control`), and `lint`
(offline with `--schema` is `local`, graph-backed is `direct`) split per
subcommand/mode, exactly where their authority and data dependencies differ.
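
Classifying at subcommand granularity could look like the sketch below, in the spirit of today's exhaustive `command_plane` match. The `Command` enum here is hypothetical and covers only the three split families; the real command type and its variants are not shown in this RFC.

```rust
/// The classifying axis in the end state.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Capability {
    Any,
    Served,
    Direct,
    Control,
    Local,
}

/// Hypothetical command enum, limited to the cross-capability families.
#[derive(Debug, Clone, Copy)]
enum Command {
    SchemaShow,
    SchemaApply,
    SchemaPlan,
    QueriesList,
    QueriesValidate,
    /// `lint --query <path> --schema <path>`: file-only.
    LintOffline,
    /// `lint --query <path>` against a graph.
    LintGraphBacked,
}

/// One capability per subcommand/mode; the match has no wildcard arm,
/// so a new variant will not compile until it is classified.
fn command_capability(cmd: Command) -> Capability {
    match cmd {
        Command::SchemaShow | Command::SchemaApply => Capability::Any,
        Command::SchemaPlan | Command::LintGraphBacked => Capability::Direct,
        Command::QueriesList => Capability::Served,
        Command::QueriesValidate => Capability::Control,
        Command::LintOffline => Capability::Local,
    }
}
```
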
## Appendix: current CLI taxonomy (today)
The **as-is** command surface this RFC transforms, kept so the RFC is
self-contained. The source of truth is the exhaustive `command_plane` match in
`crates/omnigraph-cli/src/planes.rs`.
Where it disagrees with the design above (four planes, `--target`,
`--cluster-graph`, scheme-inferred transport), the design is the *target* and this
is *today*.
### The four planes (today)
| Plane | What it touches | Addressing accepted |
|---|---|---|
| **Data** | a graph — embedded **or** via a server | `<URI>` · `--target` · `--server` (+`--graph`) |
| **Storage** | direct storage, no server | `<URI>` · `--target` (local/S3 only) · some also `--cluster`+`--cluster-graph` |
| **Control** | a cluster *directory* | `--config <dir>` |
| **Session** | no graph | — |
`--server`/`--graph` are gated strictly to the data plane; `guard_addressing`
(`planes.rs:128`) rejects them elsewhere (RFC-010 Slice 1).
### Command tree by plane (today)
```
omnigraph
├─ DATA ────────── run against a graph; embedded or --server
│  ├─ query (alias: read) · mutate (alias: change) · load · ingest (hidden, deprecated)
│  ├─ branch { create | list | delete | merge } · snapshot · export · commit { list | show }
│  ├─ graphs { list }                        (remote-only)
│  └─ schema { show (alias: get) | apply }   ← show/apply are DATA
├─ STORAGE ─────── direct file://|s3:// access; --server rejected
│  ├─ init · optimize · repair · cleanup     (optimize/repair/cleanup also: --cluster --cluster-graph)
│  ├─ lint (check shim) · schema plan        ← plan is STORAGE
│  └─ queries validate
├─ CONTROL ─────── cluster directory via --config <dir>
│  └─ cluster { validate | plan | apply | approve | status | refresh | import | force-unlock }
└─ SESSION ─────── no graph
   ├─ policy { validate | test | explain } · embed · login / logout
   ├─ config { migrate } · queries list      ← list is SESSION
   └─ version (-v)
```
`read`/`change` are visible clap aliases (deprecated names, warn); `check` is an
argv-shim → `lint`; `get` aliases `schema show`; `ingest` is hidden but runs.
### Cross-plane families (today)
- **`schema`**: `schema plan` is Storage; `schema show`/`apply` are Data.
- **`queries`**: `queries validate` is Storage; `queries list` is Session.
### Addressing forms (today)
| Form | Looks up in | Resolves to | Source |
|---|---|---|---|
| `<URI>` / `--uri` | nothing (explicit) | the literal URI | — |
| `--target <name>` | `omnigraph.yaml` `graphs:` | that graph's `uri` (local / S3 / **http**) | `config.rs::resolve_target_uri` |
| `--server <name>` (+`--graph`) | `~/.omnigraph/config.yaml` `servers:` | a remote server URL | `helpers.rs::resolve_server_flag` |
| `--cluster <dir\|s3> --cluster-graph <id>` | served cluster state | the graph's storage URI | `helpers.rs` (RFC-010 Slice 3) |
Precedence (`resolve_target_uri`): explicit `<URI>`/`--uri` → `--target` →
`cli.graph` default → error. `is_remote_uri` (`helpers.rs:15`) then selects
`GraphClient::Remote` vs `Embedded` (`client.rs:86`).
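
For contrast with the end-state model, today's scheme-inferred flow can be approximated as below. This is not the code in `config.rs` or `helpers.rs`: the `_sketch` functions and their signatures are hypothetical, the `graphs` map stands in for the `graphs:` section of `omnigraph.yaml`, and treating the `cli.graph` default as a name looked up in that same map is an assumption.

```rust
use std::collections::HashMap;

/// Approximate today's precedence: explicit URI, then --target,
/// then the `cli.graph` default, then error.
fn resolve_target_uri_sketch(
    explicit_uri: Option<&str>,
    target: Option<&str>,
    default_graph: Option<&str>,
    graphs: &HashMap<String, String>,
) -> Result<String, String> {
    // An explicit URI is used literally; nothing is looked up.
    if let Some(uri) = explicit_uri {
        return Ok(uri.to_string());
    }
    // Otherwise a name is needed: --target first, then the configured default.
    let name = target
        .or(default_graph)
        .ok_or_else(|| "no graph addressed: pass a URI or --target".to_string())?;
    graphs
        .get(name)
        .cloned()
        .ok_or_else(|| format!("unknown graph '{}' in omnigraph.yaml", name))
}

/// Transport is inferred from the scheme: http(s) means remote, else embedded.
fn is_remote_uri_sketch(uri: &str) -> bool {
    uri.starts_with("http://") || uri.starts_with("https://")
}
```

The second function is the behaviour this RFC removes: the same `--target` name can silently switch between embedded and remote execution depending on what its `uri` happens to contain.
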
### Enforcement points (today)
- **`guard_addressing`** (`planes.rs:128`): `--server`/`--graph` on a non-data verb
fails with a declared message.
- **Storage-plane remote rejection** (`helpers.rs:467`): a storage verb whose
`--target` resolves to `http(s)://` is rejected.
- **`init` into a cluster layout** is refused (use `cluster apply`).
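
The first two enforcement points amount to two small guards. The sketch below is an approximation under assumed names and signatures (`Plane`, `guard_addressing_sketch`, `guard_storage_uri_sketch`); the actual messages and function shapes in `planes.rs` and `helpers.rs` are not reproduced here.

```rust
/// Today's four planes.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Plane {
    Data,
    Storage,
    Control,
    Session,
}

/// `--server`/`--graph` are accepted on data-plane verbs only.
fn guard_addressing_sketch(
    plane: Plane,
    server: Option<&str>,
    graph: Option<&str>,
) -> Result<(), String> {
    if plane != Plane::Data && (server.is_some() || graph.is_some()) {
        return Err(format!(
            "--server/--graph are data-plane flags; not valid on a {:?} verb",
            plane
        ));
    }
    Ok(())
}

/// A storage verb must not resolve to an http(s) endpoint.
fn guard_storage_uri_sketch(plane: Plane, resolved_uri: &str) -> Result<(), String> {
    let remote = resolved_uri.starts_with("http://") || resolved_uri.starts_with("https://");
    if plane == Plane::Storage && remote {
        return Err(format!(
            "storage verbs need direct file:// or s3:// access, got {}",
            resolved_uri
        ));
    }
    Ok(())
}
```
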
## Audit comments
Reviewed against the current CLI taxonomy, `planes.rs`, `cli.rs`, `helpers.rs`,
`client.rs`, RFC-007/RFC-010, and the user-facing CLI/server docs.
### Validated
- The target taxonomy now has a stable classifier: `any`, `served`, `direct`,
`control`, and `local` are all declared capabilities.
- Cluster scope is coherent: it is privileged direct storage for control,
maintenance, and validation, not a direct data path. `any` data verbs are served by
default and reject cluster scope.
- Graph selection is no longer universal. Graph-scoped verbs select a graph;
scope-scoped verbs such as `graphs list`, `queries list`, `queries validate`,
and `cluster *` address the whole server/cluster scope.
- The current-state appendix still matches the implemented CLI: four planes,
`--target`, `--cluster-graph`, scheme-inferred transport, `schema plan` as
Storage, and `schema show/apply` as Data.
Decisions and deferrals are tracked in [Decisions](#decisions) above — not
duplicated here.