RFC: Restructure the CLI Around Explicit Planes
Status: Proposed
Date: 2026-06-13
Audience: CLI/server/cluster maintainers
Builds on: rfc-009-unify-access-paths.md
(Phases 3a–3c landed — the embedded/remote data-plane fork is now one
GraphClient enum; this RFC expands RFC-009 Phase 4 from a narrow
embedded-vs-remote capability table into the full plane model, and leaves
Phase 5 route alignment where it is),
rfc-007-operator-config.md (operator
--server/--graph/--target addressing — the surfaces this RFC makes
uniform across planes),
rfc-008-deprecate-omnigraph-yaml.md.
Sequencing: post-v0.7.0, after RFC-009 Phase 3c (done).
Summary
The CLI silently spans three planes — data, storage/maintenance, and
control — and forces the operator to know which plane each verb lives on and
address a graph differently per plane. The same graph you query as
--server prod --graph knowledge you must maintain as
s3://bucket/knowledge.omni. Plane restrictions (graphs list is server-only,
optimize is storage-only) are accidental — discovered by hitting a cryptic
error, not declared.
This RFC makes the plane model explicit and coherent with three moves:
- One graph-addressing model across every verb (--target / --graph / positional URI / --server), resolving to a storage URI for maintenance and a remote client for data, instead of two different ways to name one graph.
- A declared, per-subcommand capability surface (RFC-009 Phase 4): each verb declares its plane(s); wrong-plane invocations get an honest "this is storage-plane, --server doesn't apply" error from one table, not scattered bail!s.
- Plane-grouped --help so the model is legible at a glance.
No new server feature. Storage maintenance stays off the wire — deliberately.
Current state of affairs
The CLI has 23 top-level commands. They divide into three planes, addressed three different ways:
| Plane | Verbs | Reaches the graph by | Addressing surface |
|---|---|---|---|
| Data | query, mutate, load, ingest, branch *, snapshot, export, commit *, schema show/apply (and graphs list, remote-only today; see note) | embedded engine or HTTP server (one GraphClient) | positional URI or --target / --graph / --server (config aliases) |
| Storage / maintenance | init, optimize, repair, cleanup, schema plan, queries validate | embedded engine only, directly on storage (file:// or s3://) | positional URI or --target; no --server / --graph (except init, which today takes only a required positional URI, with no --target) |
| Control | cluster validate/plan/apply/approve/status/refresh/import/force-unlock | a cluster directory (file:// or s3://), not a graph URI | --config <dir> |
What's confusing (validated facts)
1. Two names for one graph. Data verbs resolve --server prod --graph knowledge through GraphClient::resolve* (the embedded/remote fork collapsed in RFC-009 Phases 3a–3c; only the two GraphClient factories call apply_server_flag). Maintenance verbs instead use resolve_uri/resolve_local_uri and accept only a positional URI or --target, so to compact the graph you query as --server prod --graph knowledge you must type s3://bucket/knowledge.omni. One graph, two addressing vocabularies.

   Note (graphs list). It is routed through GraphClient only to share the addressing/token resolver; its embedded arm fails loudly, so it is remote-only today (the later capability table and Relationship to RFC-009 record it as remote-now / embedded-cluster-later).

2. Plane restrictions are accidental, not declared. graphs list is server-only and optimize/repair/cleanup/init are storage-only purely by code shape. Point optimize at an https:// URL and you get whatever Omnigraph::open says about an https URI: accidental error text that, per Hyrum's Law, is already someone's dependency. The capability is real but unstated.

3. The split is per-subcommand, and the family names hide it. schema plan is storage-only (resolve_local_uri) while schema show / schema apply are data-plane (the graph client). queries validate opens the graph to typecheck while queries list only reads the registry config. The plane is a property of the subcommand, not the family.

4. Maintenance has no server/cluster counterpart at all. There is no HTTP route and no cluster subcommand for optimize/cleanup/repair (verified: nothing in the server route table, nothing in omnigraph-cluster/src). For a server-backed deployment you run the same CLI against the storage URI, out-of-band from the serving process. This is correct (maintenance is heavyweight, destructive, and single-operator; it should not be a multi-tenant HTTP surface), but it is undocumented in the CLI's own shape, so it reads as an omission rather than a decision.

5. init has a hidden control-plane twin. Bare init creates a single graph from storage; in cluster mode the equivalent is cluster apply (graph-creation stage, with ledger/recovery/approval semantics). Same intent, two entry points, no signpost between them.

6. Flat --help. All 23 commands list as one undifferentiated block, so the plane a verb belongs to is tribal knowledge.
The net effect: a new operator must already know OmniGraph's plane architecture to predict which flags work on which verb and how to name a graph. The CLI does not teach its own model.
Target CLI ergonomics
The throughline: you name a graph one way, and the CLI tells you what works where. Simple examples of the end state:
One name for a graph, everywhere
A config target knowledge works on every verb that touches that graph:
omnigraph query --target knowledge --query q.gq # data (embedded or remote, auto)
omnigraph load --target knowledge --data rows.jsonl # data
omnigraph optimize --target knowledge # maintenance (resolves to its storage URI)
omnigraph cleanup --target knowledge --keep 10 --confirm
omnigraph repair --target knowledge --confirm
The positional URI form still works everywhere, unchanged:
omnigraph optimize s3://bucket/knowledge.omni
Data plane: same command, embedded or remote
You don't pick "local vs server" syntax — resolution decides:
omnigraph query ./local.omni --query q.gq # opens engine directly
omnigraph query --server prod --graph knowledge --query q.gq # over HTTP
omnigraph query --target knowledge --query q.gq # whichever the config says
Maintenance: --target must resolve to direct storage (loud if not)
$ omnigraph optimize --target prod
error: `--target prod` resolves to a remote server (https://prod…).
`optimize` is a storage-plane command and needs direct storage access.
Pass the graph's s3://… URI, or use --cluster <dir> --graph <id>.
Cluster-managed graphs get an explicit, intentional path (no implicit
cluster.yaml peeking):
omnigraph optimize --cluster ./cluster --graph knowledge
Wrong-plane = one honest, stable error
$ omnigraph optimize --server prod
error: `optimize` is a storage-plane command; `--server` addresses the data
plane and does not apply here. Use --target <name> or a storage URI.
$ omnigraph graphs list ./local.omni
error: `graphs list` needs a remote multi-graph server (http/https) today.
(Embedded cluster-catalog enumeration is planned — RFC-009.)
--help teaches the model
DATA PLANE run against a graph (embedded or --server)
query mutate load branch snapshot export commit schema show schema apply
STORAGE / MAINTENANCE direct storage access; no server
init optimize repair cleanup schema plan queries validate
CONTROL PLANE manage a cluster directory
cluster
INSPECT / SESSION
graphs list queries list lint policy embed login logout config
Exceptions, signposted (not silent)
omnigraph init --schema s.pg ./new.omni # plain path: fine
$ omnigraph init --target knowledge --schema s.pg # cluster-managed target: redirected
error: `knowledge` is a cluster-managed graph. Create it via `cluster apply`
(which records ledger + recovery + approvals), not `init`.
In one line: one way to name a graph, the right flags accepted per verb, and a CLI that tells you its planes instead of making you memorize them.
Proposed shape (mechanism)
One addressing model for every graph-addressing verb
Route all graph-addressing verbs — data and maintenance — through one
resolver that turns (positional URI | --target | --graph | --server) into
either a storage URI (file:// / s3://) → embedded execution, or a remote
GraphClient → HTTP execution, per the verb's declared plane.
Authority rule (the precedence must not be silent). --target is an
operator/legacy target lookup; cluster.yaml is a different authority surface
(read only by cluster commands and --cluster boot). A maintenance verb must
not quietly consult both and invent a precedence. The rule:
- A maintenance verb's --target resolves through the operator/legacy config and its URI must already be direct storage; a target that resolves to a remote (http(s)://) URL fails loudly (see the example above).

- Cluster-managed graphs are addressed explicitly via a cluster-root + graph-id pair (spelled --cluster <dir> --graph <id> for illustration), so reading cluster state is an intentional mode, never an implicit fallback between operator config and cluster.yaml.

  Flag-shape caveat (deferred). --graph is already a global flag that requires = "server" and appends /graphs/<id> to a remote URL, which is a different meaning, and clap won't permit --graph without --server. So the cluster-maintenance addressing needs either a distinct flag (e.g. --cluster-graph <id>) or an explicit global-flag migration. This is why the cluster-managed resolver is deferred to a later slice (it also rides the applied-state-vs-declared-config open question below); the operator/legacy --target path lands first.
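The resolver and the operator/legacy half of the authority rule can be sketched as below. This is a minimal sketch under stated assumptions, not the CLI's actual code: Address, Plane, Resolved, OperatorConfig, and resolve are hypothetical names, the config is modeled as two in-memory maps, and the cluster-managed path is left out because its flag shape is deferred. Only the storage-versus-http(s) split, the /graphs/<id> suffix, and the fail-loudly behavior are taken from the text above.

```rust
use std::collections::HashMap;

/// How the operator named the graph (hypothetical shape, for illustration).
#[derive(Debug, Clone, PartialEq)]
enum Address {
    /// Positional URI, e.g. `s3://bucket/knowledge.omni`.
    Uri(String),
    /// `--target <name>`, looked up in operator/legacy config.
    Target(String),
    /// `--server <name> [--graph <id>]`.
    Server { server: String, graph: Option<String> },
}

/// The plane a verb declares.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Plane {
    Data,
    Storage,
}

/// What the verb ends up executing against.
#[derive(Debug, Clone, PartialEq)]
enum Resolved {
    /// file://, s3://, or a plain path: embedded execution.
    Storage(String),
    /// http(s)://: execution over HTTP.
    Remote(String),
}

/// Assumed stand-in for operator config: name -> URI maps.
#[derive(Debug, Default)]
struct OperatorConfig {
    targets: HashMap<String, String>,
    servers: HashMap<String, String>,
}

fn is_remote(uri: &str) -> bool {
    uri.starts_with("http://") || uri.starts_with("https://")
}

fn resolve(
    verb: &str,
    plane: Plane,
    addr: &Address,
    cfg: &OperatorConfig,
) -> Result<Resolved, String> {
    // Step 1: turn whatever the operator typed into a single URI.
    let uri = match addr {
        Address::Uri(uri) => uri.clone(),
        Address::Target(name) => cfg
            .targets
            .get(name)
            .cloned()
            .ok_or_else(|| format!("unknown target `{name}`"))?,
        Address::Server { server, graph } => {
            if plane == Plane::Storage {
                return Err(format!(
                    "`{verb}` is a storage-plane command; `--server` does not apply here"
                ));
            }
            let base = cfg
                .servers
                .get(server)
                .cloned()
                .ok_or_else(|| format!("unknown server `{server}`"))?;
            match graph {
                Some(id) => format!("{}/graphs/{}", base.trim_end_matches('/'), id),
                None => base,
            }
        }
    };

    // Step 2: pick the execution mode, honoring the verb's declared plane.
    if !is_remote(&uri) {
        return Ok(Resolved::Storage(uri));
    }
    match plane {
        Plane::Data => Ok(Resolved::Remote(uri)),
        // Authority rule: a maintenance target must already be direct storage.
        Plane::Storage => Err(format!(
            "`{verb}` is a storage-plane command and needs direct storage access, \
             but the address resolves to a remote server ({uri})"
        )),
    }
}
```

With a target knowledge mapped to s3://bucket/knowledge.omni, resolve("optimize", Plane::Storage, ...) yields the storage URI, while a target mapped to an https:// URL returns the loud error rather than falling back to any other config source.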
A declared, per-subcommand capability surface (RFC-009 Phase 4, expanded)
One table, per subcommand (family-level rows hide exactly the cases the table exists to make non-accidental):
| Command | Data (embedded) | Data (remote) | Storage (direct) | Config / session | Notes |
|---|---|---|---|---|---|
| query, mutate, load, ingest | ✅ | ✅ | - | - | ingest is the deprecated alias of load |
| branch create/list/delete/merge | ✅ | ✅ | - | - | |
| snapshot, export, commit list/show | ✅ | ✅ | - | - | |
| schema show | ✅ | ✅ | - | - | |
| schema apply | ✅ | ✅ | - | - | declarative alternative: cluster apply |
| schema plan | - | - | ✅ | - | local resolver today |
| queries validate | - | - | ✅ | - | opens the graph to typecheck |
| init | - | - | ✅ | - | cluster-managed graphs → cluster apply |
| optimize, repair, cleanup | - | - | ✅ | - | |
| graphs list | (later) | ✅ | - | - | remote today; embedded-cluster later (RFC-009) |
| queries list | - | - | - | ✅ | reads the registry config; no graph |
| lint | - | - | ✅ | ✅ | --schema file, or opens a local graph |
| policy validate/test/explain | - | - | - | ✅ | reads policy files + config |
| embed | - | - | - | ✅ | local tooling (files + embedding API) |
| login, logout, config, version | - | - | - | ✅ | session / config; no graph |
The resolver consults this table. A wrong-plane invocation produces one honest,
stable message instead of N ad-hoc bail!s and accidental open errors.
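One way the table could live in code is as a const slice that a single check function consults. The sketch below is illustrative only: Plane, Capability, CAPABILITIES, and check_plane are hypothetical names, the message wording is an assumption, and only a handful of rows are transcribed from the table above.

```rust
/// Columns of the capability table.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Plane {
    DataEmbedded,
    DataRemote,
    StorageDirect,
    ConfigSession,
}

impl Plane {
    fn label(self) -> &'static str {
        match self {
            Plane::DataEmbedded => "data (embedded)",
            Plane::DataRemote => "data (remote)",
            Plane::StorageDirect => "storage (direct)",
            Plane::ConfigSession => "config / session",
        }
    }
}

/// One row: a subcommand and the planes it declares.
struct Capability {
    command: &'static str,
    planes: &'static [Plane],
}

/// A few rows from the table; a real table would list every subcommand.
const CAPABILITIES: &[Capability] = &[
    Capability { command: "query", planes: &[Plane::DataEmbedded, Plane::DataRemote] },
    Capability { command: "schema plan", planes: &[Plane::StorageDirect] },
    Capability { command: "optimize", planes: &[Plane::StorageDirect] },
    Capability { command: "graphs list", planes: &[Plane::DataRemote] },
    Capability { command: "queries list", planes: &[Plane::ConfigSession] },
    Capability { command: "lint", planes: &[Plane::StorageDirect, Plane::ConfigSession] },
];

/// Single source of wrong-plane errors (message wording is an assumption).
fn check_plane(command: &str, requested: Plane) -> Result<(), String> {
    let cap = CAPABILITIES
        .iter()
        .find(|c| c.command == command)
        .ok_or_else(|| format!("unknown command `{command}`"))?;
    if cap.planes.contains(&requested) {
        return Ok(());
    }
    let supported: Vec<&str> = cap.planes.iter().map(|p| p.label()).collect();
    Err(format!(
        "`{command}` does not run on the {} plane; it supports: {}",
        requested.label(),
        supported.join(", ")
    ))
}
```

Because every wrong-plane message is produced in one function, changing the wording is a one-line edit and the same rows can drive test exclusions.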
Plane-grouped --help
Group the command list by plane (the --help block shown under Target CLI
ergonomics). Cosmetic, zero behavior change, highest legibility-per-line.
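The grouping itself can be plain data. The following is a rough, std-only sketch of rendering the grouped block from a table of (heading, description, verbs); HELP_GROUPS and render_grouped_help are hypothetical names, and wiring this into the CLI's actual help machinery is not shown and may look quite different.

```rust
/// (heading, one-line description, verbs), transcribed from the help block
/// under Target CLI ergonomics.
const HELP_GROUPS: &[(&str, &str, &[&str])] = &[
    (
        "DATA PLANE",
        "run against a graph (embedded or --server)",
        &[
            "query", "mutate", "load", "branch", "snapshot", "export", "commit",
            "schema show", "schema apply",
        ],
    ),
    (
        "STORAGE / MAINTENANCE",
        "direct storage access; no server",
        &["init", "optimize", "repair", "cleanup", "schema plan", "queries validate"],
    ),
    ("CONTROL PLANE", "manage a cluster directory", &["cluster"]),
    (
        "INSPECT / SESSION",
        "",
        &[
            "graphs list", "queries list", "lint", "policy", "embed", "login",
            "logout", "config",
        ],
    ),
];

/// Renders one heading line and one verb line per group.
fn render_grouped_help() -> String {
    let mut out = String::new();
    for (heading, blurb, verbs) in HELP_GROUPS {
        out.push_str(heading);
        if !blurb.is_empty() {
            out.push_str("  ");
            out.push_str(blurb);
        }
        out.push('\n');
        out.push_str("  ");
        out.push_str(&verbs.join("  "));
        out.push_str("\n\n");
    }
    out
}
```

Keeping the groups in one table means a new verb has to be assigned a plane to appear in help at all, which is the legibility this section is after.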
Maintenance stays off the wire (decision, not omission)
This RFC does not add server routes for optimize/cleanup/repair:
- Serving = the server. Multi-tenant, safe-for-many-callers data plane.
- Storage maintenance = the CLI against storage, addressed uniformly, run by an operator or a scheduled job with storage access.
Adding maintenance-over-HTTP would re-introduce a heavyweight, destructive multi-tenant surface and add a plane rather than clarify the three we have. A future cluster-driven maintenance reconciler (scheduled compaction/GC as a control-plane policy) is explicitly out of scope — net-new design (who runs it, with what resource bounds), not a CLI restructure.
init is an explicit exception (decision)
Direct-storage init against a plain URI/target stays. But if a target resolves
to a cluster-managed graph root, init refuses and signposts cluster apply (which records ledger, recovery, and approval artifacts) rather than
initializing that root out of band. This closes the "hidden twin" of the current
state.
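The refusal can be sketched as a small decision function. This is a hedged illustration, not the implementation: InitOutcome and plan_init are hypothetical names, and cluster_roots (the list of storage roots a cluster manages) is an assumed input, since this RFC does not say how init would learn that a root is cluster-managed.

```rust
/// Result of deciding what `init` should do (hypothetical type).
#[derive(Debug, PartialEq)]
enum InitOutcome {
    /// Plain URI or target: initialize directly on storage.
    Initialize { root: String },
    /// Cluster-managed root: refuse and point at `cluster apply`.
    Refuse { message: String },
}

/// `cluster_roots` is an assumption: how the CLI discovers cluster-managed
/// roots is left open by the RFC.
fn plan_init(name: &str, root: &str, cluster_roots: &[&str]) -> InitOutcome {
    // Compare roots without trailing slashes so `x/` and `x` match.
    let normalized = root.trim_end_matches('/');
    let managed = cluster_roots
        .iter()
        .any(|r| r.trim_end_matches('/') == normalized);
    if managed {
        InitOutcome::Refuse {
            message: format!(
                "`{name}` is a cluster-managed graph. Create it via `cluster apply` \
                 (which records ledger + recovery + approvals), not `init`."
            ),
        }
    } else {
        InitOutcome::Initialize { root: root.to_string() }
    }
}
```

A plain path such as ./new.omni falls through to Initialize; a root listed in cluster_roots is refused before anything is written.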
Compatibility
Additive and low-risk:
- --target / --graph on maintenance verbs is new capability; the positional URI form keeps working unchanged.
- Grouped --help is cosmetic.
- Capability-surface error text changes the message you get on a wrong-plane or misaddressed invocation. Per Hyrum's Law that text is observable; the change is deliberate, release-noted, and replaces an accidental Omnigraph::open string with a stable, declared one. That is a net improvement, but flagged.
No engine, server, or wire-protocol change. The work is CLI-internal: the shared resolver, the capability table, and help grouping.
Test plan
Extend the existing CLI suites rather than adding a duplicate harness:
- parity_matrix.rs: capability exclusions (the per-subcommand plane table becomes the source of truth for which verbs are remote-only / storage-only).
- cli_data.rs: maintenance wrong-plane errors (optimize --server, optimize --target <remote>), and --target resolving to direct storage.
- cli_schema_config.rs: graphs list plane behavior, schema plan vs schema show/apply plane split, and plane-grouped --help output.
- system_local.rs: --server / operator-targeting edge cases end-to-end.
Pin the new wrong-plane error strings deliberately: this RFC is intentionally
replacing accidental Omnigraph::open strings with stable capability errors, and
those strings become observable behavior (Hyrum).
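Pinning means an exact-match assertion, so any wording change has to be made on purpose. A minimal sketch follows, assuming a hypothetical in-process helper storage_plane_server_error and a single-line rendering of the message from the ergonomics example; the real suites would more likely run the CLI binary and compare its stderr.

```rust
/// Hypothetical helper that builds the wrong-plane message for a storage verb.
fn storage_plane_server_error(verb: &str) -> String {
    format!(
        "`{verb}` is a storage-plane command; `--server` addresses the data plane \
         and does not apply here. Use --target <name> or a storage URI."
    )
}

#[test]
fn optimize_server_error_is_pinned() {
    // Exact match on purpose: this string is observable behavior, so a change
    // should fail the test and force a release note.
    assert_eq!(
        storage_plane_server_error("optimize"),
        "`optimize` is a storage-plane command; `--server` addresses the data plane \
         and does not apply here. Use --target <name> or a storage URI."
    );
}
```

A substring check would let the message drift silently, which is why the sketch compares the full string.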
Relationship to RFC-009
RFC-009 Phase 4 was scoped as "declared plane capabilities" for the
embedded-vs-remote axis only. This RFC subsumes and broadens that phase into
the full three-plane, per-subcommand model (adds uniform maintenance addressing,
the authority rule, and help grouping). RFC-009 Phase 5 (remote load →
/load route alignment) is unaffected and remains in RFC-009.
graphs list reconciliation: RFC-009's answered open question (pinned in
parity_matrix.rs's exclusions comment) targets graphs list becoming
Both-capability once the embedded arm enumerates the cluster catalog. This RFC
aligns with that rather than superseding it: the capability table shows
graphs list as remote today, embedded-cluster later.
Open questions
- Capability-table location: a CLI-internal const, or surfaced (e.g. in --help and a machine-readable omnigraph capabilities for tooling)?
- --cluster <dir> --graph <id> for maintenance: does the maintenance command resolve the storage URI from the applied cluster state, or from the declared cluster.yaml? (Applied state is the truth the server serves; declared config may be ahead of it.)
Review comments (Codex, 2026-06-13)
Overall take: the direction is right. The planes already exist; making them
declared in code, help text, and error messages should reduce operator surprise.
Keeping storage maintenance off HTTP is also the right boundary: optimize,
repair, and cleanup are direct-storage operator actions, not a multi-tenant
serving surface.
Before implementation, tighten these points:
- Resolver authority needs a sharper rule. The proposal says maintenance resolves storage URIs "from cluster.yaml / operator config", but those are different authority surfaces. Today --target is an operator/legacy graph-target lookup; cluster config is read by cluster commands and by --cluster server boot. Do not make a maintenance command silently consult both and pick a precedence. Either:
  - --target on maintenance means an operator/legacy target whose URI is already direct storage, with remote targets failing loudly; or
  - add an explicit cluster-root/config resolver for this case, so reading cluster state is an intentional mode.

  Resolution (accepted): both. --target resolves through operator/legacy config and must be direct storage (remote → loud fail); cluster-managed graphs use the explicit --cluster <dir> --graph <id> resolver. See Authority rule under Proposed shape.

- graphs list conflicts with RFC-009's target shape. This RFC classifies graphs list as remote-only, while RFC-009's answered open question says it becomes Both-capability once the embedded arm enumerates the cluster catalog. Pick one direction here: either this RFC explicitly supersedes that target, or the capability table should show graphs list as remote today and embedded-cluster later.

  Resolution (accepted): align, don't supersede. The table shows graphs list remote-today / embedded-cluster-later. See Relationship to RFC-009.

- The capability table should be per subcommand, not per family. The family-level rows hide the exact cases the table is supposed to make non-accidental. At minimum, call out:
  - schema plan as local/storage-backed today, while schema show and schema apply route through the graph client;
  - queries validate versus queries list, which do not have the same plane shape;
  - lint, policy, embed, login, logout, config, and version, so enumeration/session/tooling commands are intentionally classified instead of falling outside the model.

  Resolution (accepted): the capability table is now per-subcommand and classifies every command, including the session/tooling group.

- init should be an explicit exception. Direct-storage init is fine. A cluster-managed graph should be created by cluster apply, with ledger, recovery, and approval semantics. If a named target resolves to a cluster-managed graph root, init should signpost cluster apply rather than quietly initializing that root out of band.

  Resolution (accepted): promoted from open question to a decision. See "init is an explicit exception".
Testing notes for the implementation slice:
- Extend the existing CLI suites rather than adding a new duplicate harness: parity_matrix.rs for capability exclusions, cli_data.rs for maintenance wrong-plane errors, cli_schema_config.rs for graphs list / help behavior, and system_local.rs for --server / operator-targeting edge cases.

- Pin the new wrong-plane error strings deliberately. This RFC is intentionally replacing accidental Omnigraph::open strings with stable capability errors, and those strings become observable behavior.

  Resolution (accepted): captured as the Test plan section.
Verification comments (Codex, 2026-06-13)
Follow-up verification against the current CLI/server code found a few remaining current-state nits. These are doc-shape issues, not objections to the proposal:
- Current-state table overstates graphs list. The table under Current state of affairs still lists graphs list with data verbs that reach the graph by embedded engine or HTTP. Current code routes it through GraphClient only to share the resolver, but the embedded arm fails loudly; the later RFC text correctly says remote today / embedded-cluster later. Make the current-state row match that.

  Resolution (accepted): the Data row now marks graphs list remote-only today, with a note that it rides GraphClient only to share the resolver.

- Current-state table overstates init addressing. init is grouped with maintenance verbs whose addressing surface is positional URI or --target. Current init only accepts a required positional URI and has no --target or config path. The proposal can add that capability, but the current-state table should not describe it as already present.

  Resolution (accepted): the Storage row now calls out that init takes only a required positional URI today (no --target); adding --target to init is part of the proposal, entangled with the init → cluster apply signpost, not current state.

- apply_server_flag call-site count is stale. The text says data verbs resolve --server prod --graph knowledge through apply_server_flag at 16 call sites. Current code has the fork collapsed: data verbs call GraphClient::resolve*, and only the two GraphClient factories call apply_server_flag. Rephrase the verified fact around GraphClient, not the old pre-collapse call-site count.

  Resolution (accepted): validated-fact #1 now describes the post-collapse reality (GraphClient::resolve*; the two factories call apply_server_flag), dropping the stale count.

- --cluster <dir> --graph <id> collides with today's global --graph semantics. The target ergonomics section proposes that flag shape for maintenance, but current --graph is a global flag that requires --server and appends /graphs/<id> to a remote server URL. Either choose a separate cluster-maintenance graph flag shape, or call out the clap/global flag migration explicitly as part of the implementation.

  Resolution (accepted): the Authority rule now carries a flag-shape caveat: the cluster-managed resolver (and its flag shape, e.g. --cluster-graph vs a --graph migration) is deferred to a later slice; the operator/legacy --target path lands first. The illustrative --cluster <dir> --graph <id> spelling is marked as not-final.