omnigraph/docs/dev/rfc-002-config-cli-architecture.md
Ragnor Comerford 3c2b1b8051
Stored-query registry foundation + config/CLI RFC-002 (#128)
* MR-969: add stored-query registry config surface

Introduce the `queries:` block in omnigraph.yaml — an inline
`name -> entry` map of stored queries, per-graph
(`graphs.<id>.queries`) and top-level for single-graph mode, mirroring
how `policy` is wired in both modes. Each entry points at a `.gq` file
and carries optional MCP exposure settings (`expose`, `tool_name`),
defaulting to not-exposed.

Additive: absent `queries:` leaves current behavior unchanged.

- QueryEntry { file, mcp: McpSettings { expose, tool_name } }
- `queries` field on TargetConfig + OmnigraphConfig (serde default)
- query_entries() / target_query_entries() accessors
- resolve_query_file() — base_dir-relative `.gq` path resolution
- round-trip + absent-block tests

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Add stored-query registry loader and GraphHandle wiring

Add a `queries` module: QueryRegistry loads each declared `.gq` entry,
parses it, and selects the query whose symbol matches the manifest key,
asserting the two agree (key == `query <name>` symbol). Identity is the
query name; a key/symbol mismatch is a load-time error. Errors are
collected, not fail-fast, so a bad registry surfaces every broken entry
at once. Schema type-checking is deliberately left to a separate pass so
the loader stays callable without an open engine.

Thread an `Option<Arc<QueryRegistry>>` through GraphHandle alongside the
per-graph policy; the URI-canonicalizing clone propagates it. Production
openers default to None for now — the boot path loads and attaches the
registry in a later change.

- QueryRegistry::{from_specs, load, lookup, iter}; StoredQuery::is_mutation
- GraphHandle.queries field, propagated on canonical clone
- registry unit tests: identity match/mismatch, multi-query selection,
  per-entry parse errors, error collection, mutation classification

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: add RFC-002 config & CLI architecture

Layered config (user-global ~/.config/omnigraph/ + per-project), a
unifying `target` abstraction resolving to (locus, graph, sub-state,
credential) with embedded-URI XOR remote-server loci, multi-server ×
multi-graph client targeting, credentials by-reference, and the
file-naming decision: project and server config are one artifact
(`omnigraph.yaml`); the only differently-named file is the user-global
`config.yaml`, split by scope not role. Includes the 12-factor bind
portability rule (prefer --bind/OMNIGRAPH_BIND over a committed
server.bind) and the defined-locally / invoked-remotely model for
stored queries. Derived from first principles working backwards from
what the engine enables; validated against kube/Helix/git/compose.

Linked from docs/dev/index.md. Proposed; phased rollout for the
MR-973/974/981 family.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Add check() to validate stored queries against the live schema

A pure check(registry, catalog) that type-checks every stored query via
the same typecheck_query_decl the engine runs for inline queries — no
parallel implementation. Failures are collected, not fail-fast, so an
operator sees every broken query (e.g. a type/property a migration
renamed or removed) in one pass. Breakages are fatal (the boot path will
refuse to start); warnings are advisory.

Pure over (registry, catalog) so it is callable both at boot (engine
catalog) and offline from the CLI without an open engine.

Advisory lint: an mcp.expose:true query that declares a Vector(N)
parameter warns — an LLM cannot supply a raw embedding vector; such a
query should take a String parameter and embed server-side. Warns
rather than rejects, since service-to-service callers may pass vectors.

- CheckReport { breakages, warnings }; has_breakages / is_clean
- tests: valid query, unknown type, unknown property, collect-not-fail-fast,
  vector-param-exposed warns, unexposed silent

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Drop internal plan-label refs from stored-query config comments

Doc comments referenced sequencing labels ("C2") that mean nothing to a
reader; reword to describe the behavior directly. Comment-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: reconcile aliases with the role model in RFC-002

Place the existing client-only `aliases:` block in the client/server
role split: aliases are client-role (CLI, embedded, ungated) and may
live in both user-global and project config; `queries:` is server-role
(deployment manifest only). They overlap as "name -> .gq"; `queries:` is
the superset, and the end-state subsumes aliases (definition -> queries,
target/branch/format -> client invocation context, positional args ->
CLI sugar). v1 keeps aliases unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: make RFC-002 config global-first, project-optional

The global user config is the primary, self-sufficient default; the
CLI works from any directory with no project file (the kubectl/aws/gh
posture), a deliberate flip from today's project-anchored behavior.
The project omnigraph.yaml becomes an optional repo-scoped override and
the deployment manifest. Uniform schema, both layers optional; global
can hold any section including a personal server's graphs/queries.
Additive: project still overrides global; the flip adds a fallback
layer below the project file rather than removing it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: justify XDG ~/.config/omnigraph over legacy ~/.omnigraph in RFC-002

Make the rationale explicit: XDG-first because OmniGraph is a client
that will cache remote catalogs and keep session state alongside
secrets, and XDG separates config / cache / state into distinct dirs
(clear cache without touching creds; backups skip cache) whereas a
single ~/.omnigraph/ mixes them. Honor ~/.omnigraph/ as a fallback for
the peer-group (aws/kube/docker/helix) expectation. Add XDG_CACHE_HOME
/ XDG_STATE_HOME to the override precedence.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: build RFC-002 credentials on the existing env-file mechanism

OmniGraph already has credentials-by-reference: bearer_token_env names
the env var, and auth.env_file is a git-ignored dotenv the CLI
auto-loads (real env vars win), resolved via resolve_remote_bearer_token.
The RFC's proposed credentials.yaml + token_env were redundant parallel
inventions. Reconcile: reuse bearer_token_env (extend to
servers.<name>) and auth.env_file (add a global ~/.config/omnigraph/.env
layered under the project .env.omni); OS keychain is an additive future
resolver. No new credentials.yaml. Updated summary, non-goals,
background, file-naming, credentials, example, login, migration, rollout.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: use single ~/.omnigraph dir (Helix-style), not XDG, in RFC-002

Reverse the earlier XDG-first call. The prior argument rested on a false
dichotomy (single-dir => mixed config/cache/state); in fact the peer
tools (aws, kube, helix) achieve separation via SUBDIRECTORIES inside
one ~/.tool/ dir (~/.aws/sso/cache/, ~/.kube/cache/), getting cache
hygiene AND one discoverable place. So everything goes under
~/.omnigraph/: config.yaml, credentials (dotenv, 0600), cache/, state/.
Lower cognitive load, matches what DB/cloud-CLI users expect, matches
Helix. OMNIGRAPH_HOME overrides; $XDG_CONFIG_HOME optionally honored but
~/.omnigraph/ is canonical. Updated all paths, the rationale paragraph,
the file-naming table (added a cache/state row), and env precedence.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: reconcile RFC-002 with shipped/planned CLI tickets

Align with reality found in existing tickets:
- Noun is graph/graphs, not target/targets (MR-603 done renamed the
  config key targets->graphs, flag --graph). Use graphs:/--graph; an
  entry is embedded (uri) XOR remote (server + remote graph name).
- ~/.omnigraph/ confirmed by MR-581 (og template pull, done) which
  already quick-starts templates there.
- Templates already exist (MR-581/MR-531) — not invented here.
- The init family is already specced (init, quickstart MR-973, serve
  MR-970, prune MR-972, mcp install MR-974, agent-mode MR-981); this
  RFC only adds the user route (~/.omnigraph/config.yaml + login).
- aliases: -> operations: planned (MR-839).
- bearer_token_env gap tracked in MR-971.
- query lint/check already exist (MR-639) — registry validator must not
  collide with the singular `query check`.
Add a Reconciliation section; fix the canonical example to graphs:/--graph.
Also: merge semantics refined (deep-merge settings, replace named
entries, replace lists, config view --resolved --show-origin).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: correct stale-ticket claims and fold init/bootstrap design into RFC-002

Verify against code, not ticket statuses (MR-581 is marked done but is
stale/unbuilt): no ~/.omnigraph usage, no template/serve/quickstart/
prune/login commands exist; config still uses aliases: (no operations:).
So ~/.omnigraph/ stands on peer-convention merits alone, and templates
are a design question, not a foothold. Add §7.5: the three-tier init
model (user route = login + ~/.omnigraph/config.yaml; thin project init;
fat quickstart + templates) with first-principles positions (split
init/login, in-place refuse-if-exists, interactive vs --auto/agent-mode,
--template flag, secrets-on-scaffold gitignore rule). This RFC owns only
the user route; the rest are sibling tickets (MR-973/970/972/974/981).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: breadboard + slice Shape A in RFC-002

Add the implementation breadboard (places P1-P5, affordances N1-N14 with
NEW markers, mermaid) and five vertical slices for the selected config/
CLI/init shape: V1 global layer + merge engine + config view; V2 remote
graphs + HTTP-client path + credential resolution; V3 omnigraph login;
V4 init-hardening + quickstart + templates (rides MR-970); V5 agent-mode
(MR-981). Rollout reordered to the slice sequence; spikes X1-X4 gate
their owning slice. V1-V2 close the substantive client->server gap.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Add InvokeQuery Cedar action (coarse, graph-scoped)

A per-graph, branch-scoped action that gates invoking a server-side
stored query by name. Coarse for now: an `invoke_query` allow rule
permits any stored query on the graph; a future, additive refinement
adds an optional per-query-name scope without changing rules written
against the coarse action. Enforcement is at the HTTP boundary; the
engine `_as` writers still enforce read/change per the query body, so a
stored mutation is double-gated (invoke_query to reach the tool, change
for the write). No call site yet — the invocation handler wires it in a
later change (same pattern as Admin/GraphList added ahead of consumers).

- variant + as_str/resource_kind(Graph)/FromStr/uses_branch_scope
- Cedar schema: invoke_query appliesTo Graph
- tests: per-graph allow/deny, branch-scope accepted

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Load and type-check stored queries at server boot, refusing breakage

At startup the server now loads each graph's stored-query registry,
type-checks every query against that graph's live schema, and refuses to
boot if any query references a type/property the schema doesn't have
(same posture as bad policy YAML) — so schema drift surfaces at the
deploy boundary, not silently at invocation. Non-blocking warnings are
logged. The validated registry is attached to the GraphHandle (the two
production sites previously held `queries: None`).

Loading (parse + key==symbol identity) happens at settings-build time
where the config is in scope; the schema type-check happens after each
engine opens (single mode in `open_single_with_queries`, multi mode in
`open_single_graph`). `open_with_bearer_tokens_and_policy` delegates
with an empty registry so its 18 test callers are unchanged; the public
`new_*` constructors are unchanged (only the private build path threads
the registry).

- ServerConfigMode::Single / GraphStartupConfig carry the loaded registry
- boot tests: valid registry boots; type-broken query refuses boot + names it

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Add `omnigraph queries validate` and `queries list` CLI

`queries validate` type-checks the stored-query registry against the
live schema offline — it opens the selected graph, runs the same
check() the server runs at boot, prints breakages/warnings (human or
--json), and exits non-zero on any breakage — so an operator can catch
a query broken by a schema change without restarting the server.
`queries list` prints each registered query's name, MCP exposure, and
typed params.

Named `validate` (not `check`) to avoid overlap with the existing
`omnigraph lint` — `query check`/`query lint` are already deprecated
argv-shims to `lint`. Registry entries resolve like the server: a named
graph uses its per-graph `queries:`; otherwise the top-level one.

- Queries subcommand group; reuses QueryRegistry::load + check from
  omnigraph-server; local-only (needs the schema), mirrors lint
- tests: clean registry exits 0, broken query exits non-zero + names it,
  list shows the query and its typed params

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Route registry selection through one shared query_entries_for

The "which queries: block applies for graph X" rule existed twice — the
server boot path and the CLI's registry_entries — and had already drifted:
the CLI carried an unreachable unwrap_or_else fallback the server lacked.

Add OmnigraphConfig::query_entries_for(graph: Option<&str>) as the single
definition (named graph -> its per-graph block; otherwise top-level) and
route all three sites through it: server single mode, server multi-graph
loop, and the CLI. The CLI's dead fallback arm is deleted; CLI and server
now resolve identically by construction.

No behavior change. Extends the config round-trip test to pin the selector,
including the unknown-name -> top-level fallback the deleted CLI arm covered.
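
As an illustration of the selector described above, here is a minimal sketch. The `Config` and `GraphConfig` types and the `sample_config` helper are hypothetical, pared-down stand-ins (a name -> `.gq` path map), not the real `OmnigraphConfig`; only the selection rule itself (named graph -> per-graph block, otherwise top-level, including the unknown-name fallback) is taken from the commit text.

```rust
use std::collections::BTreeMap;

/// Query name -> `.gq` file path (simplified stand-in for the real entry type).
type QueryMap = BTreeMap<String, String>;

/// Hypothetical per-graph config: only the `queries:` block is modeled.
#[derive(Debug, Default)]
struct GraphConfig {
    queries: QueryMap,
}

/// Hypothetical root config: a top-level `queries:` block plus `graphs.<id>`.
#[derive(Debug, Default)]
struct Config {
    queries: QueryMap,
    graphs: BTreeMap<String, GraphConfig>,
}

impl Config {
    /// A known graph name selects its per-graph block; `None` or an
    /// unknown name falls back to the top-level block.
    fn query_entries_for(&self, graph: Option<&str>) -> &QueryMap {
        graph
            .and_then(|name| self.graphs.get(name))
            .map(|g| &g.queries)
            .unwrap_or(&self.queries)
    }
}

/// Illustrative data only: one top-level query and one graph named `prod`.
fn sample_config() -> Config {
    let mut cfg = Config::default();
    cfg.queries
        .insert("health".to_string(), "queries/health.gq".to_string());
    let mut prod = GraphConfig::default();
    prod.queries.insert(
        "top_customers".to_string(),
        "queries/top_customers.gq".to_string(),
    );
    cfg.graphs.insert("prod".to_string(), prod);
    cfg
}

fn main() {
    let cfg = sample_config();
    for selection in [Some("prod"), Some("missing"), None] {
        let names: Vec<&String> = cfg.query_entries_for(selection).keys().collect();
        println!("{selection:?} -> {names:?}");
    }
}
```

Because every caller goes through the one method, the server and the CLI cannot disagree about which block applies for a given selection.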

* Funnel registry validation through one validate_and_attach gate

The check -> refuse-on-breakage -> log-warnings -> empty->None block was
copy-pasted across both open paths (single mode and the multi-graph
per-graph open), differing only by the graph label. A third opener could
attach a registry that was never schema-checked.

Extract validate_and_attach(queries, catalog, label) -> Option<Arc<..>> as
the single gate both paths call, so attaching an unchecked registry is no
longer expressible. The catalog handle is an owned Arc, so calling it
before the multi-mode policy match (which rebinds db) is borrow-clean.

No behavior change. Adds a direct unit test of the helper (empty / clean /
breakage incl. the graph label in the message) — covering the multi-graph
path's logic, which previously had no boot-refusal coverage.

* Resolve param types structurally in the MCP vector lint

The exposed-query advisory detected vector params with
type_name.starts_with("Vector(") — a second copy of the compiler's own
ScalarType::from_str_name vector parsing that could drift from it.

Key the lint off PropType::from_param_type_name + ScalarType::Vector(_)
instead, the one canonical resolver the type system already uses. Any
future param-suppliability lint now reads the structured type rather than
scanning the surface string.

Behavior-preserving: the grammar forbids list-of-vector params
(list_type = "[" base_type "]", and base_type excludes Vector), so the only
input where the structured and string checks could differ is unparseable.
Adds a guard test that an exposed String param does not false-trigger the
warning.

* Refuse duplicate MCP tool names across exposed stored queries

The effective MCP tool name (explicit tool_name, else the query name) is a
second identity namespace beside the registry key, but nothing enforced it
unique — two exposed queries could claim one catalog key, and each consumer
re-derived the name ad hoc.

Add StoredQuery::effective_tool_name() as the one definition, and a
load-time uniqueness pass in from_specs over exposed queries: a collision is
a collected LoadError naming the loser and the winner. Scoped to exposed
queries (unexposed have no MCP tool); deterministic over the BTreeMap so the
first-declared wins and the error order is stable.

New (rare) refusal: a config with colliding exposed tool names now fails
`omnigraph queries validate` offline and refuses server boot, the same
posture as a malformed registry. Release-note-worthy.

Test-first: duplicate_exposed_tool_name_is_a_load_error (red before the
pass, green after) + a CLI offline test; the unexposed sibling pins the
exposed-only scope; effective_tool_name asserts folded into the load test.
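
A rough sketch of the uniqueness pass, under stated assumptions: `StoredQuery` here is a hypothetical two-field stand-in, `effective_tool_name` is written as a free function rather than a method, and errors are plain strings instead of the real `LoadError`. Because the pass walks a `BTreeMap`, the winner in this sketch is the lexicographically first query name.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a loaded stored query (MCP settings only).
struct StoredQuery {
    expose: bool,
    tool_name: Option<String>,
}

/// Explicit `tool_name` if set, else the query name (the registry key).
fn effective_tool_name<'a>(name: &'a str, query: &'a StoredQuery) -> &'a str {
    query.tool_name.as_deref().unwrap_or(name)
}

/// Collect one error per exposed query whose tool name is already claimed.
/// Errors are gathered rather than returned on the first hit.
fn tool_name_collisions(registry: &BTreeMap<String, StoredQuery>) -> Vec<String> {
    // tool name -> name of the query that claimed it first
    let mut claimed: BTreeMap<&str, &str> = BTreeMap::new();
    let mut errors = Vec::new();
    for (name, query) in registry {
        if !query.expose {
            continue; // unexposed queries have no MCP tool
        }
        let tool = effective_tool_name(name, query);
        if let Some(winner) = claimed.get(tool).copied() {
            errors.push(format!(
                "query `{name}`: tool name `{tool}` already used by `{winner}`"
            ));
        } else {
            claimed.insert(tool, name.as_str());
        }
    }
    errors
}

fn main() {
    let mut registry = BTreeMap::new();
    registry.insert(
        "a_search".to_string(),
        StoredQuery { expose: true, tool_name: Some("search".to_string()) },
    );
    registry.insert(
        "search".to_string(),
        StoredQuery { expose: true, tool_name: None },
    );
    for error in tool_name_collisions(&registry) {
        println!("{error}");
    }
}
```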

* docs: document the queries registry, CLI, and invoke_query action

The stored-query surface shipped without user docs. Add it, per the same-PR
maintenance contract:

- policy.md: invoke_query as per-graph action #10 (branch-scoped), with the
  double-gating note; renumber graph_list; add it to the branch_scope list.
- cli-reference.md: the `queries validate | list` command, and the
  `queries:` config block (per-graph + top-level) with mcp.expose/tool_name
  and the tool-name uniqueness rule.
- server.md: boot-time stored-query type-check (refuse on breakage), noting
  invocation over HTTP/MCP is not yet exposed.

* Add POST /queries/{name} stored-query invocation handler

Invoke a curated server-side stored query by name: source + name come from
the per-graph queries: registry, the client sends only runtime inputs
(params, branch, snapshot). Gated by the invoke_query Cedar action at the
boundary; the handler delegates to the existing run_query/run_mutate, whose
inner Read/Change enforce still runs — so a stored mutation is double-gated
(invoke_query to reach the tool, change for the write).

- InvokeStoredQueryRequest + an untagged InvokeStoredQueryResponse
  { Read(ReadOutput), Change(ChangeOutput) } → one Json<_> return type and a
  oneOf 200 schema (a correct contract, not a wrong-but-simple one).
- Route lives in per_graph_protected → single-mode /queries/{name} and
  multi-mode /graphs/{id}/queries/{name} for free.
- Deny == unknown: an invoke_query denial and a missing query both return the
  same 404, so the catalog can't be probed by an unauthorized caller.
- OpenAPI regenerated; tests cover read, mutation double-gate (403 vs 200),
  bad-param 400, and the identical-404 deny path.

Completes the MR-969 V1 invocation slice (registry + /queries/{name} + invoke_query).

* docs: stored-query invocation endpoint; flip the not-yet-exposed caveat

Now that POST /queries/{name} ships (C7), document it: add the endpoint to
server.md's inventory + an invocation section (body, untagged read/mutate
envelope, invoke_query gate, double-gated mutations, deny == 404), and flip
the startup note that said invocation was not yet exposed. In policy.md,
replace "no invocation call site yet" on the invoke_query action with a
pointer to the endpoint.

* Scope the stored-query 404-hiding claim to non-invoke_query callers

Review found the deny==404 catalog-hiding was overstated as a contract: it
holds only at the outer invoke_query gate. A caller that HOLDS invoke_query
but lacks read/change gets the inner gate's 403 for an existing query vs 404
for an unknown one — so existence is visible to grant-holders by design (the
intended double-gate). The handler docstring, OpenAPI 404 description, and
server.md all claimed the 404 was airtight against any denied actor.

Correct the wording in all three (no behavior change) and add the missing
symmetric test (invoke_query but no read -> 403 for an existing query, 404
for unknown) so the actual contract is pinned. Also document that in
default-deny mode (tokens, no policy) every invocation 404s until an
invoke_query rule is configured.

Nits: the from_specs collision comment said "first declared wins" but it is
lexicographically-first by name (BTreeMap); the effective_tool_name docstring
overclaimed the CLI display routes through it (it resolves the rule on its
own output DTO).

* Default mcp.expose to true (the manifest entry is the opt-in)

expose controls MCP-catalog membership only — it is not an authorization
gate (invocation is gated by invoke_query regardless). So requiring a
per-query mcp.expose: true was friction with no safety benefit: a
non-exposed query is still HTTP-invocable by name. Flip the default so
declaring a query in the manifest exposes it to the agent tool catalog by
default; expose: false is the escape hatch for service-only queries.

Both the absent-mcp path (Default impl) and the present-but-no-expose path
(serde default fn) now yield true. Doc comments + cli-reference updated; the
config round-trip test asserts the new default.

* Add GET /queries stored-query catalog endpoint

List a graph's mcp.expose stored queries as a typed tool catalog so a client
(the MCP server) can register them as tools without fetching .gq source.
Each entry carries name, MCP tool_name, description/instruction, a
read/mutate flag, and decomposed typed params (kind enum: string|bool|int|
bigint|float|date|datetime|blob|vector|list, plus item_kind for lists and
vector_dim) — so the consumer builds an input schema with a closed match and
never re-parses omnigraph type spelling. I64/U64 are bigint (string on the
wire): a JSON number loses precision past 2^53 and the engine already accepts
decimal strings.

Read-gated (works in default-deny; the catalog is graph-wide, authorized
against main). NOT Cedar-filtered per query yet — a reader can list a query
whose invoke_query they lack (documented gap until per-query authz lands);
invocation stays invoke_query-gated + deny==404.

- api: QueriesCatalogOutput / QueryCatalogEntry / ParamDescriptor / ParamKind
  + query_catalog_entry (reuses PropType::from_param_type_name; scalar_kind is
  exhaustive, so a new ScalarType is a compile error here until catalogued).
- GET /queries route in per_graph_protected (→ /graphs/{id}/queries in multi
  mode); OpenAPI regenerated; path allowlists updated.
- Tests: projection unit (every kind, list, vector, nullable, mutation,
  empty) + handler (exposed-only filter, read-gate probe-oracle, empty
  registry).
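
The bigint-as-string choice can be checked with a small sketch. The `to_wire` and `from_wire` helpers are hypothetical names for illustration, not part of the API; the point is only that a 64-bit integer does not survive a round trip through a double-precision number past 2^53, while a decimal string does.

```rust
use std::num::ParseIntError;

/// Hypothetical helper: send a 64-bit integer as a decimal string.
fn to_wire(value: u64) -> String {
    value.to_string()
}

/// Hypothetical helper: parse the decimal string back into a 64-bit integer.
fn from_wire(text: &str) -> Result<u64, ParseIntError> {
    text.parse()
}

/// What a JSON-number consumer backed by f64 would get back.
fn through_f64(value: u64) -> u64 {
    value as f64 as u64
}

fn main() {
    let big: u64 = (1u64 << 53) + 1;
    println!("original:    {big}");
    println!("via f64:     {}", through_f64(big));
    println!("via string:  {:?}", from_wire(&to_wire(big)));
}
```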

* docs: GET /queries stored-query catalog endpoint

Document the catalog: the endpoint table row (GET /queries, read-gated), a
catalog section (typed-param kind enum, bigint/date/datetime/blob-as-string,
graph-wide/branch-independent, mcp.expose default true, the read-gated
probe-oracle gap), and flip the startup note now that the catalog ships.

* Collect file-I/O and parse errors in QueryRegistry::load in one pass

load() early-returned on any unreadable .gq file, masking parse / identity /
tool-name-collision errors in the OTHER (readable) files — so an operator
fixed the missing file, restarted, and only then saw the next broken query.
Now it collects I/O errors but still runs from_specs on the readable specs
and returns the union, so every broken entry surfaces at once (matching the
collected-errors contract the rest of the registry already follows).

Safe: from_specs' tool-name collision check runs over loaded queries only, so
dropping an I/O-failed entry can only under-report a collision, never invent
one. I/O errors are ordered first (BTreeMap key order), then spec errors.

Adds a load-level test (tempdir: a valid, a missing, and a parse-broken .gq)
asserting all three surface in one Err — confirmed red before the fix.

* Make invoke_query graph-scoped (one branch authority)

invoke_query gates reaching the curated stored-query surface — a graph-level
capability. Per-branch/snapshot access is already enforced by the inner
read/change gate in run_query/run_mutate (authorized against the resolved
branch), so branch-scoping the outer gate was redundant AND wrong for snapshot
reads (it defaulted to main). Drop the branch dimension: remove InvokeQuery
from uses_branch_scope (it joins admin as graph-scoped) and authorize the
boundary gate with branch: None.

Lossless: an actor confined to branch X by their read/change rules can still
only invoke a stored query that touches X. A rule that sets branch_scope on
invoke_query is now rejected by validate() — write invoke_query in its own
rule.

Ripple (atomic): restructure the server invoke fixture so invoke_query sits in
its own branch_scope-free rule; invert invoke_query_is_branch_scoped ->
invoke_query_rejects_branch_scope; the per-graph authorize test uses
branch: None; docs (policy.md, server.md, the InvokeQuery doc). No wire/OpenAPI
change.

* Resolve graph config by identity, not server mode

Which policy/queries block applies for a graph was decided three different,
mode-dependent ways: single-mode boot used top-level even for a named graph;
multi-mode used per-graph (and silently ignored a top-level queries block); the
CLI used per-graph for a named target. So `queries validate --target prod`
could check a different registry than the single-mode server loaded, and a
named graph's per-graph policy/queries were silently shadowed.

Make config a function of graph IDENTITY: a graph served by NAME
(--target/server.graph, a graphs: entry) uses its own graphs.<name>.{policy,
queries}; a bare URI is anonymous and uses top-level. One rule, applied by
single-mode boot, multi-mode boot, and the CLI — so they can't diverge and the
CLI predicts the server exactly.

No silent ignore: serving a named graph while a top-level policy/queries block
is populated now refuses boot, naming the block (the multi-mode top-level-policy
bail, extended to queries and to single-mode-named). The CLI's `queries
validate` derives the schema URI and the registry from ONE selection, and a
positional URI forces anonymous (ignoring cli.graph) so the two can't come from
different graphs.

BREAKING (released behavior): single mode by name (--target/server.graph) with
top-level policy/queries previously used top-level; it now uses the per-graph
block and refuses boot if top-level is also populated. Bare-URI single mode is
unchanged. Loud, with migration text pointing at graphs.<name>.

- config: resolve_policy_file_for (policy sibling of query_entries_for, no
  top-level fallback) + populated_top_level_blocks for the coherence check.
- characterization tests (single-mode named -> per-graph; named + top-level ->
  bail; multi-mode top-level queries -> bail; CLI positional-URI -> top-level).
- docs: policy.md, server.md, cli-reference.md.

* docs: RFC-002 credentials keyed by server name (keychain/profile/env)

Reworks the RFC's credentials model: secrets are keyed by server name — OS
keychain `omnigraph:<server>` (preferred) -> a `[<server>]` profile in
`~/.omnigraph/credentials` -> `OMNIGRAPH_TOKEN[_<SERVER>]` env (CI), the
AWS/gh/kube model. `servers.<name>` is endpoint-only by default but may carry
an explicit, secret-free `auth: { token: { env|file|command|keychain } }`
source. The shipped `bearer_token_env` + `.env.omni` dotenv remain a legacy
compat path; no `credentials.yaml`.

* docs: RFC-002 — typed graph locator (storage/server/graph_id), not a uri string

Add §1.1: the resolved graph address is a typed GraphLocator
(Embedded{storage} | Remote{server, graph_id}), not a flat uri: String.
Diagnoses the string model's cost in the code today (~16 is_remote_uri forks,
TargetConfig can't express multi-server x multi-graph, the CLI bails on remote,
the ts SDK models baseUrl+graphId separately) and settles the YAML naming so
the key names the locus:

- storage: (embedded) — shipped uri: is a deprecated alias
- server: + graph_id: (remote) — graph_id defaults to the entry key
- storage xor server, reject both/neither (no silent ambiguity)

Kills the graphs:/graph: collision and the uri:-might-be-a-server ambiguity.
Updates the §1/§8 examples and the entry-shape notes to the new naming.
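
A minimal sketch of the typed locator and the storage-xor-server rule. `RawGraphEntry` and `resolve_locator` are hypothetical names for illustration; the RFC only specifies the `GraphLocator` shape, the `graph_id` default, and the reject-both/neither rule. How a stray `graph_id` on an embedded entry is handled is not stated, so this sketch simply ignores it.

```rust
/// The resolved graph address, per the RFC's §1.1 shape.
#[derive(Debug, PartialEq)]
enum GraphLocator {
    Embedded { storage: String },
    Remote { server: String, graph_id: String },
}

/// Hypothetical raw `graphs.<key>` entry as it might come out of YAML.
#[derive(Default)]
struct RawGraphEntry {
    storage: Option<String>,
    server: Option<String>,
    graph_id: Option<String>,
}

/// Exactly one of `storage` / `server` must be set; `graph_id` defaults
/// to the entry key for remote graphs.
fn resolve_locator(key: &str, raw: RawGraphEntry) -> Result<GraphLocator, String> {
    match (raw.storage, raw.server) {
        (Some(storage), None) => Ok(GraphLocator::Embedded { storage }),
        (None, Some(server)) => Ok(GraphLocator::Remote {
            server,
            graph_id: raw.graph_id.unwrap_or_else(|| key.to_string()),
        }),
        (Some(_), Some(_)) => Err(format!(
            "graph `{key}`: set `storage` or `server`, not both"
        )),
        (None, None) => Err(format!(
            "graph `{key}`: one of `storage` or `server` is required"
        )),
    }
}

fn main() {
    let remote = RawGraphEntry {
        server: Some("prod".to_string()),
        ..Default::default()
    };
    println!("{:?}", resolve_locator("orders", remote));
    println!("{:?}", resolve_locator("broken", RawGraphEntry::default()));
}
```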

* Test: queries list must reject an unknown --target

queries list opens no graph URI, so unknown-graph validation does not ride
along on resolve_target_uri the way it does for every other command. The new
test reproduces the gap: with an unknown --target the command currently exits 0
and prints the (empty) top-level registry instead of erroring like the
URI-resolving commands do. Fails against current code; the fix follows.

* Validate the graph selection in queries list

Graph-existence validation was a side effect of URI resolution: every
URI-resolving command rejects an unknown --target via resolve_target_uri, but
queries list opens no URI, so query_entries_for(Some(unknown)) silently fell
back to the top-level registry and showed the wrong (or empty) catalog.

Make membership a property of the selection: add the fallible
resolve_graph_selection alongside the infallible query_entries_for (a known
name passes through, an unknown name errors with the same message as
resolve_target_uri, None stays anonymous), and validate the selection in
execute_queries_list. query_entries_for is unchanged — server boot's bare-URI
path still needs its None -> top-level arm.

* Surface policy-engine errors from stored-query invoke

The invoke handler mapped every authorize_request failure to 404 ('stored
query not found'), which collapsed the authorization decision (deny -> 403)
together with operational failures (no actor -> 401, Cedar evaluation error ->
500). A real policy-engine 500 was hidden as a missing query.

Separate the two concerns instead of sniffing the masked status. Extract
authorize() returning an Authz { Allowed, Denied(msg) } decision and reserve
Err for operational failures only; authorize_request becomes a thin wrapper
that maps Denied -> 403, so the 16 deny-as-403 callers are unchanged. The
invoke handler now matches the decision directly: a denial stays 404 (deny ==
missing, so the catalog can't be probed without the grant), while a 401/500
propagates with its true status.

500 is now a reachable outcome on POST /queries/{name}; document it in the
endpoint responses and regenerate openapi.json.
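
The split between decision and failure might look roughly like the following. This is a sketch under assumptions: `AuthError` and its variants, the bare `u16` status codes, and `invoke_gate` are illustrative stand-ins, and both functions take an already-computed outcome rather than a real request. Only the `Authz { Allowed, Denied(msg) }` shape and the status mapping come from the commit text.

```rust
/// The authorization decision, kept separate from operational failure.
#[derive(Debug, PartialEq)]
enum Authz {
    Allowed,
    Denied(String),
}

/// Operational failures only (hypothetical variants for this sketch).
#[derive(Debug, PartialEq)]
enum AuthError {
    NoActor,
    PolicyEngine(String),
}

impl AuthError {
    /// No actor -> 401; Cedar evaluation error -> 500.
    fn status(&self) -> u16 {
        match self {
            AuthError::NoActor => 401,
            AuthError::PolicyEngine(_) => 500,
        }
    }
}

/// Thin generic wrapper: a denial surfaces as 403.
fn authorize_request(outcome: Result<Authz, AuthError>) -> Result<(), u16> {
    match outcome {
        Ok(Authz::Allowed) => Ok(()),
        Ok(Authz::Denied(_)) => Err(403),
        Err(error) => Err(error.status()),
    }
}

/// Stored-query invoke gate: a denial is masked as 404 (deny == missing),
/// while operational failures keep their true status.
fn invoke_gate(outcome: Result<Authz, AuthError>) -> Result<(), u16> {
    match outcome {
        Ok(Authz::Allowed) => Ok(()),
        Ok(Authz::Denied(_)) => Err(404),
        Err(error) => Err(error.status()),
    }
}

fn main() {
    let denied = || Ok(Authz::Denied("no invoke_query grant".to_string()));
    println!("generic: {:?}", authorize_request(denied()));
    println!("invoke:  {:?}", invoke_gate(denied()));
    println!(
        "engine:  {:?}",
        invoke_gate(Err(AuthError::PolicyEngine("eval failed".to_string())))
    );
}
```

Matching on the decision directly means the handler never has to inspect a status code that an earlier layer already collapsed.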

* Extract the named-graph/top-level coherence rule into one helper

The rule 'a named graph uses its own graphs.<name> block, so a populated
top-level block is a config error' lived inline in single-mode server boot.
Extract it to OmnigraphConfig::ensure_top_level_blocks_honored so the same
definition can be shared by the CLI selection gate (next commit) and the two
can't drift. Boot calls the helper; the message is reworded context-neutral
(drops 'serving') so it reads correctly from both boot and the CLI.

Behavior-preserving: multi-graph mode keeps its own unconditional check, and
single_mode_named_graph_rejects_top_level_blocks still passes.

* Test: queries validate/list must reject a named graph with a top-level block

Server boot refuses a config where a graph is selected by name yet a top-level
queries:/policy.file block is populated (the block would be silently ignored).
The CLI's queries validate/list resolve the same named selection but skip that
coherence check, so they give a false green / list the per-graph block. The new
test reproduces it: validate prints OK and list succeeds where boot would
refuse. Fails against current code; the fix follows.

* Enforce top-level coherence in the single CLI selection gate

queries validate validated graph membership only as a side effect of URI
resolution and queries list only via resolve_graph_selection's membership
check; neither applied the named-graph/top-level coherence rule server boot
enforces, so both gave a false green on a config boot refuses.

Fold ensure_top_level_blocks_honored into resolve_graph_selection so it is the
single gate that returns only valid + server-coherent selections, and route
resolve_selected_graph (queries validate) through it; queries list already
calls the gate. A named graph with a populated top-level block now errors in
both commands, matching boot. A positional URI stays anonymous (top-level
honored), so queries_validate_positional_uri_ignores_default_graph is
unaffected.

* docs: RFC-003 — MCP server surface for omnigraph-server

Detailed MCP-transport design for the stored-query/MCP work, building on the
shipped #128 registry. Corrects the draft against the branch head: the coarse
invoke_query gate + 404 denial-masking are already wired (server_invoke_query),
so per-query invoke_query scope (PolicyRequest has no query-name dimension yet)
is the real prerequisite; positions the doc as superseding rfc-001's MCP
transport (/mcp/tools+/mcp/invoke) and reconciles the shipped mcp.expose YAML
form and the schema-introspection non-goal; grounds the parity surface in the
actual omnigraph-ts package (13 tools with read/change ids, 2 resources).

* docs(config): clarify graph config boundaries

* fix(config): enforce graph-scoped policies and query validation

* fix(cli): require graph selection for scoped query registries

* fix(server): preserve named graph id in single mode policy

* fix(cli): share graph identity for policy resolution

* test(cli): cover policy tooling server graph selection

* fix(cli): honor server graph for policy tooling

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 22:50:31 +02:00


RFC: Config & CLI Architecture — Layered Config, Client Targeting, File Naming

Status: Proposed
Date: 2026-05-30
Tickets: MR-668 (multi-graph server, shipped; the dependency this builds on), MR-969 (stored queries + MCP; supplies the in-repo agent tool surface), MR-973 (quickstart / onboarding), MR-974 (agent setup surface), MR-981 (agent-friendly CLI hardening)
Target release: v0.8.x (tentative; phased, see Rollout)

Summary

OmniGraph today has a single config file, omnigraph.yaml, read both by the CLI (operating the embedded engine) and by omnigraph-server (hosting graphs). There is no client-side configuration that targets a running server — to talk to a deployed omnigraph-server you drop to curl or the omnigraph-ts client. This is the one real gap in an otherwise coherent design (storage-URI addressing, multi-graph routing, per-graph policy).

This RFC defines the config and CLI architecture that closes that gap, derived from first principles — working backwards from what OmniGraph uniquely enables rather than copying kubeconfig / helix.toml. The result:

  1. A global-first layered config — user-global (~/.omnigraph/) is the primary, self-sufficient default; per-project (./omnigraph.yaml) is an optional override + deployment manifest. One uniform schema, both layers optional; the CLI works from any directory with no project file (the kubectl/aws/gh posture), unlike today's project-anchored behavior.
  2. A single unifying noun — the target — that resolves a name to a concrete (locus, graph, sub-state, credential) tuple, where the locus is embedded (storage URI) XOR remote (server endpoint).
  3. A multi-server × multi-graph client model (OmniGraph hosts N graphs per server and there are M servers — unlike Helix's one-cluster-one-graph).
  4. Credentials by reference, keyed by server name (the AWS/gh/kube model): OS keychain omnigraph:<server> (preferred) → a [<server>] profile in ~/.omnigraph/credentials → OMNIGRAPH_TOKEN[_<SERVER>] env (CI). servers.<name> is endpoint-only by default but may carry an explicit, secret-free auth: { token: { env|file|command|keychain } } source; no credentials.yaml; the shipped bearer_token_env + dotenv stay as a legacy compat path. Every committed/GitOps'd surface stays secret-free.
  5. A file-naming decision: project and server config are the same artifact, same name (omnigraph.yaml); the only differently-named file is the user-global config.yaml, justified by scope, not role.

The design optimizes jointly for DX (one command surface across embedded and remote; clone-and-go) and AX (agent experience: one flat resolved context, secrets structurally unreachable, branch-pinned reproducible reads, and a GitOps'd capability surface).

Reconciliation with shipped / planned CLI work

Verified against the code, not ticket statuses (which are unreliable — e.g. MR-581 is marked done but is stale and unbuilt). Findings and the corrections they force:

  • Noun is graph/graphs, NOT target/targets. The config key is graphs: in config.rs and the flag is --graph. This RFC uses graphs:/--graph throughout; the unifying noun is a graphs: entry that is embedded (storage:, formerly uri:) XOR remote (server: + graph_id: defaulting to the entry key) — a typed locator (§1.1). Read any lingering targets:/--target below as graphs:/--graph.
  • ~/.omnigraph/ stands on its own merits (Helix/aws/kube peer convention), not on precedent — there is no ~/.omnigraph/ usage in the code today. (MR-581 / MR-531 templates-into-~/.omnigraph/ are stale tickets, unbuilt.)
  • Templates do not exist in the code (no template command). The template mechanism is a design question for this RFC / the init family, not an existing foothold.
  • What actually exists in the CLI (verified): init, query(read), mutate(change), load, ingest, branch, schema, lint, snapshot, export, commit, policy, optimize, cleanup, graphs. Not built: serve, quickstart, template, prune, login. omnigraph init exists (with scaffold_config_if_missing, main.rs:1415); the rest of the "init family" (quickstart MR-973, serve MR-970, prune/init --force MR-972/975, mcp install/skills MR-974, agent-mode MR-981) are unbuilt tickets, some stale.
  • Config still uses aliases: (no operations: in code; MR-839 unbuilt). §6's reconciliation talks about aliases: as-is, noting operations: is a proposed rename.
  • bearer_token_env exists (per-graph, config.rs); MR-971 flags a CLI-parity / server-side gap. The per-servers.<name> extension lands on top of that.
  • A top-level omnigraph lint command exists (verified). A stored-query registry validator must pick a verb that doesn't read as a competing lint/check.

Motivation

Three problems, in priority order:

  • No client→server targeting config. The moment an operator stands up omnigraph-server — for bearer auth + Cedar at a network boundary + admission control + multi-graph routing — the CLI can't address it. curl is the fallback. There is no named, switchable, credential-carrying way to say "run this against prod on the team server."
  • Multi-server × multi-graph has no first-class expression. OmniGraph genuinely runs N graphs per server across M servers. The same graph is multi-homed: s3://b/prod may be prod on server A, production on server B, and opened directly by the CLI. Today's flat graphs: map (name→storage-URI) can't express "graph production on server prod-eu."
  • Solo-first and embedded-first are unserved by the remote story. A solo developer with no projects should define everything in ~. A developer iterating locally (embedded, no server) and then pointing at staging (remote) should change one word, not learn a second command surface.

MR-668 shipped the server side (multiple graphs per server). MR-969 ships the in-repo agent tool surface (stored queries / MCP). This RFC supplies the client and config layer that lets humans and agents target that surface coherently — the foundation under MR-973 / MR-974 / MR-981.

Non-Goals

  • A control plane / dashboard for config. Operators edit files and (for servers) restart. No runtime config-mutation API. Matches the MR-668 / MR-969 operational model.
  • Hot reload. Restart-only for server-side config, matching MR-668 and MR-969.
  • Embedding secrets in any config file. Credentials are by-reference; the git-ignored auth.env_file dotenv (or, later, the OS keychain) holds tokens. Never a committable *.yaml.
  • Renaming the project manifest by role. No omnigraph.server.yaml / omnigraph.client.yaml. Role lives in sections, not filenames (see Design §3).
  • Dropping embedded mode. Embedded-first is load-bearing for the file-naming decision; this RFC assumes it stays.
  • Cross-graph / cross-server tool listing in MCP. Clients loop over per-graph catalogs (a MR-969 non-goal, restated).

Background

OmniGraph runs on Lance 6.x: typed nodes/edges in per-type Lance datasets, atomic multi-table commits via a __manifest table, branchable and time-travelable. The CLI (omnigraph) operates the embedded engine directly against a storage URI — no HTTP client in its runtime dependencies. omnigraph-server (Axum) is a separate HTTP front-end over the same engine, with bearer auth + per-graph Cedar (MR-668). The two read the same omnigraph.yaml but never connect to each other.

OmniGraph already has a credentials-by-reference mechanism, which this RFC builds on rather than replacing: TargetConfig.bearer_token_env names the env var holding a graph's bearer token, and auth.env_file points at a git-ignored dotenv (.env.omni) that the CLI auto-loads into the process (load_env_file_into_process) with real-env-vars-win precedence; resolve_remote_bearer_token resolves a token via env var then dotenv named lookup. .env.omni is already in .gitignore.

The six irreducible enablers that drive the design (referenced as E1–E6 below):

| # | Enabler | Consequence |
|---|---------|-------------|
| E1 | A graph is a self-contained storage URI; the substrate (object store + manifest CAS) is the source of truth, with no server required to read/write. | A graph is addressable directly (embedded), not only via a server. |
| E2 | A server hosts many graphs; many servers exist. | The remote address space is {server} × {graph_id}. |
| E3 | The same graph is multi-homed under different per-locus names. | Name ≠ identity. Resolution is mandatory. |
| E4 | Branch / commit / snapshot are first-class addressable sub-state. | An address is graph @ branch/snapshot, not just graph. |
| E5 | Enforcement is two-layered: engine-layer Cedar (_as writers, works embedded) + HTTP-boundary bearer+Cedar (server only). | How you reach a graph determines which enforcement applies. |
| E6 | Stored queries / MCP tools are a per-graph registry defined in the project config (MR-969). | The agent tool surface is version-controlled in the repo. |

Competitors collapse dimensions OmniGraph keeps live: Helix fuses E2+E3 (one cluster = one graph); namidb fuses E1+E3 into the URI (s3://b?ns=prod) and serves one namespace per process. OmniGraph has all of E1–E6 at once, so its config resolves a richer space, but the richness is earned by capability.

Design

1. The address space and the target abstraction

Every OmniGraph address is a tuple:

(locus, graph, sub-state, credential)
  locus      = embedded(URI)  XOR  remote(server-endpoint)        # E1, E2
  graph      = a URI (embedded)  |  a graph_id on a server (remote) # E3
  sub-state  = branch | snapshot                                   # E4
  credential = cloud-storage creds (embedded) | bearer token (remote) # E5

The config's only job is name → this tuple. Define one noun — a target — that resolves to either shape:

targets:
  dev:                       # embedded — substrate-direct (E1)
    storage: s3://team-bucket/dev.omni
    branch: main             # sub-state (E4)
  staging:                   # remote — resolves a server by reference (E2/E3)
    server: staging          # → looked up in `servers`
    graph_id: prod           # the graph's id on that server (defaults to the entry key)
    branch: review

--target staging resolves: project targets.staging → {server: staging, graph_id: prod, branch: review} → servers.staging → {endpoint, token-by-ref} → final (remote(https://…), prod, review, $TOKEN). Embedded targets skip the server hop and use cloud-storage credentials.

Two concepts, not kubeconfig's three. kube splits cluster / user / context; that 3-way split is its most-cursed UX. A target bundles server+graph+branch+defaults under one name; the only thing split out is servers, because endpoints+credentials are shared across many targets and are secret-bearing (different ownership and rate-of-change; see §2). Result: 2 nouns — servers and targets. Embedded targets (storage:) subsume today's graphs: entries.

1.1 The resolved address is a typed locator, not a uri string

The shipped config models a graph as a single uri: String, and code branches on is_remote_uri(uri). That conflates two structurally different addresses: an embedded graph is a complete, self-contained address — one storage URI = one graph, opened directly via the embedded engine; a remote graph is a server endpoint + a graph_id — one server hosts N graphs. A bare server URL is not a graph; it lacks the graph_id. The cost of the string model, in the code today:

  • the CLI re-decides "server or file?" via is_remote_uri at ~16 call sites;
  • TargetConfig (one uri field) cannot express multi-server × multi-graph or a multi-homed graph (E2/E3) — "graph production on server prod-eu" has no representation;
  • the CLI bails on remote URIs for most operations, precisely because the string can't carry the graph_id;
  • the omnigraph-ts SDK had to model baseUrl + graphId separately (rewriting /graphs/{graphId}/…) — it invented the structure the string lacks.

So the resolved address is a typed locator, not a string:

enum GraphLocator {
    Embedded { storage: StorageUri },                  // file:// , s3:// — a complete graph
    Remote   { server: ServerId, graph_id: GraphId },  // which server + which graph (+ bearer creds)
}

A graphs: entry resolves into this once; downstream code dispatches on the variant (the breadboard's GraphConn = Embedded(engine) | Remote(http)) instead of re-sniffing a scheme at each call site. The uri string becomes an input format for the embedded variant, never the address itself.

YAML naming follows the locator — the key names the locus, so neither the value's scheme nor a comment is load-bearing:

| Locus | Key | Value |
|-------|-----|-------|
| Embedded | storage: (shipped uri: is a deprecated alias) | a storage URI (s3://…, file://…) |
| Remote | server: | a name in servers: (its endpoint + creds resolve by name, §5) |
| Remote graph id | graph_id: | the id on that server; defaults to the entry key, set only when the local alias differs |

An entry has storage: xor server: — the deserializer rejects both and neither (no silent ambiguity). This removes two prior confusions: graphs: (the map) vs graph: (the remote id), and uri:-might-be-a-server.

servers:
  prod-eu: { endpoint: https://og-eu.internal:8080 }
graphs:
  dev:        { storage: s3://team-bucket/dev.omni }   # embedded
  production: { server: prod-eu }                       # remote — graph_id = "production" (the key)
  staging:    { server: prod-eu, graph_id: prod }       # remote — alias ≠ server's id

1.2 Invalid configs are rejected by design

The DX rule is: a config field is either honored or rejected, never silently ignored. The loader therefore has two phases:

  1. Parse YAML into a loose/raw shape that preserves origin (base_dir, layer, line/path when available).
  2. Convert once into a typed, role-aware resolved config. Every command receives the resolved form, not the raw YAML structs.

The typed graph shape is:

enum GraphEntry {
    Embedded(EmbeddedGraphEntry),
    Remote(RemoteGraphEntry),
}

struct EmbeddedGraphEntry {
    storage: StorageUri,
    branch: Option<BranchName>,
    policy: Option<PolicyFile>,
    queries: QueryRegistrySpec,
}

struct RemoteGraphEntry {
    server: ServerId,
    graph_id: GraphId,
    branch: Option<BranchName>,
}

That makes these rules structural rather than advisory:

  • A graph entry must specify exactly one locator: storage:/legacy uri: xor server:.
  • policy: and queries: are valid only on Embedded graph entries, because they define the capability surface of a graph this process opens directly. A Remote graph entry points at a server; that server owns policy and stored-query definitions.
  • omnigraph-server may serve only Embedded graph entries. A server manifest entry with server: is rejected: a server should not "host" a graph by proxying another server.
  • A named graph uses its own graph entry. Top-level policy: / queries: are a legacy anonymous-bare-URI compatibility path only; if a named graph is selected while top-level blocks would be ignored, config validation errors with a migration hint.
  • A client-defined remote graph discovers stored queries from the server (GET /queries) and invokes them (POST /queries/{name}); it does not define queries: locally for that remote graph.

Examples that must fail fast:

graphs:
  prod:
    storage: s3://team-bucket/prod.omni
    server: prod-us                         # invalid: storage xor server

graphs:
  prod:
    server: prod-us
    graph_id: production
    policy: { file: ./policies/prod.yaml }  # invalid: remote graph policy lives on the server
    queries:
      find_user: { file: ./queries/find_user.gq } # invalid: remote graph queries are discovered

omnigraph config view --resolved --show-origin is the user-facing debugger for this boundary: it shows the final Embedded or Remote graph and where every honored field came from. Fields that cannot be honored never make it into the resolved view; they fail validation first.
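
The role-aware rules above can be sketched as a check that runs after the locator is known. This is an illustration only: Role, Locus, EntryFacts and violations are hypothetical names, the error wording is invented, and returning every violation at once (rather than stopping at the first) is an assumption.

```rust
// Which process is reading the config.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    Server, // omnigraph-server
    Client, // CLI / agent
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Locus {
    Embedded,
    Remote,
}

// Hypothetical summary of one graph entry after the locator is resolved.
#[derive(Debug)]
struct EntryFacts {
    key: String,
    locus: Locus,
    has_policy: bool,
    has_queries: bool,
}

// Returns one message per rule the entry breaks; empty means valid.
fn violations(entry: &EntryFacts, role: Role) -> Vec<String> {
    let mut found = Vec::new();
    if entry.locus == Locus::Embedded {
        // policy: and queries: are valid on embedded entries in both roles.
        return found;
    }
    if entry.has_policy {
        found.push(format!(
            "graphs.{}: `policy` is invalid on a remote graph; the server owns policy",
            entry.key
        ));
    }
    if entry.has_queries {
        found.push(format!(
            "graphs.{}: `queries` is invalid on a remote graph; queries are discovered from the server",
            entry.key
        ));
    }
    if role == Role::Server {
        found.push(format!(
            "graphs.{}: omnigraph-server serves only embedded graph entries",
            entry.key
        ));
    }
    found
}
```

Under these assumptions the second failing example above (a remote prod entry carrying policy: and queries:) yields two violations in the client role and three in the server role.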

2. Layered config — global-first, uniform schema, project-optional

Posture: global-first, project-optional. OmniGraph's CLI is primarily a client (it operates against graphs and servers, embedded or remote), so it sits on the global-first side of the CLI-config axis — like kubectl / aws / gh / docker, and unlike project-first tools (git / cargo / terraform) whose primary config is per-repo. The global user config is the primary, self-sufficient default; the project file is an optional repo-scoped override (and, when present, the deployment manifest). omnigraph query --target prod must work from any directory with no project file, exactly as kubectl get pods --context prod works from anywhere. (This is a deliberate flip from today, where the CLI reads ./omnigraph.yaml and does not even walk parent dirs — i.e. today it is project-anchored.)

Rule: the two layers share ONE raw schema, and each is fully self-sufficient (the git-layering mechanism — same schema at both levels; you never need a repo to have a working config). Do not specialize the file format by layer. Instead, run the same role-aware validation everywhere (§1.2): the global and project layers may both define graph locators, defaults, servers, and aliases, but fields that are meaningless for a resolved graph variant are rejected rather than ignored. For example, queries: is valid for an embedded graph this config opens directly; it is invalid on a remote graph entry because remote stored queries are server-owned and discovered.

This makes the zero-project case the default, not an edge case: a solo user (or an agent) defines everything needed for client work in ~/.omnigraph/config.yaml — servers, embedded + remote graph locators, defaults, aliases, and optionally personal embedded-graph query registries — and never creates a project file. A team adds ./omnigraph.yaml only when it wants repo-scoped overrides or a committed, GitOps'd deployment manifest. Global-first does not forbid project files; it stops requiring them (the kubectl model: ~/.kube/config is sufficient and default; per-project kubeconfigs are opt-in via KUBECONFIG).

| Layer | Required? | Typical use | Path |
|-------|-----------|-------------|------|
| Global | no | the default: solo/agent's entire config; shared servers+creds for teams; even a personal server's graphs/queries | ~/.omnigraph/config.yaml |
| Project | no | opt-in: repo-scoped overrides + the committed deployment manifest (graphs, queries, policy) | ./omnigraph.yaml |

Precedence (low → high): built-in defaults < global < project < env vars < CLI flags. With no project file it collapses to built-in < global < env < flags — the common global-only path.

Merge semantics — "closest layer wins, at the smallest meaningful unit" (the field consensus: git / kubeconfig / cargo / Helm / VS Code):

  • Settings objects (defaults, auth, server) → deep-merge per field: a project sets defaults.graph and inherits the global defaults.output_format. (VS Code / cargo behavior.)
  • Named-resource maps (servers, graphs / compat targets, queries, aliases) → union by key; on a collision the higher layer's entry REPLACES the lower wholesale, with no field-level deep-merge within an entry. (kubeconfig: union contexts by name.) The footgun this avoids: global servers.prod = {endpoint, policy}, project servers.prod = {endpoint: other}. Deep-merge would silently retain the old fields; replace makes the project's prod self-contained and predictable.
  • Lists/arrays → replace, never append (Helm convention; appending is order-sensitive and surprising).
  • Scalars → higher layer wins.
  • Relative paths carry their origin's base_dir. A queries: entry's .gq path, or a policy.file, resolves against the directory of the layer it was defined in — global entries under ~/.omnigraph/, project entries under the project dir.
  • Inspectable (non-negotiable): omnigraph config view --resolved --show-origin prints each final value and which layer set it (the git config --show-origin / kubectl config view rule). A layered config without origin-tracing is a debugging trap.
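
The merge rules above can be sketched for two layers. This is a simplified model, not the loader: Layer, Fields, Entry and the function names are hypothetical, every value is flattened to a string, and only the global and project layers are modeled (env vars and flags are left out).

```rust
use std::collections::BTreeMap;
use std::path::PathBuf;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Layer {
    Global,
    Project,
}

// Simplification: every field value is a plain string.
type Fields = BTreeMap<String, String>;

// One named resource (e.g. servers.prod) plus where it was defined.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    fields: Fields,
    origin: Layer,     // what --show-origin would report
    base_dir: PathBuf, // directory of the layer that defined it
}

// Small helper for building field maps.
fn fields(pairs: &[(&str, &str)]) -> Fields {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

// Settings objects: deep-merge per field, project wins on a collision.
fn merge_settings(global: &Fields, project: &Fields) -> Fields {
    let mut merged = global.clone();
    merged.extend(project.iter().map(|(k, v)| (k.clone(), v.clone())));
    merged
}

// Named-resource maps: union by key, higher layer replaces the whole entry.
fn merge_named(
    global: BTreeMap<String, Entry>,
    project: BTreeMap<String, Entry>,
) -> BTreeMap<String, Entry> {
    let mut merged = global;
    merged.extend(project); // extend overwrites the value for an existing key
    merged
}

// Relative paths resolve against the defining layer's directory.
fn resolve_path(entry: &Entry, relative: &str) -> PathBuf {
    entry.base_dir.join(relative)
}
```

In the footgun case, a global servers.prod with endpoint and policy merged with a project servers.prod that sets only endpoint leaves a single-field entry whose origin is Project, so nothing from the global entry leaks through.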

3. Roles, and the file-naming decision (same name for project = server)

omnigraph.yaml carries two roles that diverge in prod and collapse on a laptop:

  • Server role (read by omnigraph-server): graphs: entries that are embedded storage locators, per-graph policy.file, queries: — the stored-query/MCP registry lives here, plus serving knobs. Remote graph locators are rejected in this role.
  • Client role (read by the CLI/agent): servers:, embedded or remote graphs: locators, defaults:, aliases:. A remote graph locator points at server-owned capabilities; it cannot define local policy: or queries:.

Project config and server config are the same artifact, hence the same name. The server serves the project: the file that says "these graphs exist, with these stored queries and this policy" is simultaneously the project manifest and the server's deploy config. Role is distinguished by which sections are populated, never by filename. Readers ignore sections that are not theirs (today's file already does this with cli: vs server:).

Why not kube's role-split. Two coherent models exist: (A) one project file with role-sections (Helix helix.toml holds both [local.dev] and [enterprise.production]; compose; Cargo), and (B) deployment-manifest strictly separate from client config (kubectl — you never put a context in deployment.yaml). kube is the sharpest topological analog (multi-server × multi-graph, one client targeting many), so B has a real claim. The tiebreaker is E1: OmniGraph is embedded-first. In embedded mode the manifest's graphs: is the local target list — manifest and local-client-view are the same object, so splitting them (B) fights the grain and forces two files for local work. kube splits because it has no embedded mode (client always remote+global). So: take the half kube is right about — remote client targeting (servers:, endpoints, creds) is a separate concern in a separate user-global file (config.yaml, like ~/.kube/config); reject the half it is wrong about for us — do not split the project layer by role. The second name (config.yaml) is justified by scope (user-global), not role. (If OmniGraph ever dropped embedded mode and went pure-remote, model B's strict split would become cleanest.)

4. File naming

Principles from the field: one global dir ~/.omnigraph/ (like ~/.aws/, ~/.kube/, ~/.helix), with config/cache/state as subdirectories (separation without XDG's three-root scatter); secrets keyed by server name in the OS keychain or a separate git-ignored profile file (AWS/gh model, not a new credentials.yaml); project-root manifest keeps the app-named file (Cargo.toml, package.json); .yaml, not .yml; keep OmniGraph's established names. The genuinely new decisions are the global dir's existence and keyed-by-name resolution with an explicit auth.token override (MR-971); the shipped bearer_token_env + auth.env_file mechanism remains as legacy compat.

| Artifact | Path / name | Why |
|----------|-------------|-----|
| Project = server config (one artifact) | ./omnigraph.yaml | Keep. Root manifest like Cargo.toml / compose.yaml / helix.toml. Same name for both roles because it is one file. In prod the server's deploy repo and an app repo each have their own omnigraph.yaml (same name, different repos). |
| Global user config | ~/.omnigraph/config.yaml | One dir (~/.omnigraph/, like ~/.aws/, ~/.kube/, ~/.helix). Named config.yaml not omnigraph.yaml; the name signals scope (and ~/.aws/config, ~/.kube/config, ~/.helix/config all do this). Holds the full schema so a solo user needs nothing else. |
| Credentials | OS keychain (omnigraph:<server>, preferred) → ~/.omnigraph/credentials profile file ([<server>], 0600, git-ignored). Keyed by server name, inside the one dir. | Key by name, AWS/gh model: ~/.aws/credentials [profile], ~/.kube/config users:, ~/.helix/credentials. Not a credentials.yaml, and not a per-server hand-named env var; the secret lives under the server name (no indirection). Legacy bearer_token_env + .env.omni dotenv remain as a compat path. See §5. |
| Cache / state | ~/.omnigraph/cache/, ~/.omnigraph/state/ | Subdirs of the one dir (like ~/.aws/sso/cache/, ~/.kube/cache/); cache is rm -rf-safe and backup-excludable without scattering across XDG roots. |
| Cedar policy | ./policies/<env>.yaml + <env>.tests.yaml | Keep. Referenced by policy.file. |
| Schema | ./*.pg (e.g. schema.pg) | Keep. |
| Stored queries | ./queries/*.gq | Keep. .gq sources referenced by the queries: registry. |

Global dir: ~/.omnigraph/ — one place, with subdirectories. Everything OmniGraph keeps for a user lives under a single ~/.omnigraph/ directory, matching the peer group (~/.aws, ~/.kube, ~/.docker) and the direct competitor (~/.helix). This is what DB/cloud-CLI users expect and the lowest-cognitive-load shape.

Separation and "one place" are not in conflict — the decisive realization. The peer tools get config/cache/state separation via subdirectories inside the one dir, not via XDG's three scattered roots: ~/.aws/sso/cache/, ~/.kube/cache/. So OmniGraph keeps ~/.omnigraph/config.yaml, ~/.omnigraph/credentials, ~/.omnigraph/cache/ (catalogs — rm -rf-safe, backup-excludable), ~/.omnigraph/state/ (session, logs) — getting cache hygiene and a single discoverable location, without the XDG scatter. An earlier draft argued XDG on a false dichotomy (it assumed single-dir ⇒ mixed); subdirs dissolve it. ~/.omnigraph/ is canonical and documented; $XDG_CONFIG_HOME may optionally be honored if a user has set it, but XDG is not part of the mental model.

Env / override precedence (the KUBECONFIG analog):

  • OMNIGRAPH_CONFIG=/path — explicit config file, highest precedence.
  • OMNIGRAPH_HOME=/path → the global dir (default ~/.omnigraph/); $XDG_CONFIG_HOME optionally honored if a user has set it, but ~/.omnigraph/ is canonical.
  • Cache and state are subdirs of the one dir: ~/.omnigraph/cache/ (cached remote catalogs), ~/.omnigraph/state/ (session, logs).
  • Per-server token resolution: an explicit auth: { token: {...} } source (env/file/command/keychain) wins if set; otherwise keyed by the server name: OMNIGRAPH_TOKEN_<NAME> (or OMNIGRAPH_TOKEN for the active server) → OS keychain omnigraph:<name> → the [<name>] profile in ~/.omnigraph/credentials; legacy bearer_token_env still honored. See §5.
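
Locating the global config file under these overrides could look like the sketch below. It is an assumption-laden illustration: global_config_path is a hypothetical name, the environment is injected as a closure so the logic stays testable, the optional $XDG_CONFIG_HOME handling is left out, and OMNIGRAPH_CONFIG is treated as replacing the global file path only (the text does not say whether it also suppresses the project layer).

```rust
use std::path::PathBuf;

// `env` looks up an environment variable; `home_dir` is the user's home.
fn global_config_path<F>(env: F, home_dir: &str) -> PathBuf
where
    F: Fn(&str) -> Option<String>,
{
    // An explicit file wins over everything else.
    if let Some(explicit) = env("OMNIGRAPH_CONFIG") {
        return PathBuf::from(explicit);
    }
    // Otherwise use the global dir: OMNIGRAPH_HOME, else ~/.omnigraph.
    let dir = env("OMNIGRAPH_HOME")
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from(home_dir).join(".omnigraph"));
    dir.join("config.yaml")
}
```

A real caller would pass something like `|k| std::env::var(k).ok()` as the lookup; cache/ and state/ would hang off the same resolved directory.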

5. Credentials, connection tiers, and bind portability (12-factor)

Credentials are by-reference everywhere, never inlined — and keyed by the server name, not by a hand-invented env-var name. This is the one place the design departs from simply reusing the shipped bearer_token_env mechanism, because that mechanism is sub-optimal for a multi-server client: it forces the operator to invent and coordinate an env-var name per server (three steps to add a server: pick a var, name it in config, set it in the store). The peer group (AWS profiles, gh hosts, kubeconfig users, docker auths) instead keys the secret by the server's name — no indirection. OmniGraph should match that.

Resolution for server <name> (no config field required):

  1. OMNIGRAPH_TOKEN_<NAME> env var (name-derived, upper-snake), else OMNIGRAPH_TOKEN for the active server — the CI/headless override (12-factor).
  2. OS keychain entry omnigraph:<name> — the preferred interactive store (no plaintext on disk); written by omnigraph login <name>.
  3. ~/.omnigraph/credentials — an AWS-style profile file keyed by server name (mode 0600, git-ignored), the fallback when no keychain:
    [prod-us]
    token = 
    [prod-eu]
    token = 
    

So a servers.<name> with no token field resolves by name — adding a server is one step (omnigraph login <name>), and "multiple servers, multiple tokens" falls out for free.
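
Reading the profile file could be as small as the sketch below. The exact file grammar is not specified here, so this assumes the AWS-style shape shown above ([name] section headers, token = value lines); parse_credentials is a hypothetical name, and skipping '#' comment lines and empty token values are assumptions.

```rust
use std::collections::HashMap;

// Parse profile text into server name -> token.
fn parse_credentials(text: &str) -> HashMap<String, String> {
    let mut tokens = HashMap::new();
    let mut current: Option<String> = None;
    for line in text.lines() {
        let line = line.trim();
        // Assumption: blank lines and '#' comments are ignored.
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // A "[name]" line starts a new profile.
        if let Some(name) = line.strip_prefix('[').and_then(|rest| rest.strip_suffix(']')) {
            current = Some(name.trim().to_string());
        } else if let (Some(profile), Some((key, value))) =
            (current.as_ref(), line.split_once('='))
        {
            // Only the `token` key is read; an empty value means "no token".
            if key.trim() == "token" && !value.trim().is_empty() {
                tokens.insert(profile.clone(), value.trim().to_string());
            }
        }
    }
    tokens
}
```

The returned map is keyed by server name, which is what makes the by-name lookup a single map access with no per-server config field.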

But implicit must not be the only path — explicit sourcing is a first-class option (the DX/AX lesson). Pure-convention is invisible (you must know OMNIGRAPH_TOKEN_<NAME>), can't integrate with a secrets-manager's fixed var name, and can't do dynamic/short-lived tokens. So a server may declare an explicit auth: block — a method-agnostic wrapper (today only token: for bearer; mtls:/oidc: are the future siblings, so the credential model never has to be re-keyed) holding a tagged token source. Secrets are still never inlined (every source is a reference):

servers:
  prod-us:
    endpoint: https://og-us…
    auth: { token: { env: OG_PROD_US_TOKEN } }     # explicit env var — self-documenting (= legacy bearer_token_env)
  prod-eu:
    endpoint: https://og-eu…
    auth: { token: { command: [vault, read, -field=token, secret/og] } }   # dynamic / short-lived
  edge:
    endpoint: https://og-edge…
    auth: { token: { file: /run/secrets/og-token } }   # k8s/docker mounted secret
  staging:
    endpoint: https://og-staging…          # no auth: → implicit chain (below)

| auth.token: source | when | DX/AX value |
|--------------------|------|-------------|
| (auth omitted) | the common case | zero-config; omnigraph login populates keychain omnigraph:<name> |
| { env: VAR } | secrets-manager / CI injects a fixed var | self-documenting: config states the source; = the legacy bearer_token_env |
| { file: PATH } | k8s/docker secret mounted as a file | no env plumbing |
| { command: [...] } | Vault, cloud IAM, gh auth token | dynamic tokens: first-class exec, the capability pure-env/keychain can't give (kube exec / AWS credential_process) |
| { keychain: ENTRY } | pin a non-default keychain entry | explicit override of the name-derived default |

Resolution per server: if auth.token: is set, use that source (no fallthrough). Else the implicit chain: OMNIGRAPH_TOKEN_<NAME> (or OMNIGRAPH_TOKEN for the active server) → keychain omnigraph:<name> → [<name>] in ~/.omnigraph/credentials (0600, git-ignored). omnigraph login <server> writes/rotates only that server's secret; per-server precedence is independent; sharing is opt-in (same env var or source). The command source runs locally with the operator's own privileges and is defined only in operator-owned config (never server-supplied), so it adds no remote-execution surface. The auth: wrapper is method-agnostic so adding mTLS/OIDC later is a new sibling key, not a breaking re-key (Hyrum's Law: the field name is a contract once shipped). There is no credentials.yaml and no inlined secret. Convention for the floor, explicit for control; explicit is also legible to agents and never inlines a secret.
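
The per-server resolution order can be sketched with the three stores injected as maps. This is an illustration only: Explicit, Stores, env_var_for and resolve_token are hypothetical names, only the env and keychain explicit sources are modeled (file and command are omitted), and mapping every non-alphanumeric character of the server name to '_' is an assumed reading of "upper-snake".

```rust
use std::collections::HashMap;

// Subset of the tagged `auth: { token: {...} }` sources.
enum Explicit {
    Env(String),      // { env: VAR }
    Keychain(String), // { keychain: ENTRY }
}

// In-memory stand-ins for the process env, OS keychain and profile file.
#[derive(Default)]
struct Stores {
    env: HashMap<String, String>,
    keychain: HashMap<String, String>,
    credentials: HashMap<String, String>, // server name -> token
}

// Assumed derivation: "prod-us" becomes "OMNIGRAPH_TOKEN_PROD_US".
fn env_var_for(server: &str) -> String {
    let suffix: String = server
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() { c.to_ascii_uppercase() } else { '_' })
        .collect();
    format!("OMNIGRAPH_TOKEN_{suffix}")
}

fn resolve_token(
    server: &str,
    is_active: bool,
    explicit: Option<&Explicit>,
    stores: &Stores,
) -> Option<String> {
    // An explicit source is used alone: no fallthrough to the implicit chain.
    if let Some(source) = explicit {
        return match source {
            Explicit::Env(var) => stores.env.get(var).cloned(),
            Explicit::Keychain(entry) => stores.keychain.get(entry).cloned(),
        };
    }
    // Implicit chain: name-derived env, OMNIGRAPH_TOKEN (active server only),
    // keychain omnigraph:<name>, then the [<name>] profile.
    stores
        .env
        .get(&env_var_for(server))
        .or_else(|| if is_active { stores.env.get("OMNIGRAPH_TOKEN") } else { None })
        .or_else(|| stores.keychain.get(&format!("omnigraph:{server}")))
        .or_else(|| stores.credentials.get(server))
        .cloned()
}
```

Because each server resolves independently, a token present only in the keychain for prod-us does not affect prod-eu, and an explicit source that resolves to nothing returns None rather than falling back.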

Back-compat. The shipped per-graph bearer_token_env + auth.env_file dotenv (resolve_remote_bearer_token, real-env-wins) keeps working unchanged for existing single-server setups; bearer_token_env is just the legacy flat alias for auth: { token: { env } }. Resolution tries an explicit auth.token: (or legacy bearer_token_env) first, then the keyed-by-name chain — so nothing breaks, but the zero-config default is the no-boilerplate keyed-by-name path. (MR-971 — the bearer_token_env parity gap — is where this resolver work lands.)

Three connection tiers (Supabase/Prisma teach the zero-config floor):

  1. Env vars: OMNIGRAPH_SERVER=https://… + OMNIGRAPH_TOKEN=… give zero-config remote access with no file (the DATABASE_URL floor).
  2. Global config.yaml — named servers: + graphs: for multi-server setups (the AWS-profiles convenience).
  3. Project omnigraph.yaml — project-pinned targets/graphs, committed.

Keep omnigraph.yaml a portable manifest (12-factor). Deploy-specific runtime that varies per environment — the bind host/port, worker counts — should be supplied by --bind / OMNIGRAPH_BIND (flags/env), not a committed server.bind: baked into the manifest. A manifest that hardcodes 0.0.0.0:8080 is not portable across deploys and leaks an environment detail into a version-controlled file. The same-named omnigraph.yaml stays portable across deploys precisely because the volatile, per-environment knobs live in env/flags (12-factor config), while the stable, portable definition (graphs, queries, policy) lives in the file. This is the one concrete lesson taken from kube's model-B without adopting its file split: portability via env/flags, not via a second file.
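
A small sketch of how the bind address could be resolved under this rule. The precedence shown (flag, then OMNIGRAPH_BIND, then the still-supported server.bind, then a default) is an assumption, as are the function name and the DEFAULT_BIND value; the RFC only states that flags/env are the preferred source.

```python
from typing import Mapping, Optional, Tuple

DEFAULT_BIND = "127.0.0.1:8080"  # hypothetical default, not taken from the RFC


def resolve_bind(
    flag: Optional[str],
    env: Mapping[str, str],
    manifest: Mapping[str, object],
) -> Tuple[str, str]:
    """Return (address, origin). Assumed precedence: flag > env > manifest > default."""
    if flag:
        return flag, "--bind"
    if env.get("OMNIGRAPH_BIND"):
        return env["OMNIGRAPH_BIND"], "OMNIGRAPH_BIND"
    server = manifest.get("server") or {}
    if isinstance(server, dict) and server.get("bind"):
        # Still honored for back-compat, but it pins a deploy detail into the file.
        return server["bind"], "server.bind"
    return DEFAULT_BIND, "default"
```

Returning the origin alongside the value is what lets a `config view --show-origin` style command explain where the address came from.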

6. Where stored queries live: defined locally, invoked remotely

A stored query splits across two axes; do not conflate them:

  • Definition (.gq source + queries: entry) lives next to the embedded graph entry that owns it. For a hosted remote graph, that is the deployment manifest read by omnigraph-server; for a personal embedded graph, it may be the user's own config. It never lives on a client-side Remote graph entry.
  • Discovery ("what tools exist for me?") is fetched from the server (Cedar-filtered GET /queries / MCP catalog) at connect time.
  • Invocation is remote (client → server, HTTP/MCP) — or embedded (the CLI opens the graph directly and reads the same manifest).

For remote use, the client carries pointers to servers, not query definitions; it discovers and invokes, never defines. This is the capability-as-code guarantee for agents: an agent can only invoke tools the server's committed, reviewed config exposes — it cannot define a new tool at runtime. Definition is structurally outside the agent's reach.

queries: (graph-capability registry, Cedar-gated when served remotely, MCP-visible when exposed) and aliases: (client CLI shortcut) overlap: both can name .gq-backed operations. This RFC keeps them siblings (the MR-969 decision); the clean long-term shape is one registry with two invocation surfaces (embedded + remote), with aliases: subsumed. That convergence is out of scope here.

Reconciling aliases: with the role model

aliases: is the pre-MR-969, client-role, embedded-only, ungated ancestor of queries:. An alias bundles command (read/change), query (.gq path), name (symbol), args (positional param names), and graph/branch/format defaults; the CLI runs it embedded. The server never reads it. So:

  • Role: aliases: is client-role (CLI behavior) → it may live in both the user-global config.yaml and the project manifest, layered. queries: is graph-capability role → it lives only on an Embedded graph entry, and for remote server graphs that means the server deployment manifest. Who opens the graph determines where query definitions can live.
  • Difference: aliases: = embedded invocation, no gating, explicit command, bundles client defaults + positional args. queries: = remote (+future embedded), Cedar + mcp.expose, infers read/mutate, bundles only MCP settings.
  • Convergence: decompose an alias — definition (name→.gq+symbol) → queries: (the superset: typed, validated, gated, multi-surface, no redundant command); target/branch/format → client invocation context (--target/--branch/--format or defaults:), not baked per-query; positional args → thin CLI sugar or dropped (agents/services use named JSON params). End-state: one queries: registry + the client config model subsumes aliases:.
  • Validation: a file-backed alias (query: ./foo.gq) may target only an embedded graph. A remote graph shortcut must be explicit that it invokes a server-owned stored query, e.g. invoke: find_user, so the client cannot smuggle a new .gq definition into a remote capability surface.
  • v1: keep aliases: unchanged. Footgun worth a load-time warn: an alias and a query with the same name in one manifest are different namespaces invoked differently (--alias X vs POST /queries/X).
aliases:
  local_owner:
    command: query
    query: ./queries/owner.gq
    name: owner
    graph: dev          # valid only if `dev` resolves Embedded

  remote_owner:
    invoke: find_user
    graph: prod         # valid only if `prod` resolves Remote; source lives on the server
    args: [name]
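
The validation rule and the same-name warning above could be checked at load time roughly like this. It is a sketch under assumptions: the function names are hypothetical, configs are plain dicts, and treating "both query: and invoke: set" as an error is a guess the RFC does not spell out.

```python
from typing import Dict, List, Optional, Tuple


def graph_kind(entry: dict) -> str:
    """Classify a graphs: entry: 'remote' if it has server:, else 'embedded'."""
    return "remote" if "server" in entry else "embedded"


def validate_aliases(
    aliases: Dict[str, dict],
    graphs: Dict[str, dict],
    queries: Optional[Dict[str, dict]] = None,
    default_graph: Optional[str] = None,
) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for an aliases: block."""
    errors: List[str] = []
    warnings: List[str] = []
    for name, alias in aliases.items():
        if queries and name in queries:
            warnings.append(
                f"'{name}' is both an alias and a stored query; these are separate namespaces"
            )
        has_file, has_invoke = "query" in alias, "invoke" in alias
        if has_file == has_invoke:
            # Assumption: exactly one of the two forms must be present.
            errors.append(f"alias '{name}': set exactly one of query: or invoke:")
            continue
        target = alias.get("graph", default_graph)
        if target not in graphs:
            errors.append(f"alias '{name}': unknown graph '{target}'")
            continue
        kind = graph_kind(graphs[target])
        if has_file and kind != "embedded":
            errors.append(
                f"alias '{name}': a file-backed query: may only target an embedded graph "
                f"('{target}' is {kind})"
            )
        if has_invoke and kind != "remote":
            errors.append(
                f"alias '{name}': invoke: requires a remote graph ('{target}' is {kind})"
            )
    return errors, warnings
```

Collecting every error instead of stopping at the first mirrors the registry loader's collect-not-fail-fast behavior, so one run surfaces all broken aliases.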

7. CLI surface

  • omnigraph login <server> — interactive auth; stores the token keyed by server name in the OS keychain (omnigraph:<server>) or the [<server>] profile of ~/.omnigraph/credentials (0600). The gh auth login analog.
  • omnigraph use <graph> — set the active graph (writes the appropriate layer). The kubectl config use-context analog.
  • omnigraph config view [--resolved] [--show-origin] [<graph>] — print the merged config and, with --resolved, the final tuple plus the origin layer of every field (the git config --show-origin / kubectl config view analog). Resolution is never a mystery.
  • All existing verbs (query, mutate, load, schema, branch, …) gain --graph <name>; resolution decides embedded vs remote transparently.

7.5 Init, login, and bootstrap — three tiers (folds in the Q2 design)

Scaffolding splits into three tiers by scope and fatness, mirroring the field (supabase init vs login; HelixDB thin init vs fat chef). Most of this lives in sibling tickets; this RFC owns only the user route.

| Tier | Command | Scope | What it does | Model | Status |
| --- | --- | --- | --- | --- | --- |
| User route | omnigraph login [<server>] | user (~/.omnigraph/) | auth + write ~/.omnigraph/config.yaml / credentials; first-run global setup | gh / supabase login | this RFC (unbuilt) |
| Thin project init | omnigraph init | project, in-place | create graph + scaffold_config_if_missing (omnigraph.yaml + minimal .pg/.gq); refuse-if-exists or --force | cargo init, prisma init | exists; --force purge = MR-975 |
| Fat bootstrap | omnigraph quickstart [--template <t>] [--auto] | project, possibly new-dir | scaffold + seed data + serve start + agent prompt file | HelixDB chef, create-next-app | MR-973 (unbuilt) |

Design positions (first-principles, since none of the fat tier is built):

  • Split init (project) from login (user) — never one command writing to both $HOME and the project (the supabase line, not the dbt line). init=project scaffold; login=user credential + global config.
  • init is in-place + refuse-if-exists (cargo/prisma/terraform default): don't clobber; adopt existing files; require --force to overwrite (and --force purges Lance state per MR-975).
  • Interactive for humans, --auto/agent-mode for automation (npm -y, create-* --CI, MR-981 --machine). In OMNIGRAPH_AGENT_MODE any prompt → fail with a repair hint.
  • Templates are a --template <name> flag on the fat tier (create-vite model), with the content (schema + queries + seed) coming from a template source. Mechanism is a design question (bundled-in vs og template pull from a repo vs npm create-*-style delegation) — not an existing foothold (MR-581 stale). Lean: a small set of bundled templates first (generic Person→Knows, plus promote omnigraph-intel-bootstrap), --template <github> later.
  • init/quickstart can scaffold the graphs: map with one or more entries; "init with specific graphs" = the scaffolded graphs: block (embedded storage: locally; the agent/operator adds remote server: entries via login + editing).
  • Secrets-on-scaffold rule (prisma/dbt/supabase all do this): anything that writes a token also keeps it out of VCS. login prefers the OS keychain (no file); the ~/.omnigraph/credentials profile fallback is 0600 and git-ignored, and any project-local .env-shaped file gets a .gitignore entry.
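
Two of the positions above (refuse-if-exists for init, and never prompting in agent mode) can be sketched together. Everything here is illustrative: RepairableError, confirm, and init_manifest are hypothetical names, the placeholder manifest content is invented, and the Lance-state purge that --force implies under MR-975 is not modeled.

```python
from pathlib import Path
from typing import Callable, Mapping


class RepairableError(Exception):
    """An error carrying a hint that a caller (or agent) can act on without a prompt."""

    def __init__(self, message: str, hint: str):
        super().__init__(message)
        self.hint = hint


def confirm(question: str, env: Mapping[str, str], ask: Callable[[str], str] = input) -> bool:
    """Ask a yes/no question, or fail with a repair hint when in agent mode."""
    if env.get("OMNIGRAPH_AGENT_MODE"):
        raise RepairableError(
            f"refusing to prompt in agent mode: {question}",
            hint="re-run with an explicit flag such as --force or --auto",
        )
    return ask(f"{question} [y/N] ").strip().lower() == "y"


def init_manifest(project_dir: Path, force: bool = False) -> Path:
    """Scaffold omnigraph.yaml in place; refuse to clobber unless force is set."""
    manifest = project_dir / "omnigraph.yaml"
    if manifest.exists() and not force:
        raise RepairableError(f"{manifest} already exists", hint="pass --force to overwrite it")
    manifest.write_text("graphs: {}\n")  # placeholder content for the sketch
    return manifest
```

The point of the hint field is that an automated caller gets the next action as data instead of having to parse a human-oriented message.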

8. Concrete shape

Global ~/.omnigraph/config.yaml (per-user, secret-free):

servers:                               # endpoint only — token is keyed by the server name
  prod-us:  { endpoint: https://og-us.internal:8080 }
  prod-eu:  { endpoint: https://og-eu.internal:8080 }
  staging:  { endpoint: https://og-staging.internal:8080 }
graphs:
  personal: { storage: ~/graphs/personal.omni }
defaults:
  graph: personal
aliases:
  my_people:
    command: query
    query: ~/queries/people.gq
    name: list_people
    graph: personal

Project client ./omnigraph.yaml (committed, secret-free, portable — no server.bind). Note the shipped noun is graphs: (MR-603); an entry is embedded (storage:) XOR remote (server: + graph_id:, §1.1):

graphs:
  dev:      { storage: s3://team-bucket/dev.omni, branch: main }  # embedded
  staging:  { server: staging, graph_id: prod, branch: review }   # remote → graph `prod` on server `staging`
  prod-us:  { server: prod-us, graph_id: production }
  prod-eu:  { server: prod-eu, graph_id: production }             # multi-homed: same graph, another server
defaults: { graph: dev, output_format: table }
aliases:
  owner:
    command: query
    query: ./queries/owner.gq
    name: owner
    args: [name]
    graph: dev

Select with --graph <name> (shipped flag, MR-603).
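
The embedded XOR remote rule for a graphs: entry can be sketched as a resolver that returns a typed locator. This is an illustration in Python with hypothetical names (Embedded, Remote, resolve_graph); it assumes graph_id: is required on remote entries and that server names are looked up in a merged servers: map.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class Embedded:
    storage: str
    branch: Optional[str] = None


@dataclass(frozen=True)
class Remote:
    endpoint: str
    graph_id: str
    branch: Optional[str] = None


def resolve_graph(
    name: str,
    graphs: Dict[str, dict],
    servers: Dict[str, dict],
) -> Union[Embedded, Remote]:
    """Turn a named graphs: entry into an Embedded or Remote locator."""
    entry = graphs.get(name)
    if entry is None:
        raise KeyError(f"unknown graph '{name}'")
    has_storage, has_server = "storage" in entry, "server" in entry
    if has_storage == has_server:
        raise ValueError(f"graph '{name}': set exactly one of storage: or server:")
    if has_storage:
        return Embedded(entry["storage"], entry.get("branch"))
    if "graph_id" not in entry:
        raise ValueError(f"graph '{name}': a remote entry needs graph_id:")
    server = servers.get(entry["server"])
    if server is None:
        raise ValueError(f"graph '{name}': unknown server '{entry['server']}'")
    return Remote(server["endpoint"], entry["graph_id"], entry.get("branch"))
```

With the project config above and the global servers: map, `resolve_graph("staging", graphs, servers)` would yield a Remote locator for graph `prod` pinned to branch `review`, while `resolve_graph("dev", ...)` stays Embedded; the command code then dispatches on the locator type.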

Server deployment ./omnigraph.yaml (committed in the deploy repo, read by omnigraph-server). Every served graph is an embedded storage locator; server-owned policy and stored-query definitions live here:

graphs:
  production:
    storage: s3://team-bucket/prod.omni
    policy:
      file: ./policies/prod.yaml
    queries:
      find_user:
        file: ./queries/find_user.gq
        mcp: { expose: true, tool_name: lookup_user }

server:
  policy:
    file: ./policies/server.yaml
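
To illustrate how the mcp: settings in a queries: block translate into a tool surface, here is a minimal sketch that derives the exposed tool names for one graph. The function name is hypothetical; falling back to the query name when tool_name is absent and rejecting duplicate tool names are assumptions, and Cedar filtering is deliberately left out.

```python
from typing import Dict


def mcp_catalog(queries: Dict[str, dict]) -> Dict[str, str]:
    """Map MCP tool name -> stored-query name for the exposed entries of one graph."""
    catalog: Dict[str, str] = {}
    for query_name, entry in queries.items():
        mcp = entry.get("mcp") or {}
        if not mcp.get("expose", False):
            continue  # entries default to not-exposed
        tool = mcp.get("tool_name") or query_name  # assumed fallback
        if tool in catalog:
            raise ValueError(f"duplicate MCP tool name '{tool}'")
        catalog[tool] = query_name
    return catalog
```

For the manifest above this yields a single tool, `lookup_user`, backed by the stored query `find_user`; a query without an mcp: block stays invocable by name but never appears as a tool.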

Credentials are keyed by server name — omnigraph login prod-us writes the OS keychain entry omnigraph:prod-us (or a [prod-us] profile in ~/.omnigraph/credentials, 0600, git-ignored); OMNIGRAPH_TOKEN_PROD_US overrides for CI. No token fields in any config file; no committable secrets.

DX

  1. One command surface, two loci. query --graph dev (embedded) and --graph staging (remote) are the same command; only resolution differs. Change one word, not a mental model.
  2. Clone-and-go. Project config names servers+graphs; teammate runs omnigraph login staging once and every target resolves. The git + gh auth login model.
  3. Multi-server × multi-graph is the default. Remote graph entries reference server by name; servers is a global named map; graphs are per-server. prod-us and prod-eu both serving production is two graph entries — Helix cannot express this.
  4. Solo-first. Everything in ~, no project required.
  5. Laptop-to-fleet on one schema. Local = one omnigraph.yaml (both roles); prod = role-split across repos. No second format to learn.

AX (agent experience)

  1. One flat resolved context, never a config to navigate. target→server→endpoint→token resolves before the agent sees anything. The agent reasons about tools, not topology (the LLM-safe-surface principle extended to config).
  2. Secrets are structurally outside the agent's reach. The repo it operates in has no tokens; they are in the global layer / keychain, outside its view. An agent cannot exfiltrate a prod token from project config because it is not there.
  3. Branch/snapshot-pinned contexts (E4) — hand an agent a branch: review / --snapshot v42 target and its reads are reproducible and cannot see uncommitted main-line state. No kubeconfig analog.
  4. The agent's capabilities are a GitOps'd artifact (E6) — which graphs exist, which stored-query tools it may call, and which Cedar rules gate them are all in the version-controlled server config. Powers change only via a reviewed PR, deployed by restart. Infrastructure-as-code for what the AI can do.
  5. Config + policy compose. Config = "where am I pointed + which token"; Cedar = "what may I do there." Orthogonal; no enforcement logic leaks into config.

GitOps — three surfaces, secrets in none

| Surface | Repo | Contents | Deploy | Secrets |
| --- | --- | --- | --- | --- |
| Server deployment config | infra/deploy repo | graphs:, policy, queries: + .gq files | commit → CI → server restart (no hot reload) | none (by-reference) |
| Project client config | app repo | graphs: → embedded storage or remote server+graph | committed, read by CLI/agent | none |
| Global user config | not GitOps'd (machine-local ~) | servers: + creds-by-ref | omnigraph login writes it | refs only (like ~/.kube/config) |

Comparison

Property kubeconfig Helix git compose OmniGraph (this RFC)
Named remote endpoints + creds-by-ref partial partial (global servers)
Global + project layering, uniform schema
Embedded OR remote under one name n/a (E1)
Multi-server × multi-graph n/a n/a (E2)
Branch/snapshot in the address partial (E4)
Agent tool surface in the repo ✗ (separate bundle) n/a n/a (E6)
Project manifest renamed by role no no no
Concept count 3 1 2 1 2 (servers/targets)

Migration / backwards compatibility

  • Additive. Today's omnigraph.yaml (graphs:, cli:, server:, aliases:, policy:) keeps working unchanged. graphs: entries are equivalent to embedded targets: with a storage: (shipped uri: is a deprecated alias); both resolve.
  • targets: is new and optional. servers: is new and optional. Absent → today's behavior.
  • Global ~/.omnigraph/config.yaml is new. Absent → only project + env + flags, exactly as now. Its addition is the global-first posture flip: today the CLI is project-anchored (reads ./omnigraph.yaml, no parent walk); the global config becomes the new primary discovery path so the CLI works with no project file. Existing project-only workflows are unchanged (project still overrides global); the flip is additive — it adds a fallback layer below the project file, it does not remove the project file.
  • graphs: → targets: is an evolution, not a break. Both can coexist; targets: is the superset (adds remote + branch pinning). A future cleanup may alias graphs: to embedded targets:.
  • server.bind stays supported but documentation steers operators to --bind / OMNIGRAPH_BIND for portability; no removal.
  • Credentials: keyed-by-name is new; bearer_token_env is the compat path. The primary design (keychain / [<server>] profile / OMNIGRAPH_TOKEN_<SERVER>) is new resolver work (lands on MR-971). The shipped bearer_token_env + auth.env_file dotenv (resolve_remote_bearer_token) is unchanged and still honored — existing single-server dotenv setups keep working, and the resolver honors an explicit auth: { token: {...} } source (env/file/command/keychain) with bearer_token_env as its flat legacy alias. No credentials.yaml.
  • Validation tightens invalid mixes, not valid legacy use. Top-level policy: / queries: remain only for anonymous bare-URI compatibility. Named graphs use per-entry fields. Remote graph entries with local policy: / queries: and server manifests with server: graph locators are rejected because there is no correct way to honor those fields.
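
The rejected combinations in the last bullet could be enforced by a check along these lines. This is a sketch: the function name and the "client" / "server" role labels are hypothetical, and it covers only the two invalid mixes named above.

```python
from typing import Dict, List


def validate_graph_entries(graphs: Dict[str, dict], role: str) -> List[str]:
    """Collect invalid role/field mixes; role is 'client' or 'server' (hypothetical labels)."""
    if role not in ("client", "server"):
        raise ValueError(f"unknown role '{role}'")
    errors: List[str] = []
    for name, entry in graphs.items():
        if "server" not in entry:
            continue  # embedded entries may carry policy: and queries:
        if role == "server":
            errors.append(
                f"graph '{name}': a server manifest cannot serve a remote (server:) locator"
            )
        for field in ("policy", "queries"):
            if field in entry:
                errors.append(
                    f"graph '{name}': {field}: cannot be honored on a remote entry"
                )
    return errors
```

Legacy embedded entries pass through untouched, which is the sense in which validation tightens invalid mixes without breaking valid existing configs.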

Open questions

  • graphs: vs targets: naming churn. Do we rename graphs: → targets: (with a deprecation alias) or keep graphs: for embedded and add targets: for remote? Leaning: keep both, document targets: as the superset.
  • Keychain integration scope. Keychain is now the primary credential store (§5), so this is on the critical path, not optional: macOS Keychain first (matches operator practice) with the 0600 [<server>] profile file as fallback; Linux Secret Service / pass later. Open: which keyring crate, and the exact OMNIGRAPH_TOKEN_<SERVER> name-derivation (upper-snake, non-alnum → _).
  • Project-local servers:. Allowed (e.g. a localhost dev server), merged with global. Confirm creds stay by-reference even for project-local servers (yes).
  • aliases: → queries: convergence. Out of scope here; tracked separately. One registry with embedded + remote invocation surfaces is the target end state.
  • Single-file KUBECONFIG-style list. Do we support OMNIGRAPH_CONFIG pointing at multiple files (colon-joined), or a single file only? Start single; revisit if demand appears.

Implementation — breadboard + slices (Shape A)

Shaped via requirements + a fit check (Shape A — global-first layered config + unified graphs: entry + three-tier init — selected over a project-first minimal option and a Helix-clone). This section breadboards A and slices it. Bold = NEW.

Places

| # | Place | What |
| --- | --- | --- |
| P1 | Disk | ~/.omnigraph/{config.yaml, credentials, cache/, state/} + project omnigraph.yaml + .env.omni |
| P2 | Config resolution | runs on every command: load layers → merge → resolve --graph |
| P3 | Command execution | embedded engine OR remote HTTP client |
| P4 | Remote omnigraph-server | existing HTTP surface (/query, /mutate, /queries/{name}) |
| P5 | Scaffold | login / init / quickstart |

Affordances

| # | Place | Affordance | NEW? | Wires |
| --- | --- | --- | --- | --- |
| U1 | P1 | ~/.omnigraph/config.yaml (operator edits) | N | → N1 |
| U2 | P1 | project ./omnigraph.yaml | | → N1 |
| U3 | P1 | ~/.omnigraph/credentials / .env.omni dotenv (secrets, git-ignored) | | → N4 |
| U4 | P3 | omnigraph <verb> --graph <name> (any command) | | → N14 |
| U5 | P5 | omnigraph login [<server>] | N | → N11 |
| U6 | P5 | omnigraph init / quickstart [--template] | partly | → N12 / N13 |
| U7 | P2 | omnigraph config view --resolved --show-origin | N | → N10 |
| N1 | P2 | load_layered_config(): global (N3) + project (cwd), serde each | N | → N2 |
| N2 | P2 | merge engine: deep-merge settings; replace named-resource entries; replace lists; retain provenance and raw field origins | N⚠️ | → N5, → S_merged |
| N3 | P2 | global-dir resolver: OMNIGRAPH_HOME else ~/.omnigraph/ | N | → N1 |
| N4 | P2 | load_env_file_into_process: dotenv, real-env-wins (existing) | | → N9 |
| N5 | P2 | resolve_graph(name, merged) → typed Embedded/Remote locator; rejects invalid role/field combinations before execution | N⚠️ | → N6 |
| N6 | P3 | GraphConn: Embedded(engine) \| Remote(http) dispatch | N⚠️ | → N7, → N8 |
| N7 | P3 | embedded path: Omnigraph::open(uri) (existing) | | → engine |
| N8 | P3 | HTTP-client path: POST /query, /mutate, /queries/{name} | N⚠️ | → P4, → N9 |
| N9 | P2 | resolve_bearer_token(server): explicit auth.token source if set, else keyed by name: OMNIGRAPH_TOKEN_<NAME>/OMNIGRAPH_TOKEN → keychain omnigraph:<name> → [<name>] profile; legacy bearer_token_env/dotenv (MR-971) | N⚠️ | → N8 |
| N10 | P2 | config view handler: merged + per-field origin (needs N2 provenance) | N | → U7 |
| N11 | P5 | login handler: interactive auth → write config.yaml + credentials (0600) + .gitignore | N⚠️ | → S_global |
| N12 | P5 | init handler: scaffold_config_if_missing + create graph; refuse-if-exists/--force purge (MR-975) | partly | → S_project |
| N13 | P5 | quickstart handler: scaffold + --template + seed + serve start + agent prompt (MR-973; needs serve MR-970) | N⚠️ | → S_project |
| N14 | P3 | agent-mode wrapper: --machine/OMNIGRAPH_AGENT_MODE: JSON, structured errors, never-prompt, typed exit codes (MR-981) | N⚠️ | → N1 |
| S_global | P1 | ~/.omnigraph/config.yaml + credentials | N | read by N1/N9 |
| S_project | P1 | ./omnigraph.yaml + .env.omni | | read by N1/N4 |
| S_merged | P2 | in-memory resolved config (per command, with provenance) | N | read by N5/N10 |
| S_cache | P1 | ~/.omnigraph/cache/ (remote catalogs) | N | read by N8 |

flowchart TB
  subgraph P1["P1: Disk"]
    U1["U1: ~/.omnigraph/config.yaml"]
    U2["U2: ./omnigraph.yaml"]
    U3["U3: credentials dotenv"]
  end
  subgraph P2["P2: Config resolution"]
    N3["N3: global-dir (OMNIGRAPH_HOME)"]
    N1["N1: load_layered_config"]
    N2["N2: merge engine (+provenance)"]
    N4["N4: dotenv loader"]
    N5["N5: resolve_graph(--graph)"]
    N9["N9: resolve_bearer_token"]
    N10["N10: config view"]
  end
  subgraph P3["P3: Command execution"]
    U4["U4: omnigraph <verb> --graph"]
    N14["N14: agent-mode wrapper"]
    N6["N6: GraphConn embedded|remote"]
    N7["N7: embedded Omnigraph::open"]
    N8["N8: HTTP-client POST"]
  end
  subgraph P5["P5: Scaffold"]
    U5["U5: login"]; U6["U6: init/quickstart"]
    N11["N11: login handler"]; N12["N12: init"]; N13["N13: quickstart"]
  end
  P4["P4: remote omnigraph-server"]
  U1-->N1; U2-->N1; N3-->N1; N1-->N2-->N5-->N6
  U3-->N4-->N9-->N8
  U4-->N14-->N1
  N6-->N7; N6-->N8-->P4
  N2-->N10-->U7["U7: config view --resolved"]
  U5-->N11; U6-->N12; U6-->N13
  classDef ui fill:#ffb6c1,stroke:#d87093,color:#000
  classDef n fill:#d3d3d3,stroke:#808080,color:#000
  class U1,U2,U3,U4,U5,U6,U7 ui
  class N1,N2,N3,N4,N5,N6,N7,N8,N9,N10,N11,N12,N13,N14 n
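
The merge rules listed for N2 (deep-merge settings, replace named-resource entries wholesale, replace lists, keep provenance) can be sketched as below. This is an illustration, not the planned engine: the function names are hypothetical, and which top-level maps count as named resources (servers, graphs, aliases here) is an assumption.

```python
import copy
from typing import Any, Dict, List, Tuple

# Assumption: these top-level maps hold named entries that are replaced whole.
NAMED_RESOURCES = {"servers", "graphs", "aliases"}


def merge_layers(layers: List[Tuple[str, dict]]) -> Tuple[dict, Dict[str, str]]:
    """Merge (origin, config) layers given lowest precedence first.

    Returns (merged, origins), where origins maps a dotted path to the
    layer that supplied its final value.
    """
    merged: Dict[str, Any] = {}
    origins: Dict[str, str] = {}
    for origin, config in layers:
        _merge_into(merged, config, origin, origins, ())
    return merged, origins


def _merge_into(dst: dict, src: dict, origin: str, origins: Dict[str, str], path: tuple) -> None:
    for key, value in src.items():
        here = path + (key,)
        dotted = ".".join(here)
        is_named_entry = len(here) == 2 and here[0] in NAMED_RESOURCES
        if isinstance(value, dict) and not is_named_entry:
            # Settings map: merge key by key.
            if not isinstance(dst.get(key), dict):
                dst[key] = {}
                origins.pop(dotted, None)
            _merge_into(dst[key], value, origin, origins, here)
        else:
            # Scalars, lists, and named entries replace the lower layer outright.
            dst[key] = copy.deepcopy(value)
            for stale in [p for p in origins if p.startswith(dotted + ".")]:
                del origins[stale]
            origins[dotted] = origin
```

Called as `merge_layers([("global", global_cfg), ("project", project_cfg)])`, a project `graphs.dev` entry fully replaces a global one of the same name, while `defaults:` keys merge individually; the origins map is the raw material a `config view --show-origin` handler (N10) would print.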

Slices (vertical, each demo-able)

| # | Slice | Parts/affordances | Demo |
| --- | --- | --- | --- |
| V1 | Global layer + merge + config view | A1–A4 · N1,N2,N3,N10 · U1,U7,S_global,S_merged | Put config in ~/.omnigraph/, run omnigraph config view --resolved --show-origin from any dir → merged result with per-field origin; existing embedded commands work global-first with no project file |
| V2 | Remote graphs + HTTP client + creds | A5–A7 · N5,N6,N8,N9 · S_cache | Define a server: graph entry; omnigraph query --graph prod hits the remote server (curl-free); embedded --graph dev still local |
| V3 | omnigraph login | A8 · N11,U5 | omnigraph login prod writes ~/.omnigraph/credentials (0600) + .gitignore; V2 remote query now works with no manual env |
| V4 | Thin-init hardening + quickstart + templates | A9 · N12,N13,U6 (needs serve MR-970) | omnigraph quickstart --template person-knows scaffolds + seeds + serves; init --force purges (MR-975) |
| V5 | Agent-mode | A10 · N14,U4 (MR-981) | OMNIGRAPH_AGENT_MODE=1 omnigraph query … → JSON + structured errors + typed exit codes; never-prompt |

V1 is the foundation (global-first + merge + view). V2 closes the substantive client→server gap. V3 is credential ergonomics. V4/V5 ride sibling tickets (MR-970/973/981). MR-969 (stored queries) ships independently and is reached by N8's /queries/{name} once V2 lands.

Rollout

The slices above are the rollout order: V1 (global layer + merge) → V2 (remote graphs + HTTP client) → V3 (login) → V4 (quickstart/templates, on MR-970) → V5 (agent-mode, MR-981). V1–V2 close the substantive gap (global-first config + curl-free server access); V3–V5 are ergonomics that ride sibling tickets. Evaluate after V2 against early-adopter and agent-onboarding (MR-973 / MR-974) signal. The spikes (X1 HTTP-client, X2 merge engine, X3 resolver+provenance, X4 login) resolve before their owning slice.

Prior art

  • kubeconfig (clusters / users / contexts; KUBECONFIG; kubectl config view)
  • Helix CLI v2 (helix.toml local+enterprise instance blocks; ~/.helix/config; ~/.helix/credentials)
  • AWS CLI (~/.aws/config + ~/.aws/credentials split; named profiles; credential_process)
  • git (~/.gitconfig + .git/config; --show-origin)
  • Cargo (Cargo.toml manifest + ~/.cargo/config.toml)
  • Supabase / Prisma (one project manifest; connection via DATABASE_URL env)
  • 12-factor app (config that varies by deploy lives in the environment)