RFC: Per-Operator Config — the Operator Slice of RFC-002
Status: Proposed
Date: 2026-06-11
Builds on: rfc-002-config-cli-architecture.md (Proposed; implementation parked, PRs #139/#162 closed over review findings), rfc-005-server-cluster-boot.md (Landed), RFC-006 storage roots (#186/#190/#194, landed). The #139 review record is a normative input: every design rule in §D6 traces to a confirmed finding.
Paired with: rfc-008-deprecate-omnigraph-yaml.md. Together they define the two-surface architecture this RFC's operator half belongs to.
Target release: unversioned (staged; see Sequencing).
Summary
Give OmniGraph the operator half of the two-surface config architecture
(RFC-008): cluster config (team-owned, in a repo — what the system is)
and operator config (person-owned, in $HOME — who I am). This is
Terraform's split: ~/.terraformrc for the operator, the checkout for the
declaration. OmniGraph today has neither half cleanly — omnigraph.yaml
mixes both concerns (RFC-008 retires it), and there is no home-level config
at all: identity and credentials get re-declared per working directory, in
files that sit next to repo-committed config.
This RFC introduces ~/.omnigraph/config.yaml (the operator surface)
and a keyed credentials chain, scoped deliberately small:
- Operator identity: a default actor for every --as cascade.
- Credentials by server name: no more inventing env-var names per server; secrets never inline, never in any repo-committed file.
- Named servers: operator-owned endpoint definitions; nothing a checkout supplies can redefine them.
It is explicitly a subset of RFC-002, sequenced to land. RFC-002 settled
the right long-term decisions (one ~/.omnigraph/ dir, credentials keyed by
server name, OMNIGRAPH_CONFIG/OMNIGRAPH_HOME env precedence) but its
implementation arrived as one 4,800-line PR mixing a crate extraction with
behavior changes, and died over ten confirmed findings. This RFC adopts
RFC-002's settled decisions verbatim where they apply, defers everything
else (GraphLocator, multi-homing, omnigraph use, the State layer), and
encodes the #139 findings as design rules so the same failures cannot recur.
Motivation
Three concrete pains, all hit in real operation this cycle:
- Identity repetition. The cluster actor cascade (#180) resolves --as from the per-operator omnigraph.yaml, which means every operator hand-maintains a copy in every working directory (the ~/exp/intel setup needed exactly this). A repo-committed omnigraph.yaml cannot carry as: act-andrew without claiming every contributor is Andrew.
- Credential ergonomics. bearer_token_env forces three coordinated steps per server (invent a var name, reference it in config, set it in the secret store). The peer group (AWS profiles, gh hosts, kubeconfig users) keys secrets by the server's name.
- Cluster-era working shape. With clusters on object storage (RFC-006), the project directory is a declaration checkout: operators run cluster apply --config ./checkout from anywhere. The things that are about the operator (who am I, which servers do I know, how do I like output formatted) have no home that travels with them.
Non-Goals
- GraphLocator / multi-homed graph resolution (RFC-002 §1): the biggest and riskiest part of config-v2; untouched here.
- omnigraph use / the State layer (~/.omnigraph/state/): deferred with it (finding #2 showed its precedence interacts badly with scaffolds; that problem belongs to the slice that introduces it).
- OS keychain integration: the credentials chain (§D4) leaves a slot for it; this RFC ships env + file sources only.
- Config-file walk-up. Terraform does not walk up from subdirectories and neither do we: --config (or running in the directory) stays the explicit, deterministic story for cluster checkouts. Rejected, not deferred: walk-up makes "which config am I using" a function of cwd depth, the class of surprise this RFC exists to remove.
- Retiring omnigraph.yaml: that is RFC-008's job, with its own staging. This RFC builds the destination; during RFC-008's deprecation window the legacy file keeps loading exactly as today.
- Renaming or removing anything. No flag renames, no key renames, no schema-version bumps (findings #1, #3, #10).
Background (verified against main)
- Project-config lookup today (crates/omnigraph-server/src/config.rs:529-553, shared by CLI and server): --config <path>, else ./omnigraph.yaml in cwd, else built-in defaults. Relative paths inside the file resolve against the file's own directory (base_dir). No env var, no home file, no walk-up.
- Side-effect on load (crates/omnigraph-cli/src/helpers.rs:102-108): load_cli_config also loads auth.env_file into the process env; this is how OMNIGRAPH_BEARER_TOKEN reaches remote commands today.
- Actor resolution (helpers.rs:170, #180): --as flag, else the project config's actor, which is currently the end of the chain.
- Existing credential mechanism: TargetConfig.bearer_token_env names an env var; auth.env_file points at a git-ignored dotenv. Both keep working indefinitely (RFC-002 already committed to this; finding #3 showed what happens otherwise).
- OMNIGRAPH_CONFIG exists today only as the container entrypoint's translation to the server's --config. The CLI does not read it.
Design
D1. Files and discovery
~/.omnigraph/config.yaml # the operator surface (this RFC)
~/.omnigraph/credentials # keyed secrets, 0600, git-irrelevant (§D4)
./cluster.yaml + checkout # the team surface (unchanged; RFC-004..006)
./omnigraph.yaml # legacy, loads as today through RFC-008's window
Discovery order for the operator file: $OMNIGRAPH_HOME/config.yaml if
OMNIGRAPH_HOME is set, else ~/.omnigraph/config.yaml. Absent file =
empty layer, never an error. ~ is expanded wherever paths are read
(finding #9 — today a literal ./~/... directory gets created).
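The discovery order above can be sketched as follows. This is a minimal illustration, not the shipped loader: the function names operator_config_path and load_operator_layer are hypothetical, and the sketch returns the raw file text rather than parsing YAML.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def operator_config_path(env: Mapping[str, str]) -> Path:
    """Return where the operator file would live; the file may not exist."""
    home_override = env.get("OMNIGRAPH_HOME")
    if home_override:
        # "~" is expanded wherever a path is read (finding #9).
        base = Path(os.path.expanduser(home_override))
    else:
        base = Path(os.path.expanduser("~/.omnigraph"))
    return base / "config.yaml"


def load_operator_layer(env: Mapping[str, str]) -> Optional[str]:
    """An absent file is an empty layer (None here), never an error."""
    path = operator_config_path(env)
    try:
        return path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return None
```

Passing the environment in as a mapping, rather than reading os.environ inside the function, keeps the lookup testable without touching a real home directory.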
OMNIGRAPH_CONFIG=<path> becomes a first-class override for the --config
argument in the CLI (highest precedence below the flag itself), aligning the
CLI with the container contract that already uses this variable for the
server. One name, one meaning, both binaries — it points at whatever the
command's --config would (a cluster checkout for cluster commands; the
legacy file during RFC-008's window).
Per RFC-002 §4 (adopted verbatim): ~/.omnigraph/ is the one canonical
dir — cache/state subdirectories arrive with their own slices; XDG roots are
not part of the mental model ($XDG_CONFIG_HOME may be honored as a
fallback read location if set, but is never written to).
D2. The operator schema (v1 of this layer)
# ~/.omnigraph/config.yaml: about the OPERATOR, never about the system
operator:
  actor: act-andrew            # default for every --as cascade
servers:                       # operator-owned endpoint definitions
  intel-dev:
    url: http://127.0.0.1:8080
  prod:
    url: https://graph.modernrelay.ai
    # No token here, ever. Resolution: §D4.
aliases:                       # personal shorthand over CLUSTER-owned queries
  triage:                      # (the query is the shared contract; the alias,
    server: intel-dev          # its defaults, and its name are mine; see RFC-008)
    graph: spike
    query: weekly_triage
defaults:
  output: table                # read --format default
Unknown keys are a warning, not an error in this layer (an operator file
written by a newer CLI must not brick an older one; contrast with
cluster.yaml, where unknown keys are deliberately fatal because they
change what a plan means).
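As an illustration of the warn-only posture, a loader might separate known from unknown top-level keys and keep going. This is a sketch under assumptions: the helper name split_unknown_keys and the warning wording are hypothetical; the known-key set is taken from the schema in this section.

```python
from typing import Any, Dict, List, Tuple

# Top-level keys from the v1 operator schema in this section.
KNOWN_TOP_LEVEL = {"operator", "servers", "aliases", "defaults"}


def split_unknown_keys(layer: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Keep known keys; turn unknown ones into warnings instead of errors."""
    known = {key: value for key, value in layer.items() if key in KNOWN_TOP_LEVEL}
    warnings = [
        f"warning: unknown operator-config key '{key}' ignored"
        for key in layer
        if key not in KNOWN_TOP_LEVEL
    ]
    return known, warnings
```

A file written by a newer CLI then still loads on an older one: the unrecognized key is reported and skipped, and the rest of the layer applies.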
D3. Precedence and the merge rule
The end-state cascade is short, because the team surface (cluster config) deliberately carries no operator-resolvable keys — no actor, no tokens, no output preferences. Identity can never come from a checkout:
flag > env > operator config > built-in
During RFC-008's deprecation window, a legacy omnigraph.yaml slots in
between env and operator config (its keys win over operator defaults,
preserving today's behavior for unmigrated setups) — with the §D5
credential inversion: credentials and endpoint definitions never come
from a legacy/checkout file when an operator-layer definition exists for
the same server name.
Merging is key-level: scalars override per key; maps (servers:,
aliases:) merge per entry, and entries merge per field (finding #13 —
merge_map replacing whole entries silently dropped sibling fields).
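A minimal sketch of the key-level rule, assuming layers are already parsed into plain dictionaries. The function name merge_layers is hypothetical, and the §D5 credential inversion is deliberately not modeled here.

```python
from typing import Any, Dict


def merge_layers(low: Dict[str, Any], high: Dict[str, Any]) -> Dict[str, Any]:
    """Merge two layers: scalars override per key, maps merge recursively.

    Because nested maps recurse, an entry under servers: merges per field
    rather than being replaced wholesale (the finding #13 failure mode).
    """
    merged = dict(low)
    for key, high_value in high.items():
        low_value = merged.get(key)
        if isinstance(low_value, dict) and isinstance(high_value, dict):
            merged[key] = merge_layers(low_value, high_value)
        else:
            merged[key] = high_value
    return merged
```

For example, if the lower layer defines servers.prod with a url and a second field, and the higher layer sets only servers.prod.url, the second field survives the merge.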
Concretely for the two flows this slice touches:
- Actor: --as > legacy cli.actor (window only, unchanged semantics) > operator.actor > none (commands that need an actor keep failing loudly).
- Output format: --format > legacy default (window only) > defaults.output > table.
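The two cascades above reduce to "first source that is set wins". A hedged sketch, with hypothetical function names and with None standing for "not set at this layer":

```python
from typing import Optional, Tuple


def first_set(*candidates: Tuple[str, Optional[str]]) -> Optional[Tuple[str, str]]:
    """Return (source, value) for the first candidate whose value is set."""
    for source, value in candidates:
        if value is not None:
            return source, value
    return None


def resolve_actor(flag_as, legacy_actor, operator_actor) -> Tuple[str, str]:
    resolved = first_set(
        ("--as", flag_as),
        ("legacy cli.actor", legacy_actor),  # deprecation window only
        ("operator.actor", operator_actor),
    )
    if resolved is None:
        # Models a command that needs an actor: it fails loudly.
        raise ValueError("no actor resolved: pass --as or set operator.actor")
    return resolved


def resolve_output(flag_format, legacy_default, operator_default) -> Tuple[str, str]:
    return first_set(
        ("--format", flag_format),
        ("legacy default", legacy_default),  # deprecation window only
        ("defaults.output", operator_default),
        ("built-in", "table"),
    )
```

Returning the winning source alongside the value is a design choice of this sketch; it makes precedence questions answerable without re-deriving the chain.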
D4. Credentials: keyed by server name, by-reference always
Adopted from RFC-002 §5 unchanged, minus the keychain (a later source in
the same chain). For a server named <name>, the resolution chain is:
1. OMNIGRAPH_TOKEN_<NAME> (uppercased, - → _): explicit env, wins.
2. [<name>] section in ~/.omnigraph/credentials (INI-style, 0600; the loader refuses a group/world-readable file).
3. The legacy pair, bearer_token_env + auth.env_file, exactly as today, for configs that already use it.
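The chain above can be sketched as below. Several details are assumptions made for illustration: the option name token inside each INI section, the exception type, the function names, and passing the legacy result in as a pre-resolved legacy_token argument.

```python
import configparser
import stat
from pathlib import Path
from typing import Mapping, Optional


class InsecureCredentialsFile(Exception):
    """Raised when the credentials file is readable by group or others."""


def token_env_name(server_name: str) -> str:
    """Map a server name to its env var: uppercased, '-' becomes '_'."""
    return "OMNIGRAPH_TOKEN_" + server_name.upper().replace("-", "_")


def read_file_token(path: Path, server_name: str) -> Optional[str]:
    try:
        mode = path.stat().st_mode
    except FileNotFoundError:
        return None
    if mode & (stat.S_IRGRP | stat.S_IROTH):
        raise InsecureCredentialsFile(f"{path} is group/world-readable; expected 0600")
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path, encoding="utf-8")
    if not parser.has_section(server_name):
        return None
    # "token" as the option name is an assumption of this sketch.
    return parser.get(server_name, "token", fallback=None)


def resolve_token(
    server_name: str,
    env: Mapping[str, str],
    credentials_path: Path,
    legacy_token: Optional[str] = None,
) -> Optional[str]:
    """Step 1: keyed env var. Step 2: credentials file. Step 3: legacy result."""
    from_env = env.get(token_env_name(server_name))
    if from_env:
        return from_env
    from_file = read_file_token(credentials_path, server_name)
    if from_file:
        return from_file
    return legacy_token
```

Note that the permission check only runs when the file is actually consulted, so an explicit env token still resolves even if the file is over-permissive. Whether the real loader behaves that way is not specified here.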
No inline secrets in any YAML file, anywhere (the existing invariant 12
posture extended to disk). A future omnigraph login <name>
writes/rotates one section of the credentials file via temp + rename
(finding #7: every operator-layer write is atomic), creating it 0600.
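The temp + rename write can be sketched as follows. The helper name atomic_write_private is hypothetical, and the sketch targets POSIX-style permissions.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_private(path: Path, content: str) -> None:
    """Write content to path via a sibling temp file, then rename over it."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so the rename stays on
    # one filesystem; mkstemp creates it readable by the owner only.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
            handle.flush()
            os.fsync(handle.fileno())
        os.chmod(tmp_name, 0o600)
        os.replace(tmp_name, path)  # atomic swap on POSIX
    except BaseException:
        # Leave no half-written temp file behind on failure.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

A reader therefore sees either the old credentials file or the complete new one, never a partial write.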
D5. The trust boundary (the security findings, made structural)
Findings #4, #5, #6 share one root cause: a file that arrives with a
repo checkout could redirect where requests go and what secrets they
carry. In the end state this is closed by construction — cluster config has
no server/credential keys at all, and the operator surface never comes from
a checkout. The rules below therefore govern the RFC-008 window (while
legacy omnigraph.yaml still loads) and stand as the permanent law for any
future checkout-supplied surface:
1. A checkout-supplied file may reference a server by name; it may not redefine an operator-defined server. If a legacy ./omnigraph.yaml declares servers.prod.url and ~/.omnigraph/config.yaml also defines prod, the operator definition wins and the CLI warns about the shadowed entry. A legacy-only server name keeps working (compat), but the keyed-credentials chain (§D4 steps 1–2) never resolves for it; only the legacy explicit bearer_token_env does. Net effect: a malicious checkout cannot point prod at an attacker host and harvest the operator's prod token.
2. auth.env_file keeps auto-loading (compat), but checkout-layer env-files cannot override variables already set in the process or by the operator layer: first-set-wins, operator-before-checkout (the existing real-env-wins rule, extended one layer down). Finding #5's injection becomes a no-op against any var the operator actually uses.
3. A token is sent only to the server it is keyed to. The legacy single OMNIGRAPH_BEARER_TOKEN fallback keeps working for the single-server shape, but when a request resolves through a named server, only that name's chain applies (finding #6's broadcast).
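The first two rules can be illustrated with a small sketch. The function names, the return shape, and the warning text are hypothetical; server maps are assumed to be parsed into dictionaries of the form {name: {"url": ...}}.

```python
from typing import Dict, List, Mapping, Tuple

ServerMap = Mapping[str, Mapping[str, str]]


def resolve_server(
    name: str, operator_servers: ServerMap, checkout_servers: ServerMap
) -> Tuple[str, bool, List[str]]:
    """Return (url, keyed_chain_allowed, warnings) for a server name."""
    warnings: List[str] = []
    if name in operator_servers:
        if name in checkout_servers:
            warnings.append(
                f"warning: checkout definition of server '{name}' is shadowed "
                "by the operator config"
            )
        # Operator definition wins; the keyed credentials chain may run.
        return operator_servers[name]["url"], True, warnings
    if name in checkout_servers:
        # Legacy-only name keeps working, but never gets keyed credentials.
        return checkout_servers[name]["url"], False, warnings
    raise KeyError(name)


def apply_env_file(env: Mapping[str, str], env_file_vars: Mapping[str, str]) -> Dict[str, str]:
    """First-set-wins: env-file values never override variables already set."""
    merged = dict(env)
    for key, value in env_file_vars.items():
        merged.setdefault(key, value)
    return merged
```

With these two functions, a checkout that redefines prod changes neither the URL the request goes to nor the variables the operator has already set.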
D6. Compatibility rules (the #139 findings as law)
| Rule | Source finding |
|---|---|
| No flag or key is removed or renamed; new behavior is additive | #1, #3 |
| A config that loads today loads identically after this RFC; new validation applies only to new keys | #3, #8, #10 |
| Every operator-layer file write is temp + rename, never in-place | #7 |
| ~ expands wherever a path is read | #9 |
| Map merges are per-entry, per-field — never wholesale replace | #13 |
| One resolution path per concern — the actor chain and the token chain each have exactly one implementation, called by CLI and server alike | #11, #12 |
| Each slice lands as its own PR with the workspace gate green; no slice mixes mechanical moves with behavior changes | #139's disposition |
Sequencing
Three PRs, each independently useful, each landable without the next:
- PR 1: the operator file + identity (landed: #196). Loader for ~/.omnigraph/config.yaml (+ OMNIGRAPH_HOME, ~-expansion, warn-only unknown keys), operator.actor joining the --as cascade, defaults.output joining the format cascade, OMNIGRAPH_CONFIG env for the CLI's --config. Docs: cli-reference.md gains the two-surface table.
- PR 2: keyed credentials (landed). servers: in the operator layer, the §D4 chain (env + credentials file), the §D5 trust rules, and omnigraph login <name> (atomic write, 0600). Legacy mechanisms untouched and tested-as-untouched.
- PR 3: operator targeting. --server <name> on remote-capable commands and aliases: in the operator layer (server + graph + query + default params), resolving through operator-defined servers. This is the bridge toward RFC-002's locator (multi-server addressing in a safe, minimal form without the GraphLocator rework) and the replacement RFC-008 needs before legacy aliases can migrate.
RFC-008's deprecation stages begin only after PRs 1–2 are on main: the
operator surface must exist before config migrate has somewhere to move
keys to.
Open questions
- Should operator.actor apply to local (embedded-engine) writes too, or only where a server/cluster boundary exists? Leaning yes-everywhere: one identity chain (§D6 one-path rule), and local audit rows get better.
- Does defaults.output belong in slice 1, or is identity-only an even cleaner first PR? (Cost of including it is one cascade hop; value is immediate.)
- omnigraph config view --resolved (RFC-002 had it; #139 shipped a version): slice 1 or slice 2? It materially helps debugging precedence, which argues early.
Relationship to RFC-002 and RFC-008
RFC-008 is the other half of this design: this RFC builds the operator
surface; RFC-008 retires the mixed-ownership file
(rfc-008-deprecate-omnigraph-yaml.md),
leaving exactly two config surfaces — cluster (team) and operator (person).
Every mention of omnigraph.yaml in this RFC describes the deprecation
window only. Sequencing couples them: RFC-007 PRs 1–2 land first, then
RFC-008's migration stages run against them.
RFC-002 remains the umbrella architecture. This RFC implements its §2
(layered config, global-first), §4 (file naming / one dir), and §5
(credentials) in their minimal load-bearing form, and explicitly defers §1
(GraphLocator/targets), §3 (roles), and the State layer. If/when the
locator work resumes, it builds on these layers rather than re-landing
them. RFC-002's header should gain a pointer here once this merges.