Terraform-style operator/project split: ~/.omnigraph/config.yaml for identity (operator.actor in the --as cascade), credentials keyed by server name (env -> 0600 credentials file; no inline secrets), and operator-owned named servers that project configs reference but cannot redefine. Explicitly a staged subset of RFC-002: adopts its settled decisions (one dir, keyed credentials, env precedence), defers GraphLocator/use/state-layer, and encodes the ten confirmed PR #139 findings as design rules (compat shims, key-level merges, atomic writes, the project-layer trust boundary). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
RFC: Per-Operator Config — the Operator Slice of RFC-002
Status: Proposed
Date: 2026-06-11
Builds on: rfc-002-config-cli-architecture.md (Proposed; implementation parked, with PRs #139/#162 closed over review findings), rfc-005-server-cluster-boot.md (Landed), RFC-006 storage roots (#186/#190/#194, landed). The #139 review record is a normative input: every design rule in §D6 traces to a confirmed finding.
Target release: unversioned (staged; see Sequencing).
Summary
Give OmniGraph the operator half of the Terraform config split. Terraform
separates ~/.terraformrc (who I am, my credentials, my CLI behavior) from
the working directory's *.tf (what the project declares). OmniGraph today
has only the project half: ./omnigraph.yaml in the current working
directory (or --config <path>), and nothing else — no home-level config,
no walk-up, no env override for the CLI. Operator identity and credentials
must be re-declared in every directory an operator works from, and — worse —
they end up in files that live next to repo-committed project config.
This RFC introduces ~/.omnigraph/config.yaml (the operator layer) and
a keyed credentials chain, scoped deliberately small:
- Operator identity: a default actor for every --as cascade.
- Credentials by server name: no more inventing env-var names per server; secrets never inline, never in the project layer.
- Named servers: operator-owned endpoint definitions that project configs can reference but not redefine.
It is explicitly a subset of RFC-002, sequenced to land. RFC-002 settled
the right long-term decisions (one ~/.omnigraph/ dir, credentials keyed by
server name, OMNIGRAPH_CONFIG/OMNIGRAPH_HOME env precedence) but its
implementation arrived as one 4,800-line PR mixing a crate extraction with
behavior changes, and died over ten confirmed findings. This RFC adopts
RFC-002's settled decisions verbatim where they apply, defers everything
else (GraphLocator, multi-homing, omnigraph use, the State layer), and
encodes the #139 findings as design rules so the same failures cannot recur.
Motivation
Three concrete pains, all hit in real operation this cycle:
- Identity repetition. The cluster actor cascade (#180) resolves --as from the per-operator omnigraph.yaml, which means every operator hand-maintains a copy in every working directory (the ~/exp/intel setup needed exactly this). A repo-committed omnigraph.yaml cannot carry as: act-andrew without claiming every contributor is Andrew.
- Credential ergonomics. bearer_token_env forces three coordinated steps per server (invent a var name, reference it in config, set it in the secret store). The peer group (AWS profiles, gh hosts, kubeconfig users) keys secrets by the server's name.
- Cluster-era working shape. With clusters on object storage (RFC-006), the project directory is a declaration checkout: operators run cluster apply --config ./checkout from anywhere. The things that are about the operator (who am I, which servers do I know, how do I like output formatted) have no home that travels with them.
Non-Goals
- GraphLocator / multi-homed graph resolution (RFC-002 §1): the biggest and riskiest part of config-v2; untouched here.
- omnigraph use / the State layer (~/.omnigraph/state/): deferred with it (finding #2 showed its precedence interacts badly with scaffolds; that problem belongs to the slice that introduces it).
- OS keychain integration: the credentials chain (§D4) leaves a slot for it; this RFC ships env + file sources only.
- Project-file walk-up. Terraform does not walk up from subdirectories and neither do we; --config (or running in the directory) stays the explicit, deterministic story. Rejected, not deferred: walk-up makes "which config am I using" a function of cwd depth, the class of surprise this RFC exists to remove.
- Renaming or removing anything. No flag renames, no key renames, no schema-version bumps (findings #1, #3, #10).
Background (verified against main)
- Project-config lookup today (crates/omnigraph-server/src/config.rs:529-553, shared by CLI and server): --config <path>, else ./omnigraph.yaml in cwd, else built-in defaults. Relative paths inside the file resolve against the file's own directory (base_dir). No env var, no home file, no walk-up.
- Side-effect on load (crates/omnigraph-cli/src/helpers.rs:102-108): load_cli_config also loads auth.env_file into the process env; this is how OMNIGRAPH_BEARER_TOKEN reaches remote commands today.
- Actor resolution (helpers.rs:170, #180): --as flag, else the project config's actor, which is currently the end of the chain.
- Existing credential mechanism: TargetConfig.bearer_token_env names an env var; auth.env_file points at a git-ignored dotenv. Both keep working indefinitely (RFC-002 already committed to this; finding #3 showed what happens otherwise).
- OMNIGRAPH_CONFIG exists today only as the container entrypoint's translation to the server's --config. The CLI does not read it.
Design
D1. Files and discovery
~/.omnigraph/config.yaml # the operator layer (this RFC)
~/.omnigraph/credentials # keyed secrets, 0600, git-irrelevant (§D4)
./omnigraph.yaml # the project layer (unchanged)
Discovery order for the operator file: $OMNIGRAPH_HOME/config.yaml if
OMNIGRAPH_HOME is set, else ~/.omnigraph/config.yaml. Absent file =
empty layer, never an error. ~ is expanded wherever paths are read
(finding #9 — today a literal ./~/... directory gets created).
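The discovery rule for the operator file can be sketched as follows. This is an illustrative Python sketch, not the project's loader: the function names (expand_tilde, operator_config_path, load_operator_layer) are hypothetical, and treating an empty OMNIGRAPH_HOME as unset is an assumption the RFC does not state.

```python
from pathlib import Path
from typing import Mapping, Optional


def expand_tilde(raw: str, home: Path) -> Path:
    """Expand a leading "~" explicitly so it is never treated as a literal directory name."""
    if raw == "~":
        return home
    if raw.startswith("~/"):
        return home / raw[2:]
    return Path(raw)


def operator_config_path(env: Mapping[str, str], home: Path) -> Path:
    """Return where the operator file would live; the file itself may be absent."""
    override = env.get("OMNIGRAPH_HOME")
    if override:  # assumption: an empty value counts as unset
        base = expand_tilde(override, home)
    else:
        base = home / ".omnigraph"
    return base / "config.yaml"


def load_operator_layer(path: Path) -> Optional[str]:
    """An absent file is an empty layer (None here), never an error."""
    try:
        return path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return None
```

Passing the environment and home directory in as arguments, rather than reading os.environ directly, keeps the rule testable without touching the real home directory.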
OMNIGRAPH_CONFIG=<path> becomes a first-class override for the project
file in the CLI (highest precedence below the --config flag), aligning the
CLI with the container contract that already uses this variable for the
server. One name, one meaning, both binaries.
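A minimal sketch of the resulting project-file lookup order, assuming the three sources described above (flag, then OMNIGRAPH_CONFIG, then the working directory). The function name and the returned source labels are hypothetical and exist only to make the order visible.

```python
from pathlib import Path
from typing import Mapping, Optional, Tuple


def project_config_candidate(
    flag: Optional[str], env: Mapping[str, str], cwd: Path
) -> Tuple[Path, str]:
    """Pick the project file to try, plus a label saying which source chose it."""
    if flag:
        return Path(flag), "flag"          # --config <path> always wins
    from_env = env.get("OMNIGRAPH_CONFIG")
    if from_env:
        return Path(from_env), "env"       # first-class override below the flag
    return cwd / "omnigraph.yaml", "cwd"   # no walk-up: only the current directory
```

If the cwd candidate does not exist, the caller falls back to built-in defaults, as the lookup does today.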
Per RFC-002 §4 (adopted verbatim): ~/.omnigraph/ is the one canonical
dir — cache/state subdirectories arrive with their own slices; XDG roots are
not part of the mental model ($XDG_CONFIG_HOME may be honored as a
fallback read location if set, but is never written to).
D2. The operator schema (v1 of this layer)
# ~/.omnigraph/config.yaml — about the OPERATOR, never about a project
operator:
  actor: act-andrew              # default for every --as cascade
servers:                         # operator-owned endpoint definitions
  intel-dev:
    url: http://127.0.0.1:8080
  prod:
    url: https://graph.modernrelay.ai
    # No token here, ever. Resolution: §D4.
defaults:
  output: table                  # read --format default
Unknown keys are a warning, not an error in this layer (an operator file
written by a newer CLI must not brick an older one; contrast with
cluster.yaml, where unknown keys are deliberately fatal because they
change what a plan means).
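The warn-only behavior might look like the sketch below. It assumes the YAML has already been parsed into a dict and checks top-level keys only; the function name and the warning text are illustrative, not taken from the codebase.

```python
import warnings
from typing import Any, Dict, Mapping

# Top-level keys from the v1 operator schema shown above.
KNOWN_TOP_LEVEL_KEYS = {"operator", "servers", "defaults"}


def filter_operator_layer(raw: Mapping[str, Any]) -> Dict[str, Any]:
    """Keep known keys; warn about (and drop) anything a newer CLI may have written."""
    kept: Dict[str, Any] = {}
    for key, value in raw.items():
        if key not in KNOWN_TOP_LEVEL_KEYS:
            warnings.warn(f"operator config: ignoring unknown key '{key}'")
            continue
        kept[key] = value
    return kept
```

The point of the design is that the unknown key is skipped and loading continues, so an older CLI still gets a usable layer.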
D3. Precedence and the merge rule
flag > env > project omnigraph.yaml > operator config > built-in
with exactly one principled inversion (§D5): credentials and endpoint definitions never come from the project layer when an operator-layer definition exists for the same server name.
Merging is key-level: scalars override per key; maps (servers:,
graphs:) merge per entry, and entries merge per field (finding #13 —
merge_map replacing whole entries silently dropped sibling fields). A
project file referencing server: prod composes with the operator's
servers.prod.url; it does not need to re-declare it and cannot
accidentally clobber half of it.
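The key-level merge can be sketched as a small recursive function over parsed layers. This is an illustration of the rule, not the project's merge_map replacement; the graph entry and its branch field in the usage note are hypothetical.

```python
from typing import Any, Dict, Mapping


def merge_layers(lower: Mapping[str, Any], upper: Mapping[str, Any]) -> Dict[str, Any]:
    """Merge `upper` over `lower`: scalars override per key, maps merge per entry and per field."""
    merged: Dict[str, Any] = dict(lower)
    for key, value in upper.items():
        existing = merged.get(key)
        if isinstance(value, Mapping) and isinstance(existing, Mapping):
            # Recurse instead of replacing, so sibling fields survive.
            merged[key] = merge_layers(existing, value)
        else:
            merged[key] = value
    return merged
```

For example, merging an upper layer of {"graphs": {"main": {"branch": "release"}}} over a lower layer whose graphs.main also carries server: prod keeps server: prod and changes only branch.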
Concretely for the two flows this slice touches:
- Actor: --as > project as:/actor key (unchanged semantics) > operator.actor > none (commands that need an actor keep failing loudly).
- Output format: --format > project default > defaults.output > table.
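Both cascades reduce to "first value that is set wins". A minimal sketch, assuming each layer has already been loaded and hands over either a string or None; the helper names are hypothetical.

```python
from typing import Optional


def first_set(*candidates: Optional[str]) -> Optional[str]:
    """Return the first candidate that is set, in precedence order."""
    for candidate in candidates:
        if candidate:
            return candidate
    return None


def resolve_actor(
    flag: Optional[str], project_actor: Optional[str], operator_actor: Optional[str]
) -> Optional[str]:
    # None means "no actor"; commands that need one are expected to fail loudly.
    return first_set(flag, project_actor, operator_actor)


def resolve_output(
    flag: Optional[str], project_default: Optional[str], operator_default: Optional[str]
) -> str:
    # Built-in default is the last hop of the cascade.
    return first_set(flag, project_default, operator_default) or "table"
```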
D4. Credentials: keyed by server name, by-reference always
Adopted from RFC-002 §5 unchanged, minus the keychain (a later source in
the same chain). For a server named <name>, the resolution chain is:
1. OMNIGRAPH_TOKEN_<NAME> (uppercased, - → _): explicit env, wins.
2. [<name>] section in ~/.omnigraph/credentials (INI-style, 0600; the loader refuses a group/world-readable file).
3. The legacy pair, bearer_token_env + auth.env_file, exactly as today, for configs that already use it.
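The three-step chain above could be sketched like this. Several details are assumptions, not settled by the RFC: the key inside each credentials section is called token here, the env mapping is assumed to already contain anything loaded from auth.env_file, and all function names are hypothetical.

```python
import configparser
import os
import stat
from pathlib import Path
from typing import Mapping, Optional


def token_env_var(server: str) -> str:
    """Step 1 name: uppercase the server name and map '-' to '_'."""
    return "OMNIGRAPH_TOKEN_" + server.upper().replace("-", "_")


def read_credentials_file(path: Path, server: str) -> Optional[str]:
    """Step 2: read the [server] section, refusing a group/world-readable file."""
    try:
        mode = path.stat().st_mode
    except FileNotFoundError:
        return None
    if mode & (stat.S_IRGRP | stat.S_IROTH):
        raise PermissionError(f"{path} must not be readable by group or others")
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path, encoding="utf-8")
    if parser.has_section(server):
        return parser.get(server, "token", fallback=None)  # key name is an assumption
    return None


def resolve_token(
    server: str,
    env: Mapping[str, str],
    credentials_path: Path,
    bearer_token_env: Optional[str] = None,
) -> Optional[str]:
    """Walk the chain in order: keyed env, credentials file, then the legacy env name."""
    token = env.get(token_env_var(server))
    if token:
        return token
    token = read_credentials_file(credentials_path, server)
    if token:
        return token
    if bearer_token_env:
        return env.get(bearer_token_env) or None
    return None
```

A keychain source would slot in as another step between the credentials file and the legacy pair without changing the callers.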
No inline secrets in any YAML file, operator or project (the existing
invariant 12 posture extended to disk). A future omnigraph login <name>
writes/rotates one section of the credentials file via temp + rename
(finding #7: every operator-layer write is atomic), creating it 0600.
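The temp + rename write can be sketched with standard-library calls only. The function name is hypothetical, and the sketch covers the atomic replace and the 0600 mode, not the section-level rotation that omnigraph login would perform.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_private(path: Path, content: str) -> None:
    """Write `content` to `path` via a temp file in the same directory, then rename over it."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Same directory as the target, so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), prefix=path.name + ".")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
            handle.flush()
            os.fsync(handle.fileno())
        os.chmod(tmp_name, 0o600)    # mkstemp already creates 0600; made explicit here
        os.replace(tmp_name, path)   # atomic replacement of the old file
    except BaseException:
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

A reader that opens the credentials file during a write sees either the old content or the new content, never a half-written file.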
D5. The trust boundary (the security findings, made structural)
Findings #4, #5, #6 share one root cause: the project layer — a file that arrives with a repo checkout — could redirect where requests go and what secrets they carry. The rules:
- A project file may reference a server by name; it may not redefine an operator-defined server. If ./omnigraph.yaml declares servers.prod.url and ~/.omnigraph/config.yaml also defines prod, the operator definition wins and the CLI warns about the shadowed project entry. A project-only server name keeps working (legacy compat), but the keyed-credentials chain (§D4 steps 1–2) never resolves for it; only the legacy explicit bearer_token_env does. Net effect: a malicious checkout cannot point prod at an attacker host and harvest the operator's prod token.
- auth.env_file keeps auto-loading (compat), but project-layer env-files cannot override variables already set in the process or by the operator layer: first-set-wins, operator-before-project (the existing real-env-wins rule, extended one layer down). Finding #5's injection becomes a no-op against any var the operator actually uses.
- A token is sent only to the server it is keyed to. The legacy single OMNIGRAPH_BEARER_TOKEN fallback keeps working for the single-server shape, but when a request resolves through a named server, only that name's chain applies (finding #6's broadcast).
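The first two rules can be sketched as follows, assuming both layers are already parsed into plain dicts of server name to {"url": ...}. ResolvedServer, resolve_server and apply_env_file are hypothetical names, and the attacker URL in the usage note is invented for illustration.

```python
import warnings
from dataclasses import dataclass
from typing import Any, Dict, Mapping, Optional


@dataclass(frozen=True)
class ResolvedServer:
    name: str
    url: str
    keyed_credentials: bool  # may the keyed chain (§D4 steps 1-2) run for this name?


def resolve_server(
    name: str,
    operator_servers: Mapping[str, Mapping[str, Any]],
    project_servers: Mapping[str, Mapping[str, Any]],
) -> Optional[ResolvedServer]:
    """Operator definitions win; project-only names work but never get keyed credentials."""
    operator_entry = operator_servers.get(name)
    project_entry = project_servers.get(name)
    if operator_entry is not None:
        if project_entry is not None:
            warnings.warn(f"project definition of server '{name}' is shadowed by the operator layer")
        return ResolvedServer(name, operator_entry["url"], keyed_credentials=True)
    if project_entry is not None:
        return ResolvedServer(name, project_entry["url"], keyed_credentials=False)
    return None


def apply_env_file(env: Dict[str, str], entries: Mapping[str, str]) -> None:
    """First-set-wins: an env-file entry never overrides a variable that is already set."""
    for key, value in entries.items():
        env.setdefault(key, value)
```

With an operator entry for prod in place, a checkout that declares prod with url https://attacker.example resolves to the operator's URL and produces one warning. Calling apply_env_file for the operator layer before the project layer gives the operator-before-project order.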
D6. Compatibility rules (the #139 findings as law)
| Rule | Source finding |
|---|---|
| No flag or key is removed or renamed; new behavior is additive | #1, #3 |
| A config that loads today loads identically after this RFC; new validation applies only to new keys | #3, #8, #10 |
| Every operator-layer file write is temp + rename, never in-place | #7 |
| ~ expands wherever a path is read | #9 |
| Map merges are per-entry, per-field — never wholesale replace | #13 |
| One resolution path per concern — the actor chain and the token chain each have exactly one implementation, called by CLI and server alike | #11, #12 |
| Each slice lands as its own PR with the workspace gate green; no slice mixes mechanical moves with behavior changes | #139's disposition |
Sequencing
Three PRs, each independently useful, each landable without the next:
- PR 1: the operator file + identity. Loader for ~/.omnigraph/config.yaml (+ OMNIGRAPH_HOME, ~-expansion, warn-only unknown keys), operator.actor joining the --as cascade, defaults.output joining the format cascade, OMNIGRAPH_CONFIG env for the CLI's project file. Docs: cli-reference.md gains the layer table.
- PR 2: keyed credentials. servers: in the operator layer, the §D4 chain (env + credentials file), the §D5 trust rules, and omnigraph login <name> (atomic write, 0600). Legacy mechanisms untouched and tested-as-untouched.
- PR 3: project references. server: <name> in project graph/target entries resolving through operator-defined servers, with the shadowing warning. This is the bridge toward RFC-002's locator: it gives multi-server addressing a safe, minimal form without the GraphLocator rework.
Open questions
- Should operator.actor apply to local (embedded-engine) writes too, or only where a server/cluster boundary exists? Leaning yes-everywhere: one identity chain (§D6 one-path rule), and local audit rows get better.
- Does defaults.output belong in slice 1, or is identity-only an even cleaner first PR? (Cost of including it is one cascade hop; value is immediate.)
- omnigraph config view --resolved (RFC-002 had it; #139 shipped a version): slice 1 or slice 2? It materially helps debugging precedence, which argues early.
Relationship to RFC-002
RFC-002 remains the umbrella architecture. This RFC implements its §2
(layered config, global-first), §4 (file naming / one dir), and §5
(credentials) in their minimal load-bearing form, and explicitly defers §1
(GraphLocator/targets), §3 (roles), and the State layer. If/when the
locator work resumes, it builds on these layers rather than re-landing
them. RFC-002's header should gain a pointer here once this merges.