# RFC: Per-Operator Config — the Operator Slice of RFC-002

**Status:** Proposed
**Date:** 2026-06-11
**Builds on:** [rfc-002-config-cli-architecture.md](rfc-002-config-cli-architecture.md) (Proposed; implementation parked — PRs #139/#162 closed over review findings), [rfc-005-server-cluster-boot.md](rfc-005-server-cluster-boot.md) (Landed), RFC-006 storage roots (#186/#190/#194, landed). The #139 review record is a normative input: every design rule in §D6 traces to a confirmed finding.
**Paired with:** [rfc-008-deprecate-omnigraph-yaml.md](rfc-008-deprecate-omnigraph-yaml.md) — together they define the two-surface architecture this RFC's operator half belongs to.
**Target release:** unversioned (staged; see Sequencing).

## Summary

Give OmniGraph the operator half of the **two-surface config architecture** (RFC-008): **cluster config** (team-owned, in a repo — what the system *is*) and **operator config** (person-owned, in `$HOME` — who *I* am). This is Terraform's split: `~/.terraformrc` for the operator, the checkout for the declaration.

OmniGraph today has neither half cleanly — `omnigraph.yaml` mixes both concerns (RFC-008 retires it), and there is no home-level config at all: identity and credentials get re-declared per working directory, in files that sit next to repo-committed config.

This RFC introduces **`~/.omnigraph/config.yaml`** (the operator surface) and a **keyed credentials chain**, scoped deliberately small:

1. **Operator identity** — a default actor for every `--as` cascade.
2. **Credentials by server name** — no more inventing env-var names per server; secrets never inline, never in any repo-committed file.
3. **Named servers** — operator-owned endpoint definitions; nothing a checkout supplies can redefine them.

It is explicitly a **subset of RFC-002**, sequenced to land.
RFC-002 settled the right long-term decisions (one `~/.omnigraph/` dir, credentials keyed by server name, `OMNIGRAPH_CONFIG`/`OMNIGRAPH_HOME` env precedence), but its implementation arrived as one 4,800-line PR mixing a crate extraction with behavior changes, and died over ten confirmed findings. This RFC adopts RFC-002's settled decisions verbatim where they apply, defers everything else (`GraphLocator`, multi-homing, `omnigraph use`, the State layer), and encodes the #139 findings as design rules so the same failures cannot recur.

## Motivation

Three concrete pains, all hit in real operation this cycle:

- **Identity repetition.** The cluster actor cascade (#180) resolves `--as` from the per-operator `omnigraph.yaml` — which means every operator hand-maintains a copy in every working directory (the `~/exp/intel` setup needed exactly this). A repo-committed `omnigraph.yaml` cannot carry `as: act-andrew` without claiming every contributor is Andrew.
- **Credential ergonomics.** `bearer_token_env` forces three coordinated steps per server (invent a var name, reference it in config, set it in the secret store). The peer group — AWS profiles, `gh hosts`, kubeconfig users — keys secrets by the server's *name*.
- **Cluster-era working shape.** With clusters on object storage (RFC-006), the project directory is a *declaration checkout* — operators run `cluster apply --config ./checkout` from anywhere. The things that are about the *operator* (who am I, which servers do I know, how do I like output formatted) have no home that travels with them.

## Non-Goals

- **`GraphLocator` / multi-homed graph resolution** (RFC-002 §1) — the biggest and riskiest part of config-v2; untouched here.
- **`omnigraph use` / the State layer** (`~/.omnigraph/state/`) — deferred with it (finding #2 showed its precedence interacts badly with scaffolds; that problem belongs to the slice that introduces it).
- **OS keychain integration** — the credentials *chain* (§D4) leaves a slot for it; this RFC ships env + file sources only.
- **Config-file walk-up.** Terraform does not walk up from subdirectories and neither do we — `--config` (or running in the directory) stays the explicit, deterministic story for cluster checkouts. Rejected, not deferred: walk-up makes "which config am I using" a function of cwd depth, the class of surprise this RFC exists to remove.
- **Retiring `omnigraph.yaml`** — that is RFC-008's job, with its own staging. This RFC builds the destination; during RFC-008's deprecation window the legacy file keeps loading exactly as today.
- **Renaming or removing anything.** No flag renames, no key renames, no schema-version bumps (findings #1, #3, #10).

## Background (verified against main)

- **Project-config lookup today** (`crates/omnigraph-server/src/config.rs:529-553`, shared by CLI and server): `--config `, else `./omnigraph.yaml` in cwd, else built-in defaults. Relative paths inside the file resolve against the file's own directory (`base_dir`). No env var, no home file, no walk-up.
- **Side-effect on load** (`crates/omnigraph-cli/src/helpers.rs:102-108`): `load_cli_config` also loads `auth.env_file` into the process env — this is how `OMNIGRAPH_BEARER_TOKEN` reaches remote commands today.
- **Actor resolution** (`helpers.rs:170`, #180): `--as` flag, else the project config's actor — currently the end of the chain.
- **Existing credential mechanism**: `TargetConfig.bearer_token_env` names an env var; `auth.env_file` points at a git-ignored dotenv. Both keep working indefinitely (RFC-002 already committed to this; finding #3 showed what happens otherwise).
- **`OMNIGRAPH_CONFIG`** exists today only as the *container entrypoint's* translation to the server's `--config`. The CLI does not read it.

## Design

### D1. Files and discovery

```
~/.omnigraph/config.yaml     # the operator surface (this RFC)
~/.omnigraph/credentials     # keyed secrets, 0600, git-irrelevant (§D4)
./cluster.yaml + checkout    # the team surface (unchanged; RFC-004..006)
./omnigraph.yaml             # legacy, loads as today through RFC-008's window
```

Discovery order for the operator file: `$OMNIGRAPH_HOME/config.yaml` if `OMNIGRAPH_HOME` is set, else `~/.omnigraph/config.yaml`. An absent file is an empty layer, never an error. `~` is expanded wherever paths are read (finding #9 — today a literal `./~/...` directory gets created).

`OMNIGRAPH_CONFIG=` becomes a first-class override for the `--config` argument in the CLI (highest precedence below the flag itself), aligning the CLI with the container contract that already uses this variable for the server. One name, one meaning, both binaries — it points at whatever the command's `--config` would point at (a cluster checkout for cluster commands; the legacy file during RFC-008's window).

Per RFC-002 §4 (adopted verbatim): `~/.omnigraph/` is the one canonical dir — cache/state subdirectories arrive with their own slices; XDG roots are not part of the mental model (`$XDG_CONFIG_HOME` may be honored as a fallback read location if set, but is never written to).

### D2. The operator schema (v1 of this layer)

```yaml
# ~/.omnigraph/config.yaml — about the OPERATOR, never about the system
operator:
  actor: act-andrew            # default for every --as cascade
servers:                       # operator-owned endpoint definitions
  intel-dev:
    url: http://127.0.0.1:8080
  prod:
    url: https://graph.modernrelay.ai
    # No token here, ever. Resolution: §D4.
aliases:                       # personal shorthand over CLUSTER-owned queries
  triage:                      # (the query is the shared contract; the alias,
    server: intel-dev          #  its defaults, and its name are mine — RFC-008)
    graph: spike
    query: weekly_triage
defaults:
  output: table                # read --format default
```

Unknown keys are a **warning, not an error** in this layer (an operator file written by a newer CLI must not brick an older one; contrast with `cluster.yaml`, where unknown keys are deliberately fatal because they change what a *plan* means).

### D3. Precedence and the merge rule

The end-state cascade is short, because the team surface (cluster config) deliberately carries **no operator-resolvable keys** — no actor, no tokens, no output preferences. Identity can never come from a checkout:

```
flag > env > operator config > built-in
```

During RFC-008's deprecation window, a legacy `omnigraph.yaml` slots in between env and operator config (its keys win over operator defaults, preserving today's behavior for unmigrated setups) — with the §D5 credential inversion: **credentials and endpoint definitions never come from a legacy/checkout file when an operator-layer definition exists for the same server name.**

Merging is **key-level**: scalars override per key; maps (`servers:`, `aliases:`) merge per *entry*, and entries merge per *field* (finding #13 — `merge_map` replacing whole entries silently dropped sibling fields). Concretely for the two flows this slice touches:

- **Actor**: `--as` > legacy `cli.actor` (window only, unchanged semantics) > `operator.actor` > none (commands that need an actor keep failing loudly).
- **Output format**: `--format` > legacy default (window only) > `defaults.output` > `table`.

### D4. Credentials: keyed by server name, by-reference always

Adopted from RFC-002 §5 unchanged, minus the keychain (a later source in the same chain). For a server named ``, the resolution chain is:

1. `OMNIGRAPH_TOKEN_` (uppercased, `-`→`_`) — explicit env, wins.
2. `[]` section in `~/.omnigraph/credentials` (INI-style, `0600`; the loader refuses a group/world-readable file).
3. The legacy pair — `bearer_token_env` + `auth.env_file` — exactly as today, for configs that already use it.

No inline secrets in any YAML file, anywhere (the existing invariant 12 posture extended to disk). A future `omnigraph login ` writes/rotates one section of the credentials file via temp + rename (finding #7: every operator-layer write is atomic), creating it `0600`.

### D5. The trust boundary (the security findings, made structural)

Findings #4, #5, #6 share one root cause: a file that arrives with a *repo checkout* could redirect where requests go and what secrets they carry. In the end state this is closed by construction — cluster config has no server/credential keys at all, and the operator surface never comes from a checkout. The rules below therefore govern the **RFC-008 window** (while legacy `omnigraph.yaml` still loads) and stand as the permanent law for any future checkout-supplied surface:

1. **A checkout-supplied file may *reference* a server by name; it may not *redefine* an operator-defined server.** If a legacy `./omnigraph.yaml` declares `servers.prod.url` and `~/.omnigraph/config.yaml` also defines `prod`, the operator definition wins and the CLI warns about the shadowed entry. A legacy-only server name keeps working (compat), but the keyed-credentials chain (§D4 steps 1–2) never resolves for it — only the legacy explicit `bearer_token_env` does. Net effect: a malicious checkout cannot point `prod` at an attacker host and harvest the operator's `prod` token.
2. **`auth.env_file` keeps auto-loading (compat), but checkout-layer env-files cannot *override* variables already set in the process or by the operator layer** — first-set-wins, operator-before-checkout (the existing real-env-wins rule, extended one layer down). Finding #5's injection becomes a no-op against any var the operator actually uses.
3. **A token is sent only to the server it is keyed to.** The legacy single `OMNIGRAPH_BEARER_TOKEN` fallback keeps working for the single-server shape, but when a request resolves through a *named* server, only that name's chain applies (closing finding #6's broadcast).

### D6. Compatibility rules (the #139 findings as law)

| Rule | Source finding |
|---|---|
| No flag or key is removed or renamed; new behavior is additive | #1, #3 |
| A config that loads today loads identically after this RFC; new validation applies only to new keys | #3, #8, #10 |
| Every operator-layer file write is temp + rename, never in-place | #7 |
| `~` expands wherever a path is read | #9 |
| Map merges are per-entry, per-field — never wholesale replace | #13 |
| One resolution path per concern — the actor chain and the token chain each have exactly one implementation, called by CLI and server alike | #11, #12 |
| Each slice lands as its own PR with the workspace gate green; no slice mixes mechanical moves with behavior changes | #139's disposition |

## Sequencing

Three PRs, each independently useful, each landable without the next:

1. **PR 1 — the operator file + identity.** Loader for `~/.omnigraph/config.yaml` (+ `OMNIGRAPH_HOME`, `~`-expansion, warn-only unknown keys), `operator.actor` joining the `--as` cascade, `defaults.output` joining the format cascade, `OMNIGRAPH_CONFIG` env for the CLI's `--config`. Docs: `cli-reference.md` gains the two-surface table.
2. **PR 2 — keyed credentials.** `servers:` in the operator layer, the §D4 chain (env + credentials file), the §D5 trust rules, and `omnigraph login ` (atomic write, `0600`). Legacy mechanisms stay untouched and are tested as untouched.
3. **PR 3 — operator targeting.** `--server ` on remote-capable commands and `aliases:` in the operator layer (server + graph + query + default params), resolving through operator-defined servers.
   This is the *bridge* toward RFC-002's locator — multi-server addressing in a safe, minimal form without the `GraphLocator` rework — and the replacement RFC-008 needs before legacy aliases can migrate.

RFC-008's deprecation stages begin only after PRs 1–2 are on main: the operator surface must exist before `config migrate` has somewhere to move keys to.

## Open questions

- Should `operator.actor` apply to *local* (embedded-engine) writes too, or only where a server/cluster boundary exists? Leaning yes-everywhere: one identity chain (§D6 one-path rule), and local audit rows get better.
- Does `defaults.output` belong in slice 1, or is identity-only an even cleaner first PR? (Cost of including it is one cascade hop; value is immediate.)
- `omnigraph config view --resolved` (RFC-002 had it; #139 shipped a version) — slice 1 or slice 2? It materially helps debugging precedence, which argues early.

## Relationship to RFC-002 and RFC-008

**RFC-008 is the other half of this design**: this RFC builds the operator surface; RFC-008 retires the mixed-ownership file ([rfc-008-deprecate-omnigraph-yaml.md](rfc-008-deprecate-omnigraph-yaml.md)), leaving exactly two config surfaces — cluster (team) and operator (person). Every mention of `omnigraph.yaml` in this RFC describes the deprecation window only. Sequencing couples them: RFC-007 PRs 1–2 land first, then RFC-008's migration stages run against them.

RFC-002 remains the umbrella architecture. This RFC implements its §2 (layered config, global-first), §4 (file naming / one dir), and §5 (credentials) in their minimal load-bearing form, and explicitly defers §1 (`GraphLocator`/targets), §3 (roles), and the State layer. If/when the locator work resumes, it builds on these layers rather than re-landing them. RFC-002's header should gain a pointer here once this merges.
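The §D1 lookup rules can be sketched as follows. This is a minimal illustration in Python for brevity, not code from the Rust crates. Every function name is hypothetical, and the environment, home directory and working directory are passed in as arguments so the precedence logic can be tested without touching the real process (an assumption of the sketch). YAML parsing and the optional `$XDG_CONFIG_HOME` read fallback are left out.

```python
# Hypothetical helpers sketching RFC §D1; not the omnigraph crates' API.
from pathlib import Path
from typing import Mapping, Optional


def expand_user_path(raw: str, home: Path) -> Path:
    """Expand a leading '~' against `home` so no literal './~' dir is created (finding #9)."""
    if raw == "~":
        return home
    if raw.startswith("~/"):
        return home / raw[2:]
    return Path(raw)


def operator_config_path(env: Mapping[str, str], home: Path) -> Path:
    """$OMNIGRAPH_HOME/config.yaml if OMNIGRAPH_HOME is set, else ~/.omnigraph/config.yaml."""
    root = env.get("OMNIGRAPH_HOME")
    base = expand_user_path(root, home) if root else home / ".omnigraph"
    return base / "config.yaml"


def read_layer_text(path: Path) -> str:
    """An absent operator file is an empty layer, never an error."""
    try:
        return path.read_text()
    except FileNotFoundError:
        return ""


def resolve_config_arg(
    flag: Optional[str], env: Mapping[str, str], cwd: Path, home: Path
) -> Optional[Path]:
    """--config flag > OMNIGRAPH_CONFIG > ./omnigraph.yaml > None (built-in defaults).

    No walk-up: only `cwd` itself is checked, never its parent directories.
    """
    if flag:
        return expand_user_path(flag, home)
    from_env = env.get("OMNIGRAPH_CONFIG")
    if from_env:
        return expand_user_path(from_env, home)
    legacy = cwd / "omnigraph.yaml"
    return legacy if legacy.is_file() else None
```

Keeping the lookup free of global state means the whole precedence table (flag, then `OMNIGRAPH_CONFIG`, then the working directory and never its parents) can be checked with plain unit tests.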
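The §D3 merge rule and the two cascades this slice touches are small enough to sketch directly. The Python below uses hypothetical names. `higher` is the higher-precedence layer, so during the RFC-008 window ordinary keys would be merged as `merge_layers(operator, legacy)`. This generic merge is not meant to combine `servers:` entries across the legacy and operator layers, because those follow the §D5 operator-wins rule instead.

```python
# Hypothetical sketch of RFC §D3; names are illustrative only.
from typing import Any, Dict, Optional


def merge_layers(lower: Dict[str, Any], higher: Dict[str, Any]) -> Dict[str, Any]:
    """Key-level merge: scalars override per key; nested maps merge per entry and per field.

    Neither input is mutated. Non-dict values (including lists) replace wholesale.
    """
    merged = dict(lower)
    for key, value in higher.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_layers(merged[key], value)
        else:
            merged[key] = value
    return merged


def first_set(*candidates: Optional[str]) -> Optional[str]:
    """Return the first candidate that is not None, in cascade order."""
    for candidate in candidates:
        if candidate is not None:
            return candidate
    return None


def resolve_actor(
    flag_as: Optional[str], legacy_cli_actor: Optional[str], operator_actor: Optional[str]
) -> Optional[str]:
    # --as > legacy cli.actor (RFC-008 window only) > operator.actor > None.
    # None means commands that need an actor keep failing loudly.
    return first_set(flag_as, legacy_cli_actor, operator_actor)


def resolve_output(
    flag_format: Optional[str], legacy_default: Optional[str], operator_default: Optional[str]
) -> str:
    # --format > legacy default (window only) > defaults.output > built-in "table".
    return first_set(flag_format, legacy_default, operator_default) or "table"
```

Recursing on nested maps is what keeps a sibling field such as `aliases.triage.server` alive when a higher layer overrides only `aliases.triage.graph`, which is exactly the failure finding #13 describes.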
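A sketch of the §D4 token chain with the §D5 gates folded in, under several assumptions. It is written in Python with hypothetical names, and `configparser` stands in for whatever INI reader the CLI would use. Each credentials section is assumed to hold a single `token` key, a hypothetical name because the RFC does not specify the key. `operator_defined` is assumed to come from server resolution. Reading §D3 and §D5 together, the sketch lets an operator-defined name use only steps 1–2 and a legacy-only name use only its explicit `bearer_token_env`. That is one interpretation of the text, not a settled rule. The permission check refuses group- or world-readable files, as stated.

```python
# Hypothetical sketch of RFC §D4 + §D5 rules 1 and 3; not the omnigraph API.
import configparser
import stat
from pathlib import Path
from typing import Mapping, Optional


def token_env_var(server: str) -> str:
    """Step 1 variable: OMNIGRAPH_TOKEN_ + server name, uppercased, '-' -> '_'."""
    return "OMNIGRAPH_TOKEN_" + server.upper().replace("-", "_")


def read_credentials(path: Path) -> configparser.ConfigParser:
    """Step 2 source: INI sections keyed by server name; refuse group/world-readable files."""
    parser = configparser.ConfigParser(interpolation=None)
    if not path.exists():
        return parser  # no credentials file: no keyed secrets, not an error
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode & (stat.S_IRGRP | stat.S_IROTH):
        raise PermissionError(f"{path} has mode {oct(mode)}; expected 0600")
    parser.read(path)
    return parser


def resolve_token(
    server: Optional[str],
    env: Mapping[str, str],
    creds: configparser.ConfigParser,
    operator_defined: bool,
    legacy_token_env: Optional[str] = None,
) -> Optional[str]:
    if server is None:
        # Unnamed single-server shape: the legacy fallback keeps working (rule 3).
        return env.get("OMNIGRAPH_BEARER_TOKEN")
    if operator_defined:
        # Steps 1-2 only; a checkout's bearer_token_env is never borrowed (rules 1 and D3).
        explicit = env.get(token_env_var(server))
        if explicit:
            return explicit
        if creds.has_option(server, "token"):
            return creds.get(server, "token")
        return None
    if legacy_token_env:
        # Legacy-only name: step 3 (auth.env_file has already been loaded into env).
        return env.get(legacy_token_env)
    # A named server never falls back to OMNIGRAPH_BEARER_TOKEN (rule 3).
    return None
```

Routing every caller through one function like this also matches the §D6 one-resolution-path rule: CLI and server would share the same chain instead of each reimplementing it.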
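`omnigraph login` is future work, so the following is only a sketch of the finding #7 write discipline it would need: load the credentials file, change one section, write the whole file to a temp file in the same directory, fsync it, then `os.replace` it over the original. `write_credential` and the `token` key are hypothetical. `tempfile.mkstemp` is used because it creates its file readable and writable only by the owner, which gives the `0600` mode without a separate chmod.

```python
# Hypothetical sketch of an atomic credentials write (RFC §D4, finding #7).
import configparser
import contextlib
import os
import tempfile
from pathlib import Path


def write_credential(path: Path, server: str, token: str, key: str = "token") -> None:
    """Write or rotate one server's section via temp file + os.replace, never in place."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file is silently skipped by configparser
    if not parser.has_section(server):
        parser.add_section(server)
    parser.set(server, key, token)

    path.parent.mkdir(parents=True, exist_ok=True)
    # Same directory => same filesystem, so os.replace is an atomic rename.
    # mkstemp creates the file with mode 0600, and the rename keeps that mode.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=".credentials.")
    try:
        with os.fdopen(fd, "w") as handle:
            parser.write(handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        # Never leave a half-written temp file behind on failure.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

A reader therefore sees either the old file or the new one, never a truncated mix. Rotating one server's token rewrites the file but leaves every other section's values intact.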
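§D5 rule 1 reduces to a lookup that consults the operator's `servers:` map first. A checkout entry for the same name then only produces a warning. The Python types and the warning text below are hypothetical. `operator_defined` is the flag that would gate the keyed steps of the §D4 chain.

```python
# Hypothetical sketch of RFC §D5 rule 1 (operator definitions win).
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResolvedServer:
    name: str
    url: str
    operator_defined: bool  # True enables §D4 steps 1-2 (keyed credentials)
    warnings: List[str] = field(default_factory=list)


def resolve_server(
    name: str,
    operator_servers: Dict[str, Dict[str, str]],
    legacy_servers: Dict[str, Dict[str, str]],
) -> ResolvedServer:
    """A checkout may reference a server by name but never redefine an operator one."""
    if name in operator_servers:
        result = ResolvedServer(name, operator_servers[name]["url"], operator_defined=True)
        if name in legacy_servers:
            result.warnings.append(
                f"ignoring servers.{name} from the checkout config: "
                "shadowed by the operator definition in ~/.omnigraph/config.yaml"
            )
        return result
    if name in legacy_servers:
        # Legacy-only names keep working, but only with their explicit bearer_token_env.
        return ResolvedServer(name, legacy_servers[name]["url"], operator_defined=False)
    raise KeyError(f"unknown server {name!r}")
```

In the attacker scenario from rule 1, a checkout that redefines `prod` gets a warning and no effect: the URL, and therefore where the `prod` token goes, still comes from the operator file.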
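§D5 rule 2 is first-set-wins layering. In this Python sketch the real process environment comes first, then the operator layer, then any checkout `auth.env_file`, so a checkout can add variables but never replace one. The function names are hypothetical. The dotenv parser is a deliberately minimal stand-in (no `export` prefix, escapes or multi-line values). The result is returned as a new dict rather than written into `os.environ`.

```python
# Hypothetical sketch of RFC §D5 rule 2 (first-set-wins env layering).
from typing import Dict, Iterable, Mapping


def parse_dotenv(text: str) -> Dict[str, str]:
    """Minimal KEY=VALUE parser: skips blanks and '#' comments, strips matching quotes."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def layer_env(
    process_env: Mapping[str, str], layers: Iterable[Mapping[str, str]]
) -> Dict[str, str]:
    """Real env first, then layers in order (operator before checkout).

    setdefault means a later layer can only add variables, never override one.
    """
    merged = dict(process_env)
    for layer in layers:
        for key, value in layer.items():
            merged.setdefault(key, value)
    return merged
```

Ordering alone enforces the rule: a checkout env-file that tries to set `OMNIGRAPH_TOKEN_PROD` is ignored whenever the process or the operator layer already set it, while genuinely new variables still load as they do today.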