omnigraph/docs/dev/rfc-002-config-cli-architecture.md

# RFC: Config & CLI Architecture — Layered Config, Client Targeting, Typed Locators

**Status:** Proposed (revised 2026-06-02)
**Supersedes:** the original additive-only draft (2026-05-30). This revision **embraces breaking changes** to remove ambiguity and conflation rather than carrying every legacy shape forward. It is gated behind a config `version:` field and ships compat aliases for the highest-traffic legacy keys, but it does not pretend the end-state is purely additive. Incorporates an implementation-readiness review: endpoint-bound credentials, layer identity trust, route-unification specifics, restored `query.roots`, and right-sized auth scope.
**Target release:** v0.8.x (phased — see Rollout)

**Implementation status — V1a "locator core" landed.** Shipped: the typed
`GraphLocator` (`storage:` XOR `server:`+`graph_id:`), a `servers:` map, version-gated
config strictness (no `version:` = lenient + deprecation-warned; `version: 1` = unknown
keys rejected at any depth, via `serde_ignored`), `omnigraph-server` rejecting remote
graph entries (embedded-only), and the `--graph` flag. **Divergences from this proposal as
built:** `--target` was **removed outright (no deprecated alias)** — a clean rename, not the
alias proposed below; and `uri:` is honored-but-deprecation-warned (not auto-rewritten to
`storage.uri`). **Crate-location corrections** (the "Reconciliation"/"Implementation"
sections below predate V0): the config schema now lives in the extracted `omnigraph-config`
crate, and `QueryRegistry` was extracted to `omnigraph-queries` (not "kept in
`omnigraph-server`"). Deferred: layered global-first config + merge/provenance + `config
view`, the `cli:`→`defaults:` / `server:`→`serve:` renames (V1-remainder), route unification
+ remote client (V2), and the auth model (V3).

## Summary

OmniGraph today reads one config file, `omnigraph.yaml`, from both the CLI (operating the embedded engine) and `omnigraph-server` (hosting graphs). The CLI **can** already reach a *single-graph* server — point a graph entry's URI at the endpoint and set `bearer_token_env` — but it **cannot address a specific graph on a multi-graph server**, has no named-server credential model, and does not work without a project file in the current directory. Those are the real gaps.

This RFC defines the config and CLI architecture that closes them, derived from first principles — *working backward from what OmniGraph uniquely enables* rather than copying kubeconfig. The result:

1. A **typed locator** replacing the conflated `uri: String`. A graph entry is **embedded (`storage:`) XOR remote (`server:` + `graph_id:`)**; the *key* names the locus so neither a URI scheme nor a comment is load-bearing.
2. **Three-tier server addressing.** A `servers:` entry is self-sufficient — graph identity is server-owned, so you address a known `server/graph_id` directly with no per-graph entry (listing what exists is `graph_list`-gated, §9). Per-graph `graphs:` entries become *optional aliases* (for a short name, a branch pin, or multi-homing). Below that, env vars (`OMNIGRAPH_SERVER` + token) give a fileless floor.
3. **Global-first layered config.** The user-global `~/.omnigraph/config.yaml` is the primary, self-sufficient default; `./omnigraph.yaml` is an optional repo-scoped override + deployment manifest. One schema, both layers optional. The CLI works from any directory with no project file (the `kubectl`/`aws`/`gh` posture).
4. **A method-tagged auth model.** `auth:` is a tagged union over `bearer | oauth | mtls | none`; bearer/mtls reference a *secret source* (`env | file | command | keychain`). v1 ships `bearer`/`none`; `oauth`/`mtls` are reserved (the enum shape is fixed, so adding them is non-breaking — V6). Auth is **per-server**, not per-graph, and trusted-origin (§7): a lower-trust layer cannot supply credentials. Secrets are never inlined and never live in any `*.yaml` or in the project tree.
5. **A clean file layout split on the two real boundaries — secrecy and scope, never role.** Global `~/.omnigraph/config.yaml`; project `./omnigraph.yaml` (one artifact, both roles by section); credentials in the OS keychain → `~/.omnigraph/credentials` (INI, `0600`). No `credentials.yaml`.

The design optimizes jointly for **DX** (one command surface across embedded and remote; clone-and-go) and **AX** (agent experience: one flat resolved context, secrets out of the repo and endpoint-bound, branch-pinned reproducible reads, a GitOps'd capability surface).

## Reconciliation with the code

Verified **against the code**, not ticket status. Findings, with the corrections they force on the design:

- **Config lives in `crates/omnigraph-server/src/config.rs`**, and `omnigraph-cli` depends on the whole `omnigraph-server` crate to use it (`crates/omnigraph-cli/Cargo.toml:19`; the CLI imports `OmnigraphConfig`, `PolicyEngine`, `QueryRegistry`, `load_config` from `omnigraph_server`). The new layered-config stack should land in a **new shared `omnigraph-config` crate**, so the CLI stops pulling Axum/utoipa transitively just to parse YAML (see Implementation).
- **The config noun is `graphs:` (key) / `cli.graph` (default), but the shipped command-line flag is `--target`** (`main.rs:91,148,…`; field `target`, no `--graph` alias) — the code is itself split between "graph" config terminology and a "target" flag. This RFC unifies on **graph**: `--graph` becomes the canonical flag with `--target` kept as a deprecated alias (Migration).
- **`TargetConfig` models a graph as a single `uri: String`** with code branching on `is_remote_uri(uri)` (an `http(s)://` prefix check, `main.rs:686`). That string cannot express `{server, graph_id}`; today the only way to address a graph on a multi-graph server is to hand-write the prefix into the URI (`uri: https://host/graphs/prod`) and rely on the flat path append. §2 fixes this with the typed locator.
- **The CLI already speaks HTTP for many verbs** — `query`, `mutate`, `ingest`, `branch`, `commit`, `schema`, `snapshot`, `export`, `graphs` all have remote paths. But every URL is **flat** (`remote_url(&uri, "/branches")`, `…/commits`, `…/snapshot`, etc.) with **no `/graphs/{graph_id}/` prefix anywhere**, so the entire remote surface targets **single-graph-mode servers only** and 404s against a multi-graph server's nested routes. `query`/`mutate` additionally hit the **deprecated** `/read` (`main.rs:1991`) and `/change` (`main.rs:2068`), not the primary `/query`/`/mutate`. The HTTP client is therefore **extended**, not built from scratch.
- **Operations that bail on remote**: `load`, `lint`, `schema plan`, `optimize`, `cleanup` via `resolve_local_graph` → *"… is only supported against local graph URIs in this milestone"* (`main.rs:984`).
- **The CLI does not walk parent directories** — it reads `./omnigraph.yaml` in the cwd only (pinned by a `config.rs` test). Global-first is a deliberate posture flip.
- **What exists in the CLI** (verified): `init, query (read), mutate (change), load, ingest, branch, schema, lint, queries, snapshot, export, commit, policy, optimize, cleanup, graphs`. Note `queries` already shipped (the stored-query registry, PR #128). **Not built:** `login, use, config view, serve, quickstart`.
- **`scaffold_config_if_missing` exists** at `main.rs:1547` (invoked by `init`).
- **The default client bearer env is `OMNIGRAPH_BEARER_TOKEN`** (`main.rs:45`); the server uses `OMNIGRAPH_SERVER_BEARER_TOKEN[_JSON|_FILE|_AWS_SECRET]`. The implicit credential chain in §6 **reuses `OMNIGRAPH_BEARER_TOKEN`** rather than minting a new `OMNIGRAPH_TOKEN`.
- **The server already exposes the target surface**: `POST /query`, `POST /mutate`, `GET /queries`, `POST /queries/{name}`, `GET /graphs` (405 in single mode, list in multi), and the nested `/graphs/{graph_id}/…` cluster routes. `POST /graphs` and `DELETE /graphs/{id}` are intentionally **not** exposed. The one server-side change this RFC needs is **route unification** (§9).
- **`project.name` has no consumer** in the code; it is dropped. `server.graph` is purely the single-graph-mode selector (`lib.rs`); it is dropped in favor of structural mode (§9). `cli.actor` is the engine-layer policy actor default (`--as` > `cli.actor` > none, `main.rs:854`); it moves under `defaults:`.

## Motivation

Three problems, in priority order:

- **No multi-graph client targeting.** OmniGraph runs N graphs per server across M servers, but the CLI's remote path is flat-only and single-graph-only. There is no first-class way to say "graph `production` on server `prod-eu`," and the same graph is **multi-homed** — `s3://b/prod` may be `prod` on server A, `production` on server B, and opened directly by the CLI.
- **No global, no-project operation.** A solo developer or an agent should be able to define everything in `~` and run from any directory. Today the CLI is project-anchored.
- **Sub-optimal credentials for a multi-server world.** `bearer_token_env` is per-graph and forces the operator to invent and coordinate an env-var name per server. The peer group keys the secret **by the server's name** and supports interactive login, dynamic tokens, and OAuth. OmniGraph should match that.

## Non-Goals

- **A control plane / runtime config-mutation API.** Operators edit files and (for servers) restart.
- **Hot reload.** Restart-only for server-side config.
- **Embedding secrets in any config file.** Credentials are by-reference; secrets live in the OS keychain or a `0600` profile file, never a committable `*.yaml`, never in the project tree.
- **Renaming the project manifest by role.** Role lives in sections, not filenames (§5).
- **Dropping embedded mode.** Embedded-first is load-bearing for the file-layout decision.
- **Cross-graph / cross-server tool listing in MCP.** Clients loop over per-graph catalogs.
- **Managing cloud-storage credentials.** Embedded graphs authenticate to object storage via the standard cloud chain (`AWS_*`, instance roles); OmniGraph does not own those (§6).
- **Cloud-mode multi-tenancy.** A future multi-tenant Cloud tier (tenant resolved from the OAuth `org_id` claim, per-tenant Cedar bundles, dynamic graph lifecycle, `DELETE /graphs`) is out of scope and lands in the cloud RFC (MR-956 RFC 0003/0004). This RFC only **reserves the shapes** so that work is additive — `serve.auth.oauth` multi-issuer + `tenant_claim` (§6), `serve.policy` as a tagged source (§6), `(server, org)` credential keying (§7), and the `GraphKey { tenant_id, graph_id }` registry seam already shipped in MR-668 (§9). Tenant is **server-resolved from the token** (the MR-731 invariant, `identity.rs:180`) and never appears in the locator, URL path, or request body.

## Background

OmniGraph runs on Lance 6.x: typed nodes/edges in per-type Lance datasets, atomic multi-table commits via a `__manifest` table, branchable and time-travelable. The CLI operates the **embedded engine** directly against a storage URI. `omnigraph-server` (Axum) is a separate HTTP front-end over the same engine, with bearer auth + per-graph Cedar.

OmniGraph **already has a credentials-by-reference mechanism** this RFC builds on: `bearer_token_env` names the env var holding a graph's bearer token; `auth.env_file` points at a git-ignored dotenv that the CLI auto-loads (`load_env_file_into_process`, `main.rs:755`, real-env-wins); `resolve_remote_bearer_token` (`main.rs:870`) resolves a token via env then dotenv.

The six **irreducible enablers** that drive the design (E1–E6):

| # | Enabler | Consequence |
|---|---|---|
| E1 | A graph is a **self-contained storage URI**; the substrate is the source of truth — no server required to read/write. | A graph is addressable **directly (embedded)**, not only via a server. |
| E2 | A server hosts **many graphs**; **many servers** exist. | The remote address space is **`{server} × {graph_id}`**. |
| E3 | The same graph is **multi-homed** under different per-locus names; a server can **enumerate its own graphs** (`GET /graphs`, `graph_list`-gated). | **Name ≠ identity.** Addressing a graph by a *known* `server/graph_id` needs only read/invoke permission on that graph; *discovering* what exists is `graph_list`-gated. Clients need not pre-declare each graph. |
| E4 | **Branch / commit / snapshot** are first-class addressable sub-state. | An address is *graph @ branch/snapshot*, not just graph. |
| E5 | Enforcement is **two-layered**: engine-layer Cedar (`_as` writers, embedded) + HTTP-boundary bearer+Cedar (server only). | *How* you reach a graph determines *which* enforcement applies. |
| E6 | **Stored queries / MCP tools are a per-graph registry in the deployment config.** | The **agent tool surface is version-controlled in the repo.** |

There are also **two distinct credential domains**, conflated nowhere in this design:

- **Bearer / session credentials** (client → remote server). OmniGraph owns these: keychain / `credentials` / env / OAuth (§6).
- **Cloud-storage credentials** (embedded engine → object store). The ambient cloud chain owns these; OmniGraph only consumes them.

## Design

### 1. The address space and resolution

Every OmniGraph address is a tuple:

```
(locus, graph, sub-state, credential)
  locus      = embedded(storage URI)  XOR  remote(server endpoint)     # E1, E2
  graph      = a storage URI (embedded)  |  a graph_id on a server (remote)  # E3
  sub-state  = branch | snapshot                                       # E4
  credential = cloud-storage chain (embedded) | server auth (remote)   # E5
```

The config's job is **name → this tuple**. Two nouns express it:

- **`servers:`** — named remote endpoints (+ auth-by-reference). First-class addressable.
- **`graphs:`** — named graph locators (embedded or remote). For remote graphs these are *optional aliases*; a server alone is addressable without them.

**Resolution of `--graph X`** (the single rule, applied identically everywhere):

```
1. graphs.X exists?                  → that locator (Embedded or Remote)        # local alias wins
2. X is "srv/gid" and servers.srv?   → Remote { server: srv, graph_id: gid }    # qualified, no alias needed
3. defaults.server set?              → Remote { server: defaults.server, graph_id: X }
4. otherwise                         → error (unknown graph; no default server)
```

`/` is disallowed in a local alias name, so `srv/gid` is unambiguous (the `docker registry/image` pattern). Step 1 may resolve to either variant; steps 2–3 always resolve `Remote`. Snapshot/branch pins from the entry (or `defaults`) attach to the resolved locator and are overridable by `--branch` / `--snapshot`.

**With no `--graph`:** bare commands use `defaults.graph` (a graph alias). `defaults.server` is **not** a fallback graph — it only supplies the server for step 3 above when an explicit but otherwise-unknown id is passed. So `omnigraph query` → `defaults.graph`; `omnigraph query --graph production` (no alias `production`, no `/`) → `production` on `defaults.server`.

This yields **three addressing tiers**, all valid in either config layer:

| Tier | You write | You get | Ceremony |
|---|---|---|---|
| Env, no file | `OMNIGRAPH_SERVER=https://…` + token | reach any hosted graph by id | zero |
| `servers:` entry | a named endpoint (+ auth-by-ref) | reach **any** graph it hosts as `server/graph_id` | one entry per *server* |
| `graphs:` entry | a local alias → `{server, graph_id, branch, snapshot}` | short name, branch pin, multi-homing | one entry per *aliased graph* |

### 2. The typed locator (`storage:` vs `server:`)

The shipped model is one `uri: String` plus `is_remote_uri` sniffing at ~16 dispatch sites. That conflates two structurally different addresses: an **embedded** graph is a complete self-contained address (one storage URI = one graph), while a **remote** graph is a *server endpoint + a `graph_id`* (one server hosts N graphs). The *resolved* address is therefore a **typed locator**, not a string:

```rust
enum GraphLocator {
    Embedded { storage: Storage },                     // a complete graph on an object store
    Remote   { server: ServerId, graph_id: GraphId },  // which server + which graph (+ server auth)
}
```

A `graphs:` entry resolves into this **once**; downstream code dispatches on the variant instead of re-sniffing a scheme at each call site.

**The key names the locus** — so neither the value's scheme nor a comment is load-bearing:

| Locus | Key | Value |
|---|---|---|
| Embedded | **`storage:`** | a storage location (string or block, below) |
| Remote | **`server:`** | a name in `servers:` (its `endpoint` + auth resolve by name) |
| Remote graph id | **`graph_id:`** | the id on that server — **defaults to the entry key** |

An entry has `storage:` **xor** `server:`; the deserializer rejects both and neither.

**`storage:` is a string-or-block.** The bare scalar covers the common case; the block form gives per-graph object-store options a home (region/endpoint/profile) without a future breaking change, and keeps `uri:` as the precise word for "location" exactly where it is now unambiguous (`storage.uri` is always embedded):

```yaml
dev:  { storage: s3://team/dev.omni }            # scalar sugar ⇒ storage: { uri: s3://team/dev.omni }
prod:
  storage:
    uri: s3://team/prod.omni
    region: eu-west-1
    endpoint: https://minio.local                # S3-compatible override
    profile: team-deploy                          # named cloud profile (env-only — see note)
```

Shipped flat `uri:` becomes a deprecated alias mapped to `storage.uri` with a load-time warning.

**Validation (Lance 6.0.1):** `region`/`endpoint` are threadable per-graph today — Lance accepts per-dataset `storage_options` (`builder.rs:165-176,305`) and omnigraph currently hardcodes `storage_options: None` (`namespace.rs:228,376`); wiring them is omnigraph-internal, no Lance change. **`profile` is the exception** — `AWS_PROFILE` is env-only in both Lance and omnigraph's `AmazonS3Builder::from_env()` (`storage.rs:284`), so `storage.profile` is **scoped out of v1** unless omnigraph resolves the profile to concrete credentials itself. `region`/`endpoint` land in V2 (engine threading); `profile` stays a documented Open Question.

### 3. Invalid configs are rejected by design

The DX rule: **a config field is either honored or rejected, never silently ignored.** The loader has two phases:

1. Parse YAML into a raw, origin-preserving shape (`base_dir`, layer, path), with **`deny_unknown_fields`** so a typo errors instead of becoming a silent no-op.
2. Convert once into a typed, role-aware resolved config. Every command receives the resolved form.

```rust
struct Config {                  // identical schema at both layers; deny_unknown_fields
    version:  u32,               // schema version — forward-compat + clean deprecation gate
    servers:  Map<ServerId, Server>,
    graphs:   Map<GraphName, GraphEntry>,
    defaults: Defaults,
    serve:    Serve,             // host-role serving config (see §5/§9)
    aliases:  Map<AliasName, Alias>,
    query:    QueryRoots,        // client-role: search roots for ad-hoc `--query <path>` .gq files
}

enum GraphEntry {
    Embedded(EmbeddedGraph),     // storage: present
    Remote(RemoteGraph),         // server: present
}
struct EmbeddedGraph { storage: Storage, branch: Option<Branch>, snapshot: Option<Version>,
                       policy: Option<PolicyFile>, queries: Map<Name, QueryDef> }
struct RemoteGraph   { server: ServerId, graph_id: GraphId, branch: Option<Branch>, snapshot: Option<Version> }
```

This makes the rules structural rather than advisory:

- A graph entry must specify **exactly one** locator (`storage:` xor `server:`).
- `policy:` and `queries:` are valid **only** on `Embedded` entries — they define the capability surface of a graph this process opens directly. A `Remote` entry points at a server that owns its own policy and stored queries.
- `omnigraph-server` may serve only `Embedded` entries; a server manifest entry with `server:` is rejected (a server must not proxy another server).
- A `Remote` entry discovers stored queries from the server (`GET /queries`) and invokes them (`POST /queries/{name}`); it never defines `queries:` locally.

Examples that must fail fast:

```yaml
graphs:
  bad1: { storage: s3://b/prod.omni, server: prod-us }      # invalid: storage xor server
  bad2: { server: prod-us, graph_id: production,
          policy: { file: ./p.yaml } }                       # invalid: remote policy lives on the server
```

`omnigraph config view --resolved --show-origin` is the user-facing debugger: it prints the final `Embedded`/`Remote` locator and the origin layer of every honored field. Fields that cannot be honored fail validation first; they never appear in the resolved view.

### 4. Layered config — global-first, uniform schema, project-optional

**Posture: global-first, project-optional.** The CLI is primarily a *client*, so it sits on the global-first side of the axis — like `kubectl`/`aws`/`gh`/`docker`. The **global user config is the primary, self-sufficient default**; the project file is an optional repo-scoped override (and, when present, the deployment manifest). `omnigraph query --graph prod` must work from any directory with no project file.

**One raw schema, both layers, each self-sufficient.** Do not specialize the format by layer. Run the same role-aware validation everywhere (§3): a layer may define graphs, defaults, servers, and aliases, but fields meaningless for a resolved variant are rejected, not ignored.

| Layer | Required? | Typical use | Path |
|---|---|---|---|
| Global | no | **the default** — solo/agent's entire config; shared servers+creds for teams | `~/.omnigraph/config.yaml` |
| Project | no | **opt-in** — repo-scoped overrides + the committed deployment manifest | `./omnigraph.yaml` |

**Precedence (low → high):** built-in defaults < global < active-context state (§5) < project < env vars < CLI flags. With no project file it collapses to built-in < global < state < env < flags.

**Merge semantics — "closest layer wins, at the smallest meaningful unit":**
- **Settings objects** (`defaults`, `serve`) → deep-merge per field: a project sets `defaults.graph` and inherits the global `defaults.output_format`.
- **Named-resource maps** (`servers`, `graphs`, `aliases`) → union by key; on a collision the **higher-precedence** layer's entry **replaces** the lower wholesale (no field-level deep-merge within an entry — replace makes the entry self-contained and predictable). Per-graph `queries:` are not a top-level map; they merge as part of their owning `graphs` entry (replaced with it).
- **Server identity follows trust, not precedence (security).** Precedence and trust run *opposite* for the project layer: project is **higher-precedence** (it wins value merges, above) but **lower-trust** (a repo an agent can edit or a clone can ship). A `servers:` entry's `endpoint` and `auth` are its **identity**, and identity follows trust — a lower-trust layer may add *endpoint-only* servers and graph aliases, but may **not** (a) redefine the `endpoint` of a server a higher-trust layer defined, nor (b) carry a `servers.<name>.auth` block — *client* credential sourcing — at all (no `command`/`file`/`keychain`/`token` sourcing; `command` would be repo-authored RCE). Both are rejected. (`serve.auth`, the secret-free server-side *accept* config, is unaffected — it is exactly what a committed deployment manifest carries; §6.) Without this, a project file could repoint `servers.prod.endpoint` or inject `auth.command` and, since credentials key by name, harvest or execute against the user's `prod` identity. The credential trust model in §7 enforces the consuming side.
- **Lists** → replace, never append.
- **Scalars** → higher layer wins.
- **Relative paths carry their origin's `base_dir`** — a `queries:` `.gq` path or a `policy.file` resolves against the directory of the layer it was defined in.
- **Inspectable (non-negotiable):** `config view --resolved --show-origin` prints each final value and the layer that set it.

### 5. File layout, naming, and the secrets boundary

The layout splits on the **two boundaries that are actually irreducible — secrecy and scope — and never on role**:

| Axis | Real boundary? | Why |
|---|---|---|
| Secrecy (secret vs secret-free) | **yes, hard** | Security + AX: a secret-bearing file in the repo is exfiltratable by an agent and committable by a human. |
| Scope (user-global vs project-local) | **yes, hard** | Different lifecycle, owner, and VCS status. |
| Role (client vs server) | **no, soft** | On a laptop they collapse (E1); in prod they are different *repos* sharing a schema. Role is which sections are filled, not which file. |

```
~/.omnigraph/                       # global, user-scoped, machine-local, NEVER in VCS
├── config.yaml                     # servers + personal graphs + defaults + aliases   (SECRET-FREE)
├── credentials                     # INI, [server] → token, 0600, gitignored   (FALLBACK; keychain preferred)
├── cache/                          # remote catalogs (GET /graphs), OAuth token cache — rm -rf safe
└── state/                          # active-context (omnigraph use), session logs

<repo>/omnigraph.yaml               # project = deployment manifest, committed, portable   (SECRET-FREE)
<repo>/schema.pg, queries/*.gq, policies/*.yaml

# secrets at rest:  OS keychain  omnigraph:<server>   (preferred — no plaintext file)
# secrets in CI:    OMNIGRAPH_BEARER_TOKEN[_<SERVER>] env
```

**Naming decisions (best-practice + de-conflicted; breaking where it removes ambiguity):**

| Shipped | This RFC | Why |
|---|---|---|
| `server:` (self) vs `servers:` (remote) | **`serve:`** vs `servers:` | Two keys one letter apart with opposite meaning is the worst ambiguity in the current schema. `serve:` = "config when I serve"; `servers:` = "remotes I target." |
| `uri:` (graph-entry top level) | **`storage:`** (string-or-block; `uri:` nested) | `uri:` conflated embedded/remote (§2). |
| `cli:` block | folded into **`defaults:`** | "default graph/branch/format/actor" is one concept; no consumer-specific block. |
| top-level `policy:` / `queries:` | **removed** | per-graph only; deletes the dual-site reconciliation machinery. "Single-graph mode" = a one-entry `graphs:` map. |
| `bearer_token_env:` (per-graph) | **`servers.<>.auth.bearer.token.env`** | auth is per-server (§6); old field kept as a legacy alias. |
| `auth.env_file` (project dotenv) | **deprecated (warned)** | no secret-bearing file in the project tree. |
| `aliases.<>.query: <path>` + `command:` | **`aliases.<>.query: <name>`** (reference) | an alias references a *defined* query; read/mutate inferred (§8). |
| `project: { name }` | **removed** | no consumer. |
| *(none)* | **`version: 1`** + `deny_unknown_fields` | forward-compat; typos error rather than no-op. |
| `query.roots:` | **retained** | resolves ad-hoc `--query <relative>.gq`; orthogonal to the alias/registry model. |

Conventions kept: **snake_case** keys; **plural maps** keyed by name; **`~/.omnigraph/config.yaml`** global (named `config` — the universal convention) + **`./omnigraph.yaml`** project (app-named manifest). `OMNIGRAPH_HOME` overrides the global dir; `OMNIGRAPH_CONFIG` overrides the config file path; `$XDG_CONFIG_HOME` honored if set, but `~/.omnigraph/` is canonical.

**Active context is *state*, not declarative config.** `omnigraph use <graph>` writes `~/.omnigraph/state/active.yaml` (a thin `{server, graph}`), leaving the user-authored `config.yaml` pristine — avoiding kube's comment-stripping rewrite of `~/.kube/config`. It slots into precedence between global and project (§4).

**Four hard rules (promote to invariants):**
1. **No secret in any `*.yaml`, ever** — global or project. Secrets: keychain → `credentials` (INI, `0600`) → env.
2. **No secret-bearing file in the project tree.** (Kills project-local `.env.omni`; kept as a warned compat path, removed next major.)
3. **The project tree carries capability + targeting, never identity.** A project layer may *target* servers and define graphs, but it may not assert a server's identity — redefining a higher-layer server's `endpoint`/`auth` is rejected (§4), and credentials are endpoint-bound (§7). This is the AX guarantee that makes "hand an agent a repo" safe by construction.
4. **`config.yaml` ⊇ `omnigraph.yaml` schema; scope is the only difference.** Same parser, role-aware validation, `config view --resolved` is the disambiguator.

### 6. Auth — method × source are orthogonal

The shipped code knows only bearer-from-env. Two independent axes must be separated:

- **Method** = *what kind of credential/protocol*: `bearer`, `oauth`, `mtls`, `none`. Exactly one per server.
- **Source** = *where secret material is read from*: `env`, `file`, `command`, `keychain`. Reusable wherever a secret is needed.

OAuth is **not** "just another token source": it has an interactive flow, endpoints (issuer/client_id/scopes), and refresh semantics, and its tokens are minted by `omnigraph login` and cached in the keychain — never in config. So it is a *method* with its own fields.

```rust
// servers.<name>.auth — fully optional; absent ⇒ implicit bearer chain keyed by name
enum Auth {
    Bearer { token: SecretSource },
    None,                                   // explicitly unauthenticated (not accidental)
    // Reserved — shape-stable but not implemented in v1 (own milestone, see Rollout V6):
    OAuth  { issuer: Url, client_id: String, scopes: Vec<String>, audience: Option<String> },
    Mtls   { cert: SecretSource, key: SecretSource },
}
enum SecretSource {
    Env(String),           // env:      OMNIGRAPH_BEARER_TOKEN_PROD
    File(PathBuf),         // file:     /run/secrets/og-token
    Command(Vec<String>),  // command:  [vault, read, -field=token, secret/og]  (argv list, no shell)
    Keychain(String),      // keychain: omnigraph:prod
}
```

**Externally-tagged** (the key names the method/source), consistent with §2 — a field under `oauth:` cannot leak into `bearer:`.

| Method / source | Use case | YAML |
|---|---|---|
| *(omit `auth:`)* | the common case | implicit chain (below) |
| `bearer.token.env` | CI / secrets-manager fixed var | `auth: { bearer: { token: { env: OG_PROD_TOKEN } } }` |
| `bearer.token.file` | k8s/docker mounted secret | `auth: { bearer: { token: { file: /run/secrets/og } } }` |
| `bearer.token.command` | Vault / cloud IAM / `gh auth token` | `auth: { bearer: { token: { command: [vault, read, -field=token, secret/og] } } }` |
| `bearer.token.keychain` | pin a non-default keychain entry | `auth: { bearer: { token: { keychain: omnigraph:prod } } }` |
| `oauth` | SaaS / SSO — `omnigraph login` device flow | `auth: { oauth: { issuer: https://auth.og.cloud, client_id: og-cli, scopes: [graph.read, graph.write] } }` |
| `mtls` | client-cert networks | `auth: { mtls: { cert: { file: ./client.pem }, key: { file: /run/secrets/og-key.pem } } }` (key off the repo tree — hard rule 2) |
| `none` | open dev server | `auth: { none: {} }` |

**Scope (v1): only `bearer` and `none` are implemented.** `oauth` and `mtls` are **reserved** — the enum shape is fixed (so adding them later is not a breaking re-key, per Hyrum's Law), but a config selecting them errors with "auth method not yet supported." Client-side OAuth login (device flow, token cache, refresh) is a later milestone (Rollout V6); **server-side OIDC validation is owned by the Federated Auth workstream (MR-956 RFC 0001)** — `serve.auth.oauth` (below) is its YAML home and may land on its own timeline. mTLS is V6.

**Auth is per-server, not per-graph.** One credential authenticates you to a *server*; Cedar then authorizes per graph. The shipped per-graph `bearer_token_env` is the wrong grain for a multi-graph world (it repeats across every graph on a server); it survives as a legacy alias for `servers.<n>.auth.bearer.token.env`.

**The `command` source** runs locally with the operator's own privileges, so a `servers.<name>.auth` block — `command` especially — is **rejected from a lower-trust (project) layer** (§4): it is honored only from global/trusted config, never from a repo, so it adds no remote-execution surface. The `auth:` union is method-tagged so adding a method later is a new variant, not a re-key (Hyrum's Law: the field name is a contract once shipped).

**Server-side accept config is separate and secret-free** (it validates incoming credentials; it is not a credential) and lives under `serve:`:

```yaml
serve:
  auth:
    bearer: { enabled: true }                                  # tokens via OMNIGRAPH_SERVER_BEARER_TOKEN* env
    oauth:                                  # reserved shape; verifier owned by MR-956 RFC 0001
      issuers:                              # LIST from day one — scalar→list would be a breaking re-key
        - issuer:  https://auth.og.cloud
          audience: og-api
          tenant_claim: org_id              # → ResolvedActor.tenant_id (None in Cluster, Some in Cloud)
          # actor_claim / scope_claim / jwks_* field schema owned by MR-956 RFC 0001
  policy: { file: ./policies/server.yaml }                      # server-level Cedar (management endpoints)
  # bind/workers are 12-factor: --bind today (OMNIGRAPH_BIND is proposed, not yet implemented), never committed here
```

**Reserved for cloud (shape only; see Non-Goals).** Two forward-compat shapes ship in v0.8.x so the multi-tenant Cloud tier is additive, not a breaking re-key: (1) `serve.auth.oauth.issuers` is a **list** carrying `tenant_claim` (→ `ResolvedActor.tenant_id`, already present at `identity.rs:189`) — the verifier and full field schema (`jwks_*`, `clock_skew`, actor/scope claims) are **owned by MR-956 RFC 0001**, which this block is the YAML home for; this RFC reserves only the top-level shape and defers fields there, so there is **one** OIDC schema, not two. (2) `serve.policy` is a **tagged source keyed at the `policy` level** (like `storage:`/`auth:`) — `file` today, `directory`/`manifest` reserved for per-tenant Cedar bundles — so adding variants is additive, with **no `source:` wrapper** (which would be a needless re-key). Both stay parse-but-reject until implemented.

### 7. Credential resolution and connection tiers

**Implicit chain for server `<name>`** (when `auth:` is omitted), keyed by name, reusing the shipped env var:
1. **`OMNIGRAPH_BEARER_TOKEN_<NAME>`** (name-derived, upper-snake), else **`OMNIGRAPH_BEARER_TOKEN`** for the active server — the CI/headless override.
2. **OS keychain** `omnigraph:<name>` — the preferred interactive store; written by `omnigraph login <name>`.
3. **`~/.omnigraph/credentials`** — INI profile keyed by server name (`0600`, git-ignored):
   ```ini
   [prod-us]
   token = …
   [prod-eu]
   token = …
   ```

**Credential trust model (security).** Two rules close the credential-redirection path:
1. *Implicit/ambient credentials apply only to trusted-origin servers.* The implicit chain above (env-by-name, keychain-by-name, profile) is consulted **only when the server's identity — its `endpoint` — came from a trusted layer** (global config, or an explicit operator source). A server whose identity is introduced by a lower-trust (project) layer never auto-consumes an ambient credential: it is **unauthenticated (local-dev) by default**, and authenticated use requires either promoting it to a trusted layer (a global `servers.<name>`) or an operator-supplied credential at invocation — a `--token-from <env|file|command>` flag (operator-trust, not repo-supplied; a future addition, §10). This is what makes env-by-name safe: a raw `OMNIGRAPH_BEARER_TOKEN_<NAME>` carries no issued-for endpoint, so it is trustworthy only when the *name → endpoint* binding it rides on is itself trusted.
2. *login-written credentials additionally bind to their endpoint.* `omnigraph login <server>` records `(name, endpoint)`; at use, the keychain/profile token is released only if the resolved endpoint still matches, erroring otherwise (`server 'prod' resolved to <endpoint>, which does not match the endpoint this credential was issued for`). This catches a trusted server whose endpoint later changes.

Together with the §4 identity rule (a lower-trust layer can neither repoint a trusted server nor carry `servers.<name>.auth`), ambient credentials cannot be redirected to an attacker endpoint.

**Forward-compat (cloud, reserved; see Non-Goals).** Endpoint-binding keys a credential to `(name, endpoint)`, but a multi-org user on **one** cloud endpoint holds many tokens that all bind to that endpoint — so endpoint-binding alone cannot disambiguate them, and the credential identity unit becomes `(server, organization)`. Reserve `omnigraph:<server>[/<org>]` keychain keying and `[<server>/<org>]` profile sections now (additive). The org is server-resolved from the token (never a client-asserted field), so this is a storage-keying concern only.

If `auth:` is set, that source is used (no fallthrough). `omnigraph login <server>` writes/rotates only that server's secret (keychain preferred; OAuth, when implemented (V6), runs the device flow and caches tokens in the keychain → `~/.omnigraph/cache/oauth/`). There is **no `credentials.yaml`** and no inlined secret. *Convention for the floor, explicit for control.*

**Cloud-storage credentials** for embedded `storage:` graphs come from the ambient cloud chain (`AWS_*`, instance roles, `~/.aws/credentials`), optionally narrowed by `storage.profile`/`storage.region`/`storage.endpoint` (§2). OmniGraph never stores object-store secrets.

**Three connection tiers** (the zero-config floor):
1. **Env vars** — `OMNIGRAPH_SERVER=https://…` + token: fileless remote (the `DATABASE_URL` floor; `OMNIGRAPH_SERVER` is new).
2. **Global `config.yaml`** — named `servers:` (+ optional graph aliases) for multi-server setups.
3. **Project `omnigraph.yaml`** — project-pinned graphs/aliases, committed.

### 8. Stored queries (definitions) vs. aliases (invocations)

A stored query and a CLI alias are different concepts; do not collapse them, but do remove their overlap:

- **Definition** (`.gq` source + a `queries:` entry) lives next to the **embedded graph entry that owns it** — for a hosted graph, the deployment manifest read by `omnigraph-server`. It is the capability surface (Cedar-gated when served, MCP-visible when exposed). It never lives on a `Remote` entry.
- **Discovery** ("what can I call?") is fetched from the server (`GET /queries`, Cedar-filtered) at connect time.
- **Invocation** is remote (`POST /queries/{name}`) or embedded (open the graph, read the same manifest).
- **Alias** = a client-side *saved invocation* that **references** a defined query and binds invocation context — it never defines a `.gq`:

```yaml
graphs:
  prod:
    storage: s3://team/prod.omni
    queries:
      find_user: { file: ./queries/find_user.gq, mcp: { expose: true, tool_name: lookup_user } }

aliases:
  owner: { graph: prod, query: find_user, branch: review, format: table, args: [name] }
```

This is the **capability-as-code guarantee for agents**: an agent can only invoke tools the server's committed, reviewed config exposes; it cannot define a new tool at runtime. Making the alias a *reference* (not a second definition site with an inline `.gq` path and an explicit `command`) removes the "alias and query with the same name are different namespaces" footgun and the duplicate-definition drift, while keeping saved-invocation ergonomics. Read vs mutate is inferred from the referenced definition.

### 9. Server-mode disambiguation (the V2 prerequisite)

**What the server serves.** `serve.graphs: [<name>, …]` selects which embedded `graphs:` entries this process serves (default: **all** embedded entries). It subsumes the removed `server.graph` (a one-element list). Mode is derived from the served count: one ⇒ single, many ⇒ multi.

**Canonical wire id.** Every served graph has a canonical `graph_id` — its `serve.graphs` selection name, or `default` for a bare-URI server started with no config. The server **always mounts `/graphs/{graph_id}/…`**. The legacy flat routes (`/query`, `/branches`, …) remain **only when exactly one graph is served**, as a compat alias bound to that graph. `GET /graphs` returns the served set (one entry in single mode — today's single-mode 405 is removed) and stays `graph_list`-gated — so with default-deny on server-scoped actions, single-mode `GET /graphs` returns **403 unless a `serve.policy` authorizes `graph_list`** (405→403, not →200). **Open decision (validated):** the wire `graph_id` (`default` for a bare-URI server) and the Cedar *resource* id (today the normalized URI, `graph_resource_id_for_selection`) differ for anonymous graphs; either accept the split or align the anonymous Cedar id to `default` (a policy-identity break for existing single-graph deployments).

**Client.** The client config is **mode-agnostic**: a `Remote` locator always carries `graph_id`, and the client always builds `/graphs/{graph_id}/…`. It never needs to know a server's deploy mode.

This avoids shipping two URL shapes for the same operation depending on a config mode (a Hyrum's-Law liability) and lets the existing CLI remote paths be rewired once to the prefixed form (and migrated off the deprecated `/read`/`/change`). The fallback, if route unification is deferred, is a cached `GET /graphs` probe in `~/.omnigraph/cache/` (the catalog already returns each `graph_id`); it is strictly worse and not preferred. **V2 is gated on route unification.**

**Forward-compat (cloud, reserved; see Non-Goals).** The unified registry stays keyed by **`GraphKey { tenant_id: Option<TenantId>, graph_id }`** — already shipped in MR-668 (`identity.rs:116`, `tenant_id = None` in Cluster/embedded). Folding `Single`/`Multi` into one registry (V2) must **not** flatten it to `graph_id`-only: Cloud mode sets `tenant_id = Some(...)` from the token's `org_id`, two tenants may each own `production`, and `GET /graphs` becomes tenant-scoped (filtered to the resolved tenant; cross-tenant default-deny). Tenant is resolved from the token, never the path.

### 10. CLI surface

- `omnigraph login <server>` — interactive auth; stores the token in the keychain (`omnigraph:<server>`) or the `[<server>]` profile (`0600`); runs the OAuth device flow for `oauth` servers (V6). The `gh auth login` analog.
- `omnigraph use <graph>` — set the active context; writes `~/.omnigraph/state/active.yaml`. The `kubectl config use-context` analog.
- `omnigraph config view [--resolved] [--show-origin] [<graph>]` — print the merged config and, with `--resolved`, the final locator plus the origin layer of every field.
- `--token-from <env|file|command>` (future) — an operator-supplied one-shot credential, to authenticate against a server whose identity is *not* in a trusted layer (§7). Operator-trust, never repo-supplied.
- All existing verbs gain `--graph <name>` (the shipped flag is `--target`, kept as a deprecated alias); resolution (§1) decides embedded vs remote transparently.

### 11. Init, login, bootstrap — three tiers

| Tier | Command | Scope | What it does | Status |
|---|---|---|---|---|
| **User route** | `omnigraph login [<server>]` | user (`~/.omnigraph/`) | auth + write `config.yaml`/`credentials`; first-run global setup | this RFC (unbuilt) |
| **Thin project init** | `omnigraph init` | project, in-place | create graph + `scaffold_config_if_missing`; refuse-if-exists or `--force` | exists; `--force` purge unbuilt |
| **Fat bootstrap** | `omnigraph quickstart [--template <t>] [--auto]` | project | scaffold + seed + serve + agent prompt file | unbuilt (needs `serve`) |

Design positions: **split `init` (project) from `login` (user)** — never one command writing to both `$HOME` and the project; **`init` is in-place + refuse-if-exists** (cargo/prisma default); **interactive for humans, `--auto`/`OMNIGRAPH_AGENT_MODE` for automation** (any prompt → fail with a repair hint); **templates are a `--template` flag** on the fat tier; **secrets-on-scaffold rule** — anything that writes a token keeps it out of VCS (keychain preferred; `credentials` is `0600` and git-ignored).

## Concrete shape

**Global** `~/.omnigraph/config.yaml` (per-user, secret-free):
```yaml
version: 1
servers:
  prod:  { endpoint: https://og.internal:8080 }        # auth omitted ⇒ implicit chain keyed by name
  cloud:
    endpoint: https://api.og.cloud
    auth: { oauth: { issuer: https://auth.og.cloud, client_id: og-cli, scopes: [graph.read, graph.write] } }  # reserved/future (V6)
graphs:
  personal: { storage: ~/graphs/personal.omni, branch: main }
  review:   { server: cloud, graph_id: production, branch: review }   # optional pinned remote alias
defaults: { server: cloud, graph: personal, output_format: table, actor: ragnor }
aliases:
  people: { graph: personal, query: list_people }
```

**Project** `./omnigraph.yaml` (committed, secret-free, portable — read by CLI *and* server):
```yaml
version: 1
graphs:
  production:                                  # embedded ⇒ served; capability surface lives here
    storage: s3://team-bucket/prod.omni
    policy:  { file: ./policies/prod.yaml }
    queries:
      find_user: { file: ./queries/find_user.gq, mcp: { expose: true, tool_name: lookup_user } }
  staging:                                     # remote ⇒ a target; no policy/queries (server-owned)
    server: prod
    graph_id: prod
    branch: review
defaults: { graph: production, branch: main, output_format: table }
serve:
  graphs: [production]                          # which embedded graphs to serve (default: all)
  auth:   { bearer: { enabled: true } }         # bind via --bind (OMNIGRAPH_BIND proposed; see Rollout)
  policy: { file: ./policies/server.yaml }
```

**Credentials** `~/.omnigraph/credentials` (INI, `0600`, git-ignored — fallback when no keychain):
```ini
[prod]
token = …
```
`omnigraph login prod` writes the keychain entry `omnigraph:prod` (preferred) or this profile; `OMNIGRAPH_BEARER_TOKEN_PROD` overrides for CI. No token fields in any YAML; no committable secrets.

## DX

1. **One command surface, two loci.** `query --graph dev` (embedded) and `--graph staging` (remote) are the same command; only resolution differs.
2. **Point at a server, use it.** A `servers:` entry reaches every graph the server hosts as `server/graph_id` *if you know the id* — no per-graph declaration. (Listing what exists needs the `graph_list` permission, which the server may default-deny.) `omnigraph login <server>` once, then every target resolves.
3. **Multi-server × multi-graph is the default.** `prod-us` and `prod-eu` both serving `production` is two `servers:` entries (or two graph aliases) — Helix cannot express this.
4. **Solo-first.** Everything in `~`, no project required.
5. **Laptop-to-fleet on one schema.** Local = one `omnigraph.yaml` (both roles); prod = role-split across repos. No second format.

## AX (agent experience)

1. **One flat resolved context.** graph→server→endpoint→token resolves before the agent sees anything; `config view --resolved` flattens it. The agent reasons about tools, not topology.
2. **Secrets are outside the repo and trust-gated.** No secret-bearing file in the repo (hard rule 2); tokens live in the keychain / global layer / env, and ambient credentials apply only to trusted-origin servers (§7). A repo-confined agent cannot read a token, and cannot exfiltrate one by repointing or introducing a server — the §7 trust model and §4 identity rule withhold it. See the threat model below for the precise boundary.
3. **Branch/snapshot-pinned contexts** (E4) — hand an agent a `branch: review` / `--snapshot v42` graph and its reads are reproducible and cannot see uncommitted main-line state.
4. **Capabilities are a GitOps'd artifact** (E6) — which graphs exist, which stored-query tools it may call, and which Cedar rules gate them are all in version-controlled config. Powers change only via a reviewed PR + restart.
5. **Config + policy compose.** Config = "where am I pointed + which token"; Cedar = "what may I do there." Orthogonal.

**Threat model & secret boundary.** The agent/repo boundary is a trust boundary, held by three rules: (1) secrets live outside the repo — keychain or `~/.omnigraph/`, never project config or the tree (hard rule 2); (2) a lower-trust layer cannot redefine a server's identity (§4); (3) credentials bind to an endpoint, so a redirected server cannot harvest a token (§7). Caveat — "outside the agent's reach" means the **repo-confined** surface: a shell-capable agent with `$HOME` access can still read `~/.omnigraph/credentials`, so the OS keychain (no plaintext at rest) is the stronger posture and the default `login` target.

## GitOps — three surfaces, secrets in none

| Surface | Repo | Contents | Deploy | Secrets |
|---|---|---|---|---|
| Server deployment config | infra/deploy repo | `graphs:`, policy, `queries:` + `.gq` | commit → CI → **restart** | none — by-reference |
| Project client config | app repo | `graphs:` → embedded storage or remote server+graph | committed, read by CLI/agent | none |
| Global user config | machine-local `~` | `servers:` + creds-by-ref | `omnigraph login` writes it | refs only |

## Comparison

| Property | kubeconfig | Helix | git | compose | **OmniGraph (this RFC)** |
|---|---|---|---|---|---|
| Named remote endpoints + creds-by-ref | ✅ | ✅ | partial | partial | ✅ (global `servers`) |
| Global + project layering, uniform schema | ✗ | ✗ | ✅ | ✗ | ✅ |
| Embedded OR remote under one name | ✗ | ✗ | n/a | ✗ | ✅ (E1) |
| Server self-sufficient (no per-graph declare) | ✅ | ✗ | n/a | n/a | ✅ (E3) |
| Multi-server × multi-graph | ✅ | ✗ | n/a | n/a | ✅ (E2) |
| Branch/snapshot in the address | ✗ | ✗ | partial | ✗ | ✅ (E4) |
| Agent tool surface in the repo | ✗ | ✗ | n/a | n/a | ✅ (E6) |
| Pluggable auth methods (bearer/oauth/mtls) | ✅ (exec) | partial | ✗ | ✗ | ✅ |
| Concept count | 3 | 1 | 2 | 1 | **2 (servers/graphs)** |

## Divergence & single source of truth

The test (engineering integrated over time): does this design *prevent* divergence between the three surfaces — CLI, config, HTTP routes — by construction, or merely reduce today's instances?

**Structurally prevented:**
- **config ↔ CLI** — one noun (`graphs:`/`--graph`); a graph address resolves **once** into a typed `GraphLocator` (§2) that downstream dispatches on, instead of re-sniffing `is_remote_uri` at ~17 sites. A new command receives the resolved locator and cannot re-derive "server or file?" wrong. *Enforcement points:* a shared `GraphArgs` (one flag definition) and routing **every** command through the resolver — the current bare-`resolve_uri` re-sniff sites must be converted, not left.
- **config ↔ HTTP capability surface** — `policy:`/`queries:` live at exactly one site (the owning `Embedded` graph entry), read identically by the embedded CLI and the server; the dual top-level/per-graph reconciliation is deleted.

**Reduced, not prevented — the residual axis:**
- **CLI ↔ HTTP routes.** Route unification (§9) makes the path *shape* uniform, and *body* types are already shared (the CLI imports `api::*` DTOs, so a DTO change breaks CLI compilation — a compile-time guard). But **path strings stay hand-duplicated**: the server declares routes (`.route("/branches", …)`) and the CLI hand-writes the matching strings (`remote_url(&uri, "/branches")`), and the `omnigraph-ts` SDK is generated from a *vendored* `openapi.json` snapshot. So a new endpoint still forks three ways (server route + CLI client call + SDK re-vendor). Unification removes the *mode* divergence (flat vs nested) and the `/read`-vs-`/query` drift — not the structure that generates path divergence.

**The structural move that would close it (recorded, not in scope):** a shared route/operation table (path+method consts) consumed by both the server router and the CLI client, and/or generating the CLI's HTTP client from the same OpenAPI spec the SDK uses (the CLI is the only hand-maintained parallel client). Given ~17 slowly-growing endpoints and compile-shared bodies, this does not block the RFC — but **V2 is the cheap moment to add the shared path constants**, since it touches every path anyway.

**Net liability:** every duplicate-site count goes down (≈17 sniff sites → 1 locator; 2 route shapes → 1; dual policy/queries → 1; per-graph token → per-server; silent-ignore → honored-or-rejected). The added surface (merge+provenance engine, keychain, layered loader) is centralized — lower ongoing liability *provided* every command routes through the single resolver.

## Migration / breaking changes

Gated behind `version:`. `version: 1` is this schema; a missing `version:` is read as legacy (the shipped shape) with deprecation warnings.

**Compat aliases (legacy honored, warned):**
- `--target` flag → `--graph` (deprecated alias).
- `uri:` → `storage.uri`.
- `cli:` block fields → `defaults:`.
- `server:` (self) → `serve:`.
- `auth.env_file` dotenv → honored but warned (secrets-in-repo); removed next major.
- `bearer_token_env:` (legacy graph-local) → see "Renamed / migrated" below.

**Removed (hard errors under `version: 1`):**
- Top-level `policy:` / `queries:` — move to the owning `graphs.<name>` entry.
- `project.name` — no consumer.
- A `Remote` graph entry with local `policy:`/`queries:`; a `serve:` manifest with a `server:` graph locator; an alias with an inline `.gq` path.

**Renamed / migrated:**
- `server.graph` (single-graph selector) → **`serve.graphs: [<name>]`** (a one-element served set; §9). Not a removal — the "define many graphs, serve a subset" capability is preserved.
- **Legacy remote graph + credential mapping.** A legacy remote `{ uri, bearer_token_env }` has *no named server*, and its `uri` may already smuggle the multi-graph hack (`https://host/graphs/{gid}`). Under `version: 1` the migration **strips the trailing `/graphs/{gid}` suffix**: `https://host[/path]/graphs/{gid}` → `endpoint: https://host[/path]` (the full prefix, **including any reverse-proxy path**), `graph_id: gid`; a `uri` with no `/graphs/{gid}` suffix → `endpoint: <uri>`, `graph_id: <graph_name>`. It emits `servers.<name> = { endpoint, auth: { bearer: { token: { env: <VAR> } } } }` (treated as trusted on migrate) and rewrites the graph to `{ server: <name>, graph_id }`. Splitting the `/graphs/{gid}` suffix is required — otherwise V2's always-`/graphs/{id}/…` client would build `https://host/graphs/{gid}/graphs/<name>`. In legacy mode (no `version:`) the graph-local credential keeps working unchanged.

**Posture flips:**
- **Global-first.** The CLI gains a global discovery layer below the project file; existing project-only workflows are unchanged (project still overrides global).
- **Secrets out of the repo.** Project-local `.env.omni` is deprecated; bearer secrets live only in the keychain / `~/.omnigraph/credentials` / env.
- **Auth keyed by server name** (keychain / `[<server>]` profile / `OMNIGRAPH_BEARER_TOKEN_<SERVER>`), with explicit `auth:` sources for control. `OMNIGRAPH_BEARER_TOKEN` (the shipped name) is reused — **no new `OMNIGRAPH_TOKEN`**.

## Open questions

- **Keychain crate + name-derivation.** Keychain is the primary credential store, so it is on the critical path: macOS Keychain first, the `0600` profile file as fallback; Linux Secret Service / `pass` later. Open: which keyring crate, and the exact `OMNIGRAPH_BEARER_TOKEN_<SERVER>` derivation (upper-snake, non-alnum → `_`).
- **OAuth flow specifics (V6, not v1).** Device-authorization vs auth-code+PKCE as the default `login` flow; token-cache location and refresh-failure UX. The enum reserves the shape; implementation is deferred.
- **OIDC ownership / timeline (cloud).** `serve.auth.oauth`'s shape is reserved here; its verifier + field schema are MR-956 RFC 0001's. If Federated Auth lands before V6, server-side OIDC validation ships on its timeline, not this RFC's — the two must converge on one schema (the reserved `issuers:`-list + `tenant_claim`), never a second OIDC config.
- **`storage:` block scope.** How much object-store config to honor per graph (region/endpoint/profile) vs. delegating entirely to the ambient chain. Start minimal.
- **Single-file vs `KUBECONFIG`-style list.** `OMNIGRAPH_CONFIG` single path first; colon-joined list later if demand appears.
- **`config.yaml` vs `omnigraph.yaml` deep convergence.** Out of scope: one registry with embedded + remote invocation surfaces is the long-term end state for `queries:`/`aliases:`.

## Implementation — breadboard + slices

**Bold** = NEW. The new layered-config + resolver + auth code lands in a **new `omnigraph-config` crate** depended on by `omnigraph-cli` and `omnigraph-server`, so neither the CLI nor YAML parsing pulls in the HTTP server stack. **Caveat (validated):** config extraction alone does *not* shed the dependency — the CLI also imports ~20 `omnigraph_server::api::*` wire DTOs (`main.rs:20-27`). Fully realizing "CLI doesn't pull Axum" needs a companion **`omnigraph-api-types`** crate (the DTOs); otherwise the CLI keeps the server dep for DTOs. `QueryRegistry` stays in `omnigraph-server` (it is `omnigraph-compiler`-coupled, `queries.rs:18-22`) — only the serde types move; `PolicyEngine` is already standalone in `omnigraph-policy`.

### Places

| # | Place | What |
|---|---|---|
| P1 | Disk | `~/.omnigraph/{config.yaml, credentials, cache/, state/}` + project `omnigraph.yaml` |
| P2 | Config resolution | every command: load layers → merge → resolve `--graph` → resolve auth |
| P3 | Command execution | embedded engine OR remote HTTP client |
| P4 | Remote `omnigraph-server` | existing HTTP surface (+ route unification, §9) |
| P5 | Scaffold | `login` / `init` / `quickstart` |

### Affordances

| # | Place | Affordance | NEW? | Wires |
|---|---|---|---|---|
| U1 | P1 | `~/.omnigraph/config.yaml` (operator edits) | **N** | → N1 |
| U2 | P1 | project `./omnigraph.yaml` | — | → N1 |
| U4 | P3 | `omnigraph <verb> --graph <name>` (any command) | — | → N14 |
| U5 | P5 | `omnigraph login [<server>]` | **N** | → N11 |
| U6 | P5 | `omnigraph init` / `quickstart [--template]` | partly | → N12/N13 |
| U7 | P2 | `omnigraph use` / `config view --resolved --show-origin` | **N** | → N10 |
| N0 | P2 | **`omnigraph-config` crate** — shared schema, loader, resolver, auth | **N** | hosts N1–N9 |
| N1 | P2 | `load_layered_config()` — global (N3) + state (N3b) + project (cwd), `deny_unknown_fields` | **N** | → N2 |
| N2 | P2 | **merge engine** — deep-merge settings; replace named-resource entries/lists; retain per-field origin | **N⚠️** | → N5, N10 |
| N3 | P2 | global-dir resolver — `OMNIGRAPH_CONFIG` / `OMNIGRAPH_HOME` else `~/.omnigraph/` | **N** | → N1 |
| N3b | P2 | active-context state — `~/.omnigraph/state/active.yaml` | **N** | → N1 |
| N5 | P2 | `resolve_graph(name, merged)` — three-tier (§1) → typed `GraphLocator`; rejects invalid role/field combos | **N⚠️** | → N6 |
| N6 | P3 | `GraphConn` — `Embedded(engine)` \| `Remote(http)` dispatch | **N⚠️** | → N7, N8 |
| N7 | P3 | embedded path — `Omnigraph::open(storage)` (existing) | — | → engine |
| N8 | P3 | HTTP-client path — **rewire existing reqwest calls to `/graphs/{id}/…`; migrate off `/read`,`/change`** | **extend** | → P4, N9 |
| N9 | P2 | `resolve_auth(server)` — method×source (§6): explicit `auth:` else implicit chain keyed by name (reuses `OMNIGRAPH_BEARER_TOKEN`); **enforces the §7 credential trust model (trusted-origin + endpoint-binding) before releasing a token** | **N⚠️** | → N8 |
| N10 | P2 | `config view` handler — merged + per-field origin (needs N2) | **N** | → U7 |
| N11 | P5 | `login` handler — interactive auth (bearer; OAuth device flow in V6) → keychain / `credentials` (0600) + `.gitignore` | **N⚠️** | → S_global |
| N12 | P5 | `init` handler — `scaffold_config_if_missing`; refuse-if-exists / `--force` | partly | → S_project |
| N13 | P5 | `quickstart` handler — scaffold + `--template` + seed + serve + agent prompt | **N⚠️** | → S_project |
| N14 | P3 | agent-mode wrapper — `OMNIGRAPH_AGENT_MODE`: JSON, structured errors, never-prompt, typed exit codes | **N⚠️** | → N1 |
| N15 | P4 | **server route unification** — `serve.graphs` selects served set; canonical `graph_id` per graph; always mount `/graphs/{id}/…`; flat = compat alias only when one graph served; `GET /graphs` lists served set | **N⚠️** | → P4 |

### Slices (vertical, each demo-able)

| # | Slice | Demo |
|---|---|---|
| **V0** | **Foundations (no behavior change)** | extract `omnigraph-config` (+ `omnigraph-api-types`); add `version:` + `deny_unknown_fields`; build the layered-config fixture harness + keychain `SecretStore` seam; relocate the 11 `config.rs` tests. `cargo test --workspace` green, no functional change. |
| **V1** | **Global layer + merge + `config view`** | Config in `~/.omnigraph/`; `config view --resolved --show-origin` from any dir → merged result with per-field origin; embedded commands work global-first with no project file |
| **V2** | **Typed locator + route unification + remote client** | Define a `server:` graph (or `server/graph_id`); `query --graph prod` hits the server `curl`-free against `/graphs/{id}/…`; embedded `--graph dev` still local. *Gated on N15.* |
| **V3** | **Auth model + `login` + credential trust model** | `omnigraph login prod` (bearer) → keychain; per-server resolution with the §7 trust model (trusted-origin + endpoint-binding) + the §4 identity rule (the security model); V2 works with no manual env |
| **V4** | **Thin-init hardening + quickstart + templates** | `quickstart --template person-knows` scaffolds + seeds + serves; `init --force` purges |
| **V5** | **Agent-mode** | `OMNIGRAPH_AGENT_MODE=1 omnigraph query …` → JSON + structured errors + typed exit codes; never-prompt |
| **V6** | **OAuth / mTLS (reserved methods)** | implement the reserved `oauth` (device flow, token cache, refresh, OIDC server-side validation) and `mtls`; the enum shape ships in V3, so this is additive |

### Phase detail (sizing, gates, exit)

Sizes from the 2026-06-02 code audit (six parallel validators). **V0** is a prerequisite the original slices folded into "land first."

**V0 — Foundations** *(M–L; gates everything; no behavior change)*
- Extract `omnigraph-config` (schema + `load_config` + resolvers — clean, only std/serde/clap deps). Keep `QueryRegistry` in `omnigraph-server` (compiler-coupled); move only serde types; import `PolicyEngine` from `omnigraph-policy` directly. Decide/extract `omnigraph-api-types` (the `api::*` DTOs) to actually shed the CLI's server dep.
- `version:` + `deny_unknown_fields`, version-gated (no-version = legacy-lenient with compat aliases; `version: 1` = strict).
- Build the two missing test seams — a layered-config fixture harness (`TempHome` + `OMNIGRAPH_HOME`/XDG env isolation) and a keychain `SecretStore` trait + in-memory fake; relocate the 11 `config.rs` tests (`config.rs:567-948`). Record both in `testing.md`.
- Exit: `cargo test --workspace --locked` green; no functional change.

**V1 — Layered config + typed locator** *(L; the long pole)*
- N3 global-dir resolver; N1 layered load; **N2 merge engine + per-field provenance** (replaces the single `base_dir` — the hardest net-new piece; it gates both `config view` *and* the §7 trusted-origin rule); N3b active-context state + `omnigraph use`.
- Typed `GraphLocator` + `resolve_graph` (§1); rewrite the ~17 dispatch sites; delete `is_remote_uri` (`main.rs:686`).
- Schema reshape: `cli:`→`defaults:`, `server:`→`serve:`, `uri:`→`storage:` (string-or-block; region/endpoint, **profile scoped out**), remove top-level `policy:`/`queries:` (delete the coherence machinery `config.rs:356-421`), drop `project.name`. Fix `resolve_policy_tooling_graph_selection`.
- `--graph` canonical + `--target` alias (extract a shared `GraphArgs` first — the flag is duplicated 23×); `config view --resolved --show-origin`; migrate `scaffold_config_if_missing` (`main.rs:1547`) to `version: 1`.
- Exit: CLI works global-first with no project file; embedded behavior unchanged.

**V2 — Route unification + remote client** *(L; closes the substantive gap; gated on V0 server-side, V1 client-side)*
- Server: add `serve.graphs`; **unwind the `Single`/`Multi` bifurcation** (`GraphRouting`/`ServerConfigMode` + ~4 branch sites) into one registry; always `.nest("/graphs/{graph_id}",…)` (`lib.rs:1170-1175`); flat = compat alias when one graph served; `GET /graphs` served set (403-by-default without `serve.policy`); resolve the wire-vs-Cedar `graph_id` decision (§9).
- Client (N8): `remote_url` takes `graph_id` → `/graphs/{id}/…`; `/read`→`/query`, `/change`→`/mutate` (drop `legacy_change_request_body`); locator guards for `load`/`lint`/`schema plan`/`optimize`/`cleanup`.
- Engine: thread `storage.region`/`endpoint` → `Omnigraph::open` → `namespace.rs:228,376` + `S3StorageAdapter` (`storage.rs:284`).
- OpenAPI/SDK: regen `openapi.json` (`OMNIGRAPH_UPDATE_OPENAPI=1`), rewrite the exact allow-lists (`openapi.rs:162,1120`), re-vendor `omnigraph-ts` (its `transport.ts` is already prefixed — runtime aligns, op-id names churn).
- Tests: **make `system_remote.rs` hermetic** (it is entirely `#[ignore]`'d today — the central gap-closer has zero enforced coverage); route-mode matrix; legacy `/graphs/{gid}` URI-split migration.

**V3 — Credential trust model + login** *(L; the security phase; needs V1 provenance)*
- `servers:` + `Auth` union (`bearer`/`none` impl; `oauth`/`mtls` reserved-error) × `SecretSource`; `resolve_auth` keyed by server name (`rust-ini` reusable from the lock tree); **trusted-origin rule** (unblocked by V1 provenance) + **endpoint-binding**; reject project-layer `servers.auth`/`command`. `omnigraph login` (bearer → keychain via `keyring` 4.0.1, feature-gated, headless graceful-degrade — **check MSRV 1.88** against the toolchain); `serve.auth.bearer.enabled`; `OMNIGRAPH_SERVER` env floor.

**V4 — Init/quickstart** *(S–M)* — `quickstart --template`, `init --force`. **V5 — Agent-mode** *(S–M)* — `OMNIGRAPH_AGENT_MODE`. **V6 — OAuth/mTLS** *(L; deferred)* — client `oauth2`/`openidconnect` + device flow + token cache/refresh; server OIDC/JWKS via `jsonwebtoken` (already in the lock tree); `AuthSource::Oidc` is already reserved (`identity.rs:163`).

### Critical path & parallelization

```
V0 (crate + api-types + version gate + test seams)
        ├──────────────► V2-server (serve.graphs + route unwind)   ← needs only serve.graphs; develop alongside V1
        │                         │
V1 (N2 provenance + typed locator + schema reshape + config view + --graph)
        │                         │
        ├────► V2-client (remote rewire) ── gated on V2-server ────┘
        ├────► V3 (auth union + trusted-origin[needs N2] + login + keychain)
        └────► V4, V5 (ride V1)          V6 (rides V3; large, independent)
```

Long poles: **N2 merge+provenance**, the **typed-locator rewrite**, the **server Single/Multi unwind**. Startable early in parallel: V2-server (server-only), the `storage:` engine threading, and the mechanical `--graph` rename.

### Validation findings (2026-06-02 code audit)

Six parallel validators confirmed the RFC's code claims and surfaced these plan-shaping facts (folded into the phases above):
1. Config extraction alone does not shed Axum from the CLI — it also imports `api::*` DTOs → V0 adds `omnigraph-api-types`.
2. N2 merge+provenance gates both `config view` and the trusted-origin rule → it is the V1 linchpin; the auth trust model cannot precede config layering.
3. Route unification is not green-field — it unwinds the deliberate `Single`/`Multi` split and forces an `openapi.json` regen + `omnigraph-ts` re-vendor (SDK runtime already prefixed; op-ids churn).
4. `storage.profile` is env-only in Lance and omnigraph → scoped out of v1; `region`/`endpoint` are feasible now (Lance accepts per-dataset `storage_options`).
5. `system_remote.rs` is entirely `#[ignore]`'d → V2 must make it hermetic or rewrites land green-then-break.
6. Two test seams (layered-config fixtures, keychain) are missing and on the critical path → built in V0.

## Rollout

**V0 → V1 → V2 → V3 → V4 → V5 → V6.** V0–V1 are the foundation; V2 closes the substantive client→server gap (gated on server route unification, N15); **V3 lands the auth model and the credential-redirection security fix (a gate, not optional polish)**; V4–V5 are ergonomics; V6 implements the reserved auth methods. (`OMNIGRAPH_BIND` is a small additive server task — the binary honors `--bind`/`server.bind` only, `lib.rs:899` — not a prerequisite.) Evaluate after V2 against early-adopter and agent-onboarding signal.

## Prior art

- kubeconfig (clusters / users / contexts; `KUBECONFIG`; `kubectl config view`; `current-context`)
- Helix CLI v2 (`helix.toml` local+enterprise blocks; `~/.helix/config`; `~/.helix/credentials`)
- AWS CLI (`~/.aws/config` + `~/.aws/credentials` split; named profiles; `credential_process`)
- gh / kubelogin (OAuth device flow; keychain token storage)
- git (`~/.gitconfig` + `.git/config`; `--show-origin`)
- Cargo (`Cargo.toml` manifest + `~/.cargo/config.toml` + `~/.cargo/credentials.toml`)
- Supabase / Prisma (one project manifest; connection via `DATABASE_URL` env)
- 12-factor app (config that varies by deploy lives in the environment)