omnigraph/docs/dev/rfc-008-deprecate-omnigraph-yaml.md
aaltshuler 320311e759 docs(rfc): RFC-008 — deprecate omnigraph.yaml, one concern per config surface
The file is three unrelated concerns wearing one filename — server
deployment config, project/CLI conveniences, operator identity — and the
mixture is the root cause of a recurring problem class (per-operator
copies of project files, checkout-supplied credential redirection, init
scaffold pollution). End state: two single-owner surfaces — cluster
config (team, repo) and operator config (person, $HOME) — plus the
zero-config flags/env tier.

Complete key-by-key migration map over the verified OmnigraphConfig
surface; staged retirement per the repo's Hyrum rules (warn with per-key
guidance -> `config migrate` tool -> stop scaffolding -> opt-in strict ->
removal at the next major). RFC-007's project-layer framing is amended to
transitional accordingly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 19:33:19 +03:00

174 lines
9.8 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# RFC: Deprecate `omnigraph.yaml` — One Concern per Config Surface
**Status:** Proposed
**Date:** 2026-06-11
**Builds on:** [rfc-007-operator-config.md](rfc-007-operator-config.md) (the
operator layer that absorbs the identity/credential keys),
[rfc-005-server-cluster-boot.md](rfc-005-server-cluster-boot.md) (Landed —
cluster-booted serving), RFC-006 storage roots (landed: #186/#190/#194).
**Supersedes in part:** RFC-007's "project layer" framing (§Relationship
below) and [rfc-002-config-cli-architecture.md](rfc-002-config-cli-architecture.md)'s
assumption that `omnigraph.yaml` remains the project manifest.
**Target release:** staged; final removal at the next major (see Sequencing).
## Summary
Retire `omnigraph.yaml`. It is three unrelated concerns wearing one
filename — server deployment config, project/CLI conveniences, and operator
identity — and the mixture is not a cosmetic wart but the root cause of a
recurring class of problems: operators keeping personal copies of "project"
files, repo checkouts able to carry credential-adjacent keys (the #139
security findings), `omnigraph init` scaffolding config into unrelated
directories, and every config discussion needing a paragraph to establish
which of the three files is meant.
The end state is **two config surfaces with single owners**:
| Surface | Owner | Declares |
|---|---|---|
| **Cluster config** (`cluster.yaml` + catalog) | the team, in a repo | what the system *is*: graphs, schemas, queries, policies, storage |
| **Operator config** (`~/.omnigraph/`) | one person, in `$HOME` | who *I* am: identity, credentials, known servers, ergonomics |
plus **flags/env** for the zero-config tier (one graph, one server, no
control plane) — which already works today with no file at all.
`omnigraph.yaml` has no role left once every key has a better home. This
RFC gives each key that home, and stages the retirement so that no working
setup breaks without a loud warning, a migration command, and a full
deprecation cycle first.
## Motivation
- **It breaks the ownership logic.** A config file must have one owner. A
file that carries `graphs:` (team-owned, reviewable) next to `cli.actor`
(one person's identity) and `auth.env_file` (credential loading) can be
neither safely committed nor sensibly personal. Every real deployment
this cycle tripped on it: per-operator copies in `~/exp/intel`,
graph-scoped alias URIs that only make sense per-person, the #139
findings where a checkout-supplied file could redirect tokens.
- **The cluster made it redundant.** Since RFC-005/006, a cluster
deployment serves from the applied catalog — `--cluster` mode does not
read `omnigraph.yaml` *at all*. Stored queries, policies, bindings, and
graph addressing all have authoritative homes. What remains in
`omnigraph.yaml` for cluster users is dead weight that can silently
disagree with what is actually serving.
- **Two declarative dialects is one too many.** `cluster.yaml` and
`omnigraph.yaml` both declare graphs/queries/policies with different
schemas, different validation strictness, and different lifecycle
guarantees. Maintaining, documenting, and testing both — and explaining
when each applies — is a permanent tax (the "programming integrated over
time" lens says: this forks on every config-surface change).
## Non-Goals
- **Breaking anyone now.** Every `omnigraph.yaml` that works today keeps
working through the entire deprecation window, with warnings.
- **Retiring the zero-config tier.** `omnigraph-server s3://bucket/g.omni
--bind …` plus env vars stays first-class forever — that tier needs *no*
file, which is the point.
- **Forcing the control plane on single-graph users.** The migration target
for a multi-graph yaml deployment is a *minimal* cluster (file-rooted,
no bucket required, `cluster.yaml` barely longer than the `graphs:` map
it replaces) — but a single graph never needs even that.
- **Touching `cluster.yaml`** — its schema and strictness are unchanged.
## Where every key goes (the complete migration map)
The full `OmnigraphConfig` surface (verified against
`crates/omnigraph-server/src/config.rs:182-207`):
| `omnigraph.yaml` key | Concern | New home |
|---|---|---|
| `graphs.<name>.uri` | what exists / where | `cluster.yaml` `graphs:` (storage-root-derived) — or a flag/env for the zero-config tier |
| `graphs.<name>.queries`, top-level `queries:` | what exists | cluster catalog (`.gq` discovery, RFC-004/#183) |
| `graphs.<name>.policy.file`, top-level `policy.file`, `server.policy.file` | what's enforced | `cluster.yaml` `policies:` + `applies_to` bindings |
| `server.bind` | deployment runtime | `--bind` / env (already authoritative; the key is a default) |
| `server.graph` | deployment runtime | `--target`-style flag / env in the zero-config tier; meaningless under cluster boot |
| `graphs.<name>.bearer_token_env`, `auth.env_file` | credentials | operator credentials chain (RFC-007 §D4) |
| `cli.actor` | identity | `operator.actor` (RFC-007 §D3) |
| `cli.output_format`, `cli.table_*` | personal ergonomics | `defaults:` in operator config (RFC-007 §D2) |
| `cli.graph`, `cli.branch` | personal targeting | operator config: named servers + a per-operator default target (RFC-007 PR 3) |
| `aliases.<name>` | personal ergonomics over shared queries | operator config `aliases:` — the *queries* they invoke are cluster-owned; the *shorthand* is personal |
| `query.roots` | discovery convenience | obsolete — cluster query discovery (#183) replaced it |
| `project.name` | label | dropped (the cluster's `metadata.name` is the deployment label) |
Two placements worth defending:
- **Aliases are operator config, not cluster config.** The stored query is
the shared contract (catalog-owned, digest-pinned); an alias is one
person's shorthand with their favorite default params and target. Putting
aliases in the cluster would force team review on personal ergonomics;
leaving them per-directory recreates today's problem. Per-operator,
keyed by server/graph name, is the AWS-profile shape.
- **Multi-graph serving without a control plane migrates to a minimal
cluster, not to a new file.** The honest cost: `cluster import` + `apply`
once, on a `file://` root next to the graphs. The honest benefit: one
declarative dialect, one validation path, one serving source — and the
upgrade path to buckets/approvals is a one-line `storage:` change instead
of a re-platform.
## Deprecation mechanics
Per Hyrum's Law (the repo's own deny-list: shipped observable behavior is
contract), retirement is staged, loud, and tooled:
1. **Warn.** Loading `omnigraph.yaml` emits a one-line deprecation notice
naming the replacement for each key actually present in the file (not a
generic banner — the migration map above, applied to *your* file).
Suppressible per-process (`OMNIGRAPH_SUPPRESS_YAML_DEPRECATION=1`) for
CI logs during the window.
2. **Migrate.** `omnigraph config migrate` reads an existing
`omnigraph.yaml` and writes the split: the team half as a ready-to-review
`cluster.yaml` (+ moves query/policy files into the checkout layout),
the personal half merged into `~/.omnigraph/config.yaml` — printing a
diff-style summary and touching nothing without `--write`. The command
is the test of the migration map's completeness: any key it cannot
place is a bug in this RFC.
3. **Stop scaffolding.** `omnigraph init` stops generating
`omnigraph.yaml` (it currently scaffolds one into cwd — the source of
the test-pollution bug). `omnigraph cluster init` (new, small) scaffolds
a minimal `cluster.yaml` instead.
4. **Opt-in strict.** `OMNIGRAPH_NO_LEGACY_CONFIG=1` turns the warning into
an error — for teams that finished migrating and want regressions caught.
5. **Remove at the next major.** Loading the file becomes an error pointing
at `config migrate`. The `OmnigraphConfig` code path, the dual
query-registry loaders, and the yaml-mode server boot source are deleted
— the payoff that makes the whole exercise worth it.
Stages 13 can land in one release once RFC-007 PRs 12 exist (the operator
layer must exist before anything can migrate *to* it). Stage 4 the release
after. Stage 5 at the major, with the removal listed in release notes from
stage 1 onward.
## What this deletes, eventually
- The `OmnigraphConfig` struct and its 12-key surface, the
`load_config`/`load_cli_config` pair and its env-side-effect, the
scaffolder, and the legacy resolution paths (`resolve_cli_graph`'s dual
modes — finding #11's root cause).
- The yaml-mode multi-graph server boot (`ServerConfigMode::Multi` keeps
existing — cluster boot constructs it — but its `omnigraph.yaml` source
goes).
- An entire class of documentation ("which file does X go in?") and the
#139 security surface (a checkout cannot hijack what no longer loads).
## Relationship to RFC-007 and RFC-002
RFC-007 ships the operator layer this RFC migrates *to*; its "project
layer" language should be read as transitional — after this RFC, the
project layer **is** the cluster checkout, and RFC-007's PR 3 (project
`server:` references) applies to `cluster.yaml`-adjacent operator targeting
rather than to `omnigraph.yaml`. RFC-002's locator/state-layer work, if
resumed, targets the two-surface world directly. RFC-002's file-naming
decisions (`~/.omnigraph/` as the one dir) are unaffected.
## Open questions
- **Window length**: one minor release between warn (stage 1) and strict
(stage 4), or two? Cookbooks, skills, and the deployment docs all need
the same pass; the migration command makes a short window defensible.
- **`omnigraph login` vs `config migrate` ordering** — both write
`~/.omnigraph/`; whichever lands first establishes the file-locking and
atomic-write helpers the other reuses.
- **Does the MCP server config** (RFC-003) reference `omnigraph.yaml`
anywhere that needs the same treatment? To be audited in stage 1.