From 320311e75958569aba00b136647ab5f58cb7472a Mon Sep 17 00:00:00 2001
From: aaltshuler
Date: Thu, 11 Jun 2026 19:33:19 +0300
Subject: [PATCH] =?UTF-8?q?docs(rfc):=20RFC-008=20=E2=80=94=20deprecate=20?=
 =?UTF-8?q?omnigraph.yaml,=20one=20concern=20per=20config=20surface?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The file is three unrelated concerns wearing one filename — server
deployment config, project/CLI conveniences, operator identity — and the
mixture is the root cause of a recurring problem class (per-operator
copies of project files, checkout-supplied credential redirection, init
scaffold pollution). End state: two single-owner surfaces — cluster
config (team, repo) and operator config (person, $HOME) — plus the
zero-config flags/env tier. Complete key-by-key migration map over the
verified OmnigraphConfig surface; staged retirement per the repo's Hyrum
rules (warn with per-key guidance -> `config migrate` tool -> stop
scaffolding -> opt-in strict -> removal at the next major). RFC-007's
project-layer framing is amended to transitional accordingly.

Co-Authored-By: Claude Fable 5
---
 docs/dev/index.md                            |   1 +
 docs/dev/rfc-007-operator-config.md          |  12 +-
 docs/dev/rfc-008-deprecate-omnigraph-yaml.md | 174 +++++++++++++++++++
 3 files changed, 186 insertions(+), 1 deletion(-)
 create mode 100644 docs/dev/rfc-008-deprecate-omnigraph-yaml.md

diff --git a/docs/dev/index.md b/docs/dev/index.md
index 7e50777..b23326b 100644
--- a/docs/dev/index.md
+++ b/docs/dev/index.md
@@ -77,6 +77,7 @@ Working documents for in-flight feature work. Removed when the work lands.
 | Cluster graph & schema apply — Phase 4 sidecars, roll-forward recovery, approval artifacts | [rfc-004-cluster-graph-schema-apply.md](rfc-004-cluster-graph-schema-apply.md) |
 | Server boots from cluster state — Phase 5 mode switch, applied-revision serving | [rfc-005-server-cluster-boot.md](rfc-005-server-cluster-boot.md) |
 | Per-operator config — `~/.omnigraph/` identity, keyed credentials, named servers (the operator slice of RFC-002) | [rfc-007-operator-config.md](rfc-007-operator-config.md) |
+| Deprecate `omnigraph.yaml` — one concern per config surface; key-by-key migration map and staged retirement | [rfc-008-deprecate-omnigraph-yaml.md](rfc-008-deprecate-omnigraph-yaml.md) |
 
 ## Boundary
 
diff --git a/docs/dev/rfc-007-operator-config.md b/docs/dev/rfc-007-operator-config.md
index 5c6b6b2..d2d9724 100644
--- a/docs/dev/rfc-007-operator-config.md
+++ b/docs/dev/rfc-007-operator-config.md
@@ -246,7 +246,17 @@ Three PRs, each independently useful, each landable without the next:
   version) — slice 1 or slice 2? It materially helps debugging precedence,
   which argues early.
 
-## Relationship to RFC-002
+## Relationship to RFC-002 and RFC-008
+
+**RFC-008 supersedes this RFC's "project layer" framing**: with
+`omnigraph.yaml` deprecated
+([rfc-008-deprecate-omnigraph-yaml.md](rfc-008-deprecate-omnigraph-yaml.md)),
+the project layer *is* the cluster checkout. References to project
+`omnigraph.yaml` in §D3/§D5 describe the transitional window only; the
+trust-boundary rules apply unchanged to whatever the project layer is at a
+given stage. Sequencing couples them: RFC-007 PRs 1–2 must land before
+RFC-008's migration stages can begin (the operator layer is what keys
+migrate *to*).
 
 RFC-002 remains the umbrella architecture.
 This RFC implements its §2 (layered config, global-first), §4 (file naming / one dir), and §5
diff --git a/docs/dev/rfc-008-deprecate-omnigraph-yaml.md b/docs/dev/rfc-008-deprecate-omnigraph-yaml.md
new file mode 100644
index 0000000..49e2c4b
--- /dev/null
+++ b/docs/dev/rfc-008-deprecate-omnigraph-yaml.md
@@ -0,0 +1,174 @@
+# RFC: Deprecate `omnigraph.yaml` — One Concern per Config Surface
+
+**Status:** Proposed
+**Date:** 2026-06-11
+**Builds on:** [rfc-007-operator-config.md](rfc-007-operator-config.md) (the
+operator layer that absorbs the identity/credential keys),
+[rfc-005-server-cluster-boot.md](rfc-005-server-cluster-boot.md) (Landed —
+cluster-booted serving), RFC-006 storage roots (landed: #186/#190/#194).
+**Supersedes in part:** RFC-007's "project layer" framing (§Relationship
+below) and [rfc-002-config-cli-architecture.md](rfc-002-config-cli-architecture.md)'s
+assumption that `omnigraph.yaml` remains the project manifest.
+**Target release:** staged; final removal at the next major (see Deprecation mechanics).
+
+## Summary
+
+Retire `omnigraph.yaml`. It is three unrelated concerns wearing one
+filename — server deployment config, project/CLI conveniences, and operator
+identity — and the mixture is not a cosmetic wart but the root cause of a
+recurring class of problems: operators keeping personal copies of "project"
+files, repo checkouts able to carry credential-adjacent keys (the #139
+security findings), `omnigraph init` scaffolding config into unrelated
+directories, and every config discussion needing a paragraph to establish
+which of the three files is meant.
+
+The end state is **two config surfaces with single owners**:
+
+| Surface | Owner | Declares |
+|---|---|---|
+| **Cluster config** (`cluster.yaml` + catalog) | the team, in a repo | what the system *is*: graphs, schemas, queries, policies, storage |
+| **Operator config** (`~/.omnigraph/`) | one person, in `$HOME` | who *I* am: identity, credentials, known servers, ergonomics |
+
+plus **flags/env** for the zero-config tier (one graph, one server, no
+control plane) — which already works today with no file at all.
+
+`omnigraph.yaml` has no role left once every key has a better home. This
+RFC gives each key that home, and stages the retirement so that no working
+setup breaks without a loud warning, a migration command, and a full
+deprecation cycle first.
+
+## Motivation
+
+- **It breaks the ownership logic.** A config file must have one owner. A
+  file that carries `graphs:` (team-owned, reviewable) next to `cli.actor`
+  (one person's identity) and `auth.env_file` (credential loading) can be
+  neither safely committed nor sensibly personal. Every real deployment
+  this cycle tripped on it: per-operator copies in `~/exp/intel`,
+  graph-scoped alias URIs that only make sense per-person, the #139
+  findings where a checkout-supplied file could redirect tokens.
+- **The cluster made it redundant.** Since RFC-005/006, a cluster
+  deployment serves from the applied catalog — `--cluster` mode does not
+  read `omnigraph.yaml` *at all*. Stored queries, policies, bindings, and
+  graph addressing all have authoritative homes. What remains in
+  `omnigraph.yaml` for cluster users is dead weight that can silently
+  disagree with what is actually serving.
+- **Two declarative dialects is one too many.** `cluster.yaml` and
+  `omnigraph.yaml` both declare graphs/queries/policies with different
+  schemas, different validation strictness, and different lifecycle
+  guarantees. Maintaining, documenting, and testing both — and explaining
+  when each applies — is a permanent tax (the "programming integrated over
+  time" lens says: this forks on every config-surface change).
+
+## Non-Goals
+
+- **Breaking anyone now.** Every `omnigraph.yaml` that works today keeps
+  working through the entire deprecation window, with warnings.
+- **Retiring the zero-config tier.** `omnigraph-server s3://bucket/g.omni
+  --bind …` plus env vars stays first-class forever — that tier needs *no*
+  file, which is the point.
+- **Forcing the control plane on single-graph users.** The migration target
+  for a multi-graph yaml deployment is a *minimal* cluster (file-rooted,
+  no bucket required, `cluster.yaml` barely longer than the `graphs:` map
+  it replaces) — but a single graph never needs even that.
+- **Touching `cluster.yaml`** — its schema and strictness are unchanged.
+
+## Where every key goes (the complete migration map)
+
+The full `OmnigraphConfig` surface (verified against
+`crates/omnigraph-server/src/config.rs:182-207`):
+
+| `omnigraph.yaml` key | Concern | New home |
+|---|---|---|
+| `graphs..uri` | what exists / where | `cluster.yaml` `graphs:` (storage-root-derived) — or a flag/env for the zero-config tier |
+| `graphs..queries`, top-level `queries:` | what exists | cluster catalog (`.gq` discovery, RFC-004/#183) |
+| `graphs..policy.file`, top-level `policy.file`, `server.policy.file` | what's enforced | `cluster.yaml` `policies:` + `applies_to` bindings |
+| `server.bind` | deployment runtime | `--bind` / env (already authoritative; the key is a default) |
+| `server.graph` | deployment runtime | `--target`-style flag / env in the zero-config tier; meaningless under cluster boot |
+| `graphs..bearer_token_env`, `auth.env_file` | credentials | operator credentials chain (RFC-007 §D4) |
+| `cli.actor` | identity | `operator.actor` (RFC-007 §D3) |
+| `cli.output_format`, `cli.table_*` | personal ergonomics | `defaults:` in operator config (RFC-007 §D2) |
+| `cli.graph`, `cli.branch` | personal targeting | operator config: named servers + a per-operator default target (RFC-007 PR 3) |
+| `aliases.` | personal ergonomics over shared queries | operator config `aliases:` — the *queries* they invoke are cluster-owned; the *shorthand* is personal |
+| `query.roots` | discovery convenience | obsolete — cluster query discovery (#183) replaced it |
+| `project.name` | label | dropped (the cluster's `metadata.name` is the deployment label) |
+
+Two placements worth defending:
+
+- **Aliases are operator config, not cluster config.** The stored query is
+  the shared contract (catalog-owned, digest-pinned); an alias is one
+  person's shorthand with their favorite default params and target. Putting
+  aliases in the cluster would force team review on personal ergonomics;
+  leaving them per-directory recreates today's problem. Per-operator,
+  keyed by server/graph name, is the AWS-profile shape.
+- **Multi-graph serving without a control plane migrates to a minimal
+  cluster, not to a new file.** The honest cost: `cluster import` + `apply`
+  once, on a `file://` root next to the graphs. The honest benefit: one
+  declarative dialect, one validation path, one serving source — and the
+  upgrade path to buckets/approvals is a one-line `storage:` change instead
+  of a re-platform.
+
+## Deprecation mechanics
+
+Per Hyrum's Law (the repo's own deny-list: shipped observable behavior is
+contract), retirement is staged, loud, and tooled:
+
+1. **Warn.** Loading `omnigraph.yaml` emits a one-line deprecation notice
+   naming the replacement for each key actually present in the file (not a
+   generic banner — the migration map above, applied to *your* file).
+   Suppressible per-process (`OMNIGRAPH_SUPPRESS_YAML_DEPRECATION=1`) for
+   CI logs during the window.
+2. **Migrate.** `omnigraph config migrate` reads an existing
+   `omnigraph.yaml` and writes the split: the team half as a ready-to-review
+   `cluster.yaml` (+ moves query/policy files into the checkout layout),
+   the personal half merged into `~/.omnigraph/config.yaml` — printing a
+   diff-style summary and touching nothing without `--write`. The command
+   is the test of the migration map's completeness: any key it cannot
+   place is a bug in this RFC.
+3. **Stop scaffolding.** `omnigraph init` stops generating
+   `omnigraph.yaml` (it currently scaffolds one into cwd — the source of
+   the test-pollution bug). `omnigraph cluster init` (new, small) scaffolds
+   a minimal `cluster.yaml` instead.
+4. **Opt-in strict.** `OMNIGRAPH_NO_LEGACY_CONFIG=1` turns the warning into
+   an error — for teams that finished migrating and want regressions caught.
+5. **Remove at the next major.** Loading the file becomes an error pointing
+   at `config migrate`. The `OmnigraphConfig` code path, the dual
+   query-registry loaders, and the yaml-mode server boot source are deleted
+   — the payoff that makes the whole exercise worth it.
+
+Stages 1–3 can land in one release once RFC-007 PRs 1–2 exist (the operator
+layer must exist before anything can migrate *to* it). Stage 4 the release
+after. Stage 5 at the major, with the removal listed in release notes from
+stage 1 onward.
+
+## What this deletes, eventually
+
+- The `OmnigraphConfig` struct and its 12-key surface, the
+  `load_config`/`load_cli_config` pair and its env-side-effect, the
+  scaffolder, and the legacy resolution paths (`resolve_cli_graph`'s dual
+  modes — finding #11's root cause).
+- The yaml-mode multi-graph server boot (`ServerConfigMode::Multi` keeps
+  existing — cluster boot constructs it — but its `omnigraph.yaml` source
+  goes).
+- An entire class of documentation ("which file does X go in?") and the
+  #139 security surface (a checkout cannot hijack what no longer loads).
+
+## Relationship to RFC-007 and RFC-002
+
+RFC-007 ships the operator layer this RFC migrates *to*; its "project
+layer" language should be read as transitional — after this RFC, the
+project layer **is** the cluster checkout, and RFC-007's PR 3 (project
+`server:` references) applies to `cluster.yaml`-adjacent operator targeting
+rather than to `omnigraph.yaml`. RFC-002's locator/state-layer work, if
+resumed, targets the two-surface world directly. RFC-002's file-naming
+decisions (`~/.omnigraph/` as the one dir) are unaffected.
+
+## Open questions
+
+- **Window length**: one minor release between warn (stage 1) and strict
+  (stage 4), or two? Cookbooks, skills, and the deployment docs all need
+  the same pass; the migration command makes a short window defensible.
+- **`omnigraph login` vs `config migrate` ordering** — both write
+  `~/.omnigraph/`; whichever lands first establishes the file-locking and
+  atomic-write helpers the other reuses.
+- **Does the MCP server config** (RFC-003) reference `omnigraph.yaml`
+  anywhere that needs the same treatment? To be audited in stage 1.