# Cluster Config

> New to the cluster tooling? Start with the operator how-to guide,
> [cluster.md](index.md); this document is the reference.

Cluster config is the future control-plane configuration surface for a whole
OmniGraph deployment. At this stage, OmniGraph can validate a local
`cluster.yaml` folder, produce a deterministic read-only plan, inspect the
local JSON state ledger, explicitly refresh/import graph observations into
that ledger, manually remove a held local state lock by exact lock id, and
**apply the executable subset of the plan**: stored-query and policy-bundle
catalog writes, **graph creation** (a declared graph that does not exist yet
is initialized by apply at the derived root), **schema updates** (soft drops
only), and, behind an explicit, digest-bound **approval**, **graph
deletion**. It does not perform data-loss schema migrations, start servers,
or run data loads. A server can boot from the applied ledger with
`omnigraph-server --cluster <config-dir | storage-root>`.

## Commands

```bash
omnigraph cluster validate --config company-brain
omnigraph cluster plan --config company-brain --json
omnigraph cluster apply --config company-brain --json
omnigraph cluster approve graph.<id> --config company-brain --as <actor>
omnigraph cluster status --config company-brain --json
omnigraph cluster refresh --config company-brain --json
omnigraph cluster import --config company-brain --json
omnigraph cluster force-unlock <LOCK_ID> --config company-brain --json
```

`--config` points at a directory, not a file. The directory must contain
`cluster.yaml`. When omitted, it defaults to the current directory.

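As a minimal sketch of that rule, the snippet below pre-checks that a directory contains `cluster.yaml` before it is handed to `--config`. The `cluster_dir_ok` helper and the `company-brain` folder name are illustrative and not part of the CLI; the `omnigraph` calls are left as comments so the snippet runs without the binary installed.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of omnigraph): succeeds only when
# the given directory, or the current directory if none is given, holds a
# cluster.yaml file.
cluster_dir_ok() {
  local dir="${1:-.}"
  [ -d "$dir" ] && [ -f "$dir/cluster.yaml" ]
}

# Usage, assuming a config folder named company-brain:
#   cluster_dir_ok company-brain && omnigraph cluster validate --config company-brain
#   (cd company-brain && cluster_dir_ok && omnigraph cluster validate)   # --config omitted
```

Passing the path of `cluster.yaml` itself fails the check, which mirrors the "directory, not a file" rule.
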
## Relationship to `~/.omnigraph/config.yaml`

`cluster.yaml` and the per-operator `~/.omnigraph/config.yaml` never describe
the same fact. The operator config is the permanent **per-operator** layer
(the operator's identity and credential references, named servers/clusters,
profiles, and CLI defaults); `cluster.yaml` is the shared desired state of a
whole deployment, read only by the `cluster` commands via `--config`.

The exact contract:

- **Cluster commands read the operator config for exactly one thing**: the
  `operator.actor` default used by `apply`/`approve` when `--as` is omitted,
  because operator identity is a per-operator fact. With `--as` present, the
  operator config is not needed. Nothing else in it influences a cluster
  command.
- **No legacy `omnigraph.yaml`**: the CLI does not read `omnigraph.yaml` at
  all, and a `--cluster` server reads only the cluster catalog; boot is
  cluster-only.
- **The other direction is ergonomics, not coupling**: per-operator
  data-plane commands address a cluster graph by its derived storage root
  (`company-brain/graphs/knowledge.omni`) with `--store <uri>`, an ordinary
  local path with no special handling.

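The last bullet relies on the derived-root convention `<storage>/graphs/<id>.omni`, where the storage root is the config directory unless `storage:` says otherwise. A minimal sketch of that derivation as a plain string join follows; `graph_root` is a hypothetical helper, not a CLI feature.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive a cluster graph's storage root from the storage
# root (the config directory when `storage:` is absent) and the graph id,
# following the <storage>/graphs/<id>.omni convention.
graph_root() {
  local storage="${1%/}"   # drop one trailing slash, if any
  local graph_id="$2"
  printf '%s/graphs/%s.omni\n' "$storage" "$graph_id"
}

graph_root company-brain knowledge
# prints: company-brain/graphs/knowledge.omni

# The printed path is the value a data-plane command receives via --store <uri>.
# Cluster commands take the actor from --as, or from operator.actor when --as
# is omitted, for example:
#   omnigraph cluster apply --config company-brain --as <actor> --json
```
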
## Supported `cluster.yaml`

The current config surface accepts this resource subset:

```yaml
version: 1
metadata:
  name: company-brain

state:
  backend: cluster
  lock: true

providers:
  embedding:
    default:
      kind: openai-compatible
      base_url: https://openrouter.ai/api/v1
      model: openai/text-embedding-3-large
      api_key: ${OPENROUTER_API_KEY}

graphs:
  knowledge:
    schema: knowledge.pg
    embedding_provider: default
    queries: queries/     # discover every `query <name>` in queries/*.gq

policies:
  base:
    file: base.policy.yaml
    applies_to: [knowledge]
```

`queries` is Terraform-shaped: the `.gq` files are the declaration. Three
forms:

```yaml
queries: queries/                  # directory: top-level *.gq, sorted; every declaration registers
queries: [people.gq, extra/a.gq]   # explicit files; every declaration in each
queries:                           # fine-grained name -> file map
  find_experts:
    file: knowledge.gq
```

Discovery is loud: an unreadable or unparseable `.gq`, or the same query name
declared in two files, fails validation (`query_parse_error`,
`duplicate_query_name`). Each discovered query is still an individually
addressed resource (`query.<graph>.<name>`) with its own plan/apply lifecycle;
the digest is the containing file's hash, so editing a multi-query file
updates all of its queries together. Paths are relative to the config
directory. The cluster is one explicit folder, so no `./` prefixes are
needed.

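To make the addressing rule concrete, here is a rough shell approximation of directory discovery. It assumes a declaration starts a line with `query <name>`, which is inferred from the comment in the example config rather than from the `.gq` grammar, so the real parser is certainly stricter. `query_names`, `query_addresses`, and `duplicate_query_names` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Print the declared query names in every top-level *.gq file of a directory,
# one per line. Assumption: a declaration begins a line with `query <name>`.
query_names() {
  local dir="$1" file
  for file in "$dir"/*.gq; do
    [ -f "$file" ] || continue   # skip the unexpanded glob when no file matches
    sed -n -E 's/^query[[:space:]]+([A-Za-z_][A-Za-z0-9_]*).*/\1/p' "$file"
  done
}

# Print each discovered query as its resource address: query.<graph>.<name>.
# Assumes the graph id contains no characters special to sed.
query_addresses() {
  local graph="$1" dir="$2"
  query_names "$dir" | sort | sed "s/^/query.$graph./"
}

# Print names declared more than once; the real validator reports this
# condition as duplicate_query_name.
duplicate_query_names() {
  query_names "$1" | sort | uniq -d
}

# Usage, assuming a graph named knowledge:
#   query_addresses knowledge company-brain/queries
#   duplicate_query_names company-brain/queries
```

Only top-level files are scanned, matching the directory form of `queries`; files in subdirectories would need the explicit list form.
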
`providers.embedding.<name>` defines a query-time embedding provider profile
for cluster-served graphs. A graph opts in with `embedding_provider: <name>`;
bare names normalize to `provider.embedding.<name>`. Supported provider
`kind` values are `openai-compatible` (default/OpenRouter-compatible),
`openai` (OpenAI's own host), `gemini`, and `mock`. Real providers require
`api_key: ${ENV_VAR}`; inline secrets are rejected. The env var is resolved
only when a `--cluster` server boots, so `cluster validate`, `plan`, and
`apply` do not need deployment secrets. `mock` is deterministic and does not
require `api_key`. Vector dimensions stay schema-driven by the target
`Vector(N)` column, not the provider profile.

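The `api_key` rule can be approximated with a small shell check. This is only a sketch: the exact pattern the validator accepts is not specified here, so the expression below (a single `${NAME}` placeholder) is an assumption, as is the `is_env_ref` helper name.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeed only when the value is a single ${ENV_VAR}
# placeholder, the form required for real providers. Inline secrets fail.
is_env_ref() {
  printf '%s\n' "$1" | grep -Eq '^[$][{][A-Za-z_][A-Za-z0-9_]*[}]$'
}

is_env_ref '${OPENROUTER_API_KEY}' && echo "reference: accepted"
is_env_ref 'sk-inline-secret' || echo "inline secret: rejected"

# Because the variable is resolved only at server boot, it has to be set only
# for that step, for example:
#   omnigraph cluster apply --config company-brain --json
#   OPENROUTER_API_KEY=<key> omnigraph-server --cluster company-brain
```
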
|
|
|
|
|
|
feat(cluster): the storage: root — state, catalog, and graph roots relocatable
cluster.yaml gains an optional storage: URI deciding where everything the
cluster STORES lives: the state ledger, lock, content-addressed catalog,
recovery sidecars, approval artifacts, and the derived graph roots
(<storage>/graphs/<id>.omni). Absent, it defaults to the config directory
itself — the original layout, byte-compatible, so pre-existing clusters and
the whole test suite are untouched. Declared configuration always stays in
the working tree (Terraform's config-local/state-remote split); credentials
are env-only, never in cluster.yaml.
Every command resolves its store from the declared root (a bad root is a
loud invalid_storage_root). Graph-root derivation, the delete executor
(prefix delete via the adapter), the sweep's existence probes, the catalog
payload write/verify/read paths, and the serving snapshot all flow through
ClusterStore — the last raw-fs holdouts for stored state are gone, and the
deny-list gains the rule that keeps it that way.
Tests: default-layout byte-compat, a file:// root relocating the entire
cluster (ledger+catalog+graphs under the new root, nothing under the config
dir, serving snapshot follows), invalid-root validation. 98 in-crate + 9
failpoints + full workspace gate green. The s3:// flavor lands with PR 3's
gated RustFS e2e.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 14:28:04 +03:00
|
|
|
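The reference-normalization and secret rules above can be sketched in Python.
This is an illustrative model, not OmniGraph's validator:
`normalize_provider_ref`, `check_profile`, the `ENV_REF` pattern, and the
message strings are hypothetical, and the exact `${ENV_VAR}` grammar the real
validator accepts is an assumption.

```python
import re

# Assumption: an env reference is exactly `${NAME}` with a shell-style name.
# The real validator's grammar may differ.
ENV_REF = re.compile(r"^\$\{[A-Za-z_][A-Za-z0-9_]*\}$")

SUPPORTED_KINDS = {"openai-compatible", "openai", "gemini", "mock"}


def normalize_provider_ref(ref: str) -> str:
    """Turn a bare name into the typed `provider.embedding.<name>` ref."""
    prefix = "provider.embedding."
    return ref if ref.startswith(prefix) else prefix + ref


def check_profile(profile: dict) -> list:
    """Return a list of problems for one provider profile (empty means valid)."""
    problems = []
    kind = profile.get("kind", "openai-compatible")  # documented default kind
    if kind not in SUPPORTED_KINDS:
        problems.append(f"unsupported kind: {kind}")
    if kind == "mock":
        return problems  # mock is deterministic and needs no api_key
    api_key = profile.get("api_key")
    if api_key is None:
        problems.append("api_key is required for real providers")
    elif not ENV_REF.match(api_key):
        problems.append("api_key must be an ${ENV_VAR} reference, not an inline secret")
    return problems
```

Under these assumptions a bare `embedding_provider: default` resolves to
`provider.embedding.default`, and a profile carrying a literal key is reported
rather than accepted. Nothing here reads the environment, which mirrors the
rule that the variable is resolved only at server boot.
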
`storage:` (optional) is the **storage root URI** for everything the cluster
stores: the state ledger, lock, content-addressed catalog, recovery
sidecars, approval artifacts, and the derived graph roots
(`<storage>/graphs/<id>.omni`). When absent, it defaults to the config
directory itself (the original layout, byte-compatible with pre-existing
clusters). `s3://bucket/prefix` puts the whole cluster on S3-compatible object
storage: the ledger CAS uses conditional writes (verified against AWS S3
semantics and RustFS), the lock becomes genuinely cross-machine, and graph
roots are engine-native S3 URIs. Credentials are **never** in `cluster.yaml`;
the standard `AWS_*` environment contract applies, identical to graph storage.
Declared configuration (`cluster.yaml` and the schema/query/policy sources it
references) always stays in the working tree: config is versioned in git,
state lives in the store (the Terraform split).

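As a small illustration of the derived-root rule, the following Python sketch
computes `<storage>/graphs/<id>.omni` with the config-directory fallback. The
function name and its string-based path handling are assumptions made for the
example; the real derivation lives in the cluster tooling.

```python
from typing import Optional


def derive_graph_root(config_dir: str, graph_id: str, storage: Optional[str] = None) -> str:
    """Return `<storage>/graphs/<id>.omni`, falling back to the config dir."""
    # An absent `storage:` means the config directory itself is the root.
    root = (storage if storage is not None else config_dir).rstrip("/")
    return f"{root}/graphs/{graph_id}.omni"
```

For example, `derive_graph_root("company-brain", "knowledge")` yields a path
inside the config directory, while passing `storage="s3://bucket/prefix"`
moves the same graph under the bucket prefix without touching `cluster.yaml`'s
other fields.
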
`metadata.name` is a display label. `state.backend` may be omitted or set to
`cluster`; external state backends are reserved for a later stage. `state.lock`
defaults to `true`. When enabled, `cluster plan`, `cluster apply`,
`cluster refresh`, and `cluster import` briefly acquire
`<config-dir>/__cluster/lock.json`, then remove it before returning.
`cluster status` never acquires the lock; it only reports whether one is
present. `cluster force-unlock` is the only lock-removal command; it requires
the exact lock id and should be run only after confirming no cluster operation
is active.

## Validation

`cluster validate` checks:

- `cluster.yaml` syntax and supported fields
- duplicate YAML keys
- schema, query, and policy file existence
- schema parsing and catalog construction
- stored-query parsing and query-name matching
- stored-query type-checking against the desired schema
- policy `applies_to` graph references
- embedding provider profiles and graph `embedding_provider` references

Fields reserved for later phases, such as `pipelines`, top-level
`embeddings`, `ui`, `aliases`, and `bindings`, fail with a typed diagnostic
instead of being silently ignored. Under `providers`, only `embedding` is
supported today; other provider namespaces fail as unsupported config.

## Planning

`cluster plan` first performs validation, then reads local JSON state from:

```text
<config-dir>/__cluster/state.json
```

If the file is missing, the state is treated as empty and every desired
resource is planned as a create. If present, the file must use this shape:

```json
{
  "version": 1,
  "state_revision": 0,
  "applied_revision": {
    "config_digest": "...",
    "resources": {
      "schema.knowledge": { "digest": "..." },
      "query.knowledge.find_experts": { "digest": "..." },
      "provider.embedding.default": {
        "digest": "...",
        "embedding_profile": {
          "kind": "openai-compatible",
          "base_url": "https://openrouter.ai/api/v1",
          "model": "openai/text-embedding-3-large",
          "api_key": "${OPENROUTER_API_KEY}"
        }
      },
      "graph.knowledge": {
        "digest": "...",
        "embedding_provider": "provider.embedding.default"
      },
      "policy.base": {
        "digest": "...",
        "applies_to": ["cluster", "graph.knowledge"]
      }
    }
  },
  "resource_statuses": {
    "graph.knowledge": {
      "status": "applied",
      "conditions": [],
      "message": "optional status detail"
    }
  },
  "approval_records": {},
  "recovery_records": {},
  "observations": {}
}
```

`state_revision`, `resource_statuses`, `approval_records`, `recovery_records`,
and `observations` are optional so older Stage 1 state fixtures keep working.
A missing `state_revision` is treated as `0`. Resource status values are
`pending`, `planned`, `applying`, `applied`, `drifted`, `blocked`, or `error`.

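A reader of the ledger might handle the optional fields as in the sketch
below. It is a hedged Python illustration rather than the tool's loader:
`load_state` is a hypothetical name, and treating the absent maps as empty
objects is an assumption (the text only fixes the default for
`state_revision`).

```python
import copy
import json

STATUSES = {"pending", "planned", "applying", "applied", "drifted", "blocked", "error"}

# Assumption: absent optional maps behave like empty objects.
OPTIONAL_DEFAULTS = {
    "state_revision": 0,
    "resource_statuses": {},
    "approval_records": {},
    "recovery_records": {},
    "observations": {},
}


def load_state(text: str) -> dict:
    """Parse a state ledger, fill optional fields, and check status values."""
    state = json.loads(text)
    if state.get("version") != 1:
        raise ValueError(f"unsupported state version: {state.get('version')!r}")
    for key, default in OPTIONAL_DEFAULTS.items():
        # deepcopy so two loaded states never share one default dict
        state.setdefault(key, copy.deepcopy(default))
    for resource_id, entry in state["resource_statuses"].items():
        if entry.get("status") not in STATUSES:
            raise ValueError(f"{resource_id}: unknown status {entry.get('status')!r}")
    return state
```

An older fixture containing only `version` and `applied_revision` loads with
`state_revision == 0`, while an unknown status value is rejected instead of
being passed along.
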
Plan output compares desired resource digests against state resource digests
and reports `create`, `update`, and `delete` changes. It also reports the state
CAS (`sha256:<digest>`) and state revision. `state_observations.locked` means an
existing lock file was observed, along with its metadata (`lock_id`,
`lock_operation`, `lock_created_at`, `lock_pid`, `lock_age_seconds`); a
successful `plan` instead reports `lock_acquired: true` and an
`acquired_lock_id`, then releases the lock before returning. The command never
writes `state.json` and does not scan live graphs. Use explicit
`cluster refresh` / `cluster import` when the state ledger should be updated
from live observations. Live drift scans during plan are later-stage work.

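The digest comparison at the heart of the plan can be sketched as a
dictionary diff. This Python example is a simplified model under stated
assumptions: `diff_digests` is a hypothetical name, both inputs are plain
`resource id -> digest` maps, the output order (sorted by id) is a choice made
for the example, and policy `binding_change` updates are not modeled.

```python
def diff_digests(desired: dict, state: dict) -> list:
    """Return (action, resource_id) pairs for every differing resource."""
    changes = []
    for resource_id in sorted(desired.keys() | state.keys()):
        if resource_id not in state:
            changes.append(("create", resource_id))   # declared, not yet applied
        elif resource_id not in desired:
            changes.append(("delete", resource_id))   # applied, no longer declared
        elif desired[resource_id] != state[resource_id]:
            changes.append(("update", resource_id))   # digest changed
    return changes
```

With an empty `state` map every desired resource comes back as a `create`,
which matches the missing-`state.json` behavior described earlier.
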
Policy entries additionally record their applied `applies_to` bindings as
normalized typed refs, so that the state ledger is serving-sufficient for the
future server-boot stage. A change to `applies_to` alone (the policy file
digest unchanged) appears in the plan as an update marked `binding_change`
(human output: `[bindings]`), applies like any catalog change, and counts
toward convergence; ledgers written before this field existed are backfilled
by the next apply.

Each plan change carries a `disposition` field, an honest preview of what
`cluster apply` will do with it in this stage: `applied` (executes), `derived`
(a `graph.<id>` composite-digest update that converges automatically once its
query digests land), `deferred` (graph/schema change, later phase), or
`blocked` (query/policy gated by an unapplied or missing dependency, with the
condition in `reason`).

## Apply

`cluster apply` executes the executable subset of the plan: stored-query and
policy-bundle changes, graph creates, and schema updates. There is no confirm
flag: `cluster plan` is the preview, and apply recomputes the same diff under
the state lock before executing, so a stale preview can never be applied.
Apply requires an existing `state.json` (`state_missing` directs you to
`cluster import` first).

For each applied create/update, the resource payload is written
content-addressed into the local catalog:

```text
<config-dir>/__cluster/resources/query/<graph>/<name>/<digest>.gq
<config-dir>/__cluster/resources/policy/<name>/<digest>.yaml
```

Extensions are fixed per kind regardless of the source file's name. Payloads
are written before the state update because `state.json` is the publish point:
if the final CAS-checked state write fails, no success is reported and the
digest-named blobs already written are inert; re-running apply is the repair.
Deletes remove the resource from state; their old payload blobs stay on disk
(garbage collection is a later stage). Re-running a converged apply is a no-op:
no state write, no revision change (`state_written: false`).

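The blob-first ordering can be illustrated with a short Python sketch of the
catalog write. The helper names are hypothetical, and the digest scheme (hex
SHA-256 of the payload bytes, used directly as the file name) is an assumption;
the document only states that blobs are digest-named and that extensions are
fixed per kind.

```python
import hashlib
from pathlib import Path

# Extensions are fixed per kind, regardless of the source file's name.
EXTENSIONS = {"query": ".gq", "policy": ".yaml"}


def blob_path(config_dir: Path, kind: str, name: str, digest: str) -> Path:
    """Catalog location for one payload; `name` is `<graph>/<name>` for queries."""
    return config_dir / "__cluster" / "resources" / kind / name / f"{digest}{EXTENSIONS[kind]}"


def publish_blob(config_dir: Path, kind: str, name: str, payload: bytes) -> str:
    """Write the payload under its digest and return the digest.

    Assumption: the digest is the hex SHA-256 of the payload bytes.
    """
    digest = hashlib.sha256(payload).hexdigest()
    path = blob_path(config_dir, kind, name, digest)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(payload)  # inert on its own: state does not reference it yet
    return digest
```

Because the file name is derived from the content, writing the same payload
twice lands on the same path, which is why a retried apply after a failed
state write simply repeats the step. The state update that records the digest
would follow only after every blob is on disk.
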
**Applied means serving.** A server started with `--cluster <dir>` boots from
the applied revision (see
[Serving from the cluster](#serving-from-the-cluster-the-mode-switch)); it
picks up newly applied state on its next restart. Until that restart, applied
means recorded in the catalog, nothing more.

### Graph creation

A `graph.<id>` create (the graph is declared but no root exists) is executed
by apply: the graph is initialized at the derived root

```text
<config-dir>/graphs/<graph-id>.omni
```

with the declared schema, before any catalog writes, so queries and policies
that depend on the new graph apply **in the same run**. Each create is fenced
by a recovery sidecar under `__cluster/recoveries/{ulid}.json`, written before
the init and removed only after the state update lands. If apply crashes in
between, the next state-mutating command (`apply`, `refresh`, `import`) runs a
**recovery sweep** that classifies the survivor by observation:

- an **absent root** removes the stale intent;
- a **completed create** rolls the cluster state forward (recorded in the
  state's `recovery_records`);
- a **partial root** reports `graph_create_incomplete` (status `error`; remove
  the root and re-run apply, nothing is auto-deleted);
- **unexpected graph content** reports `actual_applied_state_pending` (status
  `drifted`; run `cluster refresh` and re-plan).

While a kept sidecar is pending, that graph's create and its dependents are
blocked with `cluster_recovery_pending`. Read-only commands (`status`, `plan`)
warn about pending sidecars without acting on them.

**Re-creation is convergence.** If a graph root disappears out-of-band,
`refresh` records the drift and the next `plan` proposes a create, which apply
will execute, producing an **empty** graph at the root. The data was
already lost when the root vanished; the create is visible in the plan
(disposition `applied`) before anything runs.

### Schema updates

A `schema.<id>` update (the declared schema differs from what state records)
is executed by apply via the engine's schema-apply, after graph creates and
before catalog writes, so a query change that depends on the new schema
applies in the same run. Each schema apply is sidecar-fenced like a create:
the pre-operation manifest version is recorded, the post-operation version is
written back, and the sidecar is retired only after the state update lands.
The recovery sweep classifies survivors by schema digest (consistent ledger →
retired; completed on the graph → state rolled forward with an audit entry;
anything else → `drifted`/`actual_applied_state_pending`, kept).

Migrations run with **soft drops only**: a removed property disappears from
the current version while prior versions retain the data (reversible until
`cleanup`). Data-loss migrations (`allow_data_loss`) are not reachable from
cluster apply until the approval-artifact stage. Unsupported migrations
(e.g. changing a property's type), engine lock contention, or graphs with
user branches fail loudly as `schema_apply_failed` with the engine's message;
dependent changes are demoted to `blocked` and graph-moving work stops for
the run.

`cluster plan` previews schema updates with the engine's real migration plan:
each schema change carries a `migration` field (`supported` + typed steps),
and the human output prints the steps. If the live graph cannot be opened, the
preview degrades to the digest diff with a `schema_preview_unavailable`
warning.

**Drift is converged, not just reported.** A schema changed out-of-band on
the live graph shows up as `drifted` after `refresh`, and the next plan
proposes migrating it back to the declared schema; apply executes that like
any other soft migration. Drift correction is gated by the same rules as any
change; nothing about it is hidden (the plan shows the steps, including soft
drops of out-of-band fields).

**Attribution.** `cluster apply --as <actor>` records the operator identity
in recovery sidecars and audit entries and threads it to the engine's
schema-apply (so commit attribution and Cedar enforcement, wherever a policy
checker is installed, work unchanged).

### Approvals and graph deletion

Deleting a graph is the irreversible tier: it requires a recorded human
decision. `cluster plan` lists the gate under `approvals_required` (one gate
per graph; the graph-level approval carries its schema and queries);
`cluster approve graph.<id> --as <actor>` writes a digest-bound artifact to

```text
<config-dir>/__cluster/approvals/<approval-id>.json
```

bound to the exact desired config digest and the change's state digest, so
**any config or state drift after approving invalidates the artifact**
automatically (`approval_stale` warning; it never authorizes a different
change). An unapproved delete blocks with `approval_required`.

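The digest binding can be pictured as an equality check over two digests. The
Python sketch below is an assumption-laden model: the `Approval` field names
are hypothetical (the artifact's JSON layout is not shown in this document),
and `check_approval` collapses the outcome into a single verdict string for
brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Approval:
    # Hypothetical field names; the real artifact layout may differ.
    resource: str
    config_digest: str
    state_digest: str


def check_approval(approval: Optional[Approval], resource: str,
                   config_digest: str, state_digest: str) -> str:
    """Return `approved`, `approval_stale`, or `approval_required`."""
    if approval is None or approval.resource != resource:
        return "approval_required"
    if (approval.config_digest, approval.state_digest) != (config_digest, state_digest):
        # Bound to a different config or state: it authorizes nothing.
        return "approval_stale"
    return "approved"
```

Because both digests must match exactly, editing `cluster.yaml` or applying any
other change after approving moves at least one digest and the artifact stops
authorizing the delete.
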
An approved delete executes **last** in the apply run: the graph root is
removed recursively, the subtree (graph, schema, its queries) is tombstoned
out of the state ledger with a tombstone observation, and the approval is
consumed. Consumption is recorded in the state's `approval_records` in the
same state update, and the artifact file is rewritten with `consumed_at` (the
file is never deleted: the audit fact survives the loss of either store). A
failed run consumes nothing; the approval stays valid for the retry. Catalog
blobs of the deleted graph's queries stay on disk (GC is a later stage).

Crash recovery for deletes: a completed-but-unrecorded delete is rolled
forward by the sweep (tombstone + approval consumption + audit entry); an
incomplete delete (root still present) is retired with a
`graph_delete_incomplete` warning and simply **re-proposed**. Prefix removal
is idempotent, so the still-approved retry is the repair.

Standalone schema deletes are never executed by this stage. They are
reported as `deferred` (warning `apply_unsupported_change`), and query/policy
changes that depend on them are `blocked` (warning `apply_dependency_blocked`,
status `blocked` in state). A partially applicable plan still exits 0 with
warnings; the JSON `converged` field is the automation signal for "state now
matches the desired revision". The applied `config_digest` is recorded only
when apply fully converges. The `graph.<id>` composite digest is recomputed
from state's own schema/query digests after each apply, so applied query
changes converge without graph movement.

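The execution order described across this section (graph creates, then schema
updates, then catalog writes, with approved deletes last) can be summarized as
a sort key. This Python sketch is illustrative only: the `resource`/`action`
keys on a change are hypothetical, and placing every remaining change kind in
the catalog phase is an assumption.

```python
def apply_phase(change: dict) -> int:
    """Sort key for the execution order described in this section."""
    kind = change["resource"].split(".", 1)[0]
    action = change["action"]
    if kind == "graph" and action == "create":
        return 0  # graph roots are initialized first
    if kind == "schema" and action == "update":
        return 1  # schema migrations run after creates
    if kind == "graph" and action == "delete":
        return 3  # approved deletes run last
    return 2  # catalog writes (queries, policies); assumption for other kinds


def order_changes(changes: list) -> list:
    """Return changes in execution order; sorted() is stable within a phase."""
    return sorted(changes, key=apply_phase)
```

Ordering the phases this way is what lets a query that depends on a new graph
or a new schema property apply in the same run as the change it depends on.
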
## Serving from the cluster (the mode switch)

```bash
omnigraph-server --cluster company-brain --bind 0.0.0.0:8080
```

`--cluster <dir>` is an **exclusive boot source** (axiom 15): it cannot
combine with a graph URI, `--target`, or `--config`, and in this mode
`omnigraph.yaml` is never read: not for graphs, not for queries, not for
policies. The server serves the **applied revision**: graph roots recorded in
`state.json`, stored-query and policy content from the content-addressed
catalog at the applied digests (re-verified at boot), and policy bundles
wired by their applied `applies_to` bindings (`cluster`-bound bundles become
the server-level Cedar engine; graph-bound bundles attach per graph).
Un-applied config drift never leaks into serving; `cluster plan` is where
drift is visible. Routing is always multi-graph (`/graphs/{id}/...`). Bearer
tokens and the bind address stay process-level (flags/env); they are
per-replica facts, not cluster facts.

Boot is fail-fast. Each of the following refuses startup with a remedy:

- missing or unreadable state
- pending recovery sidecars
- missing or tampered catalog blobs
- policy entries without binding metadata (pre-binding ledgers; re-run
  `cluster apply`)
- an empty graph set
- more than one policy bundle binding a single scope (split or merge bundles;
  stacked scopes are a later stage)
- unopenable graph roots
- stored queries that no longer type-check

A held state lock is *not* an error: boot reads the atomically replaced state
file without locking.

Serving is static per process: the server reads the applied revision once at
startup, so picking up newly applied state means restarting it. Stored
queries are all listed in `GET /queries` in cluster mode (the cluster
registry has no expose flag; exposure becomes a policy decision in a later
phase).

## Status

`cluster status` reads the same local JSON state ledger and prints what the
ledger says is deployed. It does not validate referenced schema/query/policy
files and does not inspect live graphs. A missing `state.json` succeeds with a
warning; invalid state JSON or an unsupported state version fails. If a lock is
present, status reports its id, operation, creation time, pid, and age.

Status also verifies the catalog payloads read-only: every query/policy digest
recorded in state is checked against its content-addressed blob under
`__cluster/resources/` (existence and full digest re-hash). A missing or
mismatched blob is reported as a warning (`catalog_payload_missing` /
`catalog_payload_mismatch`); an unreadable blob is an error
(`catalog_payload_read_error`) because an unverifiable catalog must not report
healthy. Status never writes state; persisting the `drifted` condition is
refresh's job. The check runs without the state lock, so it is a point-in-time
report.

## Refresh And Import

`cluster refresh` updates an existing `state.json` from actual observations.
`cluster import` creates the first `state.json` when the ledger is missing.
Both commands open declared graphs read-only at:

```text
<config-dir>/graphs/<graph-id>.omni
```

They observe only branch `main`, recording graph existence, manifest version,
live schema digest, desired schema digest, and schema-match status under
`observations["graph.<id>"]`. A missing graph root is recorded as drift, and
its graph/schema digests are removed from state so that a later `plan`
proposes creates. Invalid graph roots are recorded as errors; `refresh`
persists the error observation and exits non-zero, while `import` exits
non-zero without creating initial state.

Refresh also verifies the catalog payloads of every query/policy digest
recorded in state (the same check `cluster status` reports read-only), and
closes the loop:

- a **missing** or **digest-mismatched** blob marks the resource `drifted`
  (condition `payload_missing` / `payload_mismatch`) and removes its digest
  from state, so the next `cluster plan` proposes a create and the next
  `cluster apply` republishes the blob (the self-heal loop, mirroring how a
  missing graph root is handled);
- an **unreadable** blob (IO error other than not-found) keeps the digest,
  marks the resource `error` (condition `payload_read_error`), and exits
  non-zero, because transient IO must not trigger a spurious republish.

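The two bullet cases above amount to a three-way classification of each blob.
The following Python sketch models it under assumptions: `classify_payload` is
a hypothetical name, the digest is taken to be the hex SHA-256 of the blob
bytes, and a healthy blob is represented by a `None` status meaning "leave the
recorded status unchanged".

```python
import hashlib
from pathlib import Path
from typing import Optional, Tuple


def classify_payload(path: Path, expected_digest: str) -> Tuple[Optional[str], Optional[str], bool]:
    """Return (status, condition, keep_digest) for one catalog blob."""
    try:
        data = path.read_bytes()
    except FileNotFoundError:
        return ("drifted", "payload_missing", False)
    except OSError:
        # Any other IO failure: keep the digest so nothing is republished.
        return ("error", "payload_read_error", True)
    if hashlib.sha256(data).hexdigest() != expected_digest:
        return ("drifted", "payload_mismatch", False)
    return (None, None, True)  # healthy: recorded status is left unchanged
```

The asymmetry is deliberate in the text: only the cases where the blob is
provably gone or wrong drop the digest, since dropping it is what makes the
next plan propose a republish.
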
Upgrade note: a state ledger written before catalog publish existed records
query/policy digests with no blobs on disk; the first refresh after upgrading
flags them all `payload_missing`, and a single `cluster apply` republishes
everything and converges.

Refresh/import do not observe query or policy resources beyond their catalog
payloads yet. Existing query and policy state digests are preserved on refresh
(unless their payload drifted, above) and are not invented on import.

## Force Unlock

`cluster force-unlock <LOCK_ID>` removes `<config-dir>/__cluster/lock.json` only
when the file exists, is valid version-1 lock JSON, and its `lock_id` exactly
matches the argument. A wrong id, missing lock, invalid lock JSON, or unsupported
lock version exits non-zero and leaves the file untouched.

This is manual recovery for abandoned local locks. OmniGraph does not perform
PID-liveness checks, TTL expiry, stale-lock breaking, or automatic unlock in
Stage 2C.

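The guard conditions for `force-unlock` can be sketched as follows. This is a
hedged Python model, not the command's implementation: `force_unlock` and
`UnlockRefused` are hypothetical names, and reading the format version from a
`version` field is an assumption (the text only says "version-1 lock JSON").

```python
import json
from pathlib import Path


class UnlockRefused(Exception):
    """Raised when the lock file must be left untouched."""


def force_unlock(config_dir: Path, lock_id: str) -> None:
    """Remove the lock only if it exists, parses, is version 1, and matches."""
    lock_path = config_dir / "__cluster" / "lock.json"
    try:
        lock = json.loads(lock_path.read_text())
    except FileNotFoundError:
        raise UnlockRefused("no lock is present") from None
    except json.JSONDecodeError:
        raise UnlockRefused("lock file is not valid JSON") from None
    # Assumption: the lock records its format version in a `version` field.
    if not isinstance(lock, dict) or lock.get("version") != 1:
        raise UnlockRefused("unsupported lock version")
    if lock.get("lock_id") != lock_id:
        raise UnlockRefused("lock id does not match")
    lock_path.unlink()  # reached only when every check above passed
```

Every refusal path returns before `unlink()`, so a wrong id or a malformed
file leaves the lock exactly as it was found. Note that the sketch, like the
command, makes no attempt to decide whether the lock holder is still alive.
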