omnigraph

mirror of https://github.com/ModernRelay/omnigraph.git synced 2026-06-18 02:24:27 +02:00

Author	SHA1	Message	Date
aaltshuler	8d2128438e	fix(cli): quote @embed annotation values in schema show so they round-trip `render_annotations` emitted `@embed` values unquoted — `@embed(title, model=openai/text-embedding-3-large)`. The parser stores values via `decode_string_literal` (quotes stripped) and `annotation_kwarg` requires a quoted `literal`, so the rendered output did not re-parse: a `model` containing `/` or `-` is not a valid bare token. `schema show` therefore produced schema text that `schema apply`/lint would reject. Re-quote the positional value and every kwarg value as string literals, so `schema show` reproduces `@embed("title", model="openai/text-embedding-3-large")` and round-trips. Regression: `render_annotations_quotes_values_so_embed_round_trips` parses the rendered form back through the schema grammar. Addresses the bot-review finding on #248. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-16 01:25:27 +03:00
Ragnor Comerford	74476f7f51	feat(compiler): @embed model kwarg in grammar/AST/parser (RFC-012 Phase 3) Annotations gain optional comma-separated key=value kwargs. Annotation keeps value (existing consumers unchanged) and adds kwargs: BTreeMap with serde(default, skip_serializing_if) so empty kwargs are omitted and existing schemas' IR JSON/hash stay byte-identical. The parser rejects any @embed kwarg other than model. render_annotations shows kwargs. 3 new parser tests.	2026-06-15 21:09:15 +02:00
Ragnor Comerford	30377c453b	fix(embedding): address PR review feedback (RFC-012 Phase 2) openai-alias host (Cursor): OMNIGRAPH_EMBED_PROVIDER=openai now defaults its base URL to https://api.openai.com/v1 (model text-embedding-3-large), while openai-compatible/unset keep the OpenRouter gateway default. The default is derived from the alias rather than the Provider enum, so an operator's stated intent can no longer be silently routed to OpenRouter; an explicit OMNIGRAPH_EMBED_BASE_URL still wins. New test from_env_openai_alias_uses_openai_host_not_openrouter. single model source of truth (Cursor): remove the EmbedSpec.model field. The provider config is authoritative for the model, so a spec can no longer declare a model that is silently ignored while the API uses another (the wrong-space-vectors footgun); the embed summary reports the model actually resolved. Correct by construction rather than a truthful-echo patch. stale @embed docs (Codex): docs/user/schema/index.md and docs/dev/execution.md still claimed @embed embeds at ingest; corrected to the real contract (catalog annotation; vectors supplied or pre-filled by 'omnigraph embed'). Also documented the openai-vs-OpenRouter base default in embeddings.md. Greptile's RFC-status note is declined: the repo lifecycle keeps an RFC Status: Proposed while its PR is open and flips to Accepted on merge.	2026-06-15 18:37:34 +02:00
Ragnor Comerford	b999ae3753	feat(engine)!: provider-independent embedding client (RFC-012 Phase 2) Replace the Gemini-only EmbeddingClient with one resolved EmbeddingConfig { provider, model, base_url, api_key } behind a sealed Provider enum (OpenAiCompatible \| Gemini \| Mock). OpenAiCompatible (POST {base}/embeddings, bearer, {model, input, dimensions}) covers OpenRouter — the new default gateway — OpenAI direct, and self-hosted endpoints; Gemini keeps its RETRIEVAL_QUERY/RETRIEVAL_DOCUMENT task types; Mock is offline/deterministic. EmbedRole replaces the task-type string. from_env() resolves provider via OMNIGRAPH_EMBED_PROVIDER (default openai-compatible), base/model via OMNIGRAPH_EMBED_BASE_URL/_MODEL, and the api key from OPENROUTER_API_KEY/OPENAI_API_KEY or GEMINI_API_KEY. BREAKING (pre-release, no back-compat): the default provider is now OpenRouter, OMNIGRAPH_GEMINI_BASE_URL is dropped, and Gemini-only users set OMNIGRAPH_EMBED_PROVIDER=gemini. Folds in RFC-012 Phase 1 NFR floor: a total-operation OMNIGRAPH_EMBED_QUERY_DEADLINE_MS deadline (default 60s; 0=unbounded) bounds the ~121s worst case, and tracing spans (target omnigraph::embedding) record provider/model/dim/attempt/elapsed/outcome. The offline 'omnigraph embed' CLI follows the resolved provider (its hardcoded gemini-only bail removed). 17 engine embedding unit tests, 4 CLI embed tests, and the search integration suite (22) pass. Cross-query client reuse and the docs refresh land in follow-up commits.	2026-06-15 17:27:49 +02:00
Andrew Altshuler	bc2a989a7b	feat(cli)!: remove legacy data-plane addressing (--target, positional http→remote, --as-on-served) (#238 ) * feat(cli): --server accepts a literal URL (RFC-011 Decision 2) `resolve_server_flag` now treats a `--server` value containing `://` as a literal base URL (trailing slash trimmed; `--graph` appends `/graphs/<id>`), bypassing the operator-config `servers:` registry; a bare name still resolves through the registry. This is the replacement the upcoming `--uri http(s)://` deprecation points at, and a small ergonomic win on its own (`--server https://host` with no config entry). Token resolution for a literal-URL server falls to the legacy OMNIGRAPH_BEARER_TOKEN chain, same as a positional URL today. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * test(cli): address the parity-matrix arms with global --store/--server flags Prep for removing the positional-http→remote dispatch. The parity harness addressed both arms with a positional graph right after the verb (`omnigraph <verb> <addr> <args…>`), which only parses for top-level verbs — for nested subcommands (`schema show`, `branch list`, …) the address landed in the subcommand slot and BOTH arms failed identically, so the test passed vacuously (matching exit codes, never comparing output). Address both arms with the global flags instead — local `--store <graph>` (embedded), remote `--server <url>` (served) — appended after the verb + args, valid regardless of nesting. The previously-vacuous nested-verb parity checks now actually compare embedded vs remote (and pass — parity holds), and the remote arm no longer relies on the positional-URL dispatch that's about to be removed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(cli)!: --as on a served write is a hard error (was a silent no-op) A served write resolves the actor server-side from the bearer token, so `--as` could never set identity there — it was silently ignored. 
It now errors (in the remote write factory, before any HTTP call), pointing the user at removing `--as` or writing directly with `--store`. Reads don't carry `--as`, so this is write-path only. BREAKING for any script that passed `--as` to a remote write (it was a no-op, so behavior is unchanged except the now-explicit error). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(cli)!: a positional/--uri http(s):// URL no longer dispatches to a server Remote graphs must be addressed with `--server <url>` (or a named server / a profile binding one). A positional or `--uri` `http(s)://` URL on a data verb now errors instead of silently routing to the remote HTTP client — the scheme no longer carries transport semantics. The discriminator is `via_server`: a remote URL produced by a server scope is fine; a remote URL from a positional/`--uri` source is rejected (`reject_positional_remote` in both GraphClient factories). Storage verbs are unaffected — they already reject remote URIs through `resolve_local_graph` with the existing "direct (storage-native)" error. Migrated the gh-host keyed-credential system test to `--server <url>` (the literal URL still prefix-matches the operator server for token resolution). BREAKING: scripts addressing a server by a bare URL must switch to `--server <url>`. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(cli)!: remove the --target flag (use --store / --profile / --server) Removes the legacy named-graph flag and threads its parameter out of the whole resolver chain. `--target` resolved a graph name through `omnigraph.yaml`'s `graphs:` map; its replacements (`--store <uri>`, `--profile <name>`, `--server <name>`) all ship. - Drops the 22 `target` clap fields + the `--cluster` exclusion that named it. 
- Threads `target`/`cli_target` out of `resolve_uri`/`resolve_cli_graph`/ `resolve_local_graph`/`resolve_local_uri`/`resolve_storage_uri`/ `resolve_remote_bearer_token`/`apply_server_flag`/`execute_query_lint`/ `resolve_selected_graph`/`resolve_registry_selection_for_list`/ `execute_queries_{validate,list}`, the two `GraphClient` factories, and `ScopeFlags`/`ResolvedScope`. - Keeps the shared `OmnigraphConfig::resolve_target_uri` 3-arg (server boot uses it); the CLI passes None for the explicit-target arm. The `cli.graph` default (omnigraph.yaml bare-command fallback) is unchanged — its removal belongs to the omnigraph.yaml excision. - Operator/file aliases that bind a `graph` name still work: the name is now resolved to a URI inline (a positional URI wins). - Error messages and `--graph`/`--server`/`--store` help text no longer name `--target`; the queries-list selection hint points at `cli.graph`. BREAKING. Tests updated (named-target resolution rewritten onto `cli.graph`; positional-URI tests unchanged). Full omnigraph-cli suite green (228). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(cli): drop --target and positional-http addressing; --as-on-served is an error Update the user docs for the legacy data-plane addressing removals: - the CLI `--target` flag is gone — address graphs with a positional URI, `--store`, `--profile`, or `--server <name\|url>`; - a positional `http(s)://` URI no longer dispatches to a server (use `--server`); - `--as` on a served write is now rejected (was a silent no-op). Touches cli/reference.md (addressing intro, capability table, error examples, scopes), cli/index.md (the remote-read example → --server), operations/maintenance + policy, and the cluster docs' data-plane load guidance. The server's own `--target` boot flag is unchanged (server.md untouched). Also fixes a pre-existing broken maintenance link in search/indexes.md. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(cli): --store is loudly exclusive with a positional URI / --server; test graphs→Served Address two Greptile findings on the RFC-011 slices: - Slice A (P1): `--store` combined with a positional URI silently dropped the URI (`scope.rs` did `store.or(uri)`); `--store` + `--server` errored with a misleading "positional URI" message. Now both combinations fail loudly with a declared `--store is exclusive with a positional URI and --server` error. - Slice B (P2): the `command_capability` unit test never exercised the one Data→Served refinement (`graphs`); added the assertion so deleting that guard can't pass silently. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 04:29:16 +03:00
Andrew Altshuler	7eeced3e88	feat(cli): RFC-011 Slice B — capability vocabulary (any/served/direct/control/local) (#237 ) * feat(cli): RFC-011 Slice B — capability vocabulary (any/served/direct/control/local) User-facing CLI errors and --help now speak a single "capability" vocabulary — what a command needs — instead of the internal four-plane jargon. Behavior is unchanged: the --server/--graph allow set is identical (the served-graph capabilities `any` ∪ `served` = the old `Data` plane, since `graphs` was already allowed). Only error text and the --help legend change. - planes.rs: add `Capability { Any, Served, Direct, Control, Local }` derived from the existing exhaustive `command_plane` classifier (which stays as the drift guard) plus the one Data→Served refinement (`graphs`). `guard_addressing` now allows `--server`/`--graph` on `{Any, Served}` and rejects elsewhere with a capability-worded message. The mapping reflects current behavior (`queries list` → Local, `queries validate` → Direct); it converges to the RFC end-state table when later slices re-route those verbs. - scope.rs: `resolve_scope` takes `Capability` instead of `Plane`, so the whole addressing path speaks one vocabulary; call sites in client.rs (Any) and the 3 maintenance verbs in main.rs (Direct) updated. - helpers.rs: the storage-direct remote rejection reworded to "direct (storage-native) command". - cli.rs: the --help legend is now "COMMANDS BY CAPABILITY". - Tests: the 5 assertions pinning the old plane text updated; added planes.rs unit tests proving the allow set is exactly {Any, Served} (behavior-preservation), the per-verb mapping, and distinct capability phrases. Full omnigraph-cli suite: 225 green (222 + 3 new), zero behavior-test changes. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(cli): capability vocabulary in the CLI reference + maintenance addressing Rename the reference's "Command planes" section to "Command capabilities" (any/served/direct/control/local), reword the error examples, and update the maintenance doc's addressing note + its section cross-link to match Slice B. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 03:02:07 +03:00
Andrew Altshuler	a4d08a4184	feat(cli): RFC-011 Slice A — additive scope/profile addressing (#235 ) * feat(cli): RFC-011 Slice A — operator-config scope structs (profiles/clusters/defaults) Additive operator-config surface for the RFC-011 scope model. No behavior change yet — these structs are parsed but not consumed until the scope resolver lands. - OperatorConfig gains `profiles:` (name → OperatorProfile) and `clusters:` (name → OperatorCluster { root }) — the latter the only place a storage root appears in operator config (RFC-011 storage-root rule). - OperatorDefaults gains `server` and `default_graph` (the flat-default scope). - OperatorProfile binds one of {server, cluster, store} + default_graph; `binding()` validates exactly-one on use and returns a ScopeBinding. - Accessors profile()/cluster_root()/default_server()/default_graph(); unknown-key warnings extended to the new blocks (forward-compat preserved — old configs still load, new keys are no longer "unknown"). Tests: parse profiles/clusters/scope-defaults, binding rejects zero/multiple entities, unknown keys in a profile warn.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(cli): RFC-011 Slice A — scope resolver + --profile/--store, wired (additive) Translate the new scope inputs into the existing addressing tuple, in front of the unchanged resolvers. Purely additive: an explicit address (--uri/--target/--server/--store) passes straight through, so every existing invocation is byte-for-byte unchanged. - scope.rs: resolve_scope() with the RFC-011 precedence (explicit > --profile / OMNIGRAPH_PROFILE > flat defaults.server), producing the effective (server, graph, uri, target) for data verbs and (cluster, cluster_graph) for maintenance. Plane×scope capability check (server scope rejected on a maintenance verb; cluster scope rejected on a data verb; store rejects --graph) fires only on the new paths. 9 unit tests. - cli.rs: global --profile <NAME> and --store <URI>. (--graph keeps requires=server for now; profile/default graph comes from default_graph — profile+--graph override is deferred to the --cluster-graph rework.) - client.rs: the two GraphClient factories call resolve_scope (Plane::Data) up front; the explicit branch reproduces today's behavior exactly. - main.rs: the 15 data call sites forward --profile/--store; the 3 maintenance verbs consult the scope (Plane::Storage) only when no explicit per-command address is given, so cluster-binding profiles and --store reach optimize/repair/cleanup. Verified: the full omnigraph-cli suite (221 tests) stays green untouched. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * test+docs(cli): RFC-011 Slice A — end-to-end scope test + reference docs - cli_data.rs: prove --store and a --profile store binding drive a read identically to the legacy positional URI (the additive-coexistence contract), end to end against a local graph (no server needed). 
- cli/reference.md: document profiles/clusters/defaults.server/default_graph, the --profile/--store flags, and a "Scopes & profiles" section; note the model coexists with legacy addressing (nothing removed yet). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 02:37:55 +03:00
Ragnor Comerford	7963499995	fix(cli): unify remote URL builder, fix branch delete //branches 404 (#230 ) * test(cli): reproduce branch-delete //branches 404 (failing) Regression test for the `branch delete` 404 over a multi-graph `--server`/`--graph` target: the composed URL must be `<base>/branches/<name>` with no empty `//` segment. Fails against the current `remote_branch_url`, which appends a trailing slash before extending path segments and so emits `…/graphs/p9-os//branches/tmpbranch`. The next commit fixes it. left: "http://host/graphs/p9-os//branches/tmpbranch" right: "http://host/graphs/p9-os/branches/tmpbranch" * fix(cli): unify remote URL builder, close the //branches 404 class Correct-by-design fix for the failing test in the previous commit. The bug was not specific to `branch delete`: URL assembly was scattered across a string-concat `remote_url`, a url-crate `remote_branch_url`, and several `format!` interpolations that left dynamic path/query components un-encoded (commit id in the path, branch in the query string). `branch delete` was the instance that surfaced because it is the only verb that puts a dynamic value in the path. Replace both helpers with one `remote_url(base, segments, query)` that every remote call routes through. Callers pass structured segments and query pairs, so trailing-slash normalization (pop_if_empty) and per-segment / per-value percent-encoding live in one place. A stray `//` or an un-encoded dynamic component is no longer representable, closing the whole class rather than the reported instance. Migrates the previous commit's failing test to the new builder and adds the single-graph, trailing-slash, slash-in-name, commit-id-path, and query-value cases (the last two cover the previously latent siblings). All 16 callsites migrated; `remote_branch_url` removed.	2026-06-14 20:37:12 +02:00
Andrew Altshuler	d46e50dd6d	docs(user): restructure user docs into topic sections (Phase 1) (#223 ) Move the 23 flat docs/user/*.md files into topic subdirectories so the user guide is organized by area (schema, queries, search, branching, cli, operations, clusters, concepts, reference) instead of a flat list. This is a pure structural move — whole files relocated, every cross-doc link recomputed, no prose rewrites or content splits (those follow in Phase 2). - 19 `git mv`s (install.md, deployment.md stay top-level); history preserved (renames detected at 92–100% similarity). - All intra-doc links, AGENTS.md's topic table (52 pointers), and the docs/dev + docs/releases back-links recomputed via relpath from each file's new location. - docs/user/index.md rewritten as a sectioned nav hub. - Fixed 5 doc-path references in Rust (comments + two user-facing server settings error strings) to point at the new locations. Verified: zero broken .md links across tracked docs; check-agents-md.sh green (with the untracked scratch docs set aside); touched crates build. Note: the public site (omnigraph-web) imports docs/ via a flat-only script; its import-docs.mjs needs a subdir-aware update before the next re-sync. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 13:52:14 +03:00
Andrew Altshuler	8726ca92ec	feat: canonical POST /load, deprecate /ingest (RFC-009 Phase 5) (#222 ) * feat(server): canonical POST /load, deprecate /ingest (RFC-009 Phase 5) The CLI's non-deprecated `load` verb rode the deprecated `/ingest` route, so `/ingest`'s eventual removal would silently break it. Add a canonical `/load`, mirroring the shipped `/mutate`↔`/change` and `/query`↔`/read` pattern. - Extract `server_ingest`'s body into a shared `run_ingest` (branch-exists / fork-if-`from`, Cedar auth, admission, `load_as`, `IngestOutput` mapping). - `server_load` (canonical) → `run_ingest`, `Json<IngestOutput>`. - `server_ingest` (deprecated) → `run_ingest` + `#[deprecated]` + RFC 9745/8288 `Deprecation: true` / `Link: </load>; rel="successor-version"` headers. - Router mounts `/load` (same 32 MB body limit) beside `/ingest`; OpenAPI `paths(...)` gains `server_load` and flags `server_ingest` deprecated. `/load` reuses `IngestRequest`/`IngestOutput`, exactly as canonical `/mutate` reuses `Change` — a DTO rename is a separate, larger change (out of scope). openapi.json regenerated. Tests: openapi `/load` present + not deprecated, `/ingest` deprecated, `/load` bearer-secured; data_routes `/load` happy path + `/ingest` deprecation headers. Existing `/ingest` route tests stay green (the shim is unchanged). Docs: server.md endpoint table; RFC-009 Phase 5 marked landed (incl. the hand-mount-vs-utoipa-axum registration finding). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> feat(cli): point remote load at /load (RFC-009 Phase 5) `GraphClient::load`'s remote arm now POSTs to the canonical `/load` route instead of the deprecated `/ingest`; the deprecated `ingest` verb keeps riding `/ingest`. `parity_load` exercises `/load` on the remote arm (its documented flip); the matrix exclusions comment is updated. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 03:32:16 +03:00
Andrew Altshuler	6144bb18d6	feat(cli): cluster-managed maintenance addressing + init signpost (RFC-010 Slice 3) (#221 ) * feat(cluster): cluster_root_for_graph_uri detection helper (RFC-010 Slice 3) Public helper the CLI uses to refuse `init` into a cluster-managed location: given a graph storage URI of the cluster layout (`<root>/graphs/<id>.omni`), return the cluster root if `<root>` holds `__cluster/state.json`, else None. Cheap by construction — a URI that doesn't match the `<root>/graphs/<id>.omni` shape returns None with zero I/O, so ordinary `init` targets never probe storage. Works for file:// and s3:// via the storage adapter. Adds two ClusterStore accessors (`display_root`, `has_state`). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(cli): cluster-managed maintenance addressing + init signpost (RFC-010 Slice 3) Two cluster-graph-aware CLI behaviors, sharing the cluster-resolution path. Maintenance addressing. `optimize`/`repair`/`cleanup` gain `--cluster <dir\|s3://…> --cluster-graph <id>`, which resolves the graph's storage URI from the served cluster snapshot (the same truth a `--cluster` server boots from — `read_serving_snapshot`) and opens it embedded. The operator no longer hand-types `<storage>/graphs/<id>.omni`. A distinct flag is required because the global `--graph` is `requires = server` and means a remote multi-graph id. clap enforces both-or-neither and exclusion with the positional URI / `--target`; an unserved graph errors loudly, pointing at `cluster apply`. init signpost. `init` refuses a cluster-managed positional path (the `<root>/graphs/<id>.omni` layout where `<root>` holds `__cluster/state.json`, detected by `cluster_root_for_graph_uri`) and points at `cluster apply` — graphs in an established cluster are created with ledger/recovery/approvals, not by hand. The check is gated on the path shape, so ordinary `init` does no extra I/O and existing pre-apply cluster-graph inits are unaffected. 
planes guard remediation now also mentions `--cluster … --cluster-graph …` (the two Slice-1 guard-string tests track it). Docs updated (cli-reference Command planes, maintenance.md, cluster.md §7); the stale "no S3-hosted cluster directories" limitation is dropped (RFC-006 landed it). Tests (cli_cluster.rs, reusing the apply-a-cluster fixture): resolve by id, unknown-id error, `--cluster` requires `--cluster-graph`, init refusal + signpost, and ordinary init still works. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> fix(cli): resolve cluster graphs from the state ledger, not the serving snapshot Addresses the Greptile review on #221. `read_serving_snapshot` does all-or-nothing serving validation — recovery-sidecar checks plus a digest verify of every catalog payload (query .gq, policy blobs). Using it to resolve a maintenance target coupled `optimize`/`repair`/`cleanup` to the readiness of unrelated resources: a single corrupt policy blob, or a pending recovery sweep, would block the command before it could touch the graph — worst for `repair`, the tool you reach for when the cluster is degraded*. Add `omnigraph_cluster::resolve_graph_storage_uri(cluster, graph_id)`: read the state ledger, confirm the graph is in the applied revision, return `graph_root(id)` — the URI is deterministically derivable, no catalog validation. The CLI's cluster resolver now calls it. Test: `optimize --cluster … --cluster-graph …` still resolves after the catalog payloads (`__cluster/resources/`) are removed — the ledger-only path is not blocked by degraded/unrelated catalog state. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 02:52:21 +03:00
Andrew Altshuler	d6cf5b298c	feat(cli): plane-grouped --help + clap 4.6.1 (RFC-010 Slice 2) (#220 ) * chore(deps): bump clap to 4.6.1 Workspace constraint "4" → "4.6" so the resolver picks up the 4.6 line (a plain `cargo update` stayed on 4.5.x). clap 4.5.58 → 4.6.1 (clap_builder 4.6.0, clap_derive 4.6.1). Minor bump, no API breakage; the workspace builds and all CLI suites pass unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(cli): group --help by plane (RFC-010 Slice 2) Slice 1 declared the planes (the command_plane table + the wrong-plane guard); this makes them visible in `--help`. clap can't print labeled heading rows between subcommand groups (verified against the source — help_heading is args-only, {subcommands} is one flat block), so per the chosen approach: cluster + legend. - Reorder the `Command` enum into plane bands (clap lists subcommands in declaration order): data (query, mutate, load, branch, snapshot, export, commit, schema, graphs) → storage/local-graph ops (init, optimize, repair, cleanup, lint, queries) → control (cluster) → session (policy, embed, login, logout, config, version). No magic display_order numbers — the source order IS the help order, with band comments for readers. The band placement matches `command_plane` (lint/queries are storage-plane: they reject --server), so the help grouping and the guard agree. - Add an `after_help` legend on `Cli` naming the planes. Written to describe the planes (not enumerate every command) so it doesn't drift. Help-polish (post-review): hide the deprecated `ingest` from the list (still a valid command); trim the long `login` and `--as` descriptions to one line each so the columns don't blow up. The behavioral source of truth for planes stays `planes::command_plane`; this ordering is its cosmetic counterpart. Test: `help_groups_commands_by_plane` pins the legend phrase + the cluster ordering (query < optimize < cluster). Doc: a line under cli-reference's Command planes section. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(cli): qualify mixed-plane commands in the --help legend Addresses the Greptile P2 on #220: the legend placed `schema` entirely in Data and `queries` entirely in Storage, but per `command_plane` the subcommands differ — `schema plan` is storage-plane (rejects --server) and `queries list` is session (no graph). A user reading the legend then running `schema plan --server` would hit a rejection contradicting it. The Commands list is one entry per top-level command (necessarily coarse), so the legend carries the nuance: `schema [plan: storage]` and `queries [list: session]`. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 01:49:40 +03:00
Andrew Altshuler	4187d56f8a	fix(cli): align lint plane label + document the plane model (RFC-010 follow-up) (#218 ) Addresses the Greptile review on #217: P1 — `lint` reported two different names. `command_label` returns `lint`, but `execute_query_lint` passed `"query lint"` as the resolver operation string, so `lint --server` said `lint` while `lint <https>` said `query lint`. Both were pinned by tests. `query lint` is the deprecated alias (argv-rewritten to `lint`), so the canonical name is `lint`: switch both user-facing strings in `execute_query_lint` (the storage-plane bail label and the requires-schema-or-target usage message) to `lint`, and update the two pinned assertions in `cli_data.rs`. P2 — user-doc debt (AGENTS.md rule 1: error text is observable behavior). Document the plane model in `cli-reference.md` (new Command planes section: data vs storage/maintenance vs control, which addressing flags apply, and the declared wrong-plane / remote-target errors), and add an addressing note to `maintenance.md` cross-referencing it. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 22:58:51 +03:00
Andrew Altshuler	106356ab25	feat(cli): RFC-010 Slice 1 — declared plane capability surface + honest addressing (#217 ) * feat(cli): declared plane capability surface + wrong-plane guard (RFC-010 Slice 1) New `planes.rs` is the single source of truth for which plane each subcommand belongs to (Data / Storage / Control / Session). `command_plane` is an exhaustive match — adding a `Command` variant is a compile error until its plane is declared, so the surface cannot silently drift from the command set. It descends into the nested enums where the plane differs per subcommand (`schema plan` is storage while `schema show/apply` are data; `queries validate` opens the graph while `queries list` reads only config). `guard_addressing` runs once in `main` before dispatch: the data-plane addressing flags `--server`/`--graph` on any non-data verb now fail with one declared, pinned error instead of being silently ignored (`optimize --server prod` previously dropped `--server`). `init`'s message drops the `--target` half since it takes only a positional URI today. Test: `cli_schema_config::schema_plan_with_server_flag_errors_wrong_plane` pins the per-subcommand label, proving the guard descends into the nested enum. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(cli): storage-plane verbs fail loudly on a remote target (RFC-010 Slice 1) `optimize`/`repair`/`cleanup` switch from `resolve_uri` to `resolve_local_uri`, so a `--target` (or positional URI) that resolves to a remote server now fails with a declared storage-plane message instead of whatever `Omnigraph::open` said about an `http(s)://` URI. The `resolve_local_graph` bail is reworded to that storage-plane message, so every storage verb already on the local resolver (`schema plan`, `queries validate`, `lint`) speaks with one voice. 
Net: `optimize --target knowledge` resolves to the graph's storage URI and runs embedded; `optimize --target prod` (remote) fails loudly; `optimize --server` is caught earlier by the guard. Positional-URI invocations are unchanged. Tests (pinned strings, per RFC-010's test plan): optimize happy path on a local graph, `optimize --server` wrong-plane error, `optimize <https>` storage-plane error; the existing `query_lint_rejects_http_targets_without_schema` assertion is updated to the new shared message. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 22:45:58 +03:00
Andrew Altshuler	45500a690a	refactor(cli): collapse export + graphs-list onto GraphClient (RFC-009 Phase 3c) (#213) The last two embedded-vs-remote forks move onto the enum, so every such `if` in the CLI now lives in client.rs — the point of the refactor. - `export<W: Write>`: the streaming verb 3b deferred (writes to a writer, chunks the HTTP response body, rather than returning a DTO). Embedded calls db.export_jsonl_to_writer; Remote streams the chunked body through. Opens WITHOUT policy (like reads), so it routes via resolve(). - `list_graphs`: remote-only by design (no local enumeration endpoint), so the Embedded arm keeps the loud "requires a remote multi-graph server" bail verbatim. Routing it through the enum still buys the shared resolve() addressing/token preamble the arm hand-rolled. Retire the now-orphaned execute_export_to_writer / execute_export_remote_to_writer pair, and sweep two pre-existing dead fns while in the files: inferred_config_path (helpers.rs) and yaml_string (output.rs, shadowed by test-local copies). parity_matrix gains one row, parity_export — the single intended matrix change in this phase. Export is a JSONL stream, not a single --json doc, so it compares the two arms' output line-wise (sorted; twin graphs are byte-copies so rows need no scrubbing). graphs-list gets no row: its remote-only behavior is a documented exclusion, not an equality case. Full workspace tests pass; all 12 parity rows green. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 21:03:45 +03:00
Andrew Altshuler	d32c1ac191	refactor(cli): collapse write/query forks onto GraphClient (RFC-009 Phase 3b) (#211) Phase 3a put the GraphClient enum in place and collapsed the five uniform read forks. 3b folds the remaining data-plane forks onto the same enum: load, ingest, mutate, query, branch create/delete/merge, and schema apply. The wrinkle 3a deferred was the local policy attachment. Reads and query open the local engine without a policy; writes open through open_local_db_with_policy and attribute a resolved actor. So the Embedded variant grows an optional policy context (graph/actor) filled by a second factory, resolve_with_policy; resolve() leaves it empty. open_embedded picks the open path from whether the context is present, preserving both of today's behaviors exactly. query still uses resolve() (no policy), as the read path did. apply_schema takes the catalog-validator closure as impl FnOnce(&Catalog) — the embedded arm runs it inside apply_schema_as_with_catalog_check, the remote arm ignores it (the server runs its own check). That non-object-safe closure is why GraphClient is an enum, not a trait. The stored-query registry is still built caller-side and only for the local path. load and ingest stay separate methods: same operation, but load surfaces the CLI LoadOutput (two distinct per-arm mappings preserved) while ingest surfaces the wire IngestOutput. The now-fully-dead execute_read/execute_read_remote and execute_change/execute_change_remote pairs are retired (legacy_change_request_body stays — client.rs uses it); the export pair remains for 3c. The Phase-1 parity matrix is unchanged and green; full workspace tests pass. Co-authored-by: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 19:25:57 +03:00
aaltshuler	25d74d689d	refactor(cli): GraphClient enum + read verbs (RFC-009 Phase 3a) The embedded-vs-remote split gets one home: a GraphClient enum (Embedded { uri } | Remote { http, base_url, token }) with a resolve() factory that absorbs the shared preamble (apply_server_flag -> token -> URI/remoteness) and a verb method per command. The five uniform read forks — branch list, commit list, commit show, schema show, snapshot — collapse from per-command if-graph-is-remote else to one line each (main.rs: -113/+47). Behavior identical per verb (local reads still open WITHOUT policy, as today); the Phase-1 parity matrix is the referee and passes textually unchanged. Enum, not the RFC trait: only two variants ever, and inherent async methods avoid async_trait boxing and the apply_schema closure that is not object-safe (3b) — same one-body-two-impls collapse, less ceremony. Scope: the uniform reads only. The query verb (policy-open + operator-alias early-return + param merge) joins the write verbs in 3b; export/streaming and graphs-list in 3c, where the now-shared execute_*_remote/execute_* pairs get retired. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 17:44:49 +03:00
aaltshuler	adbb2a181c	refactor(cli): consume omnigraph-api-types directly; unify the load mapping The CLI's wire-DTO imports repoint from omnigraph_server::api to omnigraph-api-types (the server's other exports — queries registry, config types — still come from omnigraph-server). The local Load arm's inline LoadOutput hand-construction in main.rs is extracted into load_output_from_result next to load_output_from_tables in output.rs, so both '-> LoadOutput' mappings (engine LoadResult for local, wire IngestOutput for remote) live in one place. Deviation from the plan, with reason: LoadOutput stays CLI-side rather than moving into the wire-DTO crate — it is a rendered CLI output type, not an HTTP wire DTO, and its mapping consumes a CLI clap type (CliLoadMode). The shared crate stays strictly wire DTOs. Shapes unchanged: the parity matrix passes textually unchanged. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 17:05:32 +03:00
aaltshuler	5328c91341	refactor(cli): drop cluster init — no replacement scaffold Andrew's call, and the right one by the repo's own lens: a minimal cluster.yaml is five lines; a generator is a second copy of the schema to keep in sync forever, emitting a file that is unusable until hand-edited anyway (graphs: {} cannot apply or serve). Terraform has no config scaffolder either. New users copy from the cluster quick-start; migrants get a ready-to-review cluster.yaml from config migrate. RFC-008 stage 3 becomes purely subtractive. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 23:45:18 +03:00
aaltshuler	5ba9656666	feat(cli): init stops scaffolding omnigraph.yaml; cluster init replaces it (RFC-008 stage 3) omnigraph init no longer writes a legacy config into cwd (the source of the earlier test-pollution bug, and a scaffold for a deprecated file); the scaffolder is deleted. omnigraph cluster init scaffolds the replacement: a minimal valid cluster.yaml (version: 1, optional metadata.name / storage:, a commented graphs example), refusing to overwrite. The scaffold validates clean via cluster validate in the e2e. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 23:34:04 +03:00
aaltshuler	cd1f175396	feat(cli): omnigraph config migrate — the RFC-008 split (stage 2) Reads a legacy omnigraph.yaml and produces the three-section split: team half as a ready-to-review cluster.yaml proposal (graphs with TODO schema pointers — the legacy file never knew schemas — per-graph queries directories, policies with applies_to bindings), personal half as an operator-config merge (actor, output/table defaults — OperatorDefaults gains the two table keys with their cascade hops — remote graphs with bearer_token_env become servers entries plus a printed login step, and legacy aliases split per the RFC: content to the catalog as a manual step, binding to an operator alias), plus a dropped-keys section with reasons. Touches nothing without --write; with it, the operator merge is key-level (existing entries always win; prior file backed up), and cluster.yaml is emitted only when absent (else cluster.yaml.proposed). --json emits the report structurally. The completeness contract is a unit test: every top-level key of the legacy schema must classify somewhere, or the RFC-008 map has a bug. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 23:32:05 +03:00
aaltshuler	20ddfc61c1	fix(cli): reclaim the hidden legacy-uri positional for operator aliases Caught on the live smoke: with --alias, the first bare CLI arg lands in the hidden legacy_uri positional, so an operator alias's positional param never bound ('parameter not provided' from the server). An operator alias always knows its target, so the existing normalize_legacy_alias_uri reclaims the swallowed positional as the first alias arg — same rule the legacy path already applies. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 22:29:57 +03:00
aaltshuler	dc91c55970	feat(cli): operator aliases — pure bindings invoking stored queries (RFC-007 PR 3, part 2) aliases: in the operator config bind a personal name to (server, graph, stored-query NAME, positional arg mapping, fixed param defaults, format) — zero content, per the ratified bindings-not-content model. Invocation goes through the server's stored-query endpoint (POST {base}/graphs/{g}/queries/{name}) with the keyed credential resolving via the ordinary URL match; param precedence --params > positionals > fixed defaults; the result renders through the existing format cascade with the alias's format as its hop. A legacy omnigraph.yaml alias with the same name wins during the RFC-008 window, with a warning naming both. E2e (spawned policy-gated server, invoke_query granted via a per-graph bundle): the alias invokes with name + one positional and nothing else — server, graph, query, and token all from the operator layer; --server/--graph explicit targeting; unknown --server lists defined names; --server exclusive with a positional URI. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 22:25:42 +03:00
aaltshuler	2b33ab64f2	feat(cli): --server <name> targeting (RFC-007 PR 3, part 1) Global flags --server (operator-defined server name) and --graph (graph id on a multi-graph server, requires --server) resolve to the effective remote URI through one helper and feed the ordinary uri slot — graph resolution and the PR-2 keyed-token URL match work unchanged; the flag is sugar for a URI the operator already owns. Exclusive with a positional URI and --target (loud error, never silent precedence). Unknown names fail listing the servers that ARE defined. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 22:19:25 +03:00
aaltshuler	a819ab500e	feat(cli): keyed credentials — servers:, the token chain, login/logout (RFC-007 PR 2) The operator config gains servers: (name -> url; never a token). A remote command whose URL prefix-matches an operator server resolves its bearer token through the keyed chain first — OMNIGRAPH_TOKEN_<NAME> env, then the [<name>] section of ~/.omnigraph/credentials (created 0600 via temp+rename, #139 finding 7; group/world-readable files refused loudly) — falling through to the legacy chain unchanged. URL keying makes §D5 rule 3 structural: a token is only ever sent to the server it is keyed to. Longest-prefix matching with a path-boundary check (http://h:8080 never matches http://h:8080-evil). Inserting the keyed hop above the legacy chain is safe by construction — no existing setup can have servers: defined. omnigraph login <name> stores/rotates one section (token from --token or one stdin line — the pipe flow keeps secrets out of shell history); omnigraph logout removes it, idempotently; logging in before declaring the server warns instead of failing (the gh model). Coverage: URL-match/no-substring-trap, credentials round-trip preserving sibling sections, 0600 write + over-permissive refusal, env-name mapping; the legacy resolve test is now hermetic against a real ~/.omnigraph and asserts byte-identical legacy behavior with no servers defined; one spawned-binary e2e walks the whole lifecycle against an authed server: refusal -> wrong-token login (stdin) -> rotate (--token) -> authorized read -> env-beats-file -> non-matching-URL negative -> logout revokes. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 21:24:51 +03:00
aaltshuler	be4bd46212	feat(cli): the operator config surface — identity and output defaults (RFC-007 PR 1) ~/.omnigraph/config.yaml joins the resolution chains as the operator surface: operator.actor becomes the last hop of THE actor chain (--as > legacy cli.actor during the RFC-008 window > operator.actor > none, one implementation for direct-engine and cluster commands alike) and defaults.output joins the read-format cascade below every more-specific source. Discovery honors $OMNIGRAPH_HOME (tilde-expanded, #139 finding 9); an absent file is an empty layer; unknown keys WARN and load (a file written for later slices must not break this CLI); malformed YAML is a loud error. The module is CLI-only — the server never reads operator config (invariant 11 by construction). $OMNIGRAPH_CONFIG becomes a first-class stand-in for --config in load_config (flag > env > ./omnigraph.yaml), one meaning in both binaries. The test harness pins hermeticity: spawned binaries get a nonexistent OMNIGRAPH_HOME by default so no test ever reads the developer's real operator config. New coverage: loader unit tests, the env-precedence matrix on load_config_in, and spawned-binary e2es for the actor chain (operator wins with no flag/legacy key; legacy outranks it; --as wins) and the format cascade. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 20:29:02 +03:00
aaltshuler	916015c416	refactor(cli): split main.rs into cli/helpers/output modules Verbatim moves: the clap surface (every command/subcommand/arg struct) to cli.rs, resolution helpers (config/actor/graph/branch/query, remote HTTP, env/token, scaffolding) to helpers.rs, human/JSON formatting to output.rs, the in-source test mod to main_tests.rs via #[path]. main.rs (1,184 lines) keeps main() and the dispatch match. Visibility bumps only; 22 binary tests green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 15:14:27 +03:00
aaltshuler	fd002abaa5	feat(cluster): port the storage backend to the engine StorageAdapter LocalStateBackend becomes ClusterStore: every stored byte — state ledger, lock, recovery sidecars, approval artifacts — now flows through the engine's StorageAdapter, making file:// and s3:// one code path. Behavior on the file backend is byte-compatible (layout, CAS semantics, diagnostics, lock release timing) and the entire pre-existing suite passes unchanged. Mechanics: the ledger CAS keeps its public sha256 vocabulary while the physical swap is token-conditioned (ETag If-Match on S3 via PR #186's primitives; content-token + temp/rename locally — the pre-port semantics); the lock is a create-only put (genuinely cross-machine on object stores) with deterministic drop-release locally and best-effort spawned release on S3; sidecars/approvals address by URI (SweepOutcome and the executors carry strings); sweep row-1 retirement joins the uniform deferred post-CAS cleanup. ClusterStore also gains the catalog-payload and graph-root methods that commit 2 wires in. Async ripple: status/force-unlock/serving-snapshot and the server's settings loader chain go async (CLI dispatch and ~20 test hosts follow, mechanically). tokio joins the cluster crate's runtime deps for the lock guard's handle. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 14:11:14 +03:00
aaltshuler	fa6af775c1	feat(cli)!: unified load command; deprecate ingest as an alias omnigraph load is now the single data-write command: - works against remote graphs (POSTs the server's /ingest endpoint with the same bearer/actor resolution as other remote commands) — previously load was the only data command forced to open Lance storage directly - --from <base> opts into fork-if-missing for --branch (the former ingest semantics); without --from a missing branch is an error, never a fork - --mode is now required: overwrite is destructive, so there is no implicit default (the old silent default was overwrite) - output gains base_branch/branch_created (and table sums on remote loads) omnigraph ingest stays as a deprecated alias (defaults preserved: --from main --mode merge) that prints a one-line warning to stderr, matching the read/change deprecation convention; removal in a later release. Docs updated in the same change: cli.md, cli-reference.md, policy.md, audit.md, execution.md (unified load section), AGENTS.md quick-flow, README.md. BREAKING CHANGE: scripts running omnigraph load without --mode must now pass it explicitly (previously defaulted to the destructive overwrite). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 04:18:00 +03:00
aaltshuler	90676ef52f	feat(server)!: POST /ingest forks only when 'from' is present Branch creation becomes opt-in by presence of the request's 'from' field. Previously the handler defaulted from to 'main' and always auto-created a missing branch — a typo'd branch name silently forked main and landed the data there, with the client none the wiser. Now a request without 'from' against a missing branch returns 404 branch-not-found and creates nothing; with 'from' set, fork-if-missing behaves as before. The BranchCreate authority is only consulted when a fork will actually happen. The handler calls the unified load_as directly (the deprecated ingest_as shim is no longer used in the server). IngestOutput.base_branch becomes nullable: it echoes the request's 'from' and is null when absent. OpenAPI regenerated; the CLI's local ingest arm moves to load_file_as + the new converter shape. BREAKING CHANGE: clients that relied on implicit fork-from-main with 'from' omitted must now pass from='main' explicitly. IngestOutput.base_branch is now nullable. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 04:05:29 +03:00
aaltshuler	e676c151bb	feat(engine): unify load/ingest — load_as gains an optional fork base load_as/load_file_as gain a base: Option<&str> parameter: with Some(base) a missing target branch is forked from base first (the former ingest semantics); with None the target branch must exist — staging fails on an unknown branch, so a typo'd name can never create one. LoadResult gains branch/base_branch/branch_created metadata (additive). The ingest family (ingest, ingest_as, ingest_file, ingest_file_as) becomes #[deprecated] shims over load_as that preserve the historical contract exactly (from: None still means fork from main; base recorded even when no fork happened). IngestResult and to_ingest_tables stay for the shims and the server until the removal release. The layered policy check is unchanged: Change on the target branch always, BranchCreate additionally when a fork actually happens (enforced inside branch_create_from_as with the actor threaded through). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 03:53:22 +03:00
aaltshuler	3b2bf755ae	fix(cli): address review — honor the one-thing contract, restore docs, untangle test phases - resolve_cluster_actor uses load_config directly: load_cli_config also loads auth.env_file into the process env — a second thing, violating the documented 'exactly one thing' omnigraph.yaml contract for cluster ops. - resolve_cli_actor gets its doc comment back (the inserted helper had absorbed the contiguous /// block). - The actor-default test imports once as setup and asserts on apply alone, idempotently, instead of re-importing inside the assertion helper. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 22:54:05 +03:00
aaltshuler	f3374ac6dc	feat(cli): resolve cluster actor via the per-operator config cascade Cluster FACTS stay unlayered (cluster.yaml only), but the operator's identity is a per-operator fact — exactly the per-operator omnigraph.yaml's permanent job, and the cascade every data-plane write already uses. cluster apply/approve now resolve: --as flag wins and skips any config read entirely (containers and CI stay config-free); without it, the standard cwd search supplies cli.actor, with a malformed config failing loudly and actionably ('pass --as to skip this lookup') rather than silently dropping attribution. approve's no-actor error now names both sources. Tests pin the contract from both sides: cli.actor is the no-flag default for apply (echoed actor) and approve (approved_by), the flag overrides it, a malformed omnigraph.yaml in cwd breaks nothing except the no-flag actor lookup, and a conflicting well-formed one leaks nothing into cluster outputs. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 22:29:49 +03:00
aaltshuler	711865e6f1	docs(cluster,server): the Phase 5 mode switch; retire applied-not-serving caveats The standing caveat ('applied means recorded in the cluster catalog — nothing more; the server still boots from omnigraph.yaml') retires: cluster docs gain the 'Serving from the cluster' section (exclusivity, applied-revision serving, fail-fast readiness, restart-to-pick-up, expose-all bridge), server.md gains mode-inference rule 0 and the cluster-booted multi mode, deployment.md the boot-source choice, and the CLI's apply note plus the cli-reference cluster row (stale back to Stage 3A) now describe the full convergence surface. RFC-005 flips to Landed with four implementation deviations recorded. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 17:56:54 +03:00
aaltshuler	0b84b1adc3	feat(cluster): record policy applies_to bindings in the applied revision Slice 5A of RFC-005: the state ledger becomes serving-sufficient for the Phase-5 server boot. StateResource gains an optional applies_to (normalized typed refs: cluster | graph.<id>), written by apply for every applied policy create/update from the desired config's validated bindings. The hole this closes: applies_to is not part of the policy file digest, so a binding-only edit previously produced NO plan change at all (a 4C e2e even asserted that — the gap, not a contract). Binding changes are now first-class: a post-diff pass emits an Update with equal before/after digests and a binding_change marker (visible in plan/apply JSON and human output as [bindings]), classification/execution treat it as an ordinary catalog-tier applied change (payload skips naturally — the blob is unchanged), and convergence requires zero binding divergence, so stale bindings can never report converged. Pre-5A ledger entries (no bindings recorded) surface as the same backfill Update; one apply heals them, exactly the remedy RFC-005's boot-error path names. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 15:30:33 +03:00
aaltshuler	f4e9105272	feat(cluster): cluster approve — digest-bound approval artifacts RFC-004 §D4, gate half: graph deletes (and their subtree) now classify Blocked/approval_required instead of Deferred; the new cluster approve command (requires the global --as actor) writes __cluster/approvals/{ulid}.json bound to the desired config digest and the change's before/after digests, so config or state drift invalidates the artifact automatically (approval_stale warning, never authorizes). One gate per subtree: compute_approvals lists only the graph-level delete, and ApprovalRequirement gains a satisfied flag surfaced by plan. Consumption and the delete executor land next — until then approved deletes stay blocked so a gate-only build can never strip state without removing the root. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 14:30:05 +03:00
aaltshuler	a1ba4dc413	feat(cluster): execute schema applies in cluster apply Stage 4B (RFC-004 §D1/§D5): schema.<id> Update changes classify Applied and execute after graph creates, sequentially and sidecar-fenced — read-write open (the engine's own recovery runs first), pre-op manifest pin recorded, apply_schema_as with allow_data_loss: false (soft drops only; hard drops wait for 4C's approval artifacts), post-op pin rewritten into the sidecar, sidecar retired only after the final state CAS. Queries gated on a same-plan schema update unblock (the migration lands first in the same run); failures — unsupported migrations, lock contention, user branches — surface as schema_apply_failed with the engine's message, demote dependents via the origin-aware demotion helper, and stop further graph-moving work. Schema evolution is now fully cluster-driven (the defer -> manual schema apply -> refresh loop is gone), and out-of-band schema drift is converged back by apply as an ordinary soft migration (axiom 8: drift correction is gated like any change; the recoverable tier needs no approval) — both pinned by reworked e2es. The multi-graph mixed e2e's deferred row is now delete-shaped, pre-staging the 4C surface. Actor: cluster apply accepts the CLI's global --as via the new ApplyOptions / apply_config_dir_with_options (apply_config_dir delegates unchanged); the actor is echoed in ApplyOutput and recorded in sidecars and audit entries, and threads to apply_schema_as so Cedar fires wherever a checker is installed. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 13:12:15 +03:00
aaltshuler	ca63a9340b	feat(cluster): embed schema migration previews in cluster plan RFC-004 §D7's data-aware preview: for every schema update, plan opens the live graph read-only and embeds the engine's migration plan (supported flag + typed steps) in the change record; the human renderer prints the steps. Preview failures (unreachable graph, planner error) degrade to the digest diff with a schema_preview_unavailable warning — planning never blocks. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 13:04:19 +03:00
aaltshuler	b313075476	refactor(cluster): make plan_config_dir async Mechanical conversion ahead of Stage 4B (plan will preview schema migrations against live graphs): signature, CLI dispatch, and test callers. Zero behavior change. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 13:02:12 +03:00
aaltshuler	6fbf09d5c9	refactor(cluster): make apply_config_dir async Mechanical conversion ahead of Stage 4A graph create (which calls the async Omnigraph::init from inside apply): the fn signature, the CLI dispatch arm, and every test caller (#[test] -> #[tokio::test]). Zero behavior change; all 60 lib tests and 3 failpoint tests green before and after. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 04:43:38 +03:00
aaltshuler	5e1dede08f	fix(cluster,cli): apply failure output — persisted statuses only, changes list printed Two review findings (greptile, PR #165): - ApplyOutput.resource_statuses on a failed state write now carries the pre-apply on-disk snapshot instead of the in-memory mutations that were never persisted, so automation reading the field independently of `ok` cannot see phantom applied/blocked statuses. Regression test forces the state write to fail via a read-only __cluster dir (unix-only, skips when permissions are not enforced). - Human-mode `cluster apply` prints the classified changes list on failure too, so an operator debugging a partial apply without --json sees what was attempted. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 00:35:03 +03:00
aaltshuler	bcef8444dd	feat(cli): omnigraph cluster apply Terraform-style: apply executes directly (cluster plan is the preview, now annotated with apply dispositions). Human output prints per-change dispositions, convergence, and the catalog-only caveat; --json emits the full ApplyOutput. Exit is non-zero only on errors — deferred/blocked changes are warnings with converged: false as the automation signal. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-09 23:34:48 +03:00
aaltshuler	89b876c797	Add cluster state lock recovery	2026-06-09 22:31:46 +03:00
aaltshuler	d00d42274e	Implement cluster refresh and import	2026-06-09 21:17:23 +03:00
aaltshuler	b046515e1c	Merge origin/main into cluster-config-docs	2026-06-09 18:11:12 +03:00
Ragnor Comerford	d0e39e677e	fix(maintenance): route uncovered drift through repair (#156) * docs(invariants): note the non-atomic manifest->commit-graph publish gap Every graph publish commits __manifest then appends _graph_commits as two separate writes; a crash between them leaves the manifest ahead of the commit DAG. Live reads + durability are unaffected (reads resolve via the manifest) and recovery does not repair it; impact is bounded to commit history / time-travel by commit id / merge-base completeness. Pre-existing across all publishes, not the optimize reconcile specifically. Documented as a Known Gap; the fix is a commit-graph reconcilable from the manifest, not a recovery sidecar. * fix(maintenance): route uncovered drift through repair * fix(maintenance): harden repair review feedback	2026-06-09 14:42:54 +02:00
aaltshuler	a7956ea5a9	Add cluster JSON state ledger status	2026-06-08 21:09:23 +03:00
aaltshuler	043b02e617	feat(cluster): add read-only validate and plan	2026-06-08 20:07:39 +03:00
Ragnor Comerford	d54bccb940	fix(optimize): skip blob-bearing tables to avoid Lance compaction crash (#138) * test(optimize): pin Lance blob-column compaction failure as a surface guard Lance compact_files mis-decodes blob-v2 columns under its forced BlobHandling::AllBinary read ("more fields in the schema than provided column indices"), failing even a pristine uniform-V2_2 multi-fragment blob table; reads use descriptor handling and are unaffected. Guard 10 reproduces this and is self-retiring: it turns red on the Lance bump that fixes the bug, forcing LANCE_SUPPORTS_BLOB_COMPACTION to flip. * fix(optimize): skip blob-bearing tables instead of crashing compaction omnigraph optimize aborted the whole sweep when any node/edge table had a Blob property: Lance compact_files cannot decode blob-v2 columns under AllBinary (the column-index error pinned by the surface guard). Skip blob-bearing tables behind a LANCE_SUPPORTS_BLOB_COMPACTION gate and report them via TableOptimizeStats.skipped / SkipReason (surfaced in the CLI and a tracing::warn) instead of erroring, which also isolates the failure so the other tables still compact.
Reads/writes are unaffected; only fragment/space reclamation on blob tables is deferred until the upstream Lance fix. Adds a maintenance.rs regression test (validated red with the column-index symptom before the fix, green after), a concise v0.6.1 release note, and updates docs (maintenance, cli-reference, AGENTS capability matrix, invariants Known Gaps, lance.md audit, constants). * refactor(optimize): make TableOptimizeStats and SkipReason non_exhaustive Both are returned result types, never built by callers, so #[non_exhaustive] makes this the last field/variant addition that can break downstream literal construction and keeps future ones non-breaking (review feedback on the public-field addition). The v0.6.1 Compatibility Notes call out the source-level change. Also drops the now-stale "RED today / GREEN after the fix lands" narration in the optimize_skips_blob_table_and_reports_skip test (historical regression context now that the fix is in this branch), and folds in the expanded v0.6.1 release note. * chore(release): bump workspace to v0.6.1 Coherent version bump to accompany the v0.6.1 release note: all five crate manifests + path-dependency constraints, Cargo.lock, the AGENTS.md surveyed-version line, and openapi.json info.version move 0.6.0 -> 0.6.1. Matches the established release pattern (#118 landed the v0.6.0 note + bump together) and resolves the Codex/Devin review flag that a v0.6.1 note without a bump leaves CARGO_PKG_VERSION reporting 0.6.0 and mixed package versions.	2026-06-02 17:12:00 +02:00
Ragnor Comerford	3c2b1b8051	Stored-query registry foundation + config/CLI RFC-002 (#128) * MR-969: add stored-query registry config surface Introduce the `queries:` block in omnigraph.yaml — an inline `name -> entry` map of stored queries, per-graph (`graphs.<id>.queries`) and top-level for single-graph mode, mirroring how `policy` is wired in both modes. Each entry points at a `.gq` file and carries optional MCP exposure settings (`expose`, `tool_name`), defaulting to not-exposed. Additive: absent `queries:` leaves current behavior unchanged. - QueryEntry { file, mcp: McpSettings { expose, tool_name } } - `queries` field on TargetConfig + OmnigraphConfig (serde default) - query_entries() / target_query_entries() accessors - resolve_query_file() — base_dir-relative `.gq` path resolution - round-trip + absent-block tests Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Add stored-query registry loader and GraphHandle wiring Add a `queries` module: QueryRegistry loads each declared `.gq` entry, parses it, and selects the query whose symbol matches the manifest key, asserting the two agree (key == `query <name>` symbol). Identity is the query name; a key/symbol mismatch is a load-time error. Errors are collected, not fail-fast, so a bad registry surfaces every broken entry at once. Schema type-checking is deliberately left to a separate pass so the loader stays callable without an open engine. Thread an `Option<Arc<QueryRegistry>>` through GraphHandle alongside the per-graph policy; the URI-canonicalizing clone propagates it. Production openers default to None for now — the boot path loads and attaches the registry in a later change. 
- QueryRegistry::{from_specs, load, lookup, iter}; StoredQuery::is_mutation - GraphHandle.queries field, propagated on canonical clone - registry unit tests: identity match/mismatch, multi-query selection, per-entry parse errors, error collection, mutation classification Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs: add RFC-002 config & CLI architecture Layered config (user-global ~/.config/omnigraph/ + per-project), a unifying `target` abstraction resolving to (locus, graph, sub-state, credential) with embedded-URI XOR remote-server loci, multi-server × multi-graph client targeting, credentials by-reference, and the file-naming decision: project and server config are one artifact (`omnigraph.yaml`); the only differently-named file is the user-global `config.yaml`, split by scope not role. Includes the 12-factor bind portability rule (prefer --bind/OMNIGRAPH_BIND over a committed server.bind) and the defined-locally / invoked-remotely model for stored queries. Derived from first principles working backwards from what the engine enables; validated against kube/Helix/git/compose. Linked from docs/dev/index.md. Proposed; phased rollout for the MR-973/974/981 family. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Add check() to validate stored queries against the live schema A pure check(registry, catalog) that type-checks every stored query via the same typecheck_query_decl the engine runs for inline queries — no parallel implementation. Failures are collected, not fail-fast, so an operator sees every broken query (e.g. a type/property a migration renamed or removed) in one pass. Breakages are fatal (the boot path will refuse to start); warnings are advisory. Pure over (registry, catalog) so it is callable both at boot (engine catalog) and offline from the CLI without an open engine. 
Advisory lint: an mcp.expose:true query that declares a Vector(N) parameter warns — an LLM cannot supply a raw embedding vector; such a query should take a String parameter and embed server-side. Warns rather than rejects, since service-to-service callers may pass vectors. - CheckReport { breakages, warnings }; has_breakages / is_clean - tests: valid query, unknown type, unknown property, collect-not-fail-fast, vector-param-exposed warns, unexposed silent Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Drop internal plan-label refs from stored-query config comments Doc comments referenced sequencing labels ("C2") that mean nothing to a reader; reword to describe the behavior directly. Comment-only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs: reconcile aliases with the role model in RFC-002 Place the existing client-only `aliases:` block in the client/server role split: aliases are client-role (CLI, embedded, ungated) and may live in both user-global and project config; `queries:` is server-role (deployment manifest only). They overlap as "name -> .gq"; `queries:` is the superset, and the end-state subsumes aliases (definition -> queries, target/branch/format -> client invocation context, positional args -> CLI sugar). v1 keeps aliases unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs: make RFC-002 config global-first, project-optional The global user config is the primary, self-sufficient default; the CLI works from any directory with no project file (the kubectl/aws/gh posture), a deliberate flip from today's project-anchored behavior. The project omnigraph.yaml becomes an optional repo-scoped override and the deployment manifest. Uniform schema, both layers optional; global can hold any section including a personal server's graphs/queries. Additive: project still overrides global; the flip adds a fallback layer below the project file rather than removing it. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs: justify XDG ~/.config/omnigraph over legacy ~/.omnigraph in RFC-002 Make the rationale explicit: XDG-first because OmniGraph is a client that will cache remote catalogs and keep session state alongside secrets, and XDG separates config / cache / state into distinct dirs (clear cache without touching creds; backups skip cache) whereas a single ~/.omnigraph/ mixes them. Honor ~/.omnigraph/ as a fallback for the peer-group (aws/kube/docker/helix) expectation. Add XDG_CACHE_HOME / XDG_STATE_HOME to the override precedence. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs: build RFC-002 credentials on the existing env-file mechanism OmniGraph already has credentials-by-reference: bearer_token_env names the env var, and auth.env_file is a git-ignored dotenv the CLI auto-loads (real env vars win), resolved via resolve_remote_bearer_token. The RFC's proposed credentials.yaml + token_env were redundant parallel inventions. Reconcile: reuse bearer_token_env (extend to servers.<name>) and auth.env_file (add a global ~/.config/omnigraph/.env layered under the project .env.omni); OS keychain is an additive future resolver. No new credentials.yaml. Updated summary, non-goals, background, file-naming, credentials, example, login, migration, rollout. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs: use single ~/.omnigraph dir (Helix-style), not XDG, in RFC-002 Reverse the earlier XDG-first call. The prior argument rested on a false dichotomy (single-dir => mixed config/cache/state); in fact the peer tools (aws, kube, helix) achieve separation via SUBDIRECTORIES inside one ~/.tool/ dir (~/.aws/sso/cache/, ~/.kube/cache/), getting cache hygiene AND one discoverable place. So everything goes under ~/.omnigraph/: config.yaml, credentials (dotenv, 0600), cache/, state/. Lower cognitive load, matches what DB/cloud-CLI users expect, matches Helix. 
OMNIGRAPH_HOME overrides; $XDG_CONFIG_HOME optionally honored but ~/.omnigraph/ is canonical. Updated all paths, the rationale paragraph, the file-naming table (added a cache/state row), and env precedence. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs: reconcile RFC-002 with shipped/planned CLI tickets Align with reality found in existing tickets: - Noun is graph/graphs, not target/targets (MR-603 done renamed the config key targets->graphs, flag --graph). Use graphs:/--graph; an entry is embedded (uri) XOR remote (server + remote graph name). - ~/.omnigraph/ confirmed by MR-581 (og template pull, done) which already quick-starts templates there. - Templates already exist (MR-581/MR-531) — not invented here. - The init family is already specced (init, quickstart MR-973, serve MR-970, prune MR-972, mcp install MR-974, agent-mode MR-981); this RFC only adds the user route (~/.omnigraph/config.yaml + login). - aliases: -> operations: planned (MR-839). - bearer_token_env gap tracked in MR-971. - query lint/check already exist (MR-639) — registry validator must not collide with the singular `query check`. Add a Reconciliation section; fix the canonical example to graphs:/--graph. Also: merge semantics refined (deep-merge settings, replace named entries, replace lists, config view --resolved --show-origin). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs: correct stale-ticket claims and fold init/bootstrap design into RFC-002 Verify against code, not ticket statuses (MR-581 is marked done but is stale/unbuilt): no ~/.omnigraph usage, no template/serve/quickstart/ prune/login commands exist; config still uses aliases: (no operations:). So ~/.omnigraph/ stands on peer-convention merits alone, and templates are a design question, not a foothold. 
Add §7.5: the three-tier init model (user route = login + ~/.omnigraph/config.yaml; thin project init; fat quickstart + templates) with first-principles positions (split init/login, in-place refuse-if-exists, interactive vs --auto/agent-mode, --template flag, secrets-on-scaffold gitignore rule). This RFC owns only the user route; the rest are sibling tickets (MR-973/970/972/974/981). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs: breadboard + slice Shape A in RFC-002 Add the implementation breadboard (places P1-P5, affordances N1-N14 with NEW markers, mermaid) and five vertical slices for the selected config/ CLI/init shape: V1 global layer + merge engine + config view; V2 remote graphs + HTTP-client path + credential resolution; V3 omnigraph login; V4 init-hardening + quickstart + templates (rides MR-970); V5 agent-mode (MR-981). Rollout reordered to the slice sequence; spikes X1-X4 gate their owning slice. V1-V2 close the substantive client->server gap. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Add InvokeQuery Cedar action (coarse, graph-scoped) A per-graph, branch-scoped action that gates invoking a server-side stored query by name. Coarse for now: an `invoke_query` allow rule permits any stored query on the graph; a future, additive refinement adds an optional per-query-name scope without changing rules written against the coarse action. Enforcement is at the HTTP boundary; the engine `_as` writers still enforce read/change per the query body, so a stored mutation is double-gated (invoke_query to reach the tool, change for the write). No call site yet — the invocation handler wires it in a later change (same pattern as Admin/GraphList added ahead of consumers). 
- variant + as_str/resource_kind(Graph)/FromStr/uses_branch_scope - Cedar schema: invoke_query appliesTo Graph - tests: per-graph allow/deny, branch-scope accepted Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Load and type-check stored queries at server boot, refusing breakage At startup the server now loads each graph's stored-query registry, type-checks every query against that graph's live schema, and refuses to boot if any query references a type/property the schema doesn't have (same posture as bad policy YAML) — so schema drift surfaces at the deploy boundary, not silently at invocation. Non-blocking warnings are logged. The validated registry is attached to the GraphHandle (the two production sites previously held `queries: None`). Loading (parse + key==symbol identity) happens at settings-build time where the config is in scope; the schema type-check happens after each engine opens (single mode in `open_single_with_queries`, multi mode in `open_single_graph`). `open_with_bearer_tokens_and_policy` delegates with an empty registry so its 18 test callers are unchanged; the public `new_` constructors are unchanged (only the private build path threads the registry). - ServerConfigMode::Single / GraphStartupConfig carry the loaded registry - boot tests: valid registry boots; type-broken query refuses boot + names it Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Add `omnigraph queries validate` and `queries list` CLI `queries validate` type-checks the stored-query registry against the live schema offline — it opens the selected graph, runs the same check() the server runs at boot, prints breakages/warnings (human or --json), and exits non-zero on any breakage — so an operator can catch a query broken by a schema change without restarting the server. `queries list` prints each registered query's name, MCP exposure, and typed params. 
Named `validate` (not `check`) to avoid overlap with the existing `omnigraph lint` — `query check`/`query lint` are already deprecated argv-shims to `lint`. Registry entries resolve like the server: a named graph uses its per-graph `queries:`; otherwise the top-level one. - Queries subcommand group; reuses QueryRegistry::load + check from omnigraph-server; local-only (needs the schema), mirrors lint - tests: clean registry exits 0, broken query exits non-zero + names it, list shows the query and its typed params Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Route registry selection through one shared query_entries_for The "which queries: block applies for graph X" rule existed twice — the server boot path and the CLI's registry_entries — and had already drifted: the CLI carried an unreachable unwrap_or_else fallback the server lacked. Add OmnigraphConfig::query_entries_for(graph: Option<&str>) as the single definition (named graph -> its per-graph block; otherwise top-level) and route all three sites through it: server single mode, server multi-graph loop, and the CLI. The CLI's dead fallback arm is deleted; CLI and server now resolve identically by construction. No behavior change. Extends the config round-trip test to pin the selector, including the unknown-name -> top-level fallback the deleted CLI arm covered. * Funnel registry validation through one validate_and_attach gate The check -> refuse-on-breakage -> log-warnings -> empty->None block was copy-pasted across both open paths (single mode and the multi-graph per-graph open), differing only by the graph label. A third opener could attach a registry that was never schema-checked. Extract validate_and_attach(queries, catalog, label) -> Option<Arc<..>> as the single gate both paths call, so attaching an unchecked registry is no longer expressible. The catalog handle is an owned Arc, so calling it before the multi-mode policy match (which rebinds db) is borrow-clean. No behavior change. 
Adds a direct unit test of the helper (empty / clean / breakage incl. the graph label in the message) — covering the multi-graph path's logic, which previously had no boot-refusal coverage. * Resolve param types structurally in the MCP vector lint The exposed-query advisory detected vector params with type_name.starts_with("Vector(") — a second copy of the compiler's own ScalarType::from_str_name vector parsing that could drift from it. Key the lint off PropType::from_param_type_name + ScalarType::Vector(_) instead, the one canonical resolver the type system already uses. Any future param-suppliability lint now reads the structured type rather than scanning the surface string. Behavior-preserving: the grammar forbids list-of-vector params (list_type = "[" base_type "]", and base_type excludes Vector), so the only input where the structured and string checks could differ is unparseable. Adds a guard test that an exposed String param does not false-trigger the warning. * Refuse duplicate MCP tool names across exposed stored queries The effective MCP tool name (explicit tool_name, else the query name) is a second identity namespace beside the registry key, but nothing enforced it unique — two exposed queries could claim one catalog key, and each consumer re-derived the name ad hoc. Add StoredQuery::effective_tool_name() as the one definition, and a load-time uniqueness pass in from_specs over exposed queries: a collision is a collected LoadError naming the loser and the winner. Scoped to exposed queries (unexposed have no MCP tool); deterministic over the BTreeMap so the first-declared wins and the error order is stable. New (rare) refusal: a config with colliding exposed tool names now fails `omnigraph queries validate` offline and refuses server boot, the same posture as a malformed registry. Release-note-worthy. 
Test-first: duplicate_exposed_tool_name_is_a_load_error (red before the pass, green after) + a CLI offline test; the unexposed sibling pins the exposed-only scope; effective_tool_name asserts folded into the load test. * docs: document the queries registry, CLI, and invoke_query action The stored-query surface shipped without user docs. Add it, per the same-PR maintenance contract: - policy.md: invoke_query as per-graph action #10 (branch-scoped), with the double-gating note; renumber graph_list; add it to the branch_scope list. - cli-reference.md: the `queries validate | list` command, and the `queries:` config block (per-graph + top-level) with mcp.expose/tool_name and the tool-name uniqueness rule. - server.md: boot-time stored-query type-check (refuse on breakage), noting invocation over HTTP/MCP is not yet exposed. * Add POST /queries/{name} stored-query invocation handler Invoke a curated server-side stored query by name: source + name come from the per-graph queries: registry, the client sends only runtime inputs (params, branch, snapshot). Gated by the invoke_query Cedar action at the boundary; the handler delegates to the existing run_query/run_mutate, whose inner Read/Change enforce still runs — so a stored mutation is double-gated (invoke_query to reach the tool, change for the write). - InvokeStoredQueryRequest + an untagged InvokeStoredQueryResponse { Read(ReadOutput), Change(ChangeOutput) } → one Json<_> return type and a oneOf 200 schema (a correct contract, not a wrong-but-simple one). - Route lives in per_graph_protected → single-mode /queries/{name} and multi-mode /graphs/{id}/queries/{name} for free. - Deny == unknown: an invoke_query denial and a missing query both return the same 404, so the catalog can't be probed by an unauthorized caller. - OpenAPI regenerated; tests cover read, mutation double-gate (403 vs 200), bad-param 400, and the identical-404 deny path. 
Completes the MR-969 V1 invocation slice (registry + /queries/{name} + invoke_query). * docs: stored-query invocation endpoint; flip the not-yet-exposed caveat Now that POST /queries/{name} ships (C7), document it: add the endpoint to server.md's inventory + an invocation section (body, untagged read/mutate envelope, invoke_query gate, double-gated mutations, deny == 404), and flip the startup note that said invocation was not yet exposed. In policy.md, replace "no invocation call site yet" on the invoke_query action with a pointer to the endpoint. * Scope the stored-query 404-hiding claim to non-invoke_query callers Review found the deny==404 catalog-hiding was overstated as a contract: it holds only at the outer invoke_query gate. A caller that HOLDS invoke_query but lacks read/change gets the inner gate's 403 for an existing query vs 404 for an unknown one — so existence is visible to grant-holders by design (the intended double-gate). The handler docstring, OpenAPI 404 description, and server.md all claimed the 404 was airtight against any denied actor. Correct the wording in all three (no behavior change) and add the missing symmetric test (invoke_query but no read -> 403 for an existing query, 404 for unknown) so the actual contract is pinned. Also document that in default-deny mode (tokens, no policy) every invocation 404s until an invoke_query rule is configured. Nits: the from_specs collision comment said "first declared wins" but it is lexicographically-first by name (BTreeMap); the effective_tool_name docstring overclaimed the CLI display routes through it (it resolves the rule on its own output DTO). * Default mcp.expose to true (the manifest entry is the opt-in) expose controls MCP-catalog membership only — it is not an authorization gate (invocation is gated by invoke_query regardless). So requiring a per-query mcp.expose: true was friction with no safety benefit: a non-exposed query is still HTTP-invocable by name. 
Flip the default so declaring a query in the manifest exposes it to the agent tool catalog by default; expose: false is the escape hatch for service-only queries. Both the absent-mcp path (Default impl) and the present-but-no-expose path (serde default fn) now yield true. Doc comments + cli-reference updated; the config round-trip test asserts the new default. * Add GET /queries stored-query catalog endpoint List a graph's mcp.expose stored queries as a typed tool catalog so a client (the MCP server) can register them as tools without fetching .gq source. Each entry carries name, MCP tool_name, description/instruction, a read/mutate flag, and decomposed typed params (kind enum: string|bool|int|bigint|float|date|datetime|blob|vector|list, plus item_kind for lists and vector_dim) — so the consumer builds an input schema with a closed match and never re-parses omnigraph type spelling. I64/U64 are bigint (string on the wire): a JSON number loses precision past 2^53 and the engine already accepts decimal strings. Read-gated (works in default-deny; the catalog is graph-wide, authorized against main). NOT Cedar-filtered per query yet — a reader can list a query whose invoke_query they lack (documented gap until per-query authz lands); invocation stays invoke_query-gated + deny==404. - api: QueriesCatalogOutput / QueryCatalogEntry / ParamDescriptor / ParamKind + query_catalog_entry (reuses PropType::from_param_type_name; scalar_kind is exhaustive, so a new ScalarType is a compile error here until catalogued). - GET /queries route in per_graph_protected (→ /graphs/{id}/queries in multi mode); OpenAPI regenerated; path allowlists updated. - Tests: projection unit (every kind, list, vector, nullable, mutation, empty) + handler (exposed-only filter, read-gate probe-oracle, empty registry). 
* docs: GET /queries stored-query catalog endpoint Document the catalog: the endpoint table row (GET /queries, read-gated), a catalog section (typed-param kind enum, bigint/date/datetime/blob-as-string, graph-wide/branch-independent, mcp.expose default true, the read-gated probe-oracle gap), and flip the startup note now that the catalog ships. * Collect file-I/O and parse errors in QueryRegistry::load in one pass load() early-returned on any unreadable .gq file, masking parse / identity / tool-name-collision errors in the OTHER (readable) files — so an operator fixed the missing file, restarted, and only then saw the next broken query. Now it collects I/O errors but still runs from_specs on the readable specs and returns the union, so every broken entry surfaces at once (matching the collected-errors contract the rest of the registry already follows). Safe: from_specs' tool-name collision check runs over loaded queries only, so dropping an I/O-failed entry can only under-report a collision, never invent one. I/O errors are ordered first (BTreeMap key order), then spec errors. Adds a load-level test (tempdir: a valid, a missing, and a parse-broken .gq) asserting all three surface in one Err — confirmed red before the fix. * Make invoke_query graph-scoped (one branch authority) invoke_query gates reaching the curated stored-query surface — a graph-level capability. Per-branch/snapshot access is already enforced by the inner read/change gate in run_query/run_mutate (authorized against the resolved branch), so branch-scoping the outer gate was redundant AND wrong for snapshot reads (it defaulted to main). Drop the branch dimension: remove InvokeQuery from uses_branch_scope (it joins admin as graph-scoped) and authorize the boundary gate with branch: None. Lossless: an actor confined to branch X by their read/change rules can still only invoke a stored query that touches X. 
A rule that sets branch_scope on invoke_query is now rejected by validate() — write invoke_query in its own rule. Ripple (atomic): restructure the server invoke fixture so invoke_query sits in its own branch_scope-free rule; invert invoke_query_is_branch_scoped -> invoke_query_rejects_branch_scope; the per-graph authorize test uses branch: None; docs (policy.md, server.md, the InvokeQuery doc). No wire/OpenAPI change. * Resolve graph config by identity, not server mode Which policy/queries block applies for a graph was decided three different, mode-dependent ways: single-mode boot used top-level even for a named graph; multi-mode used per-graph (and silently ignored a top-level queries block); the CLI used per-graph for a named target. So `queries validate --target prod` could check a different registry than the single-mode server loaded, and a named graph's per-graph policy/queries were silently shadowed. Make config a function of graph IDENTITY: a graph served by NAME (--target/server.graph, a graphs: entry) uses its own graphs.<name>.{policy, queries}; a bare URI is anonymous and uses top-level. One rule, applied by single-mode boot, multi-mode boot, and the CLI — so they can't diverge and the CLI predicts the server exactly. No silent ignore: serving a named graph while a top-level policy/queries block is populated now refuses boot, naming the block (the multi-mode top-level-policy bail, extended to queries and to single-mode-named). The CLI's `queries validate` derives the schema URI and the registry from ONE selection, and a positional URI forces anonymous (ignoring cli.graph) so the two can't come from different graphs. BREAKING (released behavior): single mode by name (--target/server.graph) with top-level policy/queries previously used top-level; it now uses the per-graph block and refuses boot if top-level is also populated. Bare-URI single mode is unchanged. Loud, with migration text pointing at graphs.<name>. 
- config: resolve_policy_file_for (policy sibling of query_entries_for, no top-level fallback) + populated_top_level_blocks for the coherence check. - characterization tests (single-mode named -> per-graph; named + top-level -> bail; multi-mode top-level queries -> bail; CLI positional-URI -> top-level). - docs: policy.md, server.md, cli-reference.md. * docs: RFC-002 credentials keyed by server name (keychain/profile/env) Reworks the RFC's credentials model: secrets are keyed by server name — OS keychain `omnigraph:<server>` (preferred) -> a `[<server>]` profile in `~/.omnigraph/credentials` -> `OMNIGRAPH_TOKEN[_<SERVER>]` env (CI), the AWS/gh/kube model. `servers.<name>` is endpoint-only by default but may carry an explicit, secret-free `auth: { token: { env|file|command|keychain } }` source. The shipped `bearer_token_env` + `.env.omni` dotenv remain a legacy compat path; no `credentials.yaml`. * docs: RFC-002 — typed graph locator (storage/server/graph_id), not a uri string Add §1.1: the resolved graph address is a typed GraphLocator (Embedded{storage} | Remote{server, graph_id}), not a flat uri: String. Diagnoses the string model's cost in the code today (~16 is_remote_uri forks, TargetConfig can't express multi-server x multi-graph, the CLI bails on remote, the ts SDK models baseUrl+graphId separately) and settles the YAML naming so the key names the locus: - storage: (embedded) — shipped uri: is a deprecated alias - server: + graph_id: (remote) — graph_id defaults to the entry key - storage xor server, reject both/neither (no silent ambiguity) Kills the graphs:/graph: collision and the uri:-might-be-a-server ambiguity. Updates the §1/§8 examples and the entry-shape notes to the new naming. * Test: queries list must reject an unknown --target queries list opens no graph URI, so unknown-graph validation does not ride along on resolve_target_uri the way it does for every other command. 
The new test reproduces the gap: with an unknown --target the command currently exits 0 and prints the (empty) top-level registry instead of erroring like the URI-resolving commands do. Fails against current code; the fix follows. * Validate the graph selection in queries list Graph-existence validation was a side effect of URI resolution: every URI-resolving command rejects an unknown --target via resolve_target_uri, but queries list opens no URI, so query_entries_for(Some(unknown)) silently fell back to the top-level registry and showed the wrong (or empty) catalog. Make membership a property of the selection: add the fallible resolve_graph_selection alongside the infallible query_entries_for (a known name passes through, an unknown name errors with the same message as resolve_target_uri, None stays anonymous), and validate the selection in execute_queries_list. query_entries_for is unchanged — server boot's bare-URI path still needs its None -> top-level arm. * Surface policy-engine errors from stored-query invoke The invoke handler mapped every authorize_request failure to 404 ('stored query not found'), which collapsed the authorization decision (deny -> 403) together with operational failures (no actor -> 401, Cedar evaluation error -> 500). A real policy-engine 500 was hidden as a missing query. Separate the two concerns instead of sniffing the masked status. Extract authorize() returning an Authz { Allowed, Denied(msg) } decision and reserve Err for operational failures only; authorize_request becomes a thin wrapper that maps Denied -> 403, so the 16 deny-as-403 callers are unchanged. The invoke handler now matches the decision directly: a denial stays 404 (deny == missing, so the catalog can't be probed without the grant), while a 401/500 propagates with its true status. 500 is now a reachable outcome on POST /queries/{name}; document it in the endpoint responses and regenerate openapi.json. 
* Extract the named-graph/top-level coherence rule into one helper The rule 'a named graph uses its own graphs.<name> block, so a populated top-level block is a config error' lived inline in single-mode server boot. Extract it to OmnigraphConfig::ensure_top_level_blocks_honored so the same definition can be shared by the CLI selection gate (next commit) and the two can't drift. Boot calls the helper; the message is reworded context-neutral (drops 'serving') so it reads correctly from both boot and the CLI. Behavior-preserving: multi-graph mode keeps its own unconditional check, and single_mode_named_graph_rejects_top_level_blocks still passes. * Test: queries validate/list must reject a named graph with a top-level block Server boot refuses a config where a graph is selected by name yet a top-level queries:/policy.file block is populated (the block would be silently ignored). The CLI's queries validate/list resolve the same named selection but skip that coherence check, so they give a false green / list the per-graph block. The new test reproduces it: validate prints OK and list succeeds where boot would refuse. Fails against current code; the fix follows. * Enforce top-level coherence in the single CLI selection gate queries validate validated graph membership only as a side effect of URI resolution and queries list only via resolve_graph_selection's membership check; neither applied the named-graph/top-level coherence rule server boot enforces, so both gave a false green on a config boot refuses. Fold ensure_top_level_blocks_honored into resolve_graph_selection so it is the single gate that returns only valid + server-coherent selections, and route resolve_selected_graph (queries validate) through it; queries list already calls the gate. A named graph with a populated top-level block now errors in both commands, matching boot. A positional URI stays anonymous (top-level honored), so queries_validate_positional_uri_ignores_default_graph is unaffected. 
* docs: RFC-003 — MCP server surface for omnigraph-server Detailed MCP-transport design for the stored-query/MCP work, building on the shipped #128 registry. Corrects the draft against the branch head: the coarse invoke_query gate + 404 denial-masking are already wired (server_invoke_query), so per-query invoke_query scope (PolicyRequest has no query-name dimension yet) is the real prerequisite; positions the doc as superseding rfc-001's MCP transport (/mcp/tools+/mcp/invoke) and reconciles the shipped mcp.expose YAML form and the schema-introspection non-goal; grounds the parity surface in the actual omnigraph-ts package (13 tools with read/change ids, 2 resources). * docs(config): clarify graph config boundaries * fix(config): enforce graph-scoped policies and query validation * fix(cli): require graph selection for scoped query registries * fix(server): preserve named graph id in single mode policy * fix(cli): share graph identity for policy resolution * test(cli): cover policy tooling server graph selection * fix(cli): honor server graph for policy tooling --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-01 22:50:31 +02:00
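
The status mapping described in the "Surface policy-engine errors from stored-query invoke" commit above can be illustrated with a small sketch. It assumes a toy `Policy` type and plain `u16` status codes. Only the names `authorize`, `authorize_request`, `Authz`, and the `invoke_query` action come from the commit message. `AuthError`, `Policy`, and `invoke_status` are hypothetical stand-ins for the server's real types.

```rust
// Illustrative only: all types and functions here are hypothetical stand-ins,
// not the omnigraph-server API.

/// Authorization decision, kept separate from operational failures.
#[derive(Debug, PartialEq, Eq)]
enum Authz {
    Allowed,
    Denied(String),
}

/// Operational failures that must keep their true HTTP status.
#[derive(Debug, PartialEq, Eq)]
enum AuthError {
    NoActor,              // maps to 401
    PolicyEngine(String), // maps to 500
}

impl AuthError {
    fn status(&self) -> u16 {
        match self {
            AuthError::NoActor => 401,
            AuthError::PolicyEngine(_) => 500,
        }
    }
}

/// Toy policy: granted action names plus a flag simulating an engine fault.
struct Policy {
    grants: Vec<&'static str>,
    engine_broken: bool,
}

fn authorize(policy: &Policy, actor: Option<&str>, action: &str) -> Result<Authz, AuthError> {
    let actor = actor.ok_or(AuthError::NoActor)?;
    if policy.engine_broken {
        return Err(AuthError::PolicyEngine("evaluation failed".to_string()));
    }
    if policy.grants.iter().any(|g| *g == action) {
        Ok(Authz::Allowed)
    } else {
        Ok(Authz::Denied(format!("{actor} may not {action}")))
    }
}

/// Thin wrapper for ordinary endpoints: a denial becomes a 403.
fn authorize_request(policy: &Policy, actor: Option<&str>, action: &str) -> Result<(), u16> {
    match authorize(policy, actor, action) {
        Ok(Authz::Allowed) => Ok(()),
        Ok(Authz::Denied(_)) => Err(403),
        Err(e) => Err(e.status()),
    }
}

/// Stored-query invocation: a denial and an unknown query share one 404,
/// while operational failures propagate with their own status.
fn invoke_status(policy: &Policy, actor: Option<&str>, query_exists: bool) -> u16 {
    match authorize(policy, actor, "invoke_query") {
        Err(e) => e.status(),
        Ok(Authz::Denied(_)) => 404,
        Ok(Authz::Allowed) if !query_exists => 404,
        Ok(Authz::Allowed) => 200,
    }
}
```

Matching on the decision directly, rather than inspecting an already-masked status, is what lets the invoke path keep deny == 404 without also hiding a 401 or 500.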

69 commits