omnigraph/docs/user/operations/server.md
Andrew Altshuler 9c792649e2
docs(user): coherence cleanup aligned with 0.7.1 (#293)
2026-06-21 00:02:34 +03:00


# HTTP Server (`omnigraph-server`)
Axum 0.8 + tokio + utoipa-generated OpenAPI. **Cluster-only boot**: the server always boots from a cluster (`--cluster <dir | s3://…>`) and serves N graphs (N ≥ 1) under cluster routes. There is no longer a single-graph flat-route mode, no positional `<URI>` boot, no `--target`, and no `omnigraph.yaml`-`graphs:`-map boot. All HTTP is nested under `/graphs/{graph_id}/...`; `/healthz` and the management `/graphs` enumeration stay flat.
## Boot
### Cluster boot (the only boot)
```bash
omnigraph-server --cluster <dir | s3://…> --bind 0.0.0.0:8080
```
`omnigraph-server --cluster <dir-or-uri>` boots from the cluster catalog's
**applied revision**. The server resolves that revision into per-graph
startup configs (id, URI, optional per-graph policy, stored-query
registry) plus an optional server-level policy, then opens every
configured graph in parallel at startup (bounded concurrency = 4,
quarantining graph-specific open failures). Routing is always multi-graph —
requests to bare flat protected paths (`/read`, `/snapshot`, …) return
404; the served surface is `/graphs/{graph_id}/...`. See
[cluster-config.md](../clusters/config.md#serving-from-the-cluster-the-mode-switch)
for what is read and the readiness rules.
Readiness is fail-fast for cluster-global problems: missing or unreadable
state, invalid/unattributable recovery sidecars, unreadable shared catalog
payloads, cluster policy errors, or zero healthy graphs. Graph-attributed
pending recovery sidecars and graph-specific startup failures quarantine
that graph instead; the server logs startup diagnostics and serves the
remaining healthy graphs. `GET /graphs` enumerates ready/served graphs only,
so quarantined graphs are absent and their routes return 404.
Operators who want the original all-or-nothing boot contract can pass
`--require-all-graphs` or set `OMNIGRAPH_REQUIRE_ALL_GRAPHS=1`. In that mode,
any graph quarantine, graph-open failure, stored-query startup failure, or
embedding-provider resolution failure aborts startup.
A scheme-qualified argument (`s3://…`) reads the ledger straight from the
storage root, with no local config directory. `--bind`,
`--unauthenticated`, and the bearer-token env vars all apply.
### Stored-query validation at startup
If a graph declares a `queries:` registry (see [cli-reference](../cli/reference.md)), the server **loads and type-checks every stored query against that graph's live schema at startup**. Query parse/type failures quarantine that graph; if no graph remains healthy, startup refuses. Two MCP-exposed queries claiming the same tool name are likewise graph-local startup failures. Non-blocking advisories (e.g. an MCP-exposed query with a vector parameter an agent cannot supply) are logged. Validate offline before deploying with `omnigraph queries validate`. Discover the stored queries as a typed tool catalog with `GET /queries`, and invoke one over HTTP with `POST /queries/{name}` (both below).
## Endpoint inventory
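Per-graph endpoints are nested under `/graphs/{id}/...`, where `{id}` is the
graph id from the cluster's applied revision. The unauthenticated `/healthz`
and `/openapi.json` rows are listed with their unprefixed paths: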
| Method | Path | Auth | Action |
|---|---|---|---|
| GET | `/healthz` | none | — |
| GET | `/openapi.json` | none | — (strips security if auth disabled; emits the nested cluster paths with `cluster_` operation-id prefix) |
| GET | `/graphs/{id}/snapshot?branch=` | bearer + `read` | snapshot of branch |
| POST | `/graphs/{id}/query` | bearer + `read` | inline read query (canonical; clean field names `query`/`name`; mutations → 400) |
| POST | `/graphs/{id}/read` | bearer + `read` | **deprecated** alias of `/query` (legacy field names `query_source`/`query_name`, byte-stable response; carries `Deprecation: true` + `Link: <query>; rel="successor-version"`) |
| POST | `/graphs/{id}/export` | bearer + `export` | NDJSON stream |
| POST | `/graphs/{id}/mutate` | bearer + `change` | mutation (canonical; `query`/`name`; accepts legacy `query_source`/`query_name` as serde aliases) |
| POST | `/graphs/{id}/change` | bearer + `change` | **deprecated** alias of `/mutate` (carries `Deprecation: true` + `Link: <mutate>; rel="successor-version"`) |
| GET | `/graphs/{id}/queries` | bearer + `read` | list the graph's stored queries as a typed tool catalog |
| POST | `/graphs/{id}/queries/{name}` | bearer + `invoke_query` (+ `change` for a stored mutation) | invoke a named query from the `queries:` registry; deny == 404 |
| GET | `/graphs/{id}/schema` | bearer + `read` | get current `.pg` source |
| POST | `/graphs/{id}/schema/apply` | bearer + `schema_apply` (target=`main`) | disabled for cluster-backed serving; returns 409 and points operators at `omnigraph cluster apply` + restart |
| POST | `/graphs/{id}/load` | bearer + `branch_create` (only when `from` is set and the branch is created) + `change` | bulk load (canonical); branch creation is opt-in via `from` — without it a missing `branch` is a 404, never an implicit fork (32 MB body limit) |
| POST | `/graphs/{id}/ingest` | bearer + `branch_create` (only when `from` is set and the branch is created) + `change` | **deprecated** alias of `/load` (carries `Deprecation: true` + `Link: <load>; rel="successor-version"`) (32 MB body limit) |
| GET | `/graphs/{id}/branches` | bearer + `read` | list branches |
| POST | `/graphs/{id}/branches` | bearer + `branch_create` | create |
| DELETE | `/graphs/{id}/branches/{branch}` | bearer + `branch_delete` | delete |
| POST | `/graphs/{id}/branches/merge` | bearer + `branch_merge` | merge `source → target` |
| GET | `/graphs/{id}/commits?branch=` | bearer + `read` | list |
| GET | `/graphs/{id}/commits/{commit_id}` | bearer + `read` | show |
Server-level management endpoints:
| Method | Path | Auth | Action |
|---|---|---|---|
| GET | `/graphs` | bearer + `graph_list` on `Server::"root"` | list ready/served graphs |
> The per-graph subsections below name routes in shorthand (`GET /queries`,
> `POST /query`, `POST /mutate`, `POST /queries/{name}`); every one is served
> under the `/graphs/{id}/…` prefix shown in the table — only `/graphs` and
> `/healthz` are flat.
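
The shorthand-to-served-path rule above can be sketched as a small client-side helper. This is an illustrative sketch, not part of any shipped SDK: the function name `graph_url` and the `FLAT_ROUTES` set are hypothetical, and `/openapi.json` is treated as flat only because the endpoint table lists it without a prefix.

```python
from urllib.parse import quote

# Routes the tables list without a /graphs/{id} prefix (assumption for /openapi.json).
FLAT_ROUTES = {"/healthz", "/openapi.json", "/graphs"}


def graph_url(base_url: str, graph_id: str, route: str) -> str:
    """Expand a shorthand route such as "/query" to the path the server serves."""
    if not route.startswith("/"):
        raise ValueError("route must start with '/'")
    base = base_url.rstrip("/")
    if route in FLAT_ROUTES:
        return base + route
    # Percent-encode the id so it always stays a single path segment.
    return f"{base}/graphs/{quote(graph_id, safe='')}{route}"
```

For example, `graph_url("http://localhost:8080", "kb", "/query")` yields `http://localhost:8080/graphs/kb/query`, where `kb` is a made-up graph id.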
### Stored-query catalog (`GET /queries`)
List the graph's stored queries as a typed tool catalog — enough for a client (e.g. an MCP server) to register each as a tool without fetching `.gq` source. Each entry: `{ name, tool_name, description, instruction, mutation, params }`, where each param is `{ name, kind, item_kind?, vector_dim?, nullable }`. `kind` is one of `string | bool | int | bigint | float | date | datetime | blob | vector | list` (decomposed so a consumer maps it with a closed `switch`, never re-parsing GQ type spelling). `bigint` (I64/U64), `date`, `datetime`, and `blob` are carried as JSON **strings** — a 64-bit integer loses precision as a JSON number, dates are ISO strings, and a blob is a URI string.
- **Read-gated** (works in default-deny mode). The catalog is **graph-wide** (branch-independent; `read` is authorized against `main`).
- **Every stored query in the applied registry is listed.** Cluster-served graphs have no per-query expose flag today — every query in the cluster `queries:` registry appears in the catalog. (Per-query exposure may become a Cedar-policy decision in a later release; see [cluster-config](../clusters/config.md).)
- **Not Cedar-filtered per query (yet).** A caller with `read` but not `invoke_query` can *list* a query they can't *invoke* (which would 404). Closing that gap is future per-query authorization; for now the catalog is a discovery surface and `invoke_query` remains the invocation gate.
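
A consumer of the catalog might map each param's `kind` with a closed dispatch like the one below. This is a minimal sketch under stated assumptions: the helper name `encode_param` is hypothetical, a `vector` is assumed to travel as a JSON array of numbers, and `list` items are assumed to follow the encoding of their `item_kind`. Only the string carriage of `bigint`, `date`, `datetime`, and `blob` comes from the description above.

```python
import datetime as dt
from typing import Any


def encode_param(spec: dict, value: Any) -> Any:
    """Encode one Python value for the request `params`, driven by a catalog param entry."""
    if value is None:
        if spec.get("nullable"):
            return None
        raise ValueError(f"parameter {spec['name']!r} is not nullable")
    kind = spec["kind"]
    if kind == "string":
        return str(value)
    if kind == "bool":
        return bool(value)
    if kind == "int":
        return int(value)
    if kind == "float":
        return float(value)
    if kind == "bigint":
        return str(int(value))  # a string keeps full 64-bit precision
    if kind in ("date", "datetime"):
        # datetime is a subclass of date, so one isinstance check covers both.
        return value.isoformat() if isinstance(value, dt.date) else str(value)
    if kind == "blob":
        return str(value)  # carried as a URI string
    if kind == "vector":
        items = [float(x) for x in value]  # assumed wire shape: array of numbers
        dim = spec.get("vector_dim")
        if dim is not None and len(items) != dim:
            raise ValueError(f"parameter {spec['name']!r} expects {dim} dimensions")
        return items
    if kind == "list":
        item_spec = {"name": spec["name"], "kind": spec["item_kind"], "nullable": False}
        return [encode_param(item_spec, v) for v in value]
    raise ValueError(f"unknown kind {kind!r}")
```

Because the set of kinds is closed, an unknown value fails loudly instead of being passed through silently.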
### Stored-query invocation (`POST /queries/{name}`)
Invoke a curated, server-side stored query by **name** — the source comes from the graph's `queries:` registry, so the client never sends `.gq`. The request body itself is optional; omit it for no-param queries, or send `{ "params": { … }, "branch": "main", "snapshot": null }`, where every field is optional and `params` keys match the query's declared parameters. The response is the **read envelope** (`ReadOutput`) for a stored read or the **mutation envelope** (`ChangeOutput`) for a stored mutation — serialized untagged, so the wire shape is identical to `/query` / `/mutate`.
- **Gate:** `invoke_query` (per-graph, graph-scoped) at the boundary. A stored *mutation* is **double-gated** — it also passes the engine's `change` gate, so an actor with `invoke_query` but not `change` gets `403`.
- **Deny == unknown, for callers without `invoke_query`:** for a caller lacking the grant, an `invoke_query` denial and an unknown query name return the **same `404`** (identical body), so the catalog can't be probed. A caller that *holds* `invoke_query` may still get the inner gate's `403` for an existing query it can't `read`/`change` (the double-gate, above) — so existence is visible to grant-holders by design.
- **Requires an explicit policy grant when auth is on.** In default-deny mode (bearer tokens but no `policy.file`), only `read` is permitted, so *every* `/queries/{name}` call returns `404` until an `invoke_query` rule is configured.
- A stored mutation cannot target a `snapshot` (`400`); a parameter type error is a structured `400` naming the parameter.
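
The request shape and status semantics above could be wrapped on the client side roughly as follows. This is a sketch, not an official client: the function names are hypothetical, and the standard `Authorization: Bearer <token>` header is assumed for the bearer scheme. The request is only built here, not sent.

```python
import json
import urllib.request
from urllib.parse import quote


def build_invoke_request(base_url, graph_id, name, token,
                         params=None, branch=None, snapshot=None):
    """Build POST /graphs/{id}/queries/{name}; the body is omitted when no field is set."""
    body = {}
    if params is not None:
        body["params"] = params
    if branch is not None:
        body["branch"] = branch
    if snapshot is not None:
        body["snapshot"] = snapshot
    data = json.dumps(body).encode("utf-8") if body else None
    url = (f"{base_url.rstrip('/')}/graphs/{quote(graph_id, safe='')}"
           f"/queries/{quote(name, safe='')}")
    headers = {"Authorization": f"Bearer {token}"}  # assumed header form
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def explain_status(status: int) -> str:
    """Translate the documented status codes for stored-query invocation."""
    return {
        200: "ok",
        400: "parameter type error, or a stored mutation aimed at a snapshot",
        403: "query exists, but the inner read/change gate denied it",
        404: "unknown query name, or caller lacks invoke_query (indistinguishable)",
    }.get(status, "see the error model")
```

Passing the result to `urllib.request.urlopen` would send it. Treating `404` as "unknown or not permitted" rather than "missing" mirrors the deny-equals-unknown rule.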
## Adding and removing graphs
Runtime add/remove via API is **not** exposed — neither `POST /graphs`
nor `DELETE /graphs/{id}` is implemented. Operators add or remove graphs
by running `cluster apply` against the cluster (which publishes a new
applied revision) and restarting the server so it boots from the new
revision. The server treats the cluster source as operator-owned and
never writes it.
A future release may introduce a managed registry and re-expose runtime
mutation on top of it.
## Inline read queries (`POST /query`)
`POST /query` is the read-only, agent-friendly twin of `POST /read`. The
request body uses clean field names that match the CLI `-e` flag and the GQ
`query` keyword:
```json
{
"query": "query find($n: String) { match { $p: Person { name: $n } } return { $p.name } }",
"name": "find",
"params": { "n": "Alice" },
"branch": "main",
"snapshot": null
}
```
Response shape is identical to `/read` (`ReadOutput`). If the inline source
contains mutations (`insert` / `update` / `delete`), the request is rejected
with HTTP 400 and an error pointing the caller at `POST /mutate` — the
read-only contract is enforced at the URL.
`POST /mutate` is the canonical mutation endpoint. It accepts the same clean
field names (`query`, `name`); the legacy field names `query_source` and
`query_name` continue to deserialize as serde aliases so existing clients keep
working without changes.
## Deprecated names (`/read`, `/change`)
`POST /read` and `POST /change` are kept for back-compat indefinitely — they
are byte-stable on the request side and otherwise behave identically to
`/query` / `/mutate`. They are flagged as deprecated through three independent
channels:
- **OpenAPI**: the operations carry `deprecated: true` in `openapi.json`, so
every OpenAPI codegen (typescript-fetch, openapi-generator, oapi-codegen,
…) emits a `@deprecated` marker on the generated SDK method.
- **Response headers (RFC 9745)**: every response carries `Deprecation: true`.
- **Response headers (RFC 8288)**: every response carries a `Link` header
pointing at the canonical successor:
`Link: <query>; rel="successor-version"` for `/read`, and
`Link: <mutate>; rel="successor-version"` for `/change`. SDKs and HTTP
proxies can pick the successor up automatically.
Migration is purely cosmetic on the client side — swap the URL path, leave
the request body and response handling alone.
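
A client or proxy could detect the header-based deprecation signal with something like the sketch below. The helper name `successor_route` is hypothetical, and the `Link` parsing is deliberately simplified: it only handles the single-link form shown above, not the full RFC 8288 grammar.

```python
import re
from typing import Mapping, Optional

# Matches the single-link form, e.g.  <query>; rel="successor-version"
_LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="?successor-version"?', re.IGNORECASE)


def successor_route(headers: Mapping[str, str]) -> Optional[str]:
    """Return the successor target if the response is flagged deprecated, else None."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    if lowered.get("deprecation", "").strip().lower() != "true":
        return None
    match = _LINK_RE.search(lowered.get("link", ""))
    return match.group(1) if match else None
```

The returned value is the relative target exactly as the server sent it (for example `query`), so the caller still has to resolve it against the request URL.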
## Streaming
Only `/export` streams (`application/x-ndjson`, MPSC channel + `Body::from_stream`). Everything else is buffered JSON.
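
Since `/export` is the one streamed response, a consumer has to split records on newlines as chunks arrive instead of parsing one JSON document. A minimal sketch, assuming UTF-8 NDJSON with one JSON value per line (the helper name `iter_ndjson` and the record contents in the example are hypothetical):

```python
import json
from typing import Any, Iterable, Iterator


def iter_ndjson(chunks: Iterable[bytes]) -> Iterator[Any]:
    """Yield one decoded JSON value per line, buffering lines split across chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():  # skip blank lines
                yield json.loads(line)
    if buffer.strip():  # final record without a trailing newline
        yield json.loads(buffer)
```

For example, `list(iter_ndjson([b'{"a":1}\n{"b"', b':2}\n']))` returns `[{"a": 1}, {"b": 2}]` even though the second record arrives in two pieces.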
## Error model
Uniform `ErrorOutput { error, code?, merge_conflicts[], manifest_conflict? }` with `code ∈ unauthorized | forbidden | bad_request | not_found | conflict | too_many_requests | internal`. Merge conflicts attach structured `MergeConflictOutput { table_key, row_id?, kind, message }`.
`manifest_conflict` is set on **concurrent-write rejections** (HTTP 409): the
caller's pre-write view of one table's manifest version was stale.
`ManifestConflictOutput { table_key, expected, actual }` tells the client
which table to refresh before retrying. This is the conflict shape produced when
concurrent `/mutate` (or its `/change` alias) or `/load` (or its deprecated
`/ingest` alias) calls race on the same `(table, branch)`.
HTTP status codes used: 200, 400, 401, 403, 404, 409, 429, 500.
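
One way a client might branch on the uniform error shape is sketched below. The function name `classify_error` and the returned category strings are hypothetical. Only the field names and the meaning of `manifest_conflict` and `merge_conflicts` come from the error model above.

```python
from typing import Any, Mapping


def classify_error(status: int, body: Mapping[str, Any]) -> str:
    """Map an ErrorOutput payload to a coarse client-side action."""
    if status == 409 and body.get("manifest_conflict"):
        # Stale pre-write view of one table: refresh that table, then retry.
        return "retry_after_refresh"
    if body.get("merge_conflicts"):
        # Structured merge conflicts need resolution, not a blind retry.
        return "resolve_merge_conflicts"
    if status == 429 or body.get("code") == "too_many_requests":
        return "back_off"
    if status in (401, 403):
        return "check_credentials_or_policy"
    return "fail"
```

Checking `manifest_conflict` before anything else keeps the only automatically retryable conflict separate from conflicts that need a human or a merge strategy.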
## Per-actor admission control
Disjoint `(table, branch)` writes from different actors now run concurrently,
guarded only by the engine's per-(table, branch) write queue. To keep
one heavy actor from exhausting shared capacity (Lance I/O, manifest
churn, network), the server gates mutating handlers through per-process
admission limits configured from environment variables:
| Env var | Default | Purpose |
|---|---|---|
| `OMNIGRAPH_PER_ACTOR_INFLIGHT_MAX` | 16 | Concurrent in-flight mutations per actor |
| `OMNIGRAPH_PER_ACTOR_BYTES_MAX` | 4 GiB | In-flight estimated bytes per actor |
When an actor exceeds its in-flight count or byte budget, the server
returns **HTTP 429 Too Many Requests** with `code: too_many_requests`
and a `Retry-After` header (seconds). The actor should back off; other
actors are unaffected.
Cedar policy authorization runs **before** admission accounting so
denied requests don't consume admission slots.
Today admission gates every mutating handler: `/mutate` (and its
deprecated alias `/change`), `/load` (and its deprecated alias `/ingest`),
`/branches/{create,delete,merge}`,
and `/schema/apply`. Read-only endpoints (`/snapshot`, `/query`, `/read`,
`/export`, `/branches` GET, `/commits`, `/schema` GET) are not
admission-gated.
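
A well-behaved client can honor the `429` + `Retry-After` contract with a small retry loop. This is a sketch under assumptions: `send` is a hypothetical caller-supplied callable returning `(status, headers, body)`, and `Retry-After` is read as a number of seconds, as described above.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def send_with_backoff(send: Callable[[], Response],
                      max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep,
                      default_wait: float = 1.0) -> Response:
    """Retry a mutating call while the server answers 429, waiting Retry-After seconds."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    attempt = 0
    while True:
        attempt += 1
        response = send()
        status, headers, _body = response
        if status != 429 or attempt >= max_attempts:
            return response  # success, a different error, or retries exhausted
        raw = {k.lower(): v for k, v in headers.items()}.get("retry-after", "")
        try:
            wait = float(raw)
        except ValueError:
            wait = default_wait  # header missing or not a plain number
        sleep(max(wait, 0.0))
```

Injecting `sleep` keeps the loop testable without real delays. The last `429` is returned rather than raised, so the caller decides how to surface it.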
## Body limits
- Default: 1 MB
- `/load` (and its deprecated `/ingest` alias): 32 MB
## Auth model (`bearer + SHA-256`)
- Tokens are SHA-256 hashed on startup; plaintext is not retained in memory after hashing.
- Constant-time comparison.
- Three sources, in precedence order:
  1. `OMNIGRAPH_SERVER_BEARER_TOKENS_AWS_SECRET` — AWS Secrets Manager (build with `--features aws`)
  2. `OMNIGRAPH_SERVER_BEARER_TOKENS_FILE` or `OMNIGRAPH_SERVER_BEARER_TOKENS_JSON` — JSON `{actor_id: token, …}`
  3. `OMNIGRAPH_SERVER_BEARER_TOKEN` — single legacy token, actor `default`
- If no tokens are configured, startup refuses unless `--unauthenticated` or
`OMNIGRAPH_UNAUTHENTICATED=1` explicitly opts into open local-dev mode. A
policy file without tokens is also rejected at startup. In open mode
`/openapi.json` strips the security scheme.
See [deployment.md](../deployment.md) for token-source operational details.
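
The hash-at-startup and constant-time-compare scheme can be illustrated in a few lines. This is a conceptual Python sketch of the technique, not the server's Rust implementation; the function names and the example tokens are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, Optional


def hash_tokens(tokens: Dict[str, str]) -> Dict[str, bytes]:
    """Map actor_id -> SHA-256 digest so the plaintext tokens can be dropped."""
    return {actor: hashlib.sha256(tok.encode("utf-8")).digest()
            for actor, tok in tokens.items()}


def authenticate(hashed: Dict[str, bytes], presented: str) -> Optional[str]:
    """Return the actor whose stored digest matches the presented token, else None."""
    digest = hashlib.sha256(presented.encode("utf-8")).digest()
    matched = None
    for actor, stored in hashed.items():
        # compare_digest takes time independent of where the two digests differ.
        if hmac.compare_digest(stored, digest):
            matched = actor  # keep scanning so timing does not reveal the position
    return matched
```

With `hashed = hash_tokens({"alice": "s3cret"})`, `authenticate(hashed, "s3cret")` returns `"alice"` and any other token returns `None`.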
## Tracing & observability
- `tower_http::TraceLayer::new_for_http()`
- Policy decisions logged at INFO level with actor, action, branch, decision, matched rule
- Startup logs: token source name, graph URI, bind address
- Graceful SIGINT shutdown
## Not implemented (by design or "TBD")
- CORS — not configured; add `tower_http::cors` if needed.
- Rate limiting — per-actor admission control gates `/mutate` (alias
`/change`), `/load` (alias `/ingest`), `/branches/{create,delete,merge}`,
and `/schema/apply` (see "Per-actor admission control" above). No global
rate limiter is configured; add one with `tower::limit` if a graph-wide cap
is needed (`tower_http::limit` only caps request body size).
- Pagination — none (commits/branches return everything; export streams).
- Runtime graph add/remove — run `cluster apply` and restart.