omnigraph/docs/server.md
Ragnor Comerford 7aca6ddac5
docs: PR 2 documentation pass (server / architecture / §VI.23)
- docs/server.md: new "Per-actor admission control (MR-686)" section
  documenting WorkloadController defaults, the 429/503 mapping with
  Retry-After semantics, the Cedar-then-admission ordering, and the
  /change-only-for-now scope. Adds 429 / 503 to the listed HTTP status
  codes and `too_many_requests` / `service_unavailable` to the ErrorCode
  enumeration in the error model paragraph.

- docs/architecture.md: server/CLI diagram updated. Adds WorkloadController
  and WriteQueueManager nodes; flow is HTTP -> auth -> Cedar -> admission
  -> engine -> queue. Engine label changed to "Arc<Omnigraph>" to reflect
  the AppState flip. Prose now points at server.md and runs.md for the
  admission/queue contracts. The CLI's bypass-admission note is preserved.

- docs/invariants.md §VI.23 status annotation: explicitly cites the
  per-(table, branch) writer-queue + revalidation-under-queue as closing
  the Lance-HEAD-vs-manifest drift class under concurrent writers once
  the global RwLock is removed (PR 2 Step F). Continuous in-process
  rollback recovery still aspirational (MR-870 ticket).

scripts/check-agents-md.sh passes (26 links, 26 docs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 17:09:49 +02:00

5.1 KiB

HTTP Server (omnigraph-server)

Axum 0.8 + tokio + utoipa-generated OpenAPI. Single repo per process; deploy multiple processes for multi-tenant.

Endpoint inventory

Method Path Auth Action Handler
GET /healthz none server_health
GET /openapi.json none server_openapi (strips security if auth disabled)
GET /snapshot?branch= bearer + read snapshot of branch server_snapshot
POST /read bearer + read run named query server_read
POST /export bearer + export NDJSON stream server_export
POST /change bearer + change mutation server_change
GET /schema bearer + read get current .pg source server_schema_get
POST /schema/apply bearer + schema_apply (target=main) migrate server_schema_apply
POST /ingest bearer + branch_create (if new) + change bulk load server_ingest (32 MB body limit)
GET /branches bearer + read list branches server_branch_list
POST /branches bearer + branch_create create server_branch_create
DELETE /branches/{branch} bearer + branch_delete delete server_branch_delete
POST /branches/merge bearer + branch_merge merge source → target server_branch_merge
GET /commits?branch= bearer + read list server_commit_list
GET /commits/{commit_id} bearer + read show server_commit_show

Streaming

Only /export streams (application/x-ndjson, MPSC channel + Body::from_stream). Everything else is buffered JSON.

Error model

Uniform ErrorOutput { error, code?, merge_conflicts[], manifest_conflict? } with code ∈ unauthorized | forbidden | bad_request | not_found | conflict | too_many_requests | service_unavailable | internal. Merge conflicts attach structured MergeConflictOutput { table_key, row_id?, kind, message }.

manifest_conflict is set on publisher CAS rejections (HTTP 409): the caller's pre-write view of one table's manifest version was stale. ManifestConflictOutput { table_key, expected, actual } tells the client which table to refresh and retry. This is the conflict shape produced by concurrent /change or /ingest calls landing the same (table, branch) race (MR-771 / MR-766).

HTTP status codes used: 200, 400, 401, 403, 404, 409, 429, 500, 503.

Per-actor admission control (MR-686)

PR 2 (MR-686) removed the global server RwLock<Omnigraph>. Disjoint (table, branch) writes from different actors now run concurrently, guarded only by the engine's per-(table, branch) write queue. To keep one heavy actor from exhausting shared capacity (Lance I/O, manifest churn, network), the server gates mutating handlers through a WorkloadController configured per-process from environment variables:

Env var Default Purpose
OMNIGRAPH_PER_ACTOR_INFLIGHT_MAX 16 Concurrent in-flight mutations per actor
OMNIGRAPH_PER_ACTOR_BYTES_MAX 4 GiB In-flight estimated bytes per actor
OMNIGRAPH_GLOBAL_REWRITE_MAX 4 Concurrent compaction / index-build slots

When an actor exceeds its in-flight count or byte budget, the server returns HTTP 429 Too Many Requests with code: too_many_requests and a Retry-After header (seconds). The actor should back off; other actors are unaffected.

When the global rewrite pool is exhausted (compaction, index build), the server returns HTTP 503 Service Unavailable with code: service_unavailable. Clients can retry; the rewrite pool empties as in-flight rewrites complete.

Cedar policy authorization runs before admission accounting so denied requests don't consume admission slots.

Today admission gates the /change hot path. /ingest, /branches/*, and /schema/apply flow through the unlocked engine handle without admission gates — wiring those is mechanical follow-up work tracked on MR-686.

Body limits

  • Default: 1 MB
  • /ingest: 32 MB

Auth model (bearer + SHA-256)

  • Tokens are SHA-256 hashed on startup; plaintext is never persisted in memory.
  • Constant-time comparison via subtle::ConstantTimeEq.
  • Three sources, in precedence:
    1. OMNIGRAPH_SERVER_BEARER_TOKENS_AWS_SECRET — AWS Secrets Manager (build with --features aws)
    2. OMNIGRAPH_SERVER_BEARER_TOKENS_FILE or OMNIGRAPH_SERVER_BEARER_TOKENS_JSON — JSON {actor_id: token, …}
    3. OMNIGRAPH_SERVER_BEARER_TOKEN — single legacy token, actor default
  • If no tokens configured, server runs unauthenticated (local dev) and /openapi.json strips the security scheme.

See deployment.md for token-source operational details.

Tracing & observability

  • tower_http::TraceLayer::new_for_http()
  • Policy decisions logged at INFO level with actor, action, branch, decision, matched rule
  • Startup logs: token source name, repo URI, bind address
  • Graceful SIGINT shutdown

Not implemented (by design or "TBD")

  • CORS — not configured; add tower_http::cors if needed.
  • Rate limiting — none.
  • Pagination — none (commits/branches return everything; export streams).
  • Multi-tenant routing — one repo per process.