From 7aca6ddac5e90cf25c428e28f1b924e7bc274196 Mon Sep 17 00:00:00 2001
From: Ragnor Comerford
Date: Thu, 7 May 2026 17:09:49 +0200
Subject: [PATCH] docs: PR 2 documentation pass (server / architecture / §VI.23)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- docs/server.md: new "Per-actor admission control (MR-686)" section
  documenting WorkloadController defaults, the 429/503 mapping with
  Retry-After semantics, the Cedar-then-admission ordering, and the
  /change-only-for-now scope. Adds 429 / 503 to the listed HTTP status
  codes and `too_many_requests` / `service_unavailable` to the ErrorCode
  enumeration in the error model paragraph.

- docs/architecture.md: server/CLI diagram updated. Adds
  WorkloadController and WriteQueueManager nodes; flow is HTTP -> auth ->
  Cedar -> admission -> engine -> queue. Engine label changed to "Arc"
  to reflect the AppState flip. Prose now points at server.md and
  runs.md for the admission/queue contracts. The CLI's bypass-admission
  note is preserved.

- docs/invariants.md §VI.23 status annotation: explicitly cites the
  per-(table, branch) writer-queue + revalidation-under-queue as closing
  the Lance-HEAD-vs-manifest drift class under concurrent writers once
  the global RwLock is removed (PR 2 Step F). Continuous in-process
  rollback recovery still aspirational (MR-870 ticket).

scripts/check-agents-md.sh passes (26 links, 26 docs).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/architecture.md |  9 ++++++---
 docs/invariants.md   |  2 +-
 docs/server.md       | 37 +++++++++++++++++++++++++++++++++++--
 3 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/docs/architecture.md b/docs/architecture.md
index e0fc140..173d37a 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -270,13 +270,16 @@ flowchart LR
     srv_in[Axum HTTP
REST + OpenAPI]:::l2
     auth[Bearer auth
SHA-256 hashed tokens]:::l2
     pol[Cedar policy gate
per request]:::l2
-    eng[engine API]:::l2
+    wl[WorkloadController
per-actor admission]:::l2
+    eng[engine API
Arc<Omnigraph>]:::l2
+    wq[WriteQueueManager
per-(table, branch)]:::l2
 
     cli -.-> eng
-    srv_in --> auth --> pol --> eng
+    srv_in --> auth --> pol --> wl --> eng
+    eng --> wq
 ```
 
-The server applies Cedar policy at the HTTP boundary today (per [`docs/invariants.md`](invariants.md) §VII.45, the roadmap is to push policy into the planner as predicates). The CLI bypasses the HTTP layer and calls the engine API directly.
+The server applies Cedar policy at the HTTP boundary today (per [`docs/invariants.md`](invariants.md) §VII.45, the roadmap is to push policy into the planner as predicates). After Cedar, mutating requests pass through `WorkloadController` (per-actor in-flight cap + byte budget; PR 2 / MR-686) before reaching the engine; today only `/change` is gated. The server now shares the engine as an `Arc` instead of guarding it with a global `RwLock`; concurrent mutations on the same `(table, branch)` serialize at the engine's `WriteQueueManager`, while mutations on disjoint keys run in parallel. See [server.md](server.md) "Per-actor admission control" and [runs.md](runs.md). The CLI bypasses the HTTP layer (and admission) and calls the engine API directly.
 
 Code paths:
 
diff --git a/docs/invariants.md b/docs/invariants.md
index 8593785..1a60191 100644
--- a/docs/invariants.md
+++ b/docs/invariants.md
@@ -105,7 +105,7 @@ These are user-visible commitments. They state what the engine guarantees and wh
 Specific defaults (timeout values, memory caps, TTL windows) are *configuration*, not invariants — see [docs/constants.md](constants.md) and per-deployment configuration. The invariant is that bounds and contracts exist, not their numerical values.
 
 23. **Atomicity is per-query.** Every `.gq` query is atomic — multi-statement mutations are all-or-nothing via the substrate's atomic-commit primitive. No cross-query `BEGIN`/`COMMIT`; branches and merges fill that role for agent workflows.
-   *Status: upheld at the writer-trait surface, across process boundaries, AND in-process for the common case — the sealed `TableStorage` trait routes inserts / updates / scalar-index builds / merge_insert / overwrite through `stage_*` + `commit_staged` (Phase A is drift-free); the open-time recovery sweep in `db/manifest/recovery.rs` (sidecars at `__recovery/{ulid}.json` written by `MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`) closes the per-table commit_staged → manifest publish residual on the next `Omnigraph::open`; and `Omnigraph::refresh` runs roll-forward-only recovery in-process so long-running servers close the common case (mutation/load finalize → publisher failure) without restart. The "Lance HEAD ahead of `__manifest`" drift class is unreachable for op-execution failures, recoverable across process boundaries for all writer kinds, and recoverable in-process for roll-forward-eligible sidecars. Sidecars that would require `Dataset::restore` are deferred to the next ReadWrite open (restore unsafe under concurrency); continuous in-process recovery for that case requires per-(table, branch) writer-queue acquisition and is the goal of a future background reconciler. Two writer paths still inline-commit pending upstream Lance work: `delete_where` (lance-format/lance#6658) and `create_vector_index` (lance-format/lance#6666).*
+   *Status: upheld at the writer-trait surface, across process boundaries, AND in-process for the common case under concurrent writers (PR 2 / MR-686): the sealed `TableStorage` trait routes inserts / updates / scalar-index builds / merge_insert / overwrite through `stage_*` + `commit_staged` (Phase A is drift-free); the open-time recovery sweep in `db/manifest/recovery.rs` (sidecars at `__recovery/{ulid}.json` written by `MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`) closes the per-table commit_staged → manifest publish residual on the next `Omnigraph::open`; `Omnigraph::refresh` runs roll-forward-only recovery in-process so long-running servers close the common case without restart; and the per-(table, branch) writer queue (`db/write_queue.rs`), together with revalidation under the queue (`MutationStaging::commit_all`), prevents concurrent writers on the same key from corrupting each other once the HTTP server's global `RwLock` is removed (PR 2 Step F). The "Lance HEAD ahead of `__manifest`" drift class is unreachable for op-execution failures, recoverable across process boundaries for all writer kinds, and recoverable in-process for roll-forward-eligible sidecars. Sidecars that would require `Dataset::restore` are deferred to the next ReadWrite open (restore is unsafe under concurrency); continuous in-process rollback recovery is the goal of a future background reconciler (MR-870). Two writer paths still inline-commit pending upstream Lance work: `delete_where` (lance-format/lance#6658) and `create_vector_index` (lance-format/lance#6666).*
 24. **Schema integrity is strict at commit.** Type validation, required-field presence (auto-filled from `@default` if declared), uniqueness across batches and versions, and referential integrity — all enforced before commit succeeds. Per-write softening flags are opt-in, never default.
    *Status: aspirational — referential integrity at scale requires SIP-backed cross-table validation; not yet implemented. Cross-batch / cross-version uniqueness tracked in MR-714.*
 
diff --git a/docs/server.md b/docs/server.md
index c705635..a20c5a7 100644
--- a/docs/server.md
+++ b/docs/server.md
@@ -28,7 +28,7 @@ Only `/export` streams (`application/x-ndjson`, MPSC channel + `Body::from_strea
 
 ## Error model
 
-Uniform `ErrorOutput { error, code?, merge_conflicts[], manifest_conflict? }` with `code ∈ unauthorized | forbidden | bad_request | not_found | conflict | internal`. Merge conflicts attach structured `MergeConflictOutput { table_key, row_id?, kind, message }`.
+Uniform `ErrorOutput { error, code?, merge_conflicts[], manifest_conflict? }` with `code ∈ unauthorized | forbidden | bad_request | not_found | conflict | too_many_requests | service_unavailable | internal`. Merge conflicts attach structured `MergeConflictOutput { table_key, row_id?, kind, message }`.
 
 `manifest_conflict` is set on **publisher CAS rejections** (HTTP 409):
 the caller's pre-write view of one table's manifest version was stale.
@@ -37,7 +37,40 @@ which table to refresh and retry.
 This is the conflict shape produced by concurrent `/change` or `/ingest`
 calls landing the same `(table, branch)` race (MR-771 / MR-766).
 
-HTTP status codes used: 200, 400, 401, 403, 404, 409, 500.
+HTTP status codes used: 200, 400, 401, 403, 404, 409, 429, 500, 503.
+
+## Per-actor admission control (MR-686)
+
+PR 2 (MR-686) removed the global server `RwLock`. Writes to disjoint
+`(table, branch)` keys now run concurrently; writes to the same key
+serialize on the engine's per-(table, branch) write queue. To keep
+one heavy actor from exhausting shared capacity (Lance I/O, manifest
+churn, network), the server gates mutating handlers through a
+`WorkloadController` configured per process from environment variables:
+
+| Env var | Default | Purpose |
+|---|---|---|
+| `OMNIGRAPH_PER_ACTOR_INFLIGHT_MAX` | 16 | Concurrent in-flight mutations per actor |
+| `OMNIGRAPH_PER_ACTOR_BYTES_MAX` | 4 GiB | Estimated bytes in flight per actor |
+| `OMNIGRAPH_GLOBAL_REWRITE_MAX` | 4 | Concurrent compaction / index-build slots (process-wide) |
+
+When an actor exceeds its in-flight cap or byte budget, the server
+returns **HTTP 429 Too Many Requests** with `code: too_many_requests`
+and a `Retry-After` header (seconds). The actor should back off; other
+actors are unaffected.
+
+When the global rewrite pool is exhausted (compaction, index build),
+the server returns **HTTP 503 Service Unavailable** with
+`code: service_unavailable`. Clients may retry later; rewrite slots
+are released as in-flight rewrites complete.
+
+Cedar policy authorization runs **before** admission accounting, so
+denied requests don't consume admission slots.
+
+Today admission gates only the `/change` hot path. `/ingest`,
+`/branches/*`, and `/schema/apply` flow through the unlocked engine
+handle without admission gates; wiring them in is mechanical follow-up
+work tracked in MR-686.
 
 ## Body limits
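
The admission rules described in the new server.md section can be sketched in Rust, the server's language. This is a minimal std-only sketch under stated assumptions, not the `WorkloadController` from this patch. `Admission`, `Limits`, `Reject`, `MutationPermit`, `RewritePermit`, and the one-second `Retry-After` value are all hypothetical. It also assumes that one `Mutex`-guarded counter table is acceptable for illustration and leaves request-size estimation to the caller. The sketch shows per-actor in-flight and byte accounting (rejected with 429), a process-wide rewrite pool (rejected with 503), and RAII permits that return capacity when dropped.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical back-off hint; the real Retry-After value is not documented.
const RETRY_AFTER_SECS: u64 = 1;

// Per-process limits; the defaults mirror the env-var table in server.md.
#[derive(Clone, Copy, Debug)]
pub struct Limits {
    pub per_actor_inflight_max: usize,
    pub per_actor_bytes_max: u64,
    pub global_rewrite_max: usize,
}

impl Default for Limits {
    fn default() -> Self {
        // 16 in-flight mutations, 4 GiB (4 << 30 bytes), 4 rewrite slots.
        Limits { per_actor_inflight_max: 16, per_actor_bytes_max: 4 << 30, global_rewrite_max: 4 }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum Reject {
    TooManyRequests { retry_after_secs: u64 }, // per-actor cap or byte budget hit
    ServiceUnavailable,                        // global rewrite pool exhausted
}

impl Reject {
    // HTTP status each rejection maps to.
    pub fn status(&self) -> u16 {
        match self {
            Reject::TooManyRequests { .. } => 429,
            Reject::ServiceUnavailable => 503,
        }
    }
}

#[derive(Default)]
struct State {
    per_actor: HashMap<String, (usize, u64)>, // actor -> (in-flight count, in-flight bytes)
    rewrites_in_flight: usize,
}

pub struct Admission {
    limits: Limits,
    state: Arc<Mutex<State>>,
}

// RAII permits: dropping one hands its capacity back, including on early return.
pub struct MutationPermit { state: Arc<Mutex<State>>, actor: String, bytes: u64 }
pub struct RewritePermit { state: Arc<Mutex<State>> }

impl Drop for MutationPermit {
    fn drop(&mut self) {
        let mut st = self.state.lock().unwrap();
        let e = st.per_actor.get_mut(&self.actor).expect("a live permit implies an entry");
        e.0 -= 1;
        e.1 -= self.bytes;
        if e.0 == 0 {
            st.per_actor.remove(&self.actor); // forget actors with nothing in flight
        }
    }
}

impl Drop for RewritePermit {
    fn drop(&mut self) {
        self.state.lock().unwrap().rewrites_in_flight -= 1;
    }
}

impl Admission {
    pub fn new(limits: Limits) -> Self {
        Admission { limits, state: Arc::new(Mutex::new(State::default())) }
    }

    // Call only after Cedar has authorized the request, so denials cost no slot.
    pub fn admit_mutation(&self, actor: &str, est_bytes: u64) -> Result<MutationPermit, Reject> {
        let mut st = self.state.lock().unwrap();
        let (count, bytes) = st.per_actor.get(actor).copied().unwrap_or((0, 0));
        let new_bytes = bytes.saturating_add(est_bytes);
        if count >= self.limits.per_actor_inflight_max || new_bytes > self.limits.per_actor_bytes_max {
            return Err(Reject::TooManyRequests { retry_after_secs: RETRY_AFTER_SECS });
        }
        st.per_actor.insert(actor.to_string(), (count + 1, new_bytes));
        Ok(MutationPermit { state: Arc::clone(&self.state), actor: actor.to_string(), bytes: est_bytes })
    }

    // Compaction / index builds share one process-wide pool.
    pub fn admit_rewrite(&self) -> Result<RewritePermit, Reject> {
        let mut st = self.state.lock().unwrap();
        if st.rewrites_in_flight >= self.limits.global_rewrite_max {
            return Err(Reject::ServiceUnavailable);
        }
        st.rewrites_in_flight += 1;
        Ok(RewritePermit { state: Arc::clone(&self.state) })
    }
}
```

In this sketch, a handler calls `admit_mutation` only after Cedar allows the request and holds the permit across the engine call. It then maps any `Reject` to the documented status, `code`, and (for 429) `Retry-After` header. Because counters are keyed by actor, one actor hitting its cap leaves other actors' admissions untouched. One consequence of this simple design is that a single request whose estimate exceeds the whole byte budget is rejected every time, so a real controller would have to decide how to treat oversized requests.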