Closes the cubic P2 finding on commit 22d76db: `Semaphore::new(concurrency.max(1))`
silently coerced --heavy-concurrency=0 to 1, so the JSON output reported
0 while execution actually used 1. Reported settings differed from
actual.
Adds an explicit `--heavy-concurrency > 0` check in `main()` (with a
helpful error message pointing to --heavy-batches=0 as the way to
disable heavy traffic) and a defensive `assert!()` inside
`drive_heavy_actor` so future callers can't pass 0 silently.
Verified: `bench_actor_isolation --heavy-concurrency 0` exits with
code 2 and the explanatory error message.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two cubic findings on bench_actor_isolation.rs flagged together:
P2 (lib.rs:202): `unsafe { std::env::set_var(...) }` ran inside
`#[tokio::main] async fn main()` AFTER the multi-thread tokio runtime
was up. Rust 2024 made `set_var` unsafe because libc's `setenv` is
not thread-safe; concurrent env reads from logging or runtime
internals can race or read torn state.
Fix (correct by design, AGENTS.md rule 9): add a public
`AppState::new_with_workload(uri, db, bearer_tokens, workload)`
constructor that takes a caller-built `WorkloadController`. Tests and
benches override per-actor caps via the constructor instead of
mutating global env. Closes the bug class "tests need to mutate
global env to override AppState defaults."
P2 (lib.rs:130): heavy actor's `oneshot.await` inside the loop
serialized — heavy in-flight count was always 1, so cap=1 never
tripped on the heavy side. The bench validated isolation (light p99
bounded) but didn't demonstrate the rejection path.
Fix: add a `--heavy-concurrency` arg (default 4) and spawn batches
as concurrent tokio tasks bounded by an internal semaphore. With
heavy_concurrency=4 and inflight_cap=1, the bench now reports
heavy_too_many_requests > 0 and heavy_ok == 1 at peak — proving the
gate fires for the heavy actor.
Sample run on local FS (4 light actors × 30 ops, 20 heavy batches ×
50 rows, heavy_concurrency=4, cap=1):
heavy_ok: 1
heavy_too_many_requests: 19
light_ok: 120
light_too_many_requests: 0
light_p99: 565 ms (target < 2 s)
Heavy saturates its own cap; light actors are completely unaffected.
The isolation property is now empirically proven by the rejection
counts rather than just by the latency tail.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Empirical proof of MR-686's central design promise: per-actor
admission control isolates noisy actors from light traffic. The
existing bench_concurrent_http harness measures aggregate throughput;
this harness measures the latency tail seen by light actors while a
heavy actor saturates its own per-actor cap.
Setup: one "heavy" actor flooding /ingest with multi-row NDJSON
batches; N "light" actors each running short bursts of /change
inserts, each authenticating with a distinct bearer token so the
WorkloadController accounts them as separate identities.
Output: heavy throughput / 429 count, light p50/p95/p99/max latency.
Acceptance heuristic on local FS: light-actor p99 < 2 s while the
heavy actor saturates its own cap.
Sample run on local FS, cap=1, 4 light actors x 30 ops, 20 heavy
batches x 50 rows: light p99 = 710 ms, light errors = 0 (well under
the 2 s acceptance target). The test demonstrates the isolation
property — the heavy /ingest holds its own admission slot but
doesn't affect light actors since they have separate per-actor
state.
Usage:
cargo run --release -p omnigraph-server --example bench_actor_isolation -- \
--light-actors 4 --light-ops-per-actor 30 \
--heavy-batches 20 --heavy-rows-per-batch 50 \
--inflight-cap 1 \
--output .context/bench-results/after-pr2-phase2/actor-isolation.json
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bundles the working-tree state from the prior session (PR 0 bench harness,
PR 1a audit_actor_id removal, PR 1b WriteQueueManager + writer integration)
together with the first half of PR 2's interior-mutability foundation
(catalog and schema_source wrapped in Arc<ArcSwap<...>>). The two streams
intermix in 7 of the same files, so splitting via git add -p was
impractical. Subsequent PR 2 steps land as separate atomic commits.
PR 0 — server-level concurrent /change bench harness
- crates/omnigraph-server/examples/bench_concurrent_http.rs (new)
- .context/bench-results/{baseline-main,after-pr1}/ (gitignored)
PR 1a — drop the audit_actor_id field, thread per-call
- removed Omnigraph::audit_actor_id and the swap-restore patterns in
mutation.rs, merge.rs, loader/mod.rs
- actor_id: Option<&str> threaded through MutationStaging::finalize,
mutate_with_current_actor, ingest_with_current_actor,
branch_merge_impl, branch_merge_on_current_target,
commit_prepared_updates*, record_merge_commit,
commit_updates_on_branch_with_expected
- apply_schema and ensure_indices_for_branch pass None (system-attributed)
PR 1b — per-(table_key, branch) write queue + revalidation + sidecar
- new crates/omnigraph/src/db/write_queue.rs with WriteQueueManager,
acquire/acquire_many, sorted+deduped acquisition; 6 unit tests
- Arc<WriteQueueManager> field on Omnigraph + db.write_queue() accessor
- MutationStaging::finalize split into stage_all (Phase A, no queue)
and StagedMutation::commit_all (Phase B, acquire_many + revalidate
pins + sidecar + commit_staged); guards held across publisher
- delete-only mutations now emit recovery sidecars; revalidation
extended to inline_committed tables
- branch_merge_on_current_target, apply_schema_with_lock, and
ensure_indices_for_branch acquire per-table queues for their
touched tables
PR 2 Step B (partial) — catalog and schema_source via ArcSwap
- catalog: Catalog -> Arc<ArcSwap<Catalog>>
- schema_source: String -> Arc<ArcSwap<String>>
- public accessors return Arc<Catalog> / Arc<String>; readers bind
locally where the borrow has to outlive an expression
- new pub(crate) store_catalog / store_schema_source helpers replace
the field assignments in apply_schema and reload_schema_if_source_changed
- 117 tests across lifecycle/end_to_end/branching/runs pass; engine
lib + workspace compile clean
Coordinator wrap (Mutex) and the &mut self -> &self engine API
conversion follow in subsequent commits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>