Commit graph

19 commits

Author SHA1 Message Date
Ragnor Comerford
0976cbebc5
tests: pin /ingest admission gate + 429 Retry-After (red)
Per AGENTS.md rule 8, this commit lands the failing regression test
ahead of the fix. Currently fails on f925ad1: all 8 requests return
200 because /ingest does not call WorkloadController::try_admit.

The test pins:
- /ingest is gated on per-actor admission control (returns 429 when
  the cap is exceeded).
- 429 responses carry the structured `code: too_many_requests` error
  body so clients can distinguish them from generic conflicts.
- 429 responses include a `Retry-After` header so clients can implement
  bounded backoff. The docs at api.rs:343 and lib.rs:344 claim this
  header exists, but the IntoResponse impl currently emits no headers.
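A minimal, framework-free sketch of the response shape the test pins: a 429 whose body carries `code: too_many_requests` and whose headers include `Retry-After`. The `Response` struct, the `ApiError` variants, the 503 `service_unavailable` code string, and the message texts are hypothetical stand-ins, not the server's actual IntoResponse impl.

```rust
use std::collections::BTreeMap;

/// Hypothetical, framework-free stand-in for an HTTP response.
#[derive(Debug)]
pub struct Response {
    pub status: u16,
    pub headers: BTreeMap<String, String>,
    pub body: String,
}

/// Hypothetical error variants; only the 429/503 cases carry a retry hint.
#[derive(Debug)]
pub enum ApiError {
    TooManyRequests { retry_after_secs: u64 },
    ServiceUnavailable { retry_after_secs: u64 },
    Conflict { message: String },
}

impl ApiError {
    pub fn into_response(self) -> Response {
        let mut headers = BTreeMap::new();
        let (status, code, message) = match self {
            ApiError::TooManyRequests { retry_after_secs } => {
                // Delta-seconds form of Retry-After (an HTTP-date is also legal).
                headers.insert("Retry-After".to_string(), retry_after_secs.to_string());
                (429, "too_many_requests", "per-actor admission cap exceeded".to_string())
            }
            ApiError::ServiceUnavailable { retry_after_secs } => {
                headers.insert("Retry-After".to_string(), retry_after_secs.to_string());
                (503, "service_unavailable", "server is overloaded".to_string())
            }
            // Generic conflicts get a distinct code and no retry hint.
            ApiError::Conflict { message } => (409, "conflict", message),
        };
        headers.insert("Content-Type".to_string(), "application/json".to_string());
        // Hand-built JSON to stay dependency-free; assumes `message` needs no escaping.
        let body = format!(r#"{{"code":"{code}","message":"{message}"}}"#);
        Response { status, headers, body }
    }
}
```

A client can then branch on `code` rather than on the status alone, and use `Retry-After` as the floor for its backoff.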

Two follow-up commits will turn this green:
1. Wire WorkloadController::try_admit on /ingest and the four other
   mutating handlers (Block 2.1).
2. Emit the Retry-After header on 429/503 responses (Block 2.2).

The test uses #[serial] + EnvGuard to override
OMNIGRAPH_PER_ACTOR_INFLIGHT_MAX=1 without racing parallel tests, then
spawns 8 concurrent /ingest tasks aligned at a tokio::sync::Barrier so
multiple tasks reach try_admit at nearly the same time. With cap=1, at
least one must be rejected.
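A std-only sketch of per-actor admission and of the test's barrier pattern, assuming a simple in-flight counter per actor released by an RAII permit. `AdmissionController`, `Permit`, `Rejected`, `concurrent_admissions`, and the fixed 1-second retry hint are hypothetical; the real code uses WorkloadController::try_admit and tokio tasks rather than OS threads. A second barrier holds every permit until all tasks have called try_admit, which makes the outcome deterministic: with cap=1, exactly one of 8 tasks is admitted.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// Hypothetical per-actor in-flight limiter; not the real WorkloadController API.
pub struct AdmissionController {
    max_inflight_per_actor: usize,
    inflight: Mutex<HashMap<String, usize>>,
}

/// RAII permit: dropping it frees the actor's slot.
pub struct Permit {
    controller: Arc<AdmissionController>,
    actor: String,
}

/// Rejection carrying the hint a handler would emit as `Retry-After`.
#[derive(Debug, PartialEq)]
pub struct Rejected {
    pub retry_after_secs: u64,
}

impl AdmissionController {
    pub fn new(max_inflight_per_actor: usize) -> Arc<Self> {
        Arc::new(Self { max_inflight_per_actor, inflight: Mutex::new(HashMap::new()) })
    }

    pub fn try_admit(self: &Arc<Self>, actor: &str) -> Result<Permit, Rejected> {
        let mut inflight = self.inflight.lock().unwrap();
        let count = inflight.entry(actor.to_string()).or_insert(0);
        if *count >= self.max_inflight_per_actor {
            return Err(Rejected { retry_after_secs: 1 }); // fixed hint (assumed)
        }
        *count += 1;
        Ok(Permit { controller: Arc::clone(self), actor: actor.to_string() })
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        let mut inflight = self.controller.inflight.lock().unwrap();
        if let Some(count) = inflight.get_mut(&self.actor) {
            *count -= 1;
            if *count == 0 {
                inflight.remove(&self.actor);
            }
        }
    }
}

/// Mirrors the test's shape: `tasks` threads line up on a barrier, call
/// try_admit together, and hold any permit until every thread has tried.
pub fn concurrent_admissions(ctl: &Arc<AdmissionController>, actor: &str, tasks: usize) -> usize {
    let start = Arc::new(Barrier::new(tasks));
    let release = Arc::new(Barrier::new(tasks));
    let handles: Vec<_> = (0..tasks)
        .map(|_| {
            let (ctl, start, release) = (Arc::clone(ctl), Arc::clone(&start), Arc::clone(&release));
            let actor = actor.to_string();
            thread::spawn(move || {
                start.wait();
                let outcome = ctl.try_admit(&actor);
                let admitted = outcome.is_ok();
                release.wait(); // nobody drops a permit before all have tried
                drop(outcome);
                admitted
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).filter(|&ok| ok).count()
}
```

Because every permit is dropped after the second barrier, the actor's slot is free again once `concurrent_admissions` returns.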

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:57:01 +02:00
Ragnor Comerford
c263732b1a
tests: extend same-key insert test with /snapshot row-count assertion
The existing change_concurrent_inserts_same_key_serialize_without_409
test claimed in its comment "asserts the final row count equals N" but
only checked HTTP status codes. Cubic flagged the gap; this commit
adds the actual /snapshot read after the concurrent inserts to verify
all N batches landed (no silent overwrite) by comparing the post-test
node:Person row_count against SEED + N.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:49:38 +02:00
Ragnor Comerford
3b33e9ac56
tests: pin branch_create_from swap-restore race (red)
Per AGENTS.md rule 8, this commit lands the failing regression test
ahead of the fix so the red → green pair is visible in git log.

The test demonstrates that two concurrent `POST /branches` calls with
distinct `from` parents corrupt coordinator state: A's "operate" step
runs against B's swapped coordinator instead of its own, forking the
new branch off the wrong parent's HEAD.

Currently fails on f925ad1 with all 8 gamma branches (declared
parent: alpha, 5 rows) reporting 4 rows — beta's row count. The
operate step ran against beta's coord because B's swap interleaved
between A's swap and A's operate.

Fix lands in the next commit: hold a single `coordinator.write().await`
guard across the entire swap-operate-restore sequence in
`branch_create_from_impl` so the three steps are atomic relative to
other callers.
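A sketch of the fix's shape using `std::sync::RwLock` (the commit uses tokio's `coordinator.write().await`, whose guard can also be held across `.await` points). `Coordinator`, `Graph`, and the per-branch row-count bookkeeping are hypothetical. The point is that swap, operate, and restore all run under one write guard, so no concurrent call can swap in a different parent between A's swap and A's operate.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical coordinator state: the branch it currently points at plus
/// a row count per branch HEAD.
struct Coordinator {
    active_branch: String,
    rows: HashMap<String, usize>,
}

pub struct Graph {
    coordinator: RwLock<Coordinator>,
}

impl Graph {
    pub fn new(active_branch: &str, rows: HashMap<String, usize>) -> Self {
        let coord = Coordinator { active_branch: active_branch.to_string(), rows };
        Self { coordinator: RwLock::new(coord) }
    }

    /// Forks `new_branch` off `from`. One write guard spans all three steps,
    /// so the sequence is atomic relative to other callers.
    pub fn branch_create_from(&self, from: &str, new_branch: &str) {
        let mut coord = self.coordinator.write().unwrap();
        // 1. swap: point the coordinator at the requested parent.
        let previous = std::mem::replace(&mut coord.active_branch, from.to_string());
        // 2. operate: fork from whatever is active *now* (the step that raced).
        let parent_rows = coord.rows.get(&coord.active_branch).copied().unwrap_or(0);
        coord.rows.insert(new_branch.to_string(), parent_rows);
        // 3. restore: put the original branch back before the guard drops.
        coord.active_branch = previous;
    }

    pub fn rows(&self, branch: &str) -> Option<usize> {
        self.coordinator.read().unwrap().rows.get(branch).copied()
    }

    pub fn active_branch(&self) -> String {
        self.coordinator.read().unwrap().active_branch.clone()
    }
}
```

Taking the guard separately for each step would reopen the window the test exercises: another caller's swap could land between step 1 and step 2.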

Closes the bug class "non-atomic three-step coordinator manipulation
under &self callers" rather than guarding the specific call site —
the right architectural seam (single critical section per swap-restore
sequence) eliminates the interleave window for branch_create_from and
any future swap-restore caller.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:44:50 +02:00
Ragnor Comerford
ebf5a5769d
tests: pin UPDATE RYW under in-process concurrency (red)
Per AGENTS.md rule 8, this commit lands the failing regression test
ahead of the fix so the red → green pair is visible in git log.

The test asserts the read-your-writes (RYW) invariant for in-process concurrent UPDATEs on
the same row: exactly one writer commits and N-1 receive 409
manifest_conflict. Currently fails on f925ad1 with 1 x 200 + 7 x 500:

> "storage: Retryable commit conflict for version 6: This Update
>  transaction was preempted by concurrent transaction Update at
>  version 6. Please retry."

Lance's transaction conflict resolver correctly detects the Update vs
Update race, but the error wraps as `OmniError::Lance(<string>)` and the
API surfaces it as 500 internal rather than 409 retryable conflict. Users
see "internal server error" for what is documented as a retryable
conflict path.

The fix lands in the next commit: an op-kind-aware drift check at the
commit_all entry that returns 409 ExpectedVersionMismatch for tables
whose first touch was Update / Delete / SchemaRewrite when the staged
dataset version drifts from the manifest pin under the queue.
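A sketch of the op-kind-aware rule, assuming `read_version` is the table version the staged write was opened on and `manifest_version` is the pin current at commit entry. The `MutationOpKind` variants follow the op kinds named in these commits; `check_drift`, `CommitError`, and `http_status` are hypothetical names.

```rust
/// Op kinds named in these commits (the exact variant set is assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MutationOpKind {
    Insert,
    Merge,
    Update,
    Delete,
    SchemaRewrite,
}

impl MutationOpKind {
    /// Ops that rewrite existing rows must run against the version they read;
    /// Insert and Merge skip the strict check.
    pub fn requires_strict_version(self) -> bool {
        matches!(self, Self::Update | Self::Delete | Self::SchemaRewrite)
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum CommitError {
    ExpectedVersionMismatch { table: String, expected: u64, actual: u64 },
}

impl CommitError {
    /// Surfaced as a retryable conflict rather than a 500.
    pub fn http_status(&self) -> u16 {
        match self {
            CommitError::ExpectedVersionMismatch { .. } => 409,
        }
    }
}

/// Runs at commit entry, before any staged commit is attempted.
pub fn check_drift(
    table: &str,
    first_touch: MutationOpKind,
    read_version: u64,
    manifest_version: u64,
) -> Result<(), CommitError> {
    if first_touch.requires_strict_version() && read_version != manifest_version {
        return Err(CommitError::ExpectedVersionMismatch {
            table: table.to_string(),
            expected: read_version,
            actual: manifest_version,
        });
    }
    Ok(())
}
```

Checking before the storage commit means the Lance-level "Retryable commit conflict" is never reached for these op kinds, so the error that reaches the API is already the typed 409.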

Closes the bug class "Lance internal conflict surfaces as 500 instead
of 409" rather than mapping the specific Lance error variant — the
right architectural layer (engine boundary, under the queue) catches
the drift before commit_staged ever runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 16:33:53 +02:00
Ragnor Comerford
f925ad1739
mr-686: Phase 2 — op-kind-aware version check + coord Mutex → RwLock
Fix A: op-kind-aware ensure_expected_version. Insert/Merge skip the
strict pre-stage check; Update/Delete/SchemaRewrite keep it. New
MutationOpKind enum threaded through open_for_mutation_on_branch /
open_owned_dataset_for_branch_write / reopen_for_mutation and all
callers (execute_insert/update/delete_node/delete_edge,
branch_merge::publish_rewritten_merge_table, schema_apply,
ensure_indices_for_branch, loader Append/Merge/Overwrite). Eliminates
the 77% rejection rate on same-key concurrent inserts.

Fix B: coordinator Mutex -> RwLock. Reads parallelize via .read();
writes serialize via .write(). Atomic-commit invariant preserved by
the single .write() covering commit_manifest_updates +
record_graph_commit.
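A sketch of Fix B's invariant with `std::sync::RwLock`. `GraphCoordinator` here is a hypothetical two-field stand-in, and the two lines inside `publish` only stand in for commit_manifest_updates and record_graph_commit. Readers share the lock; the single write guard makes the pair of updates appear atomic to them.

```rust
use std::sync::RwLock;

/// Hypothetical stand-in: a manifest version and the commit graph that
/// must stay in lockstep with it.
#[derive(Default)]
struct GraphCoordinator {
    manifest_version: u64,
    commit_graph: Vec<u64>,
}

#[derive(Default)]
pub struct Shared {
    coordinator: RwLock<GraphCoordinator>,
}

impl Shared {
    /// Readers take the shared guard, so any number can run in parallel.
    pub fn snapshot(&self) -> (u64, Option<u64>) {
        let c = self.coordinator.read().unwrap();
        (c.manifest_version, c.commit_graph.last().copied())
    }

    /// One exclusive guard covers both updates, so no reader can observe a
    /// manifest that is ahead of the commit graph.
    pub fn publish(&self) -> u64 {
        let mut c = self.coordinator.write().unwrap();
        c.manifest_version += 1; // stands in for commit_manifest_updates
        let v = c.manifest_version;
        c.commit_graph.push(v); // stands in for record_graph_commit
        v
    }
}
```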

Bench-as-test change_concurrent_inserts_same_key_serialize_without_409
(server.rs:2180) spawns 12 concurrent /change inserts on a single
(table, branch); asserts every request returns 200. Was failing
pre-Phase-2; passes post-Phase-2.
change_conflict_returns_manifest_conflict_409 (cross-process drift
sentinel) and branch_merge_conflict_response_includes_structured_conflicts
both still pass.

Bench (after-pr2-phase2):
- single-actor 1x1: 14.9 ops/s, p50 68ms (baseline 12.3, +22%)
- disjoint 8x8:    7.04 ops/s, p50 1023ms (baseline 6.24, +13%)
- same-key 8x1:    2.62 ops/s, 0 errors (after-pr2: 77% errors)

Disjoint improved by only 13%: Fix B's RwLock helped read paths, but the
publisher's .write() critical section still serializes graph-wide.
Splitting GraphCoordinator into per-concern primitives (manifest in
ArcSwap, commit_graph in RwLock, atomic-commit serializer) is the
deferred next step.

102 lib + 30 branching + 24 runs + 16 staged_writes + 63 end_to_end
+ 40 server tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 12:42:26 +02:00
Ragnor Comerford
044ed46019
chore: scrub Linear ticket numbers and review-bot mentions from code comments
OmniGraph is OSS; internal Linear ticket references and code-review-bot
mentions in source-code comments don't help external readers and leak
internal tooling. Replace ticket numbers (MR-XXX) with descriptive
prose, drop linear.app URLs, and remove inline mentions of
Cursor/Bugbot/Cubic/Codex review threads.

Scope is limited to source-code comments (`crates/`). Docs under
`docs/` keep their MR-XXX references — those are part of the
established change-history narrative for in-repo docs and don't
require a Linear account to find context for.

No behavior changes; no public API changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 22:45:38 +02:00
Ragnor Comerford
35be20cb05
MR-771: demote Run to direct-publish via expected_table_versions CAS
mutate_as and load now write directly to target tables and call the
publisher once at the end with per-table expected versions; the Run
state machine, _graph_runs.lance writers, __run__ staging branches,
and server /runs/* endpoints are removed. Multi-statement mutations
remain atomic at the manifest level via an in-memory MutationStaging
accumulator that gives read-your-writes within a query and a single
publish at the end. Concurrent-writer conflicts surface as
ExpectedVersionMismatch (HTTP 409 manifest_conflict) instead of the
old DivergentUpdate merge shape. Documents one known limitation in
docs/runs.md: a multi-statement mid-query failure where op-N writes
a Lance fragment and op-N+1 fails leaves Lance HEAD ahead of the
manifest until a follow-up introduces per-table Lance branches.
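A sketch of the direct-publish flow, assuming a toy manifest keyed by table name. `Manifest`, the fields of `MutationStaging`, and all method names are hypothetical stand-ins for the in-memory accumulator described above: writes are buffered per table, reads merge published and staged rows (read-your-writes within a query), and one publish checks every expected table version before applying anything. A concurrent writer therefore yields an ExpectedVersionMismatch (HTTP 409 manifest_conflict) and leaves the manifest untouched.

```rust
use std::collections::HashMap;

/// Hypothetical published state: table -> (version, rows).
#[derive(Default)]
pub struct Manifest {
    tables: HashMap<String, (u64, Vec<String>)>,
}

impl Manifest {
    pub fn version(&self, table: &str) -> u64 {
        self.tables.get(table).map(|(v, _)| *v).unwrap_or(0)
    }
    pub fn rows(&self, table: &str) -> Vec<String> {
        self.tables.get(table).map(|(_, r)| r.clone()).unwrap_or_default()
    }
}

#[derive(Debug, PartialEq)]
pub struct ExpectedVersionMismatch {
    pub table: String,
    pub expected: u64,
    pub actual: u64,
}

#[derive(Default)]
pub struct MutationStaging {
    expected: HashMap<String, u64>, // version seen when the table was first touched
    pending: HashMap<String, Vec<String>>,
}

impl MutationStaging {
    pub fn insert(&mut self, m: &Manifest, table: &str, row: &str) {
        self.expected.entry(table.to_string()).or_insert_with(|| m.version(table));
        self.pending.entry(table.to_string()).or_default().push(row.to_string());
    }

    /// Read-your-writes: published rows followed by this query's staged rows.
    pub fn read(&self, m: &Manifest, table: &str) -> Vec<String> {
        let mut rows = m.rows(table);
        if let Some(staged) = self.pending.get(table) {
            rows.extend(staged.iter().cloned());
        }
        rows
    }

    /// Single publish: compare every expected version first (CAS), then apply.
    pub fn publish(self, m: &mut Manifest) -> Result<(), ExpectedVersionMismatch> {
        for (table, &expected) in &self.expected {
            let actual = m.version(table);
            if actual != expected {
                return Err(ExpectedVersionMismatch { table: table.clone(), expected, actual });
            }
        }
        for (table, rows) in self.pending {
            let entry = m.tables.entry(table).or_insert((0, Vec::new()));
            entry.0 += 1;
            entry.1.extend(rows);
        }
        Ok(())
    }
}
```

This toy keeps staged rows in memory, so it does not model the documented limitation: in the real system op-N has already written a Lance fragment before op-N+1 fails.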

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 08:52:50 +02:00
Andrew Altshuler
7310f69928
Revert "Merge pull request #49 from ModernRelay/ragnorc/x-request-id" (#54)
This reverts commit b352fca13c, reversing
changes made to 748ad334a9.
2026-04-26 15:56:29 +03:00
Ragnor Comerford
284c9377c2
Add X-Request-Id middleware
Per-request ULID minted at the edge, exposed in request extensions and
on the response header. Caller-supplied X-Request-Id is echoed when
well-formed (1..=128 ASCII printable characters); otherwise rejected
and replaced with a fresh ULID so the value is always safe to log.
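A sketch of the header rule, assuming "printable" means bytes 0x20..=0x7E (space through tilde). Minting a ULID needs an external crate, so the minter is passed in as a function; `resolve_request_id` and `is_well_formed` are hypothetical names.

```rust
/// Echoes a well-formed caller-supplied id; otherwise mints a fresh one.
pub fn resolve_request_id(header: Option<&str>, mint: impl FnOnce() -> String) -> String {
    match header {
        Some(v) if is_well_formed(v) => v.to_string(),
        _ => mint(),
    }
}

/// 1..=128 characters, all printable ASCII. Because every accepted byte is
/// ASCII, the byte length equals the character count.
fn is_well_formed(v: &str) -> bool {
    (1..=128).contains(&v.len()) && v.bytes().all(|b| (0x20..=0x7e).contains(&b))
}
```

Rejecting control characters (for example `\n`) is what makes the echoed value safe to write into a log line verbatim.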

Companion to the TypeScript SDK redesign — clients now correlate logs
across the wire by reading X-Request-Id from response headers (and the
SDK already surfaces it on every OmnigraphError as `requestId`).

No spec change required; the header is a transport-layer concern.

Tests:
- mint a ULID when no header is provided
- echo a valid caller-supplied id
- reject overlong header (200 chars), mint a fresh ULID

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 22:56:17 +02:00
Ragnor Comerford
a157f6a17c
Fold openapi.json auto-sync into main CI test job
The separate openapi-sync workflow was duplicating the workspace build
(~15 min cold-cache compile), paying the cost twice per PR. Fold the
regen + auto-commit into the existing test job: one compile, shared
rust-cache, same drift-check semantics.

- Same-repo PRs: OMNIGRAPH_UPDATE_OPENAPI=1 during the test run, then
  commit the regenerated spec back to the PR branch
- Fork PRs / pushes: env var empty, test stays in strict drift-check mode
- openapi_spec_is_up_to_date treats empty env value as unset, so the
  conditional workflow env expression works
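A sketch of the "empty means unset" check. Treating any non-empty value as enabling update mode is an assumption; the commit only states that `1` enables it and that an empty value must behave like an unset variable.

```rust
/// Update mode is on only for a non-empty value; `Some("")` (what the
/// conditional workflow expression yields on fork PRs and pushes) counts as unset.
pub fn update_mode_from(value: Option<&str>) -> bool {
    matches!(value, Some(v) if !v.is_empty())
}

/// Reads the variable named in the commit.
pub fn update_mode() -> bool {
    update_mode_from(std::env::var("OMNIGRAPH_UPDATE_OPENAPI").ok().as_deref())
}
```

Splitting the pure check from the environment read keeps it testable without mutating process-wide environment state.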

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 21:00:46 +02:00
Ragnor Comerford
9de2079263
Merge remote-tracking branch 'origin/main' into ragnorc/explore-api
# Conflicts:
#	CONTRIBUTING.md
2026-04-18 20:24:39 +02:00
andrew
c338e80180
Harden bearer auth: constant-time compare, hashed at rest, authoritative actor_id
Fixes two live authz bugs in omnigraph-server:

- Bearer-token lookup previously used HashMap::get, which compares keys with
  Eq and short-circuits on the first differing byte — a network-observable
  timing oracle for brute-forcing tokens. Tokens are now stored as SHA-256
  digests and compared with subtle::ConstantTimeEq, iterating every entry
  unconditionally so total work is independent of which slot matches. Raw
  token bytes no longer live in server memory after startup.

- authorize_request now overwrites PolicyRequest.actor_id from the
  authenticated session instead of trusting the handler-supplied field,
  which previously defaulted to "" via unwrap_or_default(). The empty
  string can no longer reach Cedar as a policy subject even if a future
  refactor drops the None check.
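A std-only sketch of the lookup loop, assuming tokens have already been hashed to 32-byte SHA-256 digests (std has no SHA-256). The hand-rolled `ct_eq` stands in for `subtle::ConstantTimeEq` for illustration only: the compiler is free to reintroduce early exits in such loops, which is why the commit uses `subtle`. `lookup_actor`, `bind_actor`, and the `action` field are hypothetical.

```rust
/// SHA-256 digest size.
pub type Digest = [u8; 32];

/// XOR-accumulates every byte instead of returning at the first difference.
fn ct_eq(a: &Digest, b: &Digest) -> bool {
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

/// Compares against *every* entry, whether or not an earlier one matched,
/// so the number of comparisons does not depend on which slot matches.
/// (The assignment still branches on the result; `subtle` avoids even that.)
pub fn lookup_actor<'a>(table: &'a [(Digest, String)], presented: &Digest) -> Option<&'a str> {
    let mut found = None;
    for (digest, actor_id) in table {
        if ct_eq(digest, presented) {
            found = Some(actor_id.as_str());
        }
    }
    found
}

pub struct PolicyRequest {
    pub actor_id: String,
    pub action: String,
}

/// Overwrites whatever the handler supplied with the authenticated actor,
/// so an empty default can never reach the policy engine.
pub fn bind_actor(mut req: PolicyRequest, session_actor: &str) -> PolicyRequest {
    req.actor_id = session_actor.to_string();
    req
}
```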

External API of AppState constructors is unchanged — tokens still enter as
Vec<(String, String)> and are hashed on the way in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 01:41:02 +03:00
andrew
be520f31f4
Polish schema endpoint: rename show, align field name, add tests
Review feedback on #23, applied on top of the original commit:

- Rename the CLI subcommand from `schema get` to `schema show` to match
  the existing `run show` / `commit show` convention. A `#[command(alias
  = "get")]` preserves muscle memory for anyone who already typed `get`.
- Rename `SchemaGetOutput` → `SchemaOutput` and its field `source` →
  `schema_source`, so the get response and the apply request use the
  same field name for the same concept.
- Use `println!` instead of `print!` in the CLI so the shell prompt
  doesn't land on the last line of schema output.
- Add three integration tests on `/schema`: happy path (no auth),
  401 when bearer is required but missing, 403 when the policy grants
  the actor branch_create but not read.

Follow-ups left for a separate PR: include `schema_ir_hash` and
`schema_identity_version` in the response payload so clients can do
drift detection and the server can set an ETag; and a fast-path local
read that skips `Omnigraph::open()` when only the schema source is
needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 00:30:46 +03:00
Ragnor Comerford
228032a4ac
Add static OpenAPI spec and Stainless SDK config
Introduce SDK generation scaffolding: commit a static openapi.json
extracted from the Utoipa annotations via a golden-file test, add
Stainless workspace/config for TypeScript and Python SDKs, and clean
up operation IDs for ergonomic generated method names.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:26:31 +02:00
Claude
0c4df674fa
Add schema get command to CLI and HTTP API
Exposes the existing schema_source() method via a new `omnigraph schema get`
CLI subcommand and a `GET /schema` API endpoint, allowing users to retrieve
the current accepted schema from any graph repository.

https://claude.ai/code/session_01UYybeBQks3fz3RJrTHtwQw
2026-04-16 21:15:17 +00:00
Claude
4c07d3c095
Make /openapi.json reflect runtime auth configuration
The served OpenAPI spec now matches runtime behavior: when no bearer
tokens or policy are configured (open mode), the spec omits security
schemes and per-operation security requirements. When auth is active,
the full bearer_token security metadata is included.

Also fixes SecurityAddon to initialize components if absent, and
removes the redundant utoipa dev-dependency.
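A sketch of serving a spec that matches the runtime auth mode, using hypothetical minimal types (utoipa's real `OpenApi` and `Components` types differ). `add_bearer_security` shows the SecurityAddon fix pattern: `Option::get_or_insert_with` creates `components` when absent instead of assuming it exists. This version adds the security metadata only in auth mode; the real code may instead strip it in open mode.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal spec model; utoipa's real types are richer.
#[derive(Debug, Default)]
pub struct Components {
    pub security_schemes: BTreeMap<String, String>,
}

#[derive(Debug, Default)]
pub struct Operation {
    pub security: Vec<String>,
}

#[derive(Debug, Default)]
pub struct Spec {
    pub components: Option<Components>,
    pub operations: BTreeMap<String, Operation>,
}

/// Open mode means no bearer tokens and no policy are configured.
pub fn auth_enabled(bearer_tokens: usize, has_policy: bool) -> bool {
    bearer_tokens > 0 || has_policy
}

/// SecurityAddon-style step: create `components` if it is absent, register
/// the scheme, then require it on every operation.
fn add_bearer_security(spec: &mut Spec) {
    spec.components
        .get_or_insert_with(Components::default)
        .security_schemes
        .insert("bearer_token".to_string(), "http bearer".to_string());
    for op in spec.operations.values_mut() {
        op.security = vec!["bearer_token".to_string()];
    }
}

/// Returns the spec to serve at /openapi.json for the current runtime mode.
pub fn served_spec(mut spec: Spec, auth: bool) -> Spec {
    if auth {
        add_bearer_security(&mut spec);
    }
    spec
}
```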

Adds 5 new tests covering open-mode vs auth-mode spec serving.

https://claude.ai/code/session_01NfoPVx21rZUQned1f7WpXY
2026-04-12 11:04:13 +00:00
Claude
859ec9faa8
Add OpenAPI spec generation via utoipa with /openapi.json endpoint
Integrate utoipa 5 to auto-generate an OpenAPI 3.1 spec from the existing
Axum handlers and serde types. All 16 endpoints are annotated with path
metadata, request/response schemas, security requirements, and tags. A
public /openapi.json endpoint serves the spec without requiring auth.

Includes 59 tests covering path completeness, HTTP methods, schema fields,
enum variants, security scheme, path/query parameters, request bodies,
response references, and endpoint integration.

https://claude.ai/code/session_01NfoPVx21rZUQned1f7WpXY
2026-04-12 11:03:23 +00:00
andrew
92fa3189f7
Add schema apply command and policy support
2026-04-12 04:01:14 +03:00
andrew
338289656a
Initial public Omnigraph repository
2026-04-10 20:49:41 +03:00