PR 7 of the MR-668 multi-graph server work. Operators can now add a
graph to a running multi-graph server without restarting:
curl -X POST http://server/graphs \
-H "Content-Type: application/json" \
-d '{
"graph_id": "beta",
"uri": "/data/beta.omni",
"schema": { "source": "node Person { name: String @key }\n" },
"policy": { "file": "./policies/beta.yaml" }
}'
DELETE remains deferred (out of v0.7.0 scope per the trimmed plan —
no `delete_prefix`, no tombstones).
Body shape (decision 7):
- Nested `schema: { source: "..." }` (mirrors the `policy: { file }`
pattern; leaves room for future fields without breakage).
- Optional nested `policy: { file: "..." }` for per-graph Cedar.
- 32 MiB body limit (reuses `INGEST_REQUEST_BODY_LIMIT_BYTES`).
- Asymmetric with `SchemaApplyRequest` which keeps flat
`schema_source: String` — documented in api.rs.
Atomic YAML rewrite + drift detection:
- New `config::rewrite_atomic(path, new_config, expected_hash)`:
flock → re-read + hash check → serialize → write `.tmp` → fsync
→ rename → fsync parent dir. Returns the new hash for the caller
to update its in-memory baseline.
- New `config::hash_config_file(path)` — SHA-256 of the on-disk
bytes, used at startup and after each rewrite.
- New `RewriteAtomicError { Drift | Io | Serialize }` enum.
- `AppState.config_hash: Option<Arc<Mutex<[u8;32]>>>` carries the
in-memory baseline. Updated after every successful rewrite so
subsequent POSTs don't false-trigger drift.
- The mutex is `std::sync::Mutex` (brief critical section, no .await
inside). The flock itself serializes file access process-wide
AND across multiple server instances (defense in depth).
- All sync I/O runs inside `tokio::task::spawn_blocking` — flock
is sync.
Handler ordering (the load-bearing sequence):
1. Mode check: 405 in single mode.
2. Cedar authorize: `GraphCreate` against `Omnigraph::Server::"root"`.
3. Validate body: `GraphId::try_from` (regex + reserved-name), empty
schema/uri checks, per-graph policy file parse.
4. Pre-check registry for duplicate graph_id / duplicate uri (409).
5. `Omnigraph::init` the new engine.
6. Atomic YAML rewrite (drift detection inside).
7. Publish in registry (atomic re-check via `GraphRegistry::insert`).
Failure modes (documented in handler rustdoc):
- Init fails → orphan storage at `req.uri` (PR 2a cleans up schema
files; Lance datasets remain orphans until `delete_prefix` lands).
- YAML rewrite fails (drift, IO) → orphan storage; YAML unchanged.
- Registry insert fails (race) → YAML has entry but registry doesn't;
next restart opens it cleanly.
New dependency: `fs2 = "0.4"` (workspace + omnigraph-server). POSIX-only
file locking. Linux/macOS deployment supported; Windows out of scope.
Tests (10 new in `tests/server.rs::multi_graph_startup`):
- `post_graphs_creates_a_new_graph_end_to_end` — happy path, includes
YAML inspection to confirm the rewrite landed.
- `post_graphs_baseline_hash_updates_between_rewrites` — two POSTs in
a row both succeed (drift baseline updates correctly).
- `post_graphs_duplicate_graph_id_returns_409`
- `post_graphs_duplicate_uri_returns_409`
- `post_graphs_invalid_graph_id_returns_400` (reserved name)
- `post_graphs_empty_schema_source_returns_400`
- `post_graphs_returns_405_in_single_mode`
- `post_graphs_yaml_drift_detection_returns_503` — operator hand-edits
omnigraph.yaml; server refuses to clobber.
- `hash_config_file_is_deterministic_and_detects_changes`
- `rewrite_atomic_refuses_when_hash_drifts`
OpenAPI: `server_graphs_create` registered in `ApiDoc::paths(...)`;
openapi.json regenerated.
Result: 225 server tests green (74 lib + 66 openapi + 85 integration),
all MR-731 regressions still pinned.
LOC: ~580 lib.rs net (handler + helpers), ~120 config.rs (rewrite
machinery), +71 api.rs (request/response shapes), +332 tests/server.rs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR 6b of the MR-668 multi-graph server work. First management endpoint —
`GET /graphs` lists every graph registered with the server, gated by the
server-level Cedar policy from PR 6a.
New API shapes (in `omnigraph-server::api`):
- `GraphInfo { graph_id, uri }` — one entry per registered graph.
- `GraphListResponse { graphs: Vec<GraphInfo> }` — sorted alphabetically
by `graph_id` for deterministic output.
Handler `server_graphs_list`:
- Mounted at `GET /graphs` in both modes.
- Single mode: returns 405 (resource exists in the API surface, just
not operational without a `graphs:` map). 405 chosen over 404 so
clients see "resource exists, wrong context" rather than "no such
resource".
- Multi mode: requires bearer auth (when configured); Cedar-gated by
`PolicyAction::GraphList` against `Omnigraph::Server::"root"`
(PR 6a's chassis). Returns the sorted registry list.
Cedar gate composition:
- When no `server.policy.file` is configured, the MR-723 default-deny
falls through: `GraphList` is not `Read`, so an authenticated actor
without a server policy gets 403. This is the right default — don't
expose the registry until the operator explicitly authorizes it.
- When a server policy is configured, Cedar evaluates the rule. The
test `get_graphs_with_server_policy_authorizes_per_cedar` pins the
admin-allow / viewer-deny split.
Routing:
- New `management` sub-router holding `/graphs` (auth-required, no
`resolve_graph_handle` middleware — operates on the registry, not
a single graph).
- Single mode merges flat protected routes + management.
- Multi mode merges nested `/graphs/{graph_id}/...` + management.
OpenAPI:
- `server_graphs_list` registered in `ApiDoc::paths(...)`.
- `EXPECTED_PATHS` in `tests/openapi.rs` gains `/graphs`.
- `openapi.json` regenerated (auto-tracked by
`openapi_spec_is_up_to_date` in CI).
Tests: 4 new in `tests/server.rs::multi_graph_startup`:
- `get_graphs_lists_registered_graphs_in_multi_mode`
- `get_graphs_returns_405_in_single_mode`
- `get_graphs_requires_bearer_auth_when_configured`
- `get_graphs_with_server_policy_authorizes_per_cedar`
What's NOT in this PR (deferred):
- Per-graph policy enforcement is wired through `handle.policy`
(PR 4a already did this); PR 6b doesn't add new per-graph
behavior beyond making sure the server policy lookup composes
cleanly alongside it.
- `POST /graphs` (PR 7) and `DELETE /graphs/{id}` (out of scope
for v0.7.0).
- CLI `omnigraph graphs list` (PR 8 will add).
Result: 215 server tests green (74 lib + 66 openapi + 75 integration),
11 policy tests green. MR-731 spoof regression preserved across all
this work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* gitignore: exclude docs/internal/ from publication
Mirrors the existing "Local-only working files (not for the public
repo)" pattern. Working notes filed under docs/internal/ stay on the
contributor's machine instead of cluttering the published doc tree
or tripping the AGENTS.md / docs-index cross-link check
(scripts/check-agents-md.sh enumerates every docs/*.md and requires
each one to be linked from an audience index — internal notes don't
have an audience index by definition).
Incidental to the v0.5.0 release; lands separately from the version
bump commits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci: skip docs/internal/ in agents-md cross-link check
Matches the .gitignore exclusion. Mirrors the existing 'docs/releases/'
exclusion pattern: notes under docs/internal/ aren't part of the
published doc tree and don't need to be linked from an audience index.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* release: v0.5.0 — Lance 6 substrate, Cedar policy engine, schema-lint v1
Bumps the workspace from 0.4.2 to 0.5.0. Release notes at
docs/releases/v0.5.0.md.
Three user-visible pillars motivate the minor bump:
1. Lance 6.0.1 substrate (DataFusion 52→53, Arrow 57→58)
2. Engine-wide Cedar policy enforcement on every _as writer; server
defaults to deny-all; signed-token-claim-only actor identity
3. Schema-lint v1 chassis: OG-XXX-NNN codes, soft drops, and
`--allow-data-loss` (Hard mode) for destructive migrations
Plus structured DataFusion Expr filter pushdown (unblocks
CompOp::Contains via array_has), HTTP allow_data_loss parity, inline
.gq sources on CLI/HTTP, optional CORS layer, and bug fixes
(merge-insert dup-rowid, branch-merge coordinator restore on error,
blob columns in branch merge).
Sites bumped:
- 5 crate [package].version lines (omnigraph, omnigraph-cli,
omnigraph-compiler, omnigraph-policy, omnigraph-server)
- 10 internal path-dep `version = "..."` constraints across the
four manifests that depend on sister crates (engine, server, cli,
plus engine's dev-dep on the compiler)
- Cargo.lock (regenerated via cargo update --workspace)
- AGENTS.md "Version surveyed:"
- openapi.json `info.version` (regenerated via
OMNIGRAPH_UPDATE_OPENAPI=1 cargo test -p omnigraph-server --test
openapi)
Verification:
- cargo test --workspace --locked: 907/907 green
- cargo test -p omnigraph-engine --test failpoints --features
failpoints: 19/19 green
- cargo test -p omnigraph-engine --test lance_surface_guards: 3/3
- scripts/check-agents-md.sh: clean
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The schema-lint chassis v1.2 (PR #100) shipped `--allow-data-loss` on
the CLI, but `SchemaApplyRequest` had no equivalent field — Hard-mode
drops were CLI-only. This commit closes that feature gap and adds e2e
test coverage for drop modes across HTTP + CLI, plus data preservation
on additive apply, plus a CLI↔SDK plan-parity assertion.
Feature gap closed:
- `crates/omnigraph-server/src/api.rs` — added `allow_data_loss: bool`
(default false via `#[serde(default)]`) to `SchemaApplyRequest`.
Added `Default` derive so test usages can use `..Default::default()`.
- `crates/omnigraph-server/src/lib.rs` — `server_schema_apply` now
constructs `SchemaApplyOptions { allow_data_loss: request.allow_data_loss }`
and threads through to `apply_schema_as`.
- `crates/omnigraph-cli/src/main.rs` — remote-URI schema-apply path
used to bail with "--allow-data-loss not yet supported on remote";
now forwards the flag into the JSON payload so the CLI behaves
identically against local and remote URIs.
- `openapi.json` — regenerated; only diff is the new field on
`SchemaApplyRequest`.
Tests added (8 new):
* `crates/omnigraph-server/tests/server.rs` (+5):
- `schema_apply_route_soft_drops_property_via_http` — POST schema
removing nullable property, verify catalog reflects the drop AND
`snapshot_at_version(pre)` still has `age` in the field list
(time-travel reachability is the Soft contract).
- `schema_apply_route_soft_drops_node_type_via_http` — POST schema
removing `Company` node + cascading `WorksAt` edge.
- `schema_apply_route_hard_drops_property_with_allow_data_loss` —
POST with `allow_data_loss: true`, verify plan step reports
`mode: hard`.
- `schema_apply_route_keeps_drops_soft_without_flag` — same schema
without flag, verify `mode: soft`. Pins default semantics against
accidental Hard promotion.
- `schema_apply_route_additive_property_preserves_existing_rows` —
load fixture, POST adding nullable property, verify row count
preserved (SDK suite covers data preservation on drops + renames;
additive AddProperty wasn't pinned).
Plus helpers `schema_without_age` and `schema_without_company`.
* `crates/omnigraph-cli/tests/cli.rs` (+3):
- `schema_apply_allow_data_loss_flag_promotes_drops_to_hard` — CLI
`omnigraph schema apply --allow-data-loss --schema X.pg --json`,
verify plan step has `mode: hard`.
- `schema_apply_without_allow_data_loss_keeps_soft_drops` — without
flag, verify Soft.
- `schema_plan_parity_cli_and_sdk` — same `.pg` source through
`Omnigraph::plan_schema` (SDK) and `omnigraph schema plan --json`
(CLI), assert the steps array is byte-identical post-JSON. HTTP
has no `/schema/plan` endpoint; apply-side parity is implicitly
covered by the HTTP drop tests + CLI drop tests using identical
fixtures.
Docs:
- `docs/user/schema-language.md` — new "Destructive drops" section
documenting Soft vs Hard semantics and that `allow_data_loss` is
now honored uniformly across CLI / HTTP / SDK.
Verification: every new test passes; full `cargo test --workspace --locked`
green; `scripts/check-agents-md.sh` passes.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Cursor Bugbot LOW on commit 3ad359d: try_admit_rewrite is defined and
tested but no HTTP handler calls it; the six handler OpenAPI
annotations declared status = 503 (added in 8e1a8e7) but try_admit
(the only path handlers invoke) returns 429 only. 503 was unreachable.
Fix: remove (status = 503, ...) from the six handler OpenAPI
annotations and regenerate openapi.json. Kept as forward-looking
infrastructure: try_admit_rewrite, global rewrite semaphore,
RejectReason::GlobalRewriteExhausted, ApiError::ServiceUnavailable,
the 503 branch in IntoResponse, --global-rewrite-cap, and
OMNIGRAPH_GLOBAL_REWRITE_MAX. When a future commit wires
try_admit_rewrite into a handler, the 503 OpenAPI annotation lands
alongside that wiring.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the cubic finding (P2) at lib.rs:1061: the new admission gates
add HTTP 429 / 503 failure paths but the affected endpoint
`#[utoipa::path(... responses(...) ...)]` annotations weren't updated.
Also closes a pre-existing miss on /change (admission-gated since
PR 2 Step F).
Adds (status = 429, ...) and (status = 503, ...) to all six
admission-gated endpoints:
- POST /change (operation_id = "change")
- POST /schema/apply (operation_id = "applySchema")
- POST /ingest (operation_id = "ingest")
- POST /branches (operation_id = "createBranch")
- DELETE /branches/{branch} (operation_id = "deleteBranch")
- POST /branches/merge (operation_id = "mergeBranches")
The descriptions reference the `Retry-After` header, which the
`IntoResponse for ApiError` impl emits on both codes (added in
commit c745dd6).
openapi.json regenerated via OMNIGRAPH_UPDATE_OPENAPI=1; the openapi
sentinel test passes both with the regen flag and in strict-check
mode.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The substantive PR 2 change. Removes the global server `RwLock<Omnigraph>`
that has serialized every mutating request across all actors. Disjoint
`(table, branch)` writes from different actors now run concurrently,
guarded only by the engine's per-(table, branch) write queue (PR 1b)
and per-actor admission control (PR 2 Step E).
AppState changes:
- `db: Arc<RwLock<Omnigraph>>` -> `engine: Arc<Omnigraph>`
- New field: `workload: Arc<workload::WorkloadController>` initialized
from env (`OMNIGRAPH_PER_ACTOR_INFLIGHT_MAX=16`,
`OMNIGRAPH_PER_ACTOR_BYTES_MAX=4GiB`,
`OMNIGRAPH_GLOBAL_REWRITE_MAX=4`).
- `tokio::sync::RwLock` import dropped.
Handler updates (16 sites):
- All `Arc::clone(&state.db).read_owned().await` and `write_owned()`
calls replaced with `let db = &state.engine`. Engine APIs are now
`&self` (Step C) so this works directly.
- `/export` clones `Arc<Omnigraph>` once and moves into the spawned
task instead of acquiring a long-held read lock.
- `/change` handler additionally wires
`state.workload.try_admit(&actor_arc, est_bytes)`. Cedar runs FIRST
so denied requests don't consume admission slots; admission runs
SECOND before the engine call. `est_bytes` uses the request body
size as a coarse proxy.
API surface additions (`api::ErrorCode`):
- `TooManyRequests` -> HTTP 429 (per-actor cap exceeded; respect
`Retry-After`)
- `ServiceUnavailable` -> HTTP 503 (global rewrite pool exhausted)
`ApiError` constructors `too_many_requests` / `service_unavailable` and
`from_workload_reject` (maps `RejectReason` variants to HTTP status).
Other mutating handlers (`/ingest`, `/branches/*`, `/branches/merge`,
`/schema/apply`) currently flow through the Arc<Omnigraph> path
without admission gates; wiring those is mechanical and lands as a
follow-up. The /change hot path covers the bulk of MR-686's load
profile.
OpenAPI regenerated to include the new ErrorCode variants.
102 lib + 39 server tests + 5 workload tests pass. The regression
sentinel `change_conflict_returns_manifest_conflict_409` continues
to pass (revalidation perf opt + per-table queue + publisher CAS
preserve manifest_conflict semantics under concurrent writers).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mutate_as and load now write directly to target tables and call the
publisher once at the end with per-table expected versions; the Run
state machine, _graph_runs.lance writers, __run__ staging branches,
and server /runs/* endpoints are removed. Multi-statement mutations
remain atomic at the manifest level via an in-memory MutationStaging
accumulator that gives read-your-writes within a query and a single
publish at the end. Concurrent-writer conflicts surface as
ExpectedVersionMismatch (HTTP 409 manifest_conflict) instead of the
old DivergentUpdate merge shape. Documents one known limitation in
docs/runs.md: a multi-statement mid-query failure where op-N writes
a Lance fragment and op-N+1 fails leaves Lance HEAD ahead of the
manifest until a follow-up introduces per-table Lance branches.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add operation descriptions and examples to utoipa annotations so the
generated TypeScript SDK has rich JSDoc, and so future Python/Go SDKs
and any /openapi.json docs UI benefit from the same effort.
- Doc comments on all 18 handlers (utoipa picks up summary/description)
- #[schema(example = ...)] on free-text fields (query_source,
schema_source, NDJSON data) and i64 timestamps
- Destructive/irreversible warnings on change, applySchema, ingest,
mergeBranches, deleteBranch, publishRun, abortRun
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* Parallel per-type load writes + omnigraph optimize/cleanup CLI
## MR-677.3 — parallel per-type load writes
The load path already groups records into one RecordBatch per type and
makes one Lance commit per table (loader::mod.rs:249-..), but those
commits ran sequentially. Wrap node and edge write loops in
`futures::stream::buffered(N)` against a new helper
`write_batches_concurrently`. Concurrency tunable via
`OMNIGRAPH_LOAD_CONCURRENCY` (default 8).
## MR-676 — `omnigraph optimize` and `omnigraph cleanup`
New CLI subcommands that walk every node + edge table in the repo:
- `omnigraph optimize <uri>` — runs Lance `compact_files` on each
table to merge small fragments into fewer larger ones.
- `omnigraph cleanup <uri> --keep N | --older-than 7d --confirm` —
runs Lance `cleanup_old_versions` to prune historical manifests +
unique fragments. Requires `--confirm` because it's destructive.
Supports both count-based and time-based retention (or both AND'd
together). Time uses chrono `DateTime<Utc>` (added as a workspace
dep, default-features off).
Both commands run their per-table loops in parallel (8-way bounded,
`OMNIGRAPH_MAINTENANCE_CONCURRENCY` env override). Smoke-tested
against the 114-table prod graph: optimize went 7m15s sequential
→ 1m28s parallel. cleanup --keep 1 removed 137 historical versions
across 114 tables in 1m57s without disrupting `/healthz` or query
responses.
Public API on `Omnigraph`:
pub async fn optimize(&mut self) -> Result<Vec<TableOptimizeStats>>
pub async fn cleanup(&mut self, opts: CleanupPolicyOptions)
-> Result<Vec<TableCleanupStats>>
All 10 existing loader tests still pass.
Closes MR-676.
Partially addresses MR-677 (the .3 — parallel by type — piece;
MR-677.1 is for the `omnigraph embed` path, not load, since load
doesn't call Gemini directly. .2 was already in place).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: regenerate openapi.json
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Introduce SDK generation scaffolding: commit a static openapi.json
extracted from the Utoipa annotations via a golden-file test, add
Stainless workspace/config for TypeScript and Python SDKs, and clean
up operation IDs for ergonomic generated method names.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>