Commit graph

16 commits

Author SHA1 Message Date
Ragnor Comerford
eab99e6f48
fix: close validated init and multi-graph gaps 2026-05-28 15:41:04 +02:00
Ragnor Comerford
5b19a03003
mr-668: align GET /graphs 405 body code with HTTP status
The single-mode `GET /graphs` handler returned an `ApiError` built
via struct literal with `status: METHOD_NOT_ALLOWED, code: BadRequest`.
The body code disagreed with the HTTP status — clients deserializing
on `code` saw `bad_request`, clients deserializing on `status` saw
405. Same bug class as the earlier 503+Conflict mismatch on the
removed YAML drift path.

Close the class for this one remaining instance:

* Add `ErrorCode::MethodNotAllowed` to the API enum.
* Add `ApiError::method_not_allowed(msg)` — pairs the 405 status
  with the matching code.
* Replace the struct literal in `server_graphs_list` with the
  constructor.
* Regenerate `openapi.json` (adds `method_not_allowed` to the
  ErrorCode schema enum).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 12:04:51 +02:00
Ragnor Comerford
52f28cebe8
mr-668: comment cleanup and policy format style
Strip "PR Na/Nb" sub-PR references throughout MR-668 surfaces — they
were useful during the 10-PR delivery sequence but rot now that the
work is in the tree. Keep the MR-668 umbrella references.

Also:
- Add explicit `when = when` and `resource_literal = resource_literal`
  named args in `compile_policy_source`'s outer `format!` to match the
  surrounding crate style (already explicit for `group` and `action`).
- Rename the best-effort cleanup tracing target from
  "omnigraph::init" to "omnigraph::init::cleanup" so operators can
  filter init-failure cleanup events separately from init's other
  log lines.

No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 11:57:04 +02:00
Ragnor Comerford
937fd6382d
mr-668: remove POST /graphs and CLI graphs create (defer runtime graph mgmt)
The POST /graphs runtime-create endpoint shipped in PR 7/10 has three
unresolved high-severity bugs:

  - flock-on-renamed-inode race: the YAML flock is taken on
    omnigraph.yaml itself, then a temp file is renamed over it.
    Cross-process writers end up locking different inodes — both
    believing they hold exclusive access.
  - duplicate-check outside the file lock: precheck runs against
    the in-memory registry only; the locked closure does
    config.graphs.insert(...) unconditionally. Concurrent same-id
    POSTs can persist the loser in YAML while the in-memory registry
    keeps the winner — they disagree after restart.
  - best_effort_cleanup_init_artifacts deletes _schema.pg /
    _schema.ir.json / __schema_state.json on any init failure. An
    accidental re-init against an existing graph's URI destroys its
    schema; subsequent open() fails at read_text(_schema.pg).

The correct fix is a Lance-style cluster catalog (reserve → init →
publish with recovery sidecars), parallel to the engine's existing
__manifest discipline. That work is out of scope for v0.7.0.

For now, disable runtime add/remove from the network and CLI surface.
Operators add graphs by editing omnigraph.yaml and restarting. The
GET /graphs read-only enumeration stays.

Removed:
- POST /graphs handler + router fragment + utoipa registration
- 13 post_graphs_* server tests + 3 composite POST tests +
  multi_mode_app_with_real_config / post_graph helpers
- CLI omnigraph graphs create subcommand + its handler + cli.rs tests
- system_remote.rs combined list+create test trimmed to list-only
- YAML rewrite infra: rewrite_atomic[_with_modify], RewriteAtomicError,
  staging_path, hash_config_file, AppState::config_hash field +
  threading through new_multi and open_multi_graph_state
- fs2 dependency (verified absent from cargo tree)
- sha2/fs2 imports in config.rs (only the rewrite path used them)
- Cedar PolicyAction::GraphCreate variant + "graph_create" match arms
  + action def in Cedar schema + graph_create_action_authorizes_against_server_resource test
- GraphCreateRequest / GraphCreateResponse / GraphSchemaSpec /
  GraphPolicySpec API types (only the POST handler / CLI imported them)

Kept:
- GET /graphs (read-only enumeration) and graph_list Cedar action
- omnigraph graphs list CLI subcommand
- All multi-graph startup, mode inference, cluster routes,
  per-graph + server-level Cedar policies
- server_settings_drive_multi_graph_startup_end_to_end (the test
  that covers operator-authored YAML + restart — the path that
  survives)
- best_effort_cleanup_init_artifacts and the three init failpoints
  (still reachable from CLI `omnigraph init`; preflight fix deferred
  as a follow-up)
- GraphRegistry::insert and its concurrency tests — production
  callers gone, but the method is the natural seam for the future
  cluster-catalog work

Also fixed (transcript issue 4):
- ALWAYS_FLAT_PATHS now includes /graphs so multi-mode OpenAPI
  advertises the management route correctly (was previously rewritten
  to /graphs/{graph_id}/graphs)
- multi_mode_openapi_keeps_healthz_flat → renamed to
  multi_mode_openapi_keeps_management_paths_flat, asserts both
  /healthz and /graphs stay flat
- multi_mode_openapi_prefixes_operation_ids_with_cluster skips
  /graphs in addition to /healthz

Doc fixes:
- docs/user/cli.md: graphs list example was --target http://...,
  but --target is a config-graph-name lookup; corrected to --uri.
  Removed the graphs create example.
- docs/user/server.md: dropped POST /graphs row, "omnigraph.yaml
  ownership", and "POST /graphs body shape" sections. Added a
  paragraph stating runtime add/remove is not exposed in v0.7.0.
- docs/user/policy.md: dropped graph_create action; reworded the
  "Configuration" line to clarify that server-scoped rules (graph_list)
  take neither branch_scope nor target_branch_scope.
- docs/releases/v0.7.0.md: rewrote release narrative — multi-graph
  mode ships; runtime add/remove deferred.
- AGENTS.md: HTTP server bullet and capability matrix row updated to
  reflect read-only GET /graphs and the operator-edit workflow.
- openapi.json regenerated; /graphs has only .get, no .post.

Diff: 17 files, +123 −1525 LOC.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 17:49:38 +02:00
Ragnor Comerford
a4e6cb689a
mr-668: POST /graphs runtime create endpoint (PR 7/10)
PR 7 of the MR-668 multi-graph server work. Operators can now add a
graph to a running multi-graph server without restarting:

  curl -X POST http://server/graphs \
    -H "Content-Type: application/json" \
    -d '{
          "graph_id": "beta",
          "uri": "/data/beta.omni",
          "schema": { "source": "node Person { name: String @key }\n" },
          "policy": { "file": "./policies/beta.yaml" }
        }'

DELETE remains deferred (out of v0.7.0 scope per the trimmed plan —
no `delete_prefix`, no tombstones).

Body shape (decision 7):
  - Nested `schema: { source: "..." }` (mirrors the `policy: { file }`
    pattern; leaves room for future fields without breakage).
  - Optional nested `policy: { file: "..." }` for per-graph Cedar.
  - 32 MiB body limit (reuses `INGEST_REQUEST_BODY_LIMIT_BYTES`).
  - Asymmetric with `SchemaApplyRequest` which keeps flat
    `schema_source: String` — documented in api.rs.

Atomic YAML rewrite + drift detection:
  - New `config::rewrite_atomic(path, new_config, expected_hash)`:
    flock → re-read + hash check → serialize → write `.tmp` → fsync
    → rename → fsync parent dir. Returns the new hash for the caller
    to update its in-memory baseline.
  - New `config::hash_config_file(path)` — SHA-256 of the on-disk
    bytes, used at startup and after each rewrite.
  - New `RewriteAtomicError { Drift | Io | Serialize }` enum.
  - `AppState.config_hash: Option<Arc<Mutex<[u8;32]>>>` carries the
    in-memory baseline. Updated after every successful rewrite so
    subsequent POSTs don't false-trigger drift.
  - The mutex is `std::sync::Mutex` (brief critical section, no .await
    inside). The flock itself serializes file access process-wide
    AND across multiple server instances (defense in depth).
  - All sync I/O runs inside `tokio::task::spawn_blocking` — flock
    is sync.

Handler ordering (the load-bearing sequence):
  1. Mode check: 405 in single mode.
  2. Cedar authorize: `GraphCreate` against `Omnigraph::Server::"root"`.
  3. Validate body: `GraphId::try_from` (regex + reserved-name), empty
     schema/uri checks, per-graph policy file parse.
  4. Pre-check registry for duplicate graph_id / duplicate uri (409).
  5. `Omnigraph::init` the new engine.
  6. Atomic YAML rewrite (drift detection inside).
  7. Publish in registry (atomic re-check via `GraphRegistry::insert`).

Failure modes (documented in handler rustdoc):
  - Init fails → orphan storage at `req.uri` (PR 2a cleans up schema
    files; Lance datasets remain orphans until `delete_prefix` lands).
  - YAML rewrite fails (drift, IO) → orphan storage; YAML unchanged.
  - Registry insert fails (race) → YAML has entry but registry doesn't;
    next restart opens it cleanly.

New dependency: `fs2 = "0.4"` (workspace + omnigraph-server). POSIX-only
file locking. Linux/macOS deployment supported; Windows out of scope.

Tests (10 new in `tests/server.rs::multi_graph_startup`):
  - `post_graphs_creates_a_new_graph_end_to_end` — happy path, includes
    YAML inspection to confirm the rewrite landed.
  - `post_graphs_baseline_hash_updates_between_rewrites` — two POSTs in
    a row both succeed (drift baseline updates correctly).
  - `post_graphs_duplicate_graph_id_returns_409`
  - `post_graphs_duplicate_uri_returns_409`
  - `post_graphs_invalid_graph_id_returns_400` (reserved name)
  - `post_graphs_empty_schema_source_returns_400`
  - `post_graphs_returns_405_in_single_mode`
  - `post_graphs_yaml_drift_detection_returns_503` — operator hand-edits
    omnigraph.yaml; server refuses to clobber.
  - `hash_config_file_is_deterministic_and_detects_changes`
  - `rewrite_atomic_refuses_when_hash_drifts`

OpenAPI: `server_graphs_create` registered in `ApiDoc::paths(...)`;
openapi.json regenerated.

Result: 225 server tests green (74 lib + 66 openapi + 85 integration),
all MR-731 regressions still pinned.

LOC: ~580 lib.rs net (handler + helpers), ~120 config.rs (rewrite
machinery), +71 api.rs (request/response shapes), +332 tests/server.rs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 20:38:58 +02:00
Ragnor Comerford
94b6346bdd
mr-668: GET /graphs endpoint + per-graph policy wire-up (PR 6b/10)
PR 6b of the MR-668 multi-graph server work. First management endpoint —
`GET /graphs` lists every graph registered with the server, gated by the
server-level Cedar policy from PR 6a.

New API shapes (in `omnigraph-server::api`):
  - `GraphInfo { graph_id, uri }` — one entry per registered graph.
  - `GraphListResponse { graphs: Vec<GraphInfo> }` — sorted alphabetically
    by `graph_id` for deterministic output.

Handler `server_graphs_list`:
  - Mounted at `GET /graphs` in both modes.
  - Single mode: returns 405 (resource exists in the API surface, just
    not operational without a `graphs:` map). 405 chosen over 404 so
    clients see "resource exists, wrong context" rather than "no such
    resource".
  - Multi mode: requires bearer auth (when configured); Cedar-gated by
    `PolicyAction::GraphList` against `Omnigraph::Server::"root"`
    (PR 6a's chassis). Returns the sorted registry list.

Cedar gate composition:
  - When no `server.policy.file` is configured, the MR-723 default-deny
    falls through: `GraphList` is not `Read`, so an authenticated actor
    without a server policy gets 403. This is the right default — don't
    expose the registry until the operator explicitly authorizes it.
  - When a server policy is configured, Cedar evaluates the rule. The
    test `get_graphs_with_server_policy_authorizes_per_cedar` pins the
    admin-allow / viewer-deny split.

Routing:
  - New `management` sub-router holding `/graphs` (auth-required, no
    `resolve_graph_handle` middleware — operates on the registry, not
    a single graph).
  - Single mode merges flat protected routes + management.
  - Multi mode merges nested `/graphs/{graph_id}/...` + management.

OpenAPI:
  - `server_graphs_list` registered in `ApiDoc::paths(...)`.
  - `EXPECTED_PATHS` in `tests/openapi.rs` gains `/graphs`.
  - `openapi.json` regenerated (auto-tracked by
    `openapi_spec_is_up_to_date` in CI).

Tests: 4 new in `tests/server.rs::multi_graph_startup`:
  - `get_graphs_lists_registered_graphs_in_multi_mode`
  - `get_graphs_returns_405_in_single_mode`
  - `get_graphs_requires_bearer_auth_when_configured`
  - `get_graphs_with_server_policy_authorizes_per_cedar`

What's NOT in this PR (deferred):
  - Per-graph policy enforcement is wired through `handle.policy`
    (PR 4a already did this); PR 6b doesn't add new per-graph
    behavior beyond making sure the server policy lookup composes
    cleanly alongside it.
  - `POST /graphs` (PR 7) and `DELETE /graphs/{id}` (out of scope
    for v0.7.0).
  - CLI `omnigraph graphs list` (PR 8 will add).

Result: 215 server tests green (74 lib + 66 openapi + 75 integration),
11 policy tests green. MR-731 spoof regression preserved across all
this work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 20:24:52 +02:00
Andrew Altshuler
aadfa11ecb
schema: HTTP allow_data_loss exposure + e2e drop coverage (MR-694 follow-up) (#107)
Some checks failed
CI / Classify Changes (push) Has been cancelled
CI / Check AGENTS.md Links (push) Has been cancelled
Release Edge / Prepare edge release (push) Has been cancelled
CI / Test Workspace (push) Has been cancelled
CI / Test omnigraph-server --features aws (push) Has been cancelled
CI / RustFS S3 Integration (push) Has been cancelled
Release Edge / Build edge omnigraph-linux-x86_64 (push) Has been cancelled
Release Edge / Build edge omnigraph-macos-arm64 (push) Has been cancelled
The schema-lint chassis v1.2 (PR #100) shipped `--allow-data-loss` on
the CLI, but `SchemaApplyRequest` had no equivalent field — Hard-mode
drops were CLI-only. This commit closes that feature gap and adds e2e
test coverage for drop modes across HTTP + CLI, plus data preservation
on additive apply, plus a CLI↔SDK plan-parity assertion.

Feature gap closed:

- `crates/omnigraph-server/src/api.rs` — added `allow_data_loss: bool`
  (default false via `#[serde(default)]`) to `SchemaApplyRequest`.
  Added `Default` derive so test usages can use `..Default::default()`.
- `crates/omnigraph-server/src/lib.rs` — `server_schema_apply` now
  constructs `SchemaApplyOptions { allow_data_loss: request.allow_data_loss }`
  and threads through to `apply_schema_as`.
- `crates/omnigraph-cli/src/main.rs` — remote-URI schema-apply path
  used to bail with "--allow-data-loss not yet supported on remote";
  now forwards the flag into the JSON payload so the CLI behaves
  identically against local and remote URIs.
- `openapi.json` — regenerated; only diff is the new field on
  `SchemaApplyRequest`.

Tests added (8 new):

* `crates/omnigraph-server/tests/server.rs` (+5):
  - `schema_apply_route_soft_drops_property_via_http` — POST schema
    removing nullable property, verify catalog reflects the drop AND
    `snapshot_at_version(pre)` still has `age` in the field list
    (time-travel reachability is the Soft contract).
  - `schema_apply_route_soft_drops_node_type_via_http` — POST schema
    removing `Company` node + cascading `WorksAt` edge.
  - `schema_apply_route_hard_drops_property_with_allow_data_loss` —
    POST with `allow_data_loss: true`, verify plan step reports
    `mode: hard`.
  - `schema_apply_route_keeps_drops_soft_without_flag` — same schema
    without flag, verify `mode: soft`. Pins default semantics against
    accidental Hard promotion.
  - `schema_apply_route_additive_property_preserves_existing_rows` —
    load fixture, POST adding nullable property, verify row count
    preserved (SDK suite covers data preservation on drops + renames;
    additive AddProperty wasn't pinned).
  Plus helpers `schema_without_age` and `schema_without_company`.

* `crates/omnigraph-cli/tests/cli.rs` (+3):
  - `schema_apply_allow_data_loss_flag_promotes_drops_to_hard` — CLI
    `omnigraph schema apply --allow-data-loss --schema X.pg --json`,
    verify plan step has `mode: hard`.
  - `schema_apply_without_allow_data_loss_keeps_soft_drops` — without
    flag, verify Soft.
  - `schema_plan_parity_cli_and_sdk` — same `.pg` source through
    `Omnigraph::plan_schema` (SDK) and `omnigraph schema plan --json`
    (CLI), assert the steps array is byte-identical post-JSON. HTTP
    has no `/schema/plan` endpoint; apply-side parity is implicitly
    covered by the HTTP drop tests + CLI drop tests using identical
    fixtures.

Docs:

- `docs/user/schema-language.md` — new "Destructive drops" section
  documenting Soft vs Hard semantics and that `allow_data_loss` is
  now honored uniformly across CLI / HTTP / SDK.

Verification: every new test passes; full `cargo test --workspace --locked`
green; `scripts/check-agents-md.sh` passes.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-19 01:56:46 +03:00
Devin AI
6a3f0677ae server: drop unwired try_admit_rewrite / 503 admission surface 2026-05-09 20:58:17 +00:00
Ragnor Comerford
c15962e6b0
server: flip AppState to Arc<Omnigraph>, wire admission on /change (PR 2 Step F)
The substantive PR 2 change. Removes the global server `RwLock<Omnigraph>`
that has serialized every mutating request across all actors. Disjoint
`(table, branch)` writes from different actors now run concurrently,
guarded only by the engine's per-(table, branch) write queue (PR 1b)
and per-actor admission control (PR 2 Step E).

AppState changes:
- `db: Arc<RwLock<Omnigraph>>` -> `engine: Arc<Omnigraph>`
- New field: `workload: Arc<workload::WorkloadController>` initialized
  from env (`OMNIGRAPH_PER_ACTOR_INFLIGHT_MAX=16`,
  `OMNIGRAPH_PER_ACTOR_BYTES_MAX=4GiB`,
  `OMNIGRAPH_GLOBAL_REWRITE_MAX=4`).
- `tokio::sync::RwLock` import dropped.

Handler updates (16 sites):
- All `Arc::clone(&state.db).read_owned().await` and `write_owned()`
  calls replaced with `let db = &state.engine`. Engine APIs are now
  `&self` (Step C) so this works directly.
- `/export` clones `Arc<Omnigraph>` once and moves into the spawned
  task instead of acquiring a long-held read lock.
- `/change` handler additionally wires
  `state.workload.try_admit(&actor_arc, est_bytes)`. Cedar runs FIRST
  so denied requests don't consume admission slots; admission runs
  SECOND before the engine call. `est_bytes` uses the request body
  size as a coarse proxy.

API surface additions (`api::ErrorCode`):
- `TooManyRequests` -> HTTP 429 (per-actor cap exceeded; respect
  `Retry-After`)
- `ServiceUnavailable` -> HTTP 503 (global rewrite pool exhausted)

`ApiError` constructors `too_many_requests` / `service_unavailable` and
`from_workload_reject` (maps `RejectReason` variants to HTTP status).

Other mutating handlers (`/ingest`, `/branches/*`, `/branches/merge`,
`/schema/apply`) currently flow through the Arc<Omnigraph> path
without admission gates; wiring those is mechanical and lands as a
follow-up. The /change hot path covers the bulk of MR-686's load
profile.

OpenAPI regenerated to include the new ErrorCode variants.
102 lib + 39 server tests + 5 workload tests pass. The regression
sentinel `change_conflict_returns_manifest_conflict_409` continues
to pass (revalidation perf opt + per-table queue + publisher CAS
preserve manifest_conflict semantics under concurrent writers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 17:08:26 +02:00
Ragnor Comerford
35be20cb05
MR-771: demote Run to direct-publish via expected_table_versions CAS
mutate_as and load now write directly to target tables and call the
publisher once at the end with per-table expected versions; the Run
state machine, _graph_runs.lance writers, __run__ staging branches,
and server /runs/* endpoints are removed. Multi-statement mutations
remain atomic at the manifest level via an in-memory MutationStaging
accumulator that gives read-your-writes within a query and a single
publish at the end. Concurrent-writer conflicts surface as
ExpectedVersionMismatch (HTTP 409 manifest_conflict) instead of the
old DivergentUpdate merge shape. Documents one known limitation in
docs/runs.md: a multi-statement mid-query failure where op-N writes
a Lance fragment and op-N+1 fails leaves Lance HEAD ahead of the
manifest until a follow-up introduces per-table Lance branches.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 08:52:50 +02:00
Ragnor Comerford
7809bf607e
Polish OpenAPI spec for SDK generation
Add operation descriptions and examples to utoipa annotations so the
generated TypeScript SDK has rich JSDoc, and so future Python/Go SDKs
and any /openapi.json docs UI benefit from the same effort.

- Doc comments on all 18 handlers (utoipa picks up summary/description)
- #[schema(example = ...)] on free-text fields (query_source,
  schema_source, NDJSON data) and i64 timestamps
- Destructive/irreversible warnings on change, applySchema, ingest,
  mergeBranches, deleteBranch, publishRun, abortRun

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 16:36:51 +02:00
andrew
be520f31f4 Polish schema endpoint: rename show, align field name, add tests
Review feedback on #23, applied on top of the original commit:

- Rename the CLI subcommand from `schema get` to `schema show` to match
  the existing `run show` / `commit show` convention. A `#[command(alias
  = "get")]` preserves muscle memory for anyone who already typed `get`.
- Rename `SchemaGetOutput` → `SchemaOutput` and its field `source` →
  `schema_source`, so the get response and the apply request use the
  same field name for the same concept.
- Use `println!` instead of `print!` in the CLI so the shell prompt
  doesn't land on the last line of schema output.
- Add three integration tests on `/schema`: happy path (no auth),
  401 when bearer is required but missing, 403 when the policy grants
  the actor branch_create but not read.

Follow-ups left for a separate PR: include `schema_ir_hash` and
`schema_identity_version` in the response payload so clients can do
drift detection and the server can set an ETag; and a fast-path local
read that skips `Omnigraph::open()` when only the schema source is
needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 00:30:46 +03:00
Claude
0c4df674fa
Add schema get command to CLI and HTTP API
Exposes the existing schema_source() method via a new `omnigraph schema get`
CLI subcommand and a `GET /schema` API endpoint, allowing users to retrieve
the current accepted schema from any graph repository.

https://claude.ai/code/session_01UYybeBQks3fz3RJrTHtwQw
2026-04-16 21:15:17 +00:00
Claude
859ec9faa8
Add OpenAPI spec generation via utoipa with /openapi.json endpoint
Integrate utoipa 5 to auto-generate an OpenAPI 3.1 spec from the existing
Axum handlers and serde types. All 16 endpoints are annotated with path
metadata, request/response schemas, security requirements, and tags. A
public /openapi.json endpoint serves the spec without requiring auth.

Includes 59 tests covering path completeness, HTTP methods, schema fields,
enum variants, security scheme, path/query parameters, request bodies,
response references, and endpoint integration.

https://claude.ai/code/session_01NfoPVx21rZUQned1f7WpXY
2026-04-12 11:03:23 +00:00
andrew
92fa3189f7 Add schema apply command and policy support 2026-04-12 04:01:14 +03:00
andrew
338289656a Initial public Omnigraph repository 2026-04-10 20:49:41 +03:00