mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-09 01:35:18 +02:00
346 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
5eead8d29e
|
ci(branch-protection): let code owners bypass required PR review (#154)
Some checks failed
CI / Classify Changes (push) Has been cancelled
CI / Check AGENTS.md Links (push) Has been cancelled
CI / Container Entrypoint (push) Has been cancelled
Release Edge / Prepare edge release (push) Has been cancelled
CI / Test Workspace (push) Has been cancelled
CI / Test omnigraph-server --features aws (push) Has been cancelled
CI / RustFS S3 Integration (push) Has been cancelled
Release Edge / Build edge omnigraph-linux-x86_64 (push) Has been cancelled
Release Edge / Build edge omnigraph-macos-arm64 (push) Has been cancelled
Release Edge / Build edge omnigraph-windows-x86_64 (push) Has been cancelled
Release Edge / Smoke Windows installer (push) Has been cancelled
require_code_owner_reviews + count=1 with no bypass meant EVERY PR needed a code-owner approval — including code owners' own PRs, which can't be self-approved, so an owner's PR deadlocked on the other owner (forcing admin overrides). Intended behavior: review is required only for non-owners. Add bypass_pull_request_allowances for the two engineering owners (ragnorc, aaltshuler): they merge their own PRs after CI without a second review; non-owners still require a code-owner approval. CI status checks remain required for everyone. Applied live via scripts/apply-branch-protection.sh. Note: the bypass list mirrors codeowners-roles.yml engineering members by hand (render-codeowners.py doesn't generate it) — keep in sync on owner changes. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
c2a97f4559
|
ci: drop per-PR Windows release build; bind to release tags (#155)
The `test_windows_binaries` job ran a full Windows --release build + smoke test on every code PR. It was a non-required (non-blocking) check, so it never gated a merge — it only burned the slowest/most expensive runner (windows-latest, --release, 75-min ceiling) on every code change. Windows binary validation is already covered (better) on release tags: release.yml's `smoke_windows_installer` (on v* tags) builds the release binaries, installs via scripts/install.ps1, and smoke-runs `omnigraph.exe version` + `omnigraph-server.exe --help` — the same smoke test plus the real installer path. Nothing `needs:` the removed job. Trade-off (accepted): a PR that breaks the Windows build or install.ps1 syntax is now caught at release-cut rather than at PR time. install.ps1 and platform-specific code change rarely; the cost savings on every PR outweigh the earlier signal. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
ce150fb0ca
|
docs(testing): fix stale optimize test name in maintenance.rs row (#148)
The maintenance.rs row referenced `optimize_reconciles_preexisting_manifest_head_drift`, which never existed (leftover from the reconcile-drift heuristic removed in #141). The actual second test is `optimize_defers_when_recovery_sidecar_is_pending`. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
e62d9166fb
|
fix: optimize publishes compaction; recovery roll-back converges manifest (#141)
* test(optimize): cover manifest publish + HEAD-drift reconcile Red against the pre-fix optimize, which ran compact_files without publishing the compacted version to __manifest: - maintenance: optimize must publish so the manifest table_version tracks the compacted Lance HEAD and a later schema apply succeeds; and must reconcile a pre-existing manifest-behind-HEAD drift (forged via raw Lance compaction) so strict writes commit again. - end_to_end + composite_flow: post-optimize query / strict update / reopen in the full lifecycle (the canonical flow previously omitted post-optimize writes as a documented "known limitation"). - failpoints: a crash between compaction and the manifest publish rolls forward on next open. * fix(optimize): publish compaction to manifest and reconcile HEAD drift optimize ran Lance compact_files without publishing the new version to __manifest, so the manifest table_version lagged the Lance HEAD: reads stayed pinned to the pre-compaction version, and the next schema apply or strict update/delete failed its HEAD-vs-manifest precondition with "stale view ... refresh and retry" (open-time recovery rollback inflated the gap on retry). optimize now publishes each compacted table's version under the per-(table, main) write queue, guarded by a manifest CAS and a SidecarKind::Optimize recovery sidecar (loose-match; roll-forward is safe because compaction is content-preserving). When a table has nothing left to compact but its Lance HEAD is already ahead of the manifest pin (pre-fix drift, or a recovery restore commit), optimize reconciles the manifest forward to HEAD (metadata-only, no sidecar). Caches and the CSR/CSC graph index are invalidated after a publish. Docs updated (maintenance, storage, branches-commits, writes, testing). * test(recovery): rollback convergence + optimize-defer regressions Red against the current code, landed before the fix: - recovery: after the open-time sweep rolls a sidecar back, the manifest must track Lance HEAD (no residual drift) so a follow-up schema apply succeeds — the original "+1 per retry" loop. Today roll-back restores without publishing, so the manifest lags HEAD and the apply fails its HEAD-vs-manifest precondition. - maintenance: optimize must refuse while a recovery sidecar is pending — operating on an unrecovered graph could publish a partial write the sweep would roll back. Also removes optimize_reconciles_preexisting_manifest_head_drift: the ad-hoc drift reconcile it covered is replaced by recovery-side convergence. * fix(recovery): converge manifest on roll-back; optimize defers on pending recovery Root of PR #141's review findings and the original "+1 per retry" loop: a Lance HEAD ahead of the manifest was ambiguous (benign content-preserving drift vs. a partial write a sidecar will roll back), and optimize's reconcile guessed it benign. Close the class instead of guessing: - Recovery roll-back now PUBLISHES the restored version (via a push_table_update_at_head helper shared with roll-forward), so the manifest tracks the Lance HEAD after recovery — symmetric with roll-forward. This fixes the +1 loop (after one roll-back the retry's HEAD-vs-manifest precondition passes) and removes the only remaining source of orphaned drift. The audit still records the logical rolled-back-to version; the manifest is published at the restore commit (identical content). - optimize drops the ad-hoc drift reconcile and instead REFUSES when a __recovery sidecar is pending, so it only ever operates on a recovered graph (manifest == HEAD); its compaction publish can no longer commit a partial write. With the reconcile gone, the blob-skip-vs-reconcile gap is moot. Updates the rollback recovery-test helper (manifest == HEAD after roll-back), the failpoints assertions, and the user/dev docs. * test(recovery): fix rollback assertion for manifest convergence The roll-back-publishes change makes the manifest version advance after a SchemaApply roll-back (to the old-schema content), so the schema_apply_without_schema_staging_rolls_back_on_next_open assertion must be `version > pre`, not `version == pre`. This update was dropped during the commit churn and surfaced as a CI Test Workspace failure; the old-schema-preserved intent stays covered by count_rows + _schema.pg + the RolledBack convergence invariant. |
||
|
|
4a66d6e071
|
fix(loader): accept multi-line (pretty-printed) JSON in load (#146)
The loader read input line-by-line (reader.lines() + serde_json::from_str per line), so any delta where a JSON object spanned multiple lines failed with 'invalid JSON on line 1: EOF while parsing an object'. Compact JSONL worked; pretty-printed JSON never did. Switch to a streaming value deserializer (Deserializer::from_reader().into_iter::<Value>()), which treats any whitespace (including newlines inside objects) as a separator — so both compact JSONL and pretty-printed JSON load. Error labels switch from line numbers to record numbers (line numbers are meaningless once objects span lines). Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Ragnor Comerford <ragnor.comerford@gmail.com> |
||
|
|
54842808db
|
feat(engine): sweep & remove legacy __run__ branch guard (MR-770) (#132)
* feat(engine): sweep legacy __run__ branches via v2→v3 manifest migration Pre-v0.4.0 graphs can carry stale `__run__<id>` staging branches on the `__manifest` dataset, left by the Run state machine removed in MR-771. Lance's `list_branches` still enumerates them, so they leak into `branch_list()` and count as blocking branches at schema-apply time. Add a one-time `migrate_v2_to_v3` arm to the internal-schema dispatcher: on the first read-write open it enumerates `__manifest` branches, deletes every `__run__*` ref, and bumps the stamp to 3. Idempotent under retry (re-enumerates fresh each run). The `"__run__"` prefix is inlined so the migration does not depend on the run_registry guard that MR-770 removes next. This is the prerequisite sweep; the guard removal follows in the next commit. * refactor(engine): remove the legacy __run__ branch guard (MR-770) With the v2→v3 migration sweeping stale `__run__*` branches off `__manifest` on first read-write open, the defense-in-depth `is_internal_run_branch` guard is no longer needed. - delete `db/run_registry.rs`; drop the module + re-export from `db/mod.rs` - collapse `is_internal_system_branch` to the schema-apply-lock check only - `ensure_public_branch_ref`: drop the run-ref rejection; `__run__*` is now an ordinary branch name - `branch_merge`: reject `is_internal_system_branch` (was run-only) so the schema-apply lock is rejected consistently with create/delete — a small, deliberate tightening - update the inline schema-apply test + the writes integration tests (`public_branch_apis_reject_internal_run_refs` → `public_branch_apis_reject_internal_system_refs`, which also asserts `__run__*` now creates successfully) - docs: flip the "pending production sweep / defense-in-depth" notes to "auto-swept by the v2→v3 migration"; document the read-only-open limitation Known residual: the inert `_graph_runs.lance` / `_graph_run_actors.lance` bytes remain until a `StorageAdapter::delete_prefix` primitive lands. * fix(engine): run __run__ sweep at Omnigraph::open, not only on publish Review (PR #132) caught a regression: removing __run__ from `is_internal_system_branch` exposed legacy `__run__*` branches to the schema-apply blocking-branch checks (schema_apply.rs:104 and :778) and to `branch_list()`, but the v2→v3 sweep ran only inside the publisher's `load_publish_state`. On a pre-v0.4.0 graph whose first write is a schema apply, the blocking-branch check fires before any publish, so apply failed with "found non-main branches: __run__…". The same lazy timing also created a reverse hazard: a user-created `__run__*` branch on a still-v2 graph could be deleted by the first publish's sweep. Fix: run the internal-schema migration in `Omnigraph::open(ReadWrite)` (new `manifest::migrate_on_open`), before the coordinator reads branch state. The sweep now lands before any branch-observing code, and a graph is stamped v3 at open — so the one-time sweep can never catch a legitimately-created branch. Both checks and `branch_list` see the swept graph; correct by construction for every write path. Accepted residual: a read-only open of an unmigrated legacy graph still lists `__run__*` (read-only opens must not write, so they can't sweep). Documented. Regression test `legacy_run_branch_is_swept_on_open_and_does_not_block_schema_apply` confirmed RED before the fix (panicked on the branch_list leak assertion) and GREEN after. Also updates the stale schema_apply.rs comment, the writes.md "Migration code" section, and adds the v3 row to storage.md's migration table. * test(engine): sweep multiple legacy __run__ branches; doc nit Strengthen the v2→v3 migration test to synthesize three `__run__*` branches (a real legacy graph accumulates one per run) so the migration's delete loop is exercised on a single reused dataset handle, not just a single branch. Confirms multi-branch deletion is safe. Also drop a stale "active runs" reference from the branch_delete doc line. * fix(engine): force-delete in __run__ sweep for concurrency safety `migrate_v2_to_v3` ran `Dataset::delete_branch` (= `branches().delete(.., false)`), which errors "BranchContents not found" if the branch is already gone. Since the sweep now runs in `Omnigraph::open(ReadWrite)`, two processes opening the same legacy v2 graph concurrently would race: one wins each delete, the other's open fails. The migration only claimed idempotency under *sequential* retry. Switch to `Dataset::force_delete_branch` (= `delete(.., true)`), Lance's documented path for cleaning up zombie branches, which tolerates an already-absent branch. The sweep is now idempotent under concurrent runners and robust to partial/zombie state. Found in self-review; no behavior change for the common single-open path. * docs(release): note MR-770 __run__ cleanup in v0.6.1 * docs(branches): reconcile branch cleanup semantics |
||
|
|
fd8e078a77
|
ci(codeowners): add aaltshuler to engineering role (#147)
Restores aaltshuler as an `engineering` code-owner (removed in #142), so `crates/**` and repo-infra PRs have a second reviewer besides the sole owner ragnorc — unblocking review of author-ragnorc PRs (e.g. #132) that ragnorc cannot self-approve. Edited the source of truth (.github/codeowners-roles.yml) and re-rendered .github/CODEOWNERS + the docs/dev/codeowners.md tables via .github/scripts/render-codeowners.py, per the documented flow. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
343f1f17ed
|
governance: external contribution model (issues/discussions/RFCs/PRs) (#143)
Formalize the public contribution surface. Maintainers keep a separate internal process and are exempt from the intake gates; everyone stays bound by review, CODEOWNERS, and branch protection. Model: - Issues = problem reports only (bug form + config.yml redirects ideas to Discussions and disables blank issues). - Discussions = ideas + RFC incubation. - RFCs = anyone (incl. external) authors docs/rfcs/NNNN-*.md; a maintainer merging it is acceptance. Distinct from the maintainer-internal docs/dev/rfc-00N-* track. - PRs = link an `accepted` issue or accepted RFC, or use the trivial fast-lane (typos/docs/deps). Enforced softly to start (template + review). Adds GOVERNANCE.md, rewrites CONTRIBUTING.md, adds docs/rfcs/ (README + template), .github issue/PR/discussion templates. Wires docs/rfcs/ into the doc-link checker (excluded like releases; linked from docs/dev/index.md). Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
c7365bf8ef
|
ci(codeowners): un-trap required checks, auto-render, generate owner tables (#142)
The CODEOWNERS required checks blocked every PR — the real root cause was a name mismatch, compounded by a path filter: - branch-protection.json required the contexts `CODEOWNERS / drift` and `CODEOWNERS / noedit` (the GitHub UI "workflow / job-id" display form), but the jobs report check-run names from their `name:` fields — "CODEOWNERS matches source" / "CODEOWNERS not hand-edited". The required contexts therefore never matched any reported check and sat permanently pending. - The workflow was also path-filtered to CODEOWNERS files, so it didn't even run for most PRs. Net effect: with both required checks unsatisfiable, every PR could only land via admin override (e.g. #140). Fixes: - A: drop the `paths:` filter so the workflow runs on every PR and both required contexts always report. - name fix: point branch-protection.json at the actual job names verbatim, and add a doc note that the contexts must equal the job `name:` values. - B: the `drift` job now re-renders and, on same-repo PRs, auto-commits the regenerated artifacts back to the branch (mirrors the openapi.json job in ci.yml); forks / manual runs strict-check instead. Contributors no longer run the script by hand. - D: render-codeowners.py also generates a "who owns what" path->owners + roles table spliced into docs/dev/codeowners.md between markers, so the human-readable view never drifts. Idempotent; CODEOWNERS output unchanged. - docs: correct the stale `enforce_admins: true` line (JSON and live are false). NOTE: the branch-protection.json change only takes effect after an admin runs `./scripts/apply-branch-protection.sh` (deliberate manual step, per docs/dev/branch-protection.md). Until then `main` still requires the old mismatched contexts, so this PR itself needs an admin-override merge — the last one that should be necessary. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
96dbe9dec0
|
fix(release): make Homebrew audit non-blocking + set up brew on runner (#140)
Some checks failed
CI / Classify Changes (push) Has been cancelled
CI / Check AGENTS.md Links (push) Has been cancelled
CI / Container Entrypoint (push) Has been cancelled
Release Edge / Prepare edge release (push) Has been cancelled
CI / Test Workspace (push) Has been cancelled
CI / Test omnigraph-server --features aws (push) Has been cancelled
CI / Test Windows release binaries (push) Has been cancelled
CI / RustFS S3 Integration (push) Has been cancelled
Release Edge / Build edge omnigraph-linux-x86_64 (push) Has been cancelled
Release Edge / Build edge omnigraph-macos-arm64 (push) Has been cancelled
Release Edge / Build edge omnigraph-windows-x86_64 (push) Has been cancelled
Release Edge / Smoke Windows installer (push) Has been cancelled
The v0.6.1 Release shipped binaries but the Homebrew tap update job died at the audit step (brew not on the ubuntu runner; exit 127), skipping the formula push so the tap stayed at 0.6.0. - Install Homebrew via Homebrew/actions/setup-homebrew so brew is available. - Make both the setup and audit steps continue-on-error: they are best-effort diagnostics (the formula is correct by construction via update-homebrew-formula.sh), so neither can skip the actual tap publish. - Drop --online from brew audit for deterministic, network-independent linting. |
||
|
|
d54bccb940
|
fix(optimize): skip blob-bearing tables to avoid Lance compaction crash (#138)
Some checks failed
CI / Classify Changes (push) Has been cancelled
CI / Check AGENTS.md Links (push) Has been cancelled
CI / Container Entrypoint (push) Has been cancelled
Release Edge / Prepare edge release (push) Has been cancelled
CI / Test Workspace (push) Has been cancelled
CI / Test omnigraph-server --features aws (push) Has been cancelled
CI / Test Windows release binaries (push) Has been cancelled
CI / RustFS S3 Integration (push) Has been cancelled
Release Edge / Build edge omnigraph-linux-x86_64 (push) Has been cancelled
Release Edge / Build edge omnigraph-macos-arm64 (push) Has been cancelled
Release Edge / Build edge omnigraph-windows-x86_64 (push) Has been cancelled
Release Edge / Smoke Windows installer (push) Has been cancelled
* test(optimize): pin Lance blob-column compaction failure as a surface guard
Lance compact_files mis-decodes blob-v2 columns under its forced BlobHandling::AllBinary read ("more fields in the schema than provided column indices"), failing even a pristine uniform-V2_2 multi-fragment blob table; reads use descriptor handling and are unaffected.
Guard 10 reproduces this and is self-retiring: it turns red on the Lance bump that fixes the bug, forcing LANCE_SUPPORTS_BLOB_COMPACTION to flip.
* fix(optimize): skip blob-bearing tables instead of crashing compaction
omnigraph optimize aborted the whole sweep when any node/edge table had a Blob property: Lance compact_files cannot decode blob-v2 columns under AllBinary (the column-index error pinned by the surface guard). Skip blob-bearing tables behind a LANCE_SUPPORTS_BLOB_COMPACTION gate and report them via TableOptimizeStats.skipped / SkipReason (surfaced in the CLI and a tracing::warn) instead of erroring, which also isolates the failure so the other tables still compact.
Reads/writes are unaffected; only fragment/space reclamation on blob tables is deferred until the upstream Lance fix. Adds a maintenance.rs regression test (validated red with the column-index symptom before the fix, green after), a concise v0.6.1 release note, and updates docs (maintenance, cli-reference, AGENTS capability matrix, invariants Known Gaps, lance.md audit, constants).
* refactor(optimize): make TableOptimizeStats and SkipReason non_exhaustive
Both are returned result types, never built by callers, so #[non_exhaustive] makes this the last field/variant addition that can break downstream literal construction and keeps future ones non-breaking (review feedback on the public-field addition). The v0.6.1 Compatibility Notes call out the source-level change.
Also drops the now-stale "RED today / GREEN after the fix lands" narration in the optimize_skips_blob_table_and_reports_skip test (historical regression context now that the fix is in this branch), and folds in the expanded v0.6.1 release note.
* chore(release): bump workspace to v0.6.1
Coherent version bump to accompany the v0.6.1 release note: all five crate manifests + path-dependency constraints, Cargo.lock, the AGENTS.md surveyed-version line, and openapi.json info.version move 0.6.0 -> 0.6.1. Matches the established release pattern (#118 landed the v0.6.0 note + bump together) and resolves the Codex/Devin review flag that a v0.6.1 note without a bump leaves CARGO_PKG_VERSION reporting 0.6.0 and mixed package versions.
|
||
|
|
3c2b1b8051
|
Stored-query registry foundation + config/CLI RFC-002 (#128)
* MR-969: add stored-query registry config surface
Introduce the `queries:` block in omnigraph.yaml — an inline
`name -> entry` map of stored queries, per-graph
(`graphs.<id>.queries`) and top-level for single-graph mode, mirroring
how `policy` is wired in both modes. Each entry points at a `.gq` file
and carries optional MCP exposure settings (`expose`, `tool_name`),
defaulting to not-exposed.
Additive: absent `queries:` leaves current behavior unchanged.
- QueryEntry { file, mcp: McpSettings { expose, tool_name } }
- `queries` field on TargetConfig + OmnigraphConfig (serde default)
- query_entries() / target_query_entries() accessors
- resolve_query_file() — base_dir-relative `.gq` path resolution
- round-trip + absent-block tests
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Add stored-query registry loader and GraphHandle wiring
Add a `queries` module: QueryRegistry loads each declared `.gq` entry,
parses it, and selects the query whose symbol matches the manifest key,
asserting the two agree (key == `query <name>` symbol). Identity is the
query name; a key/symbol mismatch is a load-time error. Errors are
collected, not fail-fast, so a bad registry surfaces every broken entry
at once. Schema type-checking is deliberately left to a separate pass so
the loader stays callable without an open engine.
Thread an `Option<Arc<QueryRegistry>>` through GraphHandle alongside the
per-graph policy; the URI-canonicalizing clone propagates it. Production
openers default to None for now — the boot path loads and attaches the
registry in a later change.
- QueryRegistry::{from_specs, load, lookup, iter}; StoredQuery::is_mutation
- GraphHandle.queries field, propagated on canonical clone
- registry unit tests: identity match/mismatch, multi-query selection,
per-entry parse errors, error collection, mutation classification
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs: add RFC-002 config & CLI architecture
Layered config (user-global ~/.config/omnigraph/ + per-project), a
unifying `target` abstraction resolving to (locus, graph, sub-state,
credential) with embedded-URI XOR remote-server loci, multi-server ×
multi-graph client targeting, credentials by-reference, and the
file-naming decision: project and server config are one artifact
(`omnigraph.yaml`); the only differently-named file is the user-global
`config.yaml`, split by scope not role. Includes the 12-factor bind
portability rule (prefer --bind/OMNIGRAPH_BIND over a committed
server.bind) and the defined-locally / invoked-remotely model for
stored queries. Derived from first principles working backwards from
what the engine enables; validated against kube/Helix/git/compose.
Linked from docs/dev/index.md. Proposed; phased rollout for the
MR-973/974/981 family.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Add check() to validate stored queries against the live schema
A pure check(registry, catalog) that type-checks every stored query via
the same typecheck_query_decl the engine runs for inline queries — no
parallel implementation. Failures are collected, not fail-fast, so an
operator sees every broken query (e.g. a type/property a migration
renamed or removed) in one pass. Breakages are fatal (the boot path will
refuse to start); warnings are advisory.
Pure over (registry, catalog) so it is callable both at boot (engine
catalog) and offline from the CLI without an open engine.
Advisory lint: an mcp.expose:true query that declares a Vector(N)
parameter warns — an LLM cannot supply a raw embedding vector; such a
query should take a String parameter and embed server-side. Warns
rather than rejects, since service-to-service callers may pass vectors.
- CheckReport { breakages, warnings }; has_breakages / is_clean
- tests: valid query, unknown type, unknown property, collect-not-fail-fast,
vector-param-exposed warns, unexposed silent
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Drop internal plan-label refs from stored-query config comments
Doc comments referenced sequencing labels ("C2") that mean nothing to a
reader; reword to describe the behavior directly. Comment-only.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs: reconcile aliases with the role model in RFC-002
Place the existing client-only `aliases:` block in the client/server
role split: aliases are client-role (CLI, embedded, ungated) and may
live in both user-global and project config; `queries:` is server-role
(deployment manifest only). They overlap as "name -> .gq"; `queries:` is
the superset, and the end-state subsumes aliases (definition -> queries,
target/branch/format -> client invocation context, positional args ->
CLI sugar). v1 keeps aliases unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs: make RFC-002 config global-first, project-optional
The global user config is the primary, self-sufficient default; the
CLI works from any directory with no project file (the kubectl/aws/gh
posture), a deliberate flip from today's project-anchored behavior.
The project omnigraph.yaml becomes an optional repo-scoped override and
the deployment manifest. Uniform schema, both layers optional; global
can hold any section including a personal server's graphs/queries.
Additive: project still overrides global; the flip adds a fallback
layer below the project file rather than removing it.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs: justify XDG ~/.config/omnigraph over legacy ~/.omnigraph in RFC-002
Make the rationale explicit: XDG-first because OmniGraph is a client
that will cache remote catalogs and keep session state alongside
secrets, and XDG separates config / cache / state into distinct dirs
(clear cache without touching creds; backups skip cache) whereas a
single ~/.omnigraph/ mixes them. Honor ~/.omnigraph/ as a fallback for
the peer-group (aws/kube/docker/helix) expectation. Add XDG_CACHE_HOME
/ XDG_STATE_HOME to the override precedence.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs: build RFC-002 credentials on the existing env-file mechanism
OmniGraph already has credentials-by-reference: bearer_token_env names
the env var, and auth.env_file is a git-ignored dotenv the CLI
auto-loads (real env vars win), resolved via resolve_remote_bearer_token.
The RFC's proposed credentials.yaml + token_env were redundant parallel
inventions. Reconcile: reuse bearer_token_env (extend to
servers.<name>) and auth.env_file (add a global ~/.config/omnigraph/.env
layered under the project .env.omni); OS keychain is an additive future
resolver. No new credentials.yaml. Updated summary, non-goals,
background, file-naming, credentials, example, login, migration, rollout.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs: use single ~/.omnigraph dir (Helix-style), not XDG, in RFC-002
Reverse the earlier XDG-first call. The prior argument rested on a false
dichotomy (single-dir => mixed config/cache/state); in fact the peer
tools (aws, kube, helix) achieve separation via SUBDIRECTORIES inside
one ~/.tool/ dir (~/.aws/sso/cache/, ~/.kube/cache/), getting cache
hygiene AND one discoverable place. So everything goes under
~/.omnigraph/: config.yaml, credentials (dotenv, 0600), cache/, state/.
Lower cognitive load, matches what DB/cloud-CLI users expect, matches
Helix. OMNIGRAPH_HOME overrides; $XDG_CONFIG_HOME optionally honored but
~/.omnigraph/ is canonical. Updated all paths, the rationale paragraph,
the file-naming table (added a cache/state row), and env precedence.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs: reconcile RFC-002 with shipped/planned CLI tickets
Align with reality found in existing tickets:
- Noun is graph/graphs, not target/targets (MR-603 done renamed the
config key targets->graphs, flag --graph). Use graphs:/--graph; an
entry is embedded (uri) XOR remote (server + remote graph name).
- ~/.omnigraph/ confirmed by MR-581 (og template pull, done) which
already quick-starts templates there.
- Templates already exist (MR-581/MR-531) — not invented here.
- The init family is already specced (init, quickstart MR-973, serve
MR-970, prune MR-972, mcp install MR-974, agent-mode MR-981); this
RFC only adds the user route (~/.omnigraph/config.yaml + login).
- aliases: -> operations: planned (MR-839).
- bearer_token_env gap tracked in MR-971.
- query lint/check already exist (MR-639) — registry validator must not
collide with the singular `query check`.
Add a Reconciliation section; fix the canonical example to graphs:/--graph.
Also: merge semantics refined (deep-merge settings, replace named
entries, replace lists, config view --resolved --show-origin).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs: correct stale-ticket claims and fold init/bootstrap design into RFC-002
Verify against code, not ticket statuses (MR-581 is marked done but is
stale/unbuilt): no ~/.omnigraph usage, no template/serve/quickstart/
prune/login commands exist; config still uses aliases: (no operations:).
So ~/.omnigraph/ stands on peer-convention merits alone, and templates
are a design question, not a foothold. Add §7.5: the three-tier init
model (user route = login + ~/.omnigraph/config.yaml; thin project init;
fat quickstart + templates) with first-principles positions (split
init/login, in-place refuse-if-exists, interactive vs --auto/agent-mode,
--template flag, secrets-on-scaffold gitignore rule). This RFC owns only
the user route; the rest are sibling tickets (MR-973/970/972/974/981).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs: breadboard + slice Shape A in RFC-002
Add the implementation breadboard (places P1-P5, affordances N1-N14 with
NEW markers, mermaid) and five vertical slices for the selected config/
CLI/init shape: V1 global layer + merge engine + config view; V2 remote
graphs + HTTP-client path + credential resolution; V3 omnigraph login;
V4 init-hardening + quickstart + templates (rides MR-970); V5 agent-mode
(MR-981). Rollout reordered to the slice sequence; spikes X1-X4 gate
their owning slice. V1-V2 close the substantive client->server gap.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Add InvokeQuery Cedar action (coarse, graph-scoped)
A per-graph, branch-scoped action that gates invoking a server-side
stored query by name. Coarse for now: an `invoke_query` allow rule
permits any stored query on the graph; a future, additive refinement
adds an optional per-query-name scope without changing rules written
against the coarse action. Enforcement is at the HTTP boundary; the
engine `_as` writers still enforce read/change per the query body, so a
stored mutation is double-gated (invoke_query to reach the tool, change
for the write). No call site yet — the invocation handler wires it in a
later change (same pattern as Admin/GraphList added ahead of consumers).
- variant + as_str/resource_kind(Graph)/FromStr/uses_branch_scope
- Cedar schema: invoke_query appliesTo Graph
- tests: per-graph allow/deny, branch-scope accepted
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Load and type-check stored queries at server boot, refusing breakage
At startup the server now loads each graph's stored-query registry,
type-checks every query against that graph's live schema, and refuses to
boot if any query references a type/property the schema doesn't have
(same posture as bad policy YAML) — so schema drift surfaces at the
deploy boundary, not silently at invocation. Non-blocking warnings are
logged. The validated registry is attached to the GraphHandle (the two
production sites previously held `queries: None`).
Loading (parse + key==symbol identity) happens at settings-build time
where the config is in scope; the schema type-check happens after each
engine opens (single mode in `open_single_with_queries`, multi mode in
`open_single_graph`). `open_with_bearer_tokens_and_policy` delegates
with an empty registry so its 18 test callers are unchanged; the public
`new_*` constructors are unchanged (only the private build path threads
the registry).
- ServerConfigMode::Single / GraphStartupConfig carry the loaded registry
- boot tests: valid registry boots; type-broken query refuses boot + names it
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Add `omnigraph queries validate` and `queries list` CLI
`queries validate` type-checks the stored-query registry against the
live schema offline — it opens the selected graph, runs the same
check() the server runs at boot, prints breakages/warnings (human or
--json), and exits non-zero on any breakage — so an operator can catch
a query broken by a schema change without restarting the server.
`queries list` prints each registered query's name, MCP exposure, and
typed params.
Named `validate` (not `check`) to avoid overlap with the existing
`omnigraph lint` — `query check`/`query lint` are already deprecated
argv-shims to `lint`. Registry entries resolve like the server: a named
graph uses its per-graph `queries:`; otherwise the top-level one.
- Queries subcommand group; reuses QueryRegistry::load + check from
omnigraph-server; local-only (needs the schema), mirrors lint
- tests: clean registry exits 0, broken query exits non-zero + names it,
list shows the query and its typed params
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Route registry selection through one shared query_entries_for
The "which queries: block applies for graph X" rule existed twice — the
server boot path and the CLI's registry_entries — and had already drifted:
the CLI carried an unreachable unwrap_or_else fallback the server lacked.
Add OmnigraphConfig::query_entries_for(graph: Option<&str>) as the single
definition (named graph -> its per-graph block; otherwise top-level) and
route all three sites through it: server single mode, server multi-graph
loop, and the CLI. The CLI's dead fallback arm is deleted; CLI and server
now resolve identically by construction.
No behavior change. Extends the config round-trip test to pin the selector,
including the unknown-name -> top-level fallback the deleted CLI arm covered.
* Funnel registry validation through one validate_and_attach gate
The check -> refuse-on-breakage -> log-warnings -> empty->None block was
copy-pasted across both open paths (single mode and the multi-graph
per-graph open), differing only by the graph label. A third opener could
attach a registry that was never schema-checked.
Extract validate_and_attach(queries, catalog, label) -> Option<Arc<..>> as
the single gate both paths call, so attaching an unchecked registry is no
longer expressible. The catalog handle is an owned Arc, so calling it
before the multi-mode policy match (which rebinds db) is borrow-clean.
No behavior change. Adds a direct unit test of the helper (empty / clean /
breakage incl. the graph label in the message) — covering the multi-graph
path's logic, which previously had no boot-refusal coverage.
* Resolve param types structurally in the MCP vector lint
The exposed-query advisory detected vector params with
type_name.starts_with("Vector(") — a second copy of the compiler's own
ScalarType::from_str_name vector parsing that could drift from it.
Key the lint off PropType::from_param_type_name + ScalarType::Vector(_)
instead, the one canonical resolver the type system already uses. Any
future param-suppliability lint now reads the structured type rather than
scanning the surface string.
Behavior-preserving: the grammar forbids list-of-vector params
(list_type = "[" base_type "]", and base_type excludes Vector), so the only
input where the structured and string checks could differ is unparseable.
Adds a guard test that an exposed String param does not false-trigger the
warning.
* Refuse duplicate MCP tool names across exposed stored queries
The effective MCP tool name (explicit tool_name, else the query name) is a
second identity namespace beside the registry key, but nothing enforced it
unique — two exposed queries could claim one catalog key, and each consumer
re-derived the name ad hoc.
Add StoredQuery::effective_tool_name() as the one definition, and a
load-time uniqueness pass in from_specs over exposed queries: a collision is
a collected LoadError naming the loser and the winner. Scoped to exposed
queries (unexposed have no MCP tool); deterministic over the BTreeMap so the
first-declared wins and the error order is stable.
New (rare) refusal: a config with colliding exposed tool names now fails
`omnigraph queries validate` offline and refuses server boot, the same
posture as a malformed registry. Release-note-worthy.
Test-first: duplicate_exposed_tool_name_is_a_load_error (red before the
pass, green after) + a CLI offline test; the unexposed sibling pins the
exposed-only scope; effective_tool_name asserts folded into the load test.
* docs: document the queries registry, CLI, and invoke_query action
The stored-query surface shipped without user docs. Add it, per the same-PR
maintenance contract:
- policy.md: invoke_query as per-graph action #10 (branch-scoped), with the
double-gating note; renumber graph_list; add it to the branch_scope list.
- cli-reference.md: the `queries validate | list` command, and the
`queries:` config block (per-graph + top-level) with mcp.expose/tool_name
and the tool-name uniqueness rule.
- server.md: boot-time stored-query type-check (refuse on breakage), noting
invocation over HTTP/MCP is not yet exposed.
* Add POST /queries/{name} stored-query invocation handler
Invoke a curated server-side stored query by name: source + name come from
the per-graph queries: registry, the client sends only runtime inputs
(params, branch, snapshot). Gated by the invoke_query Cedar action at the
boundary; the handler delegates to the existing run_query/run_mutate, whose
inner Read/Change enforce still runs — so a stored mutation is double-gated
(invoke_query to reach the tool, change for the write).
- InvokeStoredQueryRequest + an untagged InvokeStoredQueryResponse
{ Read(ReadOutput), Change(ChangeOutput) } → one Json<_> return type and a
oneOf 200 schema (a correct contract, not a wrong-but-simple one).
- Route lives in per_graph_protected → single-mode /queries/{name} and
multi-mode /graphs/{id}/queries/{name} for free.
- Deny == unknown: an invoke_query denial and a missing query both return the
same 404, so the catalog can't be probed by an unauthorized caller.
- OpenAPI regenerated; tests cover read, mutation double-gate (403 vs 200),
bad-param 400, and the identical-404 deny path.
Completes the MR-969 V1 invocation slice (registry + /queries/{name} + invoke_query).
* docs: stored-query invocation endpoint; flip the not-yet-exposed caveat
Now that POST /queries/{name} ships (C7), document it: add the endpoint to
server.md's inventory + an invocation section (body, untagged read/mutate
envelope, invoke_query gate, double-gated mutations, deny == 404), and flip
the startup note that said invocation was not yet exposed. In policy.md,
replace "no invocation call site yet" on the invoke_query action with a
pointer to the endpoint.
* Scope the stored-query 404-hiding claim to non-invoke_query callers
Review found the deny==404 catalog-hiding was overstated as a contract: it
holds only at the outer invoke_query gate. A caller that HOLDS invoke_query
but lacks read/change gets the inner gate's 403 for an existing query vs 404
for an unknown one — so existence is visible to grant-holders by design (the
intended double-gate). The handler docstring, OpenAPI 404 description, and
server.md all claimed the 404 was airtight against any denied actor.
Correct the wording in all three (no behavior change) and add the missing
symmetric test (invoke_query but no read -> 403 for an existing query, 404
for unknown) so the actual contract is pinned. Also document that in
default-deny mode (tokens, no policy) every invocation 404s until an
invoke_query rule is configured.
Nits: the from_specs collision comment said "first declared wins" but it is
lexicographically-first by name (BTreeMap); the effective_tool_name docstring
overclaimed the CLI display routes through it (it resolves the rule on its
own output DTO).
* Default mcp.expose to true (the manifest entry is the opt-in)
expose controls MCP-catalog membership only — it is not an authorization
gate (invocation is gated by invoke_query regardless). So requiring a
per-query mcp.expose: true was friction with no safety benefit: a
non-exposed query is still HTTP-invocable by name. Flip the default so
declaring a query in the manifest exposes it to the agent tool catalog by
default; expose: false is the escape hatch for service-only queries.
Both the absent-mcp path (Default impl) and the present-but-no-expose path
(serde default fn) now yield true. Doc comments + cli-reference updated; the
config round-trip test asserts the new default.
* Add GET /queries stored-query catalog endpoint
List a graph's mcp.expose stored queries as a typed tool catalog so a client
(the MCP server) can register them as tools without fetching .gq source.
Each entry carries name, MCP tool_name, description/instruction, a
read/mutate flag, and decomposed typed params (kind enum: string|bool|int|
bigint|float|date|datetime|blob|vector|list, plus item_kind for lists and
vector_dim) — so the consumer builds an input schema with a closed match and
never re-parses omnigraph type spelling. I64/U64 are bigint (string on the
wire): a JSON number loses precision past 2^53 and the engine already accepts
decimal strings.
Read-gated (works in default-deny; the catalog is graph-wide, authorized
against main). NOT Cedar-filtered per query yet — a reader can list a query
whose invoke_query they lack (documented gap until per-query authz lands);
invocation stays invoke_query-gated + deny==404.
- api: QueriesCatalogOutput / QueryCatalogEntry / ParamDescriptor / ParamKind
+ query_catalog_entry (reuses PropType::from_param_type_name; scalar_kind is
exhaustive, so a new ScalarType is a compile error here until catalogued).
- GET /queries route in per_graph_protected (→ /graphs/{id}/queries in multi
mode); OpenAPI regenerated; path allowlists updated.
- Tests: projection unit (every kind, list, vector, nullable, mutation,
empty) + handler (exposed-only filter, read-gate probe-oracle, empty
registry).
* docs: GET /queries stored-query catalog endpoint
Document the catalog: the endpoint table row (GET /queries, read-gated), a
catalog section (typed-param kind enum, bigint/date/datetime/blob-as-string,
graph-wide/branch-independent, mcp.expose default true, the read-gated
probe-oracle gap), and flip the startup note now that the catalog ships.
* Collect file-I/O and parse errors in QueryRegistry::load in one pass
load() early-returned on any unreadable .gq file, masking parse / identity /
tool-name-collision errors in the OTHER (readable) files — so an operator
fixed the missing file, restarted, and only then saw the next broken query.
Now it collects I/O errors but still runs from_specs on the readable specs
and returns the union, so every broken entry surfaces at once (matching the
collected-errors contract the rest of the registry already follows).
Safe: from_specs' tool-name collision check runs over loaded queries only, so
dropping an I/O-failed entry can only under-report a collision, never invent
one. I/O errors are ordered first (BTreeMap key order), then spec errors.
Adds a load-level test (tempdir: a valid, a missing, and a parse-broken .gq)
asserting all three surface in one Err — confirmed red before the fix.
* Make invoke_query graph-scoped (one branch authority)
invoke_query gates reaching the curated stored-query surface — a graph-level
capability. Per-branch/snapshot access is already enforced by the inner
read/change gate in run_query/run_mutate (authorized against the resolved
branch), so branch-scoping the outer gate was redundant AND wrong for snapshot
reads (it defaulted to main). Drop the branch dimension: remove InvokeQuery
from uses_branch_scope (it joins admin as graph-scoped) and authorize the
boundary gate with branch: None.
Lossless: an actor confined to branch X by their read/change rules can still
only invoke a stored query that touches X. A rule that sets branch_scope on
invoke_query is now rejected by validate() — write invoke_query in its own
rule.
Ripple (atomic): restructure the server invoke fixture so invoke_query sits in
its own branch_scope-free rule; invert invoke_query_is_branch_scoped ->
invoke_query_rejects_branch_scope; the per-graph authorize test uses
branch: None; docs (policy.md, server.md, the InvokeQuery doc). No wire/OpenAPI
change.
* Resolve graph config by identity, not server mode
Which policy/queries block applies for a graph was decided three different,
mode-dependent ways: single-mode boot used top-level even for a named graph;
multi-mode used per-graph (and silently ignored a top-level queries block); the
CLI used per-graph for a named target. So `queries validate --target prod`
could check a different registry than the single-mode server loaded, and a
named graph's per-graph policy/queries were silently shadowed.
Make config a function of graph IDENTITY: a graph served by NAME
(--target/server.graph, a graphs: entry) uses its own graphs.<name>.{policy,
queries}; a bare URI is anonymous and uses top-level. One rule, applied by
single-mode boot, multi-mode boot, and the CLI — so they can't diverge and the
CLI predicts the server exactly.
No silent ignore: serving a named graph while a top-level policy/queries block
is populated now refuses boot, naming the block (the multi-mode top-level-policy
bail, extended to queries and to single-mode-named). The CLI's `queries
validate` derives the schema URI and the registry from ONE selection, and a
positional URI forces anonymous (ignoring cli.graph) so the two can't come from
different graphs.
BREAKING (released behavior): single mode by name (--target/server.graph) with
top-level policy/queries previously used top-level; it now uses the per-graph
block and refuses boot if top-level is also populated. Bare-URI single mode is
unchanged. Loud, with migration text pointing at graphs.<name>.
- config: resolve_policy_file_for (policy sibling of query_entries_for, no
top-level fallback) + populated_top_level_blocks for the coherence check.
- characterization tests (single-mode named -> per-graph; named + top-level ->
bail; multi-mode top-level queries -> bail; CLI positional-URI -> top-level).
- docs: policy.md, server.md, cli-reference.md.
* docs: RFC-002 credentials keyed by server name (keychain/profile/env)
Reworks the RFC's credentials model: secrets are keyed by server name — OS
keychain `omnigraph:<server>` (preferred) -> a `[<server>]` profile in
`~/.omnigraph/credentials` -> `OMNIGRAPH_TOKEN[_<SERVER>]` env (CI), the
AWS/gh/kube model. `servers.<name>` is endpoint-only by default but may carry
an explicit, secret-free `auth: { token: { env|file|command|keychain } }`
source. The shipped `bearer_token_env` + `.env.omni` dotenv remain a legacy
compat path; no `credentials.yaml`.
* docs: RFC-002 — typed graph locator (storage/server/graph_id), not a uri string
Add §1.1: the resolved graph address is a typed GraphLocator
(Embedded{storage} | Remote{server, graph_id}), not a flat uri: String.
Diagnoses the string model's cost in the code today (~16 is_remote_uri forks,
TargetConfig can't express multi-server x multi-graph, the CLI bails on remote,
the ts SDK models baseUrl+graphId separately) and settles the YAML naming so
the key names the locus:
- storage: (embedded) — shipped uri: is a deprecated alias
- server: + graph_id: (remote) — graph_id defaults to the entry key
- storage xor server, reject both/neither (no silent ambiguity)
Kills the graphs:/graph: collision and the uri:-might-be-a-server ambiguity.
Updates the §1/§8 examples and the entry-shape notes to the new naming.
* Test: queries list must reject an unknown --target
queries list opens no graph URI, so unknown-graph validation does not ride
along on resolve_target_uri the way it does for every other command. The new
test reproduces the gap: with an unknown --target the command currently exits 0
and prints the (empty) top-level registry instead of erroring like the
URI-resolving commands do. Fails against current code; the fix follows.
* Validate the graph selection in queries list
Graph-existence validation was a side effect of URI resolution: every
URI-resolving command rejects an unknown --target via resolve_target_uri, but
queries list opens no URI, so query_entries_for(Some(unknown)) silently fell
back to the top-level registry and showed the wrong (or empty) catalog.
Make membership a property of the selection: add the fallible
resolve_graph_selection alongside the infallible query_entries_for (a known
name passes through, an unknown name errors with the same message as
resolve_target_uri, None stays anonymous), and validate the selection in
execute_queries_list. query_entries_for is unchanged — server boot's bare-URI
path still needs its None -> top-level arm.
* Surface policy-engine errors from stored-query invoke
The invoke handler mapped every authorize_request failure to 404 ('stored
query not found'), which collapsed the authorization decision (deny -> 403)
together with operational failures (no actor -> 401, Cedar evaluation error ->
500). A real policy-engine 500 was hidden as a missing query.
Separate the two concerns instead of sniffing the masked status. Extract
authorize() returning an Authz { Allowed, Denied(msg) } decision and reserve
Err for operational failures only; authorize_request becomes a thin wrapper
that maps Denied -> 403, so the 16 deny-as-403 callers are unchanged. The
invoke handler now matches the decision directly: a denial stays 404 (deny ==
missing, so the catalog can't be probed without the grant), while a 401/500
propagates with its true status.
500 is now a reachable outcome on POST /queries/{name}; document it in the
endpoint responses and regenerate openapi.json.
* Extract the named-graph/top-level coherence rule into one helper
The rule 'a named graph uses its own graphs.<name> block, so a populated
top-level block is a config error' lived inline in single-mode server boot.
Extract it to OmnigraphConfig::ensure_top_level_blocks_honored so the same
definition can be shared by the CLI selection gate (next commit) and the two
can't drift. Boot calls the helper; the message is reworded context-neutral
(drops 'serving') so it reads correctly from both boot and the CLI.
Behavior-preserving: multi-graph mode keeps its own unconditional check, and
single_mode_named_graph_rejects_top_level_blocks still passes.
* Test: queries validate/list must reject a named graph with a top-level block
Server boot refuses a config where a graph is selected by name yet a top-level
queries:/policy.file block is populated (the block would be silently ignored).
The CLI's queries validate/list resolve the same named selection but skip that
coherence check, so they give a false green / list the per-graph block. The new
test reproduces it: validate prints OK and list succeeds where boot would
refuse. Fails against current code; the fix follows.
* Enforce top-level coherence in the single CLI selection gate
queries validate validated graph membership only as a side effect of URI
resolution and queries list only via resolve_graph_selection's membership
check; neither applied the named-graph/top-level coherence rule server boot
enforces, so both gave a false green on a config boot refuses.
Fold ensure_top_level_blocks_honored into resolve_graph_selection so it is the
single gate that returns only valid + server-coherent selections, and route
resolve_selected_graph (queries validate) through it; queries list already
calls the gate. A named graph with a populated top-level block now errors in
both commands, matching boot. A positional URI stays anonymous (top-level
honored), so queries_validate_positional_uri_ignores_default_graph is
unaffected.
* docs: RFC-003 — MCP server surface for omnigraph-server
Detailed MCP-transport design for the stored-query/MCP work, building on the
shipped #128 registry. Corrects the draft against the branch head: the coarse
invoke_query gate + 404 denial-masking are already wired (server_invoke_query),
so per-query invoke_query scope (PolicyRequest has no query-name dimension yet)
is the real prerequisite; positions the doc as superseding rfc-001's MCP
transport (/mcp/tools+/mcp/invoke) and reconciles the shipped mcp.expose YAML
form and the schema-introspection non-goal; grounds the parity surface in the
actual omnigraph-ts package (13 tools with read/change ids, 2 resources).
* docs(config): clarify graph config boundaries
* fix(config): enforce graph-scoped policies and query validation
* fix(cli): require graph selection for scoped query registries
* fix(server): preserve named graph id in single mode policy
* fix(cli): share graph identity for policy resolution
* test(cli): cover policy tooling server graph selection
* fix(cli): honor server graph for policy tooling
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
fab105bcce
|
fix(release): generate audit-clean Homebrew formula (#134)
The generated formula failed `brew audit --strict` with 5 problems: `version` declared after `license`, and `url`/`sha256` placed directly inside `on_macos`/`on_linux` (forbidden by FormulaAudit/ComponentsOrder). Order `version` before `license`, hoist `head`/`livecheck` above the platform blocks, and nest `url`/`sha256` in `on_arm`/`on_intel`. Add a `brew audit --strict --online` gate to the release workflow so a malformed formula can never be published again. Verified clean against v0.6.0. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
353c0c876a
|
fix(branch): make branch delete correct under partial failure (#137)
* test(lance): pin force_delete_branch surface guard
Pin the Lance 6.0.1 force_delete_branch behavior the branch-delete
single-authority redesign relies on: plain delete_branch errors on a
missing ref, force_delete_branch removes an existing forked branch, and
the local-store quirk where force_delete on a fully-absent branch still
errors (worked around by the upcoming TableStore::force_delete_branch).
Re-pin the docs/dev/lance.md alignment stanza (9 guards; 4 runtime).
* feat(storage): add force branch-delete to TableStore + CommitGraph
Add TableStore::force_delete_branch and CommitGraph::force_delete_branch
(idempotent: tolerate an already-absent branch via Lance RefNotFound /
NotFound), plus CommitGraph::list_branches for the cleanup reconciler to
diff against the manifest authority. RefConflict (referencing
descendants) is still surfaced. Unused until the branch-delete rewire.
* test(maintenance): red — cleanup reconciles orphaned branch forks
Forge a Lance branch on the Person table that the manifest never
references (a zombie fork from an incomplete prior delete) and assert
cleanup reclaims it while leaving main intact. Fails today: cleanup does
not yet reconcile orphaned forks. Goes green with the next commit.
* fix(maintenance): reconcile orphaned branch forks in cleanup
Add reconcile_orphaned_branches: force_delete_branch every per-table and
commit-graph Lance branch absent from the manifest branch set (the
authority), children-before-parents. Folded into cleanup_all_tables,
runs before version GC. Idempotent and authority-derived; no-ops once
nothing is orphaned, and would harmlessly find nothing if a future Lance
atomic multi-dataset branch op prevented orphans. Adds TableStore::list_branches
and exposes graph_commits_uri(pub crate). Turns the maintenance red test green.
* test(failpoints): red — branch_delete partial failure converges
Add the branch_delete.before_table_cleanup failpoint hook (inert without
the feature) and a regression test: a cleanup-step failure after the
manifest authority flip must leave branch_delete returning Ok, the branch
gone, the orphan stranded, then reclaimed by cleanup, and the name
reusable. Fails today: cleanup_deleted_branch_tables propagates the error
as a hard failure. Goes green with the next commit.
* fix(branch): best-effort fork reclaim after the manifest flip
Make branch_delete treat per-table forks and the commit-graph branch as
derived state reclaimed best-effort with force_delete_branch after the
manifest authority flip. A reclaim failure (transient error, or the
branch_delete.before_table_cleanup failpoint) is logged via tracing::warn
and swallowed: the branch is already gone and the cleanup reconciler
converges the orphan. cleanup_deleted_branch_tables no longer returns an
error or blocks the call. Turns the partial-failure recovery test green.
* test(failpoints): red — recreate over orphaned fork is actionable
After a partial-failure delete leaves a fork orphaned, recreating the
branch name and writing to the previously-forked table before cleanup
runs currently surfaces the opaque ExpectedVersionMismatch ("stale view
... expected manifest table version N"). Assert instead a clear error
pointing the user at cleanup. Goes green with the next commit.
* fix(branch): actionable orphan-collision error in fork_branch_from_state
When a fork's create_branch collides with an existing target ref, reuse
it only if its head matches source_version (a legitimate concurrent
first-write). A version mismatch means a zombie fork from an incomplete
prior delete: return a manifest_conflict pointing the user at
`omnigraph cleanup`, instead of the opaque ExpectedVersionMismatch.
Turns the recreate-over-orphan red test green.
* docs(invariants): single-authority branch-lifecycle + Lance forward-compat
Record branch delete in the Current Truth Matrix: manifest is the single
authority flipped atomically first, per-table forks + commit-graph branch
are derived state reclaimed best-effort with the cleanup reconciler as
backstop, and reusing a name whose reclaim failed surfaces an actionable
error. Note the reconciler is authority-derived and degrades to a no-op
under a future Lance atomic multi-dataset branch op, the same shape as
invariant 7.
* test(failpoints): red — cleanup isolates a single-table failure
Add the cleanup.table_gc failpoint hook (inert without the feature) and
an error: Option<String> field on TableCleanupStats (mechanical, always
None for now). Regression test: a one-shot version-GC failure for one
table must not abort the whole cleanup — assert cleanup still succeeds,
surfaces the failure per-table in stats, and the independent reconcile
pass still reclaimed an orphan. Fails today: the version-GC collect
aborts on the first table error. Goes green with the next commit.
* fix(maintenance): fault-isolate cleanup per table
Make the cleanup sweep do as much as it can and converge on re-run
instead of aborting wholesale on one table's transient error
(invariant 13). The version-GC loop now records a per-table failure on
its stats row (error: Some) and logs it rather than collecting into a
Result that aborts; reconcile_orphaned_branches isolates per-table and
commit-graph failures into BranchReconcileStats.failures. The CLI reports
any failed tables and tells the user to rerun cleanup. Addresses the
Devin review finding. Turns the single-table-failure test green.
* test(failpoints): red — branch_create heals commit-graph zombie + is atomic
Add the branch_delete.before_commit_graph_reclaim failpoint hook and two
regression tests: (a) recreating a name whose delete left a commit-graph
zombie must succeed (today it dies on Lance's internal Clone error), and
(b) branch_create must roll back the manifest branch when the derived
commit-graph branch fails (today it leaves the manifest branch created
while returning Err). Both fail now; green with the next commit. The
existing branch_create_failpoint_triggers test still passes.
* fix(branch): make branch_create atomic + heal commit-graph zombie
branch_create now flips the manifest authority first, then creates the
derived commit-graph branch in create_commit_graph_branch, force-dropping
any orphaned commit-graph ref left by an incomplete prior delete (the
manifest branch is fresh, so a same-named commit-graph branch is provably
a zombie). If commit-graph creation fails, the manifest branch is rolled
back so the name never half-exists. Addresses the Codex review finding.
Turns the two branch_create red tests green; existing tests unaffected.
* test(failpoints): red — fork collision misclassifies live concurrent fork
Add the fork.before_classify failpoint hook and a concurrency test: when
a concurrent first-write legitimately wins the fork race, the loser must
get a retryable refresh-and-retry, not the misleading run-cleanup orphan
error. Today the version-comparison misclassifies the live fork as an
orphan (the Cursor finding). Goes green with the next commit.
* fix(branch): manifest-arbitrated fork-collision classification
Classify a fork collision by the manifest authority instead of comparing
Lance branch versions. Before forking, open_owned_dataset_for_branch_write
re-reads the live manifest: if the table is already forked on the active
branch, a concurrent first-write won and the loser gets a retryable
refresh-and-retry (not a misleading orphan error). fork_branch_from_state
no longer guesses from versions — a create collision past that check is
an orphan, so it returns the actionable cleanup error. Addresses the
Cursor finding; turns the live-concurrent-fork test green, zombie path
unchanged.
* test(failpoints): close branch-lifecycle test gaps
Three coverage additions for the branch-delete work (behavior already
correct; these lock it in and catch regressions):
- cleanup_isolates_reconcile_failure: inject a force-delete failure into
the reconcile loop (new cleanup.reconcile_fork hook) and assert the
sweep continues + converges on re-run. Directly covers the reconcile
loop the Devin finding was about (previously only version-GC was).
- cleanup_reclaims_orphaned_commit_graph_branch: forge a commit-graph
orphan via the delete reclaim failpoint and assert cleanup's
reconcile_commit_graph_orphans drops it (previously untested).
- fork_collision_with_live_concurrent_fork_is_retryable: replace the
fixed 300ms sleep with a deterministic readiness signal (cfg_callback +
compare_exchange atomics) so the two-writer ordering can't flake.
Full failpoints suite 31/0.
|
||
|
|
e94e7d124a
|
fix(bootstrap): pin RustFS to beta.3 + allow insecure default creds (#136)
`local-rustfs-bootstrap.sh` defaulted RUSTFS_IMAGE to the floating `rustfs/rustfs:latest`, which resolved to 1.0.0-beta.4 (2026-05-21). beta.4 added a credentials-policy check that refuses to start when the access/secret keys are values it treats as "default" (rustfsadmin/rustfsadmin, the script's defaults) — so a fresh bootstrap broke at RustFS startup. Pin the default to 1.0.0-beta.3 to match CI (.github/workflows/ci.yml) and the v0.5.0 release notes, and additionally pass RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true so the script stays forward-compatible if RUSTFS_IMAGE is overridden to beta.4+. Co-authored-by: Ragnor Comerford <ragnor@equator.so> |
||
|
|
2d5c4b1202
|
docs: rename runs.md/runs.rs → writes and repoint all references (#131)
Some checks failed
CI / Classify Changes (push) Has been cancelled
CI / Check AGENTS.md Links (push) Has been cancelled
CI / Container Entrypoint (push) Has been cancelled
Release Edge / Prepare edge release (push) Has been cancelled
CI / Test Workspace (push) Has been cancelled
CI / Test omnigraph-server --features aws (push) Has been cancelled
CI / Test Windows release binaries (push) Has been cancelled
CI / RustFS S3 Integration (push) Has been cancelled
Release Edge / Build edge omnigraph-linux-x86_64 (push) Has been cancelled
Release Edge / Build edge omnigraph-macos-arm64 (push) Has been cancelled
Release Edge / Build edge omnigraph-windows-x86_64 (push) Has been cancelled
Release Edge / Smoke Windows installer (push) Has been cancelled
The Run state machine was removed in MR-771 (v0.4.0); `docs/dev/runs.md` and `crates/omnigraph/tests/runs.rs` have since documented and tested the direct-publish write path, so the "runs" name was misleading. - git mv docs/dev/runs.md → docs/dev/writes.md (reframe H1 + intro; keep MR-771 history note) - git mv crates/omnigraph/tests/runs.rs → tests/writes.rs (reframe header) - repoint every runs.md / runs.rs reference across docs, AGENTS.md, and source comments - fix four pre-existing broken `docs/runs.md` links (the file never lived at that path) to `docs/dev/writes.md` - fix the stale v0.4.0 anchor to the live section No behavior change: every source edit is a comment. Engine builds and the renamed test passes 25/25; scripts/check-agents-md.sh passes. The run-removal cleanup itself (run_registry.rs guard, __run__ prefix) is deferred to MR-770. |
||
|
|
854ad0afcb
|
feat(server): compose OMNIGRAPH_TARGET_URI with OMNIGRAPH_CONFIG in entrypoint (#129)
The container entrypoint's URI and config branches were mutually exclusive, so a deployment driven by OMNIGRAPH_TARGET_URI could never load a policy file. Forward --config alongside the positional URI when OMNIGRAPH_CONFIG is also set (the URI still wins via resolve_target_uri), enabling Cedar policy without changing how the URI is provided. Add docker/entrypoint_test.sh (arg-composition cases) + a CI job, and document the env-var contract in docs/user/deployment.md. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
8eba37cc60
|
README: link to TypeScript SDK and MCP server (#92)
Adds a `Clients` section pointing at the TypeScript SDK and the Model Context Protocol server for programmatic / LLM access to omnigraph-server. Both are published on npm and developed in ModernRelay/omnigraph-ts; this README is the canonical discovery surface so it should mention them. Notes the major.minor lockstep convention so users know which client version targets which server. |
||
|
|
24413844ae
|
Add Windows release binaries (#127)
* Add Windows release binaries * Fix Windows installer downloads |
||
|
|
6c9afc7e9b
|
Fix typos and enhance README content | ||
|
|
d8451c3d33
|
docs(readme): add header labels to hero tables (#126)
Fill the empty second-column header cell in both tables: "What it means" for the AS CODE table, "What it's for" for Core Use Cases. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
c221e67e1b
|
docs(readme): restructure hero with bullets + tables (#125)
* docs(readme): restructure hero with bullets + tables Establish a reading rhythm: infra facts as quick bullets, then an AS CODE table and a Core Use Cases table that unpack each item. - Feature block → bullets (drop the redundant <br> left on list items). - AS CODE block → table; "Context as code" reframed around pre-defined, versioned queries. - Use cases → table with one-line unpacking; "Code & dev graph" → "Dev graph" (issue + dependency model for AI agents); "Context graph" reframed to decision traces + tribal knowledge; "R&D data layer" to experimental data written into branches; wiki line to "agent-updatable knowledge base". Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(readme): refine hero copy and lift thesis line - Promote "operational state & coordination layer for agents" to sit directly under the tagline. - Tighten AS CODE and use-case descriptions. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(readme): make AS CODE rows consistent Apply the AS CODE motif to all four rows (Security, Dashboards) to match Schema and Context. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
6e948de985
|
Update README.md (#124) | ||
|
|
baeb4387df
|
docs(readme): hard-break the manifesto lines with <br> (#123)
The X-as-code and feature blocks used bare single newlines, which collapse onto one line in renderers that treat soft breaks as spaces. Append <br> to each line so they break everywhere without converting the flat manifesto into bullet lists. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
4e431d28b8
|
docs: rewrite README opening + add AGENTS.md dev commands (#122)
* docs(agents): add build/test/lint dev-command section
AGENTS.md (CLAUDE.md) covered architecture and invariants but had no
developer command surface — only runtime `omnigraph` CLI usage. Add a
concise "Build, test, lint" section with the non-obvious gotchas:
- crate dir `crates/omnigraph` is package `omnigraph-engine` (the `-p` name)
- canonical CI gate is `cargo test --workspace --locked`
- how to run one file / one fn
- feature-gated suites (`failpoints`, server `aws`)
- S3 tests skip without `OMNIGRAPH_S3_TEST_BUCKET`
- the two non-test CI checks (check-agents-md, OpenAPI drift)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs(readme): rewrite opening, dedupe, fix stale references
- New manifesto-style opening (tagline, X-as-code, features, core
use cases, coordination-layer line); drop the old prose intro,
Use Cases, and Capabilities sections.
- Remove Capabilities, which restated the new opening line-for-line.
- Harmonize heading case: "## Core Use Cases".
- Dedupe the verbatim Slack invite (kept the Community section) and
the double-linked cli.md (kept the contextual pointer).
- Fix stale references that no longer match the code:
- drop "transactional runs" / "and runs" — no run concept remains;
writes are atomic per-query, multi-query workflows use branches.
- update the CLI crate command list to canonical query/mutate plus
commit/lint/optimize/cleanup.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
50910b3753
|
docs: align release artifact docs | ||
|
|
1a4d2cee97
|
feat: inline query strings in CLI and HTTP server (#110)
* feat(MR-656): inline query strings in CLI and HTTP server
CLI:
- Add -e / --query-string <STRING> to omnigraph read and omnigraph change
- Exactly one of --query, --query-string, --alias is required (3-way XOR)
- Empty --query-string is rejected with a clear error
HTTP:
- New POST /query (read-only, clean field names: query/name/params/branch/snapshot)
- Mutations on /query are rejected with 400 -- use POST /change instead
- ChangeRequest fields polished: query (alias query_source), name (alias query_name)
- POST /read and POST /change remain byte-compatible for existing clients
Tests:
- cli.rs: -e happy-path on read/change, mutex error vs --query, empty -e rejected
- system_local.rs: inline -e read and -e change exercise the local flow
- system_remote.rs: inline -e read/change over HTTP plus direct /query 200/400
- server.rs: /query 200, /query 400 on mutation, /change legacy field alias
- openapi.rs: new /query path, QueryRequest schema, ChangeRequest field-name polish
Docs: cli.md (-e examples), cli-reference.md (read/change rows), server.md (/query)
Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>
* feat(MR-656): rename read/change to query/mutate with deprecation signals
HTTP server:
- Add POST /mutate as canonical write endpoint (pairs with POST /query).
- Mark POST /read and POST /change as deprecated. Three-channel signal:
* OpenAPI: `deprecated: true` on the operation (every codegen flags
the generated SDK method).
* RFC 9745: response `Deprecation: true` header on every response.
* RFC 8288: response `Link: </successor>; rel="successor-version"`
pointing at /query and /mutate respectively.
- Share business logic across /mutate and /change via run_mutate(); the
/change wrapper is the only place that adds the deprecation headers.
- ChangeRequest field aliases (query_source/query_name) preserved.
- AliasCommand serde now accepts `query`/`mutate` alongside `read`/`change`.
CLI:
- Promote `omnigraph query` / `omnigraph mutate` to top-level canonical
subcommands (clap visible_alias keeps `omnigraph read` / `omnigraph
change` working forever).
- Promote `omnigraph lint` / `omnigraph check` to top-level (was nested
under `omnigraph query lint`, which is now a deprecated argv shim that
rewrites to the canonical form).
- Argv-level preprocessing prints a one-line deprecation warning to
stderr when any legacy spelling is used. Canonical names are silent.
Tests:
- Server: /mutate works, /change emits Deprecation+Link headers, /read
emits Deprecation+Link headers, /query carries no deprecation signal.
- OpenAPI: /read and /change flagged deprecated; /query and /mutate not.
- CLI: canonical `lint` matches deprecated `query lint` / `query check`
output; `read` / `change` print deprecation warnings.
Docs:
- cli.md: new canonical examples; "Deprecated names" migration table.
- cli-reference.md: top-level table updated; aliases.<name>.command
accepts both legacy and canonical spellings.
- server.md: endpoint inventory shows /query and /mutate as canonical
and /read and /change as deprecated; dedicated section explains the
three-channel deprecation signal.
- og-cheet-sheet.md: use new `omnigraph lint` / `omnigraph check`.
- openapi.json regenerated.
Migration is purely cosmetic — every deprecated form continues to work
indefinitely; only the spelling changes.
Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>
* fix(MR-656): address Devin Review findings on /query and /change
Two issues raised by Devin Review on PR #110:
1. `POST /query` mutation-rejection error pointed at the deprecated
`/change` endpoint instead of the canonical `/mutate`. Fixed in
three places: the runtime error message in `server_query`, the
utoipa 400-response description, and the handler doc comment. The
`QueryRequest` schema docstrings in `api.rs` got the same update so
the openapi.json bodies match. Server and openapi tests updated.
2. `execute_change_remote` serialized `ChangeRequest` directly, which
emits the new canonical field names `query` / `name` on the wire.
`#[serde(alias = "query_source")]` only affects deserialization, so
a newer CLI talking to an older server would have its `/change`
POST body fail with "missing field: query_source". Fixed by
extracting a `legacy_change_request_body` helper that hand-rolls
the JSON with the legacy keys (`query_source` / `query_name`), the
same byte-stable contract `execute_read_remote` already uses
against `/read`. Added two unit tests on the helper to lock the
wire shape in.
Co-Authored-By: Ragnor Comerford <ragnor.comerford@gmail.com>
* docs(dev): RFC 001 — inline + stored queries, envelope, MCP
Tracked artifact consolidating the design across MR-656 (this branch),
MR-976 (Phase 1 envelope hardening parent, with MR-977/978/979/980
sub-issues), and MR-969 (stored queries + MCP).
Sections:
* Two paths, one engine — inline `/query` + `/mutate` (this PR) coexist
with stored `/queries/{name}` (MR-969). Same `run_query` / `run_mutate`
backend (the fold-in landed in the previous commit).
* Request envelope ("before") — Idempotency-Key, If-Match, X-Deadline,
X-Trace-Id, expect, dry_run, fields. Phase 1 ships the load-bearing
subset on `/mutate`.
* Response envelope ("after") — audit_id, snapshot_id, commit_id, stats,
warnings. Closes the provenance loop today's `ChangeOutput` leaves
open.
* `.gq` pragmas — `@description`, `@returns`, `@mcp`. Source-of-truth
for the stored-query agent contract; no separate YAML registry.
* Multi-graph MCP — per-graph `/graphs/{id}/mcp/tools` + `/mcp/invoke`.
Token binds to one graph by default; cross-graph agents loop.
* Cedar split — `read`/`change` for inline, `invoke_query` for stored.
Operators deny ad-hoc for agent groups while keeping curated tool
list open.
* Rejected alternatives — per-env override files, compiled bundles,
tool-name prefixing across graphs, body-field graph dispatch.
Index entry added under "Active Implementation Plans" so future agents
land on the RFC before touching queries / mutations / envelope code.
`scripts/check-agents-md.sh` clean (35 links, 34 docs).
* docs(server): clarify why run_query lacks AppState parameter
run_mutate takes state for workload admission; run_query doesn't because
reads aren't admission-gated today. Mark the asymmetry as intentional and
flag the two future events that would grow the signature: Phase 1's
`expect: { max_rows_scanned: N }` budget (MR-976) or per-actor admission
extending to stored-read invocations (MR-969). Prevents the natural
"make these symmetrical" follow-up.
* refactor(server): run_query / run_mutate take &ResolvedActor
Replace `Option<Extension<ResolvedActor>>` in the helpers with
`Option<&ResolvedActor>`. Saves MR-969's stored-query handler from
wrapping a bare actor in axum's `Extension(...)` before calling.
Handler signatures (`server_query`, `server_read`, `server_mutate`,
`server_change`) keep `Option<Extension<ResolvedActor>>` because that
is what axum injects, and unwrap at the call site with
`actor.as_ref().map(|Extension(actor)| actor)`.
Net: -13/+10 LOC, 89/0 server tests pass.
* docs(releases): v0.6.0 — describe inline + canonical-named queries (MR-656)
Extend the v0.6.0 release notes to cover the third piece of work landing
alongside the graph terminology rename and multi-graph server mode:
canonical-named `POST /query` and `POST /mutate` endpoints, the CLI's
new `-e/--query-string` flag, the top-level promotion of `lint` /
`check`, and the three-channel deprecation signal on `/read` and
`/change` (OpenAPI `deprecated: true` + RFC 9745 + RFC 8288).
Additions:
* Top blurb: "Two pieces" -> "Three pieces" with a bullet describing
the rename + inline flow.
* Breaking Changes: new "Query / mutation rename" subsection covering
the `ChangeRequest` field rename (with the back-compat serde aliases
and the CLI's `legacy_change_request_body` byte-stable wire helper)
and the `omnigraph query lint` -> `omnigraph lint` move.
* New: 5 bullets — the two endpoints, the CLI subcommands, the `-e`
flag, the deprecation signal channels, the widened `aliases.<name>.command`
vocabulary.
* User Impact: one bullet making explicit that the rename is cosmetic
on the client side and migration is voluntary.
* Documentation: pointers to the updated `server.md` / `cli.md` /
`cli-reference.md` and the new `docs/dev/rfc-001-queries-envelope-mcp.md`.
+15/-1 lines. `./scripts/check-agents-md.sh` clean.
* refactor(cli): demote `check` from visible_alias to deprecation shim
`omnigraph check` was a clap `visible_alias` on `lint`, advertised in
`--help` as an equivalent canonical name. Per MR-981 §6 (long-form
flags as canonical, short forms as visible aliases), visible aliases
on subcommand names hurt agent CX: agents emit either spelling
depending on training-data drift, and there's no length signal
pointing at the canonical name.
Changes:
* Remove `#[command(visible_alias = "check")]` from the `Lint` variant.
`omnigraph --help` now shows only `lint`.
* Add bare `check` to `rewrite_deprecated_argv` so `omnigraph check
<args>` still works — it rewrites to `omnigraph lint <args>` and
emits a one-line stderr deprecation warning, matching the existing
pattern for `read` / `change` / `query lint` / `query check`.
* Fix the nested `query check` shim to substitute `check` -> `lint` in
the rewritten argv (previously it relied on `check` being a
visible_alias to reach the `Lint` variant).
* New test `deprecated_check_top_level_rewrites_to_lint` covers: bare
`check` produces identical stdout to `lint`, emits the deprecation
warning, and `check` does NOT appear as an alias in `omnigraph
--help`.
* Release notes updated to reflect the deprecation-shim treatment and
cross-reference MR-981 §6 reasoning.
Cargo / Go users typing `check` still work indefinitely; one stderr
nudge per invocation teaches the canonical name. Agents see only
`lint` in `--help --json` so they emit one canonical form.
67/0 omnigraph-cli tests pass; 39 workspace test suites green.
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Ragnor Comerford <ragnor.comerford@gmail.com>
Co-authored-by: Ragnor Comerford <hello@ragnor.co>
|
||
|
|
e0f13b32c5
|
(feat): multi-graph server mode (#119)
* mr-668: add GraphId newtype + Cloud-mode forward identity stubs (PR 1/10)
PR 1 of the MR-668 multi-graph server work. Pure types, no runtime
behavior changes yet.
Ships the validated identity vocabulary that the rest of the implementation
will consume:
- `GraphId(String)` — `^[a-zA-Z0-9-]{1,64}$`, leading underscore rejected
(engine reserves every `_*` filename), reserved route names rejected
(`policies`, `healthz`, `openapi`, `openapi.json`, `graphs`). Validation
lives in `try_from` only; serde `Deserialize` re-runs it so JSON payloads
cannot bypass.
- `TenantId(String)` — same regex shape as GraphId. `None` in Cluster
mode; reserved for Cloud mode (RFC 0003) where it carries the OAuth
`org_id` claim.
- `GraphKey { tenant_id: Option<TenantId>, graph_id }` — the registry
HashMap key. `cluster()` constructor for the Cluster-mode default.
- `Scope` enum with `Full` variant — Cluster mode default; RFC 0004 will
extend with OAuth scopes (`graph:read`/`write`/`admin`/`*`).
- `AuthSource` enum with `Static` variant — Cluster mode default; RFC
0001 step 1 will add `Oidc`.
- `ResolvedActor { actor_id, tenant_id, scopes, source }` — replaces the
upcoming refactor of `AuthenticatedActor(Arc<str>)` in PR 4a.
Per MR-668 design decision 13: ship the Cloud-mode forward type shapes
now (no `TokenVerifier` trait yet — that's RFC 0001 step 1) so handler
signatures stay stable across the Cluster → Cloud trajectory. `Scope`
and `AuthSource` use `#[non_exhaustive]` so future variants don't break
caller matches.
Tests: 26 new (15 graph_id + 11 identity), all passing. No regression
in the existing 36 server library tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: Omnigraph::init error-path cleanup + three failpoints (PR 2a/10)
PR 2a of the MR-668 multi-graph server work. Bug fix: a partially-failed
`Omnigraph::init` previously left orphan schema files at the graph URI,
making the URI unusable for a retry (the next `init` would refuse because
`_schema.pg` already exists).
Changes:
1. `init_with_storage` now wraps the I/O phase. On any error from
`init_storage_phase`, calls `best_effort_cleanup_init_artifacts` to
remove the three schema files before returning the original error:
- `_schema.pg`
- `_schema.ir.json`
- `__schema_state.json`
Cleanup is best-effort: a failure to delete is logged via
`tracing::warn` but does NOT mask the init error.
2. Three failpoints added at the init phase boundaries:
- `init.after_schema_pg_written`
- `init.after_schema_contract_written`
- `init.after_coordinator_init`
3. Four new failpoint tests in `tests/failpoints.rs` pin the cleanup
behavior at each boundary plus the "original error wins over cleanup
error" contract. All 23 failpoint tests pass.
Coverage gap (documented in code comments):
Lance per-type datasets and `__manifest/` directory created by
`GraphCoordinator::init` are NOT cleaned up after a coordinator-init-phase
failure. Recursive directory deletion requires `StorageAdapter::delete_prefix`,
which was deferred along with `DELETE /graphs/{id}` (originally PR 2b). When
that primitive lands, the third failpoint test can be tightened to assert
the graph root is fully empty.
Tests: 4 new (init_failpoint_*), all 23 failpoint tests green. No
regression in the 105 engine library tests or 64 end_to_end tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: add GraphHandle + GraphRegistry data structure (PR 3/10)
PR 3 of the MR-668 multi-graph server work. Pure data structure — no
routing changes yet (that's PR 4a).
New file: `crates/omnigraph-server/src/registry.rs`
- `GraphHandle { key: GraphKey, uri: String, engine: Arc<Omnigraph>,
policy: Option<Arc<PolicyEngine>> }` — the per-graph state that the
routing middleware (PR 4a) will inject as a request extension.
- `RegistrySnapshot { graphs: HashMap<GraphKey, Arc<GraphHandle>> }` —
immutable snapshot; replaced atomically via `ArcSwap`.
- `GraphRegistry { snapshot: ArcSwap<_>, mutate: Mutex<()> }` — lock-free
reads, mutex-serialized mutations.
- `RegistryLookup { Ready(Arc<GraphHandle>) | Gone }` — two-valued, no
`Tombstoned` variant since DELETE is deferred in v0.7.0 scope.
- `InsertError { DuplicateKey | DuplicateUri }` — both rejection cases
for create-graph (maps to HTTP 409 in PR 7).
- Methods: `new`, `from_handles` (bulk startup-time init), `get`, `list`,
`len`, `insert`.
Race semantics pinned by three multi-thread tests:
- `concurrent_insert_same_key_exactly_one_succeeds` — N=8 spawned
inserts with the same key; exactly 1 returns Ok, 7 return DuplicateKey.
- `concurrent_insert_distinct_keys_all_succeed` — N=8 spawned inserts
with distinct keys; all succeed.
- `concurrent_reads_during_inserts_see_consistent_snapshots` — reader
loop concurrent with sequential writes; every listed handle's key
resolves via `get()` (no torn state).
Why no tombstones field: `DELETE /graphs/{id}` is deferred to bound
the scope of v0.7.0. Without a delete endpoint, there's no use for
tombstones — every key in the registry is `Ready`, and every key
not in the registry is `Gone`. When DELETE lands later, the
`Tombstoned` variant + `tombstones: HashSet<GraphKey>` slot in
additively without breaking caller signatures (the `Gone` variant
remains the "not currently active" case).
Why `tokio::sync::Mutex`: insert is async because PR 7's flow holds
this mutex across the atomic YAML rewrite step (file I/O). std::Mutex
would footgun across .await.
Dependency additions: `arc-swap = { workspace = true }`,
`thiserror = { workspace = true }` (used by InsertError).
Tests: 12 new (12 passing). 74 server lib tests total green
(62 from PR 1 + 12 new). Clippy clean on server crate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: router restructure + handler refactor for multi-graph (PR 4a/10)
PR 4a of the MR-668 multi-graph server work. The heaviest single PR —
rewires every handler to extract `Arc<GraphHandle>` from a routing
middleware, replaces `AuthenticatedActor(Arc<str>)` with `ResolvedActor`
everywhere, and adds the `ServerMode` discriminator.
Behavior changes:
- **Single mode** (legacy `omnigraph-server <URI>`): flat routes
(`/snapshot`, `/read`, `/branches`, …) continue to work exactly as
v0.6.0. Internally, the registry holds a single handle keyed by the
sentinel `SINGLE_GRAPH_KEY_ID = "default"`; routing middleware injects
that handle on every request. No HTTP-visible change.
- **Multi mode** (new): routes nest under `/graphs/{graph_id}/...`.
Routing middleware extracts the graph id from the path, looks it up
in the registry, and injects the handle. 404 if not found.
(Multi-mode startup itself lands in PR 5; this PR provides the
router-side wiring.)
AppState refactor:
- `engine: Arc<Omnigraph>` and `policy_engine: Option<Arc<PolicyEngine>>`
fields removed — both now live inside `GraphHandle` in the registry.
- `mode: ServerMode { Single { uri } | Multi { config_path } }` added.
- `registry: Arc<GraphRegistry>` added.
- `server_policy: Option<Arc<PolicyEngine>>` added (placeholder for
management endpoints in PR 6b; unused today).
- Existing constructors (`new`, `new_with_bearer_token{s,_and_policy}`,
`new_with_workload`, `open*`) build a single-mode AppState
internally and remain source-compatible. Tests that constructed
AppState via these constructors continue to work.
- `with_policy_engine` post-construction setter — rebuilds the
single-mode handle with the policy attached. Engine-layer
enforcement is NOT reinstalled (matches the old single-field
semantics; `open_with_bearer_tokens_and_policy` is the path that
installs both layers).
- `new_multi` constructor added for PR 5's startup loop.
- `uri()` now returns `Option<&str>` (Some in single, None in multi).
Routing middleware:
- `resolve_graph_handle` injects `Arc<GraphHandle>` as a request
extension. Mode-aware: single returns the only handle; multi parses
`/graphs/{graph_id}/...` from the URI. Returns 404 in multi mode
when the graph id is unregistered. Records `graph_id` on the
current tracing span.
- `require_bearer_auth` updated to insert `ResolvedActor` (was
`AuthenticatedActor`).
Handler refactor — every protected handler:
- Gains `Extension(handle): Extension<Arc<GraphHandle>>` param.
- Replaces `state.engine` → `handle.engine`.
- Replaces `state.policy_engine()` → `handle.policy.as_deref()`.
- Replaces `state.uri()` → `handle.uri.as_str()` (or `.clone()`
where String is needed).
- Replaces `Arc::clone(&state.engine)` → `Arc::clone(&handle.engine)`
(the spawn-and-clone pattern in `server_export` — proof that a
long-running export survives the registry being mutated later).
authorize_request signature:
- Was: `(state: &AppState, actor: Option<&AuthenticatedActor>, request: PolicyRequest)`.
- Now: `(actor: Option<&ResolvedActor>, policy: Option<&PolicyEngine>, request: PolicyRequest)`.
- Per-graph callers pass `handle.policy.as_deref()`. The (future PR 6b)
management endpoints will pass `state.server_policy.as_deref()`.
MR-731 invariant preserved:
- The single chokepoint `request.actor_id = actor.actor_id.as_ref().to_string()`
inside `authorize_request` still overwrites any client-supplied
actor identity. Regression test
`actor_id_resolves_from_bearer_token_ignoring_client_supplied_headers`
at `tests/server.rs:1114-1216` passes unchanged.
Tests: 0 new (the registry race tests in PR 3 already cover the
data structure; this PR exercises them indirectly via the existing
test suite). 74 lib + 57 server integration + 60 openapi = 191 tests
green. Clippy clean.
LOC: +397 insertions, -153 deletions in `crates/omnigraph-server/src/lib.rs`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: OpenAPI multi-mode cluster filter (PR 4b/10)
PR 4b of the MR-668 multi-graph server work. In multi mode, the served
`/openapi.json` reports cluster routes (`/graphs/{graph_id}/...`) instead
of the legacy flat protected paths — matching what `build_app` actually
mounts (PR 4a's `Router::nest`). Single mode is unchanged.
Implementation:
- New `server_openapi` branch: when `state.mode()` is `Multi`, call
`nest_paths_under_cluster_prefix(&mut doc)` after `ApiDoc::openapi()`.
- The rewrite consumes `doc.paths.paths`, then for every path-item:
- If the path is in `ALWAYS_FLAT_PATHS` (`/healthz` for now), keep
it flat.
- Otherwise, prefix every operation_id with `cluster_` and reinsert
the item at `/graphs/{graph_id}<original_path>`.
- Single mode hits no extra work — the path map is untouched.
- The static `ApiDoc::openapi()` still emits the flat surface, so
in-process callers (the existing `openapi_json()` helper in tests)
see the unmodified spec.
Why cluster_ prefix on operation IDs: OpenAPI specs require unique
operation_ids across the document. With both flat (single-mode) and
cluster (multi-mode) surfaces ever co-existing in a generated SDK,
the prefix prevents collision. The current served doc only carries
one surface, so the prefix is forward-compat with potential future
dual-surface generation.
Tests: 6 new in `tests/openapi.rs`, all via the `/openapi.json` route
(not the static `ApiDoc::openapi()` helper):
- `multi_mode_openapi_lists_cluster_paths` — every protected path
appears as a cluster variant.
- `multi_mode_openapi_drops_flat_protected_paths` — flat protected
paths are absent.
- `multi_mode_openapi_keeps_healthz_flat` — `/healthz` survives.
- `multi_mode_openapi_prefixes_operation_ids_with_cluster` — every
cluster operation_id starts with `cluster_`.
- `multi_mode_operation_ids_are_unique` — no operation_id collisions.
- `single_mode_openapi_unchanged_by_cluster_filter` — single mode
still emits the legacy flat surface (regression).
New test helper `app_for_multi_mode(graph_ids)` exercises the new
`AppState::new_multi` constructor from PR 4a — first user of multi-mode
construction outside of unit tests.
Result: 66 openapi tests + 57 server integration tests + 74 lib tests
= 197 green. No regression in the existing OpenAPI drift check
(`openapi_spec_is_up_to_date` still validates the static flat surface
matches the committed openapi.json).
LOC: +67 in lib.rs (rewrite logic), +219 in tests/openapi.rs (test
suite + helper).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: multi-graph startup + mode inference (PR 5/10)
PR 5 of the MR-668 multi-graph server work. This is the first PR that
makes multi mode actually usable end-to-end: operators invoking
`omnigraph-server --config omnigraph.yaml` with a non-empty `graphs:`
map and no single-mode selector now get a running multi-graph server.
Mode inference (MR-668 decision 2, four-rule matrix in
`load_server_settings`):
1. CLI `<URI>` positional → Single
2. CLI `--target <name>` → Single (URI from graphs.<name>)
3. `server.graph` in config → Single (URI from graphs.<name>)
4. `--config` + non-empty `graphs:` + no single-mode selector
→ Multi (all entries in `graphs:`)
5. otherwise → error with migration hint
Rule 5's error message names every escape hatch so operators can fix
their invocation without grepping docs.
Config schema extensions:
- `TargetConfig.policy: PolicySettings` (per-graph Cedar policy file).
`#[serde(default)]` so existing single-graph YAMLs keep parsing.
- `ServerDefaults.policy: PolicySettings` (server-level Cedar policy
for management endpoints — loaded in PR 5, wired into `GET /graphs`
in PR 6b).
- `OmnigraphConfig::resolve_target_policy_file(name)` and
`resolve_server_policy_file()` helpers — both resolve relative to
the config file's `base_dir`.
Public types added to `omnigraph-server`:
- `ServerConfigMode { Single { uri, policy_file } | Multi { graphs,
config_path, server_policy_file } }`.
- `GraphStartupConfig { graph_id, uri, policy_file }` — one entry
per graph in multi mode.
`ServerConfig` shape change:
- WAS: `{ uri: String, bind, policy_file, allow_unauthenticated }`.
- NOW: `{ mode: ServerConfigMode, bind, allow_unauthenticated }`.
- Breaking for any code that constructs `ServerConfig` directly.
`main.rs` is unaffected (uses `load_server_settings`).
`serve()` now forks on `ServerConfig.mode`:
- Single: existing flow via `AppState::open_with_bearer_tokens_and_policy`.
- Multi: parallel open via `futures::stream::iter(graphs)
.map(open_single_graph).buffer_unordered(4).collect()`. Bound 4 is
a rule-of-thumb for I/O-bound work — at N≤10 this trades startup
latency for a small amount of concurrent S3/Lance open pressure.
Fail-fast: first open error aborts startup; in-flight opens drop
their engine via Arc (Lance datasets close cleanly).
New helper `open_single_graph(GraphStartupConfig)`:
- Validates `GraphId` per the regex in PR 1.
- `Omnigraph::open(uri).await` with descriptive error context.
- Loads per-graph policy file and re-applies it via
`Omnigraph::with_policy` (engine-layer enforcement, MR-722).
- Returns `Arc<GraphHandle>` ready for the registry.
Routing middleware bug fix:
- `Router::nest("/graphs/{graph_id}", inner)` rewrites
`request.uri().path()` to the inner suffix (e.g. `/snapshot`).
The previous middleware tried to parse `{graph_id}` from
`request.uri().path()` and got 400 instead of 200. Fixed by reading
from `axum::extract::OriginalUri` request extension, which preserves
the pre-rewrite URI.
- Caught by the two new tests
`cluster_routes_dispatch_per_graph_handle` and
`cluster_route_for_unknown_graph_returns_404`.
Tests (14 new, all passing):
- Four-rule matrix: one test per branch + the joint case
`mode_inference_cli_uri_overrides_graphs_map` + the empty-graphs-map
error case.
- Per-graph + server-level policy file path resolution.
- Reserved `GraphId` rejection at startup.
- End-to-end multi-graph routing: two graphs side by side, each
cluster route hits the right engine.
- Unknown graph id under cluster prefix → 404.
- Flat routes 404 in multi mode.
Inline `ServerConfig` test (`serve_refuses_to_start_in_state_1_without_unauthenticated`)
and three `server_settings_*` tests updated to the new `mode` shape.
Result: 211 server tests green (74 lib + 71 integration + 66 openapi),
MR-731 regression test still pinned and passing.
LOC: +45 config.rs, +281 lib.rs (net), +395 tests/server.rs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: Cedar resource-model refactor (PR 6a/10)
PR 6a of the MR-668 multi-graph server work. Policy-crate-only refactor —
no HTTP handler changes, no operator-supplied policy.yaml changes. Sets
up the chassis that PR 6b's `GET /graphs` consumes.
Two new `PolicyAction` variants:
- `GraphCreate` — gates `POST /graphs` (deferred behavioral PR).
- `GraphList` — gates `GET /graphs` (lands in PR 6b).
Note: `GraphDelete` is intentionally NOT added in this PR. `DELETE
/graphs/{id}` is deferred from MR-668's v0.7.0 scope to bound complexity
(no `delete_prefix`, no tombstone, no `RegistryLookup::Tombstoned`).
Adding the Cedar action without a consumer would be the same kind of
"dead vocabulary" trap the `Admin` variant already documents.
New `PolicyResourceKind { Graph, Server }` enum, plus a
`PolicyAction::resource_kind()` method that classifies every action.
Per-graph actions (Read, Change, BranchCreate, …) bind to
`Omnigraph::Graph::"<graph_label>"`; server-scoped actions
(GraphCreate, GraphList) bind to the singleton
`Omnigraph::Server::"root"`. `Admin` stays classified as per-graph for
now — MR-724 will pick the final shape when the first consumer surface
ships.
Cedar schema string additions:
- `entity Server;`
- `action "graph_create" appliesTo { principal: Actor, resource: Server, ... }`
- `action "graph_list" appliesTo { principal: Actor, resource: Server, ... }`
Compiler updates:
- `compile_policy_source` picks the resource literal based on the
action's `resource_kind`. Existing graph-only policies generate
the same Cedar source as before — pinned by
`per_graph_rules_continue_to_work_alongside_server_rules`.
- `compile_entities` includes the `Server::"root"` entity only when
a rule references a server-scoped action. Keeps test assertions
for graph-only policies tight.
- `PolicyEngine::authorize` builds the right resource UID at
request time based on `request.action.resource_kind()`.
Validation rules added to `PolicyConfig::validate`:
- A rule may not mix server-scoped and per-graph actions (different
resource kinds need different `permit` clauses).
- Server-scoped actions cannot have `branch_scope` or
`target_branch_scope` — there's no branch context at the server
level.
Operator impact: zero. The Cedar schema `Omnigraph::Server` entity is
internally referenced by `compile_policy_source`; operator policy.yaml
files only declare actions in `rules[].allow.actions` and never
reference the resource entity directly. Decision 6's "internal rename
only; operator policies unaffected" contract is preserved and pinned
by `per_graph_rules_continue_to_work_alongside_server_rules`.
Tests: 5 new (11 policy tests total, up from 6):
- `graph_list_action_authorizes_against_server_resource`
- `graph_create_action_authorizes_against_server_resource`
- `server_scoped_rule_cannot_use_branch_scope`
- `rule_mixing_server_and_per_graph_actions_is_rejected`
- `per_graph_rules_continue_to_work_alongside_server_rules`
No regression: 145 server tests (74 lib + 71 integration) still green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: GET /graphs endpoint + per-graph policy wire-up (PR 6b/10)
PR 6b of the MR-668 multi-graph server work. First management endpoint —
`GET /graphs` lists every graph registered with the server, gated by the
server-level Cedar policy from PR 6a.
New API shapes (in `omnigraph-server::api`):
- `GraphInfo { graph_id, uri }` — one entry per registered graph.
- `GraphListResponse { graphs: Vec<GraphInfo> }` — sorted alphabetically
by `graph_id` for deterministic output.
Handler `server_graphs_list`:
- Mounted at `GET /graphs` in both modes.
- Single mode: returns 405 (resource exists in the API surface, just
not operational without a `graphs:` map). 405 chosen over 404 so
clients see "resource exists, wrong context" rather than "no such
resource".
- Multi mode: requires bearer auth (when configured); Cedar-gated by
`PolicyAction::GraphList` against `Omnigraph::Server::"root"`
(PR 6a's chassis). Returns the sorted registry list.
Cedar gate composition:
- When no `server.policy.file` is configured, the MR-723 default-deny
falls through: `GraphList` is not `Read`, so an authenticated actor
without a server policy gets 403. This is the right default — don't
expose the registry until the operator explicitly authorizes it.
- When a server policy is configured, Cedar evaluates the rule. The
test `get_graphs_with_server_policy_authorizes_per_cedar` pins the
admin-allow / viewer-deny split.
Routing:
- New `management` sub-router holding `/graphs` (auth-required, no
`resolve_graph_handle` middleware — operates on the registry, not
a single graph).
- Single mode merges flat protected routes + management.
- Multi mode merges nested `/graphs/{graph_id}/...` + management.
OpenAPI:
- `server_graphs_list` registered in `ApiDoc::paths(...)`.
- `EXPECTED_PATHS` in `tests/openapi.rs` gains `/graphs`.
- `openapi.json` regenerated (auto-tracked by
`openapi_spec_is_up_to_date` in CI).
Tests: 4 new in `tests/server.rs::multi_graph_startup`:
- `get_graphs_lists_registered_graphs_in_multi_mode`
- `get_graphs_returns_405_in_single_mode`
- `get_graphs_requires_bearer_auth_when_configured`
- `get_graphs_with_server_policy_authorizes_per_cedar`
What's NOT in this PR (deferred):
- Per-graph policy enforcement is wired through `handle.policy`
(PR 4a already did this); PR 6b doesn't add new per-graph
behavior beyond making sure the server policy lookup composes
cleanly alongside it.
- `POST /graphs` (PR 7) and `DELETE /graphs/{id}` (out of scope
for v0.7.0).
- CLI `omnigraph graphs list` (PR 8 will add).
Result: 215 server tests green (74 lib + 66 openapi + 75 integration),
11 policy tests green. MR-731 spoof regression preserved across all
this work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: POST /graphs runtime create endpoint (PR 7/10)
PR 7 of the MR-668 multi-graph server work. Operators can now add a
graph to a running multi-graph server without restarting:
curl -X POST http://server/graphs \
-H "Content-Type: application/json" \
-d '{
"graph_id": "beta",
"uri": "/data/beta.omni",
"schema": { "source": "node Person { name: String @key }\n" },
"policy": { "file": "./policies/beta.yaml" }
}'
DELETE remains deferred (out of v0.7.0 scope per the trimmed plan —
no `delete_prefix`, no tombstones).
Body shape (decision 7):
- Nested `schema: { source: "..." }` (mirrors the `policy: { file }`
pattern; leaves room for future fields without breakage).
- Optional nested `policy: { file: "..." }` for per-graph Cedar.
- 32 MiB body limit (reuses `INGEST_REQUEST_BODY_LIMIT_BYTES`).
- Asymmetric with `SchemaApplyRequest` which keeps flat
`schema_source: String` — documented in api.rs.
Atomic YAML rewrite + drift detection:
- New `config::rewrite_atomic(path, new_config, expected_hash)`:
flock → re-read + hash check → serialize → write `.tmp` → fsync
→ rename → fsync parent dir. Returns the new hash for the caller
to update its in-memory baseline.
- New `config::hash_config_file(path)` — SHA-256 of the on-disk
bytes, used at startup and after each rewrite.
- New `RewriteAtomicError { Drift | Io | Serialize }` enum.
- `AppState.config_hash: Option<Arc<Mutex<[u8;32]>>>` carries the
in-memory baseline. Updated after every successful rewrite so
subsequent POSTs don't false-trigger drift.
- The mutex is `std::sync::Mutex` (brief critical section, no .await
inside). The flock itself serializes file access process-wide
AND across multiple server instances (defense in depth).
- All sync I/O runs inside `tokio::task::spawn_blocking` — flock
is sync.
Handler ordering (the load-bearing sequence):
1. Mode check: 405 in single mode.
2. Cedar authorize: `GraphCreate` against `Omnigraph::Server::"root"`.
3. Validate body: `GraphId::try_from` (regex + reserved-name), empty
schema/uri checks, per-graph policy file parse.
4. Pre-check registry for duplicate graph_id / duplicate uri (409).
5. `Omnigraph::init` the new engine.
6. Atomic YAML rewrite (drift detection inside).
7. Publish in registry (atomic re-check via `GraphRegistry::insert`).
Failure modes (documented in handler rustdoc):
- Init fails → orphan storage at `req.uri` (PR 2a cleans up schema
files; Lance datasets remain orphans until `delete_prefix` lands).
- YAML rewrite fails (drift, IO) → orphan storage; YAML unchanged.
- Registry insert fails (race) → YAML has entry but registry doesn't;
next restart opens it cleanly.
New dependency: `fs2 = "0.4"` (workspace + omnigraph-server). POSIX-only
file locking. Linux/macOS deployment supported; Windows out of scope.
Tests (10 new in `tests/server.rs::multi_graph_startup`):
- `post_graphs_creates_a_new_graph_end_to_end` — happy path, includes
YAML inspection to confirm the rewrite landed.
- `post_graphs_baseline_hash_updates_between_rewrites` — two POSTs in
a row both succeed (drift baseline updates correctly).
- `post_graphs_duplicate_graph_id_returns_409`
- `post_graphs_duplicate_uri_returns_409`
- `post_graphs_invalid_graph_id_returns_400` (reserved name)
- `post_graphs_empty_schema_source_returns_400`
- `post_graphs_returns_405_in_single_mode`
- `post_graphs_yaml_drift_detection_returns_503` — operator hand-edits
omnigraph.yaml; server refuses to clobber.
- `hash_config_file_is_deterministic_and_detects_changes`
- `rewrite_atomic_refuses_when_hash_drifts`
OpenAPI: `server_graphs_create` registered in `ApiDoc::paths(...)`;
openapi.json regenerated.
Result: 225 server tests green (74 lib + 66 openapi + 85 integration),
all MR-731 regressions still pinned.
LOC: ~580 lib.rs net (handler + helpers), ~120 config.rs (rewrite
machinery), +71 api.rs (request/response shapes), +332 tests/server.rs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: CLI omnigraph graphs list/create (PR 8/10)
PR 8 of the MR-668 multi-graph server work. CLI parity for the
v0.7.0 management surface: operators can now manage graphs from
the command line against a running multi-graph server.
omnigraph graphs list --target dev --json
omnigraph graphs create \
--target dev \
--graph-id beta \
--graph-uri /data/beta.omni \
--schema schema.pg
DELETE is intentionally absent — server-side DELETE was deferred from
v0.7.0 scope, and shipping a client subcommand for a server endpoint
that doesn't exist would be dead vocabulary. The help output, the
subcommand enum, and the test that pins it (`graphs_subcommand_help_
lists_list_and_create`) all agree.
CLI architecture (modeled on `BranchCommand`):
- New `Command::Graphs { command: GraphsCommand }` top-level variant.
- `GraphsCommand { List, Create }` enum.
- List: GET `<base>/graphs`. Stdout is `<graph_id>\t<uri>` per line,
or JSON via `--json`.
- Create: reads `--schema <path>` from local disk, inlines as
`schema: { source: <file> }` in the POST body (nested per
MR-668 decision 7). Optional `--policy-file <path>` becomes
`policy: { file: <path> }`. Returns 201 → "created graph X at Y"
or JSON via `--json`.
- Both subcommands reject local URI targets with a clear
"remote multi-graph server URL" error.
New API type imports in the CLI: `GraphCreateRequest`,
`GraphCreateResponse`, `GraphListResponse`, `GraphSchemaSpec`,
`GraphPolicySpec` — all from `omnigraph-server::api`.
Tests:
- cli.rs (4 new, non-network):
* `graphs_subcommand_help_lists_list_and_create` — pins the
deferral of `delete` (catches scope creep).
* `graphs_list_against_local_uri_errors_with_remote_only_message`
* `graphs_create_against_local_uri_errors_with_remote_only_message`
* `graphs_create_with_missing_schema_file_errors` — pins the
IO context in the schema-read error path.
- system_remote.rs (1 new, `#[ignore]` like its peers):
* `graphs_list_and_create_against_multi_graph_server` — spawns a
multi-mode server, calls `graphs list` (sees `alpha`),
`graphs create` (adds `beta`), `graphs list` again (sees both),
and confirms the new graph is reachable via its cluster route.
CLI suite: 62 tests green (58 existing + 4 new). The new ignored
end-to-end test runs locally with `cargo test --ignored`.
LOC: +159 main.rs (enum + handlers), +88 cli.rs (unit tests),
+131 system_remote.rs (integration test).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: composite e2e tests, race fix, v0.7.0 release (PR 9/10)
PR 9 — the final integration PR for MR-668 multi-graph server work.
Closes the v0.7.0 release.
Composite lifecycle tests (closes gaps flagged in PR 7's coverage
review):
- `multi_graph_lifecycle_post_query_restart_persistence` — POST a
graph, query it via cluster route, reload the config from disk
and confirm `load_server_settings` sees the rewritten YAML.
Validates the "restart resolves orphans" failure-mode story.
- `per_graph_policy_enforced_on_post_created_graph` — POST a graph
with a per-graph policy attached, then send authenticated read
and change requests. Per-graph Cedar enforcement fires correctly
on a POST-created graph (engine-layer policy reinstalled via
`Omnigraph::with_policy` inside the create flow).
- `concurrent_post_graphs_distinct_ids_all_succeed` — 4 concurrent
POSTs with distinct graph_ids all return 201. Caught a real
race in `rewrite_atomic` (see below).
Race fix — `rewrite_atomic_with_modify`:
The first composite test surfaced a real bug. The old
`rewrite_atomic(path, new_config, expected_hash)` captured the
baseline hash OUTSIDE the flock, then called rewrite_atomic which
re-acquired it inside. Under concurrent writers:
- POST A: captures baseline H0, calls rewrite_atomic.
- POST B: captures baseline H0 too (before A's update lands).
- A: acquires flock, on-disk == H0, writes H1, releases.
- A: updates baseline H0 → H1.
- B: tries to acquire flock — waits.
- B: acquires flock. On-disk is now H1. Expected (captured
before A finished) is H0. MISMATCH → spurious Drift error.
Worse: even if the timing happens to align, B's `updated` config
was constructed from BYTES read before the flock. B writes a config
that doesn't include A's new graph — silent data loss.
The fix: new `config::rewrite_atomic_with_modify(path, baseline,
modify)` takes a closure. Inside the flock + baseline mutex:
1. Read on-disk bytes, hash, compare to baseline.
2. Parse on-disk YAML.
3. Call `modify(parsed)` to produce the new config — receives
fresh on-disk state, returns the modification.
4. Serialize + write + fsync + rename + update baseline.
Everything is read-modify-write under the same critical section.
Concurrent writers serialize cleanly. Test confirmed this is no
longer a race.
The old `rewrite_atomic(path, new_config, expected_hash)` API stays
for tests that don't need the read-modify-write shape; the POST
handler switches to the new shape.
Version bump v0.6.0 → v0.7.0:
- All 5 `crates/*/Cargo.toml` (compiler, engine, policy, cli, server)
plus their inter-crate `path` dep version constraints.
- `Cargo.lock` regenerated by `cargo build --workspace`.
- `AGENTS.md` "Version surveyed" line, capability matrix HTTP-server
row updated to mention multi-graph + cluster routes + atomic YAML
rewrite.
- `openapi.json` regenerated.
Docs:
- `docs/releases/v0.7.0.md` (new) — release notes with breaking
changes, new features, deferred items (DELETE, `delete_prefix`,
actor forwarding), and the single→multi migration recipe.
- `docs/user/server.md` — substantial section additions for the
two modes, mode inference, cluster endpoint table, management
endpoints, `omnigraph.yaml` ownership contract, `POST /graphs`
body shape + status codes.
- `docs/user/cli.md` — `omnigraph graphs list/create` section,
deferred-DELETE note.
- `docs/user/policy.md` — server-scoped Cedar actions
(`graph_create`, `graph_list`), per-graph vs server-level policy
composition, example server-level policy.
Workspace test pass: 573 tests green across all crates. Zero
failures. MR-731 spoof regression still pinned and passing across
the entire 10-PR series.
This commit closes MR-668. v0.7.0 is ready for tagging.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: remove POST /graphs and CLI graphs create (defer runtime graph mgmt)
The POST /graphs runtime-create endpoint shipped in PR 7/10 has three
unresolved high-severity bugs:
- flock-on-renamed-inode race: the YAML flock is taken on
omnigraph.yaml itself, then a temp file is renamed over it.
Cross-process writers end up locking different inodes — both
believing they hold exclusive access.
- duplicate-check outside the file lock: precheck runs against
the in-memory registry only; the locked closure does
config.graphs.insert(...) unconditionally. Concurrent same-id
POSTs can persist the loser in YAML while the in-memory registry
keeps the winner — they disagree after restart.
- best_effort_cleanup_init_artifacts deletes _schema.pg /
_schema.ir.json / __schema_state.json on any init failure. An
accidental re-init against an existing graph's URI destroys its
schema; subsequent open() fails at read_text(_schema.pg).
The correct fix is a Lance-style cluster catalog (reserve → init →
publish with recovery sidecars), parallel to the engine's existing
__manifest discipline. That work is out of scope for v0.7.0.
For now, disable runtime add/remove from the network and CLI surface.
Operators add graphs by editing omnigraph.yaml and restarting. The
GET /graphs read-only enumeration stays.
Removed:
- POST /graphs handler + router fragment + utoipa registration
- 13 post_graphs_* server tests + 3 composite POST tests +
multi_mode_app_with_real_config / post_graph helpers
- CLI omnigraph graphs create subcommand + its handler + cli.rs tests
- system_remote.rs combined list+create test trimmed to list-only
- YAML rewrite infra: rewrite_atomic[_with_modify], RewriteAtomicError,
staging_path, hash_config_file, AppState::config_hash field +
threading through new_multi and open_multi_graph_state
- fs2 dependency (verified absent from cargo tree)
- sha2/fs2 imports in config.rs (only the rewrite path used them)
- Cedar PolicyAction::GraphCreate variant + "graph_create" match arms
+ action def in Cedar schema + graph_create_action_authorizes_against_server_resource test
- GraphCreateRequest / GraphCreateResponse / GraphSchemaSpec /
GraphPolicySpec API types (only the POST handler / CLI imported them)
Kept:
- GET /graphs (read-only enumeration) and graph_list Cedar action
- omnigraph graphs list CLI subcommand
- All multi-graph startup, mode inference, cluster routes,
per-graph + server-level Cedar policies
- server_settings_drive_multi_graph_startup_end_to_end (the test
that covers operator-authored YAML + restart — the path that
survives)
- best_effort_cleanup_init_artifacts and the three init failpoints
(still reachable from CLI `omnigraph init`; preflight fix deferred
as a follow-up)
- GraphRegistry::insert and its concurrency tests — production
callers gone, but the method is the natural seam for the future
cluster-catalog work
Also fixed (transcript issue 4):
- ALWAYS_FLAT_PATHS now includes /graphs so multi-mode OpenAPI
advertises the management route correctly (was previously rewritten
to /graphs/{graph_id}/graphs)
- multi_mode_openapi_keeps_healthz_flat → renamed to
multi_mode_openapi_keeps_management_paths_flat, asserts both
/healthz and /graphs stay flat
- multi_mode_openapi_prefixes_operation_ids_with_cluster skips
/graphs in addition to /healthz
Doc fixes:
- docs/user/cli.md: graphs list example was --target http://...,
but --target is a config-graph-name lookup; corrected to --uri.
Removed the graphs create example.
- docs/user/server.md: dropped POST /graphs row, "omnigraph.yaml
ownership", and "POST /graphs body shape" sections. Added a
paragraph stating runtime add/remove is not exposed in v0.7.0.
- docs/user/policy.md: dropped graph_create action; reworded the
"Configuration" line to clarify that server-scoped rules (graph_list)
take neither branch_scope nor target_branch_scope.
- docs/releases/v0.7.0.md: rewrote release narrative — multi-graph
mode ships; runtime add/remove deferred.
- AGENTS.md: HTTP server bullet and capability matrix row updated to
reflect read-only GET /graphs and the operator-edit workflow.
- openapi.json regenerated; /graphs has only .get, no .post.
Diff: 17 files, +123 −1525 LOC.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: comment cleanup and policy format style
Strip "PR Na/Nb" sub-PR references throughout MR-668 surfaces — they
were useful during the 10-PR delivery sequence but rot now that the
work is in the tree. Keep the MR-668 umbrella references.
Also:
- Add explicit `when = when` and `resource_literal = resource_literal`
named args in `compile_policy_source`'s outer `format!` to match the
surrounding crate style (already explicit for `group` and `action`).
- Rename the best-effort cleanup tracing target from
"omnigraph::init" to "omnigraph::init::cleanup" so operators can
filter init-failure cleanup events separately from init's other
log lines.
No behavior change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: drop actor_id from PolicyRequest; pass actor as separate arg
The MR-731 "server-authoritative actor identity" invariant was enforced
by an in-function chokepoint (`request.actor_id = actor.actor_id...`
overwrite inside `authorize_request`). That worked but relied on every
caller passing in a `PolicyRequest` and trusting the overwrite — a
comment-enforced invariant.
Move the invariant into the type system:
* `PolicyRequest` no longer carries `actor_id`. The struct now models
what a caller wants to do, not who they are.
* `PolicyEngine::authorize(actor_id: &str, request: &PolicyRequest)`
and `validate_request(actor_id, request)` take identity as a
separate argument. The same shape `PolicyChecker::check` already had
for the engine layer.
* `authorize_request` in the HTTP layer extracts `actor_id` from the
bearer-resolved `ResolvedActor` and passes it positionally — no
overwrite step that could be skipped.
* CLI `omnigraph policy explain` updated (the only other consumer
that built a `PolicyRequest`).
Public API break for the `omnigraph-policy` crate. Worth it: handlers
can no longer accidentally populate `actor_id` from a request body
field, and external consumers are forced by the compiler to source
actor identity from a trusted path.
The MR-731 chokepoint test
`actor_id_resolves_from_bearer_token_ignoring_client_supplied_headers`
still passes — the bearer-resolved actor is what reaches the engine.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: consolidate AppState single-mode constructors; delete with_policy_engine
The prior `with_policy_engine` constructor reused the engine `Arc`
from the existing handle (`engine: Arc::clone(&existing.engine)`)
without re-applying `Omnigraph::with_policy`. Combined with
`new_with_workload`, the documented composition pattern was
`AppState::new_with_workload(...).with_policy_engine(p)` — which
produced an `AppState` whose HTTP layer enforced Cedar but whose
underlying engine had no `PolicyChecker` installed. Any caller
reaching the engine via `state.registry().list()[i].engine` could
bypass policy entirely. The doc comment named this gap; the type
system didn't.
Make composition impossible to get wrong:
* Add `AppState::new_single(uri, db, tokens, Option<PolicyEngine>,
WorkloadController)` — canonical single-mode constructor that
takes every option together and routes through `build_single_mode`
(which applies `db.with_policy(checker)` to the engine itself).
* `new`, `new_with_bearer_token`, `new_with_bearer_tokens`,
`new_with_bearer_tokens_and_policy`, `new_with_workload` all become
thin wrappers around `new_single`.
* Delete `with_policy_engine`. There is no post-construction policy
install path any more; the single linear construction forces
HTTP-layer and engine-layer policy to install together or not at all.
Regression test `engine_layer_policy_fires_via_direct_arc_omnigraph_from_new_single`
constructs an `AppState::new_single` with a deny-all policy, pulls
the `Arc<Omnigraph>` from the registry handle (the same path an
embedded SDK consumer would take), and asserts a direct `mutate_as`
call returns `OmniError::Policy`. Pre-fix this test would have
succeeded the mutation.
Test caller in `ingest_per_actor_admission_cap_returns_429`
migrates from `.with_policy_engine(...)` to `new_single(...,
Some(policy_engine), workload)`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: derive any_per_graph_policy on RegistrySnapshot; simplify dup check
`AppState::requires_bearer_auth` walked the entire registry per
request (cloning Arcs into a `Vec`, then `.iter().any(|h| h.policy
.is_some())`) to decide whether the auth middleware should challenge.
The walk is unnecessary — the answer only changes when the registry
mutates, which is exactly the moment a new snapshot is constructed.
Move the flag onto the snapshot itself:
* `RegistrySnapshot { graphs, any_per_graph_policy: bool }`.
* `RegistrySnapshot::new(graphs)` is the only construction path —
it derives the flag from `graphs.values().any(|h| h.policy
.is_some())` so the cached value can't drift from the source data.
* `Default` delegates to `new(HashMap::new())`.
* `GraphRegistry::from_handles` and `insert` build snapshots via
`RegistrySnapshot::new(...)`.
* `GraphRegistry::snapshot_ref()` exposes the current snapshot
through an `arc_swap::Guard`; callers that need cached derived
state go through this accessor (callers that only want `graphs`
still use `list` / `get`).
`requires_bearer_auth` becomes one `ArcSwap::load` + bool read.
Also (drive-by, same file, same hunk): replace the dead
`if let Some(other) = seen_uris.get(...)` + `let _ = other;` pattern
in `from_handles` with a plain `seen_uris.contains_key(...)`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: fail-fast multi-graph startup with try_collect
The `open_multi_graph_state` doc comment claims "Fail-fast — the
first open error aborts startup; other in-flight opens are dropped"
but the code did
.buffer_unordered(4)
.collect::<Vec<_>>()
.await
.into_iter()
.collect::<Result<Vec<_>>>()?;
which drains every future in the stream before propagating the first
`Err`. With N S3-backed graphs and graph #2 failing fast, the caller
still waits for #1, #3, #4, … to either succeed or fail before
seeing the error.
Replace the four-line dance with `futures::TryStreamExt::try_collect`,
which short-circuits on the first `Err` and drops the rest. The
doc comment now matches behavior.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: drop unused State extractor from 7 read-only handlers
After the routing-middleware refactor moved the engine into the per-graph
`GraphHandle` (extracted via `Extension<Arc<GraphHandle>>`), seven
read-only handlers — `server_snapshot`, `server_read`, `server_export`,
`server_schema_get`, `server_branch_list`, `server_commit_list`,
`server_commit_show` — kept an unused `State(_state): State<AppState>`
extractor. Drop it. Each request avoids one `FromRequestParts` clone
of `AppState`'s Arcs.
Handlers that actually use state (workload admission for write paths,
`server_policy` for management endpoints) keep theirs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: emit info! for graph routing decision
`tracing::Span::current().record("graph_id", ...)` in the routing
middleware silently no-ops here: no upstream `#[tracing::instrument]`
on the handlers declares a `graph_id` field, and `TraceLayer::new_for_http`
doesn't either. The recorded value never lands anywhere visible.
Replace with an explicit `info!(graph_id = %handle.key.graph_id,
"graph routed")` event so operators can grep logs and correlate
requests with the active graph. In single mode the value is the
sentinel `"default"`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: align GET /graphs 405 body code with HTTP status
The single-mode `GET /graphs` handler returned an `ApiError` built
via struct literal with `status: METHOD_NOT_ALLOWED, code: BadRequest`.
The body code disagreed with the HTTP status — clients deserializing
on `code` saw `bad_request`, clients deserializing on `status` saw
405. Same bug class as the earlier 503+Conflict mismatch on the
removed YAML drift path.
Close the class for this one remaining instance:
* Add `ErrorCode::MethodNotAllowed` to the API enum.
* Add `ApiError::method_not_allowed(msg)` — pairs the 405 status
with the matching code.
* Replace the struct literal in `server_graphs_list` with the
constructor.
* Regenerate `openapi.json` (adds `method_not_allowed` to the
ErrorCode schema enum).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: drop unused axum::handler::Handler import
The import landed in earlier work but no current call site uses it.
Emitted an `unused_imports` warning on every server build.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: drop unused fs2 workspace dependency
`fs2 = "0.4"` lingered in [workspace.dependencies] after the
POST /graphs flock-on-rename design was pulled. `cargo tree -i fs2`
reports no consumers in the workspace and the dep is not in
Cargo.lock. Removing the declaration closes the "phantom dep" class.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: AGENTS.md Cedar row no longer hardcodes action count
The "8 actions" claim drifted as soon as MR-668 added `graph_list`.
Bumping the count would just push the drift one PR forward; the
correct-by-design fix is to defer to the canonical list in
docs/user/policy.md and stop maintaining a duplicate count.
Closes the "doc hardcodes a count that drifts from the enum" class.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: cfg(test)-gate GraphRegistry::insert and its mutex
`insert` and the `mutate: Mutex<()>` that serializes it had no
runtime consumer in v0.7.0 — the only insertion path at startup
is `from_handles`, and runtime add/remove is deferred until a
managed cluster catalog ships. Leaving both `pub` and live made
them a "looks like API, isn't" footgun: a future change could
build on `insert` without re-establishing the concurrency contract
with an actual consumer in scope.
Gate both together (`#[cfg(test)]` on the method, the field, and
the `tokio::sync::Mutex` import) so the race-pinning tests still
compile but production cannot reach them. When a real consumer
ships, ungate both — they're a unit. Closes the "public API with
no runtime consumer drifts toward incorrect" class.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: drop vestigial PolicyEngine surface
* `validate_request` had zero callsites — pure surface for nothing.
* `deny`'s `_actor_id` and `_request` parameters were both unused
(the underscore prefix gave it away); the message is built by the
caller before `deny` ever sees the request. Trim both.
Closes the "public API that the type system can't justify" class
for the policy engine. No behavior change; every existing test
stays green because the deletions never had a runtime effect.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: regression test for init re-init footgun (red)
A second `Omnigraph::init` against an existing graph URI today
destroys the existing graph's schema artifacts. `init_storage_phase`
overwrites `_schema.pg` before any preflight, and on the inner
`GraphCoordinator::init` failure that follows,
`best_effort_cleanup_init_artifacts` deletes all three schema files.
The existing Lance datasets and `__manifest/` survive but the
schema metadata is gone — unrecoverable without operator surgery.
This test exercises that path and currently fails with
"_schema.pg must not be deleted by a failed re-init", confirming
the destructive cleanup branch fires. The fix in the next commit
makes the test pass by preflighting with `storage.exists()` and
returning a typed error before any write touches disk.
Per AGENTS.md rule 12, the test commit lands just before the fix
commit so the red → green pair is visible in `git log` and a
reviewer can check out this commit alone to reproduce.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: close init re-init footgun via InitOptions preflight (green)
`Omnigraph::init` is "create a new graph"; existing graphs need
an explicit overwrite. Today's behavior — silently overwrite
schema files, then on inner failure delete them via best-effort
cleanup — is destructive against an existing graph regardless of
which branch fires.
Correct-by-design fix:
* New `InitOptions { force: bool }` struct (default `force: false`).
* New `Omnigraph::init_with_options(uri, schema, options)`. The
old `Omnigraph::init(uri, schema)` is a thin shortcut that
passes `InitOptions::default()`.
* `init_with_storage` runs a `storage.exists()` preflight on the
three schema URIs BEFORE any parse, write, or coordinator call.
Any hit → typed `OmniError::AlreadyInitialized { uri }`. The
destructive code paths (the `write_text` overwrite and the
best-effort cleanup) are now unreachable in strict mode against
an existing graph.
* `force: true` skips the preflight; existing operators who
actually mean to overwrite opt in explicitly.
* CLI: `omnigraph init --force` maps to `InitOptions { force: true }`.
* HTTP: `OmniError::AlreadyInitialized` maps to 409 via
`ApiError::from_omni`. Not currently HTTP-reachable (POST /graphs
was pulled), but the wiring lands here so a future runtime
create endpoint has one canonical translation.
Closes the "init is destructive against existing state" class.
The regression test added in the previous commit
(`init_on_existing_graph_uri_does_not_destroy_existing_schema`)
turns green: the original schema files now survive a second
init attempt byte-for-byte, and the call errors cleanly with
`AlreadyInitialized`. The four existing
`init_failpoint_after_*_cleans_up_*` tests stay green — strict
mode's preflight passes on a fresh tempdir, and cleanup still
runs as before when a failpoint fires mid-write.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: split PolicyEngine::load into kind-typed loaders
Pre-fix, every caller of `PolicyEngine::load(path, graph_id)`
passed *some* `graph_id` argument — even when the policy was
server-scoped and Cedar's resolution would never touch a Graph
entity. The server-level loader at lib.rs passed the meaningless
sentinel `"server"`. A graph policy file containing a `graph_list`
rule compiled fine; a server policy file containing a `read` rule
compiled fine. Both silently no-op'd at request time because the
engine kind and the rule's resource kind disagreed.
Correct-by-design fix: replace `load` with two kind-typed loaders.
* `PolicyEngine::load_graph(path, graph_id)` — for per-graph
policy files. Rejects any rule whose action `resource_kind()`
is `Server`.
* `PolicyEngine::load_server(path)` — for server-level policy
files. Takes no `graph_id`: server-scoped actions resolve against
the singleton `Omnigraph::Server::"root"` entity, never a Graph.
Rejects any rule whose action `resource_kind()` is `Graph`.
The old `load` is hard-deleted in the same commit because every
in-tree consumer migrates here (no semver promise on the workspace
crate, no external pinners). New `PolicyEngineKind` enum types
the loader's intent; `validate_kind_alignment` is the load-time
check that closes the "wrong action, wrong file, silent no-op"
class — operators get a load-time error instead of confused-and-
silent behavior at request time.
Callsites migrated:
* server lib.rs:374 (single-mode per-graph) → load_graph
* server lib.rs:1065 (multi-mode server) → load_server
* server lib.rs:1103 (multi-mode per-graph) → load_graph
* CLI main.rs:732 (resolve_policy_engine) → load_graph
* tests/server.rs ×5 (4 graph, 1 server) → load_graph/load_server
* policy_engine_chassis.rs → load_graph
Four new in-source tests pin the contract: both rejection paths
and both positive paths.
Closes the "operator puts an action in the wrong file and the
rule silently never matches" class.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: introduce GraphRouting, retire single_mode_handle
Pre-fix, `AppState` always carried `Arc<GraphRegistry>` even when
serving one graph. Single mode populated the registry with one
handle keyed by the `SINGLE_GRAPH_KEY_ID = "default"` sentinel;
`single_mode_handle` walked the registry, asserted `len == 1`,
and returned the single element with a 500-class "programmer
error" branch on mismatch. Three smells in a row — magic key,
walk-and-assert, programmer-error guard — all because the
single-mode runtime was forced through a multi-mode abstraction.
Correct-by-design fix: type the routing.
* New `pub enum GraphRouting { Single { handle }, Multi { registry,
config_path } }` on `AppState`. The `Single` arm carries the handle
directly — no registry, no key, no walk.
* `resolve_graph_handle` middleware matches on `routing`. Single mode
returns the handle in O(1); multi mode does the same path-extract +
registry lookup as before. The 500-class programmer-error branch
is gone — the type system now makes the violated invariant
("single mode has exactly one handle") unrepresentable.
* `requires_bearer_auth` reads `handle.policy.is_some()` directly
in the Single arm; Multi arm still uses the cached
`any_per_graph_policy` flag.
`ServerMode` and the legacy `registry` field on `AppState` are still
populated for now — C-3 removes both once every reader is migrated.
The `SINGLE_GRAPH_KEY_ID` sentinel and `ServerMode` will also go
away in C-3.
Closes the "single mode forced through a multi-mode abstraction"
class. All 76 server integration tests stay green: handlers still
extract `Extension<Arc<GraphHandle>>` from the request, so the
middleware's internal change is invisible to them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: remove ServerMode, registry field, and the SINGLE_GRAPH sentinel
C-1/C-2 introduced `GraphRouting` and pointed the middleware at it.
This commit removes the legacy shape that's now dead:
* `ServerMode` enum — deleted. Single mode's `uri` lives on
`handle.uri`; multi mode's `config_path` lives on the
`GraphRouting::Multi` arm.
* `AppState.mode: ServerMode` field — deleted.
* `AppState.registry: Arc<GraphRegistry>` field — deleted. Multi
mode's registry is on `GraphRouting::Multi { registry, .. }`;
single mode has no registry at all.
* `AppState::mode()`, `AppState::uri()`, `AppState::registry()`
accessors — deleted. New `AppState::routing() -> &GraphRouting`
is the single public entry point.
* `SINGLE_GRAPH_KEY_ID` constant — deleted. `GraphHandle.key` is
still required by the struct, but in single mode the key is now
only a tracing label (`"default"`, inlined with a comment naming
its sole remaining purpose). Single-mode flat routes never carry
a `{graph_id}` parameter, so the key is never compared against
user input, and there is no registry where it could be a map
key. C-1/C-2 already removed the registry walk that the sentinel
was named for.
Callers migrated:
* `build_app` (lib.rs:944) — matches on `state.routing()` instead
of `state.mode()`.
* `server_graphs_list` (lib.rs:1162) — destructures the Multi arm
to get the registry; Single arm short-circuits to 405.
* `server_openapi` (lib.rs:1217) — matches the Multi arm for the
cluster-prefix rewrite.
* `tests/server.rs:3735` — the B2 footgun regression test now
matches on `state.routing()` to extract the single-mode handle
(the test's earlier `state.registry().list().next()` shape was
the closest pre-fix analog to "embedded consumer reaches the
engine"; the new shape is more direct).
Closes the entire "single mode forced through a multi-mode
abstraction" class. After this commit:
* No magic sentinel as a routing key.
* No `single_mode_handle` walk-and-assert helper.
* No 500-class "programmer error" branch in the middleware.
* No two-field discriminant on `AppState` where one would do.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: regression test for nested-route path extraction (red)
`server_branch_delete` and `server_commit_show` use bare
`Path<String>` extractors. In single-mode flat routes
(`/branches/{branch}`, `/commits/{commit_id}`) this works — one
capture, one value. In multi-graph cluster routes
(`/graphs/{graph_id}/branches/{branch}`,
`/graphs/{graph_id}/commits/{commit_id}`) axum 0.8 propagates the
outer `{graph_id}` capture into the inner handler, so the
extractor sees two captures and 500s with
"Wrong number of path arguments. Expected 1 but got 2."
`cluster_routes_dispatch_per_graph_handle` only exercises
`/snapshot` (no Path extractor), so the regression slipped through.
This test closes that gap structurally: every cluster route with
an inner path param gets exercised here.
Currently fails with the exact symptom above. Fix in the next
commit makes it pass.
Per AGENTS.md rule 12, the red test commit lands just before the
fix so the pair is visible in `git log` and a reviewer can check
out this commit alone to reproduce.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: named-field path-param structs for nested cluster routes (green)
`Path<String>` deserializes one path-param value positionally.
Single-mode flat routes (`/branches/{branch}`,
`/commits/{commit_id}`) have one capture; multi-mode nested routes
(`/graphs/{graph_id}/branches/{branch}`,
`/graphs/{graph_id}/commits/{commit_id}`) have two — axum 0.8
propagates the outer capture into nested handlers. Same handler,
two different shapes; the multi-mode shape 500s with
"Wrong number of path arguments. Expected 1 but got 2."
Symptomatic fix: change to `Path<(String, String)>` and ignore the
first element. Breaks again the moment we add another nest layer
(e.g. tenant in Cloud mode).
Correct-by-design fix: named-field structs deserialized by name
from axum's path-param map. Each handler picks only the fields it
needs. Stable across single / multi / future-cloud nest depths
because deserialization is by field name, not position.
* New `BranchPath { branch: String }` (file-local to lib.rs)
* New `CommitPath { commit_id: String }`
* `server_branch_delete` extractor → `Path<BranchPath>`
* `server_commit_show` extractor → `Path<CommitPath>`
Closes the "handler path-extractor type is positional and breaks
when route nesting changes" class. Red test from the previous
commit turns green. All 77 server tests pass (single-mode branch
delete + commit show, plus new multi-mode coverage).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: centralize policy-requires-tokens check in the runtime classifier
Single-mode `open_with_bearer_tokens_and_policy` bailed at lib.rs:380
when policy was installed and no tokens. Multi-mode
`open_multi_graph_state` had no equivalent: the server started, every
request 401'd because no token could ever match, and the operator
spent time debugging a misconfiguration the single-mode path would
have caught at startup.
The doc/code contradiction made the gap easy to miss: the
`ServerRuntimeState::PolicyEnabled` docstring said tokens-or-not
was "unusual but valid — every request fails 401 without a bearer,
which is effectively 'locked'." The single-mode bail contradicted
that. In practice, silent-401-on-every-request is bug-shaped, not
feature-shaped (operators wanting deny-all should configure tokens
plus a deny-all Cedar rule to get meaningful 403s with
policy-decision logging).
Symptomatic fix: add a copy of the bail to multi-mode. Two copies
that can drift again the next time a startup path is added.
Correct-by-design fix: hoist the check into
`classify_server_runtime_state` so both modes get the same
enforcement from one source of truth. The classifier becomes the
single source of truth for "should we start?" and adding a startup
invariant there is now the natural extension point for any future
mode.
Classifier matrix is now complete:
| has_tokens | has_policy | allow_unauthenticated | Result |
|---|---|---|---|
| F | F | F | bail (existing) |
| F | F | T | Open (existing) |
| T | F | * | DefaultDeny (existing) |
| F | T | * | bail (NEW — closes the gap) |
| T | T | * | PolicyEnabled (existing) |
Changes:
* `classify_server_runtime_state` (lib.rs:870-890) gains the
`(false, true, _) => bail!(…)` arm with a clear message naming
the failure mode and the two valid resolutions.
* `open_with_bearer_tokens_and_policy` (lib.rs:369+) drops its
redundant local bail — the classifier rejected the invalid case
before construction was reached.
* `ServerRuntimeState::PolicyEnabled` docstring is rewritten:
drops the "(unusual but valid)" carve-out and states plainly
that PolicyEnabled requires tokens. Names the explicit
alternative (tokens + deny-all Cedar rule) for operators who
want the all-requests-denied behavior.
* `classify_policy_enabled_always_wins` test is renamed to
`classify_policy_enabled_requires_tokens` and the now-invalid
`(false, true, _)` assertion is removed (covered by the new
rejection test).
* New `classify_policy_without_tokens_is_rejected` test covers the
new arm.
* New `serve_refuses_to_start_with_policy_but_no_tokens_multi_mode`
integration test pins the multi-mode propagation path —
symmetric with the existing single-mode
`serve_refuses_to_start_in_state_1_without_unauthenticated`.
Closes the "single mode and multi mode startup branches can drift
on safety invariants" class.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: close coverage gaps surfaced by the test-coverage audit
The bot-review pass and the subsequent coverage audit surfaced two
material gaps in PR #119's test surface — both easy to close, both
worth closing before merge.
* **Gap 1 — cluster-route sweep.** The Bug-1 path-extractor
regression slipped through because
`cluster_routes_dispatch_per_graph_handle` only exercised
`/snapshot`. The other six protected cluster routes (`/read`,
`/change`, `/export`, `/schema`, `/schema/apply`, `/ingest`,
`/branches/merge`) were implicitly trusted to work without any
multi-mode integration test.
Add `all_protected_cluster_routes_resolve_to_their_handler`
(`tests/server.rs`) that hits each protected cluster route with
a minimal request and asserts the response is consistent with
the handler being reached — no 404 (router didn't match), no 500
with "Wrong number of path arguments" (Bug-1 class), no 500 with
"missing extension" (routing middleware didn't inject the handle).
Status code is a negative assertion because each handler's
happy-path inputs differ; what matters is "the request reached
the handler," not "the handler returned 200" — that's already
pinned by the single-mode tests.
* **Gap 2 — `--force` happy path.** The strict re-init regression
test (`init_on_existing_graph_uri_does_not_destroy_existing_schema`)
pins the error path; nothing pinned the `force: true` escape
hatch actually doing what its docstring claims.
Add `init_with_force_recovers_from_orphan_schema_files`
(`tests/lifecycle.rs`). Writes a bare `_schema.pg` to simulate
orphan files from a failed prior init, confirms strict mode
bails as expected, then confirms `init_with_options(force: true)`
succeeds and produces a functional graph.
Note: the test follows the documented semantics — force skips
the preflight only, it does NOT purge existing Lance state. An
earlier draft of the test (against full overwrite of an existing
populated graph) failed because `GraphCoordinator::init` errored
on the existing `__manifest`, which is exactly the limitation
the `InitOptions::force` docstring already calls out. Recursive
purge needs `StorageAdapter::delete_prefix` (tracked separately).
Coverage is now fully aligned with the PR's claims.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mr-668: regression test for GraphList open-mode bypass (red)
Cursor bot's review at commit
|
||
|
|
fd41f798b7
|
chore(codeowners): remove aaltshuler as owner | ||
|
|
972a6e047b
|
chore(codeowners): add ragnorc as engineering owner | ||
|
|
4de490c10c
|
Update README.md (#121)
Some checks failed
CI / Classify Changes (push) Has been cancelled
CI / Check AGENTS.md Links (push) Has been cancelled
CI / Test Workspace (push) Has been cancelled
CI / Test omnigraph-server --features aws (push) Has been cancelled
CI / RustFS S3 Integration (push) Has been cancelled
|
||
|
|
d6d92a32a0
|
Update README.md (#120) | ||
|
|
cc2412dc65
|
Rename repo terminology to graph (#118)
Some checks failed
CI / Classify Changes (push) Has been cancelled
CI / Check AGENTS.md Links (push) Has been cancelled
Release Edge / Prepare edge release (push) Has been cancelled
CI / Test Workspace (push) Has been cancelled
CI / Test omnigraph-server --features aws (push) Has been cancelled
CI / RustFS S3 Integration (push) Has been cancelled
Release Edge / Build edge omnigraph-linux-x86_64 (push) Has been cancelled
Release Edge / Build edge omnigraph-macos-arm64 (push) Has been cancelled
|
||
|
|
587fbeabd8
|
ci(publish-crates): set User-Agent + treat "already exists" as success (#117)
Two related fixes uncovered while recovering the v0.5.0 publish. 1. crates.io API requires a User-Agent header. The `publish_if_new` skip check was doing a bare `curl -fsSL https://crates.io/api/...` which crates.io rejects with HTTP 403. With `-f` curl exits non-zero, the pipeline returns empty, the script doesn't recognize already-published crates, and we fall through to a real publish attempt. On a re-run that means cargo publish errors with "already exists on crates.io index" for crates that DID publish successfully on the previous run. Fix: send a `User-Agent: ModernRelay-omnigraph-ci (URL)` header. 2. Defense in depth: even with the UA, the API could hiccup. If the skip check misses an existing version and cargo publish errors with "already exists on crates.io index", treat as success instead of failing the whole run. This makes the workflow re-runnable after any partial publish without needing manual intervention. Both fixes are required to recover from the v0.5.0 partial publish where omnigraph-compiler@0.5.0 made it through but the run failed before omnigraph-policy / engine / server / cli — re-triggering the workflow now succeeds end-to-end. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
1a9f8b1f7f
|
ci(publish-crates): include omnigraph-policy in the publish list (#116)
omnigraph-policy is a new crate this release cycle (Cedar policy
engine, MR-722). It wasn't added to the publish list when it was
created, so v0.5.0's tag-triggered publish run succeeded for
omnigraph-compiler but failed at omnigraph-engine:
failed to prepare local package for uploading
Caused by:
no matching package named `omnigraph-policy` found
location searched: crates.io index
required by package `omnigraph-engine v0.5.0`
omnigraph-policy has no internal omnigraph-* deps so it can publish
after omnigraph-compiler (either could go first). omnigraph-engine
depends on both; server on the three; cli on everything.
publish_if_new is idempotent — re-running with the v0.5.0 tag after
this lands will skip omnigraph-compiler (already published), then
publish policy + engine + server + cli.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
bb1fe57640
|
release: v0.5.0 (#115)
* gitignore: exclude docs/internal/ from publication
Mirrors the existing "Local-only working files (not for the public
repo)" pattern. Working notes filed under docs/internal/ stay on the
contributor's machine instead of cluttering the published doc tree
or tripping the AGENTS.md / docs-index cross-link check
(scripts/check-agents-md.sh enumerates every docs/*.md and requires
each one to be linked from an audience index — internal notes don't
have an audience index by definition).
Incidental to the v0.5.0 release; lands separately from the version
bump commits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci: skip docs/internal/ in agents-md cross-link check
Matches the .gitignore exclusion. Mirrors the existing 'docs/releases/'
exclusion pattern: notes under docs/internal/ aren't part of the
published doc tree and don't need to be linked from an audience index.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* release: v0.5.0 — Lance 6 substrate, Cedar policy engine, schema-lint v1
Bumps the workspace from 0.4.2 to 0.5.0. Release notes at
docs/releases/v0.5.0.md.
Three user-visible pillars motivate the minor bump:
1. Lance 6.0.1 substrate (DataFusion 52→53, Arrow 57→58)
2. Engine-wide Cedar policy enforcement on every _as writer; server
defaults to deny-all; signed-token-claim-only actor identity
3. Schema-lint v1 chassis: OG-XXX-NNN codes, soft drops, and
`--allow-data-loss` (Hard mode) for destructive migrations
Plus structured DataFusion Expr filter pushdown (unblocks
CompOp::Contains via array_has), HTTP allow_data_loss parity, inline
.gq sources on CLI/HTTP, optional CORS layer, and bug fixes
(merge-insert dup-rowid, branch-merge coordinator restore on error,
blob columns in branch merge).
Sites bumped:
- 5 crate [package].version lines (omnigraph, omnigraph-cli,
omnigraph-compiler, omnigraph-policy, omnigraph-server)
- 10 internal path-dep `version = "..."` constraints across the
four manifests that depend on sister crates (engine, server, cli,
plus engine's dev-dep on the compiler)
- Cargo.lock (regenerated via cargo update --workspace)
- AGENTS.md "Version surveyed:"
- openapi.json `info.version` (regenerated via
OMNIGRAPH_UPDATE_OPENAPI=1 cargo test -p omnigraph-server --test
openapi)
Verification:
- cargo test --workspace --locked: 907/907 green
- cargo test -p omnigraph-engine --test failpoints --features
failpoints: 19/19 green
- cargo test -p omnigraph-engine --test lance_surface_guards: 3/3
- scripts/check-agents-md.sh: clean
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
cb80fa40f1
|
exec/query: structured Expr pushdown via Scanner::filter_expr (unblocks CompOp::Contains) (#113)
* exec/query: pushdown IR filters via DataFusion Expr (Scanner::filter_expr) Switches `execute_node_scan` from string-flattened Lance SQL pushdown (`build_lance_filter` + `scanner.filter(&str)`) to structured DataFusion Expr pushdown (`build_lance_filter_expr` + `scanner.filter_expr(Expr)`). ## What this enables 1. **`CompOp::Contains` now pushes down.** `ir_filter_to_sql` returned `None` for list-contains (the comment said *"Can't pushdown list contains"*) because string SQL can't easily express it. With Expr, it lowers to DataFusion's `array_has(col, value)` builtin via the `nested_expressions` feature, and pushes down to Lance's scan layer the same way Eq/Lt/etc. do. Pinned by the new regression test `end_to_end::ir_filter_with_list_contains_pushes_down`. 2. **DataFusion 53's optimizer rules now reach our predicates.** Once the Expr lands at the Lance scanner, DF's planner runs: - `IN`-list vectorized eq kernel (DF #20528) - `PhysicalExprSimplifier` (DF #20111) - CASE WHEN x THEN y ELSE NULL shortcut (DF #20097) - Push limit into hash join (DF #20228) None of these were applicable before because the string SQL path short-circuited the optimizer. ## Scope This is one of three string-flattened pushdown sites; the other two (`hydrate_nodes`/Expand pushdown at query.rs:771-796 and the mutation delete path in `exec/mutation.rs::predicate_to_sql`) stay on the SQL string path for now: - The Expand pushdown still serializes through `hydrate_nodes`'s `extra_filter_sql: Option<&str>` parameter. Migrating it changes the `TableStorage` trait surface (`scan_stream(filter: Option<&str>)` → `Option<Expr>`) and the cascading call sites — out of scope for this MR. - The mutation delete predicate still goes through `Dataset::delete(&str)` in Lance 6.0.1. MR-A (delete two-phase via Lance #6658, gated on the Lance v7 bump per issue #112) will migrate that path to `DeleteBuilder::execute_uncommitted` taking an Expr. The existing `ir_filter_to_sql` / `ir_expr_to_sql` / `literal_to_sql` helpers stay in place to serve the remaining string-SQL consumers (mutation predicates). They get retired when the other call sites migrate. ## Cargo Enables the `nested_expressions` feature on the `datafusion` workspace dep. Lance already pulls in `datafusion-functions-nested` transitively (it's listed in their feature set), so this just exposes the `datafusion::functions_nested::expr_fn::array_has` re-export. No transitive dep change (Cargo.lock unchanged). ## Tests - New: `ir_filter_with_list_contains_pushes_down` — pins the case that was previously impossible (`ir_filter_to_sql` returning `None`). - 906/906 workspace tests still pass. - 417/417 engine integration tests pass (was 416 + the new one). - 19/19 failpoints (recovery canary). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: pin rustfs/rustfs to 1.0.0-beta.3 (last known-good before creds-policy break) The RustFS S3 Integration job started failing 2026-05-23 with all 3 tests panicking on the first PUT: HTTP error: error sending request The "Dump RustFS logs on failure" step revealed the container was dying at startup: [FATAL] Server encountered an error and is shutting down: Default root credentials are not allowed on non-loopback listeners; set RUSTFS_ACCESS_KEY and RUSTFS_SECRET_KEY to non-default values, bind to loopback, or set RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true for local development only `rustfs/rustfs:latest` was updated 2026-05-21 (1.0.0-beta.4) with a credentials-policy check that rejects `rustfsadmin`/`rustfsadmin` as "default" values. PR #111 passed yesterday because it ran against beta.3; today's runs against beta.4 fail at container startup. This is unrelated to PR #113's Expr-pushdown refactor — the bump just happened to hit the same week. Pin to 1.0.0-beta.3 (2026-05-14, last tag before the change). The right long-term fix is one of: - Rotate the CI creds to less-default values (less coupling to RustFS's "default" set definition) - Set `RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true` per the error message - Use a workflow service container with controlled lifecycle Deferred — pinning is the minimal restore. Also incidentally documents *which* version we tested against, which `:latest` never did. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
3551e0d40e
|
chore(lance): bump 4.0.0 → 6.0.1 (DataFusion 52→53, Arrow 57→58) (#111)
* tests: add lance_surface_guards pre-flight pins for the v6 bump
Land 8 named guards in a new test file that pin Lance API surfaces
OmniGraph relies on. Each guard turns a silent-break risk (variant
rename, struct restructure, async-flip) into a red CI bar instead of
runtime drift.
Guards (mapped to the silent-break inventory from the v6 migration plan):
Runtime (#[tokio::test]):
1. lance_error_too_much_write_contention_variant_exists — pins the
variant referenced by db/manifest/publisher.rs::map_lance_publish_error.
2. manifest_location_field_shape — pins .path/.size/.e_tag/.naming_scheme
types and ManifestLocation accessor returning &Self (the access
pattern at db/manifest/metadata.rs:84-88).
6. write_params_default_does_not_set_storage_version — confirms our
explicit V2_2 pin remains load-bearing (blob v2 requirement).
Compile-only async fns (#[allow(...)] + unimplemented!() placeholders;
never run, but cargo build --tests enforces the API shape):
3. checkout_version + restore chain — pins the recovery rollback hammer
at db/manifest/recovery.rs:505-522.
4. DatasetBuilder::from_namespace().with_branch().with_version().load()
— pins the namespace builder chain at db/manifest/namespace.rs:162-174.
5. MergeInsertBuilder fluent chain — pins the manifest CAS at
db/manifest/publisher.rs:370-391, including the return shape
(Arc<Dataset>, MergeStats).
7. compact_files(&mut ds, CompactionOptions, None) — pins
db/omnigraph/optimize.rs:107.
8. DeleteResult { new_dataset, num_deleted_rows } — pins the inline
delete result shape (MR-A will repurpose this guard to the staged
two-phase variant once Lance #6658 migration lands).
This is commit 1 of the chore/lance-6.0.1 migration. Cargo bump
follows in commit 2 (will trigger the guards under v6 if any surface
drifted).
Per the migration plan at ~/.claude/plans/shimmering-percolating-duckling.md
(written this session). Two guards from the plan deferred to follow-up:
- manifest_cas_returns_row_level_contention_variant (full publisher
race integration test — needs harness scaffolding)
- table_version_metadata_byte_compatible_with_v4 (TableVersionMetadata
is pub(crate); requires test reach extension).
Verified on v4: cargo test -p omnigraph-engine --test lance_surface_guards
passes 3/3 runtime tests; cargo build -p omnigraph-engine --tests
compiles all 5 compile-only guards clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(deps): bump Lance 4.0.0 → 6.0.1, DataFusion 52 → 53, Arrow 57 → 58
The Cargo bump itself. Source is intentionally untouched — this commit
will not compile. The compile errors are the work-list for subsequent
commits on this branch.
Lance updates: lance + 7 sub-crates 4.0.0 → 6.0.1. Transitive churn:
+ lance-tokenizer v6.0.1 (vendored tokenizer per Lance PR #6512)
+ object_store 0.13.x (Lance 6 brings it transitively; our explicit
pin stays at 0.12.5 for now — revisit in stages if diamond bites)
- tantivy* crates (replaced by lance-tokenizer)
Compile error landscape on this commit (11 errors):
• 1× E0432: `lance_index::DatasetIndexExt` import (Lance PR #6280
moved it to lance::index). Sites: table_store.rs:20,
db/manifest.rs:37 (the second site was missed by the pre-flight
inventory).
• 8× E0599: `create_index_builder` / `load_indices` missing on
`lance::Dataset` — all downstream of the DatasetIndexExt move.
Once the import is corrected on table_store.rs and db/manifest.rs,
these resolve automatically.
• 2× E0063: missing field `is_only_declared` in `DescribeTableResponse`
initializer at db/manifest/namespace.rs:221, 364. New Lance
namespace field per the v5 namespace restructure (PR #6186).
Surface guards (lance_surface_guards.rs, commit
|
||
|
|
e1b40aee0b
|
engine: opt MergeInsertBuilder into FirstSeen for Lance dup-rowid bug (MR-957) (#109)
* engine: opt MergeInsertBuilder into FirstSeen for Lance dup-rowid bug (MR-957)
Lance 4.0.x's MergeInsertBuilder rejects sequential merge_insert /
update against rows previously rewritten by merge_insert with a
spurious "Ambiguous merge inserts: multiple source rows match the
same target row on (id = ...)" error. The engine passes exactly 1
source row; Lance's `processed_row_ids: Mutex<HashSet<u64>>`
(lance-4.0.0 src/dataset/write/merge_insert.rs:2099) double-processes
the same source/target match against datasets previously rewritten
by merge_insert and errors under the default
SourceDedupeBehavior::Fail.
Two surfaces hit it:
- Load: `omnigraph load --mode merge` twice against the same @key set.
- Mutate: sequential `update T set {f:v} where x=y` on the same row.
Fix: opt both MergeInsertBuilder call sites (merge_insert_batch,
stage_merge_insert) into SourceDedupeBehavior::FirstSeen. Lance
silently skips a duplicate match instead of erroring.
Correctness-preserving for OmniGraph because source-side duplicates
are already rejected upstream of these call sites:
- Loader: enforce_unique_constraints_intra_batch (loader/mod.rs:1453)
rejects intra-batch dup @key values across all three LoadModes,
pinned by the new loader_rejects_intra_batch_duplicate_keys test.
- Mutate: MutationStaging::finalize pre-dedupes by id.
So FirstSeen only suppresses the spurious Lance behavior, never user
data.
Regression coverage:
- consistency::load_merge_repeated_against_overlapping_keys_succeeds
— load surface (was the basis of the original PR #98 report).
- runs::second_sequential_update_on_same_row_succeeds — update
surface (MR-920).
- consistency::loader_rejects_intra_batch_duplicate_keys — pins
FirstSeen's safety argument.
- consistency::load_merge_window_2_documents_upstream_lance_gap —
canary for the residual upstream Lance gap (after MR-848 removes
the eager BTREE-on-id, re-establishing the index via
ensure_indices re-triggers the bug class). Drop the FirstSeen
setter only when this canary stays green without it.
Cross-validation on the prior PR #98 branch: both use_index(false)
(PR #98's hypothesis) and FirstSeen (MR-920's hypothesis) cover both
surfaces individually. FirstSeen chosen because it has no perf cost
(use_index(false) would force full-table scans on every merge_insert).
Supersedes PR #98 and andrew/merge-insert-firstseen.
Tracked at MR-957; upstream: lance-format/lance#6877.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* engine: add dedup-by-keys precondition on merge_insert primitives
Addresses Codex P1 on PR #109: `SourceDedupeBehavior::FirstSeen`
silently collapses duplicate source rows, and the branch-merge rewrite
path (`exec/merge.rs::publish_rewritten_merge_table`) feeds a
concatenated batch directly into `stage_merge_insert` without going
through `MutationStaging::finalize`'s pre-dedupe. By construction the
merge algorithm (`compute_source_delta` / `compute_three_way_delta`
walk via `OrderedTableCursor` and push each id at most once) produces
1-row-per-id, but the invariant was implicit — a future refactor
could violate it and FirstSeen would mask the bug as silent data
loss.
Add `check_batch_unique_by_keys` as a release-mode precondition at the
top of `merge_insert_batch` and `stage_merge_insert`. Errors with an
explicit "duplicate source row" message before the builder runs, so
real source dups continue to fail-fast regardless of caller.
Cost: one extra O(N) pass over the key column on every merge_insert.
String HashSet over typical batch sizes is microseconds — negligible
next to the merge_insert itself.
The inline comment in `table_store.rs` now enumerates all three
pre-dedup paths (load / mutate / branch-merge) and names the
precondition as the structural pin instead of relying on
by-construction invariants from three separate callers.
Three new unit tests in `table_store::tests` pin the helper itself;
the existing `loader_rejects_intra_batch_duplicate_keys` integration
test continues to pin the loader's intake-time check as the first
defense layer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
aadfa11ecb
|
schema: HTTP allow_data_loss exposure + e2e drop coverage (MR-694 follow-up) (#107)
Some checks failed
CI / Classify Changes (push) Has been cancelled
CI / Check AGENTS.md Links (push) Has been cancelled
Release Edge / Prepare edge release (push) Has been cancelled
CI / Test Workspace (push) Has been cancelled
CI / Test omnigraph-server --features aws (push) Has been cancelled
CI / RustFS S3 Integration (push) Has been cancelled
Release Edge / Build edge omnigraph-linux-x86_64 (push) Has been cancelled
Release Edge / Build edge omnigraph-macos-arm64 (push) Has been cancelled
The schema-lint chassis v1.2 (PR #100) shipped `--allow-data-loss` on the CLI, but `SchemaApplyRequest` had no equivalent field — Hard-mode drops were CLI-only. This commit closes that feature gap and adds e2e test coverage for drop modes across HTTP + CLI, plus data preservation on additive apply, plus a CLI↔SDK plan-parity assertion. Feature gap closed: - `crates/omnigraph-server/src/api.rs` — added `allow_data_loss: bool` (default false via `#[serde(default)]`) to `SchemaApplyRequest`. Added `Default` derive so test usages can use `..Default::default()`. - `crates/omnigraph-server/src/lib.rs` — `server_schema_apply` now constructs `SchemaApplyOptions { allow_data_loss: request.allow_data_loss }` and threads through to `apply_schema_as`. - `crates/omnigraph-cli/src/main.rs` — remote-URI schema-apply path used to bail with "--allow-data-loss not yet supported on remote"; now forwards the flag into the JSON payload so the CLI behaves identically against local and remote URIs. - `openapi.json` — regenerated; only diff is the new field on `SchemaApplyRequest`. Tests added (8 new): * `crates/omnigraph-server/tests/server.rs` (+5): - `schema_apply_route_soft_drops_property_via_http` — POST schema removing nullable property, verify catalog reflects the drop AND `snapshot_at_version(pre)` still has `age` in the field list (time-travel reachability is the Soft contract). - `schema_apply_route_soft_drops_node_type_via_http` — POST schema removing `Company` node + cascading `WorksAt` edge. - `schema_apply_route_hard_drops_property_with_allow_data_loss` — POST with `allow_data_loss: true`, verify plan step reports `mode: hard`. - `schema_apply_route_keeps_drops_soft_without_flag` — same schema without flag, verify `mode: soft`. Pins default semantics against accidental Hard promotion. - `schema_apply_route_additive_property_preserves_existing_rows` — load fixture, POST adding nullable property, verify row count preserved (SDK suite covers data preservation on drops + renames; additive AddProperty wasn't pinned). Plus helpers `schema_without_age` and `schema_without_company`. * `crates/omnigraph-cli/tests/cli.rs` (+3): - `schema_apply_allow_data_loss_flag_promotes_drops_to_hard` — CLI `omnigraph schema apply --allow-data-loss --schema X.pg --json`, verify plan step has `mode: hard`. - `schema_apply_without_allow_data_loss_keeps_soft_drops` — without flag, verify Soft. - `schema_plan_parity_cli_and_sdk` — same `.pg` source through `Omnigraph::plan_schema` (SDK) and `omnigraph schema plan --json` (CLI), assert the steps array is byte-identical post-JSON. HTTP has no `/schema/plan` endpoint; apply-side parity is implicitly covered by the HTTP drop tests + CLI drop tests using identical fixtures. Docs: - `docs/user/schema-language.md` — new "Destructive drops" section documenting Soft vs Hard semantics and that `allow_data_loss` is now honored uniformly across CLI / HTTP / SDK. Verification: every new test passes; full `cargo test --workspace --locked` green; `scripts/check-agents-md.sh` passes. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
e8fec2fa0f
|
tests: policy chassis e2e gap-fills (MR-722 follow-up) (#106)
Some checks failed
CI / Classify Changes (push) Has been cancelled
CI / Check AGENTS.md Links (push) Has been cancelled
Release Edge / Prepare edge release (push) Has been cancelled
CI / Test Workspace (push) Has been cancelled
CI / Test omnigraph-server --features aws (push) Has been cancelled
CI / RustFS S3 Integration (push) Has been cancelled
Release Edge / Build edge omnigraph-linux-x86_64 (push) Has been cancelled
Release Edge / Build edge omnigraph-macos-arm64 (push) Has been cancelled
* tests: policy chassis e2e gap-fills (MR-722 follow-up) Audit after PRs #101-105 surfaced real e2e gaps in the policy chassis that could let regressions ride through silently. Coverage was strong at the SDK level (18 chassis tests) and reasonable at HTTP (12+ policy tests), but the CLI×writer matrix was asymmetric (only `change` tested end-to-end), the `cli.actor` config-only precedence path was untested, the `OMNIGRAPH_UNAUTHENTICATED` env-var read path was unexercised, `serve()`'s startup-refusal propagation was structural-review only, and engine↔HTTP decision parity was a structural property without a test pinning it. This commit closes those gaps. Added (15 new tests, all test-only): * `policy_engine_chassis.rs` (+2): `load_file_as` allow + deny pair — PR #104 added the actor-aware mirror of `load_file` but it was only exercised via CLI integration; this is direct-SDK coverage. * `omnigraph-server/src/lib.rs` mod tests (+2): - `unauthenticated_env_var_classification` — consolidated single test (process-global env var; running parallel would race) that pins truthy values, falsy values, unset, and CLI-flag-overrides- env behavior of the `OMNIGRAPH_UNAUTHENTICATED` read path inside `load_server_settings`. - `serve_refuses_to_start_in_state_1_without_unauthenticated` — `#[serial]` integration test. Clears all bearer-token env vars, builds a `ServerConfig` with no policy file and no flag, calls `serve(config).await`, asserts Err before any side-effecting work (Lance dataset open, TcpListener::bind). Guards the classifier→serve propagation path so a future refactor that drops the call turns red. * `omnigraph-server/tests/server.rs` (+4): `policy_decision_parity_*` — four cases (Change×allowed+denied, BranchMerge×allowed+denied). Each case runs the same Cedar decision via both SDK (`Omnigraph::with_policy().mutate_as` / `branch_merge_as`) and HTTP (`POST /change` / `POST /branches/merge`) and asserts both either Allow or Deny. The structural property (both paths call `PolicyChecker::check`) is now test-asserted. * `omnigraph-cli/tests/system_local.rs` (+8): the CLI×writer matrix fan-out: - `local_cli_load_enforces_engine_layer_policy` - `local_cli_ingest_enforces_engine_layer_policy` - `local_cli_schema_apply_enforces_engine_layer_policy` - `local_cli_branch_create_enforces_engine_layer_policy` - `local_cli_branch_delete_enforces_engine_layer_policy` - `local_cli_branch_merge_enforces_engine_layer_policy` Each: one denied case (`--as act-bruno` against protected main) + one allowed case (`--as act-ragnor` via existing/extended admins-* rules). Plus: - `local_cli_actor_from_config_used_when_no_flag` — proves the config-only precedence path works. - `local_cli_actor_flag_overrides_config_actor` — proves the `--as` flag wins over `cli.actor` in the config. Adds `local_policy_config_with_actor` helper. Extends `POLICY_E2E_YAML` with `admins-branch-ops` (BranchCreate + BranchDelete) and `admins-schema-apply` rules so the CLI×writer matrix has positive-case rule coverage. Verification: all new tests pass; full `cargo test --workspace --locked` is green; `scripts/check-agents-md.sh` passes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * tests: serialize env-touching server lib tests to fix CI flake CI flake on PR #106's Test Workspace job: two of the new tests (`serve_refuses_to_start_in_state_1_without_unauthenticated` and `unauthenticated_env_var_classification`) raced against `server_bearer_tokens_from_env_reads_legacy_token_and_token_file`, which sets `OMNIGRAPH_SERVER_BEARER_TOKEN` via `EnvGuard`. While `serve_refuses` was mid-execution with its EnvGuard cleared, the bearer-token test's EnvGuard had `OMNIGRAPH_SERVER_BEARER_TOKEN` set; `resolve_token_source()` saw it and classified the runtime state as `DefaultDeny` rather than refusing — so the test panicked with "Dataset at path X not found" instead of the expected refusal message. The unauthenticated test had the symmetric failure: its `OMNIGRAPH_UNAUTHENTICATED="anything"` got overwritten by a peer `EnvGuard` drop. Fix: mark every test that uses `EnvGuard` with `#[serial]` so they serialize against each other (default key). Already on `serve_refuses_to_start_in_state_1_without_unauthenticated`; added to `unauthenticated_env_var_classification` and `server_bearer_tokens_from_env_reads_legacy_token_and_token_file`. The `parse_bearer_tokens_json_*` tests don't touch env vars and stay parallel. Locally green (36 tests pass on my workstation); the parallelism issue is CI-runner-specific (more aggressive thread interleaving) but the fix is universal. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
f3f2a051ba
|
policy: server 3-state default-deny matrix (MR-723) (#105)
Closes the "tokens but no policy" trap. Pre-MR-723, an operator who configured bearer tokens and forgot to set policy.file got a server that required auth and then permitted every action — the illusion of protection. After MR-723, that configuration is default-deny: only `read` actions succeed; every other action returns HTTP 403. Three startup states, classified deterministically: - **Open** — no tokens, no policy. Requires explicit `--unauthenticated` flag or `OMNIGRAPH_UNAUTHENTICATED=1`; otherwise `serve()` refuses to start. Forces the operator to opt in to "fully open dev mode" so it can't happen accidentally. - **DefaultDeny** — tokens configured, no policy. `authorize_request` rejects every action except `Read` with 403. The warn-log on startup names the misconfiguration explicitly. - **PolicyEnabled** — policy file configured. Cedar evaluates every request, unchanged from pre-MR-723. What landed: - `ServerConfig.allow_unauthenticated: bool` + `--unauthenticated` flag on the `omnigraph-server` bin + `OMNIGRAPH_UNAUTHENTICATED` env var (`load_server_settings` honors both). - New `classify_server_runtime_state(has_tokens, has_policy, allow_unauthenticated) -> Result<ServerRuntimeState>` pure function. `serve()` calls it before opening the engine and bails with a clear error when the operator hits the no-tokens-no-policy-no-flag cell. - `authorize_request` state-2 branch: when `policy_engine()` is None but the bearer-auth middleware delivered an authenticated actor, any action other than `Read` returns 403 with a message that names the misconfiguration. - `AppState::with_policy_engine(self, engine)` builder method so integration tests that need a custom workload (`new_with_workload`) can still install a permit-all policy without a new constructor. - `app_for_loaded_repo_with_auth(token)` and `app_for_loaded_repo_with_auth_tokens(tokens)` test helpers now install a permit-all policy alongside tokens — they previously represented the "tokens but no policy" state that MR-723 makes default-deny, and tests that don't care about policy were inadvertently coupled to the loophole. Tests: - `classify_*` unit tests (3) — every cell of the matrix. - `default_deny_mode_allows_read_for_authenticated_actor` — GET /snapshot succeeds with bearer token + no policy. - `default_deny_mode_rejects_change_with_forbidden` — POST /change rejected with 403 + "default-deny" message. - `default_deny_mode_rejects_schema_apply_with_forbidden` — POST /schema/apply rejected with 403 + "default-deny" message. - New `app_for_repo_with_auth_tokens_only(schema, tokens)` helper builds the State-2 fixture without policy. The pre-MR-723 helpers `app_for_loaded_repo_with_auth*` shift semantics to "tokens + permit-all" so existing tests retain their original intent. docs/user/policy.md: new "Server runtime states (MR-723)" section documents the matrix and the explicit `--unauthenticated` opt-in. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
a275306a15
|
policy: CLI policy injection — local writes go through engine enforce (MR-722) (#104)
Closes the CLI side of the policy chassis fan-out. Before this commit, CLI direct-engine writes bypassed Cedar entirely because the CLI never called `Omnigraph::with_policy(...)` for non-`policy validate|test|explain` subcommands. After this commit, every CLI direct-engine writer (change, load, ingest, branch create/delete/merge, schema apply) opens the engine via a new `open_local_db_with_policy(uri, &config)` helper that installs the configured `PolicyEngine` when `policy.file` is set, and threads the resolved actor through to the `_as` writer methods. Actor identity resolution: - New top-level `--as <ACTOR>` global flag on the CLI overrides config. - New `cli.actor` field in `omnigraph.yaml` provides a default actor. - Precedence: `--as` > `cli.actor` > None. - When policy is configured and neither is set, the engine-layer footgun guard fires and the write is denied — silent bypass via "I forgot the actor" is exactly what the guard prevents. - Remote HTTP writes ignore both — bearer-token-resolved server-side. Helpers added in main.rs: - `open_local_db_with_policy(uri, &config) -> Result<Omnigraph>` — opens the DB and installs the PolicyEngine when configured. Without policy this is identical to a bare `Omnigraph::open`. - `resolve_cli_actor(cli_as, &config) -> Option<&str>` — implements the flag > config > None precedence. Engine: added `load_file_as` to the loader as the actor-aware mirror of `load_file`, so CLI file-path loads flow through the same enforce gate as in-memory `load_as` calls. Test rewrite: `local_cli_policy_tooling_is_end_to_end_while_local_writes_stay_unenforced` was the explicit assertion of the pre-chassis hole. Renamed and split: - `local_cli_policy_tooling_is_end_to_end` — sanity for the read-only policy CLI surfaces (validate/test/explain), unchanged behavior. - `local_cli_change_enforces_engine_layer_policy` — the new assertion: policy installed + no actor → footgun-guard denial; `--as act-bruno` on protected main → Cedar denial; `--as act-ragnor` (admins-write rule) on main → permit, write committed. POLICY_E2E_YAML gains an `admins-write` rule so the permit case has a non-trivial actor to exercise. docs/user/policy.md updated with `cli.actor` + `--as <ACTOR>` usage. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
da42beec41
|
policy: chassis fan-out — _as variants on the remaining 6 writers (MR-722) (#103)
PR #102 wired apply_schema_as. This PR completes the chassis-side coverage so every public mutating engine entry point hits the same Omnigraph::enforce(action, scope, actor) gate regardless of transport: - mutate_as → enforce(Change, Branch(branch), actor) - load_as → enforce(Change, Branch(branch), actor) - ingest_as → enforce(Change, Branch(branch), actor); also threads actor through the implicit branch_create_from_as so fresh-branch ingest correctly hits BranchCreate too - branch_create_as → enforce(BranchCreate, TargetBranch(name), actor) - branch_create_from_as → enforce(BranchCreate, BranchTransition { source, target }, actor) - branch_delete_as → enforce(BranchDelete, TargetBranch(name), actor) - branch_merge_as → enforce(BranchMerge, BranchTransition { source, target }, actor) Three new _as variants for branch ops (create, create_from, delete) that had no actor surface before; existing actor-less variants delegate with actor=None so the no-policy path is a strict no-op. HTTP handlers updated to thread the resolved actor into the new _as variants for branch_create and branch_delete (was previously dropped). 14 new SDK chassis tests (one allow + one deny pair per wired writer); the existing 4 apply_schema_as tests stay. All 18 pass. docs/user/policy.md updated to describe engine-wide enforcement and the coarse-vs-fine layer split (engine = action gate, query layer per-row = MR-725 future). AGENTS.md capability matrix updated to match. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
9973683261
|
policy: chassis core — omnigraph-policy crate + Omnigraph::enforce() (MR-722) (#102)
PR #2 of the policy chassis series (PR #1 = MR-731, merged in #101). The structural fix that moves Cedar enforcement from HTTP-only to engine-wide. apply_schema is the proof-of-concept writer; PR #3 fans the enforce() call out to the remaining six (mutate_as, load, ingest_as, branch_create_from, branch_delete, branch_merge). ## What lands ### New crate: omnigraph-policy The 844-line policy.rs moves from `omnigraph-server` into a new `omnigraph-policy` workspace crate so both engine and server can depend on it. Cedar dependency moves with it. The server's policy.rs becomes a re-export shim (`pub use omnigraph_policy::*`) so existing `omnigraph_server::PolicyAction` etc. paths keep working — CLI and test consumers don't have to migrate in one go. ### New trait: PolicyChecker ```rust pub trait PolicyChecker: Send + Sync { fn check(&self, action: PolicyAction, scope: &ResourceScope, actor: &str) -> Result<(), PolicyError>; } ``` `PolicyEngine` (Cedar-backed) implements it. `Omnigraph::with_policy()` takes `Arc<dyn PolicyChecker>`. Engine tests mock the trait without spinning up Cedar. MR-725 will extend the trait with `predicate_for()` for query-layer pushdown — additive, no call-site changes. ### New enum: ResourceScope Four variants — Graph, Branch, TargetBranch, BranchTransition — mapping cleanly to today's `(branch, target_branch)` shape on PolicyRequest via `to_branch_pair()`. Each engine writer picks the variant that matches the existing HTTP-layer convention so engine and HTTP evaluate the same Cedar decision. **Invariant**: ResourceScope stays at branch granularity. Per-type and per-row scope are MR-725's territory, not engine-layer's. Adding Type/Row variants here creates two places per-type policy can be evaluated, which can drift. See chassis design refinements comment on MR-722 (2026-05-17). ### Omnigraph::with_policy() + enforce() * New `policy: Option<Arc<dyn PolicyChecker>>` field on Omnigraph, None by default (preserves embedded/dev no-enforcement mode). * `with_policy(self, checker)` setter — builder-style, consumes self. * `enforce(action, scope, actor)` — the gate. When policy is None, no-op. When policy is Some AND actor is None, hard error — silent bypass via "I forgot the actor" is exactly the footgun this gate is here to prevent. ### apply_schema_as: first writer wired * New public method `apply_schema_as(source, options, actor)` that calls `enforce(SchemaApply, TargetBranch("main"), actor)` before acquiring the schema-apply lock or doing any other work. * Existing `apply_schema(source)` and `apply_schema_with_options(...)` delegate to it with actor=None (no-actor variants). * HTTP handler `server_schema_apply` updated to call apply_schema_as with the resolved actor. AppState construction injects the PolicyEngine into Omnigraph via `with_policy`. HTTP-layer authorize_request still fires first; the engine gate is the redundant-but-correct backstop and the only path that protects SDK / embedded callers. PR #3 removes the HTTP redundancy. ### OmniError::Policy New error variant for engine-layer policy denial / evaluation failure. ApiError::from_omni maps it to 403. ### MR-724 Admin action — Option A reservation PolicyAction::Admin kept in the enum with a load-bearing doc comment naming its future consumers (hot reload, audit log query, approvals list per MR-726 / MR-732 / MR-734). No enforce(Admin, ...) call site exists yet — the variant is reserved so the action vocabulary is complete from chassis day one. MR-724 closes when the first consumer surface ships. ### New SDK-side integration test `crates/omnigraph/tests/policy_engine_chassis.rs` — four tests covering: * Policy denies for unauthorized actor → OmniError::Policy * Policy permits for authorized actor → apply succeeds * Policy installed + no actor → hard error (forget-the-actor footgun) * No policy → no-op (embedded/dev default still works) These exercise the engine path directly — no HTTP layer involved. ## Test results - cargo test --workspace --locked --no-fail-fast: 851 passed, 0 failed * 45 server tests (existing) pass * 14 schema_apply tests (existing) pass * 4 new chassis tests pass * 60 OpenAPI tests pass (no HTTP API surface changes) * No regressions across the workspace ## Architectural decisions baked in Per MR-722 chassis design refinements comment (2026-05-17): 1. PolicyChecker is a trait, not just a concrete. Engine and server consume the trait. MR-725 adds predicate_for() additively. 2. ResourceScope stays at branch granularity. No Type/Row variants. 3. Coarse-vs-fine framing pinned: engine-layer is action gate; query-layer (MR-725) is predicate gate. Both backed by same Cedar engine; non-overlapping responsibilities. 4. Admin action reserved for policy-management surfaces (MR-724 Option A). ## Pending follow-ups (PR #3+) - Fan-out enforce() to mutate_as, load, ingest_as, branch_create_from, branch_delete, branch_merge (PR #3). - Remove HTTP-layer authorize_request redundancy once engine gate covers all writers (PR #3). - CLI policy injection into Omnigraph for non-`policy validate|test|explain` subcommands (PR #3 or follow-up). - MR-723 default-deny 3-state matrix (PR #4). - MR-736 severity warn/deny (PR #5). - AGENTS.md scope-of-enforcement rewrite once chassis fully lands. - Coarse-vs-fine framing in docs/user/policy.md. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
7a86f654d4
|
policy: codify signed-token-claim-only actor identity (MR-731) (#101)
Warm-up commit for the policy chassis epic (MR-722). PR #1 of the chassis series — same role as schema-lint v1's commit #1 baseline. Zero behavioral change; establishes the regression test, the load-bearing doc comment, and the user-doc paragraph for an invariant already true in code. Server auth already resolves `actor_id` from the matched bearer token at `omnigraph-server/src/lib.rs:692-694`, overwriting whatever the handler put in the PolicyRequest. The principle is named in docs/dev/invariants.md Hard Invariant 11 ("clients cannot set actor identity directly"). What was missing: a regression test, a load-bearing doc comment at the resolution site, and a user-facing documentation paragraph. This commit adds all three. Why first. The actor-identity invariant is the foundation every other policy decision stands on. If `actor_id` can be spoofed, every chassis primitive (per-row scope, audit log, two-person rule) becomes ungated. Pinning the invariant first means PR #2 (the chassis core) doesn't have to re-prove this assertion. Changes: * crates/omnigraph-server/tests/server.rs — new regression test actor_id_resolves_from_bearer_token_ignoring_client_supplied_headers with three sub-assertions: - spoof-up: bearer for denied actor + X-Actor-Id naming allowed actor → 403 (header doesn't promote) - spoof-down: bearer for allowed actor + X-Actor-Id naming denied actor → 200 (header doesn't demote) - empty-string spoof: empty X-Actor-Id doesn't clear resolved actor Cross-link to MR-777 (auth boundary cases — actor-id collision + malformed bearer) noted in the test docstring. * crates/omnigraph-server/src/lib.rs — expanded doc comment at the actor-resolution site explaining the SECURITY INVARIANT, citing Hard Invariant 11, the Supabase RLS history footgun, and the regression test that pins the contract. Reader thinking "I should let clients override actor_id for impersonation" hits this comment first. * docs/user/policy.md — new "Actor identity (signed-claim-only)" section near the existing Server enforcement section. Closes the user-facing doc gap MR-731's "Done when" requires. Architectural decisions for PR #2+ pinned this session (not implemented here, recorded so future implementers don't re-litigate): - PolicyEngine moves to new `omnigraph-policy` workspace crate so both engine and server can depend on it (Q2). - `enforce(action, scope, actor)` will take a new `ResourceScope` enum, leaving room for MR-725's per-type and per-row variants (Q3). - `PolicyAction::Admin` is kept and wired (Option A) — meta-action for policy-management surfaces (hot reload, audit log query, approvals list) as those consumer features land (Q4). Test results: - cargo test -p omnigraph-server --test server: 45 pass (44 existing + 1 new); no regressions - scripts/check-agents-md.sh: passes (34 links / 33 docs OK) Out of scope (PR #2+): - Omnigraph::with_policy() + enforce() method - omnigraph-policy crate creation - ResourceScope enum - CLI policy injection into Omnigraph - HTTP-layer redundant-check removal - MR-724 Admin action wiring (PR #2) - MR-723 default-deny 3-state (PR #4) - MR-736 severity warn/deny (PR #5) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
a6e037547f
|
schema-lint chassis v1.2: --allow-data-loss flag + Hard mode (MR-694) — completes v1 (#100)
* schema-lint v1 commit 5: --allow-data-loss flag + Hard mode Final v1 commit. Wires up the --allow-data-loss CLI flag and Hard mode for both DropProperty and DropType. Per docs/dev/schema-lint-v1-plan.md, commit #5 of the schema-lint chassis v1 series (MR-694). CLI (omnigraph-cli/src/main.rs): - New --allow-data-loss flag on both `omnigraph schema plan` and `omnigraph schema apply` subcommands. Off by default (Soft). - HTTP remote schema apply explicitly rejects the flag for now (CLI-only; HTTP parity is a separate small follow-up that adds the field to SchemaApplyRequest + the server handler). Engine (omnigraph.rs + schema_apply.rs): - New SchemaApplyOptions { allow_data_loss: bool } public struct (Default = all false), re-exported via omnigraph::db::SchemaApplyOptions. - New public methods: plan_schema_with_options and apply_schema_with_options. Existing plan_schema/apply_schema are now thin wrappers that pass Default::default(). - promote_drops_to_hard: post-plan walk that promotes every DropMode::Soft step to DropMode::Hard when the flag is set. Keeps the compiler's plan_schema_migration signature unchanged (no breaking change for tests / callers). - Apply path: both Drop arms accept Hard mode; behavior is identical to Soft inside the apply loop. The DIFFERENCE is the new hard_cleanup_targets: Vec<(String, String)> accumulator, populated for every Hard variant with (table_key, full_dataset_uri). - Post-publish cleanup: a new loop after the manifest commit iterates hard_cleanup_targets and calls cleanup_old_versions (before_timestamp = now) on each dataset URI. Best-effort — the apply is already durable; cleanup failure is logged via tracing::warn rather than failing the apply. - New cleanup_dataset_old_versions helper inlines the Lance cleanup_old_versions call against a dataset URI. Behavioral details: - DropProperty Hard: stage_overwrite produced a new dataset version without the column. cleanup_old_versions removes the prior version (and reclaims unique fragments). After Hard apply, snapshot_at_version(pre_drop).open(table_key) FAILS because the prior dataset version was reclaimed. - DropType Hard: no per-table write happens (the change is the manifest tombstone). cleanup_old_versions on the orphan dataset is a no-op in the immediate term (no prior versions to clean since the dataset wasn't modified by this apply). The dataset directory persists. Full orphan-cleanup is a documented follow-up — the user-facing contract is "data is unreachable via omnigraph" (manifest entry tombstoned), which is satisfied. Tests (tests/schema_apply.rs): - apply_schema_with_allow_data_loss_promotes_drops_to_hard: default plan emits Soft; with options.allow_data_loss=true, plan emits Hard; apply succeeds. - apply_schema_hard_drops_property_makes_prior_version_unreachable: Hard drop succeeds, current snapshot lacks the column, and snapshot_at_version(pre_drop).open("node:Person") FAILS (Lance prior version reclaimed by cleanup). - apply_schema_hard_drops_node_and_edge_with_flag_succeeds: both Node and Edge DropType variants are promoted to Hard with the flag; apply succeeds; current manifest entries gone. (Orphan dataset directory cleanup deferred.) Test results: - cargo test -p omnigraph-compiler --lib: 239 passed - cargo test -p omnigraph-engine --test schema_apply: 14 passed (3 new Hard tests + 11 existing soft/regression tests) - cargo test -p omnigraph-server --test openapi: 60 passed (no HTTP API surface changes in this commit; OpenAPI parity follow-up noted) v1 status: complete for CLI/embedded use. MR-694 chassis epic + MR-700 DropType/DropProperty ticket can close after this lands. Known follow-ups (separate small PRs): - HTTP parity: extend SchemaApplyRequest with allow_data_loss field, thread through server handler, regenerate openapi.json. - Orphan-dataset directory deletion for DropType Hard (currently the dataset directory persists; cleanup_old_versions doesn't remove it because the dataset wasn't modified). - MR-948 substrate alignment: swap DropProperty Soft from stage_overwrite to Dataset::drop_columns (catalog_only vs full_rewrite cost class). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fixup: use bail! from color_eyre::eyre instead of anyhow The remote-rejection branch in SchemaCommand::Apply used anyhow::anyhow! which isn't in scope; the CLI's Result type is color_eyre::eyre::Result and bail! is already imported. Caught by CI Test Workspace job on PR #100. --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
58cee158d8
|
schema-lint v1 commit 4: emit + apply DropType { Soft } (#99)
Wire the second half of the dormant Drop* family. Per docs/dev/schema-lint-v1-plan.md, commit #4 of the schema-lint chassis v1 series (MR-694). Builds on commit #3 (PR #90, DropProperty Soft). Planner (schema_plan.rs): - plan_nodes leftover loop: emit DropType { Node, name, Soft } instead of UnsupportedChange (OG-DS-102) for node-type removals. - plan_edges leftover loop: emit DropType { Edge, name, Soft } instead of UnsupportedChange (OG-DS-103) for edge-type removals. Apply (schema_apply.rs): - New dropped_tables: BTreeSet<String> accumulator alongside added_tables / renamed_tables / rewritten_tables. - DropType arm in the metadata loop populates dropped_tables for Soft mode. Hard mode errors (lands in commit #5 with --allow-data-loss). - New tombstone-emission loop after the rename sidecar build: for each dropped table, push to sidecar_tombstones AND populate table_tombstones with table_version + 1. The existing manifest publish path converts table_tombstones into ManifestChange::Tombstone operations — no new manifest plumbing needed. - Soft DropType has no Phase B per-table write; the tombstone is the entire change. Lance dataset files are retained — prior __manifest versions still reference them, so time travel + branch-from-snapshot can read the dropped table until cleanup_old_versions runs. - Rides on SidecarKind::SchemaApply per MR-847 (already established by commit #3). Tests: - Planner unit test plan_emits_soft_drop_for_removed_node_and_edge_types asserts both Node and Edge DropType { Soft } emission for the Company + WorksAt combined drop, plus no UnsupportedChange. - Integration test apply_schema_drops_node_and_referencing_edge_softly (replaces apply_schema_rejects_dropping_a_node_type): asserts plan emission, apply success, current manifest entries absent, pre-drop manifest entries present (time-travel reversibility), reopen consistency. - Integration test apply_schema_drops_an_edge_type_softly (replaces apply_schema_rejects_dropping_an_edge_type): single edge drop, asserts other tables untouched, time-travel reversibility. Test results: - cargo test -p omnigraph-compiler --lib: 239 passed (1 new + 238) - cargo test -p omnigraph-engine --test schema_apply: 11 passed (2 converted + 9 unchanged) Pending for v1 completion: - Commit #5: --allow-data-loss CLI flag + Hard mode promotion in planner + immediate compact_files + cleanup_old_versions for both DropProperty and DropType. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
e98347eb7b
|
schema-lint chassis v1.0: DropProperty Soft + code-tagged diagnostics (MR-694) (#90)
* schema-lint chassis v1 (WIP): tier surfacing + plan doc
First commit of the chassis v1 branch. Lands a small, foundational
slice without behavior change, plus a planning doc that lays out the
remaining 7 commits in sequence so the PR can be reviewed
incrementally.
This commit:
- Adds SchemaMigrationStep::diagnostic() returning the full
&'static DiagnosticCode (family + tier + severity) for
UnsupportedChange steps with codes. Renderers can now reach the
tier without re-implementing the code → tier lookup.
- CLI `omnigraph schema plan` output now displays tier alongside
code:
unsupported change on node:Person.age [OG-DS-104, destructive]:
removing property 'Person.age' is not supported in schema
migration v1
Operators see at-a-glance the kind of risk each rejection
represents — not just the rule identifier.
- No behavior change. All 11 existing schema_apply tests still pass.
Planning doc at docs/schema-lint-v1-plan.md tracks the 7 remaining
commits to bring v1 to feature-complete:
1. (this commit) Tier surfacing in plan output.
2. Soft / Hard mode enum on drop steps.
3. Tombstone fields on catalog IR.
4. Planner emits DropProperty { Soft } by default.
5. Apply path implements Soft mode.
6. Convert PR #62 destructive-rejection tests.
7. --allow-data-loss flag + Hard mode.
8. (optional) Tombstone unhide / restore command.
Delete the planning doc when v1 lands. Intentionally checked in to
the WIP branch so the scope is reviewable; not intended as a
permanent doc.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* schema-lint v1 commit 2: DropMode + dormant Drop* variants
Second commit of the chassis v1 branch. Lands the type-level shape
of soft/hard drops without wiring them up. Variants are reachable
from emitters but the planner doesn't produce them yet; the apply
path returns an explicit not-yet-implemented error if one shows up
via deserialization.
Added:
- `DropMode { Soft, Hard }` — orthogonal to `SafetyTier`. Tier
classifies the rule's risk class; mode is the operator's intent
for data treatment.
- `Soft` → catalog tombstone, data retained. Tier: safe.
- `Hard` → Lance-level removal. Tier: destructive; will require
--allow-data-loss to apply (commit 7).
- `SchemaMigrationStep::DropType { type_kind, name, mode }` and
`SchemaMigrationStep::DropProperty { type_kind, type_name,
property_name, mode }` variants.
- Re-export `DropMode` from `omnigraph_compiler::DropMode` so
downstream crates don't reach into the catalog submodule.
- CLI `render_schema_plan_step` arms for both variants, surfacing
the mode in plan output: `drop property 'Person.age' of node
'Person' (soft mode)`.
- `apply_schema_with_lock` exhaustive match arm for the two new
variants that returns `manifest_internal` with a clear
not-yet-implemented message. If a SchemaIR JSON containing
Drop{Type,Property} arrives (e.g. from a future tool or hand-
written), the apply path fails explicitly rather than silently
misclassifying.
- Two new in-source tests:
- `drop_steps_round_trip_through_serde` — pins the wire shape
for all four (variant × mode) combinations.
- `drop_mode_serde_uses_snake_case` — pins external-tool-
friendly serialization (`"soft"` / `"hard"`).
Build: clean, only pre-existing warnings.
Tests:
- omnigraph-compiler schema_plan: 6/6 (4 existing + 2 new).
- omnigraph-engine schema_apply: 11/11 (unchanged — planner still
emits UnsupportedChange for removal paths).
Next commit (commit 3 per docs/schema-lint-v1-plan.md): add the
`tombstoned: bool` fields to NodeIR / EdgeIR / PropertyIR for the
catalog representation of soft-mode tombstones.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* plan doc: reframe v1 around Lance native drop_columns
After a substrate audit of the Lance data-evolution guide on
2026-05-13, the v1 plan was simplified. Two key findings:
1. Lance's `drop_columns()` is already metadata-only and reversible
via time travel until cleanup. No need for a parallel
`tombstoned: bool` field in our catalog IR — Lance's version
graph IS the tombstone.
2. The full schema_apply substrate migration (add_columns,
drop_columns, alter_columns vs. stage_overwrite across all step
types) is consolidated in MR-948 as a sibling issue. v1 only
uses the relevant slice (drop_columns for OG-DS-1XX).
Net plan changes:
- Commit 3 (original): tombstone fields on catalog IR → dropped.
No catalog IR change needed. The Lance drop_columns commit IS the
tombstone.
- Commit 5 (original): apply path writes tombstoned: true → replaced
with: apply path calls Dataset::drop_columns([name]).
- Commit 7 Hard mode: stage_overwrite removing the column → replaced
with: drop_columns + compact_files + cleanup_old_versions. Same
APIs omnigraph cleanup already uses.
- Commit 8 (original): omnigraph schema unhide → dropped. Time
travel is the undo (omnigraph snapshot --at <commit>).
Net result: 8 commits → 5 commits. ~250 LoC less surface. More
substrate-aligned.
The chassis types from commit 2 (DropMode enum, DropType /
DropProperty variants) are kept exactly as designed; only the
implementation strategy changed.
The Lance docs quote is included in the doc so future readers see
the substrate behavior cited verbatim.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* schema-lint v1 commit 3: emit + apply DropProperty { Soft }
Wire the dormant DropProperty variant end-to-end for the Soft case.
Per docs/schema-lint-v1-plan.md, commit #3 of the schema-lint chassis
v1 series (MR-694).
Planner (schema_plan.rs):
- plan_properties: emit DropProperty { type_kind, type_name,
property_name, mode: Soft } instead of UnsupportedChange when a
property exists in accepted but not in desired. Plan is now
supported = true for drop-only changes.
Apply (schema_apply.rs):
- Route DropProperty { Soft } through rewritten_tables. The existing
batch_for_schema_apply_rewrite path already iterates the *target*
schema fields, so a property absent from desired_catalog is
naturally projected away. The prior Lance version retains the
dropped column for time-travel reversibility (until cleanup runs).
- DropType still errors (lands in commit #4 with different mechanics:
__manifest entry removal instead of column projection).
- DropProperty { Hard } still errors (lands in commit #5 with
--allow-data-loss CLI flag + immediate compact_files +
cleanup_old_versions).
Tests:
- Planner unit test plan_emits_soft_drop_for_removed_nullable_property
asserts the variant emission + supported = true + no UnsupportedChange.
- Integration test apply_schema_drops_a_nullable_property_softly_
preserves_prior_version (replaces the former
apply_schema_rejects_dropping_a_property_with_data) asserts:
(a) plan contains DropProperty { Soft }
(b) apply succeeds + manifest advances + row count unchanged
(c) current dataset schema lacks the dropped column
(d) snapshot_at_version(pre_drop) still has the dropped column
(e) reopen consistency — drop preserved across engine restart
Recovery: rides on SidecarKind::SchemaApply per MR-847. No new
sidecar kind needed; the entire apply path is already sidecar-wrapped.
Substrate alignment: this commit uses the stage_overwrite full-rewrite
path (full_rewrite cost class) rather than Lance native drop_columns
(catalog_only cost class). MR-948 is the follow-up substrate-alignment
refactor that introduces a LanceColumnOp surface and switches the
metadata-only case onto drop_columns. Functional outcome is identical;
cost-class improvement deferred.
Test results:
- cargo test -p omnigraph-compiler --lib: 238 passed
- cargo test -p omnigraph-engine --test schema_apply: 11 passed
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs: move schema-lint-v1-plan into docs/dev/ + add to index
Post-rebase fixup for the docs split (#93). The plan doc was added
to docs/ at the top level before main reorganized to docs/{user,dev}/.
This moves it into docs/dev/ and adds an entry to docs/dev/index.md
under a new "Active Implementation Plans" section so the
check-agents-md.sh link check passes.
Per the original commit message (
|
||
|
|
5c889f8e42
|
Update README.md |