The `local` line of the `--help` "COMMANDS BY CAPABILITY" legend was stale: it
still listed `config` (the `config migrate` group was removed with the
omnigraph.yaml excision) and omitted `profile` (added by RFC-011 D8). Update the
list to `embed, login, logout, profile, version`.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Inspect the per-operator `~/.omnigraph/config.yaml` scope profiles without
running anything:
- `profile list [--json]` — every profile with its binding (server/cluster/store)
and default graph; marks the `$OMNIGRAPH_PROFILE`-active one. A malformed
(zero/two-scope) profile is reported as `invalid: <reason>`, not a hard failure.
- `profile show [<name>] [--json]` — one profile's resolved scope: binding kind +
target, the resolved endpoint (a server's URL / a cluster's root / the store
URI), default graph, and output format. With no name, shows the active
(`$OMNIGRAPH_PROFILE`) profile, else the flat operator defaults.
Both are `local` (Session plane) — they read operator config only, take no
addressing flags. Display reads `OperatorProfile::binding()` + the same
`servers`/`clusters` lookups the scope resolver uses (not `resolve_scope`, which
is capability-gated and can't render all three binding kinds at once), so it is
honest about what a profile binds.
Also: RFC-011 bookkeeping (Status → Accepted; D8 shipped, D11 gated on RFC #219,
D5 deferred) and drop the stale "legacy config actor (RFC-008 window)" comment in
operator.rs (the legacy actor is gone).
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
* chore: correct stale global-lock comments
The global Arc<RwLock<Omnigraph>> that once serialized every server write was
removed — the server holds the engine as a lockless Arc<Omnigraph> and write
methods are &self, so the per-(table_key, branch) write queues are now the
actual write-serialization mechanism (in-process only).
Correct comments that still claimed the global lock is 'still in place' /
'today', or framed the queues as MR-686 scaffolding: write_queue.rs module doc,
exec/merge.rs, db/omnigraph/schema_apply.rs, db/manifest/recovery.rs, and the
bench_concurrent_http.rs example (which also wrongly stated mutate_as is
&mut self). workload.rs is left as-is — its 'previous global RwLock' wording is
accurate history.
* test: regression for self-healing a manifest-unreferenced fork
An interrupted first-write fork (create_branch succeeded, the manifest publish
did not) leaves a fully-formed Lance branch ref the manifest never references.
The branch stays a valid manifest branch, so cleanup's reconciler never
reclaims it, and today the next write to that table wedges with 'incomplete
prior delete; run cleanup'.
Forge that exact residue (a live 'feature' branch + a directly-created
'feature' ref on the Person table the manifest doesn't reference) and assert
the next load AND mutate self-heal. Deterministic and local — no S3 or timing,
since the forge IS the post-crash state. Adds a shared node_table_uri helper.
This commit is RED: it reproduces the bug and fails against the unfixed engine
with the predicted symptom. The fix follows in the next commit.
* fix: self-heal manifest-unreferenced branch forks
The first write to a table on a branch lazily forks it via Lance create_branch,
a durable two-phase op that advances Lance state BEFORE the atomic manifest
publish. If the writer dies or its request future is cancelled between the fork
and the publish, the branch ref is fully formed but the manifest never
references it. The next write re-enters the fork path, create_branch collides,
and the engine wedged with 'orphaned table state ... incomplete prior delete;
run cleanup' — which cleanup could not even fix, because the branch is still a
live manifest branch. This hit load, mutate, ingest, and the merge fork path
(one shared engine chokepoint), so a routine deploy restart or client
disconnect could wedge a branch.
Fix: treat the per-table fork ref as derived state of the manifest. fork_branch_
from_state returns a typed ForkOutcome instead of a human 'incomplete prior
delete' error; on RefAlreadyExists the db layer reclaims the manifest-
unreferenced fork (force_delete_branch + re-fork, exactly once) and proceeds.
A live committed fork is still routed to a retryable conflict before the fork
path, so concurrent first-writes stay correct.
Reclaim is only safe if no in-process writer can be mid-fork, so the write
entry points (load, mutate) acquire the per-(table, branch) write queues for
all touched tables up front — before the fork, held through the publish — when
forking a non-main branch. commit_all accepts these pre-held guards instead of
re-acquiring (the queue is non-re-entrant). The merge fork path already holds
the queue and self-heals through the shared wrapper. Cross-process in-flight
forks remain the documented one-winner-CAS gap.
Mechanical prep folded in: mutation IR lowering is hoisted so the touched-table
set is known before execution; commit_all gains the held_guards parameter.
Flips recreate_over_orphaned_fork_before_cleanup_is_actionable to assert
self-heal; fork_collision_with_live_concurrent_fork_is_retryable still holds.
Docs: writes.md cancelled-future note, invariants.md cross-process known gap.
* fix(cleanup): reconcile per-table manifest-unreferenced forks
reconcile_orphaned_branches keyed orphans on the branch NAME (absent from the
manifest), so it only reclaimed forks from a fully-deleted branch. A fork left
on a still-live branch by an interrupted first-write was never reclaimed — the
backstop the handoff expected cleanup to provide did not cover that case.
Broaden it to a per-table authority test: a Lance branch B on table T is an
orphan iff B is not a live manifest branch (delete-leftover) OR the manifest's
branch-B snapshot does not place T on B (interrupted first-write). Per-branch
snapshots are resolved once and cached across tables. Legitimately-forked
tables, main, and internal/system branches are never reclaimed; children are
dropped before parents to avoid Lance's referenced-parent RefConflict. The
commit-graph half stays whole-branch (per-table doesn't apply there).
This is the guaranteed-convergence backstop to the write-path self-heal: it
reclaims any fork the write path never revisits, and is what Lance's own
create_branch docstring asks embedders to provide for zombie/orphan refs.
* fix: reclaim self-validates against fresh manifest authority
The fork reclaim force-deletes a Lance branch ref, gated on the caller's proof
that the manifest does not place the table on the branch. But the first-write
path obtains that proof via snapshot_for_branch, which returns the coordinator's
CACHED snapshot when the handle is bound to the branch (an embedded handle on
the branch, or branch_merge's target swap). If that snapshot is stale and a
concurrent writer already published a legitimate fork, the reclaim would
force-delete it and re-fork from source, stranding the manifest at a version the
recreated ref no longer has.
Make the destructive primitive own its safety precondition: re-derive it from a
FRESH manifest read (fresh_snapshot_for_branch, which bypasses the cache)
immediately before force-deleting. If fresh authority shows the table is on the
branch, refuse with a retryable conflict instead of destroying a valid fork.
Correct for any caller regardless of snapshot staleness. Also stop branching on
Lance's exact RefConflict prose (loosened match; typed-variant is the durable
follow-up). Addresses PR review (Codex P1, Greptile P2).
* fix: cover delete-cascade edges in up-front fork-queue acquisition
A node delete cascades to every edge table touching that node (execute_delete_
node), forking those edge tables during execution. But touched_table_keys
derived the up-front fork-queue set from the IR ops alone (just node:Type), so a
branch delete that forks node + cascade edges held only the node queue —
commit_all then saw cascade-edge keys it had no guard for.
The touched set is a pure function of (IR ops + catalog), so compute the
COMPLETE set: op types plus, for delete-node ops, the cascade edges derived the
same way the executor derives them (from_type/to_type match). Pre-computed now
equals actual by construction.
Also promote commit_all's held-guard coverage check out of debug_assert into an
all-builds check that fails the write with a typed manifest_internal error: a
load-bearing serialization invariant must fail loudly+safely in release, not
silently proceed unguarded if a future execution path ever touches a table
outside the pre-computed set.
Adds branch_cascade_delete_forks_node_and_edges_under_held_queues, which drives
the cascade path on a branch (the gap the existing insert/load tests missed).
Addresses PR review (Cursor medium, Greptile P2).
* fix(cleanup): serialize fork reclaim against in-process live writers
The broadened per-table reconciler force_delete'd an orphan candidate on a LIVE
branch without holding the per-(table, branch) write queue. An in-process
first-write fork in its fork->publish window holds that queue and has not yet
advanced the manifest, so it looks exactly like an origin-2 orphan — concurrent
cleanup could delete the ref the writer still holds and is about to publish.
(The old branch-name-based reconciler did not have this race: a deleted branch
cannot have a live first-write.)
Bring the reconciler under the same invariant the write-path reclaim already
obeys: never force_delete a fork ref without holding the (table, branch) write
queue AND confirming, under it, from a fresh read, that the ref is still
manifest-unreferenced. Acquire one key at a time (no lock-order inversion vs
multi-table acquire_many writers); if the writer published meanwhile, the fresh
re-check sees the table on the branch and skips. Cross-process writers remain
the documented one-winner-CAS gap. Addresses PR review (Cursor high).
* fix: classify create_branch failure by ref existence, not by failure
fork_branch_from_state mapped ANY create_branch failure to RefAlreadyExists,
routing transient I/O / version / Lance-internal errors into the destructive
reclaim path and masking the real error as a retryable conflict.
Branch on the actual fact instead: on create_branch failure, check whether the
ref exists (list_branches). Only a genuinely pre-existing ref — a fully-formed
manifest-unreferenced fork — is a reclaim candidate; any other failure
propagates with fidelity. We deliberately do NOT force-delete on a not-found-ref
failure: it is indistinguishable from a transient error on a fresh create, and
force-deleting there is the overreach the fresh-authority guard already removed.
A phase-1-only Lance zombie (rarer; create_branch interrupted mid its two
internal phases) surfaces as the propagated error for manual reclaim.
Addresses PR review (Cursor medium).
* fix(cleanup): skip (not delete) on a transient re-check error for a live branch
The reconcile pre-delete re-check treated ANY fresh_snapshot error as 'still an
orphan' and proceeded to force_delete. A transient manifest read failure on a
LIVE branch could therefore destroy a fork the manifest still considers
legitimate — inconsistent with the write-path reclaim (aborts on the same error)
and the candidate scan (skips on snapshot failure).
Distinguish the two origins under the queue: a branch absent from the manifest
authority (origin 1) is a confirmed orphan and is deleted without a fresh read
(no live writer can hold a deleted branch's queue); a LIVE branch (origin 2)
gets the fresh re-check and, on a transient read error, is SKIPPED — never
destroyed on ambiguity — converging on a later cleanup. Same don't-destroy-on-
ambiguous-error principle as the create_branch failure classification.
Addresses PR review (Cursor medium).
* fix(cleanup): unify fork-ref reclaim on fresh authority under the queue
Consolidates the reconcile/reclaim hardening from PR review (the earlier per-site
commits were collapsed when reconciling with the main merge). Both destructive
fork-ref sites — the write-path reclaim and the cleanup reconciler — now share
one classifier, classify_fork_ref -> ForkRefStatus { Legitimate, Orphan,
Indeterminate }, evaluated from FRESH manifest authority under the held
(table, branch) write queue. A fork ref is destroyed ONLY on a confirmed Orphan;
a Legitimate (concurrent writer published a real fork) or Indeterminate
(transient read) status is never destroyed — the write path maps it to a
retryable conflict, cleanup maps it to skip. This closes, by construction:
- reclaim trusting a possibly-cached caller proof (Codex P1);
- reconcile racing an in-process live fork without the queue (Cursor);
- delete-on-transient-error in the re-check (Cursor/Greptile);
- origin-1 trusting a stale live_branches capture for a created-since branch
(Cursor/Greptile P1).
Having one classifier removes the duplication that let the two sites drift.
ForkOutcome is made pub to match the sealed trait method returning it. Verified
green on Lance 7.0.0 (full engine suite + 48/48 failpoints).
* test(cleanup): pin classify_fork_ref decision (Legitimate / Orphan / ghost)
Both fork-ref reclaim sites (write-path reclaim + cleanup reconciler) route
their destroy/skip decision through classify_fork_ref, but it had no direct
test — reverting the fresh-authority logic was not test-detectable. Add a
deterministic in-source unit test that forges each state and asserts the status:
a manifest-placed fork -> Legitimate (never destroyed); a ref the manifest does
not place on the branch -> Orphan; a ref for a branch absent from the manifest
-> Orphan (ghost reclaim preserved). This makes the core fresh-authority
decision behind every reclaim fix revert-detectable in one place.
(The Indeterminate arm — transient read on a live branch -> skip — needs an
injected read failure and is left to the failpoints suite; the cross-process
cleanup-vs-writer and cached-snapshot reclaim races are the documented
one-winner-CAS gap, not reachable same-process bugs, so they are not faked here.)
* test(cleanup): pin the Indeterminate (transient re-check) reclaim arm
Closes the last untested classify_fork_ref arm. Adds a 'classify.fresh_read'
failpoint (no-op without the failpoints feature) that simulates a transient
failure of the fresh-authority read, and a failpoints test driving it through
cleanup: a genuine origin-2 orphan on a LIVE branch whose fresh re-check fails
classifies as Indeterminate, so the reconciler SKIPS it (never destroys on an
inconclusive read) and reclaims it on the next run once the read succeeds.
This makes the don't-destroy-on-ambiguity rule revert-detectable end-to-end.
The only paths now left untested are the cross-process cleanup-vs-writer and
reclaim-vs-publish races — the documented one-winner-CAS gap (cleanup is
&mut self / CLI-only, so no reachable same-process race), not faked here.
* test(server): avoid stale schema apply route handle
* fix(cleanup): report indeterminate fork authority clearly
`omnigraph schema apply` against a cluster-managed graph's storage root bypassed
the cluster ledger/recovery/approvals. Mirror `init`'s refusal: on the embedded
(direct-store) path, if the resolved URI is inside a cluster
(`cluster_root_for_graph_uri`), bail and point at `cluster apply`. The served
(`--server`) path is unaffected — it addresses a server, not a storage root.
`schema plan`/`show` (read-only) are untouched.
Two e2e tests injected "out-of-band drift" via this exact CLI path; since the
CLI now refuses it, they inject drift via a direct engine `apply_schema` against
the storage root instead — a faithful control-plane bypass, which is what
out-of-band drift is. New regression:
`schema_apply_refuses_a_cluster_managed_graph_and_signposts_cluster_apply`.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
* refactor(cli): own ReadOutputFormat/TableCellLayout in the CLI
The two output-presentation enums lived in `omnigraph-server::config` and were
re-exported for the CLI, even though the server never used them. Move both
definitions into `omnigraph-cli/src/read_format.rs` (where the renderer already
lives) and drop them from the server's public re-export. This is a step toward
deleting the legacy `omnigraph-server::config` module entirely — a CLI
presentation concern has no business in the server crate.
No behavior change. The server keeps private copies in `config.rs` only for the
soon-to-be-deleted legacy `CliDefaults`.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(cli)!: remove the `config migrate` command and migrate.rs
`config migrate` was the last CLI consumer of the legacy `omnigraph.yaml`
(`OmnigraphConfig` + `load_config`). With the excision complete there is no
legacy file to split, so the whole `omnigraph config` command group is removed
along with `migrate.rs`. The `OmnigraphConfig` type, `load_config`, and the
deprecation machinery are deleted next.
- Remove `Command::Config` / `ConfigCommand` from the clap surface and the
dispatch arm; drop `mod migrate;` and the now-unused `load_config` import.
- Drop the `Command::Config` arms in `planes.rs`.
- Delete the `config_migrate_splits_legacy_config` integration test.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(server)!: delete the legacy OmnigraphConfig type and load_config
With `config migrate` gone, nothing loads `omnigraph.yaml` anymore. Delete the
entire `omnigraph-server::config` module: the `OmnigraphConfig` type and its
sub-structs (`ProjectConfig`, `TargetConfig`, `CliDefaults`, `ServerDefaults`,
`AuthDefaults`, `QueryDefaults`, `AliasConfig`, `AliasCommand`, `PolicySettings`,
`QueryEntry`, `McpSettings`), `load_config`, and the RFC-008 deprecation
machinery (`OMNIGRAPH_CONFIG`, `OMNIGRAPH_NO_LEGACY_CONFIG`,
`OMNIGRAPH_SUPPRESS_YAML_DEPRECATION`, the deprecation map + warner).
- `QueryRegistry::load` (the only `OmnigraphConfig`/`QueryEntry` consumer; its
only caller was its own test) is removed — server boot and the CLI both build
registries via `QueryRegistry::from_specs`.
- `graph_resource_id_for_selection` (CLI-only) moves into the CLI
(`helpers.rs`), with its unit test; the server no longer exports it.
- Drop the already-dead `format_registry_load_errors` helper (config-adjacent).
No behavior change — every deleted item was unreachable after the excision.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* docs: purge the legacy omnigraph.yaml surface from the docs
Finish the RFC-011 excision in the docs: the CLI no longer reads omnigraph.yaml
and the server boots cluster-only, so every doc that described the legacy file
as a live config is now wrong.
- AGENTS.md: rewrite the HTTP-server line to cluster-only boot (drop the
single-graph/flat-route and omnigraph.yaml-boot framing); rewrite the CLI
two-surface-config passage (drop `config migrate`, the deprecation env vars,
and "Never extend omnigraph.yaml"); fix the topic table + capability rows.
- cli/reference.md: delete the entire "omnigraph.yaml schema (legacy combined
file)" section and the `config migrate` row; re-home the `policy` row, the
bearer-token chain, the actor/format/param-precedence references, and the
`--config` mentions to the operator config + `--cluster`.
- cli/index.md: rewrite the multi-graph-server + add-graph paragraphs to
cluster (`--cluster` + `cluster apply`); fix the policy examples to
`--cluster`; replace the `## Config` omnigraph.yaml example with the
operator/cluster two-surface model.
- operations/policy.md: rewrite per-graph-vs-server-level policy to the cluster
`policies:`/`applies_to` model; re-home the actor + CLI tooling sections.
- clusters/config.md, clusters/index.md, deployment.md: server boots from the
cluster only; per-operator facts come from ~/.omnigraph/config.yaml.
- architecture.md, testing.md: drop the stale omnigraph.yaml / deleted-test
references.
RFCs, design specs, and prior release notes are left as historical records.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
The server already dropped omnigraph.yaml (cluster-only boot). This removes the
CLI's last use of the legacy `OmnigraphConfig`: graphs are addressed only via
`--store`/`--server`/`--cluster`/`--profile`/operator defaults, and actor,
output format, and bearer credentials come from `~/.omnigraph/config.yaml`.
After this change no CLI command reads `omnigraph.yaml` except `config migrate`.
Resolvers (helpers.rs): drop every legacy fallback —
- `resolve_actor` → `--as` > `operator.actor` (no `cli.actor`);
- `resolve_read_format` → `--json`/`--format` > alias > `defaults.output`;
- `resolve_branch`/`resolve_read_target` → `--branch` > alias > "main";
- `resolve_uri`/`resolve_cli_graph` → scope path only; an absent address is a
loud error;
- `resolve_remote_bearer_token` → operator keyed chain + `OMNIGRAPH_BEARER_TOKEN`.
`GraphClient::resolve`/`resolve_with_policy` drop the `&OmnigraphConfig` param;
direct-store access carries no Cedar policy (policy lives in the cluster/server).
Flags (cli.rs): remove `--config` from every data/query command; it stays only
on `cluster *` (the cluster dir) and `config migrate` (the legacy path).
Re-home control-plane tooling to `--cluster` (RFC-011):
- `policy validate|test|explain` source the Cedar bundle from the cluster's
applied policies; `--graph` picks a graph's bundle; `policy test` takes
`--tests <file>`;
- `queries list|validate` source the registry + schemas from the cluster
serving snapshot; `--graph` scopes to one graph;
- `lint` requires `--schema` (offline) or a direct/cluster graph target;
- `schema plan`/`lint` route their graph-target through the shared direct-scope
resolver so `--store`/`--profile`/`defaults.store` addressing works.
Tests migrate from `omnigraph.yaml` fixtures to `--store`/operator-config/
`--cluster` (converged-cluster fixtures); the now-impossible command-path
RFC-008 tests are deleted (`config migrate` coverage kept). The
`OmnigraphConfig` type, `load_config`/deprecation machinery, and `config
migrate` are removed in a follow-up.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
omnigraph-server boots only from --cluster; all HTTP is /graphs/<id>/…; flat single-graph routes and the omnigraph.yaml server boot are removed. GraphRouting/ServerConfigMode collapse to multi-only; openapi.json regenerated to the nested shape; ~100 server route tests migrated; parity/system_local boot from a converged cluster. Gate green (1410 tests).
* test(engine): reproduce empty-table Vector @index aborting schema apply
A Vector (IVF) index trains k-means centroids over the column, so Lance
cannot build it on 0 vectors ("Creating empty vector indices with
train=False is not yet implemented"). schema apply reconciles a table's
whole index set whenever any @index on it changes, so adding an unrelated
scalar @index materializes the dormant empty vector index and aborts the
entire migration (all-or-nothing).
This regression test inits a 0-row Doc with a Vector @index, adds a scalar
@index, and asserts the apply succeeds (then loads one embedded row and
asserts the deferred index materializes). It fails today at the apply step
with the vector-index abort; the fix lands in the next commit.
Refs dev-graph iss-empty-vector-index-schema-apply, iss-848.
* fix(engine): defer Vector @index on an empty table instead of aborting schema apply
build_indices_on_dataset_for_catalog materialized a declared Vector @index
unconditionally. On a 0-row table Lance cannot train the IVF index
("Creating empty vector indices with train=False is not yet implemented"),
so any later migration that touches the table (e.g. adding an unrelated
scalar @index, which reconciles the table's whole index set) aborted the
entire migration on the dormant vector index — all-or-nothing.
Guard the vector arm with a row-count check, matching the guard
ensure_indices_for_branch and the branch-merge rebuild already use: an
untrainable column becomes a pending index that a later ensure_indices /
optimize materializes once the table has rows. Reads stay correct meanwhile
(vector search degrades to a brute-force scan).
Stop-gap: the residual rows-present-but-vectors-null window and the full
decoupling (intent recorded at apply, an idempotent coverage reconciler)
are dev-graph iss-848. Turns the green half of the regression test added in
the previous commit.
Refs dev-graph iss-empty-vector-index-schema-apply, iss-848, iss-687.
* docs(invariants): record the logical-contract-over-physical-state principle
The bug class behind the empty-table vector-index abort (and the schema-apply
vs optimize version drift) is one shape: a physical operation allowed to fail
a logical one. Several hard invariants (2, 5, 7, 13) and deny-list items are
already instances of this, but the unifying rule was never written down.
Add it to docs/dev/invariants.md as a "Governing principle" section above the
hard invariants, naming which invariants and deny-list items instantiate it
and the smell to watch for (a logical operation gated on a physical fact).
Add a one-line always-on rule (7) in AGENTS.md so it stays in working memory,
with the qualifier that genuine logical conflicts still fail loudly — the
licence to lag covers physical convergence, not correctness.
Audience-neutral: no private ticket refs. check-agents-md.sh passes.
* test(engine): index build must tolerate rows with null vectors (load-before-embed)
Loading rows whose vector column is null into a `Vector @index` table fails
today: build_indices (reached via the loader's prepare_updates_for_commit)
calls create_vector_index, and Lance's IVF KMeans errors "cannot train 1
centroids with 0 vectors". The same abort hits ensure_indices/optimize/schema
apply/merge, since they all funnel through build_indices_on_dataset_for_catalog.
This test loads two null-embedding rows and calls ensure_indices; it must not
abort (the untrainable vector column is deferred, sibling indexes still build).
Fails today at the load step; fixed in the next commit.
Refs dev-graph iss-848, iss-empty-vector-index-schema-apply.
* fix(engine): defer unbuildable index columns instead of aborting the write path
build_indices_on_dataset_for_catalog is the chokepoint every write path funnels
through (load/mutate via prepare_updates_for_commit, schema apply, ensure_indices,
optimize, branch merge). Its vector arm called create_vector_index
unconditionally, so a column with no trainable vectors yet — an empty table, or
rows loaded before `embed` populates them — aborted the whole operation with
Lance's IVF KMeans error.
Fault-isolate the vector build: on failure, record the column as a PendingIndex
(table, column, reason), log it, and continue building the sibling indexes; a
later ensure_indices/optimize materializes it once the column is trainable, and
reads use brute-force meanwhile. Manifest/CAS/IO errors at the publish boundary
still propagate. Isolating at the single chokepoint realizes the governing
principle (physical index state never fails a logical operation) for every write
path, and supersedes the earlier symptomatic count_rows==0 stop-gap (removed) —
closing the residual rows-present-but-vectors-null window it left open.
Surfacing pending index status rather than failing is the database norm
(Postgres indisvalid, LanceDB list_indices). ensure_indices and the build_indices
wrappers now return Vec<PendingIndex>; optimize surfaces it in a later commit.
Refs dev-graph iss-848, iss-951 (vector index stays inline-commit until lance#6666).
* test(engine): index-only schema apply must not touch table data
Adding an @index to an existing column should be a pure metadata change once
index materialization moves to the reconciler (iss-848): the apply records the
intent in the catalog/IR but builds nothing inline, so the table's manifest
version is unchanged. Today the indexed_tables block builds the index inline
and bumps the version (4 -> 5). Fixed in the next commit.
Refs dev-graph iss-848.
* fix(engine): schema apply records index intent only; index-only apply is metadata
Schema apply no longer builds indexes inline. The four build_indices calls
(added/renamed/rewritten/index-only tables) are removed; the @index/@key intent
is already persisted in the catalog/IR the apply writes, and the physical index
is materialized off the critical path by ensure_indices/optimize (iss-848).
Concretely:
- AddConstraint (an @index addition — every other added constraint plans as
UnsupportedChange) becomes a pure metadata step alongside the metadata-only
steps: it touches no table data, so the table version is unchanged.
- added/renamed/rewritten tables still write their data; only the trailing
index build is gone. The rewritten table's coverage is restored later by
optimize_indices.
- recovery_pins drops index-only tables (they no longer advance Lance HEAD) and
keeps rewritten tables; their post_commit_pin = expected+1 is now exact (one
rewrite commit), strengthening recovery classification.
- the now-orphaned Omnigraph::build_indices_on_dataset_for_catalog wrapper is
removed.
A migration can no longer abort on an index build, for any index type at any
cardinality. Turns the green half of index_only_constraint_apply_touches_no_table_data.
Refs dev-graph iss-848.
* test(engine): optimize must converge a declared-but-unbuilt index
After iss-848, adding an @index post-data is a metadata-only apply that defers
the physical build, so the column is declared-indexed but unbuilt (reads scan).
`optimize` — the operator's cron reconciler — must materialize it. Today optimize
only maintains coverage of EXISTING indexes (optimize_indices) and never creates
missing ones, so the rank BTREE stays Degraded after optimize. Fixed next commit.
Refs dev-graph iss-848.
* fix(engine): optimize materializes declared-but-unbuilt indexes (the reconciler)
`omnigraph optimize` is the operator's cron reconciler. It already compacts and
folds new fragments into EXISTING indexes (optimize_indices); now it also builds
declared-but-missing indexes, so the indexes schema apply / load defer (iss-848)
converge on the next optimize.
Done inside optimize_one_table (not by composing the all-tables ensure_indices,
which is drift-blind and would re-publish the uncovered HEAD>manifest drift that
optimize deliberately skips): after the per-table drift/blob skips and under the
queue + Optimize sidecar already held, a needs_index_create gate (reusing
needs_index_work_node/edge — "declared index missing AND row_count > 0", so empty
tables stay no-ops) admits index-only work, and Phase B builds the missing index
over the just-compacted layout via the build chokepoint. An untrainable vector
column fault-isolates into the new TableOptimizeStats.pending_indexes (the
list_indices/indisvalid analog operators read), not a failure. committed now
reflects index commits, so the existing post-publish cache invalidation covers
them. LanceDB's optimize only maintains existing indexes; creating
declared-but-missing ones is the L2 behavior omnigraph's declarative @index needs.
Turns the green half of optimize_materializes_index_declared_but_unbuilt.
Refs dev-graph iss-848.
* docs: index materialization is deferred to the reconciler (iss-848)
Update the index-lifecycle docs to reflect the new contract: @index/@key
declares intent and the physical index is derived state that never fails a
logical operation. Schema apply builds nothing (records intent only);
load/mutate build inline through one chokepoint that defers an untrainable
Vector column as pending; optimize/ensure_indices is the reconciler that
creates declared-but-missing indexes and maintains coverage, reporting
still-pending columns.
Touches: dev/invariants.md (truth-matrix Index-lifecycle row), AGENTS.md
(capability matrix), user/search/indexes.md (L2 orchestration), user/operations/
maintenance.md (optimize reconciler bullet), dev/testing.md (new tests).
* test(server): schema_apply_route_can_add_index reflects deferred index build
iss-848 made schema apply record @index intent without building the physical
index inline. The route test asserted the index count increased after apply;
on an empty graph it now stays unchanged (the build is deferred to
ensure_indices/optimize). Assert the new contract: apply succeeds and the
physical index count is unchanged.
* fix(engine): precheck vector trainability — don't pin or swallow (PR review)
Two issues Cursor Bugbot caught in the chokepoint fault-isolation:
1. (HIGH) Pending vector pins roll back siblings. needs_index_work_node counted
a missing vector index as work whenever the table had rows, so a column with
no trainable vectors got pinned in the EnsureIndices recovery sidecar — but
the build deferred it (zero commit). On a crash before manifest publish the
classifier sees NoMovement and the all-or-nothing decision (recovery.rs
decide()) rolls back the WHOLE sidecar, undoing a sibling table's committed
index work.
2. (MED) Vector build swallowed fatal errors. The match arm converted every
create_vector_index error into a deferred PendingIndex, hiding genuine
I/O/manifest/Lance failures as "pending".
Fix both with one trainability precheck (vector_column_trainable: >=1 non-null
vector, the ivf_flat(1) minimum) used identically by needs_index_work_node and
the build arm: an untrainable column is never counted as work (so never pinned —
no zero-commit pin) and never attempted (so it can't fail); only a trainable
column is built, and then any error PROPAGATES (stays fatal). The deferred
column is still recorded as a PendingIndex with a clear reason.
Refs dev-graph iss-848.
* feat(cli): surface pending index column + reason in optimize output (PR review)
Codex (P2): pending_indexes was documented as visible in `optimize --json` but
the CLI projection never emitted it — operators would lose the only signal that
optimize has deferred index work. Greptile (P2): the stat dropped the reason, so
operators saw which column was stuck, not why.
Carry the reason: TableOptimizeStats.pending_indexes is now Vec<PendingIndex>
(column + reason), and `omnigraph optimize --json` emits {column, reason} per
pending index; human output prints a "↳ index pending on '<col>': <reason>" line.
Refs dev-graph iss-848.
* test: align CLI index-add test with deferred build; cover post-rename reconcile
- schema_apply_json_adds_index_for_existing_property (cli_schema_config.rs): the
CLI analog of the server test — asserted the index count grew after apply;
under iss-848 the apply defers the build, so the count is unchanged on an
empty graph. Assert the deferred contract. (The only full-suite failure.)
- optimize_materializes_index_after_type_rename (maintenance.rs, new): covers
the gap Greptile flagged — a RenameType writes the renamed table with rows but
no indexes (inline build removed in Commit B); assert the rank index is
Degraded post-rename and Indexed after optimize reconciles it.
Refs dev-graph iss-848.
* test(engine): in-source apply tests reflect deferred index materialization
The two db::omnigraph in-source unit tests asserted the old "schema apply builds
/ preserves indexes inline" behavior (the only remaining full-suite failures):
- test_apply_schema_defers_index_then_reconciler_builds_it (was
test_apply_schema_adds_index_for_existing_property): apply records the @index
intent but builds nothing; assert the BTREE on `age` is absent after apply and
present after ensure_indices. (Uses `age`, unindexed in TEST_SCHEMA — `name
@key` is already FTS-indexed at seed.)
- test_apply_schema_rewrite_defers_index_then_reconciler_restores (was
test_apply_schema_rewrite_preserves_existing_indices): an AddProperty rewrite
no longer rebuilds indexes inline; assert ensure_indices restores id BTREE +
name FTS after the rewrite.
Verified by grep that these + the server/CLI tests are the complete set of
"apply builds an index" assertions; all other index-presence tests run after
load/ensure_indices/primitives, which still build.
Refs dev-graph iss-848.
* fix(engine): optimize always reports pending indexes, not only on create-work (PR review)
Cursor Bugbot (MED): pending_indexes was filled only when needs_index_create was
true, but the vector trainability precheck makes needs_index_work_node exclude an
untrainable Vector column. So a table whose sole missing index is untrainable, but
which optimize still compacts or reindexes, returned an empty pending_indexes —
contradicting the documented operator contract for deferred columns.
Run the (idempotent) build chokepoint unconditionally once past the no-op gate,
rather than gating it on needs_index_create. It skips existing indexes, builds
any buildable missing one, and reports an untrainable column as pending whether
the table entered for compaction, reindex, or index creation. needs_index_create
still gates the no-op decision (so an index-only table still enters the path).
Refs dev-graph iss-848.
* test(engine): reframe staged-BTREE-failure failpoint onto the reconciler path
ensure_indices_stage_btree_failure_leaves_existing_tables_writable fired
`ensure_indices.post_stage_pre_commit_btree` and expected `apply_schema` (adding
a type) to fail mid-BTREE-build. iss-848 removed apply's inline index build, so
that apply now succeeds and the test's unwrap_err panicked — it exercised a
removed code path.
Reframe onto where BTREE builds happen now: seed Person, add an `@index` on
`age` (apply records intent, defers the build), then `ensure_indices` builds the
deferred BTREE and the failpoint fires between stage and commit. Person's HEAD
is unchanged (no drift) and its EnsureIndices sidecar pins NoMovement; a write to
a different, unpinned table (Company) is unaffected (mutations/loads heal
roll-forward and proceed, unlike optimize/repair which refuse on a pending
sidecar). Preserves the original coverage (staged-index stage failure leaves
other tables writable, no drift) in the new architecture.
Refs dev-graph iss-848.
* feat(server): converge deferred indexes promptly after schema apply (iss-848)
Schema apply records @index intent but defers the physical build. On a
long-lived server, spawn a detached best-effort ensure_indices after a
successful apply so the indexes converge promptly instead of waiting for the
operator's next optimize. Fire-and-forget: it never blocks or fails the apply
response, and a failure is logged (the index still converges on the next
optimize). Guarded on result.applied. The CLI is one-shot, so it has no
equivalent; its convergence path is the optimize cadence.
handle.engine is already an Arc, so the spawn takes an owned clone. Convergence
itself is covered by the engine ensure_indices/optimize tests; the existing
empty-graph schema-apply route tests confirm the response is unaffected (the
spawn is a read-only no-op on an empty table).
Refs dev-graph iss-848.
* docs(maintenance): list pending_indexes in optimize per-table stats (consistency)
Operator config gains defaults.store (a file:///s3:// graph storage URI), the local-dev counterpart of defaults.server + default_graph. Mutually exclusive with defaults.server, and a store cannot carry default_graph (both refused at load). The zero-flag local default that survives the upcoming removal of omnigraph.yaml's cli.graph. Additive, non-breaking.
omnigraph query <name> / mutate <name> invoke a stored query by name from the served catalog (served-only). The verb asserts kind via a new expect_mutation on POST /queries/{name} (400 on mismatch). -e/--query + --store is the ad-hoc lane; the positional selects within the source (replacing --name). The bare positional graph URI, --uri, and --name are removed from query/mutate.
When a server/cluster scope resolves with no --graph and no default_graph, the CLI auto-uses a sole graph (cluster) or errors listing the candidate graph ids (cluster catalog; multi-graph server via best-effort GET /graphs), never a silent pick. GraphClient::resolve becomes async; flat/single-graph servers and happy paths are unaffected.
Operator aliases move from the --alias flag on query/mutate to a dedicated 'omnigraph alias <name> [args]' subcommand, so an alias can never shadow or be shadowed by a built-in verb. Unknown name errors listing defined aliases. Removes the legacy alias machinery from query/mutate (net -156 lines); legacy omnigraph.yaml aliases lose their CLI entry point.
RFC-011 Decision 9. Writes echo their resolved target + access path to stderr (suppress with --quiet). Destructive writes (cleanup, overwrite load, branch delete) against a non-local scope require consent: --yes, a TTY prompt, or a hard refusal for non-TTY/--json runs. Local file:// writes unaffected.
RFC-011: --graph is the single graph selector across server and cluster scopes; --cluster becomes a global scope primitive; --cluster-graph removed. Maintenance dispatch unified through resolve_scope. Wrong-address guard validates each scope flag against the verb it can consume.
* feat(cli): --server accepts a literal URL (RFC-011 Decision 2)
`resolve_server_flag` now treats a `--server` value containing `://` as a literal
base URL (trailing slash trimmed; `--graph` appends `/graphs/<id>`), bypassing the
operator-config `servers:` registry; a bare name still resolves through the
registry. This is the replacement the upcoming `--uri http(s)://` deprecation
points at, and a small ergonomic win on its own (`--server https://host` with no
config entry). Token resolution for a literal-URL server falls to the legacy
OMNIGRAPH_BEARER_TOKEN chain, same as a positional URL today.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test(cli): address the parity-matrix arms with global --store/--server flags
Prep for removing the positional-http→remote dispatch. The parity harness
addressed both arms with a positional graph right after the verb
(`omnigraph <verb> <addr> <args…>`), which only parses for top-level verbs —
for nested subcommands (`schema show`, `branch list`, …) the address landed in
the subcommand slot and BOTH arms failed identically, so the test passed
vacuously (matching exit codes, never comparing output).
Address both arms with the global flags instead — local `--store <graph>`
(embedded), remote `--server <url>` (served) — appended after the verb + args,
valid regardless of nesting. The previously-vacuous nested-verb parity checks
now actually compare embedded vs remote (and pass — parity holds), and the
remote arm no longer relies on the positional-URL dispatch that's about to be
removed.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(cli)!: --as on a served write is a hard error (was a silent no-op)
A served write resolves the actor server-side from the bearer token, so `--as`
could never set identity there — it was silently ignored. It now errors (in the
remote write factory, before any HTTP call), pointing the user at removing `--as`
or writing directly with `--store`. Reads don't carry `--as`, so this is
write-path only. BREAKING for any script that passed `--as` to a remote write
(it was a no-op, so behavior is unchanged except the now-explicit error).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(cli)!: a positional/--uri http(s):// URL no longer dispatches to a server
Remote graphs must be addressed with `--server <url>` (or a named server / a
profile binding one). A positional or `--uri` `http(s)://` URL on a data verb now
errors instead of silently routing to the remote HTTP client — the scheme no
longer carries transport semantics. The discriminator is `via_server`: a remote
URL produced by a server scope is fine; a remote URL from a positional/`--uri`
source is rejected (`reject_positional_remote` in both GraphClient factories).
Storage verbs are unaffected — they already reject remote URIs through
`resolve_local_graph` with the existing "direct (storage-native)" error.
Migrated the gh-host keyed-credential system test to `--server <url>` (the literal
URL still prefix-matches the operator server for token resolution). BREAKING:
scripts addressing a server by a bare URL must switch to `--server <url>`.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(cli)!: remove the --target flag (use --store / --profile / --server)
Removes the legacy named-graph flag and threads its parameter out of the whole
resolver chain. `--target` resolved a graph name through `omnigraph.yaml`'s
`graphs:` map; its replacements (`--store <uri>`, `--profile <name>`,
`--server <name>`) all ship.
- Drops the 22 `target` clap fields + the `--cluster` exclusion that named it.
- Threads `target`/`cli_target` out of `resolve_uri`/`resolve_cli_graph`/
`resolve_local_graph`/`resolve_local_uri`/`resolve_storage_uri`/
`resolve_remote_bearer_token`/`apply_server_flag`/`execute_query_lint`/
`resolve_selected_graph`/`resolve_registry_selection_for_list`/
`execute_queries_{validate,list}`, the two `GraphClient` factories, and
`ScopeFlags`/`ResolvedScope`.
- Keeps the shared `OmnigraphConfig::resolve_target_uri` 3-arg (server boot uses
it); the CLI passes None for the explicit-target arm. The `cli.graph` default
(omnigraph.yaml bare-command fallback) is unchanged — its removal belongs to
the omnigraph.yaml excision.
- Operator/file aliases that bind a `graph` name still work: the name is now
resolved to a URI inline (a positional URI wins).
- Error messages and `--graph`/`--server`/`--store` help text no longer name
`--target`; the queries-list selection hint points at `cli.graph`.
BREAKING. Tests updated (named-target resolution rewritten onto `cli.graph`;
positional-URI tests unchanged). Full omnigraph-cli suite green (228).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* docs(cli): drop --target and positional-http addressing; --as-on-served is an error
Update the user docs for the legacy data-plane addressing removals:
- the CLI `--target` flag is gone — address graphs with a positional URI,
`--store`, `--profile`, or `--server <name|url>`;
- a positional `http(s)://` URI no longer dispatches to a server (use `--server`);
- `--as` on a served write is now rejected (was a silent no-op).
Touches cli/reference.md (addressing intro, capability table, error examples,
scopes), cli/index.md (the remote-read example → --server), operations/maintenance
+ policy, and the cluster docs' data-plane load guidance. The server's own
`--target` boot flag is unchanged (server.md untouched). Also fixes a pre-existing
broken maintenance link in search/indexes.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(cli): --store is loudly exclusive with a positional URI / --server; test graphs→Served
Address two Greptile findings on the RFC-011 slices:
- Slice A (P1): `--store` combined with a positional URI silently dropped the URI
(`scope.rs` did `store.or(uri)`); `--store` + `--server` errored with a
misleading "positional URI" message. Now both combinations fail loudly with a
declared `--store is exclusive with a positional URI and --server` error.
- Slice B (P2): the `command_capability` unit test never exercised the one
Data→Served refinement (`graphs`); added the assertion so deleting that guard
can't pass silently.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
* feat(cli): RFC-011 Slice B — capability vocabulary (any/served/direct/control/local)
User-facing CLI errors and --help now speak a single "capability" vocabulary —
what a command needs — instead of the internal four-plane jargon. Behavior is
unchanged: the --server/--graph allow set is identical (the served-graph
capabilities `any` ∪ `served` = the old `Data` plane, since `graphs` was already
allowed). Only error text and the --help legend change.
- planes.rs: add `Capability { Any, Served, Direct, Control, Local }` derived from
the existing exhaustive `command_plane` classifier (which stays as the drift
guard) plus the one Data→Served refinement (`graphs`). `guard_addressing` now
allows `--server`/`--graph` on `{Any, Served}` and rejects elsewhere with a
capability-worded message. The mapping reflects *current* behavior (`queries
list` → Local, `queries validate` → Direct); it converges to the RFC end-state
table when later slices re-route those verbs.
- scope.rs: `resolve_scope` takes `Capability` instead of `Plane`, so the whole
addressing path speaks one vocabulary; call sites in client.rs (Any) and the 3
maintenance verbs in main.rs (Direct) updated.
- helpers.rs: the storage-direct remote rejection reworded to "direct
(storage-native) command".
- cli.rs: the --help legend is now "COMMANDS BY CAPABILITY".
- Tests: the 5 assertions pinning the old plane text updated; added planes.rs unit
tests proving the allow set is exactly {Any, Served} (behavior-preservation),
the per-verb mapping, and distinct capability phrases.
Full omnigraph-cli suite: 225 green (222 + 3 new), zero behavior-test changes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* docs(cli): capability vocabulary in the CLI reference + maintenance addressing
Rename the reference's "Command planes" section to "Command capabilities"
(any/served/direct/control/local), reword the error examples, and update the
maintenance doc's addressing note + its section cross-link to match Slice B.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
* feat(cli): RFC-011 Slice A — operator-config scope structs (profiles/clusters/defaults)
Additive operator-config surface for the RFC-011 scope model. No behavior
change yet — these structs are parsed but not consumed until the scope
resolver lands.
- OperatorConfig gains `profiles:` (name → OperatorProfile) and `clusters:`
(name → OperatorCluster { root }) — the latter the only place a storage
root appears in operator config (RFC-011 storage-root rule).
- OperatorDefaults gains `server` and `default_graph` (the flat-default scope).
- OperatorProfile binds one of {server, cluster, store} + default_graph;
`binding()` validates exactly-one on use and returns a ScopeBinding.
- Accessors profile()/cluster_root()/default_server()/default_graph();
unknown-key warnings extended to the new blocks (forward-compat preserved —
old configs still load, new keys are no longer "unknown").
Tests: parse profiles/clusters/scope-defaults, binding rejects zero/multiple
entities, unknown keys in a profile warn.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(cli): RFC-011 Slice A — scope resolver + --profile/--store, wired (additive)
Translate the new scope inputs into the existing addressing tuple, in front of
the unchanged resolvers. Purely additive: an explicit address
(--uri/--target/--server/--store) passes straight through, so every existing
invocation is byte-for-byte unchanged.
- scope.rs: resolve_scope() with the RFC-011 precedence (explicit > --profile /
OMNIGRAPH_PROFILE > flat defaults.server), producing the effective
(server, graph, uri, target) for data verbs and (cluster, cluster_graph) for
maintenance. Plane×scope capability check (server scope rejected on a
maintenance verb; cluster scope rejected on a data verb; store rejects --graph)
fires only on the new paths. 9 unit tests.
- cli.rs: global --profile <NAME> and --store <URI>. (--graph keeps
requires=server for now; profile/default graph comes from default_graph —
profile+--graph override is deferred to the --cluster-graph rework.)
- client.rs: the two GraphClient factories call resolve_scope (Plane::Data) up
front; the explicit branch reproduces today's behavior exactly.
- main.rs: the 15 data call sites forward --profile/--store; the 3 maintenance
verbs consult the scope (Plane::Storage) only when no explicit per-command
address is given, so cluster-binding profiles and --store reach
optimize/repair/cleanup.
Verified: the full omnigraph-cli suite (221 tests) stays green untouched.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test+docs(cli): RFC-011 Slice A — end-to-end scope test + reference docs
- cli_data.rs: prove --store and a --profile store binding drive a read
identically to the legacy positional URI (the additive-coexistence contract),
end to end against a local graph (no server needed).
- cli/reference.md: document profiles/clusters/defaults.server/default_graph,
the --profile/--store flags, and a "Scopes & profiles" section; note the model
coexists with legacy addressing (nothing removed yet).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
* test(cli): reproduce branch-delete //branches 404 (failing)
Regression test for the `branch delete` 404 over a multi-graph
`--server`/`--graph` target: the composed URL must be `<base>/branches/<name>`
with no empty `//` segment. Fails against the current `remote_branch_url`,
which appends a trailing slash before extending path segments and so emits
`…/graphs/p9-os//branches/tmpbranch`. The next commit fixes it.
left: "http://host/graphs/p9-os//branches/tmpbranch"
right: "http://host/graphs/p9-os/branches/tmpbranch"
* fix(cli): unify remote URL builder, close the //branches 404 class
Correct-by-design fix for the failing test in the previous commit. The bug was
not specific to `branch delete`: URL assembly was scattered across a
string-concat `remote_url`, a url-crate `remote_branch_url`, and several
`format!` interpolations that left dynamic path/query components un-encoded
(commit id in the path, branch in the query string). `branch delete` was the
instance that surfaced because it is the only verb that puts a dynamic value in
the path.
Replace both helpers with one `remote_url(base, segments, query)` that every
remote call routes through. Callers pass structured segments and query pairs,
so trailing-slash normalization (pop_if_empty) and per-segment / per-value
percent-encoding live in one place. A stray `//` or an un-encoded dynamic
component is no longer representable, closing the whole class rather than the
reported instance.
Migrates the previous commit's failing test to the new builder and adds the
single-graph, trailing-slash, slash-in-name, commit-id-path, and query-value
cases (the last two cover the previously latent siblings). All 16 callsites
migrated; `remote_branch_url` removed.
Move the 23 flat docs/user/*.md files into topic subdirectories so the
user guide is organized by area (schema, queries, search, branching, cli,
operations, clusters, concepts, reference) instead of a flat list. This is
a pure structural move — whole files relocated, every cross-doc link
recomputed, no prose rewrites or content splits (those follow in Phase 2).
- 19 `git mv`s (install.md, deployment.md stay top-level); history preserved
(renames detected at 92–100% similarity).
- All intra-doc links, AGENTS.md's topic table (52 pointers), and the
docs/dev + docs/releases back-links recomputed via relpath from each
file's new location.
- docs/user/index.md rewritten as a sectioned nav hub.
- Fixed 5 doc-path references in Rust (comments + two user-facing server
settings error strings) to point at the new locations.
Verified: zero broken .md links across tracked docs; check-agents-md.sh
green (with the untracked scratch docs set aside); touched crates build.
Note: the public site (omnigraph-web) imports docs/ via a flat-only script;
its import-docs.mjs needs a subdir-aware update before the next re-sync.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
* feat(cluster): cluster_root_for_graph_uri detection helper (RFC-010 Slice 3)
Public helper the CLI uses to refuse `init` into a cluster-managed location:
given a graph storage URI of the cluster layout (`<root>/graphs/<id>.omni`),
return the cluster root if `<root>` holds `__cluster/state.json`, else None.
Cheap by construction — a URI that doesn't match the `<root>/graphs/<id>.omni`
shape returns None with zero I/O, so ordinary `init` targets never probe
storage. Works for file:// and s3:// via the storage adapter. Adds two
ClusterStore accessors (`display_root`, `has_state`).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(cli): cluster-managed maintenance addressing + init signpost (RFC-010 Slice 3)
Two cluster-graph-aware CLI behaviors, sharing the cluster-resolution path.
Maintenance addressing. `optimize`/`repair`/`cleanup` gain
`--cluster <dir|s3://…> --cluster-graph <id>`, which resolves the graph's
storage URI from the served cluster snapshot (the same truth a `--cluster`
server boots from — `read_serving_snapshot*`) and opens it embedded. The
operator no longer hand-types `<storage>/graphs/<id>.omni`. A distinct flag is
required because the global `--graph` is `requires = server` and means a remote
multi-graph id. clap enforces both-or-neither and exclusion with the positional
URI / `--target`; an unserved graph errors loudly, pointing at `cluster apply`.
init signpost. `init` refuses a cluster-managed positional path (the
`<root>/graphs/<id>.omni` layout where `<root>` holds `__cluster/state.json`,
detected by `cluster_root_for_graph_uri`) and points at `cluster apply` — graphs
in an established cluster are created with ledger/recovery/approvals, not by
hand. The check is gated on the path shape, so ordinary `init` does no extra I/O
and existing pre-apply cluster-graph inits are unaffected.
planes guard remediation now also mentions `--cluster … --cluster-graph …`
(the two Slice-1 guard-string tests track it). Docs updated (cli-reference
Command planes, maintenance.md, cluster.md §7); the stale "no S3-hosted cluster
directories" limitation is dropped (RFC-006 landed it).
Tests (cli_cluster.rs, reusing the apply-a-cluster fixture): resolve by id,
unknown-id error, `--cluster` requires `--cluster-graph`, init refusal +
signpost, and ordinary init still works.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(cli): resolve cluster graphs from the state ledger, not the serving snapshot
Addresses the Greptile review on #221. `read_serving_snapshot*` does
all-or-nothing serving validation — recovery-sidecar checks plus a digest
verify of every catalog payload (query .gq, policy blobs). Using it to resolve
a maintenance target coupled `optimize`/`repair`/`cleanup` to the readiness of
unrelated resources: a single corrupt policy blob, or a pending recovery sweep,
would block the command before it could touch the graph — worst for `repair`,
the tool you reach for *when the cluster is degraded*.
Add `omnigraph_cluster::resolve_graph_storage_uri(cluster, graph_id)`: read the
state ledger, confirm the graph is in the applied revision, return
`graph_root(id)` — the URI is deterministically derivable, no catalog
validation. The CLI's cluster resolver now calls it.
Test: `optimize --cluster … --cluster-graph …` still resolves after the catalog
payloads (`__cluster/resources/`) are removed — the ledger-only path is not
blocked by degraded/unrelated catalog state.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
* chore(deps): bump clap to 4.6.1
Workspace constraint "4" → "4.6" so the resolver picks up the 4.6 line
(a plain `cargo update` stayed on 4.5.x). clap 4.5.58 → 4.6.1
(clap_builder 4.6.0, clap_derive 4.6.1). Minor bump, no API breakage; the
workspace builds and all CLI suites pass unchanged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(cli): group --help by plane (RFC-010 Slice 2)
Slice 1 declared the planes (the command_plane table + the wrong-plane
guard); this makes them visible in `--help`. clap can't print labeled
heading rows between subcommand groups (verified against the source —
help_heading is args-only, {subcommands} is one flat block), so per the
chosen approach: cluster + legend.
- Reorder the `Command` enum into plane bands (clap lists subcommands in
declaration order): data (query, mutate, load, branch, snapshot, export,
commit, schema, graphs) → storage/local-graph ops (init, optimize,
repair, cleanup, lint, queries) → control (cluster) → session (policy,
embed, login, logout, config, version). No magic display_order numbers —
the source order IS the help order, with band comments for readers. The
band placement matches `command_plane` (lint/queries are storage-plane:
they reject --server), so the help grouping and the guard agree.
- Add an `after_help` legend on `Cli` naming the planes. Written to
describe the planes (not enumerate every command) so it doesn't drift.
Help-polish (post-review): hide the deprecated `ingest` from the list
(still a valid command); trim the long `login` and `--as` descriptions to
one line each so the columns don't blow up.
The behavioral source of truth for planes stays `planes::command_plane`;
this ordering is its cosmetic counterpart.
Test: `help_groups_commands_by_plane` pins the legend phrase + the cluster
ordering (query < optimize < cluster). Doc: a line under cli-reference's
*Command planes* section.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(cli): qualify mixed-plane commands in the --help legend
Addresses the Greptile P2 on #220: the legend placed `schema` entirely in
Data and `queries` entirely in Storage, but per `command_plane` the
subcommands differ — `schema plan` is storage-plane (rejects --server) and
`queries list` is session (no graph). A user reading the legend then running
`schema plan --server` would hit a rejection contradicting it. The Commands
list is one entry per top-level command (necessarily coarse), so the legend
carries the nuance: `schema [plan: storage]` and `queries [list: session]`.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Addresses the Greptile review on #217:
P1 — `lint` reported two different names. `command_label` returns `lint`, but
`execute_query_lint` passed `"query lint"` as the resolver operation string, so
`lint --server` said `lint` while `lint <https>` said `query lint`. Both were
pinned by tests. `query lint` is the *deprecated* alias (argv-rewritten to
`lint`), so the canonical name is `lint`: switch both user-facing strings in
`execute_query_lint` (the storage-plane bail label and the
requires-schema-or-target usage message) to `lint`, and update the two pinned
assertions in `cli_data.rs`.
P2 — user-doc debt (AGENTS.md rule 1: error text is observable behavior).
Document the plane model in `cli-reference.md` (new *Command planes* section:
data vs storage/maintenance vs control, which addressing flags apply, and the
declared wrong-plane / remote-target errors), and add an addressing note to
`maintenance.md` cross-referencing it.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
* feat(cli): declared plane capability surface + wrong-plane guard (RFC-010 Slice 1)
New `planes.rs` is the single source of truth for which plane each subcommand
belongs to (Data / Storage / Control / Session). `command_plane` is an
exhaustive match — adding a `Command` variant is a compile error until its
plane is declared, so the surface cannot silently drift from the command set.
It descends into the nested enums where the plane differs per subcommand
(`schema plan` is storage while `schema show/apply` are data; `queries
validate` opens the graph while `queries list` reads only config).
`guard_addressing` runs once in `main` before dispatch: the data-plane
addressing flags `--server`/`--graph` on any non-data verb now fail with one
declared, pinned error instead of being silently ignored (`optimize --server
prod` previously dropped `--server`). `init`'s message drops the `--target`
half since it takes only a positional URI today.
Test: `cli_schema_config::schema_plan_with_server_flag_errors_wrong_plane`
pins the per-subcommand label, proving the guard descends into the nested enum.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(cli): storage-plane verbs fail loudly on a remote target (RFC-010 Slice 1)
`optimize`/`repair`/`cleanup` switch from `resolve_uri` to `resolve_local_uri`,
so a `--target` (or positional URI) that resolves to a remote server now fails
with a declared storage-plane message instead of whatever `Omnigraph::open`
said about an `http(s)://` URI. The `resolve_local_graph` bail is reworded to
that storage-plane message, so every storage verb already on the local resolver
(`schema plan`, `queries validate`, `lint`) speaks with one voice.
Net: `optimize --target knowledge` resolves to the graph's storage URI and runs
embedded; `optimize --target prod` (remote) fails loudly; `optimize --server`
is caught earlier by the guard. Positional-URI invocations are unchanged.
Tests (pinned strings, per RFC-010's test plan): optimize happy path on a local
graph, `optimize --server` wrong-plane error, `optimize <https>` storage-plane
error; the existing `query_lint_rejects_http_targets_without_schema` assertion
is updated to the new shared message.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
The last two embedded-vs-remote forks move onto the enum, so every such
`if` in the CLI now lives in client.rs — the point of the refactor.
- `export<W: Write>`: the streaming verb 3b deferred (writes to a writer,
chunks the HTTP response body, rather than returning a DTO). Embedded
calls db.export_jsonl_to_writer; Remote streams the chunked body through.
Opens WITHOUT policy (like reads), so it routes via resolve().
- `list_graphs`: remote-only by design (no local enumeration endpoint), so
the Embedded arm keeps the loud "requires a remote multi-graph server"
bail verbatim. Routing it through the enum still buys the shared
resolve() addressing/token preamble the arm hand-rolled.
Retire the now-orphaned execute_export_to_writer /
execute_export_remote_to_writer pair, and sweep two pre-existing dead fns
while in the files: inferred_config_path (helpers.rs) and yaml_string
(output.rs, shadowed by test-local copies).
parity_matrix gains one row, parity_export — the single intended matrix
change in this phase. Export is a JSONL stream, not a single --json doc,
so it compares the two arms' output line-wise (sorted; twin graphs are
byte-copies so rows need no scrubbing). graphs-list gets no row: its
remote-only behavior is a documented exclusion, not an equality case.
Full workspace tests pass; all 12 parity rows green.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Phase 3a put the GraphClient enum in place and collapsed the five uniform
read forks. 3b folds the remaining data-plane forks onto the same enum:
load, ingest, mutate, query, branch create/delete/merge, and schema apply.
The wrinkle 3a deferred was the local policy attachment. Reads and query
open the local engine without a policy; writes open through
open_local_db_with_policy and attribute a resolved actor. So the Embedded
variant grows an optional policy context (graph/actor) filled by a second
factory, resolve_with_policy; resolve() leaves it empty. open_embedded
picks the open path from whether the context is present, preserving both
of today's behaviors exactly. query still uses resolve() (no policy), as
the read path did.
apply_schema takes the catalog-validator closure as impl FnOnce(&Catalog)
— the embedded arm runs it inside apply_schema_as_with_catalog_check, the
remote arm ignores it (the server runs its own check). That non-object-safe
closure is why GraphClient is an enum, not a trait. The stored-query
registry is still built caller-side and only for the local path.
load and ingest stay separate methods: same operation, but load surfaces
the CLI LoadOutput (two distinct per-arm mappings preserved) while ingest
surfaces the wire IngestOutput. The now-fully-dead execute_read/
execute_read_remote and execute_change/execute_change_remote pairs are
retired (legacy_change_request_body stays — client.rs uses it); the export
pair remains for 3c.
The Phase-1 parity matrix is unchanged and green; full workspace tests pass.
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
The embedded-vs-remote split gets one home: a GraphClient enum
(Embedded { uri } | Remote { http, base_url, token }) with a resolve()
factory that absorbs the shared preamble (apply_server_flag -> token ->
URI/remoteness) and a verb method per command. The five uniform read
forks — branch list, commit list, commit show, schema show, snapshot —
collapse from per-command if-graph-is-remote else to one line each
(main.rs: -113/+47). Behavior identical per verb (local reads still open
WITHOUT policy, as today); the Phase-1 parity matrix is the referee and
passes textually unchanged.
Enum, not the RFC trait: only two variants ever, and inherent async
methods avoid async_trait boxing and the apply_schema closure that is not
object-safe (3b) — same one-body-two-impls collapse, less ceremony.
Scope: the uniform reads only. The query verb (policy-open + operator-
alias early-return + param merge) joins the write verbs in 3b;
export/streaming and graphs-list in 3c, where the now-shared
execute_*_remote/execute_* pairs get retired.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The CLI's wire-DTO imports repoint from omnigraph_server::api to
omnigraph-api-types (the server's other exports — queries registry,
config types — still come from omnigraph-server). The local Load arm's
inline LoadOutput hand-construction in main.rs is extracted into
load_output_from_result next to load_output_from_tables in output.rs, so
both '-> LoadOutput' mappings (engine LoadResult for local, wire
IngestOutput for remote) live in one place.
Deviation from the plan, with reason: LoadOutput stays CLI-side rather
than moving into the wire-DTO crate — it is a rendered CLI output type,
not an HTTP wire DTO, and its mapping consumes a CLI clap type
(CliLoadMode). The shared crate stays strictly wire DTOs. Shapes
unchanged: the parity matrix passes textually unchanged.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The referee before any unification moves: every forked verb runs once
against the local graph and once against a spawned server on a twin copy
of the same fixture, with the SAME actor (--as locally; bearer-resolved
remotely) and the SAME Cedar bundle on both arms — like-for-like
enforcement is part of the harness (a tokens-only server is default-deny
by design; comparing that against a bare local arm measures
configuration, not the fork). Declared-volatile fields (ids, wall-clock,
transport locations) scrub to placeholders; everything else must match
exactly, and exit codes must match for shared failures.
Headline result: 11 rows green with an EMPTY divergence ledger — the
arms agree on every verb today. The ledger (KNOWN_DIVERGENCES) exists so
any future divergence is pinned or filed, never silently repaired;
repairs are Phase 3's job, gated by this referee staying green.
One engine observation surfaced and filed (#207): inline execution with
a declared-but-unbound param matches ALL rows on both arms, while the
stored-query invoke path hard-errors — a cross-path asymmetry the matrix
pins as agreeing behavior pending a deliberate fix. Documented
exclusions (graphs list, ingest/load-over-/ingest, storage-plane verbs)
map to RFC-009 Phases 4-5.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
All six crate manifests + their path-dependency constraints, Cargo.lock,
the regenerated openapi.json version metadata, AGENTS.md's surveyed
version, and the v0.7.0 release notes (object-storage clusters,
config-free --cluster serving, the operator config surface, keyed
credentials, operator targeting/aliases, and the omnigraph.yaml
deprecation stages).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Opt-in: with the env set, loading a legacy omnigraph.yaml is a hard
error pointing at config migrate — the regression guard for migrated
teams (a stray legacy file would otherwise silently outrank operator
config during the window) and the rehearsal for stage 5's removal.
Strict refuses the FILE, never its absence: flag-less invocations on
migrated setups are untouched. Inert unless set.
The RFC's stages-1-3-then-4 release gap collapsed honestly: no version
boundary was crossed between them, so all four ship in the same release
(noted in the RFC).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Andrew's call, and the right one by the repo's own lens: a minimal
cluster.yaml is five lines; a generator is a second copy of the schema to
keep in sync forever, emitting a file that is unusable until hand-edited
anyway (graphs: {} cannot apply or serve). Terraform has no config
scaffolder either. New users copy from the cluster quick-start; migrants
get a ready-to-review cluster.yaml from config migrate. RFC-008 stage 3
becomes purely subtractive.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
omnigraph init no longer writes a legacy config into cwd (the source of
the earlier test-pollution bug, and a scaffold for a deprecated file);
the scaffolder is deleted. omnigraph cluster init scaffolds the
replacement: a minimal valid cluster.yaml (version: 1, optional
metadata.name / storage:, a commented graphs example), refusing to
overwrite. The scaffold validates clean via cluster validate in the e2e.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Reads a legacy omnigraph.yaml and produces the three-section split: team
half as a ready-to-review cluster.yaml proposal (graphs with TODO schema
pointers — the legacy file never knew schemas — per-graph queries
directories, policies with applies_to bindings), personal half as an
operator-config merge (actor, output/table defaults — OperatorDefaults
gains the two table keys with their cascade hops — remote graphs with
bearer_token_env become servers entries plus a printed login step, and
legacy aliases split per the RFC: content to the catalog as a manual
step, binding to an operator alias), plus a dropped-keys section with
reasons. Touches nothing without --write; with it, the operator merge is
key-level (existing entries always win; prior file backed up), and
cluster.yaml is emitted only when absent (else cluster.yaml.proposed).
--json emits the report structurally.
The completeness contract is a unit test: every top-level key of the
legacy schema must classify somewhere, or the RFC-008 map has a bug.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Loading a legacy file (flag, env, or cwd-found — never on defaults) emits
one stderr block listing each key actually present with its destination
from RFC-008's migration map — the map applied to YOUR file, not a
generic banner. Once per process; both binaries warn (cluster-mode boots
never reach load_config, silent by construction); suppressible via
OMNIGRAPH_SUPPRESS_YAML_DEPRECATION=1 for CI logs during the window.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Caught on the live smoke: with --alias, the first bare CLI arg lands in
the hidden legacy_uri positional, so an operator alias's positional param
never bound ('parameter not provided' from the server). An operator alias
always knows its target, so the existing normalize_legacy_alias_uri
reclaims the swallowed positional as the first alias arg — same rule the
legacy path already applies.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
aliases: in the operator config bind a personal name to (server, graph,
stored-query NAME, positional arg mapping, fixed param defaults, format)
— zero content, per the ratified bindings-not-content model. Invocation
goes through the server's stored-query endpoint (POST
{base}/graphs/{g}/queries/{name}) with the keyed credential resolving via
the ordinary URL match; param precedence --params > positionals > fixed
defaults; the result renders through the existing format cascade with the
alias's format as its hop. A legacy omnigraph.yaml alias with the same
name wins during the RFC-008 window, with a warning naming both.
E2e (spawned policy-gated server, invoke_query granted via a per-graph
bundle): the alias invokes with name + one positional and nothing else —
server, graph, query, and token all from the operator layer; --server/
--graph explicit targeting; unknown --server lists defined names;
--server exclusive with a positional URI.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Global flags --server (operator-defined server name) and --graph (graph id
on a multi-graph server, requires --server) resolve to the effective
remote URI through one helper and feed the ordinary uri slot — graph
resolution and the PR-2 keyed-token URL match work unchanged; the flag is
sugar for a URI the operator already owns. Exclusive with a positional
URI and --target (loud error, never silent precedence). Unknown names
fail listing the servers that ARE defined.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The operator config gains servers: (name -> url; never a token). A remote
command whose URL prefix-matches an operator server resolves its bearer
token through the keyed chain first — OMNIGRAPH_TOKEN_<NAME> env, then the
[<name>] section of ~/.omnigraph/credentials (created 0600 via temp+rename,
#139 finding 7; group/world-readable files refused loudly) — falling
through to the legacy chain unchanged. URL keying makes §D5 rule 3
structural: a token is only ever sent to the server it is keyed to.
Longest-prefix matching with a path-boundary check (http://h:8080 never
matches http://h:8080-evil). Inserting the keyed hop above the legacy chain
is safe by construction — no existing setup can have servers: defined.
omnigraph login <name> stores/rotates one section (token from --token or
one stdin line — the pipe flow keeps secrets out of shell history);
omnigraph logout removes it, idempotently; logging in before declaring the
server warns instead of failing (the gh model).
Coverage: URL-match/no-substring-trap, credentials round-trip preserving
sibling sections, 0600 write + over-permissive refusal, env-name mapping;
the legacy resolve test is now hermetic against a real ~/.omnigraph and
asserts byte-identical legacy behavior with no servers defined; one
spawned-binary e2e walks the whole lifecycle against an authed server:
refusal -> wrong-token login (stdin) -> rotate (--token) -> authorized read
-> env-beats-file -> non-matching-URL negative -> logout revokes.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
~/.omnigraph/config.yaml joins the resolution chains as the operator
surface: operator.actor becomes the last hop of THE actor chain (--as >
legacy cli.actor during the RFC-008 window > operator.actor > none, one
implementation for direct-engine and cluster commands alike) and
defaults.output joins the read-format cascade below every more-specific
source. Discovery honors $OMNIGRAPH_HOME (tilde-expanded, #139 finding 9);
an absent file is an empty layer; unknown keys WARN and load (a file
written for later slices must not break this CLI); malformed YAML is a
loud error. The module is CLI-only — the server never reads operator
config (invariant 11 by construction).
$OMNIGRAPH_CONFIG becomes a first-class stand-in for --config in
load_config (flag > env > ./omnigraph.yaml), one meaning in both binaries.
The test harness pins hermeticity: spawned binaries get a nonexistent
OMNIGRAPH_HOME by default so no test ever reads the developer's real
operator config. New coverage: loader unit tests, the env-precedence
matrix on load_config_in, and spawned-binary e2es for the actor chain
(operator wins with no flag/legacy key; legacy outranks it; --as wins) and
the format cascade.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Verbatim moves: the clap surface (every command/subcommand/arg struct) to
cli.rs, resolution helpers (config/actor/graph/branch/query, remote HTTP,
env/token, scaffolding) to helpers.rs, human/JSON formatting to output.rs,
the in-source test mod to main_tests.rs via #[path]. main.rs (1,184 lines)
keeps main() and the dispatch match. Visibility bumps only; 22 binary
tests green.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
LocalStateBackend becomes ClusterStore: every stored byte — state ledger,
lock, recovery sidecars, approval artifacts — now flows through the
engine's StorageAdapter, making file:// and s3:// one code path. Behavior
on the file backend is byte-compatible (layout, CAS semantics, diagnostics,
lock release timing) and the entire pre-existing suite passes unchanged.
Mechanics: the ledger CAS keeps its public sha256 vocabulary while the
physical swap is token-conditioned (ETag If-Match on S3 via PR #186's
primitives; content-token + temp/rename locally — the pre-port semantics);
the lock is a create-only put (genuinely cross-machine on object stores)
with deterministic drop-release locally and best-effort spawned release on
S3; sidecars/approvals address by URI (SweepOutcome and the executors carry
strings); sweep row-1 retirement joins the uniform deferred post-CAS
cleanup. ClusterStore also gains the catalog-payload and graph-root
methods that commit 2 wires in.
Async ripple: status/force-unlock/serving-snapshot and the server's
settings loader chain go async (CLI dispatch and ~20 test hosts follow,
mechanically). tokio joins the cluster crate's runtime deps for the lock
guard's handle.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
omnigraph load is now the single data-write command:
- works against remote graphs (POSTs the server's /ingest endpoint with the
same bearer/actor resolution as other remote commands) — previously load
was the only data command forced to open Lance storage directly
- --from <base> opts into fork-if-missing for --branch (the former ingest
semantics); without --from a missing branch is an error, never a fork
- --mode is now required: overwrite is destructive, so there is no implicit
default (the old silent default was overwrite)
- output gains base_branch/branch_created (and table sums on remote loads)
omnigraph ingest stays as a deprecated alias (defaults preserved: --from
main --mode merge) that prints a one-line warning to stderr, matching the
read/change deprecation convention; removal in a later release.
Docs updated in the same change: cli.md, cli-reference.md, policy.md,
audit.md, execution.md (unified load section), AGENTS.md quick-flow,
README.md.
BREAKING CHANGE: scripts running omnigraph load without --mode must now
pass it explicitly (previously defaulted to the destructive overwrite).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Branch creation becomes opt-in by presence of the request's 'from' field.
Previously the handler defaulted from to 'main' and always auto-created a
missing branch — a typo'd branch name silently forked main and landed the
data there, with the client none the wiser. Now a request without 'from'
against a missing branch returns 404 branch-not-found and creates nothing;
with 'from' set, fork-if-missing behaves as before. The BranchCreate
authority is only consulted when a fork will actually happen.
The handler calls the unified load_as directly (the deprecated ingest_as
shim is no longer used in the server). IngestOutput.base_branch becomes
nullable: it echoes the request's 'from' and is null when absent. OpenAPI
regenerated; the CLI's local ingest arm moves to load_file_as + the new
converter shape.
BREAKING CHANGE: clients that relied on implicit fork-from-main with 'from'
omitted must now pass from='main' explicitly. IngestOutput.base_branch is
now nullable.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
load_as/load_file_as gain a base: Option<&str> parameter: with Some(base) a
missing target branch is forked from base first (the former ingest
semantics); with None the target branch must exist — staging fails on an
unknown branch, so a typo'd name can never create one. LoadResult gains
branch/base_branch/branch_created metadata (additive).
The ingest family (ingest, ingest_as, ingest_file, ingest_file_as) becomes
#[deprecated] shims over load_as that preserve the historical contract
exactly (from: None still means fork from main; base recorded even when no
fork happened). IngestResult and to_ingest_tables stay for the shims and
the server until the removal release.
The layered policy check is unchanged: Change on the target branch always,
BranchCreate additionally when a fork actually happens (enforced inside
branch_create_from_as with the actor threaded through).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- resolve_cluster_actor uses load_config directly: load_cli_config also
loads auth.env_file into the process env — a second thing, violating the
documented 'exactly one thing' omnigraph.yaml contract for cluster ops.
- resolve_cli_actor gets its doc comment back (the inserted helper had
absorbed the contiguous /// block).
- The actor-default test imports once as setup and asserts on apply alone,
idempotently, instead of re-importing inside the assertion helper.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
local_cli_s3_end_to_end_init_load_read_flow ran `omnigraph init` without a
current_dir, so init's project scaffold landed in crates/omnigraph-cli/ —
poisoning any later test that resolves a graph target from the cwd config
(query_lint_requires_schema_or_resolvable_graph_target fails determinis-
tically once the file exists). Only manifests when OMNIGRAPH_S3_TEST_BUCKET
is set, which is why local FS runs and CI's scoped rustfs job never caught
it. The init and load calls now run inside the test's tempdir.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A --cluster server process whose cwd contains a MALFORMED omnigraph.yaml
boots and serves — proving mode-inference rule 0 returns before any config
search can run. New spawn_server_with_cluster_in support helper sets the
spawned server's cwd explicitly.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Cluster FACTS stay unlayered (cluster.yaml only), but the operator's
identity is a per-operator fact — exactly the per-operator omnigraph.yaml's
permanent job, and the cascade every data-plane write already uses. cluster
apply/approve now resolve: --as flag wins and skips any config read
entirely (containers and CI stay config-free); without it, the standard cwd
search supplies cli.actor, with a malformed config failing loudly and
actionably ('pass --as to skip this lookup') rather than silently dropping
attribution. approve's no-actor error now names both sources.
Tests pin the contract from both sides: cli.actor is the no-flag default
for apply (echoed actor) and approve (approved_by), the flag overrides it,
a malformed omnigraph.yaml in cwd breaks nothing except the no-flag actor
lookup, and a conflicting well-formed one leaks nothing into cluster
outputs.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>