Commit graph

469 commits

Author SHA1 Message Date
aaltshuler
08ce8dc34d docs(rfc): align RFC-007 with RFC-008's two-surface architecture
RFC-007 now speaks the end-state language throughout: the operator surface
is one half of the two-surface split (cluster config / operator config),
not a layer over a living omnigraph.yaml. The precedence cascade drops the
project layer (cluster config carries no operator-resolvable keys — a
checkout can never supply identity); legacy omnigraph.yaml appears only as
the RFC-008 deprecation-window slot. The trust boundary is restated as
closed-by-construction in the end state, with the rules governing the
window. PR 3 becomes operator targeting (--server + operator aliases — the
replacement RFC-008 needs before legacy aliases migrate), and the schema
example gains the aliases block.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 19:54:34 +03:00
aaltshuler
320311e759 docs(rfc): RFC-008 — deprecate omnigraph.yaml, one concern per config surface
The file is three unrelated concerns wearing one filename — server
deployment config, project/CLI conveniences, operator identity — and the
mixture is the root cause of a recurring problem class (per-operator
copies of project files, checkout-supplied credential redirection, init
scaffold pollution). End state: two single-owner surfaces — cluster
config (team, repo) and operator config (person, $HOME) — plus the
zero-config flags/env tier.

Complete key-by-key migration map over the verified OmnigraphConfig
surface; staged retirement per the repo's Hyrum rules (warn with per-key
guidance -> `config migrate` tool -> stop scaffolding -> opt-in strict ->
removal at the next major). RFC-007's project-layer framing is amended to
transitional accordingly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 19:33:19 +03:00
aaltshuler
d531f60999 docs(rfc): RFC-007 — per-operator config, the operator slice of RFC-002
Terraform-style operator/project split: ~/.omnigraph/config.yaml for
identity (operator.actor in the --as cascade), credentials keyed by
server name (env -> 0600 credentials file; no inline secrets), and
operator-owned named servers that project configs reference but cannot
redefine. Explicitly a staged subset of RFC-002: adopts its settled
decisions (one dir, keyed credentials, env precedence), defers
GraphLocator/use/state-layer, and encodes the ten confirmed PR #139
findings as design rules (compat shims, key-level merges, atomic writes,
the project-layer trust boundary).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 18:29:55 +03:00
Andrew Altshuler
29dd827208
Merge pull request #194 from ModernRelay/feat/cluster-bucket-serving
feat(cluster,server): bucket-backed serving — config-free --cluster s3:// + RustFS e2e (RFC-006 PR 3/3)
2026-06-11 17:03:14 +03:00
aaltshuler
8d7aed065f test(cluster,server): gated object-storage cluster e2e + CI wiring + docs
s3_cluster.rs runs the full control-plane lifecycle against a real
bucket (CI: containerized RustFS; locally the RustFS binary): import →
lock released (pins the drop-time release regression caught on the first
live smoke) → apply (graph roots + catalog on the bucket, nothing local)
→ serving snapshots from both the config dir and the bare URI → schema
evolution → approved delete (prefix removal) → empty-cluster refusal.
The server suite gains the config-free boot test: --cluster s3://… with
zero local files serves a stored query over HTTP.

CI: the rustfs job runs both suites; the classify filter covers the
cluster store/serve modules and the new test files. The server smoke
drops its name filter — every test in the s3 target is bucket-gated, and
a filter matching nothing passes vacuously (which silently ran zero
tests for a while).

Docs: deployment.md gains the Bucket-no-volume shape as the preferred
cloud deployment; cluster.md/server.md document --cluster <uri>;
testing.md maps the new suite.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 15:56:40 +03:00
aaltshuler
58855c0a7c feat(cluster,server): inline policy content + config-free --cluster URI boot
Two serving changes that complete RFC-006's read side:

ServingPolicy carries the policy bundle CONTENT (digest-verified at
snapshot read) instead of a blob path — the catalog may live on object
storage, and the server must not re-read mutable state after the
snapshot. The server grows a PolicySource enum: File for omnigraph.yaml
deployments (unchanged), Inline for cluster boots, wired through
PolicyEngine::load_{graph,server}_from_source.

read_serving_snapshot_from_storage(uri) reads the applied revision
straight from a storage root, and --cluster accepts a scheme-qualified
URI (s3://bucket/prefix): config-free serving — a serving box needs only
the URI and credentials; the ledger and catalog on the bucket ARE the
deployment artifact. Bare paths keep the config-directory behavior.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 15:56:22 +03:00
Andrew Altshuler
7af3697397
Merge pull request #193 from ModernRelay/refactor/cli-modularize
refactor(cli): modularize main.rs and the test monolith — pure code movement
2026-06-11 15:37:28 +03:00
Andrew Altshuler
c116a12fc9
Merge pull request #192 from ModernRelay/refactor/server-modularize
refactor(server): modularize the test monolith and lib.rs — pure code movement
2026-06-11 15:37:23 +03:00
aaltshuler
4a3f8e3a96 ci: point the RustFS server smoke at the renamed s3 test target
The test-split renamed tests/server.rs away; the job now targets --test
s3. Also fixes a stale name filter (s3_repo vs the actual s3_graph test):
a substring filter matching nothing passes vacuously, so this step had
been running zero tests.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 15:21:44 +03:00
aaltshuler
d5e75df272 refactor(cli): split the test monolith into command-area suites
tests/cli.rs (4,548 lines, 112 tests) becomes five area files —
cli_cluster (24), cli_cluster_e2e (10, the spawned-binary lifecycle
compositions), cli_data (49), cli_schema_config (16), cli_queries (13) —
with the file-local helpers joining the existing tests/support harness.
Verbatim moves + visibility bumps; 161 crate tests green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 15:16:51 +03:00
aaltshuler
916015c416 refactor(cli): split main.rs into cli/helpers/output modules
Verbatim moves: the clap surface (every command/subcommand/arg struct) to
cli.rs, resolution helpers (config/actor/graph/branch/query, remote HTTP,
env/token, scaffolding) to helpers.rs, human/JSON formatting to output.rs,
the in-source test mod to main_tests.rs via #[path]. main.rs (1,184 lines)
keeps main() and the dispatch match. Visibility bumps only; 22 binary
tests green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 15:14:27 +03:00
aaltshuler
127440d873 refactor(server): split lib.rs into handlers and settings modules
Verbatim moves: route handlers + bearer-auth middleware + per-request
authorization + the cluster-prefix OpenAPI rewrite go to handlers.rs;
settings resolution (omnigraph.yaml/CLI/env, mode inference, bearer-token
sources, runtime-state classification) and its in-source test mod go to
settings.rs. lib.rs (1,158 lines) keeps the public types, app/router
assembly, and serve(). The ApiDoc derive references handlers::-qualified
paths; the one multi-line utoipa attribute the cut orphaned was relocated
with its handler. 289 crate tests green, OpenAPI drift check included.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 15:08:25 +03:00
aaltshuler
b036073ec6 refactor(server): split the test monolith into area suites
tests/server.rs (6,517 lines, 110 tests) becomes seven area files —
auth_policy, data_routes, schema_routes, stored_queries, multi_graph,
boot_settings, s3 — with shared helpers in tests/support/mod.rs. Verbatim
moves + visibility bumps (pub on helpers, pub(super)->pub inside the
matrix harness); cargo fix stripped the per-file unused imports. All 110
tests pass in their new homes (289 across the crate including lib and
openapi).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 15:03:51 +03:00
Andrew Altshuler
4e526b3e5a
Merge pull request #190 from ModernRelay/feat/cluster-storage-root-v2
feat(cluster): storage root — ledger, catalog, and graphs on the StorageAdapter (RFC-006 PR 2/3)
2026-06-11 14:54:10 +03:00
aaltshuler
f6ae3e4fa3 fix(cluster): lock release must complete before a CLI process exits
Caught by the first live s3 smoke: StateLockGuard's spawned async delete
dies with the runtime when a short-lived CLI process exits right after the
command — import's lock survived into the next command as state_lock_held.
On the multi-thread runtime (the CLI, and the gated s3 tests)
block_in_place waits for the delete to complete; current-thread runtimes
keep the spawn fallback with force-unlock as the documented recovery, same
as a crash.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 14:33:26 +03:00
aaltshuler
8dc2f15255 feat(cluster): the storage: root — state, catalog, and graph roots relocatable
cluster.yaml gains an optional storage: URI deciding where everything the
cluster STORES lives: the state ledger, lock, content-addressed catalog,
recovery sidecars, approval artifacts, and the derived graph roots
(<storage>/graphs/<id>.omni). Absent, it defaults to the config directory
itself — the original layout, byte-compatible, so pre-existing clusters and
the whole test suite are untouched. Declared configuration always stays in
the working tree (Terraform's config-local/state-remote split); credentials
are env-only, never in cluster.yaml.

Every command resolves its store from the declared root (a bad root is a
loud invalid_storage_root). Graph-root derivation, the delete executor
(prefix delete via the adapter), the sweep's existence probes, the catalog
payload write/verify/read paths, and the serving snapshot all flow through
ClusterStore — the last raw-fs holdouts for stored state are gone, and the
deny-list gains the rule that keeps it that way.

Tests: default-layout byte-compat, a file:// root relocating the entire
cluster (ledger+catalog+graphs under the new root, nothing under the config
dir, serving snapshot follows), invalid-root validation. 98 in-crate + 9
failpoints + full workspace gate green. The s3:// flavor lands with PR 3's
gated RustFS e2e.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 14:28:04 +03:00
aaltshuler
fd002abaa5 feat(cluster): port the storage backend to the engine StorageAdapter
LocalStateBackend becomes ClusterStore: every stored byte — state ledger,
lock, recovery sidecars, approval artifacts — now flows through the
engine's StorageAdapter, making file:// and s3:// one code path. Behavior
on the file backend is byte-compatible (layout, CAS semantics, diagnostics,
lock release timing) and the entire pre-existing suite passes unchanged.

Mechanics: the ledger CAS keeps its public sha256 vocabulary while the
physical swap is token-conditioned (ETag If-Match on S3 via PR #186's
primitives; content-token + temp/rename locally — the pre-port semantics);
the lock is a create-only put (genuinely cross-machine on object stores)
with deterministic drop-release locally and best-effort spawned release on
S3; sidecars/approvals address by URI (SweepOutcome and the executors carry
strings); sweep row-1 retirement joins the uniform deferred post-CAS
cleanup. ClusterStore also gains the catalog-payload and graph-root
methods that commit 2 wires in.

Async ripple: status/force-unlock/serving-snapshot and the server's
settings loader chain go async (CLI dispatch and ~20 test hosts follow,
mechanically). tokio joins the cluster crate's runtime deps for the lock
guard's handle.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 14:11:14 +03:00
Andrew Altshuler
2f58fc47fa
Merge pull request #188 from ModernRelay/refactor/cluster-modularize
refactor(cluster): modularize lib.rs — pure code movement
2026-06-11 12:07:32 +03:00
Andrew Altshuler
e19c095e8c
Merge pull request #189 from ModernRelay/ci/test-workspace-timeout
ci: raise Test Workspace timeout to 75 minutes
2026-06-11 11:39:13 +03:00
aaltshuler
7f32e6f1bc ci: raise Test Workspace timeout to 75 minutes
A cold rust-cache (every Cargo.lock change) means a full workspace +
failpoints-feature build on the 2-core runner, which now exceeds 45
minutes on slow runner days — and because a timed-out run never saves its
cache, an undersized budget self-perpetuates: every retry starts cold and
dies identically (observed four consecutive 45-minute cancellations on
main and PR #188 after #186's lock bump). Warm-cache runs stay ~15
minutes; 75 is headroom matching the rustfs job's budget, not a target.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:38:54 +03:00
aaltshuler
db6fe03be1 refactor(cluster): move type definitions to types.rs
Verbatim move of the public output/diagnostic types and the internal
state/sidecar/approval models; previously-private types and their fields
get pub(crate) (they were crate-visible by position before). lib.rs is now
the command pipeline + public API. 95 tests green; full workspace gate
green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 05:42:02 +03:00
aaltshuler
dc0a1fc5a5 refactor(cluster): move declared-config loading to config.rs
Verbatim move of cluster.yaml parsing, query discovery, source digesting,
header/id validation, path resolution, and live-graph observation. Two
helpers that the cut swept along were relocated to their right homes
(state-status helpers back to lib.rs, lock-file helpers to store.rs). 95
tests green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 05:37:20 +03:00
aaltshuler
dd17c0c50f refactor(cluster): move diffing and classification to diff.rs
Verbatim move of diff_resources, binding-change diffing, blast radius,
approval gating, ResourceKind, classify_changes, and demotion. 95 tests
green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 05:33:13 +03:00
aaltshuler
9c3e09e838 refactor(cluster): move the recovery sweep to sweep.rs
Verbatim move of the sidecar classification (all RFC-004 D3 rows),
tombstoning, and approval-consumption helpers. 95 tests green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 05:30:55 +03:00
aaltshuler
00fc5cf537 refactor(cluster): move the serving snapshot to serve.rs
Verbatim move of the Serving* types, read_serving_snapshot, and
read_verified_payload; public re-exports preserved (the server's imports
are unchanged). 95 tests green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 05:29:44 +03:00
aaltshuler
5a8047e5d0 refactor(cluster): move the storage backend to store.rs
Verbatim move of LocalStateBackend, StateSnapshot, StateLockGuard and their
impls — the single home for stored-state I/O (state ledger, lock, recovery
sidecars, approval artifacts), where the RFC-006 object-storage port lands
next as a focused diff. Visibility bumps (pub(crate)) only; 95 tests green
before and after.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 05:28:04 +03:00
Andrew Altshuler
1b1583d897
Merge pull request #186 from ModernRelay/feat/object-store-primitives
feat(storage,policy): object-store primitives for the cluster port (RFC-006 PR 1/3)
2026-06-11 05:26:55 +03:00
aaltshuler
fbb86dee0e refactor(cluster): move the in-source test suite to tests.rs
Verbatim move (indentation preserved — embedded raw-string fixtures are
content). lib.rs drops from 7,857 to ~4,750 lines; `use super::*` resolves
to the crate root through the #[path] module declaration unchanged. 95
tests green before and after.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 05:25:53 +03:00
aaltshuler
d702fd106a feat(policy): from-source twins for the policy loaders
PolicyConfig::from_source + PolicyEngine::load_graph_from_source /
load_server_from_source — the path-based loaders delegate to them. Needed by
callers whose policy bundles don't live on the local filesystem (the cluster
catalog on object storage); kind-alignment validation stays loud through the
new path.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 05:09:45 +03:00
aaltshuler
f48e69b999 feat(storage): versioned CAS, conditional replace, and prefix delete on StorageAdapter
Three primitives the cluster's object-storage port (RFC-006) needs, on the
engine's existing adapter rather than a parallel store:

- read_text_versioned: content + an opaque backend version token (S3: the
  ETag from GET; local: content sha256 — ETags don't exist on a filesystem).
- write_text_if_match: replace only when the token still matches. S3 maps to
  a conditional put (PutMode::Update / If-Match) — verified against RustFS
  beta.8 through the real object_store 0.12.5 path, no extra builder config
  needed; local compares content then swaps via temp+rename, the same
  single-machine semantics callers had before this trait (safe under their
  own lock protocol, not a cross-process barrier by itself). CAS-lost is
  Ok(None), never silent.
- delete_prefix: recursive + idempotent (local remove_dir_all; S3 list +
  delete, with the non-atomicity documented for crash-retry callers).

Gated S3 coverage: s3_adapter_conditional_writes_contract pins the
conditional-write behavior the cluster ledger will depend on (red if a
backend bump regresses it), and s3_schema_apply_migrates_live_graph closes
the previously-untested schema-apply-on-S3 path before the cluster's schema
executor leans on it. Engine gains the sha2 workspace dep.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 05:09:45 +03:00
Andrew Altshuler
328bfef6fb
Merge pull request #184 from ModernRelay/refactor/load-ingest-unification
refactor: unify load/ingest — load survives, ingest deprecated
2026-06-11 04:43:36 +03:00
aaltshuler
fa6af775c1 feat(cli)!: unified load command; deprecate ingest as an alias
omnigraph load is now the single data-write command:
- works against remote graphs (POSTs the server's /ingest endpoint with the
  same bearer/actor resolution as other remote commands) — previously load
  was the only data command forced to open Lance storage directly
- --from <base> opts into fork-if-missing for --branch (the former ingest
  semantics); without --from a missing branch is an error, never a fork
- --mode is now required: overwrite is destructive, so there is no implicit
  default (the old silent default was overwrite)
- output gains base_branch/branch_created (and table sums on remote loads)

omnigraph ingest stays as a deprecated alias (defaults preserved: --from
main --mode merge) that prints a one-line warning to stderr, matching the
read/change deprecation convention; removal in a later release.

Docs updated in the same change: cli.md, cli-reference.md, policy.md,
audit.md, execution.md (unified load section), AGENTS.md quick-flow,
README.md.

BREAKING CHANGE: scripts running omnigraph load without --mode must now
pass it explicitly (previously defaulted to the destructive overwrite).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 04:18:00 +03:00
aaltshuler
90676ef52f feat(server)!: POST /ingest forks only when 'from' is present
Branch creation becomes opt-in by presence of the request's 'from' field.
Previously the handler defaulted from to 'main' and always auto-created a
missing branch — a typo'd branch name silently forked main and landed the
data there, with the client none the wiser. Now a request without 'from'
against a missing branch returns 404 branch-not-found and creates nothing;
with 'from' set, fork-if-missing behaves as before. The BranchCreate
authority is only consulted when a fork will actually happen.

The handler calls the unified load_as directly (the deprecated ingest_as
shim is no longer used in the server). IngestOutput.base_branch becomes
nullable: it echoes the request's 'from' and is null when absent. OpenAPI
regenerated; the CLI's local ingest arm moves to load_file_as + the new
converter shape.

BREAKING CHANGE: clients that relied on implicit fork-from-main with 'from'
omitted must now pass from='main' explicitly. IngestOutput.base_branch is
now nullable.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 04:05:29 +03:00
aaltshuler
c236a4c2df refactor(loader): load_jsonl helpers take &Omnigraph and document their role
The free helpers needlessly demanded &mut Omnigraph (every load API takes
&self) and read as leftovers. Rather than rewriting their ~200 call sites
across the test suites — which would have to re-derive the active-branch
resolution at each site — keep the one convenience and make it honest:
borrow immutably (&mut callers coerce, no churn) and document it as the
active-branch shorthand over Omnigraph::load.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 03:57:41 +03:00
aaltshuler
e676c151bb feat(engine): unify load/ingest — load_as gains an optional fork base
load_as/load_file_as gain a base: Option<&str> parameter: with Some(base) a
missing target branch is forked from base first (the former ingest
semantics); with None the target branch must exist — staging fails on an
unknown branch, so a typo'd name can never create one. LoadResult gains
branch/base_branch/branch_created metadata (additive).

The ingest family (ingest, ingest_as, ingest_file, ingest_file_as) becomes
#[deprecated] shims over load_as that preserve the historical contract
exactly (from: None still means fork from main; base recorded even when no
fork happened). IngestResult and to_ingest_tables stay for the shims and
the server until the removal release.

The layered policy check is unchanged: Change on the target branch always,
BranchCreate additionally when a fork actually happens (enforced inside
branch_create_from_as with the actor threaded through).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 03:53:22 +03:00
aaltshuler
43d4e89fde docs(execution): Overwrite loads are staged since MR-793, not inline-commit
The LoadMode table still described Overwrite as an inline-commit-per-type
residual with a partial-truncation failure window. Since MR-793 Phase 2,
Overwrite goes through the same MutationStaging accumulator as Append/Merge,
staged as a Lance Operation::Overwrite transaction via stage_overwrite
(table_store.rs) and committed with commit_staged + publisher CAS — a
mid-load failure leaves Lance HEAD untouched in all three modes.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 03:44:02 +03:00
Andrew Altshuler
4bd763f4b8
Merge pull request #183 from ModernRelay/feat/cluster-query-discovery
feat(cluster): Terraform-shaped query declaration; drop ./ path noise
2026-06-11 01:44:52 +03:00
aaltshuler
4558454bc7 fix(cluster): address review — discovery reads each file exactly once
resolve_query_decls hands its file contents to the caller; the per-query
digest/typecheck pass reuses them instead of re-reading (a file with N
queries was read N+1 times), which also closes the window where a file
changing between enumeration and validation produced a confusing
query_key_mismatch for a just-discovered name. Explicit-map declarations
read as before.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 01:35:47 +03:00
aaltshuler
44b5866516 docs: drop ./ path prefixes; document query discovery
Paths in cluster.yaml and command examples are relative to one explicit
config folder (Terraform-shaped) — the ./ prefixes were noise and are gone
across the user docs (109 instances; ../ links and ./scripts executables
untouched). The cluster docs now present directory discovery as the primary
queries form with the list and map forms documented alongside.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 01:33:30 +03:00
aaltshuler
677320ceec feat(cluster): Terraform-shaped query declaration — discover from files
cluster.yaml's graphs.<id>.queries previously accepted only an explicit
name->file map, forcing configs to re-enumerate every `query <name>` that
the .gq files already declare (the SPIKE cookbook needed 66 entries for 6
files). The files ARE the declaration now: `queries: queries/` discovers
every declaration in a directory's top-level *.gq (sorted), a list form
takes explicit files, and the map stays for fine-grained control.
Discovery is loud — unreadable/unparseable files and duplicate query names
fail validation (query_parse_error, duplicate_query_name). Downstream is
untouched: each discovered query is still an individually addressed
resource with the containing file's digest.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 00:46:21 +03:00
Andrew Altshuler
c3ff076e89
Merge pull request #181 from ModernRelay/feat/container-cluster-mode
feat(docker): cluster-mode container + AWS/Railway recipes
2026-06-10 23:57:34 +03:00
Andrew Altshuler
2b5fb7197e
Merge pull request #180 from ModernRelay/feat/cluster-local-config
feat(cli): per-operator actor for cluster ops; pin omnigraph.yaml isolation
2026-06-10 23:57:31 +03:00
aaltshuler
f165145b63 docs(deploy): address review — consistent placeholders, complete ECS command
The ECS day-2 apply gains its required --config flag (the image ships no
omnigraph.yaml, so the CLI cannot locate the cluster dir without it), and
the docker-exec example uses the <you> placeholder convention instead of a
real-looking actor name.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 22:54:26 +03:00
aaltshuler
3b2bf755ae fix(cli): address review — honor the one-thing contract, restore docs, untangle test phases
- resolve_cluster_actor uses load_config directly: load_cli_config also
  loads auth.env_file into the process env — a second thing, violating the
  documented 'exactly one thing' omnigraph.yaml contract for cluster ops.
- resolve_cli_actor gets its doc comment back (the inserted helper had
  absorbed the contiguous /// block).
- The actor-default test imports once as setup and asserts on apply alone,
  idempotently, instead of re-importing inside the assertion helper.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 22:54:05 +03:00
aaltshuler
6b3ae7ac79 docs(deploy): AWS and Railway cluster-mode recipes
The container contract (OMNIGRAPH_CLUSTER + mounted volume + token env),
ECS/Fargate+EFS and Railway-volume walkthroughs, the in-container day-2
loop, and the honest constraints list (volume mandatory, no hot reload,
single-writer apply, shared-volume replicas unvalidated). Operator guide
links the recipes.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 22:45:30 +03:00
aaltshuler
d3ae31be08 feat(docker): cluster-mode entrypoint and the CLI in the image
OMNIGRAPH_CLUSTER boots the container from a mounted cluster directory's
applied revision — checked first and exclusive (exit 64 when combined with
OMNIGRAPH_TARGET_URI/CONFIG/TARGET), the entrypoint-level mirror of the
server's mode-inference rule 0. The omnigraph CLI joins the image so the
day-2 loop (cluster apply/approve/status, data loads by explicit URI) runs
in-container via docker/ECS exec or railway shell — no omnigraph.yaml
required, which the cluster-local-config PR pins. entrypoint_test gains the
cluster case plus all three exclusivity refusals.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 22:44:54 +03:00
aaltshuler
fbe9726ac7 test(cli): stop the S3 e2e scaffolding omnigraph.yaml into the crate dir
local_cli_s3_end_to_end_init_load_read_flow ran `omnigraph init` without a
current_dir, so init's project scaffold landed in crates/omnigraph-cli/ —
poisoning any later test that resolves a graph target from the cwd config
(query_lint_requires_schema_or_resolvable_graph_target fails determinis-
tically once the file exists). Only manifests when OMNIGRAPH_S3_TEST_BUCKET
is set, which is why local FS runs and CI's scoped rustfs job never caught
it. The init and load calls now run inside the test's tempdir.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 22:34:54 +03:00
aaltshuler
99f7f36864 docs(cluster): the precise omnigraph.yaml contract
The 'Relationship to omnigraph.yaml' section becomes the exact rule set:
cluster commands read the per-operator config for exactly one thing (the
cli.actor default when --as is omitted), a --cluster server reads it for
nothing, and pointing data-plane targets at derived roots is ergonomics,
not coupling. Operator guide and CLI reference updated to match.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 22:30:40 +03:00
aaltshuler
f7368b58a0 test(cli): pin --cluster boot isolation from cwd omnigraph.yaml
A --cluster server process whose cwd contains a MALFORMED omnigraph.yaml
boots and serves — proving mode-inference rule 0 returns before any config
search can run. New spawn_server_with_cluster_in support helper sets the
spawned server's cwd explicitly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 22:29:49 +03:00
aaltshuler
f3374ac6dc feat(cli): resolve cluster actor via the per-operator config cascade
Cluster FACTS stay unlayered (cluster.yaml only), but the operator's
identity is a per-operator fact — exactly the per-operator omnigraph.yaml's
permanent job, and the cascade every data-plane write already uses. cluster
apply/approve now resolve: --as flag wins and skips any config read
entirely (containers and CI stay config-free); without it, the standard cwd
search supplies cli.actor, with a malformed config failing loudly and
actionably ('pass --as to skip this lookup') rather than silently dropping
attribution. approve's no-actor error now names both sources.

Tests pin the contract from both sides: cli.actor is the no-flag default
for apply (echoed actor) and approve (approved_by), the flag overrides it,
a malformed omnigraph.yaml in cwd breaks nothing except the no-flag actor
lookup, and a conflicting well-formed one leaks nothing into cluster
outputs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 22:29:49 +03:00