Mirror of https://github.com/ModernRelay/omnigraph.git, synced 2026-06-15 01:55:13 +02:00
test(cluster,server): gated object-storage cluster e2e + CI wiring + docs
s3_cluster.rs runs the full control-plane lifecycle against a real bucket (CI: containerized RustFS; locally: the RustFS binary): import → lock released (pins the drop-time release regression caught on the first live smoke) → apply (graph roots + catalog on the bucket, nothing local) → serving snapshots from both the config dir and the bare URI → schema evolution → approved delete (prefix removal) → empty-cluster refusal.

The server suite gains the config-free boot test: --cluster s3://… with zero local files serves a stored query over HTTP.

CI: the rustfs job runs both suites; the classify filter covers the cluster store/serve modules and the new test files. The server smoke drops its name filter: every test in the s3 target is bucket-gated, and a filter matching nothing passes vacuously (which silently ran zero tests for a while).

Docs: deployment.md gains the Bucket-no-volume shape as the preferred cloud deployment; cluster.md/server.md document --cluster <uri>; testing.md maps the new suite.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Parent: 58855c0a7c
Commit: 8d7aed065f
7 changed files with 311 additions and 15 deletions
@@ -8,7 +8,7 @@ This file is the always-on map of the test surface. **Consult it before every ta
 |---|---|---|
 | `omnigraph` (engine) | `crates/omnigraph/tests/` | Integration tests (21 files), fixture-driven, share `tests/helpers/mod.rs` |
 | `omnigraph-cli` | `crates/omnigraph-cli/tests/` | `cli.rs` (unit-ish; includes the `cluster_e2e_*` lifecycle compositions over the spawned binary — lost-state re-import recovery, out-of-band drift, graph-root destruction, multi-graph mixed-disposition convergence), `system_local.rs` (incl. the full-cycle cluster lifecycle with a spawned `--cluster` server — declare→serve→evolve→drift-heal→approved-delete — and applied-policy enforcement over HTTP), `system_remote.rs`, share `tests/support/mod.rs` |
-| `omnigraph-cluster` | mostly in-source `#[cfg(test)] mod tests`; `tests/failpoints.rs` (feature-gated) | Cluster config parser, local JSON state diff, state CAS/lock handling/recovery, read-only validate/plan/status plus explicit refresh/import graph observations, config-only apply (content-addressed payload publish, disposition gating, composite-digest convergence, idempotent re-apply), catalog payload verification (status read-only, refresh drift + self-heal), failpoint crash-mid-apply / CAS-race coverage, Stage 4A graph creation (create executor, recovery sidecars + sweep rows, create crash windows), Stage 4B schema apply (migration previews in plan, schema executor, schema-apply sweep classification, schema crash windows), Stage 4C gated deletes (digest-bound approvals, delete executor + tombstones, delete sweep rows, delete crash windows), and 5A policy binding metadata (applies_to in the applied revision, binding-change diffing + convergence, pre-5A backfill), and the 5B serving-snapshot read API (converged read, refusal rows) |
+| `omnigraph-cluster` | mostly in-source `#[cfg(test)] mod tests`; `tests/failpoints.rs` (feature-gated); `tests/s3_cluster.rs` (bucket-gated full lifecycle on object storage) | Cluster config parser, local JSON state diff, state CAS/lock handling/recovery, read-only validate/plan/status plus explicit refresh/import graph observations, config-only apply (content-addressed payload publish, disposition gating, composite-digest convergence, idempotent re-apply), catalog payload verification (status read-only, refresh drift + self-heal), failpoint crash-mid-apply / CAS-race coverage, Stage 4A graph creation (create executor, recovery sidecars + sweep rows, create crash windows), Stage 4B schema apply (migration previews in plan, schema executor, schema-apply sweep classification, schema crash windows), Stage 4C gated deletes (digest-bound approvals, delete executor + tombstones, delete sweep rows, delete crash windows), and 5A policy binding metadata (applies_to in the applied revision, binding-change diffing + convergence, pre-5A backfill), and the 5B serving-snapshot read API (converged read, refusal rows) |
 | `omnigraph-server` | `crates/omnigraph-server/tests/` | `server.rs` (HTTP-level; incl. cluster-mode boot — converged-dir serving, policy binding wiring, boot refusals), `openapi.rs` (OpenAPI drift / regeneration) |
 | `omnigraph-compiler` | mostly in-source `#[cfg(test)] mod tests` | Parser, type-checker, IR lowering, lint |

@@ -64,7 +64,8 @@ The engine's `tests/` is the principal coverage surface; most graph-shaped behav
 CI runs three S3-backed tests against a containerized RustFS server (`.github/workflows/ci.yml` → `rustfs_integration` job):

 - `cargo test -p omnigraph-engine --test s3_storage`
-- `cargo test -p omnigraph-server --test server server_opens_s3_graph_directly_and_serves_snapshot_and_read`
+- `cargo test -p omnigraph-server --test s3` (single-graph serving + config-free `--cluster s3://` boot)
+- `cargo test -p omnigraph-cluster --test s3_cluster` (full control-plane lifecycle on the bucket)
 - `cargo test -p omnigraph-cli --test system_local local_cli_s3_end_to_end_init_load_read_flow`

 Locally, set `OMNIGRAPH_S3_TEST_BUCKET` (and the usual `AWS_*` vars including `AWS_ENDPOINT_URL_S3` for non-AWS) before running. Without those, S3 tests skip gracefully.

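Because the S3 suites skip silently when the environment is missing, a local run benefits from an explicit check before invoking cargo. A minimal sketch, assuming an S3-compatible endpoint (such as RustFS) is already running and that credentials come from environment variables; `s3_env_ready` and `run_s3_suites` are hypothetical helpers, and the cargo invocations are the post-change ones from this CI section. Setting `CARGO=echo` previews the commands without running them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether the bucket-gated S3 suites would run.
# Assumption: credentials are supplied as AWS_* environment variables.
s3_env_ready() {
  local var
  for var in OMNIGRAPH_S3_TEST_BUCKET AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    if [ -z "${!var:-}" ]; then   # indirect expansion: value of the named var
      echo "missing: $var" >&2
      return 1
    fi
  done
  # Non-AWS endpoints (RustFS and similar) also need AWS_ENDPOINT_URL_S3.
  if [ -z "${AWS_ENDPOINT_URL_S3:-}" ]; then
    echo "note: AWS_ENDPOINT_URL_S3 unset; assuming real AWS S3" >&2
  fi
  return 0
}

# Hypothetical runner for the four suites; stops at the first failure.
# Set CARGO=echo to print the commands instead of running them.
run_s3_suites() {
  local cargo="${CARGO:-cargo}"
  s3_env_ready || return 1
  "$cargo" test -p omnigraph-engine --test s3_storage || return 1
  "$cargo" test -p omnigraph-server --test s3 || return 1
  "$cargo" test -p omnigraph-cluster --test s3_cluster || return 1
  "$cargo" test -p omnigraph-cli --test system_local \
    local_cli_s3_end_to_end_init_load_read_flow || return 1
}
```

Failing fast on missing variables prevents the vacuous pass that the commit message warns about: a run in which every gated test skips.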
@@ -84,6 +84,12 @@ OMNIGRAPH_SERVER_BEARER_TOKENS_JSON='{"act-reader":"s3cret"}' \
 omnigraph-server --cluster company-brain --bind 0.0.0.0:8080
 ```

+`--cluster` accepts either a **config directory** (the storage root resolves
+through `cluster.yaml`'s `storage:` key) or a **storage-root URI directly**
+(`--cluster s3://bucket/prefix`) — config-free serving: a serving box needs
+only the URI and credentials, no checkout of the config repo. The ledger and
+catalog on the bucket are the deployment artifact.
+
 `--cluster` is an **exclusive boot source**: it cannot be combined with a
 graph URI, `--target`, or `--config`, and `omnigraph.yaml` is never read in
 this mode. Routing is always multi-graph:

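As this hunk describes, `--cluster` takes either a config directory or a storage-root URI, and the mode-inference rule says a scheme-qualified argument selects the URI path. A minimal sketch of that distinction: `cluster_source_kind` is a hypothetical helper rather than the server's parser, and treating any `<scheme>://` prefix as a storage root is an assumption, since the docs only show `s3://`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a --cluster argument as described in the docs.
# A scheme-qualified value (e.g. s3://bucket/prefix) is a storage-root URI
# served config-free; anything else is a config directory whose cluster.yaml
# `storage:` key resolves the storage root.
cluster_source_kind() {
  local arg="$1"
  if [ -z "$arg" ]; then
    echo "error: --cluster needs a value" >&2
    return 2
  fi
  # Assumption: any "<scheme>://" prefix counts as scheme-qualified.
  if [[ "$arg" =~ ^[A-Za-z][A-Za-z0-9+.-]*:// ]]; then
    echo "storage-root-uri"
  else
    echo "config-dir"
  fi
}
```

For example, `cluster_source_kind s3://my-bucket/clusters/company-brain` prints `storage-root-uri`, while `cluster_source_kind company-brain` prints `config-dir`.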
@@ -47,10 +47,31 @@ omnigraph-server s3://my-bucket/graphs/example/releases/2026-04-10-v0.1.0 \

 ## Cluster Mode in Containers (AWS, Railway)

-A cluster-booted deployment serves a **cluster directory** (config + state
-ledger + content-addressed catalog + graph data) from a mounted volume — the
-one structural difference from the stateless S3 single-graph shape, which
-needs no volume at all. The container contract:
+A cluster-booted deployment has **two shapes** since the `storage:` root
+(RFC-006):
+
+- **Bucket, no volume (preferred for cloud)** — the cluster's ledger,
+catalog, and graph data live under an object-storage root
+(`storage: s3://bucket/prefix` in `cluster.yaml`). The server boots
+**config-free** from the bare URI; the container needs no volume at all:
+
+```bash
+docker run -d \
+-e OMNIGRAPH_CLUSTER=s3://my-bucket/clusters/company-brain \
+-e AWS_ACCESS_KEY_ID=... -e AWS_SECRET_ACCESS_KEY=... \
+-e OMNIGRAPH_SERVER_BEARER_TOKEN=... \
+-p 8080:8080 <image>
+```
+
+Day-2 runs from any operator checkout of the config repo:
+`omnigraph cluster apply --config ./company-brain` (the `storage:` key
+routes every stored byte to the bucket), then restart the service. The
+state lock is genuinely cross-machine on object storage, so CI and
+operator shells contend safely.
+
+- **Volume (file-rooted)** — the original shape: the whole cluster
+directory on a mounted volume. Still fully supported; the container
+contract:

 ```bash
 docker run -d \

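In the bucket shape there is no volume, so everything the server needs arrives through the environment, as in the `docker run` example for that shape. A minimal preflight sketch that an entrypoint could run before starting the server; `bucket_shape_preflight` is hypothetical, its variable list only mirrors that example rather than an official contract, and it accepts only `s3://` because that is the scheme the docs show.

```bash
#!/usr/bin/env bash
# Hypothetical entrypoint preflight for the "bucket, no volume" shape.
# The required variables mirror the docker run example; adjust as needed.
bucket_shape_preflight() {
  local missing=0 var
  for var in OMNIGRAPH_CLUSTER AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY \
             OMNIGRAPH_SERVER_BEARER_TOKEN; do
    if [ -z "${!var:-}" ]; then
      echo "preflight: $var is not set" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] || return 1
  # Without a volume there is no local cluster directory to read, so the
  # value has to be the bare storage-root URI (s3:// assumed here).
  case "$OMNIGRAPH_CLUSTER" in
    s3://*) return 0 ;;
    *)
      echo "preflight: OMNIGRAPH_CLUSTER should be an s3:// storage root" >&2
      return 1
      ;;
  esac
}
```

Failing before the server starts surfaces a misconfigured container as a clear message instead of a boot refusal further down the line.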
@@ -102,8 +123,6 @@ above).

 ### Constraints (current honest list)

-- **Cluster directories are local-filesystem** — the volume is mandatory;
-S3-hosted cluster dirs are not supported.
 - **No hot reload** — applied changes serve on the next restart.
 - **Single-writer apply** — run `cluster apply` from one place at a time
 (the state lock enforces this; CI or one operator shell, not both).

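Two constraints together define the day-2 loop: applied changes serve only after a restart, and only one `cluster apply` should run at a time. The following is a minimal sketch of an operator wrapper. `omnigraph cluster apply --config` is the command the deployment docs give. The function name, the `docker restart` step, and the container name are hypothetical stand-ins for however your platform restarts the service. Set `OMNIGRAPH_BIN` and `DOCKER_BIN` to `echo` for a dry run.

```bash
#!/usr/bin/env bash
# Hypothetical day-2 wrapper: apply the declared config, then restart the
# server so it boots from the newly applied revision (there is no hot reload).
# Concurrency is left to the cluster state lock; no extra locking here.
day2_apply_and_restart() {
  local config_dir="$1" container="$2"
  local omnigraph="${OMNIGRAPH_BIN:-omnigraph}" docker="${DOCKER_BIN:-docker}"
  if [ -z "$config_dir" ] || [ -z "$container" ]; then
    echo "usage: day2_apply_and_restart <config-dir> <container>" >&2
    return 2
  fi
  # If apply fails, skip the restart: it would only re-serve the
  # previously applied revision.
  "$omnigraph" cluster apply --config "$config_dir" || return 1
  "$docker" restart "$container"
}
```

Ordering the restart strictly after a successful apply keeps the served revision and the applied revision from drifting apart silently.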
@@ -16,7 +16,7 @@ Axum 0.8 + tokio + utoipa-generated OpenAPI. **Two modes** (v0.6.0+): single-gra

 ### Cluster-booted multi mode (Phase 5)

-`omnigraph-server --cluster <dir>` boots from the cluster catalog's **applied
+`omnigraph-server --cluster <dir-or-uri>` boots from the cluster catalog's **applied
 revision** (`state.json` + content-addressed blobs) instead of
 `omnigraph.yaml` — an exclusive boot source: combining it with `<URI>`,
 `--target`, or `--config` is a startup error, and `omnigraph.yaml` is never

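The exclusivity rule can be modelled as a startup check over the parsed arguments. This is a minimal sketch that assumes a simplified argument shape: every flag takes a value, and there is at most one positional graph URI. `check_boot_source` is a hypothetical helper, not the server's argument parser. Flags other than `--cluster`, `--target`, and `--config` (for example `--bind`) are accepted and ignored.

```bash
#!/usr/bin/env bash
# Hypothetical startup check mirroring the documented rule: --cluster is an
# exclusive boot source and cannot be combined with a positional graph URI,
# --target, or --config.
check_boot_source() {
  local cluster="" target="" config="" uri=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --*)
        # Every flag modelled here takes a value.
        if [ "$#" -lt 2 ]; then
          echo "startup error: $1 needs a value" >&2
          return 2
        fi
        case "$1" in
          --cluster) cluster="$2" ;;
          --target)  target="$2" ;;
          --config)  config="$2" ;;
        esac        # other flags (e.g. --bind) are ignored
        shift 2
        ;;
      *)
        uri="$1"    # positional graph URI
        shift
        ;;
    esac
  done
  if [ -n "$cluster" ]; then
    local conflicts=()
    [ -n "$uri" ]    && conflicts+=("<URI>")
    [ -n "$target" ] && conflicts+=("--target")
    [ -n "$config" ] && conflicts+=("--config")
    if [ "${#conflicts[@]}" -gt 0 ]; then
      echo "startup error: --cluster cannot be combined with ${conflicts[*]}" >&2
      return 1
    fi
    echo "boot: cluster ($cluster); omnigraph.yaml not read"
    return 0
  fi
  echo "boot: non-cluster mode (see mode inference)"
}
```

Refusing the combination at startup, instead of picking a winner, means an ambiguous invocation can never quietly serve the wrong source.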
@@ -27,7 +27,7 @@ for what is read and the fail-fast readiness rules. `--bind`,

 Mode inference:

-0. CLI `--cluster <dir>` → **multi, cluster-booted** (exclusive)
+0. CLI `--cluster <dir | s3://…>` → **multi, cluster-booted** (exclusive; a scheme-qualified argument reads the ledger straight from the storage root, no local config)
 1. CLI positional `<URI>` → single
 2. CLI `--target <name>` → single
 3. `server.graph` in config → single
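The numbered mode-inference list reads as a first-match precedence order. A minimal sketch of rules 0 through 3 as they appear in this hunk; the list continues beyond it, so anything the sketch cannot place is reported as unresolved instead of guessed. `infer_mode` and its four positional inputs are hypothetical. The sketch only orders the rules and does not reproduce the startup error for combining `--cluster` with other boot sources.

```bash
#!/usr/bin/env bash
# Hypothetical first-match evaluation of mode-inference rules 0-3.
# Arguments (empty string = not given):
#   $1 --cluster value, $2 positional <URI>, $3 --target name,
#   $4 server.graph from the config file.
infer_mode() {
  local cluster="$1" uri="$2" target="$3" server_graph="$4"
  if [ -n "$cluster" ]; then
    echo "multi, cluster-booted"     # rule 0: omnigraph.yaml is never read
  elif [ -n "$uri" ]; then
    echo "single (positional URI)"   # rule 1
  elif [ -n "$target" ]; then
    echo "single (--target)"         # rule 2
  elif [ -n "$server_graph" ]; then
    echo "single (server.graph)"     # rule 3
  else
    # Later rules exist but are outside this hunk; don't guess them.
    echo "unresolved"
  fi
}
```

For example, `infer_mode '' '' prod g` prints `single (--target)`: rule 2 matches before the config's `server.graph` is consulted.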