omnigraph

mirror of https://github.com/ModernRelay/omnigraph.git synced 2026-06-21 02:28:07 +02:00

Author	SHA1	Message	Date
Andrew Altshuler	7fd23c54a3	fix(cluster): stop cluster-apply crash-loops from the recovery-sidecar trap (#284) * fix(cluster): stop cluster-apply crash-loops from the recovery-sidecar trap A `cluster apply` carrying a schema change against a graph that has non-main branches, or an unsupported "needs backfill" migration, armed a recovery sidecar before calling the engine, then left it behind when the engine rejected the apply pre-movement. The server refuses to boot while any sidecar is pending, and re-running apply re-armed a fresh sidecar: an unescapable crash loop. None of the engine rejections are bugs; the trap is in the apply/serve choreography. Three coordinated changes: 1. Preview before arming the sidecar. `cluster apply` now runs `preview_schema_apply_with_options` before `write_recovery_sidecar`, so parser/planner rejections (non-main branches, unsupported plan) fail loudly without leaving recovery work behind. The post-preview engine error path now deletes the sidecar when the live schema still matches the recorded digest (nothing moved), and keeps it only on real mid-movement failure; both branches covered by new engine-failpoint tests (cluster failpoints now enable omnigraph/failpoints). 2. Per-graph quarantine at serve time instead of whole-cluster refusal. A graph-attributed pending sidecar, an unopenable graph root, a query parse failure, or an unresolvable embedding provider now quarantines just that graph (logged loudly at every boot layer) while healthy graphs serve; `/graphs` lists only ready graphs and quarantined routes 404. Cluster-global problems (missing/unreadable state, malformed or unattributable sidecars, shared-catalog or cluster-policy errors, zero healthy graphs) stay fail-fast. `--require-all-graphs` / OMNIGRAPH_REQUIRE_ALL_GRAPHS=1 restores all-or-nothing boot. 3. Backfill embedding-provider profile metadata on apply. Mirrors the existing policy-binding backfill: a pre-5A ledger missing `embedding_profile` is now detected as a metadata-only change and backfilled by a no-op apply, instead of bricking serve with `embedding_provider_profile_missing` forever. Tests: trap (no sidecar after a rejected apply), both digest-cleanup branches, per-graph quarantine (cluster + server), embedding backfill. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs: resilient cluster boot + recovery-sidecar trap fix Amend RFC-005 D4 readiness posture (cluster-global fail-fast vs graph-local quarantine; deviation #5 for --require-all-graphs), add the v0.7.0 release note, and update the user cluster/server/deployment docs and the OMNIGRAPH_REQUIRE_ALL_GRAPHS env var. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(cluster): surface sidecar-cleanup failures; document severity promotion Address Greptile review on PR #284: - The pre-movement sidecar cleanup fast-path discarded `delete_object`'s result, so a transient delete failure left the graph quarantined with no signal. Add `try_delete_object` (Result-returning) and emit a `recovery_sidecar_cleanup_failed` warning diagnostic on failure; the fire-and-forget `delete_object` now delegates to it. - Document why the serve-time loop promotes every `list_recovery_sidecars` diagnostic to a cluster-fatal error (the listing only emits genuine read/parse/version failures, as warnings, whose blast radius serving cannot prove) and note the promote-by-code path if that ever changes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-19 03:34:15 +03:00
aaltshuler	4f8c71fa23	Merge remote-tracking branch 'origin/main' into ragnorc/shaping-config-integration # Conflicts: # crates/omnigraph-cluster/src/lib.rs # crates/omnigraph-cluster/src/serve.rs # crates/omnigraph-server/src/lib.rs # crates/omnigraph-server/src/settings.rs # docs/user/clusters/config.md	2026-06-16 04:13:00 +03:00
aaltshuler	16e4a833c0	Wire cluster embedding providers	2026-06-16 04:02:08 +03:00
Andrew Altshuler	8b01c6e547	feat(server)!: cluster-only server — remove single-graph serving (RFC-011) (#250) omnigraph-server boots only from --cluster; all HTTP is /graphs/<id>/…; flat single-graph routes and the omnigraph.yaml server boot are removed. GraphRouting/ServerConfigMode collapse to multi-only; openapi.json regenerated to the nested shape; ~100 server route tests migrated; parity/system_local boot from a converged cluster. Gate green (1410 tests).	2026-06-15 20:17:25 +03:00
aaltshuler	58855c0a7c	feat(cluster,server): inline policy content + config-free --cluster URI boot Two serving changes that complete RFC-006's read side: ServingPolicy carries the policy bundle CONTENT (digest-verified at snapshot read) instead of a blob path — the catalog may live on object storage, and the server must not re-read mutable state after the snapshot. The server grows a PolicySource enum: File for omnigraph.yaml deployments (unchanged), Inline for cluster boots, wired through PolicyEngine::load_{graph,server}_from_source. read_serving_snapshot_from_storage(uri) reads the applied revision straight from a storage root, and --cluster accepts a scheme-qualified URI (s3://bucket/prefix): config-free serving — a serving box needs only the URI and credentials; the ledger and catalog on the bucket ARE the deployment artifact. Bare paths keep the config-directory behavior. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 15:56:22 +03:00
aaltshuler	b036073ec6	refactor(server): split the test monolith into area suites tests/server.rs (6,517 lines, 110 tests) becomes seven area files — auth_policy, data_routes, schema_routes, stored_queries, multi_graph, boot_settings, s3 — with shared helpers in tests/support/mod.rs. Verbatim moves + visibility bumps (pub on helpers, pub(super)->pub inside the matrix harness); cargo fix stripped the per-file unused imports. All 110 tests pass in their new homes (289 across the crate including lib and openapi). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 15:03:51 +03:00

6 commits
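
The apply ordering described in commit 7fd23c54a3 (preview first, arm the recovery sidecar second, then clean up or keep the sidecar depending on whether the schema digest moved) can be sketched as a small state model. This is an illustrative sketch only: `Graph`, `ApplyOutcome`, and `apply` are hypothetical names, not omnigraph's API, and the real `preview_schema_apply_with_options` and `write_recovery_sidecar` are stood in for by a boolean and a field assignment. Clearing the sidecar after a successful apply is an assumption; the commit message only describes the failure paths.

```rust
// Hypothetical, simplified model of the apply choreography from commit 7fd23c54a3.
// None of these types or functions are omnigraph's real API.

#[derive(Debug, PartialEq)]
enum ApplyOutcome {
    Applied,
    RejectedBeforeArming,   // preview failed, so no sidecar was ever written
    RejectedSidecarCleared, // engine failed before anything moved; sidecar deleted
    FailedSidecarKept,      // engine failed mid-movement; sidecar kept for recovery
}

struct Graph {
    schema_digest: u64,   // digest of the live schema
    sidecar: Option<u64>, // digest recorded in a pending recovery sidecar, if any
}

fn apply(
    graph: &mut Graph,
    preview_ok: bool, // stand-in for the result of the preview step
    engine: impl FnOnce(&mut Graph) -> Result<(), String>,
) -> ApplyOutcome {
    // 1. Preview before arming: a rejected plan leaves no recovery work behind.
    if !preview_ok {
        return ApplyOutcome::RejectedBeforeArming;
    }

    // 2. Arm the sidecar, recording the pre-apply digest.
    let recorded = graph.schema_digest;
    graph.sidecar = Some(recorded);

    // 3. Run the engine, then decide what to do with the sidecar.
    match engine(&mut *graph) {
        Ok(()) => {
            // Assumption: a successful apply clears its own sidecar.
            graph.sidecar = None;
            ApplyOutcome::Applied
        }
        Err(_) if graph.schema_digest == recorded => {
            // Live schema still matches the recorded digest: nothing moved.
            graph.sidecar = None;
            ApplyOutcome::RejectedSidecarCleared
        }
        // The digest changed before the failure: keep the sidecar.
        Err(_) => ApplyOutcome::FailedSidecarKept,
    }
}
```

In this model the crash loop from the commit message corresponds to a rejected apply that leaves `sidecar` set; the first two outcomes are the paths that now leave it empty.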