From 578141378d050d158640905f5a7e935893258d60 Mon Sep 17 00:00:00 2001
From: aaltshuler
Date: Wed, 10 Jun 2026 14:08:09 +0300
Subject: [PATCH] docs(cluster): descope ETL pipelines to a separate project; keep the socket

Pipelines (scheduler, connectors, mapping, idempotency, run ledger) leave
the cluster control-plane rollout and become their own project with their
own RFC. This rollout guarantees only the socket, all of which already
exists and is enforced: the pipelines: config field is reserved (typed
future_phase_field rejection, test-covered), the pipeline. typed address
and Pipeline resource kind are reserved in the resource model, and axiom 13
fixes the contract any future implementation must satisfy (definition
reconciled, execution data-plane, fan-out statusful). The ETL section in
the high-level spec stands as the requirements record for that project;
exit criterion 9 defers to its RFC.

Co-Authored-By: Claude Fable 5
---
 .../dev/cluster-config-implementation-spec.md | 25 +++++++++++++------
 docs/dev/cluster-config-specs.md              | 15 +++++++++++
 2 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/docs/dev/cluster-config-implementation-spec.md b/docs/dev/cluster-config-implementation-spec.md
index d4cf3e6..8917426 100644
--- a/docs/dev/cluster-config-implementation-spec.md
+++ b/docs/dev/cluster-config-implementation-spec.md
@@ -529,7 +529,7 @@ These are the concrete "what requires downstream" rules.
 | Server registry | Boot from cluster state, eventually reload/reconcile graph handles, expose statuses | High | Affects routing, OpenAPI, auth, and workload admission |
 | API types/OpenAPI | Plan/status/apply DTOs if HTTP management endpoints ship | Medium/high | OpenAPI drift must be regenerated |
 | UI specs | New renderer/spec validator/binding checker | High | New product surface, not currently implemented |
-| Pipelines | New scheduler/runtime/connector/mapping/idempotency/run ledger | Very high | Second data-plane seam; large product and correctness surface |
+| Pipelines | New scheduler/runtime/connector/mapping/idempotency/run ledger | Very high | **Separate project** (socket reserved here); second data-plane seam, large product and correctness surface |
 | Embeddings | Cluster-level defaults, env refs, model/dimension validation, index interaction | Medium | Existing embedding code is mostly offline/client-side |
 | Docs | User docs for cluster config, policy, server, CLI; dev docs for invariants/testing | High | Public contract changes |
 | Tests | New cluster suites plus extensions to config/server/policy/recovery/schema/query tests | High | Needs boundary-matched coverage |
@@ -616,13 +616,22 @@ actor threading, 4A/4B/4C staging).
   docs and migrations say it can be narrowed.
 - Deprecate and later remove `mcp.expose` from target-state cluster config.
 
-### Phase 7: Pipeline Runtime
+### Pipelines: separate project (socket only)
 
-- Add scheduler/worker/runtime.
-- Add source connector contracts, mapping validation, idempotency keys,
-  per-target run status, and retry behavior.
-- Treat fan-out execution as data-plane writes unless explicitly staged through
-  branch/merge.
+Pipelines are **descoped from this rollout** (2026-06-10): the runtime
+(scheduler/worker, connector contracts, mapping validation, idempotency keys,
+per-target run status, retry behavior) is a separate project with its own
+RFC. This rollout guarantees only the socket:
+
+- `pipelines:` stays a reserved config field, rejected with a typed
+  `future_phase_field` diagnostic (enforced + test-covered in
+  `omnigraph-cluster`).
+- `pipeline.` stays a reserved typed address; the resource model
+  (kind-agnostic state entries, extensible sidecar kinds, dependency edges)
+  accepts the new kind without reshaping.
+- Axiom 13 is the contract the future implementation must satisfy: the
+  definition is reconciled, the execution is data-plane; fan-out is statusful,
+  never silently atomic.
 
 ## Test Ownership
 
@@ -725,4 +734,4 @@ Before implementation begins beyond parser/validate, the RFC must answer:
 6. Bootstrap authority and first-actor story.
 7. Server startup and migration path from `omnigraph.yaml`.
 8. Per-query policy schema and compatibility bridge for `mcp.expose`.
-9. Pipeline runtime owner, status schema, and idempotency contract.
+9. Pipeline runtime owner, status schema, and idempotency contract — **deferred to the separate pipelines project's own RFC**; this rollout only reserves the socket.
diff --git a/docs/dev/cluster-config-specs.md b/docs/dev/cluster-config-specs.md
index 8f36dc8..d248be2 100644
--- a/docs/dev/cluster-config-specs.md
+++ b/docs/dev/cluster-config-specs.md
@@ -178,6 +178,21 @@ but it remains a separate CAS step from graph manifest movement. -->
 
 ## ETL pipelines (the second data-plane seam)
 
+> **Scope note (2026-06-10): descoped to a separate project.** Pipelines are
+> a product surface of their own (scheduler, connectors, mapping language,
+> idempotency, run ledger) and will be designed and built outside the cluster
+> control-plane track. What this spec retains is the **socket** they plug
+> into, which is already enforced: (1) the `pipelines:` config field is
+> reserved — `cluster validate` rejects it with a typed `future_phase_field`
+> diagnostic, so it can never be silently squatted; (2) the typed address
+> form `pipeline.` and the `Pipeline` resource kind are reserved in the
+> resource model; (3) axiom 13 fixes the contract any future implementation
+> must satisfy — the pipeline *definition* is a reconciled cluster resource,
+> its *execution* is data-plane and never reconciled. The design text below
+> stands as the requirements record for that project, not as a phase of this
+> one.
+
+
 External data — from another database, an API, a file drop, a stream — is a first-class config asset, not glue code that lives nowhere. A **Pipeline** is declared in config: a **source** (e.g. `notion`, `github`, `slack`, `gdrive`, `postgres`, `http`, `s3-files`, `kafka`), an optional **schedule/trigger**, and **one or more target graphs**, each with its own **mapping/transform** (external records → graph types & properties). A single feed can **fan out across graphs** — e.g. a GitHub sync that populates both the `engineering` graph and the people/teams in `knowledge`. It is reconciled like any resource — `apply` creates / updates / deletes / (re)schedules the pipeline *definition*. This is the canonical "company brain" move: the deployment's graphs are continuously assembled from the SaaS tools the org already uses.