docs(cluster): descope ETL pipelines to a separate project; keep the socket (#172)

Pipelines (scheduler, connectors, mapping, idempotency, run ledger) leave the cluster control-plane rollout and become their own project with their own RFC.

This rollout guarantees only the socket, all of which already exists and is enforced: the pipelines: config field is reserved (typed future_phase_field rejection, test-covered), the pipeline.<name> typed address and Pipeline resource kind are reserved in the resource model, and axiom 13 fixes the contract any future implementation must satisfy (definition reconciled, execution data-plane, fan-out statusful).

The ETL section in the high-level spec stands as the requirements record for that project; exit criterion 9 defers to its RFC.

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-24 02:38:06 +02:00 · 2026-06-10 14:53:16 +03:00
commit 61da7bf406
parent 14b85a59de
2 changed files with 32 additions and 8 deletions
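
The reserved-field rejection named in the commit message can be pictured with a minimal sketch. This is not the project's validator: only the `pipelines` field name and the `future_phase_field` diagnostic code come from the commit; the `Diagnostic` type, the `validate_reserved_fields` function, the message wording, and the use of Python are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical registry of top-level fields reserved for a later phase.
# Only `pipelines` is named in the commit; the set itself is an assumption.
RESERVED_FUTURE_FIELDS = frozenset({"pipelines"})


@dataclass(frozen=True)
class Diagnostic:
    """Hypothetical typed diagnostic; `code` is the machine-readable kind."""
    code: str
    field: str
    message: str


def validate_reserved_fields(config: dict) -> list[Diagnostic]:
    """Return one typed diagnostic per reserved field present in `config`.

    The field is rejected whether or not its value is empty, so a config
    cannot quietly start using the name before the feature exists.
    """
    diagnostics = []
    for field in sorted(config):
        if field in RESERVED_FUTURE_FIELDS:
            diagnostics.append(
                Diagnostic(
                    code="future_phase_field",
                    field=field,
                    message=f"`{field}:` is reserved for a future phase",
                )
            )
    return diagnostics
```

Under these assumptions, a config containing only other fields yields no diagnostics, while any config that sets `pipelines:` yields exactly one diagnostic whose code is `future_phase_field`. Callers can branch on the code rather than parse the message text.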
--- a/docs/dev/cluster-config-specs.md
+++ b/docs/dev/cluster-config-specs.md
@@ -178,6 +178,21 @@ but it remains a separate CAS step from graph manifest movement. -->

 ## ETL pipelines (the second data-plane seam)

+> **Scope note (2026-06-10): descoped to a separate project.** Pipelines are
+> a product surface of their own (scheduler, connectors, mapping language,
+> idempotency, run ledger) and will be designed and built outside the cluster
+> control-plane track. What this spec retains is the **socket** they plug
+> into, which is already enforced: (1) the `pipelines:` config field is
+> reserved — `cluster validate` rejects it with a typed `future_phase_field`
+> diagnostic, so it can never be silently squatted; (2) the typed address
+> form `pipeline.<name>` and the `Pipeline` resource kind are reserved in the
+> resource model; (3) axiom 13 fixes the contract any future implementation
+> must satisfy — the pipeline *definition* is a reconciled cluster resource,
+> its *execution* is data-plane and never reconciled. The design text below
+> stands as the requirements record for that project, not as a phase of this
+> one.
+
+
 External data — from another database, an API, a file drop, a stream — is a first-class config asset, not glue code that lives nowhere.

 A **Pipeline** is declared in config: a **source** (e.g. `notion`, `github`, `slack`, `gdrive`, `postgres`, `http`, `s3-files`, `kafka`), an optional **schedule/trigger**, and **one or more target graphs**, each with its own **mapping/transform** (external records → graph types & properties). A single feed can **fan out across graphs** — e.g. a GitHub sync that populates both the `engineering` graph and the people/teams in `knowledge`. It is reconciled like any resource — `apply` creates / updates / deletes / (re)schedules the pipeline *definition*. This is the canonical "company brain" move: the deployment's graphs are continuously assembled from the SaaS tools the org already uses.