docs(cluster): descope ETL pipelines to a separate project; keep the socket (#172)

Pipelines (scheduler, connectors, mapping, idempotency, run ledger) leave the
cluster control-plane rollout and become their own project with their own
RFC. This rollout guarantees only the socket, every part of which already
exists and is enforced: the pipelines: config field is reserved (typed
future_phase_field rejection, test-covered), the pipeline.<name> typed
address and Pipeline resource kind are reserved in the resource model, and
axiom 13 fixes the contract any future implementation must satisfy
(definition reconciled, execution data-plane, fan-out statusful). The ETL
section in the high-level spec stands as the requirements record for that
project; exit criterion 9 defers to its RFC.

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Andrew Altshuler 2026-06-10 14:53:16 +03:00 committed by GitHub
parent 14b85a59de
commit 61da7bf406
2 changed files with 32 additions and 8 deletions

@@ -178,6 +178,21 @@ but it remains a separate CAS step from graph manifest movement. -->
## ETL pipelines (the second data-plane seam)
> **Scope note (2026-06-10): descoped to a separate project.** Pipelines are
> a product surface of their own (scheduler, connectors, mapping language,
> idempotency, run ledger) and will be designed and built outside the cluster
> control-plane track. What this spec retains is the **socket** they plug
> into, which is already enforced: (1) the `pipelines:` config field is
> reserved — `cluster validate` rejects it with a typed `future_phase_field`
> diagnostic, so it can never be silently squatted; (2) the typed address
> form `pipeline.<name>` and the `Pipeline` resource kind are reserved in the
> resource model; (3) axiom 13 fixes the contract any future implementation
> must satisfy — the pipeline *definition* is a reconciled cluster resource,
> its *execution* is data-plane and never reconciled. The design text below
> stands as the requirements record for that project, not as a phase of this
> one.
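
The reserved-field behaviour in point (1) can be illustrated with a minimal sketch. It is not the actual `cluster validate` implementation: the `Diagnostic` shape, the `validate_config` function, and the `graphs` key in the usage example are all hypothetical. Only the `pipelines:` field name and the `future_phase_field` diagnostic code come from the text above.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Assumption: reserved top-level fields are tracked as a simple set.
RESERVED_FUTURE_FIELDS = {"pipelines"}


@dataclass(frozen=True)
class Diagnostic:
    """Hypothetical typed diagnostic; the real schema may differ."""
    code: str
    field: str
    message: str


def validate_config(config: Dict[str, Any]) -> List[Diagnostic]:
    """Return one typed diagnostic per reserved field present in the config."""
    diagnostics = []
    for name in sorted(config):
        if name in RESERVED_FUTURE_FIELDS:
            diagnostics.append(
                Diagnostic(
                    code="future_phase_field",
                    field=name,
                    message=f"'{name}:' is reserved for a future phase",
                )
            )
    return diagnostics


# Usage: a config that tries to set the reserved field is rejected,
# while one that omits it passes cleanly.
rejected = validate_config({"graphs": {}, "pipelines": {}})
accepted = validate_config({"graphs": {}})
```

Rejecting the field outright, rather than ignoring it, is what keeps the name from being silently squatted: a config that sets `pipelines:` today fails loudly instead of changing meaning once the separate project ships.
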
External data — from another database, an API, a file drop, a stream — is a first-class config asset, not glue code that lives nowhere.
A **Pipeline** is declared in config: a **source** (e.g. `notion`, `github`, `slack`, `gdrive`, `postgres`, `http`, `s3-files`, `kafka`), an optional **schedule/trigger**, and **one or more target graphs**, each with its own **mapping/transform** (external records → graph types & properties). A single feed can **fan out across graphs** — e.g. a GitHub sync that populates both the `engineering` graph and the people/teams in `knowledge`. It is reconciled like any resource — `apply` creates / updates / deletes / (re)schedules the pipeline *definition*. This is the canonical "company brain" move: the deployment's graphs are continuously assembled from the SaaS tools the org already uses.
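
To make the shape of a declaration concrete, here is a rough sketch of a pipeline definition with fan-out across two graphs. The class names, the flat field-to-property mapping, and the `github-sync` example are assumptions for illustration. The spec fixes only the concepts: source, optional schedule, one or more target graphs each with its own mapping, and the `pipeline.<name>` address form.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class Target:
    """One target graph plus its own mapping (hypothetical shape)."""
    graph: str
    mapping: Dict[str, str]  # external record field -> graph property


@dataclass(frozen=True)
class Pipeline:
    name: str
    source: str
    targets: Tuple[Target, ...]
    schedule: Optional[str] = None  # optional schedule/trigger

    def __post_init__(self) -> None:
        if not self.targets:
            raise ValueError("a pipeline needs at least one target graph")

    @property
    def address(self) -> str:
        # Typed address form reserved in the resource model.
        return f"pipeline.{self.name}"

    def fan_out(self, record: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
        """Apply each target's mapping to one external record."""
        return {
            t.graph: {
                prop: record[src]
                for src, prop in t.mapping.items()
                if src in record
            }
            for t in self.targets
        }


# Usage: one GitHub feed populating both `engineering` and `knowledge`.
sync = Pipeline(
    name="github-sync",
    source="github",
    targets=(
        Target("engineering", {"repo": "repository", "login": "author"}),
        Target("knowledge", {"login": "person", "team": "team"}),
    ),
)
routed = sync.fan_out({"login": "ada", "repo": "core", "team": "platform"})
```

Only the `Pipeline` value itself corresponds to the reconciled *definition*. Calling `fan_out` stands in for execution, which per axiom 13 is data-plane work and never reconciled.
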