feat(connectors): generalize readiness and constraint handling (#212)

* feat(connectors): add postgres maxConnections
* feat(connectors): add mysql maxConnections
* feat(connectors): add sqlserver maxConnections
* feat(connectors): rename snowflake pool config
* docs: document connector maxConnections
* feat(scan): add constraint discovery warning helper
* feat(scan): carry structural warnings through reports
* feat(postgres): soft-fail denied constraint discovery
* feat(mysql): soft-fail denied constraint discovery
* feat(sqlserver): soft-fail denied constraint discovery
* feat(bigquery): soft-fail denied primary key discovery
* feat(snowflake): report denied primary key discovery
* test(scan): verify constraint discovery warnings
* feat(historic-sql): use shared readiness probes
* docs: document query history readiness probes
* test(historic-sql): verify readiness probe registry
* test(ingest): account for live database warnings artifact
* Add skip option for agent setup
2026-06-25 08:48:08 +02:00 · 2026-05-24 19:30:06 +02:00 · 78b8a0c025
commit 78b8a0c025
parent cfd1749ab9
42 changed files with 2763 additions and 554 deletions
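
The documentation change in this commit says that `maxConnections` caps all concurrent SQL work for a connector instance, and that `relationships.profileConcurrency` is additionally bounded by it for pooled connectors. A minimal sketch of that interaction follows, assuming the bound is a simple minimum of the two values. The names `PooledConnector`, `DEFAULT_MAX_CONNECTIONS`, and `effectiveProfileConcurrency` are hypothetical and are not taken from the codebase; only the default values (`10` and `4`) come from the docs text.

```typescript
// Illustrative only: these names are hypothetical, not part of the project.
type PooledConnector = "postgres" | "mysql" | "sqlserver" | "snowflake";

// Defaults as stated in the documentation change in this commit.
const DEFAULT_MAX_CONNECTIONS: Record<PooledConnector, number> = {
  postgres: 10,
  mysql: 10,
  sqlserver: 10,
  snowflake: 4,
};

// Assumption: both settings must be positive integers.
function requirePositiveInt(name: string, value: number): void {
  if (!Number.isInteger(value) || value <= 0) {
    throw new RangeError(`${name} must be a positive integer, got ${value}`);
  }
}

// Assumption: the effective concurrency is the smaller of the requested
// profile concurrency and the connection cap (explicit or default).
function effectiveProfileConcurrency(
  profileConcurrency: number,
  connector: PooledConnector,
  maxConnections?: number,
): number {
  const cap = maxConnections ?? DEFAULT_MAX_CONNECTIONS[connector];
  requirePositiveInt("profileConcurrency", profileConcurrency);
  requirePositiveInt("maxConnections", cap);
  return Math.min(profileConcurrency, cap);
}
```

Under these assumptions, requesting `profileConcurrency: 8` against a Snowflake connection with the default cap would run at most 4 queries at once, while the same request against Postgres with `maxConnections: 2` would run at most 2.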
--- a/docs-site/content/docs/configuration/ktx-yaml.mdx
+++ b/docs-site/content/docs/configuration/ktx-yaml.mdx
@@ -157,11 +157,14 @@ connections:
    dataset_ids: [analytics, mart]
 ```

-For Snowflake connections, set `maxSessions` when deep ingest needs more or
-fewer concurrent warehouse sessions. The default is `4`. This caps all
-concurrent Snowflake SQL work for that connector instance, including schema
-introspection, table sampling, relationship profiling, relationship
-validation, and read-only SQL execution.
+For Postgres, MySQL, SQL Server, and Snowflake connections, set
+`maxConnections` when scan or ingest work needs to stay below the target's
+connection cap. Postgres, MySQL, and SQL Server default to `10`; Snowflake
+defaults to `4`. This caps all concurrent SQL work for that connector instance,
+including schema introspection, table sampling, relationship profiling,
+relationship validation, and read-only SQL execution. BigQuery and ClickHouse
+do not expose `maxConnections` because their connectors don't use client-side
+connection pools.

 For Postgres, BigQuery, and Snowflake, `historicSql` and `context.queryHistory`
 toggle query-history ingest. The shape is connector-specific; the setup wizard
@@ -517,7 +520,7 @@ the manifest.
 | `relationships.maxLlmTablesPerBatch` | `int > 0` | `40` | Max tables included in a single LLM relationship-proposal batch. |
 | `relationships.maxCandidatesPerColumn` | `int > 0` | `25` | Max join partners considered per column. |
 | `relationships.profileSampleRows` | `int > 0` | `10000` | Rows sampled per table when profiling values for relationship inference. |
-| `relationships.profileConcurrency` | `int > 0` | `4` | Parallel relationship-profile queries against the database. For Snowflake, effective database concurrency is also bounded by the connection's `maxSessions`. |
+| `relationships.profileConcurrency` | `int > 0` | `4` | Parallel relationship-profile queries against the database. For pooled connectors, effective database concurrency is also bounded by the connection's `maxConnections`. |
 | `relationships.validationConcurrency` | `int > 0` | `4` | Parallel relationship validation queries against the database. |
 | `relationships.validationBudget` | `all` \| `int ≥ 0` | runtime default | Cap on validation queries per scan. `all` means unlimited. |