feat(connections): add execute-only warehouses; stop silent full-project scans

Every configured warehouse was treated as a scan/ingest target. The only way to
use a connection purely for SQL execution (ktx sql / sql_execution) was to leave
setup.database_connection_ids empty, which is a leaky workaround: the empty list
takes the 'fall back to all' branch and re-includes every warehouse. As a result,
a BigQuery connection meant only for read-only queries triggered a scan of the
full billing project.

- Add a per-connection scan_enabled flag (default true) to warehouse connections.
  scan_enabled: false registers the connection for execution only and never as a
  scan target.
- Route every scan-target selection path through one predicate
  (isScanTargetWarehouse): both ingest (primaryWarehouseConnectionIds, including
  the all-warehouses fallback) and setup (configuredPrimaryConnectionIds) now
  exclude execute-only connections. Setup validates the credential but skips
  scope discovery and scan for them. Execution paths are untouched — the warehouse
  descriptor still resolves, so ktx sql / sql_execution keep working.
- Scripted setup with no --database-schema no longer silently scopes the scan to
  every discovered schema/dataset: it warns with the count and says how to narrow
  the scan (--database-schema) or opt out (scan_enabled: false).
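
The single-predicate routing described in the second bullet can be sketched as
follows. This is a minimal illustration, not the code in this commit: only
`scan_enabled`, `isScanTargetWarehouse`, and `primaryWarehouseConnectionIds` are
names taken from the message above. The types, the function signature, and the
assumption that every entry in the map is a warehouse are hypothetical.

```typescript
// Hypothetical connection shape; only `scan_enabled` comes from the commit message.
interface WarehouseConnection {
  driver: string;
  scan_enabled?: boolean; // absent means true, the stated default
}

type Connections = Record<string, WarehouseConnection>;

// The one predicate: a warehouse is a scan target unless it opts out explicitly.
function isScanTargetWarehouse(conn: WarehouseConnection): boolean {
  return conn.scan_enabled !== false;
}

// Assumed signature. An empty `requestedIds` falls back to all warehouses,
// and both branches are filtered through the same predicate.
function primaryWarehouseConnectionIds(
  connections: Connections,
  requestedIds: string[] = [],
): string[] {
  const candidates =
    requestedIds.length > 0 ? requestedIds : Object.keys(connections);
  return candidates.filter((id) => {
    const conn = connections[id];
    return conn !== undefined && isScanTargetWarehouse(conn);
  });
}

// Illustrative config: one regular warehouse, one execute-only connection.
const connections: Connections = {
  analytics_pg: { driver: "postgres" },
  public_bq: { driver: "bigquery", scan_enabled: false },
};

// Fallback path: the execute-only connection is no longer re-included.
const scanTargets = primaryWarehouseConnectionIds(connections);

// Explicit path: naming an execute-only connection still yields no scan target.
const explicitTargets = primaryWarehouseConnectionIds(connections, ["public_bq"]);
```

Keeping the opt-out check in one function means the fallback branch cannot drift
from the explicit branch, which is the failure mode the old workaround exposed.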
Andrey Avtomonov 2026-06-09 13:05:15 +02:00
parent a02fcab487
commit ece0dfb2c8
10 changed files with 187 additions and 3 deletions


@@ -15,6 +15,10 @@ Use `ktx sql` with a required connection id and positional SQL text.
ktx sql --connection <id> [options] <sql...>
```
`ktx sql` runs against any configured connection, whether or not it is a scan or
ingest target. Connections marked `scan_enabled: false` (execute-only) work here
too — see [execute-only connections](/docs/configuration/ktx-yaml#execute-only-connections).
## Options
Use output flags to choose between terminal display, TSV rows, and structured


@@ -158,6 +158,29 @@ connections:
dataset_ids: [analytics, mart]
```
#### Execute-only connections
Set `scan_enabled: false` to register a warehouse for SQL execution only. The
connection is usable by `ktx sql` and the agent `sql_execution` tool, but **ktx**
never introspects, scans, or ingests it — and `ktx setup` validates the
credential without discovering or scanning its schemas. This is the supported way
to run read-only queries against shared or public data (for example a BigQuery
billing project full of unrelated datasets) without making it a context source.
```yaml
connections:
  public_bq:
    driver: bigquery
    credentials_json: file:./service-account.json
    scan_enabled: false
```
`scan_enabled` defaults to `true`, so a warehouse without it is a scan target. In
scripted setup (`--no-input`) with no `--database-schema` and no
`dataset_ids`/`schemas`, **ktx** scopes the scan to every schema or dataset the
credential can see and prints a warning that reports how many it found. Pass
`--database-schema` to narrow the scan, or set `scan_enabled: false` to register
the connection for execution only.
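
The scoping rule in the paragraph above can be sketched as a small decision
function. This is an illustrative sketch only: `resolveScanScope`, its input
fields, the warning wording, and the precedence of `--database-schema` over
`dataset_ids`/`schemas` are assumptions, not the actual **ktx** implementation.

```typescript
// Hypothetical input; field names are illustrative, not ktx internals.
interface ScopeInput {
  scanEnabled: boolean;        // value of `scan_enabled` (default true)
  cliSchemas: string[];        // values passed via --database-schema
  configuredSchemas: string[]; // `dataset_ids` / `schemas` from the config
  discoveredSchemas: string[]; // everything the credential can see
}

interface ScopeDecision {
  schemas: string[];
  warning?: string;
}

function resolveScanScope(input: ScopeInput): ScopeDecision {
  // Execute-only connections are never scanned.
  if (!input.scanEnabled) {
    return { schemas: [] };
  }
  // Assumed precedence: an explicit CLI flag wins over configured scope.
  if (input.cliSchemas.length > 0) {
    return { schemas: input.cliSchemas };
  }
  if (input.configuredSchemas.length > 0) {
    return { schemas: input.configuredSchemas };
  }
  // No scope given: scan everything, but say so and report the count.
  const count = input.discoveredSchemas.length;
  return {
    schemas: input.discoveredSchemas,
    warning:
      `scanning all ${count} discovered schemas/datasets; ` +
      "pass --database-schema to narrow, or set scan_enabled: false",
  };
}

// Unscoped scripted setup: full scope plus a warning.
const unscoped = resolveScanScope({
  scanEnabled: true,
  cliSchemas: [],
  configuredSchemas: [],
  discoveredSchemas: ["analytics", "mart", "scratch"],
});

// Execute-only connection: nothing to scan, nothing to warn about.
const executeOnly = resolveScanScope({
  scanEnabled: false,
  cliSchemas: [],
  configuredSchemas: [],
  discoveredSchemas: [],
});
```

Returning the warning as data rather than printing it inside the function keeps
the decision easy to test; the caller decides how to surface it.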
For Postgres, MySQL, SQL Server, and Snowflake connections, set
`maxConnections` when scan or ingest work needs to stay below the target's
connection cap. Postgres, MySQL, and SQL Server default to `10`; Snowflake