mirror of
https://github.com/Kaelio/ktx.git
synced 2026-06-22 08:38:08 +02:00
feat(connections): add execute-only warehouses; stop silent full-project scans
A configured warehouse was always a scan/ingest target. The only way to use a connection purely for SQL execution (ktx sql / sql_execution) was the leaky workaround of an empty setup.database_connection_ids, which actually re-includes every warehouse via the 'fall back to all' branch, so e.g. a BigQuery connection meant only for read-only queries triggered a full-billing-project scan.

- Add a per-connection scan_enabled flag (default true) to warehouse connections. scan_enabled: false registers the connection for execution only and never as a scan target.
- Route every scan-target selection path through one predicate (isScanTargetWarehouse): both ingest (primaryWarehouseConnectionIds, including the all-warehouses fallback) and setup (configuredPrimaryConnectionIds) now exclude execute-only connections. Setup validates the credential but skips scope discovery and scan for them. Execution paths are untouched: the warehouse descriptor still resolves, so ktx sql / sql_execution keep working.
- Scripted setup with no --database-schema no longer silently scopes the scan to every discovered schema/dataset: it warns with the count and names how to narrow (--database-schema) or opt out (scan_enabled: false).
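The single-predicate routing described in the commit message could look roughly like the sketch below. It is not the code from this commit: only the names `isScanTargetWarehouse` and `scan_enabled` appear in the source, while the `WarehouseConnection` shape, the `selectScanTargets` helper, and the sample connection ids are hypothetical and exist purely for illustration.

```typescript
// Hypothetical connection shape; only `scan_enabled` is taken from the commit.
interface WarehouseConnection {
  driver: string;
  scan_enabled?: boolean; // treated as true when omitted
}

// A warehouse is a scan target unless it explicitly opts out.
function isScanTargetWarehouse(conn: WarehouseConnection): boolean {
  return conn.scan_enabled !== false;
}

// Hypothetical selection helper: use the requested ids when given, otherwise
// fall back to all warehouses. Both paths pass through the same predicate.
function selectScanTargets(
  connections: Record<string, WarehouseConnection>,
  requestedIds: string[] = [],
): string[] {
  const candidates =
    requestedIds.length > 0 ? requestedIds : Object.keys(connections);
  return candidates.filter((id) => {
    const conn = connections[id];
    return conn !== undefined && isScanTargetWarehouse(conn);
  });
}

// Illustrative data: one ordinary warehouse and one execute-only warehouse.
const sampleConnections: Record<string, WarehouseConnection> = {
  analytics_pg: { driver: "postgres" },
  public_bq: { driver: "bigquery", scan_enabled: false },
};

const fallbackTargets = selectScanTargets(sampleConnections);
const explicitTargets = selectScanTargets(sampleConnections, [
  "public_bq",
  "analytics_pg",
]);
```

Under these assumptions, the empty-list fallback and an explicit id list both leave `public_bq` out, which is the behavior the commit message describes for the ingest and setup paths.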
parent a02fcab487
commit ece0dfb2c8
10 changed files with 187 additions and 3 deletions
@@ -15,6 +15,10 @@ Use `ktx sql` with a required connection id and positional SQL text.
ktx sql --connection <id> [options] <sql...>
```

`ktx sql` runs against any configured connection, whether or not it is a scan or
ingest target. Connections marked `scan_enabled: false` (execute-only) work here
too; see [execute-only connections](/docs/configuration/ktx-yaml#execute-only-connections).

## Options

Use output flags to choose between terminal display, TSV rows, and structured
@@ -158,6 +158,29 @@ connections:
dataset_ids: [analytics, mart]
```

#### Execute-only connections

Set `scan_enabled: false` to register a warehouse for SQL execution only. The
connection is usable by `ktx sql` and the agent `sql_execution` tool, but **ktx**
never introspects, scans, or ingests it, and `ktx setup` validates the
credential without discovering or scanning its schemas. This is the supported way
to run read-only queries against shared or public data (for example a BigQuery
billing project full of unrelated datasets) without making it a context source.

```yaml
connections:
  public_bq:
    driver: bigquery
    credentials_json: file:./service-account.json
    scan_enabled: false
```

Without `scan_enabled`, a warehouse is a scan target. In scripted setup
(`--no-input`) with no `--database-schema` and no `dataset_ids`/`schemas`, **ktx**
scopes the scan to every schema or dataset the credential can see and prints a
warning naming the count; pass `--database-schema` to narrow it, or
`scan_enabled: false` to register it for execution only.

For Postgres, MySQL, SQL Server, and Snowflake connections, set
`maxConnections` when scan or ingest work needs to stay below the target's
connection cap. Postgres, MySQL, and SQL Server default to `10`; Snowflake
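The scripted-setup behavior documented in the second hunk (scan everything the credential can see, but warn with the count) might be modeled as in the following sketch. The `ScanScope` type, the `resolveScanScope` function, and the warning wording are all hypothetical and are not taken from the ktx codebase. The `requested` parameter stands in for whatever `--database-schema`, `dataset_ids`, or `schemas` supply.

```typescript
// Hypothetical result type: the schemas to scan plus an optional warning.
interface ScanScope {
  schemas: string[];
  warning?: string;
}

// Hypothetical helper. `discovered` is every schema or dataset the credential
// can see; `requested` is the explicit narrowing, if any.
function resolveScanScope(
  discovered: string[],
  requested: string[] = [],
): ScanScope {
  if (requested.length > 0) {
    // Narrowed scan: keep only requested schemas that were actually discovered.
    return { schemas: discovered.filter((name) => requested.includes(name)) };
  }
  // No narrowing given: scan everything, but say so instead of staying silent.
  return {
    schemas: [...discovered],
    warning:
      `Scanning all ${discovered.length} discovered schemas/datasets. ` +
      "Pass --database-schema to narrow the scan, or set " +
      "scan_enabled: false to register the connection for execution only.",
  };
}

const unscoped = resolveScanScope(["analytics", "mart", "scratch"]);
const narrowed = resolveScanScope(["analytics", "mart", "scratch"], ["mart"]);
```

In this sketch `unscoped` carries all three names plus a warning that mentions the count, while `narrowed` contains only `mart` and no warning.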