diff --git a/docs-site/content/docs/cli-reference/ktx-setup.mdx b/docs-site/content/docs/cli-reference/ktx-setup.mdx
index 87fcddaa..a52a3eba 100644
--- a/docs-site/content/docs/cli-reference/ktx-setup.mdx
+++ b/docs-site/content/docs/cli-reference/ktx-setup.mdx
@@ -103,7 +103,7 @@ runtime features are missing.
 
 | Flag | Description |
 |------|-------------|
-| `--database ` | Database driver to configure; repeatable. Choices: `sqlite`, `postgres`, `mysql`, `sqlserver`, `bigquery`, `snowflake` |
+| `--database ` | Database driver to configure; repeatable. Choices: `sqlite`, `postgres`, `mysql`, `clickhouse`, `sqlserver`, `bigquery`, `snowflake` |
 | `--database-connection-id ` | Existing selected connection id; repeatable. With `--database` or `--database-url`, connection id for the new connection. |
 | `--database-url ` | URL, `env:NAME`, or `file:/path` for one new URL-style database connection; also used as the SQLite path |
 | `--database-schema ` | Database schema or dataset to include; repeatable |
@@ -113,6 +113,10 @@ runtime features are missing.
 context. Use `--skip-databases` only when intentionally leaving the project
 incomplete.
 
+`--database-schema` maps to the driver's scope field: `schemas` for PostgreSQL,
+MySQL, and SQL Server; `schema_names` for Snowflake; `dataset_ids` for
+BigQuery; and `databases` for ClickHouse.
+
 ### Query History
 
 | Flag | Description |
diff --git a/docs-site/content/docs/configuration/ktx-yaml.mdx b/docs-site/content/docs/configuration/ktx-yaml.mdx
index fac6f3f9..4008a45d 100644
--- a/docs-site/content/docs/configuration/ktx-yaml.mdx
+++ b/docs-site/content/docs/configuration/ktx-yaml.mdx
@@ -109,9 +109,9 @@ context-source drivers share the map.
 | `mysql` | Warehouse | `driver` | `url`, `enabled_tables` |
 | `sqlite` | Warehouse | `driver` | `url` or `path`, `enabled_tables` |
 | `sqlserver` | Warehouse | `driver` | `url`, `enabled_tables` |
-| `bigquery` | Warehouse | `driver` | `url`, `enabled_tables`, `historicSql` |
-| `snowflake` | Warehouse | `driver` | `url`, `enabled_tables`, `historicSql` |
-| `clickhouse` | Warehouse | `driver` | `url`, `enabled_tables` |
+| `bigquery` | Warehouse | `driver` | `credentials_json`, `dataset_ids`, `enabled_tables`, `historicSql` |
+| `snowflake` | Warehouse | `driver` | `schema_names`, `enabled_tables`, `historicSql` |
+| `clickhouse` | Warehouse | `driver` | `url`, `database`, `databases`, `enabled_tables` |
 | `metabase` | Context source | `driver`, `api_url` | `api_key_ref`, `mappings` |
 | `looker` | Context source | `driver`, `base_url`, `client_id` | `client_secret_ref`, `mappings` |
 | `lookml` | Context source | `driver`, `repoUrl` | `branch`, `path`, `auth_token_ref`, `mappings` |
@@ -136,6 +136,27 @@ connections:
       - public.customers
 ```
 
+Connector-specific scope fields let setup and scan use the same warehouse
+boundary:
+
+```yaml
+connections:
+  mysql-warehouse:
+    driver: mysql
+    url: env:MYSQL_URL
+    schemas: [analytics, mart]
+  clickhouse-warehouse:
+    driver: clickhouse
+    url: env:CLICKHOUSE_URL
+    database: analytics
+    databases: [analytics, mart]
+  bigquery-warehouse:
+    driver: bigquery
+    credentials_json: file:./service-account.json
+    location: US
+    dataset_ids: [analytics, mart]
+```
+
 For Postgres, BigQuery, and Snowflake, `historicSql` and `context.queryHistory`
 toggle query-history ingest. The shape is connector-specific; the setup wizard
 writes these fields when you pass `--enable-query-history`.
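Reviewer sketch (not part of the patch): the two doc changes above describe how `--database-schema` values land in a driver-specific scope field. The mapping can be illustrated roughly as below. This is a minimal Python sketch, not ktx's implementation; `SCOPE_FIELD_BY_DRIVER` and `apply_database_schema` are hypothetical names, and raising an error for drivers with no documented scope field (such as `sqlite`) is an assumption.

```python
from __future__ import annotations

# Driver -> scope field, as listed in the ktx-setup.mdx change.
# The constant and function names are hypothetical, for illustration only.
SCOPE_FIELD_BY_DRIVER = {
    "postgres": "schemas",
    "mysql": "schemas",
    "sqlserver": "schemas",
    "snowflake": "schema_names",
    "bigquery": "dataset_ids",
    "clickhouse": "databases",
}


def apply_database_schema(connection: dict, values: list[str]) -> dict:
    """Return a copy of `connection` with `values` stored in its scope field."""
    driver = connection["driver"]
    field = SCOPE_FIELD_BY_DRIVER.get(driver)
    if field is None:
        # Assumption: drivers without a documented scope field are rejected.
        raise ValueError(f"no scope field documented for driver {driver!r}")
    updated = dict(connection)  # shallow copy; the input is left unchanged
    updated[field] = list(values)
    return updated


if __name__ == "__main__":
    clickhouse = {
        "driver": "clickhouse",
        "url": "env:CLICKHOUSE_URL",
        "database": "analytics",
    }
    print(apply_database_schema(clickhouse, ["analytics", "mart"]))
```

For ClickHouse the single `database` key is left untouched, which matches the documented split between the connection default and the `databases` scan scope.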
diff --git a/docs-site/content/docs/integrations/primary-sources.mdx b/docs-site/content/docs/integrations/primary-sources.mdx
index da916339..5e9483f9 100644
--- a/docs-site/content/docs/integrations/primary-sources.mdx
+++ b/docs-site/content/docs/integrations/primary-sources.mdx
@@ -1,6 +1,6 @@
 ---
 title: Primary Sources
-description: Connect ktx to PostgreSQL, Snowflake, BigQuery, MySQL, SQL Server, or SQLite.
+description: Connect ktx to PostgreSQL, Snowflake, BigQuery, MySQL, ClickHouse, SQL Server, or SQLite.
 ---
 
 **ktx** connects to your data warehouse or database to build schema context,
@@ -26,7 +26,7 @@ Agents should prefer environment or file references over literal secrets.
 
 | Field | Required | Applies to | Description |
 |-------|----------|------------|-------------|
-| `driver` | Yes | all connections | Connector driver such as `postgres`, `snowflake`, `bigquery`, `mysql`, `sqlserver`, or `sqlite` |
+| `driver` | Yes | all connections | Connector driver such as `postgres`, `snowflake`, `bigquery`, `mysql`, `clickhouse`, `sqlserver`, or `sqlite` |
 | `url` | One of the connection methods | URL-style connectors | Database URL, `env:NAME`, or `file:/path/to/secret` |
 | `host`, `port`, `database`, `username`, `password` | One of the connection methods | PostgreSQL, MySQL, SQL Server | Field-by-field connection values |
 | `schema` or `schemas` | No | schema-aware warehouses | Single schema or list of schemas to scan |
@@ -216,6 +216,10 @@ For multiple datasets:
       - finance
 ```
 
+BigQuery dataset scope is stored in `connections..dataset_ids`. Interactive
+setup discovers datasets from credentials plus location, then writes the chosen
+dataset ids as the scan scope.
+
 ### Authentication
 
 | Method | Config |
@@ -282,6 +286,10 @@ connections:
     url: env:MYSQL_DATABASE_URL
 ```
 
+MySQL supports selecting one or more databases during `ktx setup`. The selected
+database scope is stored in `connections..schemas`, and `ktx scan` reads
+exactly those databases.
+
 Or with individual fields:
 
 ```yaml title="ktx.yaml"
@@ -320,12 +328,66 @@ connections:
 
 - Parameter binding uses positional `?` placeholders
 - Uses `LIMIT X OFFSET Y` for pagination
-- Single database per connection (no multi-schema)
+- Multi-database scanning uses `schemas` as the selected database list
 - Supports 20+ MySQL types including `enum`, `json`, `datetime`, `decimal`
 - Table comments extracted with InnoDB metadata prefix stripping
 
 ---
 
+## ClickHouse
+
+Connects to ClickHouse over HTTP. Supports table and column introspection across
+one or more selected databases.
+
+### Connection config
+
+```yaml title="ktx.yaml"
+connections:
+  my-clickhouse:
+    driver: clickhouse
+    url: env:CLICKHOUSE_DATABASE_URL
+    database: analytics
+```
+
+For multiple databases:
+
+```yaml
+    databases:
+      - analytics
+      - mart
+```
+
+ClickHouse supports selecting one or more databases during `ktx setup`. The
+selected scan scope is stored in `connections..databases`. The single
+`database` field remains the connection default for raw SQL and `ktx sql`.
+
+### Authentication
+
+| Method | Config |
+|--------|--------|
+| URL | `url: env:CLICKHOUSE_DATABASE_URL` |
+| Password | `password: env:CLICKHOUSE_PASSWORD` or `password: file:/path/to/secret` |
+
+### Features
+
+| Feature | Supported | Notes |
+|---------|-----------|-------|
+| Tables & views | Yes | Via `system.tables` |
+| Primary keys | No | Not exposed as relational constraints |
+| Foreign keys | No | Not available in ClickHouse |
+| Row count estimates | Yes | From ClickHouse metadata where available |
+| Column statistics | No | - |
+| Query history | No | - |
+| Table sampling | Yes | Uses ClickHouse sampling syntax when supported |
+
+### Dialect notes
+
+- Parameter binding uses named placeholders
+- The `database` field sets the default database for SQL execution
+- The `databases` array controls the scan scope
+
+---
+
 ## SQL Server
 
 Connects to Microsoft SQL Server and Azure SQL.
Supports multi-schema scanning with `dbo` as the default schema.