feat: Add duckdb connector (#308)

* refactor(duckdb): extract shared json-safe bigint helper

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(duckdb): add and register the duckdb primary connector

  Add KtxDuckDbDialect, KtxDuckDbScanConnector (local file-backed, read-only, never-create, main-schema introspection via information_schema and duckdb_constraints() for foreign keys), and register the duckdb driver across the dialect factory, driver registry, connection-type enum, warehouse descriptor, config schema, scan normalization, connection test drivers, and status display.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(duckdb): route live-database ingest through the DuckDB connector

  Add the DuckDB live-database introspection bridge and dispatch duckdb connections to it in local-adapters, matching the SQLite path. Repoint the config-rejection test off duckdb (now a valid driver) onto the no-driver case.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(duckdb): add duckdb to the setup database flow

  Offer DuckDB in the interactive checklist and via ktx setup --database duckdb, with a file-path prompt and duckdb-local default connection id, parallel to SQLite.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(duckdb): attach native duckdb files in federation

  Native .duckdb members ATTACH with (READ_ONLY) and no TYPE/INSTALL/LOAD, since the duckdb format needs no extension. attachTypeForDriver returns null for the native case; buildAttachStatements builds load statements from non-null types only and emits a conditional ATTACH clause.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(duckdb): document the duckdb primary-source connector

  Add a DuckDB section to the primary-sources integration page (config, read-only never-create behavior, main-schema scope, federation) and update the supported-driver assertion in dialects.test.ts to include duckdb.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(duckdb): use single-namespace display shape for main-only refs

  DuckDB v1 introspects the main schema and sets db=null on every table, so its display refs are single-namespace like SQLite. The ansi shape emitted a 1-part table display it then refused to parse, breaking column-level display resolution. Switch the dialect to the sqlite display shape and add a round-trip test plus a composite-foreign-key test that were missing.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(duckdb): resolve connector dialect via getDialectForDriver

  Route the connector's dialect through the shared factory like every other connector, now that duckdb is registered. Single construction path.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(duckdb): skip schema picker for single-file duckdb setup

  DuckDB is a single-file, single-namespace ('main') database like SQLite, but the setup scope step only skipped the schema picker for sqlite. DuckDB fell into the multi-schema path with an empty schema list, rendering a broken picker ("No matches found" for main). Extend the file-based-driver early-return to cover duckdb so it ingests every table directly.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(duckdb): reuse shared config helper and derive scope skip

  Route duckdb path resolution through the shared resolveStringReference helper instead of a local third copy of env:/file: handling. Derive the setup scope-picker skip from SCOPE_DISCOVERY_SPECS membership rather than a hardcoded sqlite/duckdb driver list.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(duckdb): use a genuinely unknown driver in the rejection test

  The merged "rejects unknown drivers" test used `driver: duckdb` as its unknown-driver stand-in, which stopped being unknown once this branch added the duckdb connector. Switch to `nonsense` so it again exercises the unsupported-driver config error.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(duckdb): cover dialect, connector, and live-introspection branches

  Codecov flagged uncovered branches as dead code; all are real connector, dialect, and live-ingest behavior. Add unit tests instead of removing them.

  - dialect: precedence ladder, sample/clause builders, profiling expressions
  - connector: url/env config forms, error throws, never-create guard, cardinality cap branches, table-scope empty/non-empty paths
  - live-introspection: full-schema and table-scope extraction

  Functions 100%, lines ~99% across the duckdb connector dir.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs: add DuckDB to supported-driver references

  The DuckDB connector PR documented the connector itself but left the scattered supported-driver enumerations stale. Add duckdb to the federation concept page (participation table, activation, table naming, limitations), the ktx setup CLI reference, the ktx.yaml warehouse-driver table, the primary-sources field reference, and the quickstart driver list (which also restores the missing ClickHouse entry).

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Andrey Avtomonov <andreybavt@gmail.com>
2026-07-04 10:52:13 +02:00 · 2026-07-01 19:06:02 +07:00 · 2026-07-01 19:06:02 +07:00 · 3c4fcc27c7
commit 3c4fcc27c7
parent f21594c42a
39 changed files with 1366 additions and 59 deletions
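
The federation change described in the commit message (native `.duckdb` members attach with `(READ_ONLY)` only, while other drivers need `INSTALL`/`LOAD` and a `TYPE`) can be sketched roughly as follows. This is an illustration, not the code from this commit: the function names `attachTypeForDriver` and `buildAttachStatements` come from the commit message, but their signatures, the `FederationMember` shape, the `target` field, and the quoting helpers are all assumptions.

```typescript
// Hypothetical sketch: function names mirror the commit message, bodies are assumed.
type FederatedDriver = "postgres" | "mysql" | "sqlite" | "duckdb";

interface FederationMember {
  id: string; // connection id from ktx.yaml; becomes the catalog name inside DuckDB
  driver: FederatedDriver;
  target: string; // assumed field: connection string or database file path
}

// Returns the DuckDB extension/TYPE name, or null for a native .duckdb file,
// which needs no extension.
function attachTypeForDriver(driver: FederatedDriver): string | null {
  return driver === "duckdb" ? null : driver;
}

// Double-quote an identifier so ids such as `books-db` stay valid SQL.
function quoteIdent(id: string): string {
  return `"${id.replace(/"/g, '""')}"`;
}

// Single-quote a string literal, doubling embedded quotes.
function quoteLiteral(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

function buildAttachStatements(members: FederationMember[]): string[] {
  // Load statements are built from the non-null types only, once per type.
  const types = [
    ...new Set(
      members
        .map((m) => attachTypeForDriver(m.driver))
        .filter((t): t is string => t !== null),
    ),
  ];
  const loads = types.flatMap((t) => [`INSTALL ${t};`, `LOAD ${t};`]);

  // The ATTACH options are conditional: TYPE is omitted for native duckdb files.
  const attaches = members.map((m) => {
    const type = attachTypeForDriver(m.driver);
    const options = type === null ? "READ_ONLY" : `TYPE ${type}, READ_ONLY`;
    return `ATTACH ${quoteLiteral(m.target)} AS ${quoteIdent(m.id)} (${options});`;
  });

  return [...loads, ...attaches];
}
```

Under these assumptions, a `postgres` member plus a `duckdb` member yields one `INSTALL`/`LOAD` pair for `postgres`, an `ATTACH ... (TYPE postgres, READ_ONLY)` for the first member, and an `ATTACH ... (READ_ONLY)` for the `.duckdb` file.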
--- a/docs-site/content/docs/cli-reference/ktx-setup.mdx
+++ b/docs-site/content/docs/cli-reference/ktx-setup.mdx
@@ -120,9 +120,9 @@ runtime features are missing.

 | Flag | Description |
 |------|-------------|
-| `--database <driver>` | Database driver to configure; repeatable. Choices: `sqlite`, `postgres`, `mysql`, `clickhouse`, `sqlserver`, `bigquery`, `snowflake` |
+| `--database <driver>` | Database driver to configure; repeatable. Choices: `sqlite`, `duckdb`, `postgres`, `mysql`, `clickhouse`, `sqlserver`, `bigquery`, `snowflake` |
 | `--database-connection-id <id>` | Existing selected connection id; repeatable. With `--database` or `--database-url`, connection id for the new connection. |
-| `--database-url <url>` | URL, `env:NAME`, or `file:/path` for one new URL-style database connection; also used as the SQLite path |
+| `--database-url <url>` | URL, `env:NAME`, or `file:/path` for one new URL-style database connection; also used as the SQLite or DuckDB path |
 | `--database-schema <schema>` | Database schema or dataset to include; repeatable |
 | `--skip-databases` | Leave database setup incomplete |

--- a/docs-site/content/docs/concepts/cross-database-federation.mdx
+++ b/docs-site/content/docs/concepts/cross-database-federation.mdx
@@ -1,6 +1,6 @@
 ---
 title: Cross-database federation
-description: How ktx federates postgres, mysql, and sqlite connections so a single read-only SQL query can join across them without copying data.
+description: How ktx federates postgres, mysql, sqlite, and duckdb connections so a single read-only SQL query can join across them without copying data.
 ---

 Cross-database federation lets a single read-only SQL query join tables that
@@ -20,13 +20,14 @@ block to add. With zero or one compatible connection the behavior is unchanged.

 ## Which connections participate

-The v1 federation engine supports three drivers:
+The v1 federation engine supports four drivers:

 | Driver | Participates in federation |
 |--------|---------------------------|
 | `postgres` | Yes |
 | `mysql` | Yes |
 | `sqlite` | Yes |
+| `duckdb` | Yes |
 | `snowflake` | No — standalone connection |
 | `bigquery` | No — standalone connection |
 | `clickhouse` | No — standalone connection |
@@ -38,7 +39,7 @@ queried independently; they do not appear as federation members.
 ## How it activates

 **ktx** inspects the connections in `ktx.yaml` at startup. When it finds two or
-more connections whose driver is `postgres`, `mysql`, or `sqlite`, it
+more connections whose driver is `postgres`, `mysql`, `sqlite`, or `duckdb`, it
 instantiates the DuckDB federation engine and attaches each one read-only.
 There is no `federation:` key, no opt-in flag, and no connection-level setting
 to enable. The engine is derived entirely from what is already declared.
@@ -60,9 +61,10 @@ Two attach-compatible connections are present, so federation is active.
 ## Table naming in federated queries

 Inside a federated query, postgres and mysql tables use a three-part name:
-`connectionId.schema.table`. SQLite tables, which have no schema layer in
-DuckDB, use the two-part form `connectionId.table`. In both cases the
-connection's `id` field in `ktx.yaml` becomes the catalog name inside DuckDB.
+`connectionId.schema.table`. SQLite and DuckDB tables use the two-part form
+`connectionId.table`, since ktx addresses both as single-namespace members. In
+both cases the connection's `id` field in `ktx.yaml` becomes the catalog name
+inside DuckDB.

 If a connection `id` is not a bare SQL identifier — for example it contains a
 hyphen, like `books-db` — double-quote it in the query the same way DuckDB
@@ -131,8 +133,8 @@ ktx sql -c _ktx_federated \
 Table names follow the rules from
 [Table naming in federated queries](#table-naming-in-federated-queries):
 three-part `connectionId.schema.table` for postgres and mysql, two-part
-`connectionId.table` for sqlite. The `_ktx_federated` id is virtual — it is
-never written to `ktx.yaml` and only exists when two or more attach-compatible
+`connectionId.table` for sqlite and duckdb. The `_ktx_federated` id is virtual —
+it is never written to `ktx.yaml` and only exists when two or more attach-compatible
 connections are declared. It surfaces in `ktx connection` and in the agent's
 connection list so the id is discoverable. Querying a single member database
 directly with its own connection id (`ktx sql -c pg_books ...`) is unchanged.
@@ -149,6 +151,6 @@ database through the federation engine.
  them in a source's `joins:` block and automatic discovery of cross-database
  relationships are not available yet. Intra-database relationship discovery for
  each member connection is unchanged.
-- **postgres, mysql, and sqlite only.** Other drivers (snowflake, bigquery,
-  clickhouse, sqlserver) do not participate in federation in this version. They
-  remain usable as standalone connections.
+- **postgres, mysql, sqlite, and duckdb only.** Other drivers (snowflake,
+  bigquery, clickhouse, sqlserver) do not participate in federation in this
+  version. They remain usable as standalone connections.
--- a/docs-site/content/docs/configuration/ktx-yaml.mdx
+++ b/docs-site/content/docs/configuration/ktx-yaml.mdx
@@ -109,6 +109,7 @@ context-source drivers share the map.
 | `postgres` | Warehouse | `driver` | `url`, `enabled_tables`, `historicSql`, `context.queryHistory` |
 | `mysql` | Warehouse | `driver` | `url`, `enabled_tables` |
 | `sqlite` | Warehouse | `driver` | `url` or `path`, `enabled_tables` |
+| `duckdb` | Warehouse | `driver` | `url` or `path`, `enabled_tables` |
 | `sqlserver` | Warehouse | `driver` | `url`, `enabled_tables` |
 | `bigquery` | Warehouse | `driver` | `credentials_json`, `dataset_ids`, `enabled_tables`, `historicSql` |
 | `snowflake` | Warehouse | `driver` | `schema_names`, `enabled_tables`, `historicSql` |
--- a/docs-site/content/docs/getting-started/quickstart.mdx
+++ b/docs-site/content/docs/getting-started/quickstart.mdx
@@ -218,7 +218,8 @@ The wizard walks you through everything **ktx** needs in one pass:
 3. **Embeddings** - picks an embeddings backend. Choose OpenAI for hosted
   embeddings or `sentence-transformers` to run locally without an API key.
 4. **Database** - adds at least one primary connection. Supported drivers:
-   SQLite, PostgreSQL, MySQL, SQL Server, BigQuery, and Snowflake.
+   PostgreSQL, Snowflake, BigQuery, MySQL, ClickHouse, SQL Server, SQLite, and
+   DuckDB.
 5. **Context sources** - optionally adds dbt, MetricFlow, LookML, Looker,
   Metabase, or Notion. You can skip and add them later.
 6. **Build** - offers to run the first ingest so semantic sources and wiki
--- a/docs-site/content/docs/integrations/primary-sources.mdx
+++ b/docs-site/content/docs/integrations/primary-sources.mdx
@@ -1,6 +1,6 @@
 ---
 title: Primary Sources
-description: Connect ktx to PostgreSQL, Snowflake, BigQuery, MySQL, ClickHouse, SQL Server, SQLite, or MongoDB.
+description: Connect ktx to PostgreSQL, Snowflake, BigQuery, MySQL, ClickHouse, SQL Server, SQLite, DuckDB, or MongoDB.
 ---

 **ktx** connects to your data warehouse or database to build schema context,
@@ -26,14 +26,14 @@ Agents should prefer environment or file references over literal secrets.

 | Field | Required | Applies to | Description |
 |-------|----------|------------|-------------|
-| `driver` | Yes | all connections | Connector driver such as `postgres`, `snowflake`, `bigquery`, `mysql`, `clickhouse`, `sqlserver`, `sqlite`, or `mongodb` |
+| `driver` | Yes | all connections | Connector driver such as `postgres`, `snowflake`, `bigquery`, `mysql`, `clickhouse`, `sqlserver`, `sqlite`, `duckdb`, or `mongodb` |
 | `url` | One of the connection methods | URL-style connectors | Database URL, `env:NAME`, or `file:/path/to/secret` |
 | `host`, `port`, `database`, `username`, `password` | One of the connection methods | PostgreSQL, MySQL, SQL Server | Field-by-field connection values |
 | `schema` or `schemas` | No | schema-aware warehouses | Single schema or list of schemas to scan |
 | `databases` | No | ClickHouse, MongoDB | List of databases to scan |
 | `sample_size`, `order_by` | No | MongoDB | Schema-inference sampling controls (recent documents, sort field) |
 | `context.queryHistory` | No | PostgreSQL, Snowflake, BigQuery | Enables query-history ingestion when the warehouse supports it |
-| `path` | Yes for path-style SQLite | SQLite | Local SQLite database path or `env:NAME` reference |
+| `path` | Yes for path-style SQLite/DuckDB | SQLite, DuckDB | Local SQLite or DuckDB database path or `env:NAME` reference |
 | `max_bytes_billed` | No | BigQuery | Maximum bytes billed per query job |
 | `query_timeout_ms` | No | all warehouses | Maximum execution time for a single read-only query, in milliseconds (default 30000). A query exceeding it is cancelled server-side (or, for SQLite, by terminating the off-process executor) and returns a `query exceeded Ns` error so the agent can revise. |
 | `project_id` | No | BigQuery | Optional local descriptor and mapping metadata; not used for BigQuery authentication |
@@ -545,6 +545,52 @@ No authentication required - SQLite is file-based. The file must be readable by

 ---

+## DuckDB
+
+File-based connector using the DuckDB Node.js API. Ideal for local analytics, embedded warehouses, and cross-database federation.
+
+### Connection config
+
+```yaml title="ktx.yaml"
+connections:
+  warehouse:
+    driver: duckdb
+    path: data/warehouse.duckdb
+```
+
+`path` is resolved relative to the project directory. The `.duckdb` file must already exist — **ktx** never creates a missing database file.
+
+### Authentication
+
+No authentication required — DuckDB is file-based. The `.duckdb` file must be readable by the process running **ktx**.
+
+### Features
+
+| Feature | Supported | Notes |
+|---------|-----------|-------|
+| Tables & views | Yes | Via `information_schema` on the `main` schema |
+| Primary keys | Yes | Via `information_schema.table_constraints` |
+| Foreign keys | Yes | Via DuckDB's `duckdb_constraints()` catalog function |
+| Row count estimates | Yes | Exact count via `SELECT COUNT(*)` |
+| Column statistics | No | - |
+| Query history | No | - |
+| Table sampling | Yes | - |
+| Nested analysis | No | - |
+
+### Dialect notes
+
+- Introspection scans the `main` schema only
+- Execution is read-only; **ktx** opens the file without write access
+- Parameter binding uses positional `?` placeholders
+- Uses `LIMIT X OFFSET Y` for pagination
+- Database file must exist before `ktx connection test` or ingest runs
+
+### Cross-database federation
+
+When a project declares two or more attach-compatible connections — any combination of `postgres`, `mysql`, `sqlite`, and `duckdb` — **ktx** derives a cross-database federation connection. That connection can ATTACH a native `.duckdb` file, allowing semantic queries to join across sources without manually copying data.
+
+---
+
 ## MongoDB

 Connects to MongoDB as a primary context source. **ktx** treats each collection