mirror of https://github.com/Kaelio/ktx.git
synced 2026-07-04 10:52:13 +02:00
feat: Add duckdb connector (#308)
* refactor(duckdb): extract shared json-safe bigint helper
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(duckdb): add and register the duckdb primary connector
Add KtxDuckDbDialect, KtxDuckDbScanConnector (local file-backed, read-only,
never-create, main-schema introspection via information_schema and
duckdb_constraints() for foreign keys), and register the duckdb driver across
the dialect factory, driver registry, connection-type enum, warehouse descriptor,
config schema, scan normalization, connection test drivers, and status display.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(duckdb): route live-database ingest through the DuckDB connector
Add the DuckDB live-database introspection bridge and dispatch duckdb
connections to it in local-adapters, matching the SQLite path. Repoint the
config-rejection test off duckdb (now a valid driver) onto the no-driver case.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(duckdb): add duckdb to the setup database flow
Offer DuckDB in the interactive checklist and via ktx setup --database duckdb,
with a file-path prompt and duckdb-local default connection id, parallel to SQLite.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(duckdb): attach native duckdb files in federation
Native .duckdb members ATTACH with (READ_ONLY) and no TYPE/INSTALL/LOAD, since
the duckdb format needs no extension. attachTypeForDriver returns null for the
native case; buildAttachStatements builds load statements from non-null types
only and emits a conditional ATTACH clause.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* docs(duckdb): document the duckdb primary-source connector
Add a DuckDB section to the primary-sources integration page (config, read-only
never-create behavior, main-schema scope, federation) and update the
supported-driver assertion in dialects.test.ts to include duckdb.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(duckdb): use single-namespace display shape for main-only refs
DuckDB v1 introspects the main schema and sets db=null on every table, so its
display refs are single-namespace like SQLite. The ansi shape emitted a 1-part
table display it then refused to parse, breaking column-level display resolution.
Switch the dialect to the sqlite display shape and add a round-trip test plus a
composite-foreign-key test that were missing.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* refactor(duckdb): resolve connector dialect via getDialectForDriver
Route the connector's dialect through the shared factory like every other
connector, now that duckdb is registered. Single construction path.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(duckdb): skip schema picker for single-file duckdb setup
DuckDB is a single-file, single-namespace ('main') database like SQLite,
but the setup scope step only skipped the schema picker for sqlite. DuckDB
fell into the multi-schema path with an empty schema list, rendering a
broken picker ("No matches found" for main). Extend the file-based-driver
early-return to cover duckdb so it ingests every table directly.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* refactor(duckdb): reuse shared config helper and derive scope skip
Route duckdb path resolution through the shared resolveStringReference
helper instead of a local third copy of env:/file: handling. Derive the
setup scope-picker skip from SCOPE_DISCOVERY_SPECS membership rather than
a hardcoded sqlite/duckdb driver list.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test(duckdb): use a genuinely unknown driver in the rejection test
The merged "rejects unknown drivers" test used `driver: duckdb` as its
unknown-driver stand-in, which stopped being unknown once this branch
added the duckdb connector. Switch to `nonsense` so it again exercises
the unsupported-driver config error.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test(duckdb): cover dialect, connector, and live-introspection branches
Codecov flagged uncovered branches as dead code; all are real connector,
dialect, and live-ingest behavior. Add unit tests instead of removing them.
- dialect: precedence ladder, sample/clause builders, profiling expressions
- connector: url/env config forms, error throws, never-create guard,
cardinality cap branches, table-scope empty/non-empty paths
- live-introspection: full-schema and table-scope extraction
Functions 100%, lines ~99% across the duckdb connector dir.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* docs: add DuckDB to supported-driver references
The DuckDB connector PR documented the connector itself but left the
scattered supported-driver enumerations stale. Add duckdb to the
federation concept page (participation table, activation, table naming,
limitations), the ktx setup CLI reference, the ktx.yaml warehouse-driver
table, the primary-sources field reference, and the quickstart driver
list (which also restores the missing ClickHouse entry).
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Andrey Avtomonov <andreybavt@gmail.com>
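
The display-shape fix (`fix(duckdb): use single-namespace display shape for main-only refs`) comes down to the formatter and the parser agreeing on one shape. The sketch below illustrates only that round-trip property. `TableRef`, `ColumnRef`, and the four helpers are hypothetical names, not ktx's dialect API. It assumes that a single-namespace shape renders tables as `table` and columns as `table.column`, and it does not handle table names that themselves contain dots.

```typescript
// Hypothetical types: ktx's real ref types are not shown in this commit.
interface TableRef {
  db: string | null; // null for main-only DuckDB/SQLite introspection
  table: string;
}

interface ColumnRef extends TableRef {
  column: string;
}

// Single-namespace shape: tables render as `table`, columns as `table.column`.
function formatTableDisplay(ref: TableRef): string {
  return ref.table;
}

function formatColumnDisplay(ref: ColumnRef): string {
  return `${ref.table}.${ref.column}`;
}

// The parsers accept exactly what the formatters emit, so format -> parse
// round-trips. In this shape a one-part string is a valid table ref.
function parseTableDisplay(display: string): TableRef {
  if (display.length === 0 || display.includes(".")) {
    throw new Error(`not a single-namespace table ref: ${display}`);
  }
  return { db: null, table: display };
}

function parseColumnDisplay(display: string): ColumnRef {
  const dot = display.indexOf(".");
  // Require exactly one dot with non-empty text on both sides.
  if (dot <= 0 || dot !== display.lastIndexOf(".") || dot === display.length - 1) {
    throw new Error(`not a single-namespace column ref: ${display}`);
  }
  return { db: null, table: display.slice(0, dot), column: display.slice(dot + 1) };
}
```

The bug described in the commit is the failure mode this guards against: a formatter that emits a one-part table string paired with a parser that expects two parts.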
parent f21594c42a
commit 3c4fcc27c7
39 changed files with 1366 additions and 59 deletions
@@ -120,9 +120,9 @@ runtime features are missing.
 
 | Flag | Description |
 |------|-------------|
-| `--database <driver>` | Database driver to configure; repeatable. Choices: `sqlite`, `postgres`, `mysql`, `clickhouse`, `sqlserver`, `bigquery`, `snowflake` |
+| `--database <driver>` | Database driver to configure; repeatable. Choices: `sqlite`, `duckdb`, `postgres`, `mysql`, `clickhouse`, `sqlserver`, `bigquery`, `snowflake` |
 | `--database-connection-id <id>` | Existing selected connection id; repeatable. With `--database` or `--database-url`, connection id for the new connection. |
-| `--database-url <url>` | URL, `env:NAME`, or `file:/path` for one new URL-style database connection; also used as the SQLite path |
+| `--database-url <url>` | URL, `env:NAME`, or `file:/path` for one new URL-style database connection; also used as the SQLite or DuckDB path |
 | `--database-schema <schema>` | Database schema or dataset to include; repeatable |
 | `--skip-databases` | Leave database setup incomplete |
 
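
The `--database-url` flag above accepts a literal URL, an `env:NAME` reference, or a `file:/path` reference, and the commit message says DuckDB path resolution now goes through the shared `resolveStringReference` helper. That helper's real signature is not shown in this commit, so the sketch below assumes one. It injects the environment and a file reader so it runs without Node-specific imports; trimming the file contents is also an assumption.

```typescript
// Hypothetical dependency bundle; a real helper would read process.env and the filesystem directly.
interface ReferenceSources {
  env: Record<string, string | undefined>;
  readFile: (path: string) => string;
}

// Assumed forms: "env:NAME" -> environment variable, "file:/path" -> file
// contents, anything else -> used literally (a URL or a database path).
function resolveStringReference(value: string, sources: ReferenceSources): string {
  if (value.startsWith("env:")) {
    const name = value.slice("env:".length);
    const resolved = sources.env[name];
    if (resolved === undefined || resolved === "") {
      throw new Error(`environment variable ${name} is not set`);
    }
    return resolved;
  }
  if (value.startsWith("file:")) {
    // Trim surrounding whitespace so a trailing newline in a secret file is ignored.
    return sources.readFile(value.slice("file:".length)).trim();
  }
  return value;
}
```

Under Node, `sources` would plausibly be `{ env: process.env, readFile: (p) => readFileSync(p, "utf8") }`. Keeping one helper for `env:` and `file:` handling is the point of the refactor: SQLite and DuckDB paths then follow the same rules as URL-style connections.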
@@ -1,6 +1,6 @@
 ---
 title: Cross-database federation
-description: How ktx federates postgres, mysql, and sqlite connections so a single read-only SQL query can join across them without copying data.
+description: How ktx federates postgres, mysql, sqlite, and duckdb connections so a single read-only SQL query can join across them without copying data.
 ---
 
 Cross-database federation lets a single read-only SQL query join tables that
@@ -20,13 +20,14 @@ block to add. With zero or one compatible connection the behavior is unchanged.
 
 ## Which connections participate
 
-The v1 federation engine supports three drivers:
+The v1 federation engine supports four drivers:
 
 | Driver | Participates in federation |
 |--------|---------------------------|
 | `postgres` | Yes |
 | `mysql` | Yes |
 | `sqlite` | Yes |
+| `duckdb` | Yes |
 | `snowflake` | No — standalone connection |
 | `bigquery` | No — standalone connection |
 | `clickhouse` | No — standalone connection |
@@ -38,7 +39,7 @@ queried independently; they do not appear as federation members.
 ## How it activates
 
 **ktx** inspects the connections in `ktx.yaml` at startup. When it finds two or
-more connections whose driver is `postgres`, `mysql`, or `sqlite`, it
+more connections whose driver is `postgres`, `mysql`, `sqlite`, or `duckdb`, it
 instantiates the DuckDB federation engine and attaches each one read-only.
 There is no `federation:` key, no opt-in flag, and no connection-level setting
 to enable. The engine is derived entirely from what is already declared.
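
The activation rule above reduces to a count over declared drivers. A minimal sketch follows. The `ConnectionConfig` shape and the `federationMembers` name are hypothetical, not ktx's code; the driver set mirrors the participation table in this diff.

```typescript
// Drivers marked "Yes" in the participation table.
const ATTACH_COMPATIBLE = new Set<string>(["postgres", "mysql", "sqlite", "duckdb"]);

// Hypothetical, minimal view of a connection declared in ktx.yaml.
interface ConnectionConfig {
  id: string;
  driver: string;
}

// Returns the ids that would be attached, or [] when federation stays off
// (zero or one attach-compatible connection).
function federationMembers(connections: ConnectionConfig[]): string[] {
  const members = connections
    .filter((c) => ATTACH_COMPATIBLE.has(c.driver))
    .map((c) => c.id);
  return members.length >= 2 ? members : [];
}
```

No opt-in flag appears anywhere in the sketch. That matches the page's point that the engine is derived entirely from what is already declared. A Snowflake connection alongside a single DuckDB file therefore leaves federation off.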
@@ -60,9 +61,10 @@ Two attach-compatible connections are present, so federation is active.
 ## Table naming in federated queries
 
 Inside a federated query, postgres and mysql tables use a three-part name:
-`connectionId.schema.table`. SQLite tables, which have no schema layer in
-DuckDB, use the two-part form `connectionId.table`. In both cases the
-connection's `id` field in `ktx.yaml` becomes the catalog name inside DuckDB.
+`connectionId.schema.table`. SQLite and DuckDB tables use the two-part form
+`connectionId.table`, since ktx addresses both as single-namespace members. In
+both cases the connection's `id` field in `ktx.yaml` becomes the catalog name
+inside DuckDB.
 
 If a connection `id` is not a bare SQL identifier — for example it contains a
 hyphen, like `books-db` — double-quote it in the query the same way DuckDB
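
The naming rule above can be written as a small formatter. This is an illustrative sketch, not ktx's implementation. `federatedTableName` and `quoteIdent` are hypothetical, and the bare-identifier test is a simple regex. That regex does not account for SQL reserved words, which would also need quoting.

```typescript
// Quote an identifier unless it is a bare SQL identifier
// (letter or underscore first, then letters, digits, underscores).
function quoteIdent(name: string): string {
  return /^[A-Za-z_][A-Za-z0-9_]*$/.test(name) ? name : `"${name.replace(/"/g, '""')}"`;
}

// sqlite and duckdb members are single-namespace; postgres and mysql carry a schema.
function federatedTableName(connectionId: string, driver: string, table: string, schema?: string): string {
  if (driver === "sqlite" || driver === "duckdb") {
    return [connectionId, table].map(quoteIdent).join(".");
  }
  if (schema === undefined) {
    throw new Error(`${driver} members need a schema: connectionId.schema.table`);
  }
  return [connectionId, schema, table].map(quoteIdent).join(".");
}
```

For example, `federatedTableName("books-db", "sqlite", "books")` yields `"books-db".books`. This is the double-quoted form the page asks for when the connection id contains a hyphen.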
@@ -131,8 +133,8 @@ ktx sql -c _ktx_federated \
 Table names follow the rules from
 [Table naming in federated queries](#table-naming-in-federated-queries):
 three-part `connectionId.schema.table` for postgres and mysql, two-part
-`connectionId.table` for sqlite. The `_ktx_federated` id is virtual — it is
-never written to `ktx.yaml` and only exists when two or more attach-compatible
+`connectionId.table` for sqlite and duckdb. The `_ktx_federated` id is virtual —
+it is never written to `ktx.yaml` and only exists when two or more attach-compatible
 connections are declared. It surfaces in `ktx connection` and in the agent's
 connection list so the id is discoverable. Querying a single member database
 directly with its own connection id (`ktx sql -c pg_books ...`) is unchanged.
@@ -149,6 +151,6 @@ database through the federation engine.
 them in a source's `joins:` block and automatic discovery of cross-database
 relationships are not available yet. Intra-database relationship discovery for
 each member connection is unchanged.
-- **postgres, mysql, and sqlite only.** Other drivers (snowflake, bigquery,
-clickhouse, sqlserver) do not participate in federation in this version. They
-remain usable as standalone connections.
+- **postgres, mysql, sqlite, and duckdb only.** Other drivers (snowflake,
+bigquery, clickhouse, sqlserver) do not participate in federation in this
+version. They remain usable as standalone connections.
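
The commit message describes how native `.duckdb` members are attached. `attachTypeForDriver` returns `null` for DuckDB's own format. `buildAttachStatements` emits `INSTALL`/`LOAD` only for non-null types and writes a conditional options clause. The sketch below reuses those two function names, but their signatures, the `Member` shape, and the quoting helpers are assumptions. The emitted SQL uses DuckDB's `ATTACH 'target' AS alias (TYPE ..., READ_ONLY)` form.

```typescript
type FederationDriver = "postgres" | "mysql" | "sqlite" | "duckdb";

// Hypothetical member shape.
interface Member {
  id: string; // connection id, used as the ATTACH alias (catalog name)
  driver: FederationDriver;
  target: string; // connection string or file path handed to ATTACH
}

// null means DuckDB's native format, which needs no extension.
function attachTypeForDriver(driver: FederationDriver): string | null {
  return driver === "duckdb" ? null : driver;
}

const sqlString = (s: string): string => `'${s.replace(/'/g, "''")}'`;
const sqlIdent = (s: string): string => `"${s.replace(/"/g, '""')}"`;

function buildAttachStatements(members: Member[]): string[] {
  // INSTALL/LOAD each needed extension once; native members contribute nothing.
  const types = members
    .map((m) => attachTypeForDriver(m.driver))
    .filter((t): t is string => t !== null);
  const loads = Array.from(new Set(types)).flatMap((ext) => [`INSTALL ${ext};`, `LOAD ${ext};`]);

  // Conditional options clause: TYPE only when an extension backs the member.
  const attaches = members.map((m) => {
    const type = attachTypeForDriver(m.driver);
    const options = type === null ? "(READ_ONLY)" : `(TYPE ${type}, READ_ONLY)`;
    return `ATTACH ${sqlString(m.target)} AS ${sqlIdent(m.id)} ${options};`;
  });

  return [...loads, ...attaches];
}
```

A project whose only members are native DuckDB files produces ATTACH statements alone, with no extension work at all.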
@@ -109,6 +109,7 @@ context-source drivers share the map.
 | `postgres` | Warehouse | `driver` | `url`, `enabled_tables`, `historicSql`, `context.queryHistory` |
 | `mysql` | Warehouse | `driver` | `url`, `enabled_tables` |
 | `sqlite` | Warehouse | `driver` | `url` or `path`, `enabled_tables` |
+| `duckdb` | Warehouse | `driver` | `url` or `path`, `enabled_tables` |
 | `sqlserver` | Warehouse | `driver` | `url`, `enabled_tables` |
 | `bigquery` | Warehouse | `driver` | `credentials_json`, `dataset_ids`, `enabled_tables`, `historicSql` |
 | `snowflake` | Warehouse | `driver` | `schema_names`, `enabled_tables`, `historicSql` |
@@ -218,7 +218,8 @@ The wizard walks you through everything **ktx** needs in one pass:
 3. **Embeddings** - picks an embeddings backend. Choose OpenAI for hosted
 embeddings or `sentence-transformers` to run locally without an API key.
 4. **Database** - adds at least one primary connection. Supported drivers:
-SQLite, PostgreSQL, MySQL, SQL Server, BigQuery, and Snowflake.
+PostgreSQL, Snowflake, BigQuery, MySQL, ClickHouse, SQL Server, SQLite, and
+DuckDB.
 5. **Context sources** - optionally adds dbt, MetricFlow, LookML, Looker,
 Metabase, or Notion. You can skip and add them later.
 6. **Build** - offers to run the first ingest so semantic sources and wiki
@@ -1,6 +1,6 @@
 ---
 title: Primary Sources
-description: Connect ktx to PostgreSQL, Snowflake, BigQuery, MySQL, ClickHouse, SQL Server, SQLite, or MongoDB.
+description: Connect ktx to PostgreSQL, Snowflake, BigQuery, MySQL, ClickHouse, SQL Server, SQLite, DuckDB, or MongoDB.
 ---
 
 **ktx** connects to your data warehouse or database to build schema context,
@@ -26,14 +26,14 @@ Agents should prefer environment or file references over literal secrets.
 
 | Field | Required | Applies to | Description |
 |-------|----------|------------|-------------|
-| `driver` | Yes | all connections | Connector driver such as `postgres`, `snowflake`, `bigquery`, `mysql`, `clickhouse`, `sqlserver`, `sqlite`, or `mongodb` |
+| `driver` | Yes | all connections | Connector driver such as `postgres`, `snowflake`, `bigquery`, `mysql`, `clickhouse`, `sqlserver`, `sqlite`, `duckdb`, or `mongodb` |
 | `url` | One of the connection methods | URL-style connectors | Database URL, `env:NAME`, or `file:/path/to/secret` |
 | `host`, `port`, `database`, `username`, `password` | One of the connection methods | PostgreSQL, MySQL, SQL Server | Field-by-field connection values |
 | `schema` or `schemas` | No | schema-aware warehouses | Single schema or list of schemas to scan |
 | `databases` | No | ClickHouse, MongoDB | List of databases to scan |
 | `sample_size`, `order_by` | No | MongoDB | Schema-inference sampling controls (recent documents, sort field) |
 | `context.queryHistory` | No | PostgreSQL, Snowflake, BigQuery | Enables query-history ingestion when the warehouse supports it |
-| `path` | Yes for path-style SQLite | SQLite | Local SQLite database path or `env:NAME` reference |
+| `path` | Yes for path-style SQLite/DuckDB | SQLite, DuckDB | Local SQLite or DuckDB database path or `env:NAME` reference |
 | `max_bytes_billed` | No | BigQuery | Maximum bytes billed per query job |
 | `query_timeout_ms` | No | all warehouses | Maximum execution time for a single read-only query, in milliseconds (default 30000). A query exceeding it is cancelled server-side (or, for SQLite, by terminating the off-process executor) and returns a `query exceeded Ns` error so the agent can revise. |
 | `project_id` | No | BigQuery | Optional local descriptor and mapping metadata; not used for BigQuery authentication |
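
The `path` row above should be read together with the DuckDB section later in this commit, which states that the path is resolved relative to the project directory and that **ktx** never creates a missing file. Those two rules suggest a check like the one below. This is a hypothetical sketch with the following assumptions:

- `resolveDuckDbPath` and `PathDeps` are illustrative names.
- Joining is POSIX-only string concatenation rather than a path library.
- `env:NAME` references have already been resolved before this check runs.
- The `exists` callback stands in for a real filesystem check.

```typescript
// Hypothetical dependencies, injected so the sketch has no Node imports.
interface PathDeps {
  projectDir: string;
  exists: (absolutePath: string) => boolean; // stand-in for a real filesystem check
}

function resolveDuckDbPath(configPath: string, deps: PathDeps): string {
  // Relative paths are joined onto the project directory (POSIX-style, illustration only).
  const absolute = configPath.startsWith("/")
    ? configPath
    : `${deps.projectDir.replace(/\/+$/, "")}/${configPath}`;
  // Never-create guard: fail with a clear message up front instead of letting
  // the open call decide what to do with a missing file.
  if (!deps.exists(absolute)) {
    throw new Error(`DuckDB database file not found: ${absolute}`);
  }
  return absolute;
}
```

Checking existence before opening keeps the failure explicit. A typo in `path` then surfaces as a missing-file error during `ktx connection test` and does not lead to an empty database.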
@@ -545,6 +545,52 @@ No authentication required - SQLite is file-based. The file must be readable by
 
 ---
 
+## DuckDB
+
+File-based connector using the DuckDB Node.js API. Ideal for local analytics, embedded warehouses, and cross-database federation.
+
+### Connection config
+
+```yaml title="ktx.yaml"
+connections:
+  warehouse:
+    driver: duckdb
+    path: data/warehouse.duckdb
+```
+
+`path` is resolved relative to the project directory. The `.duckdb` file must already exist — **ktx** never creates a missing database file.
+
+### Authentication
+
+No authentication required — DuckDB is file-based. The `.duckdb` file must be readable by the process running **ktx**.
+
+### Features
+
+| Feature | Supported | Notes |
+|---------|-----------|-------|
+| Tables & views | Yes | Via `information_schema` on the `main` schema |
+| Primary keys | Yes | Via `information_schema.table_constraints` |
+| Foreign keys | Yes | Via DuckDB's `duckdb_constraints()` catalog function |
+| Row count estimates | Yes | Exact count via `SELECT COUNT(*)` |
+| Column statistics | No | - |
+| Query history | No | - |
+| Table sampling | Yes | - |
+| Nested analysis | No | - |
+
+### Dialect notes
+
+- Introspection scans the `main` schema only
+- Execution is read-only; **ktx** opens the file without write access
+- Parameter binding uses positional `?` placeholders
+- Uses `LIMIT X OFFSET Y` for pagination
+- Database file must exist before `ktx connection test` or ingest runs
+
+### Cross-database federation
+
+When a project declares two or more attach-compatible connections — any combination of `postgres`, `mysql`, `sqlite`, and `duckdb` — **ktx** derives a cross-database federation connection. That connection can ATTACH a native `.duckdb` file, allowing semantic queries to join across sources without manually copying data.
+
+---
+
 ## MongoDB
 
 Connects to MongoDB as a primary context source. **ktx** treats each collection
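
The DuckDB dialect notes above list positional `?` placeholders and `LIMIT X OFFSET Y` pagination. A hypothetical page builder combining the two might look like the sketch below. `pageQuery` and `PagedQuery` are illustrative names. The sketch assumes that the DuckDB client in use accepts bound parameters in `LIMIT`/`OFFSET` position, which is worth confirming against that client.

```typescript
// Hypothetical result shape: SQL text plus values for its positional placeholders.
interface PagedQuery {
  sql: string;
  params: number[]; // bound in order to the `?` placeholders
}

function pageQuery(baseSql: string, pageSize: number, page: number): PagedQuery {
  if (!Number.isInteger(pageSize) || pageSize <= 0) {
    throw new Error("pageSize must be a positive integer");
  }
  if (!Number.isInteger(page) || page < 0) {
    throw new Error("page must be a non-negative integer");
  }
  // Drop a trailing semicolon so the query can be wrapped as a subquery.
  const inner = baseSql.trim().replace(/;$/, "");
  return {
    sql: `SELECT * FROM (${inner}) AS t LIMIT ? OFFSET ?`,
    params: [pageSize, page * pageSize],
  };
}
```

The base query should include an `ORDER BY`, because page boundaries from `LIMIT`/`OFFSET` are not stable without a defined ordering.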