mirror of https://github.com/Kaelio/ktx.git, synced 2026-07-04 10:52:13 +02:00
* refactor(duckdb): extract shared json-safe bigint helper
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(duckdb): add and register the duckdb primary connector
Add KtxDuckDbDialect, KtxDuckDbScanConnector (local file-backed, read-only,
never-create, main-schema introspection via information_schema and
duckdb_constraints() for foreign keys), and register the duckdb driver across
the dialect factory, driver registry, connection-type enum, warehouse descriptor,
config schema, scan normalization, connection test drivers, and status display.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(duckdb): route live-database ingest through the DuckDB connector
Add the DuckDB live-database introspection bridge and dispatch duckdb
connections to it in local-adapters, matching the SQLite path. Repoint the
config-rejection test off duckdb (now a valid driver) onto the no-driver case.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(duckdb): add duckdb to the setup database flow
Offer DuckDB in the interactive checklist and via ktx setup --database duckdb,
with a file-path prompt and duckdb-local default connection id, parallel to SQLite.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(duckdb): attach native duckdb files in federation
Native .duckdb members ATTACH with (READ_ONLY) and no TYPE/INSTALL/LOAD, since
the duckdb format needs no extension. attachTypeForDriver returns null for the
native case; buildAttachStatements builds load statements from non-null types
only and emits a conditional ATTACH clause.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
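The attach behavior described in the federation entry above can be sketched roughly as follows. The names attachTypeForDriver and buildAttachStatements come from the commit message; their signatures, the FederationMember shape, the quoting helper, and the exact SQL text are assumptions made for illustration, not the repository's implementation.

```typescript
// Hypothetical member shape; the real ktx federation types may differ.
interface FederationMember {
  driver: 'duckdb' | 'sqlite';
  path: string; // path to the database file
  alias: string; // name the file is attached under
}

// Returns the DuckDB ATTACH type, or null for native .duckdb files,
// which need no extension and therefore no TYPE/INSTALL/LOAD.
function attachTypeForDriver(driver: FederationMember['driver']): string | null {
  return driver === 'duckdb' ? null : driver;
}

// Escapes a value for use as a single-quoted SQL string literal.
function quoteLiteral(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

function buildAttachStatements(members: FederationMember[]): string[] {
  const statements: string[] = [];
  const loaded = new Set<string>();

  // Load statements come from non-null attach types only, once per type.
  for (const member of members) {
    const type = attachTypeForDriver(member.driver);
    if (type !== null && !loaded.has(type)) {
      loaded.add(type);
      statements.push(`INSTALL ${type};`, `LOAD ${type};`);
    }
  }

  // The TYPE option is emitted conditionally; READ_ONLY is always set.
  for (const member of members) {
    const type = attachTypeForDriver(member.driver);
    const options = type === null ? 'READ_ONLY' : `TYPE ${type}, READ_ONLY`;
    const alias = `"${member.alias.replace(/"/g, '""')}"`;
    statements.push(`ATTACH ${quoteLiteral(member.path)} AS ${alias} (${options});`);
  }
  return statements;
}
```

Under these assumptions, a federation containing only native .duckdb members produces ATTACH statements and no INSTALL or LOAD statements at all.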
* docs(duckdb): document the duckdb primary-source connector
Add a DuckDB section to the primary-sources integration page (config, read-only
never-create behavior, main-schema scope, federation) and update the
supported-driver assertion in dialects.test.ts to include duckdb.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(duckdb): use single-namespace display shape for main-only refs
DuckDB v1 introspects the main schema and sets db=null on every table, so its
display refs are single-namespace like SQLite. The ansi shape emitted a 1-part
table display it then refused to parse, breaking column-level display resolution.
Switch the dialect to the sqlite display shape and add a round-trip test plus a
composite-foreign-key test that were missing.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
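To illustrate the single-namespace display shape that the fix above switches to, here is a minimal round-trip sketch. The { catalog, db, name } table shape and the expected behavior (one-part refs parse, multi-part refs are rejected) mirror the sqlite fixture in dialects.test.ts; the function names and bodies are hypothetical, and the sketch assumes table names contain no dots.

```typescript
// Mirrors the shape used by the fixtures in dialects.test.ts.
interface TableRef {
  catalog: string | null;
  db: string | null;
  name: string;
}

// Hypothetical helper: a single-namespace display ref is just the table name.
function formatSingleNamespaceDisplayRef(table: TableRef): string {
  return table.name;
}

// Hypothetical helper: accept exactly one non-empty part, reject anything else.
function parseSingleNamespaceDisplayRef(display: string): TableRef | null {
  const [name, ...rest] = display.split('.');
  if (!name || rest.length > 0) {
    return null;
  }
  return { catalog: null, db: null, name };
}

// Round trip: format a main-only table ref, then parse it back.
const ordersRef: TableRef = { catalog: null, db: null, name: 'orders' };
const ordersDisplay = formatSingleNamespaceDisplayRef(ordersRef);
const ordersParsed = parseSingleNamespaceDisplayRef(ordersDisplay);
```

The bug described in the fix was a mismatch between the two directions: the formatter emitted a one-part display while the parser accepted only longer forms, so the round trip failed. Keeping both functions on the same part count avoids that.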
* refactor(duckdb): resolve connector dialect via getDialectForDriver
Route the connector's dialect through the shared factory like every other
connector, now that duckdb is registered. Single construction path.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(duckdb): skip schema picker for single-file duckdb setup
DuckDB is a single-file, single-namespace ('main') database like SQLite,
but the setup scope step only skipped the schema picker for sqlite. DuckDB
fell into the multi-schema path with an empty schema list, rendering a
broken picker ("No matches found" for main). Extend the file-based-driver
early-return to cover duckdb so it ingests every table directly.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* refactor(duckdb): reuse shared config helper and derive scope skip
Route duckdb path resolution through the shared resolveStringReference
helper instead of a local third copy of env:/file: handling. Derive the
setup scope-picker skip from SCOPE_DISCOVERY_SPECS membership rather than
a hardcoded sqlite/duckdb driver list.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test(duckdb): use a genuinely unknown driver in the rejection test
The merged "rejects unknown drivers" test used `driver: duckdb` as its
unknown-driver stand-in, which stopped being unknown once this branch
added the duckdb connector. Switch to `nonsense` so it again exercises
the unsupported-driver config error.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test(duckdb): cover dialect, connector, and live-introspection branches
Codecov flagged uncovered branches as dead code; all are real connector,
dialect, and live-ingest behavior. Add unit tests instead of removing them.
- dialect: precedence ladder, sample/clause builders, profiling expressions
- connector: url/env config forms, error throws, never-create guard,
  cardinality cap branches, table-scope empty/non-empty paths
- live-introspection: full-schema and table-scope extraction
Functions 100%, lines ~99% across the duckdb connector dir.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* docs: add DuckDB to supported-driver references
The DuckDB connector PR documented the connector itself but left the
scattered supported-driver enumerations stale. Add duckdb to the
federation concept page (participation table, activation, table naming,
limitations), the ktx setup CLI reference, the ktx.yaml warehouse-driver
table, the primary-sources field reference, and the quickstart driver
list (which also restores the missing ClickHouse entry).
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Andrey Avtomonov <andreybavt@gmail.com>
316 lines
14 KiB
TypeScript
import { describe, expect, it } from 'vitest';
import { getDialectForDriver, getSqlDialectForDriver } from '../../../src/context/connections/dialects.js';
import type { KtxConnectionDriver, KtxTableRef } from '../../../src/context/scan/types.js';

interface DialectFixture {
  driver: KtxConnectionDriver;
  table: KtxTableRef;
  quoteInput: string;
  quotedIdentifier: string;
  formattedTable: string;
  display: string;
  invalidDisplay: string;
  columnDisplayTablePartCount: 1 | 2 | 3;
  limitClause: string;
  topClause: string;
  randomFilter: string;
  tableSampleClause: string;
  sampleQuery: string;
  columnSampleContains: string;
  nullCountExpression: string;
  distinctCountExpression: string;
  textLengthExpression: string;
  castToText: string;
  sampleValueAggregation: string;
  cardinalityContains: string;
  randomizedCardinalityContains: string;
  distinctValuesContains: string;
  statisticsContains: string | null;
  dimensionInput: string;
  dimensionType: 'time' | 'string' | 'number' | 'boolean';
  nativeTypeInput: string;
  normalizedType: string;
}

const innerSampleSql = 'SELECT status AS value FROM orders';

const fixtures: DialectFixture[] = [
  {
    driver: 'postgres',
    table: { catalog: null, db: 'public', name: 'orders' },
    quoteInput: 'order"items',
    quotedIdentifier: '"order""items"',
    formattedTable: '"public"."orders"',
    display: 'public.orders',
    invalidDisplay: 'orders',
    columnDisplayTablePartCount: 2,
    limitClause: 'LIMIT 25 OFFSET 5',
    topClause: '',
    randomFilter: 'RANDOM() < 0.25',
    tableSampleClause: 'TABLESAMPLE SYSTEM (25)',
    sampleQuery: 'SELECT "id", "status" FROM "public"."orders" LIMIT 5',
    columnSampleContains: 'TRIM(CAST("status" AS TEXT)) != \'\'',
    nullCountExpression: 'COUNT(*) FILTER (WHERE "status" IS NULL)',
    distinctCountExpression: 'COUNT(DISTINCT "status")',
    textLengthExpression: 'LENGTH(CAST("status" AS TEXT))',
    castToText: 'CAST("status" AS TEXT)',
    sampleValueAggregation:
      '(SELECT STRING_AGG(CAST(value AS TEXT), CHR(31)) FROM (SELECT status AS value FROM orders) AS relationship_profile_values)',
    cardinalityContains: 'SELECT COUNT(DISTINCT val) AS cardinality',
    randomizedCardinalityContains: 'ORDER BY RANDOM()',
    distinctValuesContains: 'SELECT DISTINCT "status"::text AS val',
    statisticsContains: 'FROM pg_stats s',
    dimensionInput: 'timestamp with time zone',
    dimensionType: 'time',
    nativeTypeInput: 'numeric(12,2)',
    normalizedType: 'numeric(12,2)',
  },
  {
    driver: 'mysql',
    table: { catalog: null, db: 'analytics', name: 'orders' },
    quoteInput: 'order`items',
    quotedIdentifier: '`order``items`',
    formattedTable: '`analytics`.`orders`',
    display: 'analytics.orders',
    invalidDisplay: 'orders',
    columnDisplayTablePartCount: 2,
    limitClause: 'LIMIT 25 OFFSET 5',
    topClause: '',
    randomFilter: 'RAND() < 0.25',
    tableSampleClause: '',
    sampleQuery: 'SELECT `id`, `status` FROM `analytics`.`orders` LIMIT 5',
    columnSampleContains: 'TRIM(CAST(`status` AS CHAR)) != \'\'',
    nullCountExpression: 'SUM(CASE WHEN `status` IS NULL THEN 1 ELSE 0 END)',
    distinctCountExpression: 'COUNT(DISTINCT `status`)',
    textLengthExpression: 'CHAR_LENGTH(CAST(`status` AS CHAR))',
    castToText: 'CAST(`status` AS CHAR)',
    sampleValueAggregation:
      '(SELECT GROUP_CONCAT(CAST(value AS CHAR) SEPARATOR CHAR(31)) FROM (SELECT status AS value FROM orders) AS relationship_profile_values)',
    cardinalityContains: 'SELECT COUNT(DISTINCT val) AS cardinality',
    randomizedCardinalityContains: 'ORDER BY RAND()',
    distinctValuesContains: 'SELECT DISTINCT CAST(`status` AS CHAR) AS val',
    statisticsContains: 'INFORMATION_SCHEMA.STATISTICS',
    dimensionInput: 'tinyint(1)',
    dimensionType: 'boolean',
    nativeTypeInput: 'varchar(255)',
    normalizedType: 'varchar(255)',
  },
  {
    driver: 'clickhouse',
    table: { catalog: null, db: 'analytics', name: 'events' },
    quoteInput: 'order`items',
    quotedIdentifier: '`order``items`',
    formattedTable: '`analytics`.`events`',
    display: 'analytics.events',
    invalidDisplay: 'events',
    columnDisplayTablePartCount: 2,
    limitClause: 'LIMIT 25 OFFSET 5',
    topClause: '',
    randomFilter: 'rand() / 4294967295.0 < 0.25',
    tableSampleClause: '',
    sampleQuery: 'SELECT `id`, `status` FROM `analytics`.`events` LIMIT 5',
    columnSampleContains: 'trim(toString(`status`)) != \'\'',
    nullCountExpression: 'countIf(`status` IS NULL)',
    distinctCountExpression: 'COUNT(DISTINCT `status`)',
    textLengthExpression: 'length(toString(`status`))',
    castToText: 'toString(`status`)',
    sampleValueAggregation:
      '(SELECT arrayStringConcat(groupArray(toString(value)), \'\\x1F\') FROM (SELECT status AS value FROM orders) AS relationship_profile_values)',
    cardinalityContains: 'SELECT COUNT(DISTINCT val) AS cardinality',
    randomizedCardinalityContains: 'ORDER BY rand()',
    distinctValuesContains: 'SELECT DISTINCT toString(`status`) AS val',
    statisticsContains: null,
    dimensionInput: 'Nullable(DateTime64(3))',
    dimensionType: 'time',
    nativeTypeInput: 'LowCardinality(String)',
    normalizedType: 'LowCardinality(String)',
  },
  {
    driver: 'sqlite',
    table: { catalog: null, db: null, name: 'orders' },
    quoteInput: 'order"items',
    quotedIdentifier: '"order""items"',
    formattedTable: '"orders"',
    display: 'orders',
    invalidDisplay: 'public.orders',
    columnDisplayTablePartCount: 1,
    limitClause: 'LIMIT 25 OFFSET 5',
    topClause: '',
    randomFilter: '(RANDOM() % 100) < 25',
    tableSampleClause: '',
    sampleQuery: 'SELECT "id", "status" FROM "orders" LIMIT 5',
    columnSampleContains: 'TRIM(CAST("status" AS TEXT)) != \'\'',
    nullCountExpression: 'SUM(CASE WHEN "status" IS NULL THEN 1 ELSE 0 END)',
    distinctCountExpression: 'COUNT(DISTINCT "status")',
    textLengthExpression: 'LENGTH(CAST("status" AS TEXT))',
    castToText: 'CAST("status" AS TEXT)',
    sampleValueAggregation:
      '(SELECT GROUP_CONCAT(CAST(value AS TEXT), char(31)) FROM (SELECT status AS value FROM orders) AS relationship_profile_values)',
    cardinalityContains: 'SELECT COUNT(DISTINCT val) AS cardinality',
    randomizedCardinalityContains: 'ORDER BY RANDOM()',
    distinctValuesContains: 'SELECT DISTINCT CAST("status" AS TEXT) AS val',
    statisticsContains: null,
    dimensionInput: 'INTEGER',
    dimensionType: 'number',
    nativeTypeInput: 'VARCHAR(255)',
    normalizedType: 'VARCHAR(255)',
  },
  {
    driver: 'snowflake',
    table: { catalog: 'ANALYTICS', db: 'PUBLIC', name: 'ORDERS' },
    quoteInput: 'order"items',
    quotedIdentifier: '"order""items"',
    formattedTable: '"ANALYTICS"."PUBLIC"."ORDERS"',
    display: 'ANALYTICS.PUBLIC.ORDERS',
    invalidDisplay: 'PUBLIC.ORDERS',
    columnDisplayTablePartCount: 3,
    limitClause: 'LIMIT 25 OFFSET 5',
    topClause: '',
    randomFilter: 'UNIFORM(0::FLOAT, 1::FLOAT, RANDOM()) < 0.25',
    tableSampleClause: 'SAMPLE (25)',
    sampleQuery: 'SELECT "id", "status" FROM "ANALYTICS"."PUBLIC"."ORDERS" SAMPLE ROW (5 ROWS)',
    columnSampleContains: 'TRIM(CAST("status" AS STRING)) != \'\'',
    nullCountExpression: 'COUNT_IF("status" IS NULL)',
    distinctCountExpression: 'APPROX_COUNT_DISTINCT("status")',
    textLengthExpression: 'LENGTH(CAST("status" AS TEXT))',
    castToText: 'CAST("status" AS VARCHAR)',
    sampleValueAggregation:
      '(SELECT LISTAGG(CAST(value AS VARCHAR), \'\\x1f\') FROM (SELECT status AS value FROM orders) AS relationship_profile_values)',
    cardinalityContains: 'SELECT COUNT(DISTINCT val) AS cardinality',
    randomizedCardinalityContains: 'SAMPLE ROW (100 ROWS)',
    distinctValuesContains: 'SELECT DISTINCT "status"::VARCHAR AS val',
    statisticsContains: null,
    dimensionInput: 'TIMESTAMP_NTZ',
    dimensionType: 'time',
    nativeTypeInput: 'NUMBER(38,0)',
    normalizedType: 'NUMBER(38,0)',
  },
  {
    driver: 'bigquery',
    table: { catalog: 'analytics-project', db: 'warehouse', name: 'orders' },
    quoteInput: 'order`items',
    quotedIdentifier: '`order\\`items`',
    formattedTable: '`analytics-project`.`warehouse`.`orders`',
    display: 'analytics-project.warehouse.orders',
    invalidDisplay: 'warehouse.orders',
    columnDisplayTablePartCount: 3,
    limitClause: 'LIMIT 25 OFFSET 5',
    topClause: '',
    randomFilter: 'RAND() < 0.25',
    tableSampleClause: 'TABLESAMPLE SYSTEM (25 PERCENT)',
    sampleQuery: 'SELECT `id`, `status` FROM `analytics-project`.`warehouse`.`orders` ORDER BY RAND() LIMIT 5',
    columnSampleContains: 'TRIM(CAST(`status` AS STRING)) != \'\'',
    nullCountExpression: 'COUNTIF(`status` IS NULL)',
    distinctCountExpression: 'APPROX_COUNT_DISTINCT(`status`)',
    textLengthExpression: 'LENGTH(CAST(`status` AS STRING))',
    castToText: 'CAST(`status` AS STRING)',
    sampleValueAggregation:
      '(SELECT STRING_AGG(CAST(value AS STRING), \'\\u001F\') FROM (SELECT status AS value FROM orders) AS relationship_profile_values)',
    cardinalityContains: 'SELECT APPROX_COUNT_DISTINCT(val) AS cardinality',
    randomizedCardinalityContains: 'ORDER BY RAND()',
    distinctValuesContains: 'SELECT DISTINCT CAST(`status` AS STRING) AS val',
    statisticsContains: null,
    dimensionInput: 'INT64',
    dimensionType: 'number',
    nativeTypeInput: 'INT64',
    normalizedType: 'BIGINT',
  },
  {
    driver: 'sqlserver',
    table: { catalog: 'warehouse', db: 'dbo', name: 'events' },
    quoteInput: 'odd]name',
    quotedIdentifier: '[odd]]name]',
    formattedTable: '[warehouse].[dbo].[events]',
    display: 'warehouse.dbo.events',
    invalidDisplay: 'dbo.events',
    columnDisplayTablePartCount: 3,
    limitClause: '',
    topClause: 'TOP (25)',
    randomFilter: 'ABS(CHECKSUM(NEWID())) % 100 < 25',
    tableSampleClause: 'TABLESAMPLE (25 PERCENT)',
    sampleQuery: 'SELECT TOP 5 [id], [status] FROM [warehouse].[dbo].[events]',
    columnSampleContains: 'LTRIM(RTRIM(CAST([status] AS NVARCHAR(MAX)))) != \'\'',
    nullCountExpression: 'SUM(CASE WHEN [status] IS NULL THEN 1 ELSE 0 END)',
    distinctCountExpression: 'COUNT(DISTINCT [status])',
    textLengthExpression: 'LEN(CAST([status] AS NVARCHAR(MAX)))',
    castToText: 'CAST([status] AS NVARCHAR(MAX))',
    sampleValueAggregation:
      '(SELECT STRING_AGG(CAST(value AS NVARCHAR(MAX)), CHAR(31)) FROM (SELECT status AS value FROM orders) AS relationship_profile_values)',
    cardinalityContains: 'SELECT COUNT(DISTINCT val) AS cardinality',
    randomizedCardinalityContains: 'ORDER BY NEWID()',
    distinctValuesContains: 'SELECT TOP 20 val',
    statisticsContains: null,
    dimensionInput: 'datetime2',
    dimensionType: 'time',
    nativeTypeInput: 'uniqueidentifier',
    normalizedType: 'uniqueidentifier',
  },
];

describe('getDialectForDriver', () => {
  it.each(fixtures)('returns a full KtxSqlDialect for $driver', (fixture) => {
    const dialect = getSqlDialectForDriver(fixture.driver);
    const column = dialect.quoteIdentifier('status');

    expect(dialect.type).toBe(fixture.driver);
    expect(dialect.quoteIdentifier(fixture.quoteInput)).toBe(fixture.quotedIdentifier);
    expect(dialect.formatTableName(fixture.table)).toBe(fixture.formattedTable);
    expect(dialect.formatDisplayRef(fixture.table)).toBe(fixture.display);
    expect(dialect.parseDisplayRef(fixture.display)).toEqual(fixture.table);
    expect(dialect.parseDisplayRef(fixture.invalidDisplay)).toBeNull();
    expect(dialect.columnDisplayTablePartCount()).toBe(fixture.columnDisplayTablePartCount);
    expect(dialect.getLimitOffsetClause(25, 5)).toBe(fixture.limitClause);
    expect(dialect.getTopClause(25)).toBe(fixture.topClause);
    expect(dialect.getRandomSampleFilter(0.25)).toBe(fixture.randomFilter);
    expect(dialect.getTableSampleClause(0.25)).toBe(fixture.tableSampleClause);
    expect(dialect.generateSampleQuery(fixture.formattedTable, 5, ['id', 'status'])).toBe(fixture.sampleQuery);
    expect(dialect.generateColumnSampleQuery(fixture.formattedTable, 'status', 10)).toContain(
      fixture.columnSampleContains,
    );
    expect(dialect.getNullCountExpression(column)).toBe(fixture.nullCountExpression);
    expect(dialect.getDistinctCountExpression(column)).toBe(fixture.distinctCountExpression);
    expect(dialect.textLengthExpression(column)).toBe(fixture.textLengthExpression);
    expect(dialect.castToText(column)).toBe(fixture.castToText);
    expect(dialect.getSampleValueAggregation(innerSampleSql)).toBe(fixture.sampleValueAggregation);
    expect(dialect.generateCardinalitySampleQuery(fixture.formattedTable, column, 100)).toContain(
      fixture.cardinalityContains,
    );
    expect(dialect.generateRandomizedCardinalitySampleQuery(fixture.formattedTable, column, 100)).toContain(
      fixture.randomizedCardinalityContains,
    );
    expect(dialect.generateDistinctValuesQuery(fixture.formattedTable, column, 20)).toContain(
      fixture.distinctValuesContains,
    );
    const statistics = dialect.generateColumnStatisticsQuery(fixture.table.db ?? '', fixture.table.name);
    if (fixture.statisticsContains) {
      expect(statistics).toContain(fixture.statisticsContains);
    } else {
      expect(statistics).toBeNull();
    }
    expect(dialect.mapToDimensionType(fixture.dimensionInput)).toBe(fixture.dimensionType);
    expect(dialect.mapDataType(fixture.nativeTypeInput)).toBe(fixture.normalizedType);
  });

  it('accepts three-part ANSI display refs while keeping one-part names caller-owned', () => {
    for (const driver of ['postgres', 'mysql', 'clickhouse'] as const) {
      const dialect = getDialectForDriver(driver);
      expect(dialect.parseDisplayRef('warehouse.public.orders')).toEqual({
        catalog: 'warehouse',
        db: 'public',
        name: 'orders',
      });
      expect(dialect.parseDisplayRef('orders')).toBeNull();
    }
  });

  it('throws with a supported-driver list for unknown drivers', () => {
    expect(() => getDialectForDriver('oracle')).toThrow(
      'Unsupported driver "oracle". Supported drivers: bigquery, clickhouse, duckdb, mongodb, mysql, postgres, snowflake, sqlite, sqlserver',
    );
  });

  it('rejects legacy driver aliases', () => {
    expect(() => getDialectForDriver('postgresql')).toThrow('Unsupported driver "postgresql"');
    expect(() => getDialectForDriver('sqlite3')).toThrow('Unsupported driver "sqlite3"');
  });
});