fix(snowflake): unblock multi-schema ingest and relationship discovery (#204)

* feat(setup): drop redundant Snowflake schema prompt; fall back to free-text on listSchemas failure

Snowflake setup previously asked for a single schema as free text, then
ran a multiselect against the discovered schemas — two schema questions
back-to-back, with the first being only a session bootstrap. The SDK's
`schema` is optional, so the bootstrap step is unnecessary.

- Remove the free-text Snowflake schema prompt; only pass `schema` to
  snowflake-sdk when one is configured.
- When `listSchemas()` fails (e.g. role lacks SHOW SCHEMAS), prompt the
  user for a comma-separated list, persist it as `schema_names`, and use
  it as both the table-list filter and the multiselect default. Applies
  to every driver with a scope-discovery spec, not just Snowflake.
- Update docs to lead with `schema_names`; keep `schema_name` as a
  documented single-schema shorthand.
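
A minimal sketch of the fallback described above, assuming hypothetical names (`parseSchemaList`, `resolveSchemas`, and the injected `listSchemas` / `promptForList` callbacks are illustrative and not taken from the ktx codebase):

```typescript
// Hypothetical shapes for illustration only.
interface SchemaSelection {
  source: 'discovered' | 'manual';
  schemas: string[];
}

/** Split "PUBLIC, analytics,,PUBLIC" into unique, trimmed, non-empty names. */
function parseSchemaList(input: string): string[] {
  const seen = new Set<string>();
  for (const part of input.split(',')) {
    const name = part.trim();
    if (name.length > 0) seen.add(name);
  }
  return Array.from(seen);
}

async function resolveSchemas(
  listSchemas: () => Promise<string[]>,
  promptForList: () => Promise<string>,
): Promise<SchemaSelection> {
  try {
    return { source: 'discovered', schemas: await listSchemas() };
  } catch {
    // Discovery was denied (for example, the role lacks SHOW SCHEMAS):
    // fall back to a comma-separated list typed by the user.
    return { source: 'manual', schemas: parseSchemaList(await promptForList()) };
  }
}
```

A caller could persist a `manual` result as `schema_names` and reuse the same array as the table-list filter and the multiselect default.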

* fix(snowflake): keep introspecting when primary-key discovery is denied

The PK query joins INFORMATION_SCHEMA.TABLE_CONSTRAINTS and
INFORMATION_SCHEMA.KEY_COLUMN_USAGE, which require grants the
connection role may not have. Previously a 'SQL compilation error:
Object ANALYTICS.INFORMATION_SCHEMA.KEY_COLUMN_USAGE does not exist
or not authorized' aborted the entire introspect — schemas, columns,
and row counts were all discarded over a missing nice-to-have.

Wrap the constraint query in try/catch, log a one-line warning per
schema, and return an empty PK map. Columns end up with
primaryKey=false; relationship inference still has FK and profiling
to fall back on.
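
The soft-fail behavior can be sketched roughly as follows. The helper name, the row shape, and the injected `queryPrimaryKeys` / `warn` callbacks are assumptions made for illustration; the actual connector code is not shown in this message.

```typescript
// Table name -> primary-key column names.
type PrimaryKeyMap = Map<string, string[]>;

// Hypothetical row shape returned by the constraint query.
interface PrimaryKeyRow {
  tableName: string;
  columnName: string;
}

async function loadPrimaryKeysSoftly(
  schema: string,
  queryPrimaryKeys: (schema: string) => Promise<PrimaryKeyRow[]>,
  warn: (message: string) => void,
): Promise<PrimaryKeyMap> {
  try {
    const byTable: PrimaryKeyMap = new Map();
    for (const row of await queryPrimaryKeys(schema)) {
      const columns = byTable.get(row.tableName) ?? [];
      columns.push(row.columnName);
      byTable.set(row.tableName, columns);
    }
    return byTable;
  } catch (error) {
    // Missing grants on INFORMATION_SCHEMA must not abort the introspect:
    // emit one warning for this schema and continue with no known PKs.
    const reason = error instanceof Error ? error.message : String(error);
    warn(`primary-key discovery skipped for schema ${schema}: ${reason}`);
    return new Map();
  }
}
```

With an empty map, every column in that schema is reported with primaryKey=false, which matches the behavior described above.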

* fix(scan): unblock relationship discovery on Snowflake

Two adjacent bugs prevented the scan's relationship pipeline from producing
any joins on a Snowflake warehouse:

- relationship-profiling.ts fell through to a default `GROUP_CONCAT` branch
  for unknown drivers. Snowflake has no GROUP_CONCAT, so every per-table
  profile query failed with "Unknown function GROUP_CONCAT". Add an explicit
  Snowflake branch that uses LISTAGG with a literal '\x1f' delimiter
  (Snowflake requires the delimiter to be a constant, so CHR(31) is rejected).
- description-generation.ts destructured `connector.sampleTable` and
  `connector.sampleColumn` into bare locals, losing the `this` binding when
  the class-method connectors (Snowflake, Postgres, MySQL) were invoked.
  Every sample call threw "Cannot read properties of undefined (reading
  'assertConnection')" and degraded LLM descriptions to metadata-only
  prompts. Call the methods through the connector instead.
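
The lost `this` binding from the second bullet can be reproduced in isolation. The class below is a stand-in written for this illustration, not the real connector:

```typescript
// Stand-in connector: only the shape matters for the demonstration.
class SampleConnector {
  private readonly connectionId = 'warehouse';

  private assertConnection(id: string): void {
    if (id !== this.connectionId) throw new Error(`unknown connection ${id}`);
  }

  sampleTable(id: string): string {
    this.assertConnection(id);
    return `sampled via ${this.connectionId}`;
  }
}

const connector = new SampleConnector();

// Broken: destructuring detaches the method from its instance, so `this`
// no longer refers to the connector when the function runs.
const { sampleTable } = connector;
let detachedError: unknown = null;
try {
  sampleTable('warehouse');
} catch (error) {
  detachedError = error; // a TypeError raised while resolving assertConnection
}

// Fixed: calling through the instance keeps `this` bound.
const sampled = connector.sampleTable('warehouse');
```

Plain-object connectors whose functions do not use `this` are unaffected, which is presumably why only the class-method connectors failed.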

Without these, even after the primary-key probe is allowed to fail softly,
the scan ends up with 0 validated relationships and an empty `joins:` block
in every shard YAML.
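
A rough sketch of the dialect branch from the first bullet. The function name and the exact expression shape are assumptions; the real query in relationship-profiling.ts is not reproduced here, and identifier quoting is left out for brevity.

```typescript
// Hypothetical builder for the "concatenate distinct values" expression.
function distinctValuesAggregate(driver: string, column: string): string {
  if (driver === 'snowflake') {
    // The delimiter is written as a string literal with a hex escape,
    // because Snowflake requires a constant delimiter and rejects CHR(31).
    return `LISTAGG(DISTINCT ${column}, '\\x1f')`;
  }
  // Previous default branch, which Snowflake cannot execute.
  return `GROUP_CONCAT(DISTINCT ${column})`;
}

const snowflakeExpr = distinctValuesAggregate('snowflake', 'CUSTOMER_ID');
const defaultExpr = distinctValuesAggregate('some-other-driver', 'customer_id');
```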

* test(scan): cover table-ref helpers

* feat(scan): plumb tableScope through live-database introspection port

* feat(scan): apply tableScope during metadata fetch

* feat(scan): enforce table scope at fetch boundary
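
As an illustration of what enforcing the scope at the fetch boundary means, the sketch below mirrors the placeholder-building pattern used in the MySQL connector change in this commit. `buildTablesQuery` and `ScopedQuery` are hypothetical names, and the SQL is shortened.

```typescript
interface ScopedQuery {
  sql: string;
  params: string[];
}

// `scopedTables === null` means "no scope": fetch every table in the databases.
function buildTablesQuery(databases: string[], scopedTables: string[] | null): ScopedQuery {
  const dbPlaceholders = databases.map(() => '?').join(', ');
  const tableClause = scopedTables
    ? ` AND TABLE_NAME IN (${scopedTables.map(() => '?').join(', ')})`
    : '';
  return {
    sql:
      'SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES ' +
      `WHERE TABLE_SCHEMA IN (${dbPlaceholders})${tableClause}`,
    // Placeholders are positional, so parameter order must match the SQL.
    params: [...databases, ...(scopedTables ?? [])],
  };
}

const scopedQuery = buildTablesQuery(['analytics'], ['orders']);
const unscopedQuery = buildTablesQuery(['analytics'], null);
```

An empty scope would produce an invalid `IN ()` clause, so a caller needs to short-circuit that case first, as the connectors in this commit do by returning an empty snapshot.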

* feat(scan): pool Snowflake sessions and batch enrichment for faster ingest (#206)

* feat(cli): add RSA key-pair auth option to Snowflake setup wizard

Extends the interactive Snowflake setup flow with an authentication-method
prompt (password vs RSA/JWT key-pair). The RSA branch collects a private-key
path (env/file/absolute) and an optional passphrase; the resulting connection
config records `authMethod: 'rsa'` with `privateKey` and `passphrase` instead
of `password`.
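
One way to picture the resulting config is a discriminated union on `authMethod`. Only `authMethod: 'rsa'`, `privateKey`, `passphrase`, and `password` come from the description above; the `'password'` discriminant value, the `WizardAnswers` shape, and the function name are assumptions for this sketch.

```typescript
interface PasswordAuth {
  authMethod: 'password'; // assumed discriminant value
  password: string;
}

interface RsaAuth {
  authMethod: 'rsa';
  privateKey: string; // reference to the key (env/file/absolute path)
  passphrase?: string;
}

type SnowflakeAuthConfig = PasswordAuth | RsaAuth;

// Hypothetical shape of what the wizard collected.
interface WizardAnswers {
  method: 'password' | 'rsa';
  password?: string;
  privateKeyRef?: string;
  passphrase?: string;
}

function authConfigFromAnswers(answers: WizardAnswers): SnowflakeAuthConfig {
  if (answers.method === 'rsa') {
    if (!answers.privateKeyRef) throw new Error('RSA auth needs a private-key reference');
    const config: RsaAuth = { authMethod: 'rsa', privateKey: answers.privateKeyRef };
    // The passphrase is optional, so only record it when one was given.
    if (answers.passphrase) config.passphrase = answers.passphrase;
    return config;
  }
  if (!answers.password) throw new Error('password auth needs a password');
  return { authMethod: 'password', password: answers.password };
}
```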

* feat(scan): pool Snowflake sessions

* fix(scan): reuse structural snapshots and cleanup connectors

* feat(scan): parallelize relationship profiling

* feat(scan): batch table description generation

* docs: document Snowflake ingest concurrency knobs

* fix(scan): close Snowflake ingest perf verification gaps

* fix(scan): keep batched description failure bounded

* feat(scan): dispatch query-history probes by connection driver

Extract historic-sql dialect resolution into a shared helper so the
status-project readiness check and the local ingest factory agree on
which connections enable query history and which probe to run. The
status command now picks the postgres/snowflake/bigquery probe based on
the connection's driver instead of always reporting against postgres,
which previously caused snowflake connections with queryHistory.enabled
to surface a misleading "driver is snowflake" failure.
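
The shared resolution step might look something like the sketch below. The function name, the `ConnectionLike` shape, and the `null` return for unsupported drivers are assumptions; only the three dialects and the `queryHistory.enabled` flag come from the description above.

```typescript
type HistoryDialect = 'postgres' | 'snowflake' | 'bigquery';

// Hypothetical minimal view of a connection config.
interface ConnectionLike {
  driver: string;
  queryHistory?: { enabled?: boolean };
}

/** Returns the probe dialect to use, or null when query history does not apply. */
function resolveHistoryDialect(connection: ConnectionLike): HistoryDialect | null {
  if (!connection.queryHistory?.enabled) return null;
  switch (connection.driver) {
    case 'postgres':
      return 'postgres';
    case 'snowflake':
      return 'snowflake';
    case 'bigquery':
      return 'bigquery';
    default:
      return null; // driver has no query-history probe
  }
}
```

Routing both the readiness check and the ingest factory through one helper like this is what keeps them from disagreeing about a connection.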

Also drops a noisy console.warn from Snowflake primary-key discovery —
INFORMATION_SCHEMA.KEY_COLUMN_USAGE is commonly ungranted for read-only
roles and the FK + profiling paths handle the empty PK map already.

* fix(llm): allow StructuredOutput tool and raise maxTurns for generateObject

The Claude Code agent SDK announces an internal pseudo-tool named
StructuredOutput in the system/init message whenever outputFormat is set
to { type: 'json_schema' }. The runtime's isolation check built its
allowedToolIds set only from MCP tool ids and treated StructuredOutput
as an unexpected host-injected tool, so every generateObject call threw
"Claude Code runtime isolation failed: tools=StructuredOutput ..." and
the table-descriptions and relationship-LLM-proposal enrichment stages
recorded null output across the board.

Allow StructuredOutput only in generateObject's allowedToolIds. The
check also enforces missing_tools symmetry, so generateText and
runAgentLoop, which never see StructuredOutput announced, must not
list it as an allowed tool.

generateObject also ran with maxTurns: 1, which the model intermittently
breached when it emitted thinking text before the structured response.
Raised to 5 to give the schema-bound call enough headroom without
allowing unbounded loops. The existing tests now exercise the path with
an init message that announces StructuredOutput so the regression cannot
slip back in.
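
A simplified model of the isolation check, to show why the allowance has to be specific to generateObject. All names here are illustrative assumptions; the real runtime check is not shown in this message.

```typescript
type RuntimeCall = 'generateObject' | 'generateText' | 'runAgentLoop';

const STRUCTURED_OUTPUT_TOOL = 'StructuredOutput';

function allowedToolIds(call: RuntimeCall, mcpToolIds: string[]): Set<string> {
  const allowed = new Set<string>(mcpToolIds);
  // Only generateObject sets a json_schema output format, so only it
  // will see the SDK announce the StructuredOutput pseudo-tool.
  if (call === 'generateObject') allowed.add(STRUCTURED_OUTPUT_TOOL);
  return allowed;
}

// Symmetric check: announced tools must be allowed, and allowed tools
// must actually be announced.
function isolationViolations(allowed: Set<string>, announced: string[]) {
  const announcedSet = new Set<string>(announced);
  return {
    unexpected: announced.filter((id) => !allowed.has(id)),
    missing: Array.from(allowed).filter((id) => !announcedSet.has(id)),
  };
}
```

Under this model, adding StructuredOutput to every call's allowed set would turn the fix into a `missing` violation for generateText and runAgentLoop.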

* chore(scripts): add ktx-reset.sh project-cleanup helper

Convenience script for repeatable ingest testing: takes a project
directory and prunes everything except ktx.yaml and .ktx/secrets/, so
the next ktx setup or ktx ingest run starts from a known-clean state.
Andrey Avtomonov 2026-05-23 10:41:30 +02:00 committed by GitHub
parent b0dd13ce7c
commit 394a985d2a
72 changed files with 3508 additions and 655 deletions


@ -1,6 +1,7 @@
import { describe, expect, it, vi } from 'vitest';
import { bigQueryConnectionConfigFromConfig, isKtxBigQueryConnectionConfig, type KtxBigQueryClient, KtxBigQueryScanConnector, type KtxBigQueryClientFactory, type KtxBigQueryDataset, type KtxBigQueryQueryJob, type KtxBigQueryTableRef } from '../../connectors/bigquery/connector.js';
import { createBigQueryLiveDatabaseIntrospection } from '../../connectors/bigquery/live-database-introspection.js';
import { tableRefSet } from '../../context/scan/table-ref.js';
function fakeClientFactory(): KtxBigQueryClientFactory {
const queryResults = vi.fn(async (): ReturnType<KtxBigQueryQueryJob['getQueryResults']> => [
@ -234,6 +235,59 @@ describe('KtxBigQueryScanConnector', () => {
await connector.cleanup();
});
it('limits introspection to tables in tableScope', async () => {
const ordersGet = vi.fn(async (): ReturnType<KtxBigQueryTableRef['get']> => [
{
metadata: {
type: 'TABLE',
numRows: '12',
schema: { fields: [{ name: 'id', type: 'INT64', mode: 'REQUIRED' }] },
},
},
]);
const skippedGet = vi.fn(async (): ReturnType<KtxBigQueryTableRef['get']> => [
{ metadata: { type: 'TABLE', numRows: '1', schema: { fields: [] } } },
]);
const clientFactory: KtxBigQueryClientFactory = {
createClient: vi.fn(() => ({
getDatasets: vi.fn(async (): ReturnType<KtxBigQueryClient['getDatasets']> => [[{ id: 'analytics' }]]),
dataset: vi.fn(
(): KtxBigQueryDataset => ({
get: vi.fn(async () => [{ id: 'analytics' }]),
getTables: vi.fn(async (): ReturnType<KtxBigQueryDataset['getTables']> => [
[
{ id: 'orders', get: ordersGet },
{ id: 'customers', get: skippedGet },
],
]),
}),
),
createQueryJob: vi.fn(async (): ReturnType<KtxBigQueryClient['createQueryJob']> => [
{
getQueryResults: async (): ReturnType<KtxBigQueryQueryJob['getQueryResults']> => [
[],
undefined,
{ schema: { fields: [{ name: 'table_name', type: 'STRING' }, { name: 'column_name', type: 'STRING' }] } },
],
},
]),
})),
};
const connector = new KtxBigQueryScanConnector({
connectionId: 'warehouse',
connection,
clientFactory,
});
const scope = tableRefSet([{ catalog: 'project-1', db: 'analytics', name: 'orders' }]);
const snapshot = await connector.introspect(
{ connectionId: 'warehouse', driver: 'bigquery', tableScope: scope },
{ runId: 'scope-test' },
);
expect(snapshot.tables.map((table) => table.name)).toEqual(['orders']);
expect(ordersGet).toHaveBeenCalledTimes(1);
expect(skippedGet).not.toHaveBeenCalled();
});
it('constructs for discovery without dataset scope and lists tables through one region information schema query', async () => {
const createQueryJob = vi.fn(
async (


@ -2,6 +2,7 @@ import { BigQuery, type TableField } from '@google-cloud/bigquery';
import { normalizeBigQueryProjectId, normalizeBigQueryRegion } from '../../context/connections/bigquery-identifiers.js';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import { createKtxConnectorCapabilities, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaColumn, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableListEntry, type KtxTableRef, type KtxTableSampleInput, type KtxTableSampleResult } from '../../context/scan/types.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
import { readFileSync } from 'node:fs';
import { homedir } from 'node:os';
import { resolve } from 'node:path';
@ -289,7 +290,10 @@ export class KtxBigQueryScanConnector implements KtxScanConnector {
const tables: KtxSchemaTable[] = [];
const datasetIds = this.requireDatasetIdsForScan();
for (const datasetId of datasetIds) {
tables.push(...(await this.introspectDataset(datasetId)));
const scopedNames = input.tableScope
? scopedTableNames(input.tableScope, { catalog: this.resolved.projectId, db: datasetId })
: null;
tables.push(...(await this.introspectDataset(datasetId, scopedNames)));
}
return {
connectionId: this.connectionId,
@ -362,7 +366,7 @@ export class KtxBigQueryScanConnector implements KtxScanConnector {
if (!datasetId) {
return 0;
}
const tables = await this.introspectDataset(datasetId);
const tables = await this.introspectDataset(datasetId, null);
return tables.find((table) => table.name === tableName)?.estimatedRows ?? 0;
}
@ -463,12 +467,15 @@ export class KtxBigQueryScanConnector implements KtxScanConnector {
return firstNumber(rows[0]?.[header]);
}
private async introspectDataset(datasetId: string): Promise<KtxSchemaTable[]> {
private async introspectDataset(datasetId: string, scopedNames: readonly string[] | null): Promise<KtxSchemaTable[]> {
if (scopedNames && scopedNames.length === 0) return [];
const dataset = this.getClient().dataset(datasetId);
const [tableRefs] = await dataset.getTables();
const scopeSet = scopedNames ? new Set(scopedNames) : null;
const filteredTableRefs = scopeSet ? tableRefs.filter((tableRef) => scopeSet.has(tableRef.id ?? '')) : tableRefs;
const primaryKeys = await this.primaryKeys(datasetId);
const tables: KtxSchemaTable[] = [];
for (const tableRef of tableRefs) {
for (const tableRef of filteredTableRefs) {
const tableName = tableRef.id || '';
const [table] = await tableRef.get();
const fields = table.metadata.schema?.fields ?? [];


@ -1,4 +1,7 @@
import type { LiveDatabaseIntrospectionPort } from '../../context/ingest/adapters/live-database/types.js';
import type {
LiveDatabaseIntrospectionOptions,
LiveDatabaseIntrospectionPort,
} from '../../context/ingest/adapters/live-database/types.js';
import type { KtxProjectConnectionConfig } from '../../context/project/config.js';
import {
KtxBigQueryScanConnector,
@ -16,7 +19,7 @@ export function createBigQueryLiveDatabaseIntrospection(
options: CreateBigQueryLiveDatabaseIntrospectionOptions,
): LiveDatabaseIntrospectionPort {
return {
async extractSchema(connectionId: string) {
async extractSchema(connectionId: string, introspectionOptions?: LiveDatabaseIntrospectionOptions) {
const connection = options.connections[connectionId] as KtxBigQueryConnectionConfig | undefined;
const connector = new KtxBigQueryScanConnector({
connectionId,
@ -25,7 +28,14 @@ export function createBigQueryLiveDatabaseIntrospection(
now: options.now,
});
try {
return await connector.introspect({ connectionId, driver: 'bigquery' }, { runId: `bigquery-${connectionId}` });
return await connector.introspect(
{
connectionId,
driver: 'bigquery',
...(introspectionOptions?.tableScope ? { tableScope: introspectionOptions.tableScope } : {}),
},
{ runId: `bigquery-${connectionId}` },
);
} finally {
await connector.cleanup();
}


@ -1,6 +1,7 @@
import { describe, expect, it, vi } from 'vitest';
import { clickHouseClientConfigFromConfig, isKtxClickHouseConnectionConfig, KtxClickHouseScanConnector, type KtxClickHouseClientFactory } from '../../connectors/clickhouse/connector.js';
import { createClickHouseLiveDatabaseIntrospection } from '../../connectors/clickhouse/live-database-introspection.js';
import { tableRefSet } from '../../context/scan/table-ref.js';
function result<T>(payload: T) {
return {
@ -238,6 +239,57 @@ describe('KtxClickHouseScanConnector', () => {
]);
});
it('limits introspection to tables in tableScope', async () => {
const queries: Array<{ query: string; query_params?: Record<string, unknown> }> = [];
const clientFactory: KtxClickHouseClientFactory = {
createClient: vi.fn(() => ({
query: vi.fn(async (input: { query: string; format: string; query_params?: Record<string, unknown> }) => {
queries.push({ query: input.query, query_params: input.query_params });
if (input.query.includes('FROM system.tables')) {
return result([{ database: 'analytics', name: 'events', engine: 'MergeTree', comment: '' }]);
}
if (input.query.includes('FROM system.columns')) {
return result([
{
database: 'analytics',
table: 'events',
name: 'id',
type: 'UInt64',
comment: '',
is_in_primary_key: 1,
},
]);
}
if (input.query.includes('FROM system.parts')) {
return result([{ database: 'analytics', table: 'events', row_count: '2' }]);
}
throw new Error(`Unexpected SQL: ${input.query}`);
}),
close: vi.fn(async () => undefined),
})),
};
const connector = new KtxClickHouseScanConnector({
connectionId: 'warehouse',
connection: {
driver: 'clickhouse',
host: 'ch.example.test',
database: 'analytics',
username: 'reader',
password: 'test-pass', // pragma: allowlist secret
},
clientFactory,
});
const scope = tableRefSet([{ catalog: null, db: 'analytics', name: 'events' }]);
const snapshot = await connector.introspect(
{ connectionId: 'warehouse', driver: 'clickhouse', tableScope: scope },
{ runId: 'scope-test' },
);
expect(snapshot.tables.map((table) => table.name)).toEqual(['events']);
const tablesQuery = queries.find((query) => query.query.includes('FROM system.tables'));
expect(tablesQuery?.query).toContain('AND name IN {table_names:Array(String)}');
expect(tablesQuery?.query_params).toEqual({ databases: ['analytics'], table_names: ['events'] });
});
it('runs samples, distinct values, read-only SQL, row count, schema list, and cleanup', async () => {
const clientFactory = fakeClientFactory();
const connector = new KtxClickHouseScanConnector({


@ -1,6 +1,7 @@
import { createClient } from '@clickhouse/client';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import { createKtxConnectorCapabilities, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaColumn, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableRef, type KtxTableSampleInput, type KtxTableListEntry, type KtxTableSampleResult } from '../../context/scan/types.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
import { readFileSync } from 'node:fs';
import { Agent as HttpsAgent } from 'node:https';
import { homedir } from 'node:os';
@ -285,24 +286,42 @@ export class KtxClickHouseScanConnector implements KtxScanConnector {
async introspect(input: KtxScanInput, _ctx: KtxScanContext): Promise<KtxSchemaSnapshot> {
this.assertConnection(input.connectionId);
const databases = configuredClickHouseDatabases(this.connection, this.clientConfig.database);
let allScopedTables: string[] | null = null;
if (input.tableScope) {
allScopedTables = [];
for (const database of databases) {
allScopedTables.push(...scopedTableNames(input.tableScope, { catalog: null, db: database }));
}
if (allScopedTables.length === 0) {
return this.emptySnapshot(databases);
}
}
const queryParams: Record<string, unknown> = { databases };
const tableNameClause = allScopedTables ? 'AND name IN {table_names:Array(String)}' : '';
const columnTableNameClause = allScopedTables ? 'AND table IN {table_names:Array(String)}' : '';
if (allScopedTables) {
queryParams.table_names = allScopedTables;
}
const tables = await this.queryEachRow<ClickHouseTableRow>(
`
SELECT database, name, engine, comment
FROM system.tables
WHERE database IN {databases:Array(String)}
AND engine NOT IN ('Dictionary')
${tableNameClause}
ORDER BY database, name
`,
{ databases },
queryParams,
);
const columns = await this.queryEachRow<ClickHouseColumnRow>(
`
SELECT database, table, name, type, comment, is_in_primary_key
FROM system.columns
WHERE database IN {databases:Array(String)}
${columnTableNameClause}
ORDER BY database, table, position
`,
{ databases },
queryParams,
);
const rowCounts = await this.queryEachRow<ClickHouseRowCountRow>(
`
@ -310,9 +329,10 @@ export class KtxClickHouseScanConnector implements KtxScanConnector {
FROM system.parts
WHERE database IN {databases:Array(String)}
AND active = 1
${columnTableNameClause}
GROUP BY database, table
`,
{ databases },
queryParams,
);
const columnsByTable = new Map<string, ClickHouseColumnRow[]>();
for (const column of columns) {
@ -347,6 +367,23 @@ export class KtxClickHouseScanConnector implements KtxScanConnector {
};
}
private emptySnapshot(databases: string[]): KtxSchemaSnapshot {
return {
connectionId: this.connectionId,
driver: 'clickhouse',
extractedAt: this.now().toISOString(),
scope: { schemas: databases },
metadata: {
database: this.clientConfig.database,
databases,
host: this.clientConfig.host,
table_count: 0,
total_columns: 0,
},
tables: [],
};
}
async sampleTable(input: KtxTableSampleInput, _ctx: KtxScanContext): Promise<KtxTableSampleResult> {
this.assertConnection(input.connectionId);
const result = await this.query(


@ -1,4 +1,7 @@
import type { LiveDatabaseIntrospectionPort } from '../../context/ingest/adapters/live-database/types.js';
import type {
LiveDatabaseIntrospectionOptions,
LiveDatabaseIntrospectionPort,
} from '../../context/ingest/adapters/live-database/types.js';
import type { KtxProjectConnectionConfig } from '../../context/project/config.js';
import {
KtxClickHouseScanConnector,
@ -18,7 +21,7 @@ export function createClickHouseLiveDatabaseIntrospection(
options: CreateClickHouseLiveDatabaseIntrospectionOptions,
): LiveDatabaseIntrospectionPort {
return {
async extractSchema(connectionId: string) {
async extractSchema(connectionId: string, introspectionOptions?: LiveDatabaseIntrospectionOptions) {
const connection = options.connections[connectionId] as KtxClickHouseConnectionConfig | undefined;
const connector = new KtxClickHouseScanConnector({
connectionId,
@ -29,7 +32,11 @@ export function createClickHouseLiveDatabaseIntrospection(
});
try {
return await connector.introspect(
{ connectionId, driver: 'clickhouse' },
{
connectionId,
driver: 'clickhouse',
...(introspectionOptions?.tableScope ? { tableScope: introspectionOptions.tableScope } : {}),
},
{ runId: `clickhouse-${connectionId}` },
);
} finally {


@ -2,6 +2,7 @@ import { describe, expect, it, vi } from 'vitest';
import type { FieldPacket, RowDataPacket } from 'mysql2/promise';
import { createMysqlLiveDatabaseIntrospection } from '../../connectors/mysql/live-database-introspection.js';
import { isKtxMysqlConnectionConfig, KtxMysqlScanConnector, mysqlConnectionPoolConfigFromConfig, type KtxMysqlPoolFactory } from '../../connectors/mysql/connector.js';
import { tableRefSet } from '../../context/scan/table-ref.js';
function mysqlResult(rows: Record<string, unknown>[], fields: Array<{ name: string; type?: number }>): [RowDataPacket[], FieldPacket[]] {
return [rows as RowDataPacket[], fields as FieldPacket[]];
@ -275,6 +276,71 @@ describe('KtxMysqlScanConnector', () => {
]);
});
it('limits introspection to tables in tableScope', async () => {
const queries: Array<{ sql: string; params?: unknown }> = [];
const poolFactory: KtxMysqlPoolFactory = {
createPool: vi.fn(() => ({
getConnection: vi.fn(async () => ({
query: vi.fn(async (sql: string, params?: unknown): Promise<[RowDataPacket[], FieldPacket[]]> => {
queries.push({ sql, params });
if (sql.includes('INFORMATION_SCHEMA.TABLES')) {
return mysqlResult(
[
{
TABLE_SCHEMA: 'analytics',
TABLE_NAME: 'orders',
TABLE_TYPE: 'BASE TABLE',
TABLE_COMMENT: '',
TABLE_ROWS: 2,
},
],
[],
);
}
if (sql.includes('INFORMATION_SCHEMA.COLUMNS')) {
return mysqlResult(
[
{
TABLE_SCHEMA: 'analytics',
TABLE_NAME: 'orders',
COLUMN_NAME: 'id',
DATA_TYPE: 'int',
IS_NULLABLE: 'NO',
COLUMN_COMMENT: '',
},
],
[],
);
}
return mysqlResult([], []);
}),
release: vi.fn(),
})),
end: vi.fn(async () => undefined),
})),
};
const connector = new KtxMysqlScanConnector({
connectionId: 'warehouse',
connection: {
driver: 'mysql',
host: 'db.example.test',
database: 'analytics',
username: 'reader',
password: 'secret', // pragma: allowlist secret
},
poolFactory,
});
const scope = tableRefSet([{ catalog: null, db: 'analytics', name: 'orders' }]);
const snapshot = await connector.introspect(
{ connectionId: 'warehouse', driver: 'mysql', tableScope: scope },
{ runId: 'scope-test' },
);
expect(snapshot.tables.map((table) => table.name)).toEqual(['orders']);
const tablesQuery = queries.find((query) => query.sql.includes('INFORMATION_SCHEMA.TABLES'));
expect(tablesQuery?.sql).toMatch(/TABLE_NAME IN \(\?\)/);
expect(tablesQuery?.params).toEqual(['analytics', 'orders']);
});
it('runs samples, distinct values, read-only SQL, row count, schema list, and cleanup', async () => {
const poolFactory = fakePoolFactory();
const connector = new KtxMysqlScanConnector({


@ -4,6 +4,7 @@ import { homedir } from 'node:os';
import { resolve } from 'node:path';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import { createKtxConnectorCapabilities, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaColumn, type KtxTableListEntry, type KtxSchemaForeignKey, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableRef, type KtxTableSampleInput, type KtxTableSampleResult } from '../../context/scan/types.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
import { KtxMysqlDialect } from './dialect.js';
export interface KtxMysqlConnectionConfig {
@ -335,23 +336,37 @@ export class KtxMysqlScanConnector implements KtxScanConnector {
this.assertConnection(input.connectionId);
const databases = configuredMysqlSchemas(this.connection, this.poolConfig.database);
const placeholders = databases.map(() => '?').join(', ');
let allScopedTables: string[] | null = null;
if (input.tableScope) {
allScopedTables = [];
for (const database of databases) {
allScopedTables.push(...scopedTableNames(input.tableScope, { catalog: null, db: database }));
}
if (allScopedTables.length === 0) {
return this.emptySnapshot(databases);
}
}
const tableNameClause = allScopedTables
? `AND TABLE_NAME IN (${allScopedTables.map(() => '?').join(', ')})`
: '';
const tableNameParams = allScopedTables ?? [];
const tables = await this.queryRaw<MysqlTableRow>(
`
SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE, TABLE_COMMENT, TABLE_ROWS
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA IN (${placeholders}) AND TABLE_TYPE IN ('BASE TABLE', 'VIEW')
WHERE TABLE_SCHEMA IN (${placeholders}) AND TABLE_TYPE IN ('BASE TABLE', 'VIEW') ${tableNameClause}
ORDER BY TABLE_SCHEMA, TABLE_NAME
`,
databases,
[...databases, ...tableNameParams],
);
const columns = await this.queryRaw<MysqlColumnRow>(
`
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, IS_NULLABLE, COLUMN_COMMENT
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA IN (${placeholders})
WHERE TABLE_SCHEMA IN (${placeholders}) ${tableNameClause}
ORDER BY TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION
`,
databases,
[...databases, ...tableNameParams],
);
const primaryKeys = await this.queryRaw<MysqlPrimaryKeyRow>(
`
@ -359,9 +374,10 @@ export class KtxMysqlScanConnector implements KtxScanConnector {
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA IN (${placeholders})
AND CONSTRAINT_NAME = 'PRIMARY'
${tableNameClause}
ORDER BY TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION
`,
databases,
[...databases, ...tableNameParams],
);
const foreignKeys = await this.queryRaw<MysqlForeignKeyRow>(
`
@ -369,9 +385,10 @@ export class KtxMysqlScanConnector implements KtxScanConnector {
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA IN (${placeholders})
AND REFERENCED_TABLE_NAME IS NOT NULL
${tableNameClause}
ORDER BY TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME
`,
databases,
[...databases, ...tableNameParams],
);
const columnsByTable = groupByTable(columns, this.poolConfig.database);
@ -403,6 +420,23 @@ export class KtxMysqlScanConnector implements KtxScanConnector {
};
}
private emptySnapshot(databases: string[]): KtxSchemaSnapshot {
return {
connectionId: this.connectionId,
driver: 'mysql',
extractedAt: this.now().toISOString(),
scope: { schemas: databases },
metadata: {
database: this.poolConfig.database,
schemas: databases,
host: this.poolConfig.host,
table_count: 0,
total_columns: 0,
},
tables: [],
};
}
async sampleTable(input: KtxTableSampleInput, _ctx: KtxScanContext): Promise<KtxTableSampleResult> {
this.assertConnection(input.connectionId);
const result = await this.query(this.dialect.generateSampleQuery(this.qTableName(input.table), input.limit, input.columns));


@ -1,4 +1,7 @@
import type { LiveDatabaseIntrospectionPort } from '../../context/ingest/adapters/live-database/types.js';
import type {
LiveDatabaseIntrospectionOptions,
LiveDatabaseIntrospectionPort,
} from '../../context/ingest/adapters/live-database/types.js';
import type { KtxProjectConnectionConfig } from '../../context/project/config.js';
import {
KtxMysqlScanConnector,
@ -18,7 +21,7 @@ export function createMysqlLiveDatabaseIntrospection(
options: CreateMysqlLiveDatabaseIntrospectionOptions,
): LiveDatabaseIntrospectionPort {
return {
async extractSchema(connectionId: string) {
async extractSchema(connectionId: string, introspectionOptions?: LiveDatabaseIntrospectionOptions) {
const connection = options.connections[connectionId] as KtxMysqlConnectionConfig | undefined;
const connector = new KtxMysqlScanConnector({
connectionId,
@ -28,7 +31,14 @@ export function createMysqlLiveDatabaseIntrospection(
now: options.now,
});
try {
return await connector.introspect({ connectionId, driver: 'mysql' }, { runId: `mysql-${connectionId}` });
return await connector.introspect(
{
connectionId,
driver: 'mysql',
...(introspectionOptions?.tableScope ? { tableScope: introspectionOptions.tableScope } : {}),
},
{ runId: `mysql-${connectionId}` },
);
} finally {
await connector.cleanup();
}


@ -1,6 +1,7 @@
import { describe, expect, it, vi } from 'vitest';
import { createPostgresLiveDatabaseIntrospection } from '../../connectors/postgres/live-database-introspection.js';
import { isKtxPostgresConnectionConfig, KtxPostgresScanConnector, postgresPoolConfigFromConfig, type KtxPostgresPoolFactory } from '../../connectors/postgres/connector.js';
import { tableRefSet } from '../../context/scan/table-ref.js';
interface FakeQueryResult {
rows: Record<string, unknown>[];
@ -259,6 +260,63 @@ describe('KtxPostgresScanConnector', () => {
).rejects.toThrow('Only read-only SELECT/WITH queries can be executed locally');
});
it('limits introspection to tables in tableScope', async () => {
const queries: Array<{ sql: string; params?: unknown[] }> = [];
const poolFactory: KtxPostgresPoolFactory = {
createPool() {
return {
async connect() {
return {
query: vi.fn(async (sql: string, params?: unknown[]) => {
queries.push({ sql, params });
if (sql.includes('FROM pg_catalog.pg_class c')) {
return { rows: [{ table_name: 'orders', table_kind: 'r', row_count: '3', table_comment: null }] };
}
if (sql.includes('FROM pg_catalog.pg_attribute a')) {
return {
rows: [
{
table_name: 'orders',
column_name: 'id',
data_type: 'integer',
is_nullable: false,
column_comment: null,
},
],
};
}
return { rows: [] };
}),
release: vi.fn(),
};
},
end: vi.fn(async () => undefined),
};
},
};
const connector = new KtxPostgresScanConnector({
connectionId: 'warehouse',
connection: {
driver: 'postgres',
host: 'db.example.test',
database: 'analytics',
username: 'reader',
password: 'test-password', // pragma: allowlist secret
schema: 'public',
},
poolFactory,
});
const scope = tableRefSet([{ catalog: null, db: 'public', name: 'orders' }]);
const snapshot = await connector.introspect(
{ connectionId: 'warehouse', driver: 'postgres', tableScope: scope },
{ runId: 'scope-test' },
);
expect(snapshot.tables.map((table) => table.name)).toEqual(['orders']);
const tablesQuery = queries.find((query) => query.sql.includes('FROM pg_catalog.pg_class c'));
expect(tablesQuery?.sql).toMatch(/c\.relname = ANY\(\$2\)/);
expect(tablesQuery?.params).toEqual(['public', ['orders']]);
});
it('adapts native PostgreSQL snapshots to live-database introspection for local ingest', async () => {
const introspection = createPostgresLiveDatabaseIntrospection({
connections: {


@ -3,6 +3,7 @@ import { homedir } from 'node:os';
import { resolve } from 'node:path';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import { createKtxConnectorCapabilities, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaColumn, type KtxSchemaForeignKey, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableListEntry, type KtxTableRef, type KtxTableSampleInput, type KtxTableSampleResult } from '../../context/scan/types.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
import { Pool } from 'pg';
import { KtxPostgresDialect } from './dialect.js';
@ -379,7 +380,9 @@ export class KtxPostgresScanConnector implements KtxScanConnector {
const schemas = schemasFromConnection(this.connection);
const allTables: KtxSchemaTable[] = [];
for (const schema of schemas) {
const tables = await this.loadSchemaTables(schema);
const scopedNames = input.tableScope ? scopedTableNames(input.tableScope, { catalog: null, db: schema }) : null;
if (scopedNames && scopedNames.length === 0) continue;
const tables = await this.loadSchemaTables(schema, scopedNames);
allTables.push(...tables);
}
return {
@ -543,7 +546,11 @@ export class KtxPostgresScanConnector implements KtxScanConnector {
}
}
private async loadSchemaTables(schema: string): Promise<KtxSchemaTable[]> {
private async loadSchemaTables(schema: string, scopedNames: readonly string[] | null): Promise<KtxSchemaTable[]> {
if (scopedNames && scopedNames.length === 0) return [];
const pgCatalogScopeClause = scopedNames ? 'AND c.relname = ANY($2)' : '';
const tableConstraintScopeClause = scopedNames ? 'AND tc.table_name = ANY($2)' : '';
const scopeValues = scopedNames ? [scopedNames] : [];
const tables = await this.queryRaw<PostgresTableRow>(
`
SELECT
@ -557,9 +564,10 @@ export class KtxPostgresScanConnector implements KtxScanConnector {
ON d.objoid = c.oid AND d.objsubid = 0
WHERE n.nspname = $1
AND c.relkind IN ('r', 'v')
${pgCatalogScopeClause}
ORDER BY c.relname
`,
[schema],
[schema, ...scopeValues],
);
const columns = await this.queryRaw<PostgresColumnRow>(
`
@ -578,9 +586,10 @@ export class KtxPostgresScanConnector implements KtxScanConnector {
AND c.relkind IN ('r', 'v')
AND a.attnum > 0
AND NOT a.attisdropped
${pgCatalogScopeClause}
ORDER BY c.relname, a.attnum
`,
[schema],
[schema, ...scopeValues],
);
const primaryKeys = await this.queryRaw<PostgresPrimaryKeyRow>(
`
@ -591,9 +600,10 @@ export class KtxPostgresScanConnector implements KtxScanConnector {
AND tc.table_schema = kcu.table_schema
WHERE tc.constraint_type = 'PRIMARY KEY'
AND tc.table_schema = $1
${tableConstraintScopeClause}
ORDER BY tc.table_name, kcu.ordinal_position
`,
[schema],
[schema, ...scopeValues],
);
const foreignKeys = await this.queryRaw<PostgresForeignKeyRow>(
`
@ -613,9 +623,10 @@ export class KtxPostgresScanConnector implements KtxScanConnector {
AND ccu.table_schema = tc.table_schema
WHERE tc.constraint_type = 'FOREIGN KEY'
AND tc.table_schema = $1
${tableConstraintScopeClause}
ORDER BY tc.table_name, kcu.column_name
`,
[schema],
[schema, ...scopeValues],
);
const columnsByTable = groupByTable(columns);
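
The scoped queries in this hunk append an optional `= ANY($2)` filter and pass the table names as one array parameter. Below is a minimal sketch of that pattern in isolation. The helper name `buildScopedTableQuery`, the `ScopedQuery` type, and the simplified SQL are assumptions made for illustration and are not part of this change.

```typescript
// Hypothetical helper for illustration only; name, return type, and SQL are assumptions.
interface ScopedQuery {
  sql: string;
  params: unknown[];
}

function buildScopedTableQuery(schema: string, scopedNames: readonly string[] | null): ScopedQuery {
  // With no scope, both the clause and the extra parameter are omitted,
  // so placeholder numbering always matches the params array.
  const scopeClause = scopedNames ? 'AND c.relname = ANY($2)' : '';
  const scopeValues = scopedNames ? [[...scopedNames]] : [];
  const sql = [
    'SELECT c.relname',
    'FROM pg_class c',
    'JOIN pg_namespace n ON n.oid = c.relnamespace',
    'WHERE n.nspname = $1',
    scopeClause,
    'ORDER BY c.relname',
  ]
    .filter((line) => line.length > 0)
    .join('\n');
  return { sql, params: [schema, ...scopeValues] };
}

const scopedQuery = buildScopedTableQuery('public', ['orders']);
const unscopedQuery = buildScopedTableQuery('public', null);
```

Passing the names as a single array keeps the query text identical no matter how many tables are in scope, which is presumably why the connector uses `ANY($2)` here rather than a generated `IN (...)` list.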

@ -1,4 +1,7 @@
import type { LiveDatabaseIntrospectionPort } from '../../context/ingest/adapters/live-database/types.js';
import type {
LiveDatabaseIntrospectionOptions,
LiveDatabaseIntrospectionPort,
} from '../../context/ingest/adapters/live-database/types.js';
import type { KtxProjectConnectionConfig } from '../../context/project/config.js';
import {
KtxPostgresScanConnector,
@ -18,7 +21,7 @@ export function createPostgresLiveDatabaseIntrospection(
options: CreatePostgresLiveDatabaseIntrospectionOptions,
): LiveDatabaseIntrospectionPort {
return {
async extractSchema(connectionId: string) {
async extractSchema(connectionId: string, introspectionOptions?: LiveDatabaseIntrospectionOptions) {
const connection = options.connections[connectionId] as KtxPostgresConnectionConfig | undefined;
const connector = new KtxPostgresScanConnector({
connectionId,
@ -28,7 +31,14 @@ export function createPostgresLiveDatabaseIntrospection(
now: options.now,
});
try {
return await connector.introspect({ connectionId, driver: 'postgres' }, { runId: `postgres-${connectionId}` });
return await connector.introspect(
{
connectionId,
driver: 'postgres',
...(introspectionOptions?.tableScope ? { tableScope: introspectionOptions.tableScope } : {}),
},
{ runId: `postgres-${connectionId}` },
);
} finally {
await connector.cleanup();
}
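
The `introspect` call above forwards `tableScope` through a conditional spread instead of assigning it directly. A small sketch of what that buys, assuming a simplified `IntrospectInput` shape and a hypothetical `buildIntrospectInput` helper (neither exists in this change): the key is left out entirely when no scope is supplied, rather than being present with the value `undefined`.

```typescript
// Simplified, assumed input shape; the real KtxScanInput has more fields.
interface IntrospectInput {
  connectionId: string;
  driver: string;
  tableScope?: ReadonlySet<string>;
}

// Hypothetical helper for illustration only.
function buildIntrospectInput(connectionId: string, tableScope?: ReadonlySet<string>): IntrospectInput {
  return {
    connectionId,
    driver: 'postgres',
    // Spread an empty object when there is no scope so the key is absent,
    // not set to undefined.
    ...(tableScope ? { tableScope } : {}),
  };
}

const withScope = buildIntrospectInput('warehouse', new Set(['orders']));
const withoutScope = buildIntrospectInput('warehouse');
```

Omitting the key keeps exact-equality assertions on the input object stable and stays compatible with TypeScript's `exactOptionalPropertyTypes` option, if the project enables it.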

@ -1,6 +1,15 @@
import { describe, expect, it, vi } from 'vitest';
const createPool = vi.hoisted(() => vi.fn());
vi.mock('snowflake-sdk', () => ({
default: { createPool },
createPool,
}));
import { createSnowflakeLiveDatabaseIntrospection } from '../../connectors/snowflake/live-database-introspection.js';
import { isKtxSnowflakeConnectionConfig, KtxSnowflakeScanConnector, snowflakeConnectionConfigFromConfig, type KtxSnowflakeDriver, type KtxSnowflakeDriverFactory } from '../../connectors/snowflake/connector.js';
import { tableRefSet } from '../../context/scan/table-ref.js';
function fakeDriverFactory(): KtxSnowflakeDriverFactory {
const driver: KtxSnowflakeDriver = {
@ -63,6 +72,38 @@ function fakeDriverFactory(): KtxSnowflakeDriverFactory {
return { createDriver: vi.fn(() => driver) };
}
function fakeSnowflakeStatement(headers: string[] = ['ONE']) {
return {
getColumns: () => headers.map((header) => ({ getName: () => header, getType: () => 'TEXT' })),
};
}
function installSnowflakePoolMock() {
const executedSql: string[] = [];
const connection = {
execute: vi.fn(
(input: {
sqlText: string;
complete: (
error: Error | null,
statement: ReturnType<typeof fakeSnowflakeStatement>,
rows: Array<Record<string, unknown>>,
) => void;
}) => {
executedSql.push(input.sqlText);
input.complete(null, fakeSnowflakeStatement(), [{ ONE: 1 }]);
},
),
};
const pool = {
use: vi.fn(async (fn: (conn: typeof connection) => Promise<unknown>) => fn(connection)),
drain: vi.fn(async () => undefined),
clear: vi.fn(async () => undefined),
};
createPool.mockReturnValue(pool);
return { connection, pool, executedSql };
}
describe('KtxSnowflakeScanConnector', () => {
it('resolves Snowflake connection configuration safely', () => {
expect(
@ -99,6 +140,99 @@ describe('KtxSnowflakeScanConnector', () => {
});
});
it('defaults and validates Snowflake maxSessions', () => {
const baseConnection = {
driver: 'snowflake',
authMethod: 'password',
account: 'acct',
warehouse: 'WH',
database: 'ANALYTICS',
schema_name: 'PUBLIC',
username: 'reader',
password: 'fixture-pass', // pragma: allowlist secret
} as const;
expect(
snowflakeConnectionConfigFromConfig({
connectionId: 'warehouse',
connection: baseConnection,
}),
).toMatchObject({ maxSessions: 4 });
expect(
snowflakeConnectionConfigFromConfig({
connectionId: 'warehouse',
connection: { ...baseConnection, maxSessions: 8 },
}),
).toMatchObject({ maxSessions: 8 });
for (const maxSessions of [0, -1, 1.5, Number.NaN]) {
expect(() =>
snowflakeConnectionConfigFromConfig({
connectionId: 'warehouse',
connection: { ...baseConnection, maxSessions },
}),
).toThrow('connections.warehouse.maxSessions must be a positive integer');
}
});
it('uses one lazy Snowflake pool and drains it during cleanup', async () => {
const { pool, executedSql } = installSnowflakePoolMock();
const close = vi.fn(async () => undefined);
const connector = new KtxSnowflakeScanConnector({
connectionId: 'warehouse',
connection: {
driver: 'snowflake',
authMethod: 'password',
account: 'acct',
warehouse: 'WH',
database: 'ANALYTICS',
schema_name: 'PUBLIC',
username: 'reader',
password: 'fixture-pass', // pragma: allowlist secret
role: 'ANALYST',
maxSessions: 3,
},
sdkOptionsProvider: {
resolve: vi.fn(async () => ({ sdkOptions: { application: 'ktx-test' }, close })),
},
});
expect(createPool).not.toHaveBeenCalled();
await connector.executeReadOnly({ connectionId: 'warehouse', sql: 'select 1', maxRows: 1 }, { runId: 'run-1' });
await connector.executeReadOnly({ connectionId: 'warehouse', sql: 'select 1', maxRows: 1 }, { runId: 'run-1' });
expect(createPool).toHaveBeenCalledTimes(1);
expect(createPool).toHaveBeenCalledWith(
expect.objectContaining({
account: 'acct',
username: 'reader',
warehouse: 'WH',
database: 'ANALYTICS',
schema: 'PUBLIC',
role: 'ANALYST',
password: 'fixture-pass', // pragma: allowlist secret
clientSessionKeepAlive: true,
clientSessionKeepAliveHeartbeatFrequency: 900,
application: 'ktx-test',
}),
expect.objectContaining({
min: 0,
max: 3,
evictionRunIntervalMillis: 30_000,
acquireTimeoutMillis: 60_000,
}),
);
expect(pool.use).toHaveBeenCalledTimes(2);
expect(executedSql.some((sql) => /^USE\s+/i.test(sql.trim()))).toBe(false);
await connector.cleanup();
expect(pool.drain).toHaveBeenCalledBefore(pool.clear);
expect(pool.clear).toHaveBeenCalledTimes(1);
expect(close).toHaveBeenCalledTimes(1);
});
it('introspects schema, primary keys, comments, row counts, and dimensions', async () => {
const connector = new KtxSnowflakeScanConnector({
connectionId: 'warehouse',
@ -157,6 +291,108 @@ describe('KtxSnowflakeScanConnector', () => {
]);
});
it('continues introspection when primary-key discovery is not authorized', async () => {
const driverFactory = fakeDriverFactory();
const driver = (driverFactory.createDriver as ReturnType<typeof vi.fn>).getMockImplementation() as
| (() => KtxSnowflakeDriver)
| undefined;
if (!driver) throw new Error('driver mock missing');
const built = driver();
(built.query as ReturnType<typeof vi.fn>).mockImplementation(async (sql: string) => {
if (sql.includes('TABLE_CONSTRAINTS')) {
throw new Error(
"SQL compilation error: Object 'ANALYTICS.INFORMATION_SCHEMA.KEY_COLUMN_USAGE' does not exist or not authorized.",
);
}
throw new Error(`Unexpected SQL: ${sql}`);
});
(driverFactory.createDriver as ReturnType<typeof vi.fn>).mockReturnValue(built);
const warn = vi.spyOn(console, 'warn').mockImplementation(() => undefined);
try {
const connector = new KtxSnowflakeScanConnector({
connectionId: 'warehouse',
connection: {
driver: 'snowflake',
authMethod: 'password',
account: 'acct',
warehouse: 'WH',
database: 'ANALYTICS',
schema_name: 'PUBLIC',
username: 'reader',
password: 'fixture-pass', // pragma: allowlist secret
},
driverFactory,
});
const snapshot = await connector.introspect(
{ connectionId: 'warehouse', driver: 'snowflake' },
{ runId: 'scan-run-pk-skip' },
);
expect(snapshot.tables.map((table) => table.name).sort()).toEqual(['ORDERS', 'ORDER_SUMMARY']);
expect(snapshot.tables.every((table) => table.columns.every((column) => column.primaryKey === false))).toBe(true);
expect(warn).not.toHaveBeenCalled();
} finally {
warn.mockRestore();
}
});
it('limits introspection to tables in tableScope', async () => {
const queries: Array<{ sql: string; params?: unknown }> = [];
const getSchemaMetadata = vi.fn(async (_schemaName?: string, scopedNames?: readonly string[] | null) =>
scopedNames?.includes('ORDERS')
? [
{
name: 'ORDERS',
catalog: 'ANALYTICS',
db: 'MARTS',
rowCount: 10,
comment: null,
columns: [{ name: 'ID', type: 'NUMBER', nullable: false, comment: null }],
},
]
: [],
);
const driverFactory: KtxSnowflakeDriverFactory = {
createDriver: vi.fn(() => ({
test: vi.fn(async () => ({ success: true })),
query: vi.fn(async (sql: string, params?: unknown) => {
queries.push({ sql, params });
return { headers: [], rows: [], totalRows: 0, rowCount: 0 };
}),
getSchemaMetadata,
listSchemas: vi.fn(async () => []),
listTables: vi.fn(async () => []),
cleanup: vi.fn(async () => undefined),
})),
};
const connector = new KtxSnowflakeScanConnector({
connectionId: 'warehouse',
connection: {
driver: 'snowflake',
authMethod: 'password',
account: 'acct',
warehouse: 'WH',
database: 'ANALYTICS',
schema_name: 'MARTS',
username: 'reader',
password: 'fixture-pass', // pragma: allowlist secret
},
driverFactory,
});
const scope = tableRefSet([{ catalog: 'ANALYTICS', db: 'MARTS', name: 'ORDERS' }]);
const snapshot = await connector.introspect(
{ connectionId: 'warehouse', driver: 'snowflake', tableScope: scope },
{ runId: 'scope-test' },
);
expect(snapshot.tables.map((table) => table.name)).toEqual(['ORDERS']);
expect(getSchemaMetadata).toHaveBeenCalledWith('MARTS', ['ORDERS']);
const primaryKeysQuery = queries.find((query) => query.sql.includes('TABLE_CONSTRAINTS'));
expect(primaryKeysQuery?.sql).toMatch(/AND tc\.TABLE_NAME IN \(\?\)/);
expect(primaryKeysQuery?.params).toEqual(['MARTS', 'ANALYTICS', 'ORDERS']);
});
it('supports read-only query, sampling, distinct values, row counts, schema listing, and cleanup', async () => {
const driverFactory = fakeDriverFactory();
const connector = new KtxSnowflakeScanConnector({

@ -4,9 +4,12 @@ import { homedir } from 'node:os';
import { resolve } from 'node:path';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import { createKtxConnectorCapabilities, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaColumn, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableRef, type KtxTableSampleInput, type KtxTableListEntry, type KtxTableSampleResult } from '../../context/scan/types.js';
import * as snowflake from 'snowflake-sdk';
import { scopedTableNames } from '../../context/scan/table-ref.js';
import snowflake from 'snowflake-sdk';
import type { Bind, Binds, Connection, ConnectionOptions } from 'snowflake-sdk';
import { KtxSnowflakeDialect } from './dialect.js';
import { assertSafeSnowflakeIdentifier, quoteSnowflakeIdentifier } from './identifiers.js';
import { configureSnowflakeSdkLogger } from './sdk-logger.js';
export interface KtxSnowflakeConnectionConfig {
driver?: string;
@ -21,6 +24,7 @@ export interface KtxSnowflakeConnectionConfig {
privateKey?: string;
passphrase?: string;
role?: string;
maxSessions?: number;
[key: string]: unknown;
}
@ -35,6 +39,7 @@ export interface KtxSnowflakeResolvedConnectionConfig {
privateKey?: string;
passphrase?: string;
role?: string;
maxSessions: number;
}
export interface KtxSnowflakeRawColumnMetadata {
@ -56,7 +61,7 @@ export interface KtxSnowflakeRawTableMetadata {
export interface KtxSnowflakeDriver {
test(): Promise<{ success: boolean; error?: string }>;
query(sql: string, params?: unknown): Promise<KtxQueryResult>;
getSchemaMetadata(schemaName?: string): Promise<KtxSnowflakeRawTableMetadata[]>;
getSchemaMetadata(schemaName?: string, scopedTableNames?: readonly string[] | null): Promise<KtxSnowflakeRawTableMetadata[]>;
listSchemas(): Promise<string[]>;
listTables(schemas?: string[]): Promise<KtxTableListEntry[]>;
cleanup(): Promise<void>;
@ -79,6 +84,12 @@ export interface KtxSnowflakeSdkOptionsProvider {
export interface KtxSnowflakeScanConnectorOptions {
connectionId: string;
connection: KtxSnowflakeConnectionConfig | undefined;
/**
* KTX project directory. When provided, snowflake-sdk's logger is redirected to
* `<projectDir>/.ktx/logs/snowflake.log` so its JSON output does not bleed into
* the CLI's TTY. Tests that use a fake driverFactory can leave this undefined.
*/
projectDir?: string;
driverFactory?: KtxSnowflakeDriverFactory;
sdkOptionsProvider?: KtxSnowflakeSdkOptionsProvider;
env?: NodeJS.ProcessEnv;
@ -123,13 +134,31 @@ function stringConfigValue(
return typeof value === 'string' && value.trim().length > 0 ? resolveStringReference(value.trim(), env) : undefined;
}
function positiveIntegerConfigValue(input: {
connection: KtxSnowflakeConnectionConfig;
key: keyof KtxSnowflakeConnectionConfig;
connectionId: string;
defaultValue: number;
}): number {
const value = input.connection[input.key];
if (value === undefined) {
return input.defaultValue;
}
const numberValue = Number(value);
if (!Number.isInteger(numberValue) || numberValue < 1) {
throw new Error(`connections.${input.connectionId}.${String(input.key)} must be a positive integer`);
}
return numberValue;
}
function schemaNames(connection: KtxSnowflakeConnectionConfig, env: NodeJS.ProcessEnv): string[] {
if (Array.isArray(connection.schema_names) && connection.schema_names.length > 0) {
return connection.schema_names
.filter((schema) => schema.trim().length > 0)
.map((schema) => resolveStringReference(schema, env));
}
return [stringConfigValue(connection, 'schema_name', env) ?? 'PUBLIC'];
const single = stringConfigValue(connection, 'schema_name', env);
return single ? [single] : [];
}
function firstNumber(value: unknown): number | null {
@ -159,7 +188,7 @@ function normalizeSnowflakeValue(value: unknown, columnType?: string): unknown {
return value;
}
function toSnowflakeBind(value: unknown): snowflake.Bind {
function toSnowflakeBind(value: unknown): Bind {
if (value === null || typeof value === 'string' || typeof value === 'number' || typeof value === 'boolean') {
return value;
}
@ -169,7 +198,7 @@ function toSnowflakeBind(value: unknown): snowflake.Bind {
return String(value);
}
function toSnowflakeBinds(params: unknown[] | undefined): snowflake.Binds | undefined {
function toSnowflakeBinds(params: unknown[] | undefined): Binds | undefined {
return params?.map((value) => toSnowflakeBind(value));
}
@ -220,6 +249,12 @@ export function snowflakeConnectionConfigFromConfig(input: {
database,
schemas: resolvedSchemas,
username,
maxSessions: positiveIntegerConfigValue({
connection: input.connection,
key: 'maxSessions',
connectionId: input.connectionId,
defaultValue: 4,
}),
};
const role = stringConfigValue(input.connection, 'role', env);
if (role) {
@ -255,6 +290,7 @@ class DefaultSnowflakeDriverFactory implements KtxSnowflakeDriverFactory {
class SnowflakeSdkDriver implements KtxSnowflakeDriver {
private closeSdkOptions: Array<() => Promise<void>> = [];
private pool: ReturnType<typeof snowflake.createPool> | null = null;
constructor(
private readonly resolved: KtxSnowflakeResolvedConnectionConfig,
@ -275,37 +311,50 @@ class SnowflakeSdkDriver implements KtxSnowflakeDriver {
}
async query(sql: string, params?: unknown): Promise<KtxQueryResult> {
let connection: snowflake.Connection | null = null;
const binds = Array.isArray(params) ? toSnowflakeBinds(params) : undefined;
try {
connection = await this.createConnection();
const binds = Array.isArray(params) ? toSnowflakeBinds(params) : undefined;
const result = await this.executeSnowflakeQuery(connection, sql, binds);
const pool = await this.getPool();
const result = await pool.use(async (connection: snowflake.Connection) =>
this.executeSnowflakeQuery(connection, sql, binds),
);
return { ...result, totalRows: result.rows.length, rowCount: result.rows.length };
} finally {
if (connection) {
await this.destroyConnection(connection);
} catch (error) {
const message = error instanceof Error ? error.message : String(error);
if (/timeout/i.test(message) && /pool|acquire/i.test(message)) {
throw new Error(
"Snowflake session pool exhausted after 60s - consider lowering maxSessions or increasing your account's concurrent-statement limit.",
);
}
throw error;
}
}
async getSchemaMetadata(schemaName = this.resolved.schemas[0] ?? 'PUBLIC'): Promise<KtxSnowflakeRawTableMetadata[]> {
async getSchemaMetadata(
schemaName = this.resolved.schemas[0] ?? 'PUBLIC',
scopedTableNames: readonly string[] | null = null,
): Promise<KtxSnowflakeRawTableMetadata[]> {
const scopeClause =
scopedTableNames && scopedTableNames.length > 0
? `AND TABLE_NAME IN (${scopedTableNames.map(() => '?').join(', ')})`
: '';
const scopeParams = scopedTableNames ?? [];
const tablesResult = await this.query(
`
SELECT TABLE_NAME, TABLE_TYPE, COMMENT, ROW_COUNT
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = ? AND TABLE_CATALOG = ?
WHERE TABLE_SCHEMA = ? AND TABLE_CATALOG = ? ${scopeClause}
ORDER BY TABLE_NAME
`,
[schemaName, this.resolved.database],
[schemaName, this.resolved.database, ...scopeParams],
);
const columnsResult = await this.query(
`
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, IS_NULLABLE, COMMENT, ORDINAL_POSITION
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = ? AND TABLE_CATALOG = ?
WHERE TABLE_SCHEMA = ? AND TABLE_CATALOG = ? ${scopeClause}
ORDER BY TABLE_NAME, ORDINAL_POSITION
`,
[schemaName, this.resolved.database],
[schemaName, this.resolved.database, ...scopeParams],
);
const columnsByTable = new Map<string, KtxSnowflakeRawColumnMetadata[]>();
for (const row of columnsResult.rows) {
@ -357,27 +406,41 @@ class SnowflakeSdkDriver implements KtxSnowflakeDriver {
}
async cleanup(): Promise<void> {
const pool = this.pool;
this.pool = null;
if (pool) {
// Drain before clear so in-flight Snowflake statements finish before idle
// sessions are closed.
await pool.drain();
await pool.clear();
}
const closers = this.closeSdkOptions;
this.closeSdkOptions = [];
await Promise.all(closers.map((close) => close()));
await Promise.all(closers.map((close) => Promise.resolve(close())));
}
private async runTest(): Promise<{ success: boolean; error?: string }> {
let connection: snowflake.Connection | null = null;
try {
connection = await this.createConnection();
await this.executeSnowflakeQuery(connection, 'SELECT 1');
await this.query('SELECT 1');
return { success: true };
} catch (error) {
return { success: false, error: error instanceof Error ? error.message : String(error) };
} finally {
if (connection) {
await this.destroyConnection(connection);
}
}
}
private async createConnection(): Promise<snowflake.Connection> {
private async getPool(): Promise<ReturnType<typeof snowflake.createPool>> {
if (!this.pool) {
this.pool = snowflake.createPool(await this.resolveConnectionOptions(), {
min: 0,
max: this.resolved.maxSessions,
evictionRunIntervalMillis: 30_000,
acquireTimeoutMillis: 60_000,
});
}
return this.pool;
}
private async resolveConnectionOptions(): Promise<snowflake.ConnectionOptions> {
const patch = await this.sdkOptionsProvider?.resolve({
account: this.resolved.account,
connection: { ...this.resolved, driver: 'snowflake' },
@ -385,60 +448,27 @@ class SnowflakeSdkDriver implements KtxSnowflakeDriver {
if (patch?.close) {
this.closeSdkOptions.push(patch.close);
}
const baseConfig: snowflake.ConnectionOptions = {
const sessionSchema = this.resolved.schemas[0];
const baseConfig: ConnectionOptions = {
account: this.resolved.account,
username: this.resolved.username,
warehouse: this.resolved.warehouse,
database: this.resolved.database,
schema: this.resolved.schemas[0] ?? 'PUBLIC',
...(sessionSchema ? { schema: sessionSchema } : {}),
role: this.resolved.role,
clientSessionKeepAlive: true,
clientSessionKeepAliveHeartbeatFrequency: 900,
...patch?.sdkOptions,
};
const connectionConfig: snowflake.ConnectionOptions =
this.resolved.authMethod === 'rsa'
? { ...baseConfig, authenticator: 'SNOWFLAKE_JWT', privateKey: this.decryptPrivateKey() }
: { ...baseConfig, password: this.resolved.password };
const connection = snowflake.createConnection(connectionConfig);
return new Promise((resolveConnection, rejectConnection) => {
connection.connect((error, connected) => {
if (error) {
rejectConnection(error);
return;
}
const resolvedConnection = connected ?? connection;
this.setConnectionContext(resolvedConnection).then(
() => resolveConnection(resolvedConnection),
(contextError) => {
resolvedConnection.destroy(() => undefined);
rejectConnection(contextError);
},
);
});
});
}
private async setConnectionContext(connection: snowflake.Connection): Promise<void> {
if (this.resolved.role) {
await this.executeSnowflakeQuery(connection, `USE ROLE ${quoteSnowflakeIdentifier(this.resolved.role, 'role')}`);
}
await this.executeSnowflakeQuery(
connection,
`USE WAREHOUSE ${quoteSnowflakeIdentifier(this.resolved.warehouse, 'warehouse')}`,
);
await this.executeSnowflakeQuery(
connection,
`USE DATABASE ${quoteSnowflakeIdentifier(this.resolved.database, 'database')}`,
);
await this.executeSnowflakeQuery(
connection,
`USE SCHEMA ${quoteSnowflakeIdentifier(this.resolved.schemas[0] ?? 'PUBLIC', 'schema')}`,
);
return this.resolved.authMethod === 'rsa'
? { ...baseConfig, authenticator: 'SNOWFLAKE_JWT', privateKey: this.decryptPrivateKey() }
: { ...baseConfig, password: this.resolved.password };
}
private async executeSnowflakeQuery(
connection: snowflake.Connection,
connection: Connection,
sqlText: string,
binds?: snowflake.Binds,
binds?: Binds,
): Promise<{ headers: string[]; headerTypes?: string[]; rows: unknown[][] }> {
return new Promise((resolveQuery, rejectQuery) => {
connection.execute({
@ -461,18 +491,6 @@ class SnowflakeSdkDriver implements KtxSnowflakeDriver {
});
}
private destroyConnection(connection: snowflake.Connection): Promise<void> {
return new Promise((resolveDestroy, rejectDestroy) => {
connection.destroy((error) => {
if (error) {
rejectDestroy(error);
return;
}
resolveDestroy();
});
});
}
private decryptPrivateKey(): string {
if (!this.resolved.privateKey) {
throw new Error('Private key is required for RSA authentication');
@ -510,6 +528,9 @@ export class KtxSnowflakeScanConnector implements KtxScanConnector {
this.driverFactory = options.driverFactory ?? new DefaultSnowflakeDriverFactory();
this.now = options.now ?? (() => new Date());
this.id = `snowflake:${options.connectionId}`;
if (options.projectDir) {
configureSnowflakeSdkLogger(options.projectDir);
}
}
async testConnection(): Promise<{ success: boolean; error?: string }> {
@ -520,7 +541,11 @@ export class KtxSnowflakeScanConnector implements KtxScanConnector {
this.assertConnection(input.connectionId);
const tables: KtxSchemaTable[] = [];
for (const schemaName of this.resolved.schemas) {
const rawTables = await this.getDriver().getSchemaMetadata(schemaName);
const scopedNames = input.tableScope
? scopedTableNames(input.tableScope, { catalog: this.resolved.database, db: schemaName })
: null;
if (scopedNames && scopedNames.length === 0) continue;
const rawTables = await this.getDriver().getSchemaMetadata(schemaName, scopedNames);
const primaryKeys = await this.primaryKeys(rawTables.map((table) => table.name), schemaName);
tables.push(...rawTables.map((table) => this.toSchemaTable(table, primaryKeys)));
}
@ -653,32 +678,39 @@ export class KtxSnowflakeScanConnector implements KtxScanConnector {
}
private async primaryKeys(tableNames: string[], schemaName: string): Promise<Map<string, Set<string>>> {
if (tableNames.length === 0) {
return new Map();
}
const result = await this.getDriver().query(
`
SELECT tc.TABLE_NAME, kcu.COLUMN_NAME
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME
AND tc.TABLE_SCHEMA = kcu.TABLE_SCHEMA
AND tc.TABLE_CATALOG = kcu.TABLE_CATALOG
WHERE tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
AND tc.TABLE_SCHEMA = ?
AND tc.TABLE_CATALOG = ?
ORDER BY tc.TABLE_NAME, kcu.ORDINAL_POSITION
`,
[schemaName, this.resolved.database],
);
const grouped = new Map<string, Set<string>>();
for (const tableName of tableNames) {
grouped.set(tableName, new Set());
}
for (const row of result.rows) {
const tableName = String(row[0]);
const columnName = String(row[1]);
grouped.get(tableName)?.add(columnName);
if (tableNames.length === 0) {
return grouped;
}
const tableNamePlaceholders = tableNames.map(() => '?').join(', ');
try {
const result = await this.getDriver().query(
`
SELECT tc.TABLE_NAME, kcu.COLUMN_NAME
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME
AND tc.TABLE_SCHEMA = kcu.TABLE_SCHEMA
AND tc.TABLE_CATALOG = kcu.TABLE_CATALOG
WHERE tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
AND tc.TABLE_SCHEMA = ?
AND tc.TABLE_CATALOG = ?
AND tc.TABLE_NAME IN (${tableNamePlaceholders})
ORDER BY tc.TABLE_NAME, kcu.ORDINAL_POSITION
`,
[schemaName, this.resolved.database, ...tableNames],
);
for (const row of result.rows) {
const tableName = String(row[0]);
const columnName = String(row[1]);
grouped.get(tableName)?.add(columnName);
}
} catch {
// INFORMATION_SCHEMA.KEY_COLUMN_USAGE often isn't granted to read-only roles;
// continue with an empty PK map and let FK inference and profiling pick up the slack.
}
return grouped;
}
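
The Snowflake queries in this file scope by table name with one `?` placeholder per name and append the names to the bind list. The sketch below isolates that step. `buildInClause` and its return shape are hypothetical names introduced for illustration, and the column argument is assumed to be a trusted identifier because it is interpolated into the SQL text.

```typescript
// Hypothetical helper for illustration only; not part of this change.
function buildInClause(column: string, values: readonly string[]): { clause: string; binds: string[] } {
  if (values.length === 0) {
    // An empty IN () list is not valid SQL, so emit no clause at all.
    return { clause: '', binds: [] };
  }
  // One positional placeholder per value; the values themselves travel as binds.
  const placeholders = values.map(() => '?').join(', ');
  return { clause: `AND ${column} IN (${placeholders})`, binds: [...values] };
}

const pkScope = buildInClause('tc.TABLE_NAME', ['ORDERS', 'ORDER_SUMMARY']);
// Schema and database binds come first, matching the order of the ? markers in the query.
const pkBinds = ['MARTS', 'ANALYTICS', ...pkScope.binds];
```

The bind order has to follow the order in which the `?` markers appear in the statement, which is why the scope values are appended after the schema and database values.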

@ -0,0 +1,31 @@
import { KtxSnowflakeScanConnector, type KtxSnowflakeScanConnectorOptions } from './connector.js';
export type KtxSnowflakeHistoricSqlQueryClientOptions = KtxSnowflakeScanConnectorOptions;
export class KtxSnowflakeHistoricSqlQueryClient {
private readonly connectionId: string;
private readonly connector: KtxSnowflakeScanConnector;
constructor(options: KtxSnowflakeHistoricSqlQueryClientOptions) {
this.connectionId = options.connectionId;
this.connector = new KtxSnowflakeScanConnector(options);
}
async executeQuery(
sql: string,
): Promise<{ headers: string[]; rows: unknown[][]; totalRows: number }> {
const result = await this.connector.executeReadOnly(
{ connectionId: this.connectionId, sql },
{} as never,
);
return {
headers: result.headers,
rows: result.rows,
totalRows: result.totalRows,
};
}
async cleanup(): Promise<void> {
await this.connector.cleanup();
}
}

@ -1,4 +1,7 @@
import type { LiveDatabaseIntrospectionPort } from '../../context/ingest/adapters/live-database/types.js';
import type {
LiveDatabaseIntrospectionOptions,
LiveDatabaseIntrospectionPort,
} from '../../context/ingest/adapters/live-database/types.js';
import type { KtxProjectConnectionConfig } from '../../context/project/config.js';
import {
KtxSnowflakeScanConnector,
@ -9,6 +12,7 @@ import {
interface CreateSnowflakeLiveDatabaseIntrospectionOptions {
connections: Record<string, KtxProjectConnectionConfig>;
projectDir?: string;
driverFactory?: KtxSnowflakeDriverFactory;
sdkOptionsProvider?: KtxSnowflakeSdkOptionsProvider;
now?: () => Date;
@ -18,18 +22,23 @@ export function createSnowflakeLiveDatabaseIntrospection(
options: CreateSnowflakeLiveDatabaseIntrospectionOptions,
): LiveDatabaseIntrospectionPort {
return {
async extractSchema(connectionId: string) {
async extractSchema(connectionId: string, introspectionOptions?: LiveDatabaseIntrospectionOptions) {
const connection = options.connections[connectionId] as KtxSnowflakeConnectionConfig | undefined;
const connector = new KtxSnowflakeScanConnector({
connectionId,
connection,
...(options.projectDir ? { projectDir: options.projectDir } : {}),
driverFactory: options.driverFactory,
sdkOptionsProvider: options.sdkOptionsProvider,
now: options.now,
});
try {
return await connector.introspect(
{ connectionId, driver: 'snowflake' },
{
connectionId,
driver: 'snowflake',
...(introspectionOptions?.tableScope ? { tableScope: introspectionOptions.tableScope } : {}),
},
{ runId: `snowflake-${connectionId}` },
);
} finally {

@ -0,0 +1,57 @@
import { mkdtempSync, rmSync, statSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join, resolve } from 'node:path';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
const { configure } = vi.hoisted(() => ({ configure: vi.fn() }));
vi.mock('snowflake-sdk', () => ({
default: { configure },
}));
import {
configureSnowflakeSdkLogger,
resetSnowflakeSdkLoggerConfigurationForTests,
} from './sdk-logger.js';
describe('configureSnowflakeSdkLogger', () => {
let projectDir: string;
beforeEach(() => {
configure.mockReset();
resetSnowflakeSdkLoggerConfigurationForTests();
projectDir = mkdtempSync(join(tmpdir(), 'ktx-snowflake-logger-'));
});
afterEach(() => {
rmSync(projectDir, { recursive: true, force: true });
});
it('routes logs to <projectDir>/.ktx/logs/snowflake.log with console output disabled', () => {
const expected = resolve(projectDir, '.ktx', 'logs', 'snowflake.log');
const returned = configureSnowflakeSdkLogger(projectDir);
expect(returned).toBe(expected);
expect(configure).toHaveBeenCalledTimes(1);
expect(configure).toHaveBeenCalledWith({
logFilePath: expected,
additionalLogToConsole: false,
});
expect(statSync(resolve(projectDir, '.ktx', 'logs')).isDirectory()).toBe(true);
});
it('is idempotent for the same projectDir', () => {
configureSnowflakeSdkLogger(projectDir);
configureSnowflakeSdkLogger(projectDir);
expect(configure).toHaveBeenCalledTimes(1);
});
it('reconfigures when projectDir changes', () => {
const other = mkdtempSync(join(tmpdir(), 'ktx-snowflake-logger-other-'));
try {
configureSnowflakeSdkLogger(projectDir);
configureSnowflakeSdkLogger(other);
expect(configure).toHaveBeenCalledTimes(2);
} finally {
rmSync(other, { recursive: true, force: true });
}
});
});

@ -0,0 +1,32 @@
import { mkdirSync } from 'node:fs';
import { resolve } from 'node:path';
import snowflake from 'snowflake-sdk';
let configuredLogFilePath: string | null = null;
/**
* Redirects the snowflake-sdk logger to a project-scoped file so its JSON output
* does not bleed into the CLI's TTY (which would pollute the setup wizard and
* break the in-place progress repainter in `context-build-view.ts`).
*
* Idempotent per process: re-calling with the same projectDir is a no-op.
*/
export function configureSnowflakeSdkLogger(projectDir: string): string {
const logDir = resolve(projectDir, '.ktx', 'logs');
const logFilePath = resolve(logDir, 'snowflake.log');
if (configuredLogFilePath === logFilePath) {
return logFilePath;
}
mkdirSync(logDir, { recursive: true });
snowflake.configure({
logFilePath,
additionalLogToConsole: false,
});
configuredLogFilePath = logFilePath;
return logFilePath;
}
/** @internal */
export function resetSnowflakeSdkLoggerConfigurationForTests(): void {
configuredLogFilePath = null;
}

@ -6,6 +6,7 @@ import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { createSqliteLiveDatabaseIntrospection } from '../../connectors/sqlite/live-database-introspection.js';
import { isKtxSqliteConnectionConfig, KtxSqliteScanConnector, sqliteDatabasePathFromConfig } from '../../connectors/sqlite/connector.js';
import { tableRefSet } from '../../context/scan/table-ref.js';
describe('KtxSqliteScanConnector', () => {
let tempDir: string;
@ -196,6 +197,19 @@ describe('KtxSqliteScanConnector', () => {
).resolves.toBeNull();
});
it('limits introspection to tables in tableScope', async () => {
const connector = new KtxSqliteScanConnector({
connectionId: 'warehouse',
connection: { driver: 'sqlite', path: dbPath },
});
const scope = tableRefSet([{ catalog: null, db: null, name: 'orders' }]);
const snapshot = await connector.introspect(
{ connectionId: 'warehouse', driver: 'sqlite', tableScope: scope },
{ runId: 'scope-test' },
);
expect(snapshot.tables.map((table) => table.name)).toEqual(['orders']);
});
it('adapts native SQLite snapshots to live-database introspection for local ingest', async () => {
const introspection = createSqliteLiveDatabaseIntrospection({
projectDir: tempDir,

@ -6,6 +6,7 @@ import { fileURLToPath } from 'node:url';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import { normalizeQueryRows } from '../../context/connections/query-executor.js';
import { createKtxConnectorCapabilities, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaForeignKey, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableRef, type KtxTableSampleInput, type KtxTableSampleResult } from '../../context/scan/types.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
import { KtxSqliteDialect } from './dialect.js';
export interface KtxSqliteConnectionConfig {
@ -181,11 +182,16 @@ export class KtxSqliteScanConnector implements KtxScanConnector {
async introspect(input: KtxScanInput, _ctx: KtxScanContext): Promise<KtxSchemaSnapshot> {
this.assertConnection(input.connectionId);
const database = this.database();
const rawTables = database
.prepare(
`SELECT name, type FROM sqlite_master WHERE type IN ('table', 'view') AND name NOT LIKE 'sqlite_%' ORDER BY name`,
)
.all() as SqliteMasterRow[];
const scopedNames = input.tableScope ? scopedTableNames(input.tableScope, { catalog: null, db: null }) : null;
const scopeClause = scopedNames ? `AND name IN (${scopedNames.map(() => '?').join(', ')})` : '';
const rawTables =
scopedNames && scopedNames.length === 0
? []
: (database
.prepare(
`SELECT name, type FROM sqlite_master WHERE type IN ('table', 'view') AND name NOT LIKE 'sqlite_%' ${scopeClause} ORDER BY name`,
)
.all(...(scopedNames ?? [])) as SqliteMasterRow[]);
const tables = rawTables.map((table) => this.readTable(database, table));
const fileStats = existsSync(this.dbPath) ? statSync(this.dbPath) : null;
return {

@ -1,4 +1,7 @@
import type { LiveDatabaseIntrospectionPort } from '../../context/ingest/adapters/live-database/types.js';
import type {
LiveDatabaseIntrospectionOptions,
LiveDatabaseIntrospectionPort,
} from '../../context/ingest/adapters/live-database/types.js';
import type { KtxProjectConnectionConfig } from '../../context/project/config.js';
import { KtxSqliteScanConnector, type KtxSqliteConnectionConfig } from './connector.js';
@ -12,7 +15,7 @@ export function createSqliteLiveDatabaseIntrospection(
options: CreateSqliteLiveDatabaseIntrospectionOptions,
): LiveDatabaseIntrospectionPort {
return {
async extractSchema(connectionId: string) {
async extractSchema(connectionId: string, introspectionOptions?: LiveDatabaseIntrospectionOptions) {
const connection = options.connections[connectionId] as KtxSqliteConnectionConfig | undefined;
const connector = new KtxSqliteScanConnector({
connectionId,
@ -21,7 +24,14 @@ export function createSqliteLiveDatabaseIntrospection(
now: options.now,
});
try {
return await connector.introspect({ connectionId, driver: 'sqlite' }, { runId: `sqlite-${connectionId}` });
return await connector.introspect(
{
connectionId,
driver: 'sqlite',
...(introspectionOptions?.tableScope ? { tableScope: introspectionOptions.tableScope } : {}),
},
{ runId: `sqlite-${connectionId}` },
);
} finally {
await connector.cleanup();
}

@ -1,6 +1,7 @@
import { describe, expect, it, vi } from 'vitest';
import { createSqlServerLiveDatabaseIntrospection } from '../../connectors/sqlserver/live-database-introspection.js';
import { isKtxSqlServerConnectionConfig, KtxSqlServerScanConnector, sqlServerConnectionPoolConfigFromConfig, type KtxSqlServerPoolFactory, type KtxSqlServerQueryResult } from '../../connectors/sqlserver/connector.js';
import { tableRefSet } from '../../context/scan/table-ref.js';
function recordset<T extends Record<string, unknown>>(
rows: T[],
@ -290,6 +291,55 @@ describe('KtxSqlServerScanConnector', () => {
await connector.cleanup();
});
it('limits introspection to tables in tableScope', async () => {
const queries: string[] = [];
const inputs: Array<{ name: string; value: unknown }> = [];
const request = {
input: vi.fn((name: string, value: unknown) => {
inputs.push({ name, value });
return request;
}),
query: vi.fn(async (sql: string): Promise<KtxSqlServerQueryResult> => {
queries.push(sql);
if (sql.includes('INFORMATION_SCHEMA.TABLES')) {
return result([{ table_name: 'orders', table_type: 'BASE TABLE' }], ['table_name', 'table_type']);
}
if (sql.includes('INFORMATION_SCHEMA.COLUMNS')) {
return result(
[{ table_name: 'orders', column_name: 'id', data_type: 'int', is_nullable: 'NO' }],
['table_name', 'column_name', 'data_type', 'is_nullable'],
);
}
return result([], []);
}),
};
const poolFactory: KtxSqlServerPoolFactory = {
createPool: vi.fn(async () => ({
request: () => request,
close: vi.fn(async () => undefined),
})),
};
const connector = new KtxSqlServerScanConnector({
connectionId: 'warehouse',
connection: {
driver: 'sqlserver',
host: 'db.example.test',
database: 'analytics',
username: 'reader',
schema: 'dbo',
},
poolFactory,
});
const scope = tableRefSet([{ catalog: 'analytics', db: 'dbo', name: 'orders' }]);
const snapshot = await connector.introspect(
{ connectionId: 'warehouse', driver: 'sqlserver', tableScope: scope },
{ runId: 'scope-test' },
);
expect(snapshot.tables.map((table) => table.name)).toEqual(['orders']);
expect(queries.find((query) => query.includes('INFORMATION_SCHEMA.TABLES'))).toMatch(/TABLE_NAME IN \(@table_0\)/);
expect(inputs).toEqual(expect.arrayContaining([{ name: 'table_0', value: 'orders' }]));
});
it('adapts native SQL Server snapshots to live-database introspection for local ingest', async () => {
const introspection = createSqlServerLiveDatabaseIntrospection({
connections: {

@ -1,5 +1,6 @@
import { assertReadOnlySql } from '../../context/connections/read-only-sql.js';
import { createKtxConnectorCapabilities, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaColumn, type KtxSchemaForeignKey, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableListEntry, type KtxTableRef, type KtxTableSampleInput, type KtxTableSampleResult } from '../../context/scan/types.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
import { readFileSync } from 'node:fs';
import { homedir } from 'node:os';
import { resolve } from 'node:path';
@ -121,6 +122,20 @@ function sqlRecordset(
return recordset;
}
function tableScopeSql(
scopedNames: readonly string[] | null,
columnExpression: string,
): { clause: string; params: Record<string, unknown> } {
if (!scopedNames) return { clause: '', params: {} };
const params: Record<string, unknown> = {};
const placeholders = scopedNames.map((name, index) => {
const key = `table_${index}`;
params[key] = name;
return `@${key}`;
});
return { clause: `AND ${columnExpression} IN (${placeholders.join(', ')})`, params };
}
class DefaultSqlServerPoolFactory implements KtxSqlServerPoolFactory {
async createPool(config: KtxSqlServerPoolConfig): Promise<KtxSqlServerPool> {
const pool = await new sql.ConnectionPool(config as sql.config).connect();
@ -314,7 +329,10 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
this.assertConnection(input.connectionId);
const tables: KtxSchemaTable[] = [];
for (const schemaName of this.schemas) {
tables.push(...(await this.introspectSchema(schemaName)));
const scopedNames = input.tableScope
? scopedTableNames(input.tableScope, { catalog: this.poolConfig.database, db: schemaName })
: null;
tables.push(...(await this.introspectSchema(schemaName, scopedNames)));
}
return {
connectionId: this.connectionId,
@ -461,16 +479,19 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
}
}
private async introspectSchema(schemaName: string): Promise<KtxSchemaTable[]> {
private async introspectSchema(schemaName: string, scopedNames: readonly string[] | null): Promise<KtxSchemaTable[]> {
if (scopedNames && scopedNames.length === 0) return [];
const tableScope = tableScopeSql(scopedNames, 'TABLE_NAME');
const tables = await this.queryRaw<{ table_name: string; table_type: string }>(
`
SELECT TABLE_NAME AS table_name, TABLE_TYPE AS table_type
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = @schemaName
AND TABLE_TYPE IN ('BASE TABLE', 'VIEW')
${tableScope.clause}
ORDER BY TABLE_NAME
`,
{ schemaName },
{ schemaName, ...tableScope.params },
);
const columns = await this.queryRaw<{
table_name: string;
@ -482,15 +503,16 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
SELECT TABLE_NAME AS table_name, COLUMN_NAME AS column_name, DATA_TYPE AS data_type, IS_NULLABLE AS is_nullable
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = @schemaName
${tableScope.clause}
ORDER BY TABLE_NAME, ORDINAL_POSITION
`,
{ schemaName },
{ schemaName, ...tableScope.params },
);
const tableComments = await this.tableComments(schemaName);
const columnComments = await this.columnComments(schemaName);
const primaryKeys = await this.primaryKeys(schemaName);
const foreignKeys = await this.foreignKeys(schemaName);
const rowCounts = await this.rowCounts(schemaName);
const tableComments = await this.tableComments(schemaName, scopedNames);
const columnComments = await this.columnComments(schemaName, scopedNames);
const primaryKeys = await this.primaryKeys(schemaName, scopedNames);
const foreignKeys = await this.foreignKeys(schemaName, scopedNames);
const rowCounts = await this.rowCounts(schemaName, scopedNames);
const columnsByTable = groupByTable(columns);
const foreignKeysByTable = groupByTable(foreignKeys);
@ -508,7 +530,8 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
}));
}
private async tableComments(schemaName: string): Promise<Map<string, string>> {
private async tableComments(schemaName: string, scopedNames: readonly string[] | null): Promise<Map<string, string>> {
const tableScope = tableScopeSql(scopedNames, 'o.name');
const rows = await this.queryRaw<{ table_name: string; table_comment: string }>(
`
SELECT o.name AS table_name, CAST(ep.value AS NVARCHAR(MAX)) AS table_comment
@ -519,13 +542,15 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
AND ep.name = 'MS_Description'
WHERE s.name = @schemaName
AND o.type IN ('U', 'V')
${tableScope.clause}
`,
{ schemaName },
{ schemaName, ...tableScope.params },
);
return new Map(rows.map((row) => [row.table_name, row.table_comment]));
}
private async columnComments(schemaName: string): Promise<Map<string, string>> {
private async columnComments(schemaName: string, scopedNames: readonly string[] | null): Promise<Map<string, string>> {
const tableScope = tableScopeSql(scopedNames, 'o.name');
const rows = await this.queryRaw<{ table_name: string; column_name: string; column_comment: string }>(
`
SELECT o.name AS table_name, c.name AS column_name, CAST(ep.value AS NVARCHAR(MAX)) AS column_comment
@ -537,13 +562,18 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
AND ep.name = 'MS_Description'
WHERE s.name = @schemaName
AND o.type IN ('U', 'V')
${tableScope.clause}
`,
{ schemaName },
{ schemaName, ...tableScope.params },
);
return new Map(rows.map((row) => [`${row.table_name}.${row.column_name}`, row.column_comment]));
}
private async primaryKeys(schemaName: string): Promise<Map<string, Set<string>>> {
private async primaryKeys(
schemaName: string,
scopedNames: readonly string[] | null,
): Promise<Map<string, Set<string>>> {
const tableScope = tableScopeSql(scopedNames, 'tc.TABLE_NAME');
const rows = await this.queryRaw<{ table_name: string; column_name: string }>(
`
SELECT tc.TABLE_NAME AS table_name, kcu.COLUMN_NAME AS column_name
@ -553,9 +583,10 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
AND tc.TABLE_SCHEMA = kcu.TABLE_SCHEMA
WHERE tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
AND tc.TABLE_SCHEMA = @schemaName
${tableScope.clause}
ORDER BY tc.TABLE_NAME, kcu.ORDINAL_POSITION
`,
{ schemaName },
{ schemaName, ...tableScope.params },
);
const grouped = new Map<string, Set<string>>();
for (const row of rows) {
@ -566,7 +597,10 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
return grouped;
}
private async foreignKeys(schemaName: string): Promise<
private async foreignKeys(
schemaName: string,
scopedNames: readonly string[] | null,
): Promise<
Array<{
table_name: string;
column_name: string;
@ -576,6 +610,7 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
constraint_name: string;
}>
> {
const tableScope = tableScopeSql(scopedNames, 'fk.TABLE_NAME');
return this.queryRaw(
`
SELECT
@ -596,13 +631,15 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
AND pk.CONSTRAINT_NAME = rc.UNIQUE_CONSTRAINT_NAME
AND pk.ORDINAL_POSITION = fk.ORDINAL_POSITION
WHERE fk.TABLE_SCHEMA = @schemaName
${tableScope.clause}
ORDER BY fk.TABLE_NAME, fk.COLUMN_NAME
`,
{ schemaName },
{ schemaName, ...tableScope.params },
);
}
private async rowCounts(schemaName: string): Promise<Map<string, number>> {
private async rowCounts(schemaName: string, scopedNames: readonly string[] | null): Promise<Map<string, number>> {
const tableScope = tableScopeSql(scopedNames, 't.name');
const rows = await this.queryRaw<{ table_name: string; row_count: unknown }>(
`
SELECT t.name AS table_name, SUM(p.rows) AS row_count
@ -611,9 +648,10 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
WHERE s.name = @schemaName
AND p.index_id IN (0, 1)
${tableScope.clause}
GROUP BY t.name
`,
{ schemaName },
{ schemaName, ...tableScope.params },
);
return new Map(rows.map((row) => [row.table_name, firstNumber(row.row_count) ?? 0]));
}
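
To make the scoping change in this file easier to review, here is a small standalone sketch of what `tableScopeSql` produces. The function body is copied from the hunk above; the sample table names and the `unscoped`/`scoped` variables are illustrative assumptions only.

```typescript
// Copied from the diff: builds an "AND <column> IN (...)" fragment with named parameters.
function tableScopeSql(
  scopedNames: readonly string[] | null,
  columnExpression: string,
): { clause: string; params: Record<string, unknown> } {
  // null means "no scope": emit no extra filter and no parameters.
  if (!scopedNames) return { clause: '', params: {} };
  const params: Record<string, unknown> = {};
  const placeholders = scopedNames.map((name, index) => {
    const key = `table_${index}`;
    params[key] = name; // the value is bound as a parameter, never inlined into the SQL
    return `@${key}`;
  });
  return { clause: `AND ${columnExpression} IN (${placeholders.join(', ')})`, params };
}

// Illustrative inputs (not taken from the PR).
const unscoped = tableScopeSql(null, 'TABLE_NAME');
const scoped = tableScopeSql(['orders', 'customers'], 'tc.TABLE_NAME');

console.log(JSON.stringify(unscoped.clause)); // ""
console.log(scoped.clause); // AND tc.TABLE_NAME IN (@table_0, @table_1)
console.log(scoped.params); // { table_0: 'orders', table_1: 'customers' }
```

An empty array would yield `IN ()`, which SQL Server rejects. In the diff, `introspectSchema` returns early when `scopedNames` is empty, and that early return sits before every helper call, so the helper never sees that case.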

@ -1,4 +1,7 @@
import type { LiveDatabaseIntrospectionPort } from '../../context/ingest/adapters/live-database/types.js';
import type {
LiveDatabaseIntrospectionOptions,
LiveDatabaseIntrospectionPort,
} from '../../context/ingest/adapters/live-database/types.js';
import type { KtxProjectConnectionConfig } from '../../context/project/config.js';
import {
KtxSqlServerScanConnector,
@ -18,7 +21,7 @@ export function createSqlServerLiveDatabaseIntrospection(
options: CreateSqlServerLiveDatabaseIntrospectionOptions,
): LiveDatabaseIntrospectionPort {
return {
async extractSchema(connectionId: string) {
async extractSchema(connectionId: string, introspectionOptions?: LiveDatabaseIntrospectionOptions) {
const connection = options.connections[connectionId] as KtxSqlServerConnectionConfig | undefined;
const connector = new KtxSqlServerScanConnector({
connectionId,
@ -29,7 +32,11 @@ export function createSqlServerLiveDatabaseIntrospection(
});
try {
return await connector.introspect(
{ connectionId, driver: 'sqlserver' },
{
connectionId,
driver: 'sqlserver',
...(introspectionOptions?.tableScope ? { tableScope: introspectionOptions.tableScope } : {}),
},
{ runId: `sqlserver-${connectionId}` },
);
} finally {