fix(snowflake): unblock multi-schema ingest and relationship discovery (#204)

* feat(setup): drop redundant Snowflake schema prompt; fall back to free-text on listSchemas failure

Snowflake setup previously asked for a single schema as free text, then
ran a multiselect against the discovered schemas — two schema questions
back-to-back, with the first being only a session bootstrap. The SDK's
`schema` is optional, so the bootstrap step is unnecessary.

- Remove the free-text Snowflake schema prompt; only pass `schema` to
  snowflake-sdk when one is configured.
- When `listSchemas()` fails (e.g. role lacks SHOW SCHEMAS), prompt the
  user for a comma-separated list, persist it as `schema_names`, and use
  it as both the table-list filter and the multiselect default. Applies
  to every driver with a scope-discovery spec, not just Snowflake.
- Update docs to lead with `schema_names`; keep `schema_name` as a
  documented single-schema shorthand.
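
A minimal sketch of the fallback described above. The helper names `parseSchemaList` and `buildSessionOptions` and the `SessionOptions` shape are hypothetical, not taken from the diff. The comma-separated answer is normalized into a list suitable for `schema_names`, and `schema` is only included when one is configured:

```typescript
// Hypothetical helpers for illustration; names are not from the codebase.

/** Split a comma-separated answer into trimmed, de-duplicated schema names. */
function parseSchemaList(answer: string): string[] {
  const seen = new Set<string>();
  for (const part of answer.split(',')) {
    const name = part.trim();
    if (name.length > 0) seen.add(name);
  }
  return Array.from(seen);
}

interface SessionOptions {
  account: string;
  database: string;
  schema?: string;
}

/** Include `schema` only when one is configured; it is optional for the session. */
function buildSessionOptions(account: string, database: string, schema?: string): SessionOptions {
  const options: SessionOptions = { account, database };
  if (schema) options.schema = schema;
  return options;
}

// Example: a free-text answer entered after listSchemas() failed.
const schemaNames = parseSchemaList(' PUBLIC, SALES ,,PUBLIC ');
const sessionOptions = buildSessionOptions('xy12345', 'PROD');
console.log(schemaNames, sessionOptions);
```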

* fix(snowflake): keep introspecting when primary-key discovery is denied

The PK query joins INFORMATION_SCHEMA.TABLE_CONSTRAINTS and
INFORMATION_SCHEMA.KEY_COLUMN_USAGE, which require grants the
connection role may not have. Previously a 'SQL compilation error:
Object ANALYTICS.INFORMATION_SCHEMA.KEY_COLUMN_USAGE does not exist
or not authorized' aborted the entire introspect — schemas, columns,
and row counts were all discarded over a missing nice-to-have.

Wrap the constraint query in try/catch, log a one-line warning per
schema, and return an empty PK map. Columns end up with
primaryKey=false; relationship inference still has FK and profiling
to fall back on.
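
The soft-fail pattern can be sketched roughly as follows. `QueryRunner` and `loadPrimaryKeys` are illustrative names, and the SQL text is a placeholder rather than the real constraint query:

```typescript
// Sketch only: QueryRunner and loadPrimaryKeys are assumed names, not the connector's API.
type PrimaryKeyRow = { TABLE_NAME: string; COLUMN_NAME: string };
type QueryRunner = (sql: string) => Promise<PrimaryKeyRow[]>;

async function loadPrimaryKeys(
  runQuery: QueryRunner,
  schema: string,
  warn: (message: string) => void = console.warn,
): Promise<Map<string, Set<string>>> {
  const primaryKeys = new Map<string, Set<string>>();
  try {
    // Placeholder for the TABLE_CONSTRAINTS / KEY_COLUMN_USAGE join.
    const rows = await runQuery(`/* primary-key query for ${schema} */`);
    for (const row of rows) {
      const columns = primaryKeys.get(row.TABLE_NAME) ?? new Set<string>();
      columns.add(row.COLUMN_NAME);
      primaryKeys.set(row.TABLE_NAME, columns);
    }
  } catch (error) {
    // Missing grants are not fatal: warn once for this schema and continue without PKs.
    const reason = error instanceof Error ? error.message : String(error);
    warn(`primary-key discovery skipped for schema ${schema}: ${reason}`);
    return new Map<string, Set<string>>();
  }
  return primaryKeys;
}
```

With an empty map, every column is simply reported with `primaryKey=false`, and the rest of the snapshot is unaffected.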

* fix(scan): unblock relationship discovery on Snowflake

Two adjacent bugs prevented the scan's relationship pipeline from producing
any joins on a Snowflake warehouse:

- relationship-profiling.ts fell through to a default `GROUP_CONCAT` branch
  for unknown drivers. Snowflake has no GROUP_CONCAT, so every per-table
  profile query failed with "Unknown function GROUP_CONCAT". Add an explicit
  Snowflake branch that uses LISTAGG with a literal '\x1f' delimiter
  (Snowflake requires the delimiter to be a constant, so CHR(31) is rejected).
- description-generation.ts destructured `connector.sampleTable` and
  `connector.sampleColumn` into bare locals, losing the `this` binding when
  the class-method connectors (Snowflake, Postgres, MySQL) were invoked.
  Every sample call threw "Cannot read properties of undefined (reading
  'assertConnection')" and degraded LLM descriptions to metadata-only
  prompts. Call the methods through the connector instead.
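
A rough sketch of the dialect dispatch from the first bullet. The function name is hypothetical and the default branch is only a stand-in, since the exact SQL of the existing `GROUP_CONCAT` path is not shown in this commit:

```typescript
// Illustrative only: delimitedAggregateSql is an assumed name.
function delimitedAggregateSql(driver: string, columnSql: string): string {
  if (driver === 'snowflake') {
    // The delimiter is written as a string constant in the SQL text;
    // per the commit message, a CHR(31) expression is rejected here.
    return `LISTAGG(${columnSql}, '\\x1f')`;
  }
  // Stand-in for the pre-existing default branch.
  return `GROUP_CONCAT(${columnSql})`;
}

console.log(delimitedAggregateSql('snowflake', '"CUSTOMER_ID"'));
```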

Without these, even after the primary-key probe is allowed to fail softly,
the scan ends up with 0 validated relationships and an empty `joins:` block
in every shard YAML.
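
The `this`-binding failure from the second bullet can be reproduced with a small stand-alone class. `ExampleConnector` is a made-up stand-in for the class-method connectors, not the real implementation:

```typescript
// ExampleConnector is hypothetical; it only mimics a method that relies on `this`.
class ExampleConnector {
  constructor(private readonly connectionId: string) {}

  private assertConnection(id: string): void {
    if (id !== this.connectionId) throw new Error(`unknown connection ${id}`);
  }

  sampleTable(id: string): string {
    this.assertConnection(id); // requires `this` to be the connector instance
    return `sample for ${id}`;
  }
}

const connector = new ExampleConnector('warehouse');

// Broken: destructuring detaches the method, so `this` is lost when it is called.
const { sampleTable } = connector;
let detachedError = '';
try {
  sampleTable('warehouse');
} catch (error) {
  detachedError = error instanceof Error ? error.message : String(error);
}

// Fixed: call through the instance so `this` stays bound.
const sample = connector.sampleTable('warehouse');
console.log({ detachedError, sample });
```

Calling through the instance avoids the need for `.bind()` and keeps working if a connector is later implemented as a plain object.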

* test(scan): cover table-ref helpers

* feat(scan): plumb tableScope through live-database introspection port

* feat(scan): apply tableScope during metadata fetch

* feat(scan): enforce table scope at fetch boundary

* feat(scan): pool Snowflake sessions and batch enrichment for faster ingest (#206)

* feat(cli): add RSA key-pair auth option to Snowflake setup wizard

Extends the interactive Snowflake setup flow with an authentication-method
prompt (password vs RSA/JWT key-pair). The RSA branch collects a private-key
path (env/file/absolute) and an optional passphrase; the resulting connection
config records `authMethod: 'rsa'` with `privateKey` and `passphrase` instead
of `password`.
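
As a sketch of the resulting config shape: the `AuthAnswers` type, the `authFields` helper, and the `env:` reference value below are assumptions for illustration. Only `authMethod: 'rsa'`, `privateKey`, `passphrase`, and `password` come from the description above:

```typescript
// Hypothetical wizard answers; the real prompt types may differ.
type AuthAnswers =
  | { method: 'password'; password: string }
  | { method: 'rsa'; privateKey: string; passphrase?: string };

/** Map wizard answers to the auth-related fields of the connection config. */
function authFields(answers: AuthAnswers): Record<string, string> {
  if (answers.method === 'rsa') {
    const fields: Record<string, string> = {
      authMethod: 'rsa',
      privateKey: answers.privateKey,
    };
    // The passphrase is optional and only recorded when provided.
    if (answers.passphrase) fields.passphrase = answers.passphrase;
    return fields;
  }
  return { password: answers.password };
}

// Example with an env-style reference (the variable name is made up).
console.log(authFields({ method: 'rsa', privateKey: 'env:SNOWFLAKE_PRIVATE_KEY' }));
```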

* feat(scan): pool Snowflake sessions

* fix(scan): reuse structural snapshots and cleanup connectors

* feat(scan): parallelize relationship profiling

* feat(scan): batch table description generation

* docs: document Snowflake ingest concurrency knobs

* fix(scan): close Snowflake ingest perf verification gaps

* fix(scan): keep batched description failure bounded

* feat(scan): dispatch query-history probes by connection driver

Extract historic-sql dialect resolution into a shared helper so the
status-project readiness check and the local ingest factory agree on
which connections enable query history and which probe to run. The
status command now picks the postgres/snowflake/bigquery probe based on
the connection's driver instead of always reporting against postgres,
which previously caused snowflake connections with queryHistory.enabled
to surface a misleading "driver is snowflake" failure.
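
A minimal sketch of what such a shared resolver might look like. `ConnectionLike` and `resolveQueryHistoryDialect` are assumed names, and the real helper may consult more fields than `queryHistory.enabled`:

```typescript
// Hypothetical shapes; only the three driver names come from the commit message.
type ProbeDialect = 'postgres' | 'snowflake' | 'bigquery';

interface ConnectionLike {
  driver: string;
  queryHistory?: { enabled?: boolean };
}

const SUPPORTED_DIALECTS: readonly ProbeDialect[] = ['postgres', 'snowflake', 'bigquery'];

/** Return the probe dialect for a connection, or null when query history does not apply. */
function resolveQueryHistoryDialect(connection: ConnectionLike): ProbeDialect | null {
  if (!connection.queryHistory || !connection.queryHistory.enabled) return null;
  for (const dialect of SUPPORTED_DIALECTS) {
    if (dialect === connection.driver) return dialect;
  }
  return null;
}

console.log(resolveQueryHistoryDialect({ driver: 'snowflake', queryHistory: { enabled: true } }));
```

Because both the status check and the ingest factory would call the same function, they cannot disagree about which probe applies to a given driver.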

Also drops a noisy console.warn from Snowflake primary-key discovery —
INFORMATION_SCHEMA.KEY_COLUMN_USAGE is commonly ungranted for read-only
roles and the FK + profiling paths handle the empty PK map already.

* fix(llm): allow StructuredOutput tool and raise maxTurns for generateObject

The Claude Code agent SDK announces an internal pseudo-tool named
StructuredOutput in the system/init message whenever outputFormat is set
to { type: 'json_schema' }. The runtime's isolation check built its
allowedToolIds set only from MCP tool ids and treated StructuredOutput
as an unexpected host-injected tool, so every generateObject call threw
"Claude Code runtime isolation failed: tools=StructuredOutput ..." and
the table-descriptions and relationship-LLM-proposal enrichment stages
recorded null output across the board.

Whitelist StructuredOutput specifically in generateObject's
allowedToolIds — the check also enforces missing_tools symmetry, so
generateText and runAgentLoop, which do not see StructuredOutput, must
not require it.
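
The symmetry constraint can be illustrated with a small sketch. `buildAllowedToolIds`, `checkIsolation`, and the example MCP tool id are hypothetical and do not mirror the runtime's actual functions:

```typescript
// Sketch under assumptions: names and shapes here are illustrative only.
const STRUCTURED_OUTPUT_TOOL = 'StructuredOutput';

function buildAllowedToolIds(mcpToolIds: readonly string[], expectsStructuredOutput: boolean): Set<string> {
  const allowed = new Set<string>(mcpToolIds);
  // Only schema-bound calls see the pseudo-tool, so only they may allow it.
  if (expectsStructuredOutput) allowed.add(STRUCTURED_OUTPUT_TOOL);
  return allowed;
}

/** Report announced tools outside the allowed set, and allowed tools that were not announced. */
function checkIsolation(
  announced: readonly string[],
  allowed: ReadonlySet<string>,
): { unexpected: string[]; missing: string[] } {
  const announcedSet = new Set<string>(announced);
  const unexpected = announced.filter((id) => !allowed.has(id));
  const missing = Array.from(allowed).filter((id) => !announcedSet.has(id));
  return { unexpected, missing };
}

// generateObject-style call: the init message announces StructuredOutput.
const objectCheck = checkIsolation(
  ['mcp__ktx__query', STRUCTURED_OUTPUT_TOOL],
  buildAllowedToolIds(['mcp__ktx__query'], true),
);

// generateText-style call: allowing StructuredOutput here would flag it as missing.
const textCheck = checkIsolation(['mcp__ktx__query'], buildAllowedToolIds(['mcp__ktx__query'], true));
console.log(objectCheck, textCheck);
```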

generateObject also ran with maxTurns: 1, which the model intermittently
breached when it emitted thinking text before the structured response.
Raised to 5 to give the schema-bound call enough headroom without
allowing unbounded loops. The existing tests now exercise the path with
an init message that announces StructuredOutput so the regression cannot
slip back in.

* chore(scripts): add ktx-reset.sh project-cleanup helper

Convenience script for repeatable ingest testing: takes a project
directory and prunes everything except ktx.yaml and .ktx/secrets/, so
the next ktx setup or ktx ingest run starts from a known-clean state.
Andrey Avtomonov 2026-05-23 10:41:30 +02:00 committed by GitHub
parent b0dd13ce7c
commit 394a985d2a
GPG key ID: B5690EEEBB952194
72 changed files with 3508 additions and 655 deletions

@@ -157,6 +157,12 @@ connections:
 dataset_ids: [analytics, mart]
 ```

+For Snowflake connections, set `maxSessions` when deep ingest needs more or
+fewer concurrent warehouse sessions. The default is `4`. This caps all
+concurrent Snowflake SQL work for that connector instance, including schema
+introspection, table sampling, relationship profiling, relationship
+validation, and read-only SQL execution.
+
 For Postgres, BigQuery, and Snowflake, `historicSql` and `context.queryHistory`
 toggle query-history ingest. The shape is connector-specific; the setup wizard
 writes these fields when you pass `--enable-query-history`.
@@ -483,6 +489,7 @@ scan:
 maxLlmTablesPerBatch: 40
 maxCandidatesPerColumn: 25
 profileSampleRows: 10000
+profileConcurrency: 4
 validationConcurrency: 4
 validationBudget: all
 ```
@@ -510,6 +517,7 @@ the manifest.
 | `relationships.maxLlmTablesPerBatch` | `int > 0` | `40` | Max tables included in a single LLM relationship-proposal batch. |
 | `relationships.maxCandidatesPerColumn` | `int > 0` | `25` | Max join partners considered per column. |
 | `relationships.profileSampleRows` | `int > 0` | `10000` | Rows sampled per table when profiling values for relationship inference. |
+| `relationships.profileConcurrency` | `int > 0` | `4` | Parallel relationship-profile queries against the database. For Snowflake, effective database concurrency is also bounded by the connection's `maxSessions`. |
 | `relationships.validationConcurrency` | `int > 0` | `4` | Parallel relationship validation queries against the database. |
 | `relationships.validationBudget` | `all` \| `int ≥ 0` | runtime default | Cap on validation queries per scan. `all` means unlimited. |

@@ -129,20 +129,18 @@ connections:
 account: xy12345
 warehouse: ANALYTICS_WH
 database: PROD
-schema_name: PUBLIC
+schema_names:
+  - PUBLIC
+  - SALES
+  - MARKETING
 username: KTX_SERVICE
 password: env:SNOWFLAKE_PASSWORD
 role: ANALYST
 ```

-For multiple schemas:
-
-```yaml
-schema_names:
-  - PUBLIC
-  - ANALYTICS
-  - STAGING
-```
+`ktx setup` discovers schemas after the connection is verified and writes the
+selected list to `schema_names`. You can also set this field manually. For a
+single schema, `schema_name: PUBLIC` is accepted as an equivalent shorthand.

 ### Authentication

@@ -1,6 +1,7 @@
 import { describe, expect, it, vi } from 'vitest';
 import { bigQueryConnectionConfigFromConfig, isKtxBigQueryConnectionConfig, type KtxBigQueryClient, KtxBigQueryScanConnector, type KtxBigQueryClientFactory, type KtxBigQueryDataset, type KtxBigQueryQueryJob, type KtxBigQueryTableRef } from '../../connectors/bigquery/connector.js';
 import { createBigQueryLiveDatabaseIntrospection } from '../../connectors/bigquery/live-database-introspection.js';
+import { tableRefSet } from '../../context/scan/table-ref.js';

 function fakeClientFactory(): KtxBigQueryClientFactory {
   const queryResults = vi.fn(async (): ReturnType<KtxBigQueryQueryJob['getQueryResults']> => [
@@ -234,6 +235,59 @@ describe('KtxBigQueryScanConnector', () => {
     await connector.cleanup();
   });

+  it('limits introspection to tables in tableScope', async () => {
+    const ordersGet = vi.fn(async (): ReturnType<KtxBigQueryTableRef['get']> => [
+      {
+        metadata: {
+          type: 'TABLE',
+          numRows: '12',
+          schema: { fields: [{ name: 'id', type: 'INT64', mode: 'REQUIRED' }] },
+        },
+      },
+    ]);
+    const skippedGet = vi.fn(async (): ReturnType<KtxBigQueryTableRef['get']> => [
+      { metadata: { type: 'TABLE', numRows: '1', schema: { fields: [] } } },
+    ]);
+    const clientFactory: KtxBigQueryClientFactory = {
+      createClient: vi.fn(() => ({
+        getDatasets: vi.fn(async (): ReturnType<KtxBigQueryClient['getDatasets']> => [[{ id: 'analytics' }]]),
+        dataset: vi.fn(
+          (): KtxBigQueryDataset => ({
+            get: vi.fn(async () => [{ id: 'analytics' }]),
+            getTables: vi.fn(async (): ReturnType<KtxBigQueryDataset['getTables']> => [
+              [
+                { id: 'orders', get: ordersGet },
+                { id: 'customers', get: skippedGet },
+              ],
+            ]),
+          }),
+        ),
+        createQueryJob: vi.fn(async (): ReturnType<KtxBigQueryClient['createQueryJob']> => [
+          {
+            getQueryResults: async (): ReturnType<KtxBigQueryQueryJob['getQueryResults']> => [
+              [],
+              undefined,
+              { schema: { fields: [{ name: 'table_name', type: 'STRING' }, { name: 'column_name', type: 'STRING' }] } },
+            ],
+          },
+        ]),
+      })),
+    };
+    const connector = new KtxBigQueryScanConnector({
+      connectionId: 'warehouse',
+      connection,
+      clientFactory,
+    });
+    const scope = tableRefSet([{ catalog: 'project-1', db: 'analytics', name: 'orders' }]);
+    const snapshot = await connector.introspect(
+      { connectionId: 'warehouse', driver: 'bigquery', tableScope: scope },
+      { runId: 'scope-test' },
+    );
+    expect(snapshot.tables.map((table) => table.name)).toEqual(['orders']);
+    expect(ordersGet).toHaveBeenCalledTimes(1);
+    expect(skippedGet).not.toHaveBeenCalled();
+  });
+
   it('constructs for discovery without dataset scope and lists tables through one region information schema query', async () => {
     const createQueryJob = vi.fn(
       async (

@@ -2,6 +2,7 @@ import { BigQuery, type TableField } from '@google-cloud/bigquery';
 import { normalizeBigQueryProjectId, normalizeBigQueryRegion } from '../../context/connections/bigquery-identifiers.js';
 import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
 import { createKtxConnectorCapabilities, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaColumn, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableListEntry, type KtxTableRef, type KtxTableSampleInput, type KtxTableSampleResult } from '../../context/scan/types.js';
+import { scopedTableNames } from '../../context/scan/table-ref.js';
 import { readFileSync } from 'node:fs';
 import { homedir } from 'node:os';
 import { resolve } from 'node:path';
@@ -289,7 +290,10 @@ export class KtxBigQueryScanConnector implements KtxScanConnector {
     const tables: KtxSchemaTable[] = [];
     const datasetIds = this.requireDatasetIdsForScan();
     for (const datasetId of datasetIds) {
-      tables.push(...(await this.introspectDataset(datasetId)));
+      const scopedNames = input.tableScope
+        ? scopedTableNames(input.tableScope, { catalog: this.resolved.projectId, db: datasetId })
+        : null;
+      tables.push(...(await this.introspectDataset(datasetId, scopedNames)));
     }
     return {
       connectionId: this.connectionId,
@@ -362,7 +366,7 @@ export class KtxBigQueryScanConnector implements KtxScanConnector {
     if (!datasetId) {
       return 0;
     }
-    const tables = await this.introspectDataset(datasetId);
+    const tables = await this.introspectDataset(datasetId, null);
     return tables.find((table) => table.name === tableName)?.estimatedRows ?? 0;
   }

@@ -463,12 +467,15 @@ export class KtxBigQueryScanConnector implements KtxScanConnector {
     return firstNumber(rows[0]?.[header]);
   }

-  private async introspectDataset(datasetId: string): Promise<KtxSchemaTable[]> {
+  private async introspectDataset(datasetId: string, scopedNames: readonly string[] | null): Promise<KtxSchemaTable[]> {
+    if (scopedNames && scopedNames.length === 0) return [];
     const dataset = this.getClient().dataset(datasetId);
     const [tableRefs] = await dataset.getTables();
+    const scopeSet = scopedNames ? new Set(scopedNames) : null;
+    const filteredTableRefs = scopeSet ? tableRefs.filter((tableRef) => scopeSet.has(tableRef.id ?? '')) : tableRefs;
     const primaryKeys = await this.primaryKeys(datasetId);
     const tables: KtxSchemaTable[] = [];
-    for (const tableRef of tableRefs) {
+    for (const tableRef of filteredTableRefs) {
       const tableName = tableRef.id || '';
       const [table] = await tableRef.get();
       const fields = table.metadata.schema?.fields ?? [];

@@ -1,4 +1,7 @@
-import type { LiveDatabaseIntrospectionPort } from '../../context/ingest/adapters/live-database/types.js';
+import type {
+  LiveDatabaseIntrospectionOptions,
+  LiveDatabaseIntrospectionPort,
+} from '../../context/ingest/adapters/live-database/types.js';
 import type { KtxProjectConnectionConfig } from '../../context/project/config.js';
 import {
   KtxBigQueryScanConnector,
@@ -16,7 +19,7 @@ export function createBigQueryLiveDatabaseIntrospection(
   options: CreateBigQueryLiveDatabaseIntrospectionOptions,
 ): LiveDatabaseIntrospectionPort {
   return {
-    async extractSchema(connectionId: string) {
+    async extractSchema(connectionId: string, introspectionOptions?: LiveDatabaseIntrospectionOptions) {
       const connection = options.connections[connectionId] as KtxBigQueryConnectionConfig | undefined;
       const connector = new KtxBigQueryScanConnector({
         connectionId,
@@ -25,7 +28,14 @@ export function createBigQueryLiveDatabaseIntrospection(
         now: options.now,
       });
       try {
-        return await connector.introspect({ connectionId, driver: 'bigquery' }, { runId: `bigquery-${connectionId}` });
+        return await connector.introspect(
+          {
+            connectionId,
+            driver: 'bigquery',
+            ...(introspectionOptions?.tableScope ? { tableScope: introspectionOptions.tableScope } : {}),
+          },
+          { runId: `bigquery-${connectionId}` },
+        );
       } finally {
         await connector.cleanup();
       }

@@ -1,6 +1,7 @@
 import { describe, expect, it, vi } from 'vitest';
 import { clickHouseClientConfigFromConfig, isKtxClickHouseConnectionConfig, KtxClickHouseScanConnector, type KtxClickHouseClientFactory } from '../../connectors/clickhouse/connector.js';
 import { createClickHouseLiveDatabaseIntrospection } from '../../connectors/clickhouse/live-database-introspection.js';
+import { tableRefSet } from '../../context/scan/table-ref.js';

 function result<T>(payload: T) {
   return {
@@ -238,6 +239,57 @@ describe('KtxClickHouseScanConnector', () => {
     ]);
   });

+  it('limits introspection to tables in tableScope', async () => {
+    const queries: Array<{ query: string; query_params?: Record<string, unknown> }> = [];
+    const clientFactory: KtxClickHouseClientFactory = {
+      createClient: vi.fn(() => ({
+        query: vi.fn(async (input: { query: string; format: string; query_params?: Record<string, unknown> }) => {
+          queries.push({ query: input.query, query_params: input.query_params });
+          if (input.query.includes('FROM system.tables')) {
+            return result([{ database: 'analytics', name: 'events', engine: 'MergeTree', comment: '' }]);
+          }
+          if (input.query.includes('FROM system.columns')) {
+            return result([
+              {
+                database: 'analytics',
+                table: 'events',
+                name: 'id',
+                type: 'UInt64',
+                comment: '',
+                is_in_primary_key: 1,
+              },
+            ]);
+          }
+          if (input.query.includes('FROM system.parts')) {
+            return result([{ database: 'analytics', table: 'events', row_count: '2' }]);
+          }
+          throw new Error(`Unexpected SQL: ${input.query}`);
+        }),
+        close: vi.fn(async () => undefined),
+      })),
+    };
+    const connector = new KtxClickHouseScanConnector({
+      connectionId: 'warehouse',
+      connection: {
+        driver: 'clickhouse',
+        host: 'ch.example.test',
+        database: 'analytics',
+        username: 'reader',
+        password: 'test-pass', // pragma: allowlist secret
+      },
+      clientFactory,
+    });
+    const scope = tableRefSet([{ catalog: null, db: 'analytics', name: 'events' }]);
+    const snapshot = await connector.introspect(
+      { connectionId: 'warehouse', driver: 'clickhouse', tableScope: scope },
+      { runId: 'scope-test' },
+    );
+    expect(snapshot.tables.map((table) => table.name)).toEqual(['events']);
+    const tablesQuery = queries.find((query) => query.query.includes('FROM system.tables'));
+    expect(tablesQuery?.query).toContain('AND name IN {table_names:Array(String)}');
+    expect(tablesQuery?.query_params).toEqual({ databases: ['analytics'], table_names: ['events'] });
+  });
+
   it('runs samples, distinct values, read-only SQL, row count, schema list, and cleanup', async () => {
     const clientFactory = fakeClientFactory();
     const connector = new KtxClickHouseScanConnector({

@@ -1,6 +1,7 @@
 import { createClient } from '@clickhouse/client';
 import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
 import { createKtxConnectorCapabilities, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaColumn, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableRef, type KtxTableSampleInput, type KtxTableListEntry, type KtxTableSampleResult } from '../../context/scan/types.js';
+import { scopedTableNames } from '../../context/scan/table-ref.js';
 import { readFileSync } from 'node:fs';
 import { Agent as HttpsAgent } from 'node:https';
 import { homedir } from 'node:os';
@@ -285,24 +286,42 @@ export class KtxClickHouseScanConnector implements KtxScanConnector {
   async introspect(input: KtxScanInput, _ctx: KtxScanContext): Promise<KtxSchemaSnapshot> {
     this.assertConnection(input.connectionId);
     const databases = configuredClickHouseDatabases(this.connection, this.clientConfig.database);
+    let allScopedTables: string[] | null = null;
+    if (input.tableScope) {
+      allScopedTables = [];
+      for (const database of databases) {
+        allScopedTables.push(...scopedTableNames(input.tableScope, { catalog: null, db: database }));
+      }
+      if (allScopedTables.length === 0) {
+        return this.emptySnapshot(databases);
+      }
+    }
+    const queryParams: Record<string, unknown> = { databases };
+    const tableNameClause = allScopedTables ? 'AND name IN {table_names:Array(String)}' : '';
+    const columnTableNameClause = allScopedTables ? 'AND table IN {table_names:Array(String)}' : '';
+    if (allScopedTables) {
+      queryParams.table_names = allScopedTables;
+    }
     const tables = await this.queryEachRow<ClickHouseTableRow>(
       `
         SELECT database, name, engine, comment
         FROM system.tables
         WHERE database IN {databases:Array(String)}
           AND engine NOT IN ('Dictionary')
+          ${tableNameClause}
         ORDER BY database, name
       `,
-      { databases },
+      queryParams,
     );
     const columns = await this.queryEachRow<ClickHouseColumnRow>(
       `
         SELECT database, table, name, type, comment, is_in_primary_key
         FROM system.columns
         WHERE database IN {databases:Array(String)}
+          ${columnTableNameClause}
         ORDER BY database, table, position
       `,
-      { databases },
+      queryParams,
     );
     const rowCounts = await this.queryEachRow<ClickHouseRowCountRow>(
       `
@@ -310,9 +329,10 @@ export class KtxClickHouseScanConnector implements KtxScanConnector {
         FROM system.parts
         WHERE database IN {databases:Array(String)}
           AND active = 1
+          ${columnTableNameClause}
         GROUP BY database, table
       `,
-      { databases },
+      queryParams,
     );
     const columnsByTable = new Map<string, ClickHouseColumnRow[]>();
     for (const column of columns) {
@@ -347,6 +367,23 @@ export class KtxClickHouseScanConnector implements KtxScanConnector {
     };
   }

+  private emptySnapshot(databases: string[]): KtxSchemaSnapshot {
+    return {
+      connectionId: this.connectionId,
+      driver: 'clickhouse',
+      extractedAt: this.now().toISOString(),
+      scope: { schemas: databases },
+      metadata: {
+        database: this.clientConfig.database,
+        databases,
+        host: this.clientConfig.host,
+        table_count: 0,
+        total_columns: 0,
+      },
+      tables: [],
+    };
+  }
+
   async sampleTable(input: KtxTableSampleInput, _ctx: KtxScanContext): Promise<KtxTableSampleResult> {
     this.assertConnection(input.connectionId);
     const result = await this.query(

@@ -1,4 +1,7 @@
-import type { LiveDatabaseIntrospectionPort } from '../../context/ingest/adapters/live-database/types.js';
+import type {
+  LiveDatabaseIntrospectionOptions,
+  LiveDatabaseIntrospectionPort,
+} from '../../context/ingest/adapters/live-database/types.js';
 import type { KtxProjectConnectionConfig } from '../../context/project/config.js';
 import {
   KtxClickHouseScanConnector,
@@ -18,7 +21,7 @@ export function createClickHouseLiveDatabaseIntrospection(
   options: CreateClickHouseLiveDatabaseIntrospectionOptions,
 ): LiveDatabaseIntrospectionPort {
   return {
-    async extractSchema(connectionId: string) {
+    async extractSchema(connectionId: string, introspectionOptions?: LiveDatabaseIntrospectionOptions) {
       const connection = options.connections[connectionId] as KtxClickHouseConnectionConfig | undefined;
       const connector = new KtxClickHouseScanConnector({
         connectionId,
@@ -29,7 +32,11 @@ export function createClickHouseLiveDatabaseIntrospection(
       });
       try {
         return await connector.introspect(
-          { connectionId, driver: 'clickhouse' },
+          {
+            connectionId,
+            driver: 'clickhouse',
+            ...(introspectionOptions?.tableScope ? { tableScope: introspectionOptions.tableScope } : {}),
+          },
           { runId: `clickhouse-${connectionId}` },
         );
       } finally {

@ -2,6 +2,7 @@ import { describe, expect, it, vi } from 'vitest';
import type { FieldPacket, RowDataPacket } from 'mysql2/promise'; import type { FieldPacket, RowDataPacket } from 'mysql2/promise';
import { createMysqlLiveDatabaseIntrospection } from '../../connectors/mysql/live-database-introspection.js'; import { createMysqlLiveDatabaseIntrospection } from '../../connectors/mysql/live-database-introspection.js';
import { isKtxMysqlConnectionConfig, KtxMysqlScanConnector, mysqlConnectionPoolConfigFromConfig, type KtxMysqlPoolFactory } from '../../connectors/mysql/connector.js'; import { isKtxMysqlConnectionConfig, KtxMysqlScanConnector, mysqlConnectionPoolConfigFromConfig, type KtxMysqlPoolFactory } from '../../connectors/mysql/connector.js';
import { tableRefSet } from '../../context/scan/table-ref.js';
function mysqlResult(rows: Record<string, unknown>[], fields: Array<{ name: string; type?: number }>): [RowDataPacket[], FieldPacket[]] { function mysqlResult(rows: Record<string, unknown>[], fields: Array<{ name: string; type?: number }>): [RowDataPacket[], FieldPacket[]] {
return [rows as RowDataPacket[], fields as FieldPacket[]]; return [rows as RowDataPacket[], fields as FieldPacket[]];
@ -275,6 +276,71 @@ describe('KtxMysqlScanConnector', () => {
]); ]);
}); });
it('limits introspection to tables in tableScope', async () => {
const queries: Array<{ sql: string; params?: unknown }> = [];
const poolFactory: KtxMysqlPoolFactory = {
createPool: vi.fn(() => ({
getConnection: vi.fn(async () => ({
query: vi.fn(async (sql: string, params?: unknown): Promise<[RowDataPacket[], FieldPacket[]]> => {
queries.push({ sql, params });
if (sql.includes('INFORMATION_SCHEMA.TABLES')) {
return mysqlResult(
[
{
TABLE_SCHEMA: 'analytics',
TABLE_NAME: 'orders',
TABLE_TYPE: 'BASE TABLE',
TABLE_COMMENT: '',
TABLE_ROWS: 2,
},
],
[],
);
}
if (sql.includes('INFORMATION_SCHEMA.COLUMNS')) {
return mysqlResult(
[
{
TABLE_SCHEMA: 'analytics',
TABLE_NAME: 'orders',
COLUMN_NAME: 'id',
DATA_TYPE: 'int',
IS_NULLABLE: 'NO',
COLUMN_COMMENT: '',
},
],
[],
);
}
return mysqlResult([], []);
}),
release: vi.fn(),
})),
end: vi.fn(async () => undefined),
})),
};
const connector = new KtxMysqlScanConnector({
connectionId: 'warehouse',
connection: {
driver: 'mysql',
host: 'db.example.test',
database: 'analytics',
username: 'reader',
password: 'secret', // pragma: allowlist secret
},
poolFactory,
});
const scope = tableRefSet([{ catalog: null, db: 'analytics', name: 'orders' }]);
const snapshot = await connector.introspect(
{ connectionId: 'warehouse', driver: 'mysql', tableScope: scope },
{ runId: 'scope-test' },
);
expect(snapshot.tables.map((table) => table.name)).toEqual(['orders']);
const tablesQuery = queries.find((query) => query.sql.includes('INFORMATION_SCHEMA.TABLES'));
expect(tablesQuery?.sql).toMatch(/TABLE_NAME IN \(\?\)/);
expect(tablesQuery?.params).toEqual(['analytics', 'orders']);
});
it('runs samples, distinct values, read-only SQL, row count, schema list, and cleanup', async () => {
const poolFactory = fakePoolFactory();
const connector = new KtxMysqlScanConnector({

@@ -4,6 +4,7 @@ import { homedir } from 'node:os';
import { resolve } from 'node:path';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import { createKtxConnectorCapabilities, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaColumn, type KtxTableListEntry, type KtxSchemaForeignKey, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableRef, type KtxTableSampleInput, type KtxTableSampleResult } from '../../context/scan/types.js';
+import { scopedTableNames } from '../../context/scan/table-ref.js';
import { KtxMysqlDialect } from './dialect.js';
export interface KtxMysqlConnectionConfig {
@@ -335,23 +336,37 @@ export class KtxMysqlScanConnector implements KtxScanConnector {
this.assertConnection(input.connectionId);
const databases = configuredMysqlSchemas(this.connection, this.poolConfig.database);
const placeholders = databases.map(() => '?').join(', ');
+let allScopedTables: string[] | null = null;
+if (input.tableScope) {
+allScopedTables = [];
+for (const database of databases) {
+allScopedTables.push(...scopedTableNames(input.tableScope, { catalog: null, db: database }));
+}
+if (allScopedTables.length === 0) {
+return this.emptySnapshot(databases);
+}
+}
+const tableNameClause = allScopedTables
+? `AND TABLE_NAME IN (${allScopedTables.map(() => '?').join(', ')})`
+: '';
+const tableNameParams = allScopedTables ?? [];
const tables = await this.queryRaw<MysqlTableRow>(
`
SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE, TABLE_COMMENT, TABLE_ROWS
FROM INFORMATION_SCHEMA.TABLES
-WHERE TABLE_SCHEMA IN (${placeholders}) AND TABLE_TYPE IN ('BASE TABLE', 'VIEW')
+WHERE TABLE_SCHEMA IN (${placeholders}) AND TABLE_TYPE IN ('BASE TABLE', 'VIEW') ${tableNameClause}
ORDER BY TABLE_SCHEMA, TABLE_NAME
`,
-databases,
+[...databases, ...tableNameParams],
);
const columns = await this.queryRaw<MysqlColumnRow>(
`
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, IS_NULLABLE, COLUMN_COMMENT
FROM INFORMATION_SCHEMA.COLUMNS
-WHERE TABLE_SCHEMA IN (${placeholders})
+WHERE TABLE_SCHEMA IN (${placeholders}) ${tableNameClause}
ORDER BY TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION
`,
-databases,
+[...databases, ...tableNameParams],
);
const primaryKeys = await this.queryRaw<MysqlPrimaryKeyRow>(
`
@@ -359,9 +374,10 @@ export class KtxMysqlScanConnector implements KtxScanConnector {
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA IN (${placeholders})
AND CONSTRAINT_NAME = 'PRIMARY'
+${tableNameClause}
ORDER BY TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION
`,
-databases,
+[...databases, ...tableNameParams],
);
const foreignKeys = await this.queryRaw<MysqlForeignKeyRow>(
`
@@ -369,9 +385,10 @@ export class KtxMysqlScanConnector implements KtxScanConnector {
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA IN (${placeholders})
AND REFERENCED_TABLE_NAME IS NOT NULL
+${tableNameClause}
ORDER BY TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME
`,
-databases,
+[...databases, ...tableNameParams],
);
const columnsByTable = groupByTable(columns, this.poolConfig.database);
@@ -403,6 +420,23 @@ export class KtxMysqlScanConnector implements KtxScanConnector {
};
}
+private emptySnapshot(databases: string[]): KtxSchemaSnapshot {
+return {
+connectionId: this.connectionId,
+driver: 'mysql',
+extractedAt: this.now().toISOString(),
+scope: { schemas: databases },
+metadata: {
+database: this.poolConfig.database,
+schemas: databases,
+host: this.poolConfig.host,
+table_count: 0,
+total_columns: 0,
+},
+tables: [],
+};
+}
async sampleTable(input: KtxTableSampleInput, _ctx: KtxScanContext): Promise<KtxTableSampleResult> {
this.assertConnection(input.connectionId);
const result = await this.query(this.dialect.generateSampleQuery(this.qTableName(input.table), input.limit, input.columns));

@@ -1,4 +1,7 @@
-import type { LiveDatabaseIntrospectionPort } from '../../context/ingest/adapters/live-database/types.js';
+import type {
+LiveDatabaseIntrospectionOptions,
+LiveDatabaseIntrospectionPort,
+} from '../../context/ingest/adapters/live-database/types.js';
import type { KtxProjectConnectionConfig } from '../../context/project/config.js';
import {
KtxMysqlScanConnector,
@@ -18,7 +21,7 @@ export function createMysqlLiveDatabaseIntrospection(
options: CreateMysqlLiveDatabaseIntrospectionOptions,
): LiveDatabaseIntrospectionPort {
return {
-async extractSchema(connectionId: string) {
+async extractSchema(connectionId: string, introspectionOptions?: LiveDatabaseIntrospectionOptions) {
const connection = options.connections[connectionId] as KtxMysqlConnectionConfig | undefined;
const connector = new KtxMysqlScanConnector({
connectionId,
@@ -28,7 +31,14 @@ export function createMysqlLiveDatabaseIntrospection(
now: options.now,
});
try {
-return await connector.introspect({ connectionId, driver: 'mysql' }, { runId: `mysql-${connectionId}` });
+return await connector.introspect(
+{
+connectionId,
+driver: 'mysql',
+...(introspectionOptions?.tableScope ? { tableScope: introspectionOptions.tableScope } : {}),
+},
+{ runId: `mysql-${connectionId}` },
+);
} finally {
await connector.cleanup();
}

@@ -1,6 +1,7 @@
import { describe, expect, it, vi } from 'vitest';
import { createPostgresLiveDatabaseIntrospection } from '../../connectors/postgres/live-database-introspection.js';
import { isKtxPostgresConnectionConfig, KtxPostgresScanConnector, postgresPoolConfigFromConfig, type KtxPostgresPoolFactory } from '../../connectors/postgres/connector.js';
import { tableRefSet } from '../../context/scan/table-ref.js';
interface FakeQueryResult {
rows: Record<string, unknown>[];
@@ -259,6 +260,63 @@ describe('KtxPostgresScanConnector', () => {
).rejects.toThrow('Only read-only SELECT/WITH queries can be executed locally');
});
it('limits introspection to tables in tableScope', async () => {
const queries: Array<{ sql: string; params?: unknown[] }> = [];
const poolFactory: KtxPostgresPoolFactory = {
createPool() {
return {
async connect() {
return {
query: vi.fn(async (sql: string, params?: unknown[]) => {
queries.push({ sql, params });
if (sql.includes('FROM pg_catalog.pg_class c')) {
return { rows: [{ table_name: 'orders', table_kind: 'r', row_count: '3', table_comment: null }] };
}
if (sql.includes('FROM pg_catalog.pg_attribute a')) {
return {
rows: [
{
table_name: 'orders',
column_name: 'id',
data_type: 'integer',
is_nullable: false,
column_comment: null,
},
],
};
}
return { rows: [] };
}),
release: vi.fn(),
};
},
end: vi.fn(async () => undefined),
};
},
};
const connector = new KtxPostgresScanConnector({
connectionId: 'warehouse',
connection: {
driver: 'postgres',
host: 'db.example.test',
database: 'analytics',
username: 'reader',
password: 'test-password', // pragma: allowlist secret
schema: 'public',
},
poolFactory,
});
const scope = tableRefSet([{ catalog: null, db: 'public', name: 'orders' }]);
const snapshot = await connector.introspect(
{ connectionId: 'warehouse', driver: 'postgres', tableScope: scope },
{ runId: 'scope-test' },
);
expect(snapshot.tables.map((table) => table.name)).toEqual(['orders']);
const tablesQuery = queries.find((query) => query.sql.includes('FROM pg_catalog.pg_class c'));
expect(tablesQuery?.sql).toMatch(/c\.relname = ANY\(\$2\)/);
expect(tablesQuery?.params).toEqual(['public', ['orders']]);
});
it('adapts native PostgreSQL snapshots to live-database introspection for local ingest', async () => {
const introspection = createPostgresLiveDatabaseIntrospection({
connections: {

@@ -3,6 +3,7 @@ import { homedir } from 'node:os';
import { resolve } from 'node:path';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import { createKtxConnectorCapabilities, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaColumn, type KtxSchemaForeignKey, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableListEntry, type KtxTableRef, type KtxTableSampleInput, type KtxTableSampleResult } from '../../context/scan/types.js';
+import { scopedTableNames } from '../../context/scan/table-ref.js';
import { Pool } from 'pg';
import { KtxPostgresDialect } from './dialect.js';
@@ -379,7 +380,9 @@ export class KtxPostgresScanConnector implements KtxScanConnector {
const schemas = schemasFromConnection(this.connection);
const allTables: KtxSchemaTable[] = [];
for (const schema of schemas) {
-const tables = await this.loadSchemaTables(schema);
+const scopedNames = input.tableScope ? scopedTableNames(input.tableScope, { catalog: null, db: schema }) : null;
+if (scopedNames && scopedNames.length === 0) continue;
+const tables = await this.loadSchemaTables(schema, scopedNames);
allTables.push(...tables);
}
return {
@@ -543,7 +546,11 @@ export class KtxPostgresScanConnector implements KtxScanConnector {
}
}
-private async loadSchemaTables(schema: string): Promise<KtxSchemaTable[]> {
+private async loadSchemaTables(schema: string, scopedNames: readonly string[] | null): Promise<KtxSchemaTable[]> {
+if (scopedNames && scopedNames.length === 0) return [];
+const pgCatalogScopeClause = scopedNames ? 'AND c.relname = ANY($2)' : '';
+const tableConstraintScopeClause = scopedNames ? 'AND tc.table_name = ANY($2)' : '';
+const scopeValues = scopedNames ? [scopedNames] : [];
const tables = await this.queryRaw<PostgresTableRow>(
`
SELECT
@@ -557,9 +564,10 @@ export class KtxPostgresScanConnector implements KtxScanConnector {
ON d.objoid = c.oid AND d.objsubid = 0
WHERE n.nspname = $1
AND c.relkind IN ('r', 'v')
+${pgCatalogScopeClause}
ORDER BY c.relname
`,
-[schema],
+[schema, ...scopeValues],
);
const columns = await this.queryRaw<PostgresColumnRow>(
`
@@ -578,9 +586,10 @@ export class KtxPostgresScanConnector implements KtxScanConnector {
AND c.relkind IN ('r', 'v')
AND a.attnum > 0
AND NOT a.attisdropped
+${pgCatalogScopeClause}
ORDER BY c.relname, a.attnum
`,
-[schema],
+[schema, ...scopeValues],
);
const primaryKeys = await this.queryRaw<PostgresPrimaryKeyRow>(
`
@@ -591,9 +600,10 @@ export class KtxPostgresScanConnector implements KtxScanConnector {
AND tc.table_schema = kcu.table_schema
WHERE tc.constraint_type = 'PRIMARY KEY'
AND tc.table_schema = $1
+${tableConstraintScopeClause}
ORDER BY tc.table_name, kcu.ordinal_position
`,
-[schema],
+[schema, ...scopeValues],
);
const foreignKeys = await this.queryRaw<PostgresForeignKeyRow>(
`
@@ -613,9 +623,10 @@ export class KtxPostgresScanConnector implements KtxScanConnector {
AND ccu.table_schema = tc.table_schema
WHERE tc.constraint_type = 'FOREIGN KEY'
AND tc.table_schema = $1
+${tableConstraintScopeClause}
ORDER BY tc.table_name, kcu.column_name
`,
-[schema],
+[schema, ...scopeValues],
);
const columnsByTable = groupByTable(columns);
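
The PostgreSQL variant passes the whole name list as a single array parameter instead of expanding one placeholder per name. A minimal sketch of that shape follows; `buildPgTablesQuery` and `ScopedQuery` are hypothetical names, the SQL is a simplified stand-in for the real catalog query, and the sketch assumes (as the PR's test does) that the driver accepts a JavaScript array as the value bound to `$2`.

```typescript
// Hypothetical helper (not in the PR): one array bind for `= ANY($2)`.
interface ScopedQuery {
  text: string;
  values: unknown[];
}

function buildPgTablesQuery(schema: string, scopedNames: readonly string[] | null): ScopedQuery {
  // With no scope, omit both the clause and the second parameter.
  const scopeClause = scopedNames ? 'AND c.relname = ANY($2)' : '';
  // The list is wrapped so it stays ONE parameter rather than being spread.
  const scopeValues = scopedNames ? [[...scopedNames]] : [];
  const text = [
    'SELECT c.relname AS table_name',
    'FROM pg_catalog.pg_class c',
    'JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace',
    'WHERE n.nspname = $1',
    scopeClause,
    'ORDER BY c.relname',
  ]
    .filter((line) => line.length > 0)
    .join('\n');
  return { text, values: [schema, ...scopeValues] };
}

const scoped = buildPgTablesQuery('public', ['orders']);
const unscopedQuery = buildPgTablesQuery('public', null);
```

Because the parameter count is fixed at two, the SQL text does not change with the number of scoped tables, which is what lets the test assert on the literal `c.relname = ANY($2)` and on params of `['public', ['orders']]`.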

@@ -1,4 +1,7 @@
-import type { LiveDatabaseIntrospectionPort } from '../../context/ingest/adapters/live-database/types.js';
+import type {
+LiveDatabaseIntrospectionOptions,
+LiveDatabaseIntrospectionPort,
+} from '../../context/ingest/adapters/live-database/types.js';
import type { KtxProjectConnectionConfig } from '../../context/project/config.js';
import {
KtxPostgresScanConnector,
@@ -18,7 +21,7 @@ export function createPostgresLiveDatabaseIntrospection(
options: CreatePostgresLiveDatabaseIntrospectionOptions,
): LiveDatabaseIntrospectionPort {
return {
-async extractSchema(connectionId: string) {
+async extractSchema(connectionId: string, introspectionOptions?: LiveDatabaseIntrospectionOptions) {
const connection = options.connections[connectionId] as KtxPostgresConnectionConfig | undefined;
const connector = new KtxPostgresScanConnector({
connectionId,
@@ -28,7 +31,14 @@ export function createPostgresLiveDatabaseIntrospection(
now: options.now,
});
try {
-return await connector.introspect({ connectionId, driver: 'postgres' }, { runId: `postgres-${connectionId}` });
+return await connector.introspect(
+{
+connectionId,
+driver: 'postgres',
+...(introspectionOptions?.tableScope ? { tableScope: introspectionOptions.tableScope } : {}),
+},
+{ runId: `postgres-${connectionId}` },
+);
} finally {
await connector.cleanup();
}

@@ -1,6 +1,15 @@
import { describe, expect, it, vi } from 'vitest';
const createPool = vi.hoisted(() => vi.fn());
vi.mock('snowflake-sdk', () => ({
default: { createPool },
createPool,
}));
import { createSnowflakeLiveDatabaseIntrospection } from '../../connectors/snowflake/live-database-introspection.js';
import { isKtxSnowflakeConnectionConfig, KtxSnowflakeScanConnector, snowflakeConnectionConfigFromConfig, type KtxSnowflakeDriver, type KtxSnowflakeDriverFactory } from '../../connectors/snowflake/connector.js';
import { tableRefSet } from '../../context/scan/table-ref.js';
function fakeDriverFactory(): KtxSnowflakeDriverFactory {
const driver: KtxSnowflakeDriver = {
@@ -63,6 +72,38 @@ function fakeDriverFactory(): KtxSnowflakeDriverFactory {
return { createDriver: vi.fn(() => driver) };
}
function fakeSnowflakeStatement(headers: string[] = ['ONE']) {
return {
getColumns: () => headers.map((header) => ({ getName: () => header, getType: () => 'TEXT' })),
};
}
function installSnowflakePoolMock() {
const executedSql: string[] = [];
const connection = {
execute: vi.fn(
(input: {
sqlText: string;
complete: (
error: Error | null,
statement: ReturnType<typeof fakeSnowflakeStatement>,
rows: Array<Record<string, unknown>>,
) => void;
}) => {
executedSql.push(input.sqlText);
input.complete(null, fakeSnowflakeStatement(), [{ ONE: 1 }]);
},
),
};
const pool = {
use: vi.fn(async (fn: (conn: typeof connection) => Promise<unknown>) => fn(connection)),
drain: vi.fn(async () => undefined),
clear: vi.fn(async () => undefined),
};
createPool.mockReturnValue(pool);
return { connection, pool, executedSql };
}
describe('KtxSnowflakeScanConnector', () => {
it('resolves Snowflake connection configuration safely', () => {
expect(
@@ -99,6 +140,99 @@ describe('KtxSnowflakeScanConnector', () => {
});
});
it('defaults and validates Snowflake maxSessions', () => {
const baseConnection = {
driver: 'snowflake',
authMethod: 'password',
account: 'acct',
warehouse: 'WH',
database: 'ANALYTICS',
schema_name: 'PUBLIC',
username: 'reader',
password: 'fixture-pass', // pragma: allowlist secret
} as const;
expect(
snowflakeConnectionConfigFromConfig({
connectionId: 'warehouse',
connection: baseConnection,
}),
).toMatchObject({ maxSessions: 4 });
expect(
snowflakeConnectionConfigFromConfig({
connectionId: 'warehouse',
connection: { ...baseConnection, maxSessions: 8 },
}),
).toMatchObject({ maxSessions: 8 });
for (const maxSessions of [0, -1, 1.5, Number.NaN]) {
expect(() =>
snowflakeConnectionConfigFromConfig({
connectionId: 'warehouse',
connection: { ...baseConnection, maxSessions },
}),
).toThrow('connections.warehouse.maxSessions must be a positive integer');
}
});
it('uses one lazy Snowflake pool and drains it during cleanup', async () => {
const { pool, executedSql } = installSnowflakePoolMock();
const close = vi.fn(async () => undefined);
const connector = new KtxSnowflakeScanConnector({
connectionId: 'warehouse',
connection: {
driver: 'snowflake',
authMethod: 'password',
account: 'acct',
warehouse: 'WH',
database: 'ANALYTICS',
schema_name: 'PUBLIC',
username: 'reader',
password: 'fixture-pass', // pragma: allowlist secret
role: 'ANALYST',
maxSessions: 3,
},
sdkOptionsProvider: {
resolve: vi.fn(async () => ({ sdkOptions: { application: 'ktx-test' }, close })),
},
});
expect(createPool).not.toHaveBeenCalled();
await connector.executeReadOnly({ connectionId: 'warehouse', sql: 'select 1', maxRows: 1 }, { runId: 'run-1' });
await connector.executeReadOnly({ connectionId: 'warehouse', sql: 'select 1', maxRows: 1 }, { runId: 'run-1' });
expect(createPool).toHaveBeenCalledTimes(1);
expect(createPool).toHaveBeenCalledWith(
expect.objectContaining({
account: 'acct',
username: 'reader',
warehouse: 'WH',
database: 'ANALYTICS',
schema: 'PUBLIC',
role: 'ANALYST',
password: 'fixture-pass', // pragma: allowlist secret
clientSessionKeepAlive: true,
clientSessionKeepAliveHeartbeatFrequency: 900,
application: 'ktx-test',
}),
expect.objectContaining({
min: 0,
max: 3,
evictionRunIntervalMillis: 30_000,
acquireTimeoutMillis: 60_000,
}),
);
expect(pool.use).toHaveBeenCalledTimes(2);
expect(executedSql.some((sql) => /^USE\s+/i.test(sql.trim()))).toBe(false);
await connector.cleanup();
expect(pool.drain).toHaveBeenCalledBefore(pool.clear);
expect(pool.clear).toHaveBeenCalledTimes(1);
expect(close).toHaveBeenCalledTimes(1);
});
it('introspects schema, primary keys, comments, row counts, and dimensions', async () => {
const connector = new KtxSnowflakeScanConnector({
connectionId: 'warehouse',
@@ -157,6 +291,108 @@ describe('KtxSnowflakeScanConnector', () => {
]);
});
it('continues introspection when primary-key discovery is not authorized', async () => {
const driverFactory = fakeDriverFactory();
const driver = (driverFactory.createDriver as ReturnType<typeof vi.fn>).getMockImplementation() as
| (() => KtxSnowflakeDriver)
| undefined;
if (!driver) throw new Error('driver mock missing');
const built = driver();
(built.query as ReturnType<typeof vi.fn>).mockImplementation(async (sql: string) => {
if (sql.includes('TABLE_CONSTRAINTS')) {
throw new Error(
"SQL compilation error: Object 'ANALYTICS.INFORMATION_SCHEMA.KEY_COLUMN_USAGE' does not exist or not authorized.",
);
}
throw new Error(`Unexpected SQL: ${sql}`);
});
(driverFactory.createDriver as ReturnType<typeof vi.fn>).mockReturnValue(built);
const warn = vi.spyOn(console, 'warn').mockImplementation(() => undefined);
try {
const connector = new KtxSnowflakeScanConnector({
connectionId: 'warehouse',
connection: {
driver: 'snowflake',
authMethod: 'password',
account: 'acct',
warehouse: 'WH',
database: 'ANALYTICS',
schema_name: 'PUBLIC',
username: 'reader',
password: 'fixture-pass', // pragma: allowlist secret
},
driverFactory,
});
const snapshot = await connector.introspect(
{ connectionId: 'warehouse', driver: 'snowflake' },
{ runId: 'scan-run-pk-skip' },
);
expect(snapshot.tables.map((table) => table.name).sort()).toEqual(['ORDERS', 'ORDER_SUMMARY']);
expect(snapshot.tables.every((table) => table.columns.every((column) => column.primaryKey === false))).toBe(true);
expect(warn).not.toHaveBeenCalled();
} finally {
warn.mockRestore();
}
});
it('limits introspection to tables in tableScope', async () => {
const queries: Array<{ sql: string; params?: unknown }> = [];
const getSchemaMetadata = vi.fn(async (_schemaName?: string, scopedNames?: readonly string[] | null) =>
scopedNames?.includes('ORDERS')
? [
{
name: 'ORDERS',
catalog: 'ANALYTICS',
db: 'MARTS',
rowCount: 10,
comment: null,
columns: [{ name: 'ID', type: 'NUMBER', nullable: false, comment: null }],
},
]
: [],
);
const driverFactory: KtxSnowflakeDriverFactory = {
createDriver: vi.fn(() => ({
test: vi.fn(async () => ({ success: true })),
query: vi.fn(async (sql: string, params?: unknown) => {
queries.push({ sql, params });
return { headers: [], rows: [], totalRows: 0, rowCount: 0 };
}),
getSchemaMetadata,
listSchemas: vi.fn(async () => []),
listTables: vi.fn(async () => []),
cleanup: vi.fn(async () => undefined),
})),
};
const connector = new KtxSnowflakeScanConnector({
connectionId: 'warehouse',
connection: {
driver: 'snowflake',
authMethod: 'password',
account: 'acct',
warehouse: 'WH',
database: 'ANALYTICS',
schema_name: 'MARTS',
username: 'reader',
password: 'fixture-pass', // pragma: allowlist secret
},
driverFactory,
});
const scope = tableRefSet([{ catalog: 'ANALYTICS', db: 'MARTS', name: 'ORDERS' }]);
const snapshot = await connector.introspect(
{ connectionId: 'warehouse', driver: 'snowflake', tableScope: scope },
{ runId: 'scope-test' },
);
expect(snapshot.tables.map((table) => table.name)).toEqual(['ORDERS']);
expect(getSchemaMetadata).toHaveBeenCalledWith('MARTS', ['ORDERS']);
const primaryKeysQuery = queries.find((query) => query.sql.includes('TABLE_CONSTRAINTS'));
expect(primaryKeysQuery?.sql).toMatch(/AND tc\.TABLE_NAME IN \(\?\)/);
expect(primaryKeysQuery?.params).toEqual(['MARTS', 'ANALYTICS', 'ORDERS']);
});
it('supports read-only query, sampling, distinct values, row counts, schema listing, and cleanup', async () => {
const driverFactory = fakeDriverFactory();
const connector = new KtxSnowflakeScanConnector({
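
The `maxSessions` cases exercised by the tests in this file can be summarized with a small standalone sketch. `resolveMaxSessions` is a hypothetical name, and the default of 4 is taken from the test expectation above rather than from any documented contract. One consequence of coercing with `Number(...)` before checking is that a numeric string such as `'8'` is accepted.

```typescript
// Hypothetical standalone version of the positive-integer check.
// The default of 4 mirrors the test expectation; treat it as an assumption.
function resolveMaxSessions(value: unknown, defaultValue = 4): number {
  // Only a missing value falls back to the default; null does not.
  if (value === undefined) {
    return defaultValue;
  }
  const numberValue = Number(value);
  // Rejects 0, negatives, fractions, and NaN in a single check.
  if (!Number.isInteger(numberValue) || numberValue < 1) {
    throw new Error('maxSessions must be a positive integer');
  }
  return numberValue;
}

// Collects which candidate values are rejected, for illustration.
const rejected: unknown[] = [];
for (const candidate of [0, -1, 1.5, Number.NaN, null]) {
  try {
    resolveMaxSessions(candidate);
  } catch {
    rejected.push(candidate);
  }
}
```

Here `null` is rejected because `Number(null)` is `0`, so a config that sets the key to an explicit null fails validation instead of silently using the default.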

@@ -4,9 +4,12 @@ import { homedir } from 'node:os';
import { resolve } from 'node:path';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import { createKtxConnectorCapabilities, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaColumn, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableRef, type KtxTableSampleInput, type KtxTableListEntry, type KtxTableSampleResult } from '../../context/scan/types.js';
-import * as snowflake from 'snowflake-sdk';
+import { scopedTableNames } from '../../context/scan/table-ref.js';
+import snowflake from 'snowflake-sdk';
+import type { Bind, Binds, Connection, ConnectionOptions } from 'snowflake-sdk';
import { KtxSnowflakeDialect } from './dialect.js';
import { assertSafeSnowflakeIdentifier, quoteSnowflakeIdentifier } from './identifiers.js';
+import { configureSnowflakeSdkLogger } from './sdk-logger.js';
export interface KtxSnowflakeConnectionConfig {
driver?: string;
@@ -21,6 +24,7 @@ export interface KtxSnowflakeConnectionConfig {
privateKey?: string;
passphrase?: string;
role?: string;
+maxSessions?: number;
[key: string]: unknown;
}
@@ -35,6 +39,7 @@ export interface KtxSnowflakeResolvedConnectionConfig {
privateKey?: string;
passphrase?: string;
role?: string;
+maxSessions: number;
}
export interface KtxSnowflakeRawColumnMetadata {
@@ -56,7 +61,7 @@ export interface KtxSnowflakeRawTableMetadata {
export interface KtxSnowflakeDriver {
test(): Promise<{ success: boolean; error?: string }>;
query(sql: string, params?: unknown): Promise<KtxQueryResult>;
-getSchemaMetadata(schemaName?: string): Promise<KtxSnowflakeRawTableMetadata[]>;
+getSchemaMetadata(schemaName?: string, scopedTableNames?: readonly string[] | null): Promise<KtxSnowflakeRawTableMetadata[]>;
listSchemas(): Promise<string[]>;
listTables(schemas?: string[]): Promise<KtxTableListEntry[]>;
cleanup(): Promise<void>;
@@ -79,6 +84,12 @@ export interface KtxSnowflakeSdkOptionsProvider {
export interface KtxSnowflakeScanConnectorOptions {
connectionId: string;
connection: KtxSnowflakeConnectionConfig | undefined;
+/**
+* KTX project directory. When provided, snowflake-sdk's logger is redirected to
+* `<projectDir>/.ktx/logs/snowflake.log` so its JSON output does not bleed into
+* the CLI's TTY. Tests that use a fake driverFactory can leave this undefined.
+*/
+projectDir?: string;
driverFactory?: KtxSnowflakeDriverFactory;
sdkOptionsProvider?: KtxSnowflakeSdkOptionsProvider;
env?: NodeJS.ProcessEnv;
@@ -123,13 +134,31 @@ function stringConfigValue(
return typeof value === 'string' && value.trim().length > 0 ? resolveStringReference(value.trim(), env) : undefined;
}
function positiveIntegerConfigValue(input: {
connection: KtxSnowflakeConnectionConfig;
key: keyof KtxSnowflakeConnectionConfig;
connectionId: string;
defaultValue: number;
}): number {
const value = input.connection[input.key];
if (value === undefined) {
return input.defaultValue;
}
const numberValue = Number(value);
if (!Number.isInteger(numberValue) || numberValue < 1) {
throw new Error(`connections.${input.connectionId}.${String(input.key)} must be a positive integer`);
}
return numberValue;
}
function schemaNames(connection: KtxSnowflakeConnectionConfig, env: NodeJS.ProcessEnv): string[] { function schemaNames(connection: KtxSnowflakeConnectionConfig, env: NodeJS.ProcessEnv): string[] {
if (Array.isArray(connection.schema_names) && connection.schema_names.length > 0) { if (Array.isArray(connection.schema_names) && connection.schema_names.length > 0) {
return connection.schema_names return connection.schema_names
.filter((schema) => schema.trim().length > 0) .filter((schema) => schema.trim().length > 0)
.map((schema) => resolveStringReference(schema, env)); .map((schema) => resolveStringReference(schema, env));
} }
return [stringConfigValue(connection, 'schema_name', env) ?? 'PUBLIC']; const single = stringConfigValue(connection, 'schema_name', env);
return single ? [single] : [];
} }
function firstNumber(value: unknown): number | null { function firstNumber(value: unknown): number | null {
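
Reviewer sketch for the hunk above: a minimal, self-contained illustration of the new resolution rules, assuming a trimmed-down config shape. `SketchConnectionConfig`, `resolveSchemas`, `resolveMaxSessions`, and `sessionOptions` are hypothetical names, and env-reference resolution is left out. It is not the connector's actual code.

```typescript
// Hypothetical, trimmed-down config shape used only for this sketch.
interface SketchConnectionConfig {
  schema_names?: string[];
  schema_name?: string;
  maxSessions?: unknown;
}

// Prefer schema_names, fall back to schema_name, and return an empty list
// (rather than ['PUBLIC']) when neither is configured.
function resolveSchemas(config: SketchConnectionConfig): string[] {
  if (Array.isArray(config.schema_names) && config.schema_names.length > 0) {
    return config.schema_names.filter((schema) => schema.trim().length > 0);
  }
  const single = config.schema_name?.trim();
  return single ? [single] : [];
}

// Missing value -> default; anything that is not an integer >= 1 is rejected.
function resolveMaxSessions(config: SketchConnectionConfig, defaultValue = 4): number {
  if (config.maxSessions === undefined) {
    return defaultValue;
  }
  const numberValue = Number(config.maxSessions);
  if (!Number.isInteger(numberValue) || numberValue < 1) {
    throw new Error('maxSessions must be a positive integer');
  }
  return numberValue;
}

// Only pass `schema` to the SDK options when a schema is actually configured.
function sessionOptions(schemas: string[]): { schema?: string } {
  const first = schemas[0];
  return first ? { schema: first } : {};
}
```

With no schema configured, `sessionOptions(resolveSchemas({}))` yields an object without a `schema` key, which matches the intent of dropping the bootstrap schema prompt.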
@ -159,7 +188,7 @@ function normalizeSnowflakeValue(value: unknown, columnType?: string): unknown {
return value; return value;
} }
function toSnowflakeBind(value: unknown): snowflake.Bind { function toSnowflakeBind(value: unknown): Bind {
if (value === null || typeof value === 'string' || typeof value === 'number' || typeof value === 'boolean') { if (value === null || typeof value === 'string' || typeof value === 'number' || typeof value === 'boolean') {
return value; return value;
} }
@ -169,7 +198,7 @@ function toSnowflakeBind(value: unknown): snowflake.Bind {
return String(value); return String(value);
} }
function toSnowflakeBinds(params: unknown[] | undefined): snowflake.Binds | undefined { function toSnowflakeBinds(params: unknown[] | undefined): Binds | undefined {
return params?.map((value) => toSnowflakeBind(value)); return params?.map((value) => toSnowflakeBind(value));
} }
@ -220,6 +249,12 @@ export function snowflakeConnectionConfigFromConfig(input: {
database, database,
schemas: resolvedSchemas, schemas: resolvedSchemas,
username, username,
maxSessions: positiveIntegerConfigValue({
connection: input.connection,
key: 'maxSessions',
connectionId: input.connectionId,
defaultValue: 4,
}),
}; };
const role = stringConfigValue(input.connection, 'role', env); const role = stringConfigValue(input.connection, 'role', env);
if (role) { if (role) {
@ -255,6 +290,7 @@ class DefaultSnowflakeDriverFactory implements KtxSnowflakeDriverFactory {
class SnowflakeSdkDriver implements KtxSnowflakeDriver { class SnowflakeSdkDriver implements KtxSnowflakeDriver {
private closeSdkOptions: Array<() => Promise<void>> = []; private closeSdkOptions: Array<() => Promise<void>> = [];
private pool: ReturnType<typeof snowflake.createPool> | null = null;
constructor( constructor(
private readonly resolved: KtxSnowflakeResolvedConnectionConfig, private readonly resolved: KtxSnowflakeResolvedConnectionConfig,
@ -275,37 +311,50 @@ class SnowflakeSdkDriver implements KtxSnowflakeDriver {
} }
async query(sql: string, params?: unknown): Promise<KtxQueryResult> { async query(sql: string, params?: unknown): Promise<KtxQueryResult> {
let connection: snowflake.Connection | null = null; const binds = Array.isArray(params) ? toSnowflakeBinds(params) : undefined;
try { try {
connection = await this.createConnection(); const pool = await this.getPool();
const binds = Array.isArray(params) ? toSnowflakeBinds(params) : undefined; const result = await pool.use(async (connection: snowflake.Connection) =>
const result = await this.executeSnowflakeQuery(connection, sql, binds); this.executeSnowflakeQuery(connection, sql, binds),
);
return { ...result, totalRows: result.rows.length, rowCount: result.rows.length }; return { ...result, totalRows: result.rows.length, rowCount: result.rows.length };
} finally { } catch (error) {
if (connection) { const message = error instanceof Error ? error.message : String(error);
await this.destroyConnection(connection); if (/timeout/i.test(message) && /pool|acquire/i.test(message)) {
throw new Error(
"Snowflake session pool exhausted after 60s - consider lowering maxSessions or increasing your account's concurrent-statement limit.",
);
} }
throw error;
} }
} }
async getSchemaMetadata(schemaName = this.resolved.schemas[0] ?? 'PUBLIC'): Promise<KtxSnowflakeRawTableMetadata[]> { async getSchemaMetadata(
schemaName = this.resolved.schemas[0] ?? 'PUBLIC',
scopedTableNames: readonly string[] | null = null,
): Promise<KtxSnowflakeRawTableMetadata[]> {
const scopeClause =
scopedTableNames && scopedTableNames.length > 0
? `AND TABLE_NAME IN (${scopedTableNames.map(() => '?').join(', ')})`
: '';
const scopeParams = scopedTableNames ?? [];
const tablesResult = await this.query( const tablesResult = await this.query(
` `
SELECT TABLE_NAME, TABLE_TYPE, COMMENT, ROW_COUNT SELECT TABLE_NAME, TABLE_TYPE, COMMENT, ROW_COUNT
FROM INFORMATION_SCHEMA.TABLES FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = ? AND TABLE_CATALOG = ? WHERE TABLE_SCHEMA = ? AND TABLE_CATALOG = ? ${scopeClause}
ORDER BY TABLE_NAME ORDER BY TABLE_NAME
`, `,
[schemaName, this.resolved.database], [schemaName, this.resolved.database, ...scopeParams],
); );
const columnsResult = await this.query( const columnsResult = await this.query(
` `
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, IS_NULLABLE, COMMENT, ORDINAL_POSITION SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, IS_NULLABLE, COMMENT, ORDINAL_POSITION
FROM INFORMATION_SCHEMA.COLUMNS FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = ? AND TABLE_CATALOG = ? WHERE TABLE_SCHEMA = ? AND TABLE_CATALOG = ? ${scopeClause}
ORDER BY TABLE_NAME, ORDINAL_POSITION ORDER BY TABLE_NAME, ORDINAL_POSITION
`, `,
[schemaName, this.resolved.database], [schemaName, this.resolved.database, ...scopeParams],
); );
const columnsByTable = new Map<string, KtxSnowflakeRawColumnMetadata[]>(); const columnsByTable = new Map<string, KtxSnowflakeRawColumnMetadata[]>();
for (const row of columnsResult.rows) { for (const row of columnsResult.rows) {
@ -357,27 +406,41 @@ class SnowflakeSdkDriver implements KtxSnowflakeDriver {
} }
async cleanup(): Promise<void> { async cleanup(): Promise<void> {
const pool = this.pool;
this.pool = null;
if (pool) {
// Drain before clear so in-flight Snowflake statements finish before idle
// sessions are closed.
await pool.drain();
await pool.clear();
}
const closers = this.closeSdkOptions; const closers = this.closeSdkOptions;
this.closeSdkOptions = []; this.closeSdkOptions = [];
await Promise.all(closers.map((close) => close())); await Promise.all(closers.map((close) => Promise.resolve(close())));
} }
private async runTest(): Promise<{ success: boolean; error?: string }> { private async runTest(): Promise<{ success: boolean; error?: string }> {
let connection: snowflake.Connection | null = null;
try { try {
connection = await this.createConnection(); await this.query('SELECT 1');
await this.executeSnowflakeQuery(connection, 'SELECT 1');
return { success: true }; return { success: true };
} catch (error) { } catch (error) {
return { success: false, error: error instanceof Error ? error.message : String(error) }; return { success: false, error: error instanceof Error ? error.message : String(error) };
} finally {
if (connection) {
await this.destroyConnection(connection);
}
} }
} }
private async createConnection(): Promise<snowflake.Connection> { private async getPool(): Promise<ReturnType<typeof snowflake.createPool>> {
if (!this.pool) {
this.pool = snowflake.createPool(await this.resolveConnectionOptions(), {
min: 0,
max: this.resolved.maxSessions,
evictionRunIntervalMillis: 30_000,
acquireTimeoutMillis: 60_000,
});
}
return this.pool;
}
private async resolveConnectionOptions(): Promise<snowflake.ConnectionOptions> {
const patch = await this.sdkOptionsProvider?.resolve({ const patch = await this.sdkOptionsProvider?.resolve({
account: this.resolved.account, account: this.resolved.account,
connection: { ...this.resolved, driver: 'snowflake' }, connection: { ...this.resolved, driver: 'snowflake' },
@ -385,60 +448,27 @@ class SnowflakeSdkDriver implements KtxSnowflakeDriver {
if (patch?.close) { if (patch?.close) {
this.closeSdkOptions.push(patch.close); this.closeSdkOptions.push(patch.close);
} }
const baseConfig: snowflake.ConnectionOptions = { const sessionSchema = this.resolved.schemas[0];
const baseConfig: ConnectionOptions = {
account: this.resolved.account, account: this.resolved.account,
username: this.resolved.username, username: this.resolved.username,
warehouse: this.resolved.warehouse, warehouse: this.resolved.warehouse,
database: this.resolved.database, database: this.resolved.database,
schema: this.resolved.schemas[0] ?? 'PUBLIC', ...(sessionSchema ? { schema: sessionSchema } : {}),
role: this.resolved.role, role: this.resolved.role,
clientSessionKeepAlive: true,
clientSessionKeepAliveHeartbeatFrequency: 900,
...patch?.sdkOptions, ...patch?.sdkOptions,
}; };
const connectionConfig: snowflake.ConnectionOptions = return this.resolved.authMethod === 'rsa'
this.resolved.authMethod === 'rsa' ? { ...baseConfig, authenticator: 'SNOWFLAKE_JWT', privateKey: this.decryptPrivateKey() }
? { ...baseConfig, authenticator: 'SNOWFLAKE_JWT', privateKey: this.decryptPrivateKey() } : { ...baseConfig, password: this.resolved.password };
: { ...baseConfig, password: this.resolved.password };
const connection = snowflake.createConnection(connectionConfig);
return new Promise((resolveConnection, rejectConnection) => {
connection.connect((error, connected) => {
if (error) {
rejectConnection(error);
return;
}
const resolvedConnection = connected ?? connection;
this.setConnectionContext(resolvedConnection).then(
() => resolveConnection(resolvedConnection),
(contextError) => {
resolvedConnection.destroy(() => undefined);
rejectConnection(contextError);
},
);
});
});
}
private async setConnectionContext(connection: snowflake.Connection): Promise<void> {
if (this.resolved.role) {
await this.executeSnowflakeQuery(connection, `USE ROLE ${quoteSnowflakeIdentifier(this.resolved.role, 'role')}`);
}
await this.executeSnowflakeQuery(
connection,
`USE WAREHOUSE ${quoteSnowflakeIdentifier(this.resolved.warehouse, 'warehouse')}`,
);
await this.executeSnowflakeQuery(
connection,
`USE DATABASE ${quoteSnowflakeIdentifier(this.resolved.database, 'database')}`,
);
await this.executeSnowflakeQuery(
connection,
`USE SCHEMA ${quoteSnowflakeIdentifier(this.resolved.schemas[0] ?? 'PUBLIC', 'schema')}`,
);
} }
private async executeSnowflakeQuery( private async executeSnowflakeQuery(
connection: snowflake.Connection, connection: Connection,
sqlText: string, sqlText: string,
binds?: snowflake.Binds, binds?: Binds,
): Promise<{ headers: string[]; headerTypes?: string[]; rows: unknown[][] }> { ): Promise<{ headers: string[]; headerTypes?: string[]; rows: unknown[][] }> {
return new Promise((resolveQuery, rejectQuery) => { return new Promise((resolveQuery, rejectQuery) => {
connection.execute({ connection.execute({
@ -461,18 +491,6 @@ class SnowflakeSdkDriver implements KtxSnowflakeDriver {
}); });
} }
private destroyConnection(connection: snowflake.Connection): Promise<void> {
return new Promise((resolveDestroy, rejectDestroy) => {
connection.destroy((error) => {
if (error) {
rejectDestroy(error);
return;
}
resolveDestroy();
});
});
}
private decryptPrivateKey(): string { private decryptPrivateKey(): string {
if (!this.resolved.privateKey) { if (!this.resolved.privateKey) {
throw new Error('Private key is required for RSA authentication'); throw new Error('Private key is required for RSA authentication');
@ -510,6 +528,9 @@ export class KtxSnowflakeScanConnector implements KtxScanConnector {
this.driverFactory = options.driverFactory ?? new DefaultSnowflakeDriverFactory(); this.driverFactory = options.driverFactory ?? new DefaultSnowflakeDriverFactory();
this.now = options.now ?? (() => new Date()); this.now = options.now ?? (() => new Date());
this.id = `snowflake:${options.connectionId}`; this.id = `snowflake:${options.connectionId}`;
if (options.projectDir) {
configureSnowflakeSdkLogger(options.projectDir);
}
} }
async testConnection(): Promise<{ success: boolean; error?: string }> { async testConnection(): Promise<{ success: boolean; error?: string }> {
@ -520,7 +541,11 @@ export class KtxSnowflakeScanConnector implements KtxScanConnector {
this.assertConnection(input.connectionId); this.assertConnection(input.connectionId);
const tables: KtxSchemaTable[] = []; const tables: KtxSchemaTable[] = [];
for (const schemaName of this.resolved.schemas) { for (const schemaName of this.resolved.schemas) {
const rawTables = await this.getDriver().getSchemaMetadata(schemaName); const scopedNames = input.tableScope
? scopedTableNames(input.tableScope, { catalog: this.resolved.database, db: schemaName })
: null;
if (scopedNames && scopedNames.length === 0) continue;
const rawTables = await this.getDriver().getSchemaMetadata(schemaName, scopedNames);
const primaryKeys = await this.primaryKeys(rawTables.map((table) => table.name), schemaName); const primaryKeys = await this.primaryKeys(rawTables.map((table) => table.name), schemaName);
tables.push(...rawTables.map((table) => this.toSchemaTable(table, primaryKeys))); tables.push(...rawTables.map((table) => this.toSchemaTable(table, primaryKeys)));
} }
@ -653,32 +678,39 @@ export class KtxSnowflakeScanConnector implements KtxScanConnector {
} }
private async primaryKeys(tableNames: string[], schemaName: string): Promise<Map<string, Set<string>>> { private async primaryKeys(tableNames: string[], schemaName: string): Promise<Map<string, Set<string>>> {
if (tableNames.length === 0) {
return new Map();
}
const result = await this.getDriver().query(
`
SELECT tc.TABLE_NAME, kcu.COLUMN_NAME
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME
AND tc.TABLE_SCHEMA = kcu.TABLE_SCHEMA
AND tc.TABLE_CATALOG = kcu.TABLE_CATALOG
WHERE tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
AND tc.TABLE_SCHEMA = ?
AND tc.TABLE_CATALOG = ?
ORDER BY tc.TABLE_NAME, kcu.ORDINAL_POSITION
`,
[schemaName, this.resolved.database],
);
const grouped = new Map<string, Set<string>>(); const grouped = new Map<string, Set<string>>();
for (const tableName of tableNames) { for (const tableName of tableNames) {
grouped.set(tableName, new Set()); grouped.set(tableName, new Set());
} }
for (const row of result.rows) { if (tableNames.length === 0) {
const tableName = String(row[0]); return grouped;
const columnName = String(row[1]); }
grouped.get(tableName)?.add(columnName); const tableNamePlaceholders = tableNames.map(() => '?').join(', ');
try {
const result = await this.getDriver().query(
`
SELECT tc.TABLE_NAME, kcu.COLUMN_NAME
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME
AND tc.TABLE_SCHEMA = kcu.TABLE_SCHEMA
AND tc.TABLE_CATALOG = kcu.TABLE_CATALOG
WHERE tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
AND tc.TABLE_SCHEMA = ?
AND tc.TABLE_CATALOG = ?
AND tc.TABLE_NAME IN (${tableNamePlaceholders})
ORDER BY tc.TABLE_NAME, kcu.ORDINAL_POSITION
`,
[schemaName, this.resolved.database, ...tableNames],
);
for (const row of result.rows) {
const tableName = String(row[0]);
const columnName = String(row[1]);
grouped.get(tableName)?.add(columnName);
}
} catch {
// INFORMATION_SCHEMA.KEY_COLUMN_USAGE often isn't granted to read-only roles;
// continue with an empty PK map and let FK inference and profiling pick up the slack.
} }
return grouped; return grouped;
} }
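
Reviewer sketch of the fallback in `primaryKeys` above: a minimal version that degrades to an empty primary-key map when the constraint query is denied. `QueryFn`, `primaryKeysOrEmpty`, the `warn` callback, and the `PK_VIEW` placeholder are assumptions made for illustration. The commit message mentions a one-line warning per schema, so the callback shows one way that might be wired, not how the connector does it.

```typescript
// Hypothetical query signature: resolves to rows of [tableName, columnName].
type QueryFn = (sql: string, params: string[]) => Promise<string[][]>;

async function primaryKeysOrEmpty(
  query: QueryFn,
  schemaName: string,
  tableNames: string[],
  warn: (message: string) => void = console.warn,
): Promise<Map<string, Set<string>>> {
  // Seed every table with an empty set so callers never see a missing key.
  const grouped = new Map<string, Set<string>>();
  for (const tableName of tableNames) {
    grouped.set(tableName, new Set());
  }
  if (tableNames.length === 0) {
    return grouped;
  }

  // One bind placeholder per table name; values travel as binds, not as SQL text.
  const placeholders = tableNames.map(() => '?').join(', ');
  try {
    const rows = await query(
      // PK_VIEW is a placeholder for the real constraint join.
      `SELECT TABLE_NAME, COLUMN_NAME FROM PK_VIEW WHERE TABLE_SCHEMA = ? AND TABLE_NAME IN (${placeholders})`,
      [schemaName, ...tableNames],
    );
    for (const row of rows) {
      grouped.get(String(row[0]))?.add(String(row[1]));
    }
  } catch (error) {
    // A denied constraint view should not abort the whole introspection.
    const reason = error instanceof Error ? error.message : String(error);
    warn(`primary-key discovery skipped for schema ${schemaName}: ${reason}`);
  }
  return grouped;
}
```

Because the map is seeded before the query runs, a denied query still returns an entry for each table, and downstream code can treat every column as `primaryKey=false` without special-casing the failure.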

@ -0,0 +1,31 @@
import { KtxSnowflakeScanConnector, type KtxSnowflakeScanConnectorOptions } from './connector.js';
export type KtxSnowflakeHistoricSqlQueryClientOptions = KtxSnowflakeScanConnectorOptions;
export class KtxSnowflakeHistoricSqlQueryClient {
private readonly connectionId: string;
private readonly connector: KtxSnowflakeScanConnector;
constructor(options: KtxSnowflakeHistoricSqlQueryClientOptions) {
this.connectionId = options.connectionId;
this.connector = new KtxSnowflakeScanConnector(options);
}
async executeQuery(
sql: string,
): Promise<{ headers: string[]; rows: unknown[][]; totalRows: number }> {
const result = await this.connector.executeReadOnly(
{ connectionId: this.connectionId, sql },
{} as never,
);
return {
headers: result.headers,
rows: result.rows,
totalRows: result.totalRows,
};
}
async cleanup(): Promise<void> {
await this.connector.cleanup();
}
}

@ -1,4 +1,7 @@
import type { LiveDatabaseIntrospectionPort } from '../../context/ingest/adapters/live-database/types.js'; import type {
LiveDatabaseIntrospectionOptions,
LiveDatabaseIntrospectionPort,
} from '../../context/ingest/adapters/live-database/types.js';
import type { KtxProjectConnectionConfig } from '../../context/project/config.js'; import type { KtxProjectConnectionConfig } from '../../context/project/config.js';
import { import {
KtxSnowflakeScanConnector, KtxSnowflakeScanConnector,
@ -9,6 +12,7 @@ import {
interface CreateSnowflakeLiveDatabaseIntrospectionOptions { interface CreateSnowflakeLiveDatabaseIntrospectionOptions {
connections: Record<string, KtxProjectConnectionConfig>; connections: Record<string, KtxProjectConnectionConfig>;
projectDir?: string;
driverFactory?: KtxSnowflakeDriverFactory; driverFactory?: KtxSnowflakeDriverFactory;
sdkOptionsProvider?: KtxSnowflakeSdkOptionsProvider; sdkOptionsProvider?: KtxSnowflakeSdkOptionsProvider;
now?: () => Date; now?: () => Date;
@ -18,18 +22,23 @@ export function createSnowflakeLiveDatabaseIntrospection(
options: CreateSnowflakeLiveDatabaseIntrospectionOptions, options: CreateSnowflakeLiveDatabaseIntrospectionOptions,
): LiveDatabaseIntrospectionPort { ): LiveDatabaseIntrospectionPort {
return { return {
async extractSchema(connectionId: string) { async extractSchema(connectionId: string, introspectionOptions?: LiveDatabaseIntrospectionOptions) {
const connection = options.connections[connectionId] as KtxSnowflakeConnectionConfig | undefined; const connection = options.connections[connectionId] as KtxSnowflakeConnectionConfig | undefined;
const connector = new KtxSnowflakeScanConnector({ const connector = new KtxSnowflakeScanConnector({
connectionId, connectionId,
connection, connection,
...(options.projectDir ? { projectDir: options.projectDir } : {}),
driverFactory: options.driverFactory, driverFactory: options.driverFactory,
sdkOptionsProvider: options.sdkOptionsProvider, sdkOptionsProvider: options.sdkOptionsProvider,
now: options.now, now: options.now,
}); });
try { try {
return await connector.introspect( return await connector.introspect(
{ connectionId, driver: 'snowflake' }, {
connectionId,
driver: 'snowflake',
...(introspectionOptions?.tableScope ? { tableScope: introspectionOptions.tableScope } : {}),
},
{ runId: `snowflake-${connectionId}` }, { runId: `snowflake-${connectionId}` },
); );
} finally { } finally {
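
Reviewer sketch of the conditional-spread pattern used above for `projectDir` and `tableScope`: the key is added only when a value exists, instead of being set to `undefined`. `SketchScanInput` and `buildScanInput` are hypothetical, and `ReadonlySet<string>` stands in for the real scope type.

```typescript
// Hypothetical stand-in for the scan input; the real type lives in the project.
interface SketchScanInput {
  connectionId: string;
  driver: string;
  tableScope?: ReadonlySet<string>;
}

function buildScanInput(connectionId: string, tableScope?: ReadonlySet<string>): SketchScanInput {
  return {
    connectionId,
    driver: 'snowflake',
    // Spread an empty object when there is no scope, so the key is absent
    // rather than present with the value `undefined`.
    ...(tableScope ? { tableScope } : {}),
  };
}
```

Leaving the key out keeps `'tableScope' in input` false for unscoped runs and stays valid if the project compiles with `exactOptionalPropertyTypes`, where assigning `undefined` to an optional property is an error.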

@ -0,0 +1,57 @@
import { mkdtempSync, rmSync, statSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join, resolve } from 'node:path';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
const { configure } = vi.hoisted(() => ({ configure: vi.fn() }));
vi.mock('snowflake-sdk', () => ({
default: { configure },
}));
import {
configureSnowflakeSdkLogger,
resetSnowflakeSdkLoggerConfigurationForTests,
} from './sdk-logger.js';
describe('configureSnowflakeSdkLogger', () => {
let projectDir: string;
beforeEach(() => {
configure.mockReset();
resetSnowflakeSdkLoggerConfigurationForTests();
projectDir = mkdtempSync(join(tmpdir(), 'ktx-snowflake-logger-'));
});
afterEach(() => {
rmSync(projectDir, { recursive: true, force: true });
});
it('routes logs to <projectDir>/.ktx/logs/snowflake.log with console output disabled', () => {
const expected = resolve(projectDir, '.ktx', 'logs', 'snowflake.log');
const returned = configureSnowflakeSdkLogger(projectDir);
expect(returned).toBe(expected);
expect(configure).toHaveBeenCalledTimes(1);
expect(configure).toHaveBeenCalledWith({
logFilePath: expected,
additionalLogToConsole: false,
});
expect(statSync(resolve(projectDir, '.ktx', 'logs')).isDirectory()).toBe(true);
});
it('is idempotent for the same projectDir', () => {
configureSnowflakeSdkLogger(projectDir);
configureSnowflakeSdkLogger(projectDir);
expect(configure).toHaveBeenCalledTimes(1);
});
it('reconfigures when projectDir changes', () => {
const other = mkdtempSync(join(tmpdir(), 'ktx-snowflake-logger-other-'));
try {
configureSnowflakeSdkLogger(projectDir);
configureSnowflakeSdkLogger(other);
expect(configure).toHaveBeenCalledTimes(2);
} finally {
rmSync(other, { recursive: true, force: true });
}
});
});

@ -0,0 +1,32 @@
import { mkdirSync } from 'node:fs';
import { resolve } from 'node:path';
import snowflake from 'snowflake-sdk';
let configuredLogFilePath: string | null = null;
/**
* Redirects the snowflake-sdk logger to a project-scoped file so its JSON output
* does not bleed into the CLI's TTY (which would pollute the setup wizard and
* break the in-place progress repainter in `context-build-view.ts`).
*
* Idempotent per process: re-calling with the same projectDir is a no-op.
*/
export function configureSnowflakeSdkLogger(projectDir: string): string {
const logDir = resolve(projectDir, '.ktx', 'logs');
const logFilePath = resolve(logDir, 'snowflake.log');
if (configuredLogFilePath === logFilePath) {
return logFilePath;
}
mkdirSync(logDir, { recursive: true });
snowflake.configure({
logFilePath,
additionalLogToConsole: false,
});
configuredLogFilePath = logFilePath;
return logFilePath;
}
/** @internal */
export function resetSnowflakeSdkLoggerConfigurationForTests(): void {
configuredLogFilePath = null;
}
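
Reviewer sketch of an alternative shape for the idempotence guard above: the "last configured path" lives in a closure rather than a module-level variable, so tests can build a fresh configurer instead of calling a reset helper. `createLoggerConfigurer` and the injected `Configure` callback are hypothetical, and directory creation is omitted.

```typescript
import { resolve } from 'node:path';

// Hypothetical callback standing in for the SDK's configure call.
type Configure = (options: { logFilePath: string; additionalLogToConsole: boolean }) => void;

function createLoggerConfigurer(configure: Configure): (projectDir: string) => string {
  let configuredPath: string | null = null;
  return (projectDir) => {
    const logFilePath = resolve(projectDir, '.ktx', 'logs', 'snowflake.log');
    // Reconfigure only when the target file changes; repeat calls are no-ops.
    if (configuredPath !== logFilePath) {
      configure({ logFilePath, additionalLogToConsole: false });
      configuredPath = logFilePath;
    }
    return logFilePath;
  };
}
```

The trade-off is that each configurer tracks its own state, so this only deduplicates calls when the same instance is shared across the process.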

@ -6,6 +6,7 @@ import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest'; import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { createSqliteLiveDatabaseIntrospection } from '../../connectors/sqlite/live-database-introspection.js'; import { createSqliteLiveDatabaseIntrospection } from '../../connectors/sqlite/live-database-introspection.js';
import { isKtxSqliteConnectionConfig, KtxSqliteScanConnector, sqliteDatabasePathFromConfig } from '../../connectors/sqlite/connector.js'; import { isKtxSqliteConnectionConfig, KtxSqliteScanConnector, sqliteDatabasePathFromConfig } from '../../connectors/sqlite/connector.js';
import { tableRefSet } from '../../context/scan/table-ref.js';
describe('KtxSqliteScanConnector', () => { describe('KtxSqliteScanConnector', () => {
let tempDir: string; let tempDir: string;
@ -196,6 +197,19 @@ describe('KtxSqliteScanConnector', () => {
).resolves.toBeNull(); ).resolves.toBeNull();
}); });
it('limits introspection to tables in tableScope', async () => {
const connector = new KtxSqliteScanConnector({
connectionId: 'warehouse',
connection: { driver: 'sqlite', path: dbPath },
});
const scope = tableRefSet([{ catalog: null, db: null, name: 'orders' }]);
const snapshot = await connector.introspect(
{ connectionId: 'warehouse', driver: 'sqlite', tableScope: scope },
{ runId: 'scope-test' },
);
expect(snapshot.tables.map((table) => table.name)).toEqual(['orders']);
});
it('adapts native SQLite snapshots to live-database introspection for local ingest', async () => { it('adapts native SQLite snapshots to live-database introspection for local ingest', async () => {
const introspection = createSqliteLiveDatabaseIntrospection({ const introspection = createSqliteLiveDatabaseIntrospection({
projectDir: tempDir, projectDir: tempDir,

@ -6,6 +6,7 @@ import { fileURLToPath } from 'node:url';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js'; import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import { normalizeQueryRows } from '../../context/connections/query-executor.js'; import { normalizeQueryRows } from '../../context/connections/query-executor.js';
import { createKtxConnectorCapabilities, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaForeignKey, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableRef, type KtxTableSampleInput, type KtxTableSampleResult } from '../../context/scan/types.js'; import { createKtxConnectorCapabilities, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaForeignKey, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableRef, type KtxTableSampleInput, type KtxTableSampleResult } from '../../context/scan/types.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
import { KtxSqliteDialect } from './dialect.js'; import { KtxSqliteDialect } from './dialect.js';
export interface KtxSqliteConnectionConfig { export interface KtxSqliteConnectionConfig {
@ -181,11 +182,16 @@ export class KtxSqliteScanConnector implements KtxScanConnector {
async introspect(input: KtxScanInput, _ctx: KtxScanContext): Promise<KtxSchemaSnapshot> { async introspect(input: KtxScanInput, _ctx: KtxScanContext): Promise<KtxSchemaSnapshot> {
this.assertConnection(input.connectionId); this.assertConnection(input.connectionId);
const database = this.database(); const database = this.database();
const rawTables = database const scopedNames = input.tableScope ? scopedTableNames(input.tableScope, { catalog: null, db: null }) : null;
.prepare( const scopeClause = scopedNames ? `AND name IN (${scopedNames.map(() => '?').join(', ')})` : '';
`SELECT name, type FROM sqlite_master WHERE type IN ('table', 'view') AND name NOT LIKE 'sqlite_%' ORDER BY name`, const rawTables =
) scopedNames && scopedNames.length === 0
.all() as SqliteMasterRow[]; ? []
: (database
.prepare(
`SELECT name, type FROM sqlite_master WHERE type IN ('table', 'view') AND name NOT LIKE 'sqlite_%' ${scopeClause} ORDER BY name`,
)
.all(...(scopedNames ?? [])) as SqliteMasterRow[]);
const tables = rawTables.map((table) => this.readTable(database, table)); const tables = rawTables.map((table) => this.readTable(database, table));
const fileStats = existsSync(this.dbPath) ? statSync(this.dbPath) : null; const fileStats = existsSync(this.dbPath) ? statSync(this.dbPath) : null;
return { return {

@ -1,4 +1,7 @@
import type { LiveDatabaseIntrospectionPort } from '../../context/ingest/adapters/live-database/types.js'; import type {
LiveDatabaseIntrospectionOptions,
LiveDatabaseIntrospectionPort,
} from '../../context/ingest/adapters/live-database/types.js';
import type { KtxProjectConnectionConfig } from '../../context/project/config.js'; import type { KtxProjectConnectionConfig } from '../../context/project/config.js';
import { KtxSqliteScanConnector, type KtxSqliteConnectionConfig } from './connector.js'; import { KtxSqliteScanConnector, type KtxSqliteConnectionConfig } from './connector.js';
@ -12,7 +15,7 @@ export function createSqliteLiveDatabaseIntrospection(
options: CreateSqliteLiveDatabaseIntrospectionOptions, options: CreateSqliteLiveDatabaseIntrospectionOptions,
): LiveDatabaseIntrospectionPort { ): LiveDatabaseIntrospectionPort {
return { return {
async extractSchema(connectionId: string) { async extractSchema(connectionId: string, introspectionOptions?: LiveDatabaseIntrospectionOptions) {
const connection = options.connections[connectionId] as KtxSqliteConnectionConfig | undefined; const connection = options.connections[connectionId] as KtxSqliteConnectionConfig | undefined;
const connector = new KtxSqliteScanConnector({ const connector = new KtxSqliteScanConnector({
connectionId, connectionId,
@ -21,7 +24,14 @@ export function createSqliteLiveDatabaseIntrospection(
now: options.now, now: options.now,
}); });
try { try {
return await connector.introspect({ connectionId, driver: 'sqlite' }, { runId: `sqlite-${connectionId}` }); return await connector.introspect(
{
connectionId,
driver: 'sqlite',
...(introspectionOptions?.tableScope ? { tableScope: introspectionOptions.tableScope } : {}),
},
{ runId: `sqlite-${connectionId}` },
);
} finally { } finally {
await connector.cleanup(); await connector.cleanup();
} }
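
Reviewer sketch of the `tableScope` filter that the connectors above build: it separates "no scope" (`null`), "scope with no tables in this schema" (empty array), and a populated scope. `ScopeSql` and `scopeFilter` are hypothetical names, and the `?` placeholder style is an assumption that fits drivers with positional binds.

```typescript
interface ScopeSql {
  skip: boolean; // true when the scope excludes every table, so no query is needed
  clause: string; // SQL fragment to append to the WHERE clause
  params: string[]; // bind values matching the placeholders in `clause`
}

function scopeFilter(column: string, scopedNames: readonly string[] | null): ScopeSql {
  // null means "no scope": introspect everything.
  if (scopedNames === null) {
    return { skip: false, clause: '', params: [] };
  }
  // An empty scope means nothing matches; skip the query instead of emitting `IN ()`.
  if (scopedNames.length === 0) {
    return { skip: true, clause: '', params: [] };
  }
  const placeholders = scopedNames.map(() => '?').join(', ');
  return { skip: false, clause: `AND ${column} IN (${placeholders})`, params: [...scopedNames] };
}
```

Handling the empty case up front matters because many engines reject an empty `IN ()` list, and silently dropping the clause would widen the scan to every table.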

@ -1,6 +1,7 @@
import { describe, expect, it, vi } from 'vitest'; import { describe, expect, it, vi } from 'vitest';
import { createSqlServerLiveDatabaseIntrospection } from '../../connectors/sqlserver/live-database-introspection.js'; import { createSqlServerLiveDatabaseIntrospection } from '../../connectors/sqlserver/live-database-introspection.js';
import { isKtxSqlServerConnectionConfig, KtxSqlServerScanConnector, sqlServerConnectionPoolConfigFromConfig, type KtxSqlServerPoolFactory, type KtxSqlServerQueryResult } from '../../connectors/sqlserver/connector.js'; import { isKtxSqlServerConnectionConfig, KtxSqlServerScanConnector, sqlServerConnectionPoolConfigFromConfig, type KtxSqlServerPoolFactory, type KtxSqlServerQueryResult } from '../../connectors/sqlserver/connector.js';
import { tableRefSet } from '../../context/scan/table-ref.js';
function recordset<T extends Record<string, unknown>>( function recordset<T extends Record<string, unknown>>(
rows: T[], rows: T[],
@ -290,6 +291,55 @@ describe('KtxSqlServerScanConnector', () => {
await connector.cleanup(); await connector.cleanup();
}); });
it('limits introspection to tables in tableScope', async () => {
const queries: string[] = [];
const inputs: Array<{ name: string; value: unknown }> = [];
const request = {
input: vi.fn((name: string, value: unknown) => {
inputs.push({ name, value });
return request;
}),
query: vi.fn(async (sql: string): Promise<KtxSqlServerQueryResult> => {
queries.push(sql);
if (sql.includes('INFORMATION_SCHEMA.TABLES')) {
return result([{ table_name: 'orders', table_type: 'BASE TABLE' }], ['table_name', 'table_type']);
}
if (sql.includes('INFORMATION_SCHEMA.COLUMNS')) {
return result(
[{ table_name: 'orders', column_name: 'id', data_type: 'int', is_nullable: 'NO' }],
['table_name', 'column_name', 'data_type', 'is_nullable'],
);
}
return result([], []);
}),
};
const poolFactory: KtxSqlServerPoolFactory = {
createPool: vi.fn(async () => ({
request: () => request,
close: vi.fn(async () => undefined),
})),
};
const connector = new KtxSqlServerScanConnector({
connectionId: 'warehouse',
connection: {
driver: 'sqlserver',
host: 'db.example.test',
database: 'analytics',
username: 'reader',
schema: 'dbo',
},
poolFactory,
});
const scope = tableRefSet([{ catalog: 'analytics', db: 'dbo', name: 'orders' }]);
const snapshot = await connector.introspect(
{ connectionId: 'warehouse', driver: 'sqlserver', tableScope: scope },
{ runId: 'scope-test' },
);
expect(snapshot.tables.map((table) => table.name)).toEqual(['orders']);
expect(queries.find((query) => query.includes('INFORMATION_SCHEMA.TABLES'))).toMatch(/TABLE_NAME IN \(@table_0\)/);
expect(inputs).toEqual(expect.arrayContaining([{ name: 'table_0', value: 'orders' }]));
});
it('adapts native SQL Server snapshots to live-database introspection for local ingest', async () => { it('adapts native SQL Server snapshots to live-database introspection for local ingest', async () => {
const introspection = createSqlServerLiveDatabaseIntrospection({ const introspection = createSqlServerLiveDatabaseIntrospection({
connections: { connections: {

@@ -1,5 +1,6 @@
 import { assertReadOnlySql } from '../../context/connections/read-only-sql.js';
 import { createKtxConnectorCapabilities, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaColumn, type KtxSchemaForeignKey, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableListEntry, type KtxTableRef, type KtxTableSampleInput, type KtxTableSampleResult } from '../../context/scan/types.js';
+import { scopedTableNames } from '../../context/scan/table-ref.js';
 import { readFileSync } from 'node:fs';
 import { homedir } from 'node:os';
 import { resolve } from 'node:path';
@@ -121,6 +122,20 @@ function sqlRecordset(
   return recordset;
 }
+function tableScopeSql(
+  scopedNames: readonly string[] | null,
+  columnExpression: string,
+): { clause: string; params: Record<string, unknown> } {
+  if (!scopedNames) return { clause: '', params: {} };
+  const params: Record<string, unknown> = {};
+  const placeholders = scopedNames.map((name, index) => {
+    const key = `table_${index}`;
+    params[key] = name;
+    return `@${key}`;
+  });
+  return { clause: `AND ${columnExpression} IN (${placeholders.join(', ')})`, params };
+}
 class DefaultSqlServerPoolFactory implements KtxSqlServerPoolFactory {
   async createPool(config: KtxSqlServerPoolConfig): Promise<KtxSqlServerPool> {
     const pool = await new sql.ConnectionPool(config as sql.config).connect();
@@ -314,7 +329,10 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
     this.assertConnection(input.connectionId);
     const tables: KtxSchemaTable[] = [];
     for (const schemaName of this.schemas) {
-      tables.push(...(await this.introspectSchema(schemaName)));
+      const scopedNames = input.tableScope
+        ? scopedTableNames(input.tableScope, { catalog: this.poolConfig.database, db: schemaName })
+        : null;
+      tables.push(...(await this.introspectSchema(schemaName, scopedNames)));
     }
     return {
       connectionId: this.connectionId,
@@ -461,16 +479,19 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
     }
   }
-  private async introspectSchema(schemaName: string): Promise<KtxSchemaTable[]> {
+  private async introspectSchema(schemaName: string, scopedNames: readonly string[] | null): Promise<KtxSchemaTable[]> {
+    if (scopedNames && scopedNames.length === 0) return [];
+    const tableScope = tableScopeSql(scopedNames, 'TABLE_NAME');
     const tables = await this.queryRaw<{ table_name: string; table_type: string }>(
       `
         SELECT TABLE_NAME AS table_name, TABLE_TYPE AS table_type
         FROM INFORMATION_SCHEMA.TABLES
         WHERE TABLE_SCHEMA = @schemaName
           AND TABLE_TYPE IN ('BASE TABLE', 'VIEW')
+          ${tableScope.clause}
         ORDER BY TABLE_NAME
       `,
-      { schemaName },
+      { schemaName, ...tableScope.params },
     );
     const columns = await this.queryRaw<{
       table_name: string;
@@ -482,15 +503,16 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
         SELECT TABLE_NAME AS table_name, COLUMN_NAME AS column_name, DATA_TYPE AS data_type, IS_NULLABLE AS is_nullable
         FROM INFORMATION_SCHEMA.COLUMNS
         WHERE TABLE_SCHEMA = @schemaName
+          ${tableScope.clause}
         ORDER BY TABLE_NAME, ORDINAL_POSITION
       `,
-      { schemaName },
+      { schemaName, ...tableScope.params },
     );
-    const tableComments = await this.tableComments(schemaName);
-    const columnComments = await this.columnComments(schemaName);
-    const primaryKeys = await this.primaryKeys(schemaName);
-    const foreignKeys = await this.foreignKeys(schemaName);
-    const rowCounts = await this.rowCounts(schemaName);
+    const tableComments = await this.tableComments(schemaName, scopedNames);
+    const columnComments = await this.columnComments(schemaName, scopedNames);
+    const primaryKeys = await this.primaryKeys(schemaName, scopedNames);
+    const foreignKeys = await this.foreignKeys(schemaName, scopedNames);
+    const rowCounts = await this.rowCounts(schemaName, scopedNames);
     const columnsByTable = groupByTable(columns);
     const foreignKeysByTable = groupByTable(foreignKeys);
@@ -508,7 +530,8 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
     }));
   }
-  private async tableComments(schemaName: string): Promise<Map<string, string>> {
+  private async tableComments(schemaName: string, scopedNames: readonly string[] | null): Promise<Map<string, string>> {
+    const tableScope = tableScopeSql(scopedNames, 'o.name');
     const rows = await this.queryRaw<{ table_name: string; table_comment: string }>(
       `
         SELECT o.name AS table_name, CAST(ep.value AS NVARCHAR(MAX)) AS table_comment
@@ -519,13 +542,15 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
           AND ep.name = 'MS_Description'
         WHERE s.name = @schemaName
           AND o.type IN ('U', 'V')
+          ${tableScope.clause}
       `,
-      { schemaName },
+      { schemaName, ...tableScope.params },
     );
     return new Map(rows.map((row) => [row.table_name, row.table_comment]));
   }
-  private async columnComments(schemaName: string): Promise<Map<string, string>> {
+  private async columnComments(schemaName: string, scopedNames: readonly string[] | null): Promise<Map<string, string>> {
+    const tableScope = tableScopeSql(scopedNames, 'o.name');
     const rows = await this.queryRaw<{ table_name: string; column_name: string; column_comment: string }>(
       `
         SELECT o.name AS table_name, c.name AS column_name, CAST(ep.value AS NVARCHAR(MAX)) AS column_comment
@@ -537,13 +562,18 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
           AND ep.name = 'MS_Description'
         WHERE s.name = @schemaName
           AND o.type IN ('U', 'V')
+          ${tableScope.clause}
       `,
-      { schemaName },
+      { schemaName, ...tableScope.params },
     );
     return new Map(rows.map((row) => [`${row.table_name}.${row.column_name}`, row.column_comment]));
   }
-  private async primaryKeys(schemaName: string): Promise<Map<string, Set<string>>> {
+  private async primaryKeys(
+    schemaName: string,
+    scopedNames: readonly string[] | null,
+  ): Promise<Map<string, Set<string>>> {
+    const tableScope = tableScopeSql(scopedNames, 'tc.TABLE_NAME');
     const rows = await this.queryRaw<{ table_name: string; column_name: string }>(
       `
         SELECT tc.TABLE_NAME AS table_name, kcu.COLUMN_NAME AS column_name
@@ -553,9 +583,10 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
           AND tc.TABLE_SCHEMA = kcu.TABLE_SCHEMA
         WHERE tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
           AND tc.TABLE_SCHEMA = @schemaName
+          ${tableScope.clause}
         ORDER BY tc.TABLE_NAME, kcu.ORDINAL_POSITION
       `,
-      { schemaName },
+      { schemaName, ...tableScope.params },
     );
     const grouped = new Map<string, Set<string>>();
     for (const row of rows) {
@@ -566,7 +597,10 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
     return grouped;
   }
-  private async foreignKeys(schemaName: string): Promise<
+  private async foreignKeys(
+    schemaName: string,
+    scopedNames: readonly string[] | null,
+  ): Promise<
     Array<{
       table_name: string;
       column_name: string;
@@ -576,6 +610,7 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
       constraint_name: string;
     }>
   > {
+    const tableScope = tableScopeSql(scopedNames, 'fk.TABLE_NAME');
     return this.queryRaw(
       `
         SELECT
@@ -596,13 +631,15 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
           AND pk.CONSTRAINT_NAME = rc.UNIQUE_CONSTRAINT_NAME
           AND pk.ORDINAL_POSITION = fk.ORDINAL_POSITION
         WHERE fk.TABLE_SCHEMA = @schemaName
+          ${tableScope.clause}
         ORDER BY fk.TABLE_NAME, fk.COLUMN_NAME
       `,
-      { schemaName },
+      { schemaName, ...tableScope.params },
     );
   }
-  private async rowCounts(schemaName: string): Promise<Map<string, number>> {
+  private async rowCounts(schemaName: string, scopedNames: readonly string[] | null): Promise<Map<string, number>> {
+    const tableScope = tableScopeSql(scopedNames, 't.name');
     const rows = await this.queryRaw<{ table_name: string; row_count: unknown }>(
       `
         SELECT t.name AS table_name, SUM(p.rows) AS row_count
@@ -611,9 +648,10 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
         INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
         WHERE s.name = @schemaName
           AND p.index_id IN (0, 1)
+          ${tableScope.clause}
         GROUP BY t.name
       `,
-      { schemaName },
+      { schemaName, ...tableScope.params },
     );
     return new Map(rows.map((row) => [row.table_name, firstNumber(row.row_count) ?? 0]));
   }
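To make the scoping mechanism in this file easier to review, here is a standalone sketch of `tableScopeSql` with sample inputs. The helper body is copied from the hunk above; the sample table names and the `unscoped` / `scoped` / `emptyScope` variables are illustrative only.

```typescript
// Copy of the helper added in this diff, reproduced so the sketch runs on its own.
function tableScopeSql(
  scopedNames: readonly string[] | null,
  columnExpression: string,
): { clause: string; params: Record<string, unknown> } {
  // null means "no scope requested": contribute nothing to the WHERE clause.
  if (!scopedNames) return { clause: '', params: {} };
  const params: Record<string, unknown> = {};
  const placeholders = scopedNames.map((name, index) => {
    const key = `table_${index}`;
    params[key] = name; // the value travels as a bound parameter, never inlined into SQL
    return `@${key}`;
  });
  return { clause: `AND ${columnExpression} IN (${placeholders.join(', ')})`, params };
}

// Illustrative inputs (not taken from the PR).
const unscoped = tableScopeSql(null, 'TABLE_NAME');
const scoped = tableScopeSql(['orders', 'customers'], 'o.name');
const emptyScope = tableScopeSql([], 'TABLE_NAME');

console.log(unscoped.clause);   // empty string
console.log(scoped.clause);     // AND o.name IN (@table_0, @table_1)
console.log(scoped.params);     // { table_0: 'orders', table_1: 'customers' }
console.log(emptyScope.clause); // AND TABLE_NAME IN ()
```

An empty scope produces `IN ()`, which SQL Server rejects. That appears to be why `introspectSchema` returns early when `scopedNames.length === 0`, before any query is built.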

@@ -1,4 +1,7 @@
-import type { LiveDatabaseIntrospectionPort } from '../../context/ingest/adapters/live-database/types.js';
+import type {
+  LiveDatabaseIntrospectionOptions,
+  LiveDatabaseIntrospectionPort,
+} from '../../context/ingest/adapters/live-database/types.js';
 import type { KtxProjectConnectionConfig } from '../../context/project/config.js';
 import {
   KtxSqlServerScanConnector,
@@ -18,7 +21,7 @@ export function createSqlServerLiveDatabaseIntrospection(
   options: CreateSqlServerLiveDatabaseIntrospectionOptions,
 ): LiveDatabaseIntrospectionPort {
   return {
-    async extractSchema(connectionId: string) {
+    async extractSchema(connectionId: string, introspectionOptions?: LiveDatabaseIntrospectionOptions) {
       const connection = options.connections[connectionId] as KtxSqlServerConnectionConfig | undefined;
       const connector = new KtxSqlServerScanConnector({
         connectionId,
@@ -29,7 +32,11 @@ export function createSqlServerLiveDatabaseIntrospection(
       });
       try {
         return await connector.introspect(
-          { connectionId, driver: 'sqlserver' },
+          {
+            connectionId,
+            driver: 'sqlserver',
+            ...(introspectionOptions?.tableScope ? { tableScope: introspectionOptions.tableScope } : {}),
+          },
           { runId: `sqlserver-${connectionId}` },
         );
       } finally {

@@ -319,7 +319,8 @@ function renderPhaseRow(phase: PhaseState, frame: number, styled: boolean): stri
   } else if (phase.status === 'skipped') {
     trailing = styled ? dim('skipped') : 'skipped';
   } else if (phase.status === 'failed') {
-    trailing = styled ? red('failed') : 'failed';
+    const label = styled ? red('failed') : 'failed';
+    trailing = phase.summary ? `${label} ${phase.summary}` : label;
   }
   const bar = `${segments.join(' ')} ${trailing}`.trimEnd();
   return `  ${icon} ${name} ${bar}`;
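The effect of the `failed` branch change can be sketched in isolation. This is a minimal sketch, not the renderer itself: `PhaseState` is reduced to the two fields the branch reads, `red` is a stand-in ANSI helper, and `failedTrailing` is a hypothetical name for the extracted logic.

```typescript
// Reduced, hypothetical shape: the real PhaseState has more fields.
interface PhaseState {
  status: 'running' | 'done' | 'skipped' | 'failed';
  summary?: string;
}

// Stand-in for the renderer's colour helper (assumed to wrap text in ANSI red).
const red = (text: string): string => `\u001b[31m${text}\u001b[0m`;

// Mirrors the new branch: append the phase summary to the label when there is one.
function failedTrailing(phase: PhaseState, styled: boolean): string {
  const label = styled ? red('failed') : 'failed';
  return phase.summary ? `${label} ${phase.summary}` : label;
}

const withSummary = failedTrailing({ status: 'failed', summary: 'Unknown function GROUP_CONCAT' }, false);
const withoutSummary = failedTrailing({ status: 'failed' }, false);

console.log(withSummary);    // failed Unknown function GROUP_CONCAT
console.log(withoutSummary); // failed
```

Only the label is coloured, so the summary text stays unstyled even when `styled` is true.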

@@ -0,0 +1,48 @@
+import type { HistoricSqlDialect } from './types.js';
+const KNOWN_DIALECTS = ['postgres', 'bigquery', 'snowflake'] as const;
+function isKnownDialect(value: string): value is HistoricSqlDialect {
+  return (KNOWN_DIALECTS as readonly string[]).includes(value);
+}
+function recordOrNull(value: unknown): Record<string, unknown> | null {
+  return value && typeof value === 'object' && !Array.isArray(value) ? (value as Record<string, unknown>) : null;
+}
+function historicSqlRecord(connection: unknown): Record<string, unknown> | null {
+  const conn = recordOrNull(connection);
+  return conn ? recordOrNull(conn.historicSql) : null;
+}
+function queryHistoryRecord(connection: unknown): Record<string, unknown> | null {
+  const conn = recordOrNull(connection);
+  const context = conn ? recordOrNull(conn.context) : null;
+  return context ? recordOrNull(context.queryHistory) : null;
+}
+export function isQueryHistoryEnabled(connection: unknown): boolean {
+  const queryHistory = queryHistoryRecord(connection);
+  if (queryHistory) {
+    return queryHistory.enabled === true;
+  }
+  return historicSqlRecord(connection)?.enabled === true;
+}
+/**
+ * Resolves the query-history dialect for a connection. Returns null when
+ * query history is disabled, or when the connection's driver has no
+ * query-history reader.
+ */
+export function queryHistoryDialectForConnection(connection: unknown): HistoricSqlDialect | null {
+  if (!isQueryHistoryEnabled(connection)) {
+    return null;
+  }
+  const conn = recordOrNull(connection);
+  const driver = String(conn?.driver ?? '').toLowerCase();
+  if (driver === 'postgres' || driver === 'postgresql') return 'postgres';
+  if (driver === 'bigquery') return 'bigquery';
+  if (driver === 'snowflake') return 'snowflake';
+  const legacy = String(historicSqlRecord(connection)?.dialect ?? '').toLowerCase();
+  return isKnownDialect(legacy) ? legacy : null;
+}
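The resolution order in this new file is easier to follow with a few sample connections. The sketch below restates the logic in a self-contained form. `HistoricSqlDialect` is assumed to be the union of the three known dialects (its real definition lives in `./types.js` and is not shown here), and the sample connection objects are made up for illustration.

```typescript
// Assumption: the real HistoricSqlDialect type is equivalent to this union.
type HistoricSqlDialect = 'postgres' | 'bigquery' | 'snowflake';
const KNOWN_DIALECTS: readonly string[] = ['postgres', 'bigquery', 'snowflake'];

function recordOrNull(value: unknown): Record<string, unknown> | null {
  return value && typeof value === 'object' && !Array.isArray(value) ? (value as Record<string, unknown>) : null;
}

function isQueryHistoryEnabled(connection: unknown): boolean {
  const conn = recordOrNull(connection);
  const context = conn ? recordOrNull(conn.context) : null;
  const queryHistory = context ? recordOrNull(context.queryHistory) : null;
  // A context.queryHistory block, when present, decides on its own.
  if (queryHistory) return queryHistory.enabled === true;
  // Otherwise fall back to the legacy historicSql block.
  return recordOrNull(conn?.historicSql)?.enabled === true;
}

function queryHistoryDialectForConnection(connection: unknown): HistoricSqlDialect | null {
  if (!isQueryHistoryEnabled(connection)) return null;
  const conn = recordOrNull(connection);
  const driver = String(conn?.driver ?? '').toLowerCase();
  if (driver === 'postgres' || driver === 'postgresql') return 'postgres';
  if (driver === 'bigquery') return 'bigquery';
  if (driver === 'snowflake') return 'snowflake';
  // Unknown driver: honour a legacy historicSql.dialect if it names a known dialect.
  const legacy = String(recordOrNull(conn?.historicSql)?.dialect ?? '').toLowerCase();
  return KNOWN_DIALECTS.includes(legacy) ? (legacy as HistoricSqlDialect) : null;
}

// Illustrative connections (not from the PR).
const fromDriver = queryHistoryDialectForConnection({
  driver: 'PostgreSQL',
  context: { queryHistory: { enabled: true } },
});
const newBlockWins = queryHistoryDialectForConnection({
  driver: 'snowflake',
  context: { queryHistory: { enabled: false } },
  historicSql: { enabled: true },
});
const legacyDialect = queryHistoryDialectForConnection({
  driver: 'mysql',
  historicSql: { enabled: true, dialect: 'BigQuery' },
});
const noReader = queryHistoryDialectForConnection({ driver: 'mysql', historicSql: { enabled: true } });

console.log(fromDriver, newBlockWins, legacyDialect, noReader);
```

Two behaviours are worth noting. A present `context.queryHistory` block fully overrides `historicSql.enabled`, even when it disables history. The legacy `dialect` field is only consulted when the driver itself is not one of the three supported ones.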

@@ -1,6 +1,7 @@
 import { once } from 'node:events';
 import { createServer } from 'node:http';
 import { describe, expect, it, vi } from 'vitest';
+import { tableRefSet } from '../../../scan/table-ref.js';
 import { createDaemonLiveDatabaseIntrospection } from './daemon-introspection.js';
 const daemonResponse = {
@@ -161,7 +162,11 @@ describe('createDaemonLiveDatabaseIntrospection', () => {
       baseUrl: `http://127.0.0.1:${address.port}`,
     });
-    await expect(introspection.extractSchema('warehouse')).resolves.toMatchObject({
+    await expect(
+      introspection.extractSchema('warehouse', {
+        tableScope: tableRefSet([{ catalog: 'warehouse', db: 'public', name: 'orders' }]),
+      }),
+    ).resolves.toMatchObject({
       connectionId: 'warehouse',
       tables: [{ name: 'customers' }, { name: 'orders' }],
     });
@@ -176,6 +181,7 @@ describe('createDaemonLiveDatabaseIntrospection', () => {
           schemas: ['public'],
           statement_timeout_ms: 30_000,
           connection_timeout_seconds: 5,
+          table_scope: [{ catalog: 'warehouse', db: 'public', name: 'orders' }],
         },
       },
     ]);
@@ -217,7 +223,7 @@ describe('createDaemonLiveDatabaseIntrospection', () => {
     expect(runJson).not.toHaveBeenCalled();
   });
-  it('filters out tables not on the enabled_tables allowlist', async () => {
+  it('does not use connection enabled_tables as a response filter', async () => {
     const runJson = vi.fn(async () => daemonResponse);
     const introspection = createDaemonLiveDatabaseIntrospection({
       connections: {
@@ -232,7 +238,8 @@ describe('createDaemonLiveDatabaseIntrospection', () => {
     });
     const snapshot = await introspection.extractSchema('warehouse');
-    expect(snapshot.tables.map((table) => `${table.db}.${table.name}`)).toEqual(['public.orders']);
+    expect(snapshot.tables.map((table) => `${table.db}.${table.name}`)).toEqual(['public.customers', 'public.orders']);
+    expect(runJson).toHaveBeenCalledWith('database-introspect', expect.not.objectContaining({ table_scope: expect.anything() }));
   });
   it('passes through every table when enabled_tables is omitted or empty', async () => {

@@ -3,10 +3,10 @@ import { request as httpRequest } from 'node:http';
 import { request as httpsRequest } from 'node:https';
 import { URL } from 'node:url';
 import type { KtxProjectConnectionConfig } from '../../../project/config.js';
-import { filterSnapshotTables, resolveEnabledTables } from '../../../scan/enabled-tables.js';
+import { tableRefFromKey } from '../../../scan/table-ref.js';
 import type { KtxSchemaColumn, KtxSchemaForeignKey, KtxSchemaSnapshot, KtxSchemaTable } from '../../../scan/types.js';
 import { inferKtxDimensionType, normalizeKtxNativeType } from '../../../scan/type-normalization.js';
-import type { LiveDatabaseIntrospectionPort } from './types.js';
+import type { LiveDatabaseIntrospectionOptions, LiveDatabaseIntrospectionPort } from './types.js';
 type KtxDaemonDatabaseIntrospectionCommand = 'database-introspect';
@@ -220,6 +220,18 @@ function mapDaemonSnapshot(
   };
 }
+function serializeTableScope(options: LiveDatabaseIntrospectionOptions | undefined): Array<{
+  catalog: string | null;
+  db: string | null;
+  name: string;
+}> | undefined {
+  if (!options?.tableScope) return undefined;
+  return [...options.tableScope].map((key) => {
+    const ref = tableRefFromKey(key);
+    return { catalog: ref.catalog, db: ref.db, name: ref.name };
+  });
+}
 export function createDaemonLiveDatabaseIntrospection(
   options: DaemonLiveDatabaseIntrospectionOptions,
 ): LiveDatabaseIntrospectionPort {
@@ -231,8 +243,9 @@ export function createDaemonLiveDatabaseIntrospection(
   const now = options.now ?? (() => new Date());
   return {
-    async extractSchema(connectionId: string): Promise<KtxSchemaSnapshot> {
+    async extractSchema(connectionId: string, introspectionOptions?: LiveDatabaseIntrospectionOptions): Promise<KtxSchemaSnapshot> {
       const connection = requirePostgresConnection(options.connections, connectionId);
+      const tableScope = serializeTableScope(introspectionOptions);
       const payload = {
         connection_id: connectionId,
         driver: normalizeDriver(connection.driver),
@@ -240,17 +253,16 @@ export function createDaemonLiveDatabaseIntrospection(
         schemas,
         statement_timeout_ms: options.statementTimeoutMs ?? 30_000,
         connection_timeout_seconds: options.connectionTimeoutSeconds ?? 5,
+        ...(tableScope !== undefined ? { table_scope: tableScope } : {}),
       };
       const raw = requestJson
         ? await requestJson('/database/introspect', payload)
         : await runJson('database-introspect', payload);
-      const snapshot = mapDaemonSnapshot(raw, {
+      return mapDaemonSnapshot(raw, {
         connectionId,
         extractedAt: now().toISOString(),
         schemas,
       });
-      const enabledTables = resolveEnabledTables(connection);
-      return enabledTables ? filterSnapshotTables(snapshot, enabledTables) : snapshot;
     },
   };
 }
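The payload change relies on a conditional spread, so `table_scope` is absent from the request rather than sent as `undefined` or `null`. A minimal sketch of that pattern follows. `buildIntrospectPayload` and the `IntrospectPayload` / `TableRefPayload` types are hypothetical names for illustration, and the payload is trimmed to three fields.

```typescript
// Hypothetical, trimmed-down shapes for illustration only.
interface TableRefPayload {
  catalog: string | null;
  db: string | null;
  name: string;
}

interface IntrospectPayload {
  connection_id: string;
  schemas: string[];
  table_scope?: TableRefPayload[];
}

function buildIntrospectPayload(
  connectionId: string,
  schemas: string[],
  tableScope: TableRefPayload[] | undefined,
): IntrospectPayload {
  return {
    connection_id: connectionId,
    schemas,
    // Same pattern as the diff: only add the key when a scope was actually provided.
    ...(tableScope !== undefined ? { table_scope: tableScope } : {}),
  };
}

const unscopedPayload = buildIntrospectPayload('warehouse', ['public'], undefined);
const scopedPayload = buildIntrospectPayload('warehouse', ['public'], [
  { catalog: 'warehouse', db: 'public', name: 'orders' },
]);
const emptyScopePayload = buildIntrospectPayload('warehouse', ['public'], []);

console.log('table_scope' in unscopedPayload);   // false
console.log(scopedPayload.table_scope);          // [ { catalog: 'warehouse', db: 'public', name: 'orders' } ]
console.log('table_scope' in emptyScopePayload); // true
```

Because the check is `!== undefined`, an empty scope is still sent as `table_scope: []`. That lets the receiving side tell "no scope requested" apart from "scope requested, nothing selected", though how the daemon treats an empty list is not shown in this diff.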

@@ -1,7 +1,8 @@
-import { mkdtemp } from 'node:fs/promises';
+import { mkdtemp, readdir, rm } from 'node:fs/promises';
 import { tmpdir } from 'node:os';
 import { join } from 'node:path';
 import { describe, expect, it, vi } from 'vitest';
+import { tableRefSet, type KtxTableRefKey } from '../../../scan/table-ref.js';
 import { LiveDatabaseSourceAdapter } from './live-database.adapter.js';
 describe('LiveDatabaseSourceAdapter', () => {
@@ -43,7 +44,7 @@ describe('LiveDatabaseSourceAdapter', () => {
     await adapter.fetch(undefined, dir, { connectionId: 'conn-1', sourceKey: 'live-database' });
-    expect(extractSchema).toHaveBeenCalledWith('conn-1');
+    expect(extractSchema).toHaveBeenCalledWith('conn-1', { tableScope: undefined });
     await expect(adapter.detect(dir)).resolves.toBe(true);
     const chunked = await adapter.chunk(dir);
     expect(chunked.workUnits.map((wu) => wu.unitKey)).toEqual(['live-database-public-orders']);
@@ -56,4 +57,55 @@ describe('LiveDatabaseSourceAdapter', () => {
     expect(adapter.source).toBe('live-database');
     expect(adapter.skillNames).toEqual(['live_database_ingest']);
   });
+  it('threads tableScope from fetch context into the introspection port without post-filtering', async () => {
+    const extractSchema = vi.fn(
+      async (_connectionId: string, _options?: { tableScope?: ReadonlySet<KtxTableRefKey> }) => ({
+        connectionId: 'warehouse',
+        driver: 'snowflake' as const,
+        extractedAt: '2026-05-22T00:00:00.000Z',
+        scope: {},
+        metadata: {},
+        tables: [
+          {
+            catalog: 'A',
+            db: 'MARTS',
+            name: 'IN_SCOPE',
+            kind: 'table' as const,
+            comment: null,
+            estimatedRows: 0,
+            columns: [],
+            foreignKeys: [],
+          },
+          {
+            catalog: 'A',
+            db: 'MARTS',
+            name: 'OUT_OF_SCOPE',
+            kind: 'table' as const,
+            comment: null,
+            estimatedRows: 0,
+            columns: [],
+            foreignKeys: [],
+          },
+        ],
+      }),
+    );
+    const scope = tableRefSet([{ catalog: 'A', db: 'MARTS', name: 'IN_SCOPE' }]);
+    const adapter = new LiveDatabaseSourceAdapter({
+      introspection: { extractSchema },
+    });
+    const stagedDir = await mkdtemp(join(tmpdir(), 'ktx-livedb-scope-'));
+    try {
+      await adapter.fetch(undefined, stagedDir, {
+        connectionId: 'warehouse',
+        sourceKey: 'live-database',
+        tableScope: scope,
+      });
+      expect(extractSchema).toHaveBeenCalledWith('warehouse', { tableScope: scope });
+      const tables = await readdir(join(stagedDir, 'tables'));
+      expect(tables).toHaveLength(2);
+    } finally {
+      await rm(stagedDir, { recursive: true, force: true });
+    }
+  });
 });

@@ -14,7 +14,8 @@ export class LiveDatabaseSourceAdapter implements SourceAdapter {
   }
   async fetch(_pullConfig: unknown, stagedDir: string, ctx: FetchContext): Promise<void> {
-    const snapshot = await this.deps.introspection.extractSchema(ctx.connectionId);
+    const tableScope = ctx.tableScope;
+    const snapshot = await this.deps.introspection.extractSchema(ctx.connectionId, { tableScope });
     await writeLiveDatabaseSnapshot(stagedDir, {
       ...snapshot,
       connectionId: ctx.connectionId,

@@ -1,7 +1,12 @@
 import type { KtxSchemaSnapshot } from '../../../scan/types.js';
+import type { KtxTableRefKey } from '../../../scan/table-ref.js';
+export interface LiveDatabaseIntrospectionOptions {
+  tableScope?: ReadonlySet<KtxTableRefKey>;
+}
 export interface LiveDatabaseIntrospectionPort {
-  extractSchema(connectionId: string): Promise<KtxSchemaSnapshot>;
+  extractSchema(connectionId: string, options?: LiveDatabaseIntrospectionOptions): Promise<KtxSchemaSnapshot>;
 }
 export interface LiveDatabaseSourceAdapterDeps {

@@ -9,6 +9,7 @@ import { sanitizeMemoryFlowError } from './memory-flow/live-buffer.js';
 import type { MemoryFlowEventSink, MemoryFlowPlannedWorkUnit } from './memory-flow/types.js';
 import { buildSyncId } from './raw-sources-paths.js';
 import { SqliteLocalIngestStore } from './sqlite-local-ingest-store.js';
+import type { KtxTableRefKey } from '../scan/table-ref.js';
 import type { IngestTrigger, SourceAdapter, WorkUnit } from './types.js';
 type LocalIngestStatus = 'running' | 'done' | 'error';
@@ -62,6 +63,7 @@ export interface RunLocalStageOnlyIngestOptions {
   now?: () => Date;
   dryRun?: boolean;
   memoryFlow?: MemoryFlowEventSink;
+  tableScope?: ReadonlySet<KtxTableRefKey>;
 }
 const LOCAL_AUTHOR = 'ktx';
@@ -225,6 +227,7 @@ async function prepareLocalStagedDir(
   stagedDir: string,
   sourceDir: string | undefined,
   connectionId: string,
+  tableScope: ReadonlySet<KtxTableRefKey> | undefined,
 ): Promise<string | null> {
   await rm(stagedDir, { recursive: true, force: true });
   await mkdir(stagedDir, { recursive: true });
@@ -242,7 +245,7 @@ async function prepareLocalStagedDir(
     );
   }
   const pullConfig = await localPullConfigForAdapter(project, adapter, connectionId);
-  await adapter.fetch(pullConfig, stagedDir, { connectionId, sourceKey: adapter.source });
+  await adapter.fetch(pullConfig, stagedDir, { connectionId, sourceKey: adapter.source, tableScope });
   return null;
 }
@@ -274,7 +277,14 @@ async function runLocalStageOnlyIngestInner(options: RunLocalStageOnlyIngestOpti
   assertCompatibleExistingRun(existingRun, runId, adapter.source, connectionId);
   const stagedDir = join(options.project.projectDir, '.ktx/cache/local-ingest', runId, 'staged');
-  const sourceDir = await prepareLocalStagedDir(options.project, adapter, stagedDir, options.sourceDir, connectionId);
+  const sourceDir = await prepareLocalStagedDir(
+    options.project,
+    adapter,
+    stagedDir,
+    options.sourceDir,
+    connectionId,
+    options.tableScope,
+  );
   const detected = await adapter.detect(stagedDir);
   if (!detected) {

@@ -2,6 +2,7 @@ import type { KtxEmbeddingPort } from '../core/embedding.js';
 import type { MemoryAction } from '../../context/memory/types.js';
 import type { SemanticLayerService } from '../../context/sl/semantic-layer.service.js';
 import type { TouchedSlSource } from '../../context/tools/touched-sl-sources.js';
+import type { KtxTableRefKey } from '../scan/table-ref.js';
 import type { MemoryFlowEventSink } from './memory-flow/types.js';
 import type { StageIndex } from './stages/stage-index.types.js';
 import type { WorkUnitOutcome } from './stages/stage-3-work-units.js';
@@ -52,6 +53,7 @@ export interface ChunkResult {
 export interface FetchContext {
   connectionId: string;
   sourceKey: string;
+  tableScope?: ReadonlySet<KtxTableRefKey>;
   memoryFlow?: MemoryFlowEventSink;
 }

@@ -91,9 +91,14 @@ describe('ClaudeCodeKtxLlmRuntime', () => {
     });
   });
-  it('validates structured output with the caller schema', async () => {
+  it('validates structured output with the caller schema and whitelists the SDK StructuredOutput tool', async () => {
     const schema = z.object({ answer: z.string() });
-    const query = vi.fn((_input: any) => stream([initMessage(), resultMessage({ structured_output: { answer: 'yes' } })]));
+    const query = vi.fn((_input: any) =>
+      stream([
+        initMessage({ tools: ['StructuredOutput'] }),
+        resultMessage({ structured_output: { answer: 'yes' } }),
+      ]),
+    );
     const runtime = new ClaudeCodeKtxLlmRuntime({
       projectDir: '/tmp/project',
       modelSlots: { default: 'sonnet' },
@@ -341,7 +346,10 @@ describe('ClaudeCodeKtxLlmRuntime', () => {
   it('passes scrubbed env to object generation and agent loops', async () => {
     const schema = z.object({ answer: z.string() });
     const objectQuery = vi.fn((_input: any) =>
-      stream([initMessage(), resultMessage({ structured_output: { answer: 'yes' } })]),
+      stream([
+        initMessage({ tools: ['StructuredOutput'] }),
+        resultMessage({ structured_output: { answer: 'yes' } }),
+      ]),
     );
     const objectRuntime = new ClaudeCodeKtxLlmRuntime({
       projectDir: '/tmp/project',

@@ -47,6 +47,13 @@ const BUILTIN_TOOLS = [
 const KTX_MCP_SERVER_NAME = 'ktx';
+// SDK-internal pseudo-tool that the Claude Code CLI announces in its
+// system/init message whenever outputFormat: { type: 'json_schema' } is set.
+// Structured output is returned via result.structured_output (not through
+// canUseTool), so the tool only needs to be whitelisted for generateObject's
+// init isolation check; generateText / runAgentLoop never see it.
+const STRUCTURED_OUTPUT_TOOL_NAME = 'StructuredOutput';
 function isResult(message: SDKMessage): message is SDKResultMessage {
 return message.type === 'result';
 }
@@ -238,7 +245,12 @@ export class ClaudeCodeKtxLlmRuntime implements KtxLlmRuntimePort {
 projectDir: this.deps.projectDir,
 model: modelForRole(this.deps.modelSlots, input.role),
 env: this.deps.env,
-maxTurns: 1,
+// Structured output occasionally takes more than one assistant turn:
+// the model may emit thinking/text before the StructuredOutput tool
+// call, or the SDK may count assistant + tool-result as separate turns.
+// 5 leaves headroom without enabling unbounded loops; the json_schema
+// constraint still forces the final answer to be the schema.
+maxTurns: 5,
 tools: input.tools,
 }),
 outputFormat: { type: 'json_schema' as const, schema: jsonSchema(input.schema as z.ZodType) },
@@ -247,7 +259,7 @@ export class ClaudeCodeKtxLlmRuntime implements KtxLlmRuntimePort {
 query: this.runQuery,
 prompt: [input.system, input.prompt].filter(Boolean).join('\n\n'),
 options,
-allowedToolIds: new Set(mcpToolIds(input.tools ?? {})),
+allowedToolIds: new Set([...mcpToolIds(input.tools ?? {}), STRUCTURED_OUTPUT_TOOL_NAME]),
 expectedMcpServerNames: expectedMcpServerNames(input.tools),
 });
 const error = resultError(result);
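
The allowlist change above can be illustrated with a minimal sketch. It assumes the init isolation check rejects any announced tool that is not in `allowedToolIds`. The real check is not part of this diff, so `unexpectedTools` and the MCP tool id below are hypothetical names used only for illustration.

```typescript
// Tool name taken from the diff above.
const STRUCTURED_OUTPUT_TOOL_NAME = 'StructuredOutput';

// Hypothetical helper: returns the announced tools that are not allowlisted.
// The runtime's real init isolation check is not shown in this diff.
function unexpectedTools(announced: readonly string[], allowed: ReadonlySet<string>): string[] {
  return announced.filter((tool) => !allowed.has(tool));
}

// 'mcp__ktx__search' is a made-up MCP tool id used only for illustration.
const mcpIds: string[] = ['mcp__ktx__search'];

// What the CLI would announce when a json_schema output format is requested.
const announcedTools: string[] = [...mcpIds, STRUCTURED_OUTPUT_TOOL_NAME];

// Before the fix only MCP tool ids were allowed; after it the pseudo-tool is too.
const allowedBefore = new Set<string>(mcpIds);
const allowedAfter = new Set<string>([...mcpIds, STRUCTURED_OUTPUT_TOOL_NAME]);

const rejectedBefore = unexpectedTools(announcedTools, allowedBefore);
const rejectedAfter = unexpectedTools(announcedTools, allowedAfter);
```

Under that assumption, `rejectedBefore` contains only the `StructuredOutput` pseudo-tool and `rejectedAfter` is empty, which matches the updated tests that pass `tools: ['StructuredOutput']` in the init message.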

@@ -74,6 +74,7 @@ connections:
 maxLlmTablesPerBatch: 40,
 maxCandidatesPerColumn: 25,
 profileSampleRows: 10000,
+profileConcurrency: 4,
 validationConcurrency: 4,
 },
 },
@@ -278,6 +279,7 @@ scan:
 maxLlmTablesPerBatch: 12
 maxCandidatesPerColumn: 7
 profileSampleRows: 500
+profileConcurrency: 3
 validationConcurrency: 2
 validationBudget: 0
 `);
@@ -291,6 +293,7 @@ scan:
 maxLlmTablesPerBatch: 12,
 maxCandidatesPerColumn: 7,
 profileSampleRows: 500,
+profileConcurrency: 3,
 validationConcurrency: 2,
 validationBudget: 0,
 });
@@ -302,6 +305,7 @@ scan:
 expect(serializeKtxProjectConfig(config)).toContain('maxLlmTablesPerBatch: 12');
 expect(serializeKtxProjectConfig(config)).toContain('maxCandidatesPerColumn: 7');
 expect(serializeKtxProjectConfig(config)).toContain('profileSampleRows: 500');
+expect(serializeKtxProjectConfig(config)).toContain('profileConcurrency: 3');
 expect(serializeKtxProjectConfig(config)).toContain('validationConcurrency: 2');
 expect(serializeKtxProjectConfig(config)).toContain('validationBudget: 0');
 });
@@ -326,6 +330,7 @@ scan:
 maxLlmTablesPerBatch: 0
 maxCandidatesPerColumn: -4
 profileSampleRows: 0
+profileConcurrency: 0
 validationConcurrency: 0
 validationBudget: 1.5
 `;
@@ -341,6 +346,7 @@ scan:
 'scan.relationships.maxLlmTablesPerBatch',
 'scan.relationships.maxCandidatesPerColumn',
 'scan.relationships.profileSampleRows',
+'scan.relationships.profileConcurrency',
 'scan.relationships.validationConcurrency',
 'scan.relationships.validationBudget',
 ]),

@@ -163,6 +163,11 @@ const scanRelationshipsSchema = z
 .default(25)
 .describe('Maximum number of candidate join partners considered per column during relationship discovery.'),
 profileSampleRows: z.int().positive().default(10000).describe('Number of rows sampled per table when profiling values for relationship inference.'),
+profileConcurrency: z
+.int()
+.positive()
+.default(4)
+.describe('Parallel relationship-profile queries run against the database during scan.'),
 validationConcurrency: z.int().positive().default(4).describe('Number of relationship validation queries run in parallel against the database.'),
 validationBudget: z
 .union([z.literal('all'), z.int().nonnegative()])
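
As a rough illustration of what the new `profileConcurrency` rule accepts, the sketch below re-implements the positive-integer check in plain TypeScript. It is not the project's zod schema: `invalidRelationshipPaths` and the key list are hypothetical, and only three of the `scan.relationships` keys are modelled.

```typescript
// Hypothetical subset of the scan.relationships keys that must be positive integers.
const positiveIntKeys = ['profileSampleRows', 'profileConcurrency', 'validationConcurrency'] as const;

// Returns the config paths whose values would fail a "positive integer" rule.
// Missing keys are skipped because the schema supplies a default (4 for profileConcurrency).
function invalidRelationshipPaths(settings: Record<string, unknown>): string[] {
  const invalid: string[] = [];
  for (const key of positiveIntKeys) {
    const value = settings[key];
    if (value === undefined) continue;
    if (typeof value !== 'number' || !Number.isInteger(value) || value <= 0) {
      invalid.push(`scan.relationships.${key}`);
    }
  }
  return invalid;
}

// Mirrors the invalid fixture in the config test: zero is rejected.
const rejected = invalidRelationshipPaths({
  profileSampleRows: 0,
  profileConcurrency: 0,
  validationConcurrency: 0,
});

// Mirrors the valid fixture: 3 parallel profile queries is accepted.
const accepted = invalidRelationshipPaths({ profileConcurrency: 3 });
```

The sketch only covers the keys listed in `positiveIntKeys`; the real schema also validates `validationBudget` and the other fields shown in the test fixtures.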

@ -378,6 +378,121 @@ describe('KtxDescriptionGenerator', () => {
expect(cache.set).toHaveBeenCalledWith('warehouse.public.orders', 'Commerce orders'); expect(cache.set).toHaveBeenCalledWith('warehouse.public.orders', 'Commerce orders');
expect(cache.set).toHaveBeenCalledWith('__connection:Warehouse', 'Commerce orders'); expect(cache.set).toHaveBeenCalledWith('__connection:Warehouse', 'Commerce orders');
}); });
it('generates one structured table description and reuses table samples for all columns', async () => {
const llmRuntime = createLlmProvider('unused');
llmRuntime.generateObject = vi.fn(async () => ({
tableDescription: 'Commerce orders',
columns: [
{ name: 'status', description: 'Current order state' },
{ name: 'amount', description: 'Order amount in dollars' },
],
}));
const connector = createConnector();
const generator = new KtxDescriptionGenerator({
llmRuntime,
settings: { columnMaxWords: 12, tableMaxWords: 18, dataSourceMaxWords: 24 },
});
const result = await generator.generateBatchedTableDescriptions({
connectionId: 'conn-1',
connector,
context: { runId: 'run-1' },
dataSourceType: 'POSTGRESQL',
supportsNestedAnalysis: false,
table: {
catalog: null,
db: 'public',
name: 'orders',
rawDescriptions: { db: 'Orders fact table' },
columns: [
{ name: 'status', type: 'text' },
{ name: 'amount', type: 'numeric' },
],
},
});
expect(result.tableDescription).toBe('Commerce orders');
expect(Object.fromEntries(result.columnDescriptions)).toEqual({
status: 'Current order state',
amount: 'Order amount in dollars',
});
expect(connector.sampleTable).toHaveBeenCalledTimes(1);
expect(connector.sampleColumn).not.toHaveBeenCalled();
expect(llmRuntime.generateObject).toHaveBeenCalledTimes(1);
expect(llmRuntime.generateText).not.toHaveBeenCalled();
});
it('falls back to one column generateText call for each missing structured column', async () => {
const llmRuntime = createLlmProvider('Fallback status');
llmRuntime.generateObject = vi.fn(async () => ({
tableDescription: 'Commerce orders',
columns: [{ name: 'amount', description: 'Order amount in dollars' }],
}));
const connector = createConnector();
const generator = new KtxDescriptionGenerator({
llmRuntime,
settings: { columnMaxWords: 12, tableMaxWords: 18, dataSourceMaxWords: 24 },
});
const result = await generator.generateBatchedTableDescriptions({
connectionId: 'conn-1',
connector,
context: { runId: 'run-1' },
dataSourceType: 'POSTGRESQL',
supportsNestedAnalysis: false,
table: {
catalog: null,
db: 'public',
name: 'orders',
columns: [
{ name: 'status', type: 'text' },
{ name: 'amount', type: 'numeric' },
],
},
});
expect(Object.fromEntries(result.columnDescriptions)).toEqual({
status: 'Fallback status',
amount: 'Order amount in dollars',
});
expect(connector.sampleColumn).not.toHaveBeenCalled();
expect(llmRuntime.generateObject).toHaveBeenCalledTimes(1);
expect(llmRuntime.generateText).toHaveBeenCalledTimes(1);
});
it('does not run per-column fallback when structured object generation throws', async () => {
const llmRuntime = createLlmProvider('Fallback description');
llmRuntime.generateObject = vi.fn(async () => {
throw new Error('object output unavailable');
});
const warnings: string[] = [];
const generator = new KtxDescriptionGenerator({
llmRuntime,
onWarning: (warning) => warnings.push(warning.code),
settings: { columnMaxWords: 12, tableMaxWords: 18, dataSourceMaxWords: 24 },
});
const result = await generator.generateBatchedTableDescriptions({
connectionId: 'conn-1',
connector: createConnector(),
context: { runId: 'run-1' },
dataSourceType: 'POSTGRESQL',
supportsNestedAnalysis: false,
table: {
catalog: null,
db: 'public',
name: 'orders',
columns: [{ name: 'status', type: 'text' }],
},
});
expect(result.tableDescription).toBeNull();
expect(Object.fromEntries(result.columnDescriptions)).toEqual({ status: null });
expect(warnings).toContain('enrichment_failed');
expect(llmRuntime.generateObject).toHaveBeenCalledTimes(1);
expect(llmRuntime.generateText).not.toHaveBeenCalled();
});
}); });
describe('KtxDescriptionGenerator resilience', () => { describe('KtxDescriptionGenerator resilience', () => {

@@ -1,4 +1,5 @@
 import type { KtxLlmRuntimePort } from '../../context/llm/runtime-port.js';
+import { z } from 'zod';
 import type {
 KtxColumnSampleInput,
 KtxColumnSampleResult,
@@ -53,7 +54,7 @@ export interface KtxDescriptionColumn {
 sampleValues?: unknown[];
 }
-export interface KtxDescriptionColumnTable extends KtxTableRef {
+interface KtxDescriptionColumnTable extends KtxTableRef {
 columns: KtxDescriptionColumn[];
 }
@ -112,6 +113,23 @@ export interface KtxGenerateTableDescriptionInput {
table: KtxDescriptionTableInput; table: KtxDescriptionTableInput;
} }
export interface KtxGenerateBatchedTableDescriptionsInput {
connectionId: string;
connector: KtxDescriptionSamplingPort;
context: KtxScanContext;
dataSourceType: string;
supportsNestedAnalysis: boolean;
table: KtxDescriptionColumnTable & {
rawDescriptions?: Record<string, string>;
columns: Array<KtxDescriptionColumn & { type?: string; comment?: string | null }>;
};
}
export interface KtxBatchedTableDescriptionsResult {
tableDescription: string | null;
columnDescriptions: Map<string, string | null>;
}
export interface KtxGenerateDataSourceDescriptionInput { export interface KtxGenerateDataSourceDescriptionInput {
connectionId: string; connectionId: string;
connector: KtxDescriptionSamplingPort; connector: KtxDescriptionSamplingPort;
@ -136,6 +154,18 @@ interface ColumnTaskResult {
skipped: boolean; skipped: boolean;
} }
const batchedTableDescriptionSchema = z.object({
tableDescription: z.string(),
columns: z.array(
z.object({
name: z.string(),
description: z.string(),
}),
),
});
type BatchedTableDescriptionOutput = z.infer<typeof batchedTableDescriptionSchema>;
function descriptionSources(rawDescriptions: Record<string, string> | undefined): Array<[string, string]> { function descriptionSources(rawDescriptions: Record<string, string> | undefined): Array<[string, string]> {
if (!rawDescriptions) { if (!rawDescriptions) {
return []; return [];
@ -250,6 +280,76 @@ function wordLimitLine(maxWords: number): string {
return `Please provide a concise description in ${maxWords} words or less.`; return `Please provide a concise description in ${maxWords} words or less.`;
} }
function sampleValuesByColumn(
columns: readonly KtxDescriptionColumn[],
sampleData: KtxTableSampleResult | null,
): Map<string, unknown[]> {
const values = new Map<string, unknown[]>();
for (const column of columns) {
const existingValues = column.sampleValues?.filter((value) => value !== null && value !== undefined) ?? [];
if (existingValues.length > 0) {
values.set(column.name, existingValues);
}
}
if (!sampleData) {
return values;
}
for (const column of columns) {
const index = sampleData.headers.findIndex((header) => header.toLowerCase() === column.name.toLowerCase());
if (index < 0) {
continue;
}
const sampledValues = sampleData.rows
.map((row) => row[index])
.filter((value) => value !== null && value !== undefined);
if (sampledValues.length > 0) {
values.set(column.name, sampledValues);
}
}
return values;
}
function batchedPrompt(input: {
table: KtxGenerateBatchedTableDescriptionsInput['table'];
sampleData: KtxTableSampleResult | null;
dataSourceType: string;
tableMaxWords: number;
columnMaxWords: number;
}): KtxDescriptionPrompt {
const columnLines = input.table.columns
.map((column) => {
const typePart = column.type ? ` (${column.type})` : '';
const commentPart = column.rawDescriptions?.db ? ` - ${column.rawDescriptions.db}` : '';
return `- ${column.name}${typePart}${commentPart}`;
})
.join('\n');
const sampleLines =
input.sampleData && input.sampleData.rows.length > 0
? input.sampleData.rows
.slice(0, 5)
.map((row) =>
input.sampleData!.headers.map((header, index) => `${header}=${String(row[index] ?? '')}`).join(', '),
)
.join('\n')
: 'unavailable';
return {
system: [
'Analyze one database table and return structured JSON matching the supplied schema.',
`The table description must be ${input.tableMaxWords} words or less.`,
`Each column description must be ${input.columnMaxWords} words or less.`,
'Describe business meaning directly. Do not repeat table or column names as filler.',
].join('\n'),
user: [
`Table: ${input.table.name}`,
`Data source type: ${input.dataSourceType}`,
'Columns:',
columnLines,
'Sample rows:',
sampleLines,
].join('\n'),
};
}
/** @internal */ /** @internal */
export function buildKtxColumnDescriptionPrompt( export function buildKtxColumnDescriptionPrompt(
input: KtxColumnDescriptionPromptInput & { maxWords?: number }, input: KtxColumnDescriptionPromptInput & { maxWords?: number },
@@ -463,11 +563,11 @@ export class KtxDescriptionGenerator {
 }
 }
-const sampleTable = input.connector.sampleTable;
+const connector = input.connector;
 let sampleData: KtxTableSampleResult | null = null;
 let fallbackReason: 'capability_missing' | 'sampling_failed' | 'empty_sample' | null = null;
-if (!sampleTable) {
+if (!connector.sampleTable) {
 fallbackReason = 'capability_missing';
 this.logger?.warn('KTX scan connector does not support table sampling; falling back to metadata-only prompt', {
 connectorId: input.connector.id,
@@ -484,7 +584,7 @@ export class KtxDescriptionGenerator {
 try {
 sampleData = await retryAsync(
 () =>
-sampleTable(
+connector.sampleTable!(
 {
 connectionId: input.connectionId,
 table: tableRef,
@ -582,6 +682,156 @@ export class KtxDescriptionGenerator {
} }
} }
async generateBatchedTableDescriptions(
input: KtxGenerateBatchedTableDescriptionsInput,
): Promise<KtxBatchedTableDescriptionsResult> {
const tableRef = toTableRef(input.table);
let sampleData: KtxTableSampleResult | null = null;
let fallbackReason: 'capability_missing' | 'sampling_failed' | 'empty_sample' | null = null;
if (!input.connector.sampleTable) {
fallbackReason = 'capability_missing';
this.logger?.warn('KTX scan connector does not support table sampling; falling back to metadata-only prompt', {
connectorId: input.connector.id,
table: input.table.name,
});
this.onWarning?.({
code: 'connector_capability_missing',
message: `Connector ${input.connector.id} does not support sampleTable; using metadata-only description prompt`,
table: input.table.name,
recoverable: true,
metadata: { connectorId: input.connector.id, capability: 'sampleTable' },
});
} else {
try {
sampleData = await retryAsync(
() =>
input.connector.sampleTable!(
{
connectionId: input.connectionId,
table: tableRef,
limit: 20,
},
input.context,
),
{
attempts: 3,
baseDelayMs: 200,
signal: input.context.signal,
onAttemptFailure: (error, attempt) => {
this.logger?.warn(`sampleTable attempt ${attempt} failed for ${input.table.name}: ${errorMessage(error)}`, {
connectorId: input.connector.id,
table: input.table.name,
attempt,
});
},
},
);
if (sampleData.rows.length === 0) {
fallbackReason = 'empty_sample';
this.logger?.warn('sampleTable returned no rows; using metadata-only prompt', {
connectorId: input.connector.id,
table: input.table.name,
});
}
} catch (error) {
if (error instanceof KtxAbortedError) {
throw error;
}
fallbackReason = 'sampling_failed';
this.logger?.error(`sampleTable exhausted retries for ${input.table.name}: ${errorMessage(error)}`, {
connectorId: input.connector.id,
table: input.table.name,
});
this.onWarning?.({
code: 'sampling_failed',
message: `Failed to sample table ${input.table.name} after retries: ${errorMessage(error)}`,
table: input.table.name,
recoverable: true,
metadata: { connectorId: input.connector.id, error: errorMessage(error) },
});
}
}
const sampleValues = sampleValuesByColumn(input.table.columns, sampleData);
const descriptions = new Map<string, string | null>();
let tableDescription: string | null = null;
let structuredGenerationSucceeded = false;
try {
const prompt = batchedPrompt({
table: input.table,
sampleData,
dataSourceType: input.dataSourceType,
tableMaxWords: this.settings.tableMaxWords,
columnMaxWords: this.settings.columnMaxWords,
});
const generated = await this.llmRuntime.generateObject<
BatchedTableDescriptionOutput,
typeof batchedTableDescriptionSchema
>({
role: 'candidateExtraction',
system: prompt.system,
prompt: prompt.user,
schema: batchedTableDescriptionSchema,
temperature: this.settings.temperature,
});
structuredGenerationSucceeded = true;
tableDescription = generated.tableDescription.trim() || null;
const generatedColumns = new Map(
generated.columns.map((column) => [column.name.toLowerCase(), column.description.trim() || null]),
);
for (const column of input.table.columns) {
const description = generatedColumns.get(column.name.toLowerCase()) ?? null;
descriptions.set(column.name, description);
}
if (tableDescription && fallbackReason !== null) {
this.onWarning?.({
code: 'description_fallback_used',
message: `Generated table description without sample rows for ${input.table.name} (reason: ${fallbackReason})`,
table: input.table.name,
recoverable: true,
metadata: { connectorId: input.connector.id, reason: fallbackReason },
});
}
} catch (error) {
this.logger?.warn(`Batched table description failed for ${input.table.name}: ${errorMessage(error)}`, {
connectorId: input.connector.id,
table: input.table.name,
});
this.onWarning?.({
code: 'enrichment_failed',
message: `Failed to generate batched description for table ${input.table.name}: ${errorMessage(error)}`,
table: input.table.name,
recoverable: true,
metadata: { connectorId: input.connector.id },
});
}
if (!structuredGenerationSucceeded) {
for (const column of input.table.columns) {
descriptions.set(column.name, null);
}
return { tableDescription, columnDescriptions: descriptions };
}
const tableContext = `Table: ${input.table.name} | Columns: ${input.table.columns.map((column) => column.name).join(', ')} | Data source: ${input.dataSourceType}`;
for (const column of input.table.columns) {
if (descriptions.get(column.name)) {
continue;
}
const fallback = await this.generateColumnDescriptionFromPreparedValues({
column,
columnValues: sampleValues.get(column.name) ?? [],
tableContext,
dataSourceType: input.dataSourceType,
supportsNestedAnalysis: input.supportsNestedAnalysis,
});
descriptions.set(column.name, fallback);
}
return { tableDescription, columnDescriptions: descriptions };
}
async generateDataSourceDescription(input: KtxGenerateDataSourceDescriptionInput): Promise<string | null> { async generateDataSourceDescription(input: KtxGenerateDataSourceDescriptionInput): Promise<string | null> {
if (input.tables.length === 0) { if (input.tables.length === 0) {
return 'No tables found in database'; return 'No tables found in database';
@@ -684,11 +934,11 @@ export class KtxDescriptionGenerator {
 });
 columnValues = [];
 } else {
-const sampleColumn = input.connector.sampleColumn;
+const connector = input.connector;
 try {
 const sample = await retryAsync(
 () =>
-sampleColumn(
+connector.sampleColumn!(
 {
 connectionId: input.connectionId,
 table: tableRef,
@@ -732,27 +982,13 @@ export class KtxDescriptionGenerator {
 }
 }
-const nonNullValues = (columnValues ?? []).filter((value) => value !== null && value !== undefined);
-const hasRawDescriptions = descriptionSources(column.rawDescriptions).length > 0;
-if (nonNullValues.length === 0 && !hasRawDescriptions) {
-return {
-columnName: column.name,
-description: null,
-skipped: false,
-processed: false,
-};
-}
-const prompt = buildKtxColumnDescriptionPrompt({
-columnName: column.name,
-columnValues: nonNullValues,
+const description = await this.generateColumnDescriptionFromPreparedValues({
+column,
+columnValues: columnValues ?? [],
 tableContext,
 dataSourceType: input.dataSourceType,
 supportsNestedAnalysis: input.supportsNestedAnalysis,
-rawDescriptions: column.rawDescriptions,
-maxWords: this.settings.columnMaxWords,
 });
-const description = await this.generateAiDescription(prompt, 'ktx-column-description');
 if (cacheKey && description) {
 await this.cache?.set(cacheKey, description);
@ -782,6 +1018,30 @@ export class KtxDescriptionGenerator {
} }
} }
private async generateColumnDescriptionFromPreparedValues(input: {
column: KtxDescriptionColumn;
columnValues: unknown[];
tableContext: string;
dataSourceType: string;
supportsNestedAnalysis: boolean;
}): Promise<string | null> {
const nonNullValues = input.columnValues.filter((value) => value !== null && value !== undefined);
const hasRawDescriptions = descriptionSources(input.column.rawDescriptions).length > 0;
if (nonNullValues.length === 0 && !hasRawDescriptions) {
return null;
}
const prompt = buildKtxColumnDescriptionPrompt({
columnName: input.column.name,
columnValues: nonNullValues,
tableContext: input.tableContext,
dataSourceType: input.dataSourceType,
supportsNestedAnalysis: input.supportsNestedAnalysis,
rawDescriptions: input.column.rawDescriptions,
maxWords: this.settings.columnMaxWords,
});
return this.generateAiDescription(prompt, 'ktx-column-description');
}
private async generateAiDescription(prompt: KtxDescriptionPrompt, _operationName: string): Promise<string | null> { private async generateAiDescription(prompt: KtxDescriptionPrompt, _operationName: string): Promise<string | null> {
try { try {
const text = await this.llmRuntime.generateText({ const text = await this.llmRuntime.generateText({
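
The merge step in `generateBatchedTableDescriptions` can be sketched in isolation: generated columns are matched case-insensitively to the requested ones, and blank or absent descriptions are left for the per-column fallback. The helper name `mergeColumnDescriptions` and its return shape are hypothetical; the diff performs this inline.

```typescript
// Shape of one column entry in the structured LLM output (as in the diff's zod schema).
interface GeneratedColumn {
  name: string;
  description: string;
}

// Hypothetical helper: maps requested column names to generated descriptions and
// lists the columns that still need a per-column fallback call.
function mergeColumnDescriptions(
  requested: readonly string[],
  generated: readonly GeneratedColumn[],
): { descriptions: Map<string, string | null>; missing: string[] } {
  // Index generated entries by lower-cased name; whitespace-only text counts as missing.
  const byLowerName = new Map<string, string | null>();
  for (const column of generated) {
    byLowerName.set(column.name.toLowerCase(), column.description.trim() || null);
  }

  const descriptions = new Map<string, string | null>();
  const missing: string[] = [];
  for (const name of requested) {
    const description = byLowerName.get(name.toLowerCase()) ?? null;
    descriptions.set(name, description);
    if (!description) missing.push(name);
  }
  return { descriptions, missing };
}

const merged = mergeColumnDescriptions(
  ['status', 'Amount'],
  [
    { name: 'AMOUNT', description: ' Order amount in dollars ' },
    { name: 'status', description: '   ' },
  ],
);
```

Here `Amount` resolves through the case-insensitive match, while `status` ends up in `missing` because its generated text is blank. In the diff, those columns are the ones passed to `generateColumnDescriptionFromPreparedValues`.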

@@ -1,17 +1,63 @@
-import type { KtxSchemaSnapshot } from './types.js';
+import { tableRefSet, type KtxTableRefKey } from './table-ref.js';
+import type { KtxTableRef } from './types.js';
-export function resolveEnabledTables(connection: Record<string, unknown> | undefined): Set<string> | null {
+/**
+ * Parses the `enabled_tables` field on a connection into a scope of
+ * fully-qualified table refs. Returns `null` when the field is absent or
+ * empty (meaning "no scope: include every table in the resolved schemas").
+ *
+ * Accepted entry forms:
+ *   "catalog.db.name"          fully qualified
+ *   "db.name"                  schema-qualified (catalog = null; legacy / Postgres-shape)
+ *   "name"                     bare (catalog = db = null; SQLite-shape)
+ *   { catalog?, db?, name }    escape hatch for identifiers containing dots
+ *
+ * The setup wizard writes the fully-qualified form going forward; the lenient
+ * parser keeps existing project configs working.
+ */
+export function resolveEnabledTables(
+connection: Record<string, unknown> | undefined,
+): ReadonlySet<KtxTableRefKey> | null {
 const raw = connection?.enabled_tables;
 if (!Array.isArray(raw) || raw.length === 0) return null;
-return new Set(raw.filter((v): v is string => typeof v === 'string'));
+const refs: KtxTableRef[] = [];
+for (const value of raw) {
+const parsed = parseEnabledTableEntry(value);
+if (parsed) refs.push(parsed);
+}
+if (refs.length === 0) return null;
+return tableRefSet(refs);
 }
-export function filterSnapshotTables(snapshot: KtxSchemaSnapshot, enabledTables: Set<string>): KtxSchemaSnapshot {
-return {
-...snapshot,
-tables: snapshot.tables.filter((table) => {
-const key = table.db ? `${table.db}.${table.name}` : table.name;
-return enabledTables.has(key);
-}),
-};
+function parseEnabledTableEntry(value: unknown): KtxTableRef | null {
+if (typeof value === 'string') {
+return parseDottedEntry(value);
+}
+if (value && typeof value === 'object' && !Array.isArray(value)) {
+const entry = value as { catalog?: unknown; db?: unknown; name?: unknown };
+const name = typeof entry.name === 'string' ? entry.name : null;
+if (!name) return null;
+return {
+catalog: typeof entry.catalog === 'string' ? entry.catalog : null,
+db: typeof entry.db === 'string' ? entry.db : null,
+name,
+};
+}
+return null;
+}
+function parseDottedEntry(value: string): KtxTableRef | null {
+const trimmed = value.trim();
+if (trimmed.length === 0) return null;
+const parts = trimmed.split('.');
+if (parts.length === 3) {
+return { catalog: parts[0]!, db: parts[1]!, name: parts[2]! };
+}
+if (parts.length === 2) {
+return { catalog: null, db: parts[0]!, name: parts[1]! };
+}
+if (parts.length === 1) {
+return { catalog: null, db: null, name: parts[0]! };
+}
+return null;
 }
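
To make the accepted `enabled_tables` string forms concrete, here is a small standalone sketch of the dotted-entry rules. `TableRef` stands in for the project's `KtxTableRef` and `parseDotted` is a hypothetical copy of `parseDottedEntry`. The `tableRefSet` / `KtxTableRefKey` encoding is not shown in this diff and is not modelled.

```typescript
// Hypothetical stand-in for the project's KtxTableRef type.
interface TableRef {
  catalog: string | null;
  db: string | null;
  name: string;
}

// Mirrors the dotted-string rules in the diff: three, two, or one dot-separated parts.
function parseDotted(value: string): TableRef | null {
  const trimmed = value.trim();
  if (trimmed.length === 0) return null;
  const parts = trimmed.split('.');
  if (parts.length === 3) return { catalog: parts[0]!, db: parts[1]!, name: parts[2]! };
  if (parts.length === 2) return { catalog: null, db: parts[0]!, name: parts[1]! };
  if (parts.length === 1) return { catalog: null, db: null, name: parts[0]! };
  // Four or more parts are ambiguous, so the entry is dropped.
  return null;
}

const fullyQualified = parseDotted('ANALYTICS.PUBLIC.ORDERS');
const schemaQualified = parseDotted('public.orders');
const bare = parseDotted('  orders  ');
const tooManyParts = parseDotted('a.b.c.d');
const blank = parseDotted('   ');
```

An entry with more than three dot-separated parts yields `null`, which is presumably why the object form `{ catalog?, db?, name }` exists as an escape hatch for identifiers that themselves contain dots.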

@@ -289,6 +289,7 @@ describe('writeLocalScanEnrichmentArtifacts', () => {
 maxLlmTablesPerBatch: 12,
 maxCandidatesPerColumn: 7,
 profileSampleRows: 500,
+profileConcurrency: 3,
 validationConcurrency: 2,
 },
 });
@@ -378,6 +379,7 @@ describe('writeLocalScanEnrichmentArtifacts', () => {
 validationRequiredForManifest: true,
 maxCandidatesPerColumn: 7,
 profileSampleRows: 500,
+profileConcurrency: 3,
 validationConcurrency: 2,
 },
 profileWarnings: [],
@@ -472,6 +474,7 @@ describe('writeLocalScanEnrichmentArtifacts', () => {
 maxLlmTablesPerBatch: 40,
 maxCandidatesPerColumn: 25,
 profileSampleRows: 10000,
+profileConcurrency: 4,
 validationConcurrency: 4,
 },
 dryRun: false,
@@ -741,6 +744,7 @@ describe('writeLocalScanEnrichmentArtifacts', () => {
 maxLlmTablesPerBatch: 40,
 maxCandidatesPerColumn: 25,
 profileSampleRows: 10000,
+profileConcurrency: 4,
 validationConcurrency: 4,
 },
 dryRun: false,

@@ -382,6 +382,7 @@ export async function writeLocalScanEnrichmentArtifacts(
 validationRequiredForManifest: input.relationshipSettings.validationRequiredForManifest,
 maxCandidatesPerColumn: input.relationshipSettings.maxCandidatesPerColumn,
 profileSampleRows: input.relationshipSettings.profileSampleRows,
+profileConcurrency: input.relationshipSettings.profileConcurrency,
 validationConcurrency: input.relationshipSettings.validationConcurrency,
 }
 : undefined,

@ -299,6 +299,38 @@ describe('local scan enrichment', () => {
]); ]);
}); });
it('uses the supplied snapshot without calling connector.introspect', async () => {
const scanConnector = connector();
const introspect = vi.mocked(scanConnector.introspect);
const result = await runLocalScanEnrichment({
connectionId: 'warehouse',
mode: 'structural',
connector: scanConnector,
snapshot,
context: { runId: 'scan-run-snapshot' },
providers: null,
});
expect(result.snapshot).toEqual(snapshot);
expect(introspect).not.toHaveBeenCalled();
});
it('falls back to connector.introspect when no snapshot is supplied', async () => {
const scanConnector = connector();
const result = await runLocalScanEnrichment({
connectionId: 'warehouse',
mode: 'structural',
connector: scanConnector,
context: { runId: 'scan-run-introspect' },
providers: null,
});
expect(result.snapshot).toEqual(snapshot);
expect(scanConnector.introspect).toHaveBeenCalledTimes(1);
});
it('runs deterministic relationship detection for relationship scans', async () => { it('runs deterministic relationship detection for relationship scans', async () => {
const result = await runLocalScanEnrichment({ const result = await runLocalScanEnrichment({
connectionId: 'warehouse', connectionId: 'warehouse',
@@ -473,7 +505,7 @@ describe('local scan enrichment', () => {
 expect(result.relationships).toEqual({ accepted: 0, review: 1, rejected: 0, skipped: 0 });
 });
-it('generates table descriptions with bounded table-level concurrency', async () => {
+it('generates batched table descriptions with bounded table-level concurrency', async () => {
 const concurrentSnapshot: KtxSchemaSnapshot = {
 ...snapshot,
 tables: Array.from({ length: 8 }, (_, index) => ({
@@ -497,27 +529,27 @@ describe('local scan enrichment', () => {
 ],
 })),
 };
-let activeColumnSamples = 0;
-let maxActiveColumnSamples = 0;
+let activeTableSamples = 0;
+let maxActiveTableSamples = 0;
 const scanConnector = {
 ...connector(),
 introspect: vi.fn(async () => concurrentSnapshot),
-sampleColumn: vi.fn(async () => {
-activeColumnSamples += 1;
-maxActiveColumnSamples = Math.max(maxActiveColumnSamples, activeColumnSamples);
+sampleColumn: vi.fn(async () => ({
+values: ['1'],
+nullCount: 0,
+distinctCount: 1,
+})),
+sampleTable: vi.fn(async () => {
+activeTableSamples += 1;
+maxActiveTableSamples = Math.max(maxActiveTableSamples, activeTableSamples);
 await new Promise((resolve) => setTimeout(resolve, 10));
-activeColumnSamples -= 1;
+activeTableSamples -= 1;
 return {
-values: ['1'],
-nullCount: 0,
-distinctCount: 1,
+headers: ['id'],
+rows: [[1]],
+totalRows: 1,
 };
 }),
-sampleTable: vi.fn(async () => ({
-headers: ['id'],
-rows: [[1]],
-totalRows: 1,
-})),
 };
 const settings = {
 ...buildDefaultKtxProjectConfig().scan.relationships,
@@ -533,7 +565,8 @@ describe('local scan enrichment', () => {
 relationshipSettings: settings,
 });
-expect(maxActiveColumnSamples).toBe(6);
+expect(maxActiveTableSamples).toBe(4);
+expect(scanConnector.sampleColumn).not.toHaveBeenCalled();
 });
 it('reports enrichment progress for countable stages', async () => {
@ -675,7 +708,7 @@ describe('local scan enrichment', () => {
providerIdentity: { provider: 'fake', embeddingDimensions: 6 }, providerIdentity: { provider: 'fake', embeddingDimensions: 6 },
}); });
const generateText = vi.spyOn(providers.llmRuntime, 'generateText'); const generateObject = vi.spyOn(providers.llmRuntime, 'generateObject');
const embedBatch = vi.spyOn(providers.embedding, 'embedBatch'); const embedBatch = vi.spyOn(providers.embedding, 'embedBatch');
const second = await runLocalScanEnrichment({ const second = await runLocalScanEnrichment({
connectionId: 'warehouse', connectionId: 'warehouse',
@ -693,7 +726,7 @@ describe('local scan enrichment', () => {
expect(first.state.resumedStages).toEqual([]); expect(first.state.resumedStages).toEqual([]);
expect(second.state.resumedStages).toEqual(['descriptions', 'embeddings', 'relationships']); expect(second.state.resumedStages).toEqual(['descriptions', 'embeddings', 'relationships']);
expect(second.state.completedStages).toEqual(['descriptions', 'embeddings', 'relationships']); expect(second.state.completedStages).toEqual(['descriptions', 'embeddings', 'relationships']);
expect(generateText).not.toHaveBeenCalled(); expect(generateObject).not.toHaveBeenCalled();
expect(embedBatch).not.toHaveBeenCalled(); expect(embedBatch).not.toHaveBeenCalled();
expect(second.descriptionUpdates).toEqual(first.descriptionUpdates); expect(second.descriptionUpdates).toEqual(first.descriptionUpdates);
expect(second.embeddingUpdates).toEqual(first.embeddingUpdates); expect(second.embeddingUpdates).toEqual(first.embeddingUpdates);
@ -731,7 +764,7 @@ describe('local scan enrichment', () => {
tables: [{ ...firstTable, name: 'customers' }], tables: [{ ...firstTable, name: 'customers' }],
})), })),
}; };
const generateText = vi.spyOn(providers.llmRuntime, 'generateText'); const generateObject = vi.spyOn(providers.llmRuntime, 'generateObject');
const result = await runLocalScanEnrichment({ const result = await runLocalScanEnrichment({
connectionId: 'warehouse', connectionId: 'warehouse',
@ -747,7 +780,7 @@ describe('local scan enrichment', () => {
expect(result.state.resumedStages).toEqual([]); expect(result.state.resumedStages).toEqual([]);
expect(result.state.completedStages).toEqual(['descriptions', 'embeddings', 'relationships']); expect(result.state.completedStages).toEqual(['descriptions', 'embeddings', 'relationships']);
expect(generateText).toHaveBeenCalled(); expect(generateObject).toHaveBeenCalled();
}); });
it('runs providerless enriched scans as relationship-only discovery enrichment', async () => { it('runs providerless enriched scans as relationship-only discovery enrichment', async () => {

@ -1,7 +1,7 @@
import pLimit from 'p-limit'; import pLimit from 'p-limit';
import type { KtxLlmRuntimePort } from '../../context/llm/runtime-port.js'; import type { KtxLlmRuntimePort } from '../../context/llm/runtime-port.js';
import { buildDefaultKtxProjectConfig, type KtxScanRelationshipConfig } from '../project/config.js'; import { buildDefaultKtxProjectConfig, type KtxScanRelationshipConfig } from '../project/config.js';
import { type KtxDescriptionColumnTable, KtxDescriptionGenerator } from './description-generation.js'; import { KtxDescriptionGenerator } from './description-generation.js';
import { buildKtxColumnEmbeddingText } from './embedding-text.js'; import { buildKtxColumnEmbeddingText } from './embedding-text.js';
import { import {
completedKtxScanEnrichmentStateSummary, completedKtxScanEnrichmentStateSummary,
@ -41,7 +41,7 @@ import type {
KtxTableRef, KtxTableRef,
} from './types.js'; } from './types.js';
const DESCRIPTION_TABLE_CONCURRENCY = 6; const DESCRIPTION_TABLE_CONCURRENCY = 4;
export interface KtxLocalScanEnrichmentProviders { export interface KtxLocalScanEnrichmentProviders {
llmRuntime: KtxLlmRuntimePort; llmRuntime: KtxLlmRuntimePort;
@ -53,6 +53,7 @@ export interface KtxLocalScanEnrichmentInput {
mode: KtxScanMode; mode: KtxScanMode;
detectRelationships?: boolean; detectRelationships?: boolean;
connector: KtxScanConnector; connector: KtxScanConnector;
snapshot?: KtxSchemaSnapshot;
context: KtxScanContext; context: KtxScanContext;
providers: KtxLocalScanEnrichmentProviders | null; providers: KtxLocalScanEnrichmentProviders | null;
stateStore?: KtxScanEnrichmentStateStore | null; stateStore?: KtxScanEnrichmentStateStore | null;
@ -179,7 +180,17 @@ function deterministicLlmRuntime(): KtxLlmRuntimePort {
async generateText(input) { async generateText(input) {
return `Deterministic description for ${input.prompt.slice(0, 64).trim() || 'data source'}`; return `Deterministic description for ${input.prompt.slice(0, 64).trim() || 'data source'}`;
}, },
async generateObject() { async generateObject(input) {
if (input.prompt.includes('Sample rows:')) {
const columns = Array.from(input.prompt.matchAll(/^- ([^\s(]+)/gm), (match) => ({
name: match[1] ?? 'column',
description: `Deterministic description for ${match[1] ?? 'column'}`,
}));
return {
tableDescription: `Deterministic description for ${input.prompt.slice(0, 64).trim() || 'table'}`,
columns,
} as never;
}
return { pkCandidates: [], fkCandidates: [] } as never; return { pkCandidates: [], fkCandidates: [] } as never;
}, },
async runAgentLoop() { async runAgentLoop() {
@ -234,30 +245,6 @@ export function snapshotToKtxEnrichedSchema(
}; };
} }
function descriptionTable(table: KtxSchemaTable): KtxDescriptionColumnTable {
return {
catalog: table.catalog,
db: table.db,
name: table.name,
columns: table.columns.map((column) => ({
name: column.name,
...(column.comment ? { sampleValues: [column.comment], rawDescriptions: { db: column.comment } } : {}),
})),
};
}
function tableMetadataColumns(table: KtxSchemaTable): Array<{
name: string;
nativeType?: string | null;
comment?: string | null;
}> {
return table.columns.map((column) => ({
name: column.name,
nativeType: column.nativeType ?? null,
comment: column.comment ?? null,
}));
}
function embeddingBatchSize(maxBatchSize: number): number { function embeddingBatchSize(maxBatchSize: number): number {
return Number.isInteger(maxBatchSize) && maxBatchSize > 0 ? maxBatchSize : 100; return Number.isInteger(maxBatchSize) && maxBatchSize > 0 ? maxBatchSize : 100;
} }
@ -306,32 +293,28 @@ async function generateDescriptions(input: {
transient: true, transient: true,
}, },
); );
const tableInput = descriptionTable(table); const batched = await generator.generateBatchedTableDescriptions({
const columnResult = await generator.generateColumnDescriptions({
connectionId: input.snapshot.connectionId, connectionId: input.snapshot.connectionId,
connector: input.connector, connector: input.connector,
context: input.context, context: input.context,
dataSourceType: input.snapshot.driver, dataSourceType: input.snapshot.driver,
supportsNestedAnalysis: input.connector.capabilities.nestedAnalysis, supportsNestedAnalysis: input.connector.capabilities.nestedAnalysis,
table: tableInput,
});
const tableDescription = await generator.generateTableDescription({
connectionId: input.snapshot.connectionId,
connector: input.connector,
context: input.context,
dataSourceType: input.snapshot.driver,
table: { table: {
catalog: table.catalog, catalog: table.catalog,
db: table.db, db: table.db,
name: table.name, name: table.name,
rawDescriptions: table.comment ? { db: table.comment } : {}, rawDescriptions: table.comment ? { db: table.comment } : {},
columns: tableMetadataColumns(table), columns: table.columns.map((column) => ({
name: column.name,
type: column.nativeType,
...(column.comment ? { rawDescriptions: { db: column.comment } } : {}),
})),
}, },
}); });
return { return {
table: tableRef(table), table: tableRef(table),
tableDescription, tableDescription: batched.tableDescription,
columnDescriptions: Object.fromEntries(columnResult.columnDescriptions), columnDescriptions: Object.fromEntries(batched.columnDescriptions),
}; };
}), }),
), ),
@ -472,15 +455,17 @@ export async function runLocalScanEnrichment(
): Promise<KtxLocalScanEnrichmentResult> { ): Promise<KtxLocalScanEnrichmentResult> {
const progress = input.context.progress; const progress = input.context.progress;
await progress?.update(0, 'Loading enrichment schema snapshot'); await progress?.update(0, 'Loading enrichment schema snapshot');
const snapshot = await input.connector.introspect( const snapshot =
{ input.snapshot ??
connectionId: input.connectionId, (await input.connector.introspect(
driver: input.connector.driver, {
mode: input.mode, connectionId: input.connectionId,
detectRelationships: input.detectRelationships, driver: input.connector.driver,
}, mode: input.mode,
input.context, detectRelationships: input.detectRelationships,
); },
input.context,
));
await progress?.update(0.05, `Loaded schema snapshot with ${snapshot.tables.length} tables`); await progress?.update(0.05, `Loaded schema snapshot with ${snapshot.tables.length} tables`);
const now = input.now ?? (() => new Date()); const now = input.now ?? (() => new Date());

@ -6,9 +6,15 @@ import YAML from 'yaml';
import type { SourceAdapter } from '../../context/ingest/types.js'; import type { SourceAdapter } from '../../context/ingest/types.js';
import type { KtxLlmRuntimePort } from '../../context/llm/runtime-port.js'; import type { KtxLlmRuntimePort } from '../../context/llm/runtime-port.js';
import { initKtxProject, type KtxLocalProject, loadKtxProject } from '../../context/project/project.js'; import { initKtxProject, type KtxLocalProject, loadKtxProject } from '../../context/project/project.js';
import { filterSnapshotTables, resolveEnabledTables } from './enabled-tables.js'; import { resolveEnabledTables } from './enabled-tables.js';
import { getLocalScanReport, getLocalScanStatus, runLocalScan } from './local-scan.js'; import { getLocalScanReport, getLocalScanStatus, runLocalScan } from './local-scan.js';
import type { KtxQueryResult, KtxReadOnlyQueryInput, KtxSchemaSnapshot, KtxSchemaTable } from './types.js'; import { tableRefKey, tableRefSet, type KtxTableRefKey } from './table-ref.js';
import type {
KtxQueryResult,
KtxReadOnlyQueryInput,
KtxScanConnector,
KtxSchemaSnapshot,
} from './types.js';
function relationshipSqlResult( function relationshipSqlResult(
input: KtxReadOnlyQueryInput, input: KtxReadOnlyQueryInput,
@ -120,7 +126,43 @@ async function writeDatabaseConfigWithoutIngestAdapters(projectDir: string): Pro
); );
} }
function fetchOnlyAdapter(options: { extractedAt?: () => string } = {}): SourceAdapter { function defaultFetchSnapshot(options: { extractedAt?: () => string } = {}): KtxSchemaSnapshot {
return {
connectionId: 'warehouse',
driver: 'postgres',
extractedAt: options.extractedAt?.() ?? '2026-04-29T09:00:00.000Z',
scope: { schemas: ['public'] },
metadata: {},
tables: [
{
name: 'orders',
catalog: null,
db: 'public',
kind: 'table',
comment: null,
estimatedRows: null,
columns: [
{
name: 'id',
nativeType: 'integer',
normalizedType: 'integer',
dimensionType: 'number',
nullable: false,
primaryKey: true,
comment: null,
},
],
foreignKeys: [],
},
],
};
}
function fetchOnlyAdapter(options: { extractedAt?: () => string; snapshot?: KtxSchemaSnapshot } = {}): SourceAdapter {
const scanSnapshot = options.snapshot
? { ...options.snapshot, ...(options.extractedAt ? { extractedAt: options.extractedAt() } : {}) }
: defaultFetchSnapshot(options);
return { return {
source: 'live-database', source: 'live-database',
skillNames: ['live_database_ingest'], skillNames: ['live_database_ingest'],
@ -129,39 +171,89 @@ function fetchOnlyAdapter(options: { extractedAt?: () => string } = {}): SourceA
await writeFile( await writeFile(
join(stagedDir, 'connection.json'), join(stagedDir, 'connection.json'),
`${JSON.stringify({ `${JSON.stringify({
connectionId: 'warehouse', connectionId: scanSnapshot.connectionId,
driver: 'postgres', driver: scanSnapshot.driver,
...(options.extractedAt ? { extractedAt: options.extractedAt() } : {}), extractedAt: scanSnapshot.extractedAt,
scope: { schemas: ['public'] }, scope: scanSnapshot.scope,
metadata: {}, metadata: scanSnapshot.metadata,
})}\n`, })}\n`,
'utf-8', 'utf-8',
); );
await writeFile(join(stagedDir, 'foreign-keys.json'), '{"foreignKeys":[]}\n', 'utf-8'); await writeFile(join(stagedDir, 'foreign-keys.json'), '{"foreignKeys":[]}\n', 'utf-8');
await writeFile( for (const table of scanSnapshot.tables) {
join(stagedDir, 'tables', 'orders.json'), await writeFile(join(stagedDir, 'tables', `${table.name}.json`), `${JSON.stringify(table)}\n`, 'utf-8');
'{"name":"orders","catalog":null,"db":"public","kind":"table","comment":null,"estimatedRows":null,"columns":[{"name":"id","nativeType":"integer","normalizedType":"integer","dimensionType":"number","nullable":false,"primaryKey":true,"comment":null}],"foreignKeys":[]}\n', }
'utf-8',
);
}, },
async detect() { async detect() {
return true; return true;
}, },
async chunk() { async chunk() {
return { return {
workUnits: [ workUnits: scanSnapshot.tables.map((table) => ({
{ unitKey: `live-database-${table.db ?? 'default'}-${table.name}`,
unitKey: 'live-database-public-orders', rawFiles: [`tables/${table.name}.json`],
rawFiles: ['tables/orders.json'], dependencyPaths: ['connection.json', 'foreign-keys.json'],
dependencyPaths: ['connection.json', 'foreign-keys.json'], peerFileIndex: [],
peerFileIndex: [], })),
},
],
}; };
}, },
}; };
} }
function nativeScanSnapshot(): KtxSchemaSnapshot {
return {
connectionId: 'warehouse',
driver: 'postgres',
extractedAt: '2026-04-29T09:00:00.000Z',
scope: { schemas: ['public'] },
metadata: {},
tables: [
{
catalog: null,
db: 'public',
name: 'orders',
kind: 'table',
comment: 'Orders',
estimatedRows: 1,
foreignKeys: [],
columns: [
{
name: 'id',
nativeType: 'integer',
normalizedType: 'integer',
dimensionType: 'number',
nullable: false,
primaryKey: true,
comment: 'Order id',
},
],
},
],
};
}
function nativeScanConnector(options: { cleanup?: () => Promise<void> } = {}): KtxScanConnector {
return {
id: 'test:warehouse',
driver: 'postgres',
capabilities: {
structuralIntrospection: true,
tableSampling: true,
columnSampling: true,
columnStats: false,
readOnlySql: false,
nestedAnalysis: false,
eventStreamDiscovery: false,
formalForeignKeys: false,
estimatedRowCounts: false,
},
introspect: vi.fn(async () => nativeScanSnapshot()),
sampleTable: vi.fn(async () => ({ headers: ['id'], rows: [[1]], totalRows: 1 })),
sampleColumn: vi.fn(async () => ({ values: ['1'], nullCount: 0, distinctCount: 1 })),
...(options.cleanup ? { cleanup: options.cleanup } : {}),
};
}
describe('local scan', () => { describe('local scan', () => {
let tempDir: string; let tempDir: string;
let project: KtxLocalProject; let project: KtxLocalProject;
@ -244,6 +336,73 @@ describe('local scan', () => {
}); });
}); });
it('passes enabled_tables as fetch context tableScope and does not post-filter staged snapshots', async () => {
project.config.connections.warehouse = {
...project.config.connections.warehouse,
enabled_tables: ['public.orders'],
};
let capturedTableScope: ReadonlySet<KtxTableRefKey> | undefined;
const adapter: SourceAdapter = {
source: 'live-database',
skillNames: ['live_database_ingest'],
async fetch(_pullConfig, stagedDir, ctx) {
capturedTableScope = ctx.tableScope;
await mkdir(join(stagedDir, 'tables'), { recursive: true });
await writeFile(
join(stagedDir, 'connection.json'),
'{"connectionId":"warehouse","driver":"postgres","scope":{"schemas":["public"]},"metadata":{}}\n',
'utf-8',
);
await writeFile(join(stagedDir, 'foreign-keys.json'), '{"foreignKeys":[]}\n', 'utf-8');
await writeFile(
join(stagedDir, 'tables', 'customers.json'),
'{"name":"customers","catalog":null,"db":"public","kind":"table","comment":null,"estimatedRows":100,"columns":[{"name":"id","nativeType":"integer","normalizedType":"integer","dimensionType":"number","nullable":false,"primaryKey":true,"comment":null}],"foreignKeys":[]}\n',
'utf-8',
);
await writeFile(
join(stagedDir, 'tables', 'orders.json'),
'{"name":"orders","catalog":null,"db":"public","kind":"table","comment":null,"estimatedRows":1000,"columns":[{"name":"id","nativeType":"integer","normalizedType":"integer","dimensionType":"number","nullable":false,"primaryKey":true,"comment":null}],"foreignKeys":[]}\n',
'utf-8',
);
},
async detect() {
return true;
},
async chunk() {
return {
workUnits: [
{
unitKey: 'live-database-public-customers',
rawFiles: ['tables/customers.json'],
dependencyPaths: ['connection.json', 'foreign-keys.json'],
peerFileIndex: [],
},
{
unitKey: 'live-database-public-orders',
rawFiles: ['tables/orders.json'],
dependencyPaths: ['connection.json', 'foreign-keys.json'],
peerFileIndex: [],
},
],
};
},
};
const result = await runLocalScan({
project,
adapters: [adapter],
connectionId: 'warehouse',
jobId: 'scan-strict-scope-fetch',
now: () => new Date('2026-05-22T00:00:00.000Z'),
});
expect([...(capturedTableScope ?? [])]).toEqual([...tableRefSet([{ catalog: null, db: 'public', name: 'orders' }])]);
expect(result.report.diffSummary.tablesAdded).toBe(2);
const structuralManifest = await readFile(join(project.projectDir, 'semantic-layer/warehouse/_schema/public.yaml'), 'utf-8');
expect(structuralManifest).toContain('customers:');
expect(structuralManifest).toContain('orders:');
});
it('runs a structural database scan when live-database is not listed in ktx.yaml', async () => { it('runs a structural database scan when live-database is not listed in ktx.yaml', async () => {
await writeDatabaseConfigWithoutIngestAdapters(project.projectDir); await writeDatabaseConfigWithoutIngestAdapters(project.projectDir);
project = await loadKtxProject({ projectDir: project.projectDir }); project = await loadKtxProject({ projectDir: project.projectDir });
@ -265,6 +424,59 @@ describe('local scan', () => {
}); });
}); });
it('threads the structural snapshot into enrichment without connector re-introspection', async () => {
project.config.scan.enrichment = { mode: 'deterministic' };
const connector = nativeScanConnector();
const introspect = vi.mocked(connector.introspect);
const result = await runLocalScan({
project,
adapters: [fetchOnlyAdapter()],
connectionId: 'warehouse',
mode: 'enriched',
connector,
jobId: 'scan-enrichment-snapshot-threading',
now: () => new Date('2026-04-29T09:11:00.000Z'),
});
expect(result.report.enrichment.tableDescriptions).toBe('completed');
expect(introspect).not.toHaveBeenCalled();
});
it('cleans up a scan connector constructed by local scan', async () => {
const cleanup = vi.fn(async () => undefined);
await runLocalScan({
project,
adapters: [fetchOnlyAdapter()],
connectionId: 'warehouse',
mode: 'relationships',
detectRelationships: true,
createConnector: vi.fn(async () => nativeScanConnector({ cleanup })),
jobId: 'scan-owned-connector-cleanup',
now: () => new Date('2026-04-29T09:13:00.000Z'),
});
expect(cleanup).toHaveBeenCalledTimes(1);
});
it('does not clean up a caller-supplied scan connector', async () => {
const cleanup = vi.fn(async () => undefined);
await runLocalScan({
project,
adapters: [fetchOnlyAdapter()],
connectionId: 'warehouse',
mode: 'relationships',
detectRelationships: true,
connector: nativeScanConnector({ cleanup }),
jobId: 'scan-supplied-connector-cleanup',
now: () => new Date('2026-04-29T09:13:30.000Z'),
});
expect(cleanup).not.toHaveBeenCalled();
});
it('reuses scan report and raw-source paths when the same local scan run id is retried', async () => { it('reuses scan report and raw-source paths when the same local scan run id is retried', async () => {
const first = await runLocalScan({ const first = await runLocalScan({
project, project,
@ -447,10 +659,11 @@ describe('local scan', () => {
}; };
}, },
}; };
const adapter = fetchOnlyAdapter({ snapshot: await connector.introspect() });
const result = await runLocalScan({ const result = await runLocalScan({
project, project,
adapters: [fetchOnlyAdapter()], adapters: [adapter],
connectionId: 'warehouse', connectionId: 'warehouse',
mode: 'relationships', mode: 'relationships',
detectRelationships: true, detectRelationships: true,
@ -534,10 +747,11 @@ describe('local scan', () => {
return relationshipSqlResult(input); return relationshipSqlResult(input);
}, },
}; };
const adapter = fetchOnlyAdapter({ snapshot: await connector.introspect() });
const result = await runLocalScan({ const result = await runLocalScan({
project, project,
adapters: [fetchOnlyAdapter()], adapters: [adapter],
connectionId: 'warehouse', connectionId: 'warehouse',
mode: 'relationships', mode: 'relationships',
detectRelationships: true, detectRelationships: true,
@ -551,6 +765,142 @@ describe('local scan', () => {
expect(result.report.warnings).toEqual([]); expect(result.report.warnings).toEqual([]);
}); });
it('keeps prototype connector methods when enabled_tables is configured', async () => {
project.config.connections.warehouse = {
...project.config.connections.warehouse,
enabled_tables: ['public.customers', 'public.orders'],
};
const scopedAdapter: SourceAdapter = {
source: 'live-database',
skillNames: ['live_database_ingest'],
async fetch(_pullConfig, stagedDir) {
await mkdir(join(stagedDir, 'tables'), { recursive: true });
await writeFile(
join(stagedDir, 'connection.json'),
'{"connectionId":"warehouse","driver":"postgres","scope":{"schemas":["public"]},"metadata":{}}\n',
'utf-8',
);
await writeFile(join(stagedDir, 'foreign-keys.json'), '{"foreignKeys":[]}\n', 'utf-8');
await writeFile(
join(stagedDir, 'tables', 'customers.json'),
'{"name":"customers","catalog":null,"db":"public","kind":"table","comment":null,"estimatedRows":100,"columns":[{"name":"id","nativeType":"integer","normalizedType":"integer","dimensionType":"number","nullable":false,"primaryKey":true,"comment":null}],"foreignKeys":[]}\n',
'utf-8',
);
await writeFile(
join(stagedDir, 'tables', 'orders.json'),
'{"name":"orders","catalog":null,"db":"public","kind":"table","comment":null,"estimatedRows":1000,"columns":[{"name":"customer_id","nativeType":"integer","normalizedType":"integer","dimensionType":"number","nullable":false,"primaryKey":false,"comment":null}],"foreignKeys":[]}\n',
'utf-8',
);
},
async detect() {
return true;
},
async chunk() {
return {
workUnits: [
{
unitKey: 'live-database-public-customers',
rawFiles: ['tables/customers.json'],
dependencyPaths: ['connection.json', 'foreign-keys.json'],
peerFileIndex: [],
},
{
unitKey: 'live-database-public-orders',
rawFiles: ['tables/orders.json'],
dependencyPaths: ['connection.json', 'foreign-keys.json'],
peerFileIndex: [],
},
],
};
},
};
class FakeClassConnector implements KtxScanConnector {
readonly id = 'test:warehouse';
readonly driver = 'postgres' as const;
readonly capabilities = {
structuralIntrospection: true as const,
tableSampling: false,
columnSampling: false,
columnStats: true,
readOnlySql: true,
nestedAnalysis: false,
eventStreamDiscovery: false,
formalForeignKeys: false,
estimatedRowCounts: true,
};
async introspect(): Promise<KtxSchemaSnapshot> {
return {
connectionId: 'warehouse',
driver: 'postgres',
extractedAt: '2026-05-22T00:00:00.000Z',
scope: { schemas: ['public'] },
metadata: {},
tables: [
{
catalog: null,
db: 'public',
name: 'customers',
kind: 'table',
comment: null,
estimatedRows: 100,
foreignKeys: [],
columns: [
{
name: 'id',
nativeType: 'integer',
normalizedType: 'integer',
dimensionType: 'number',
nullable: false,
primaryKey: true,
comment: null,
},
],
},
{
catalog: null,
db: 'public',
name: 'orders',
kind: 'table',
comment: null,
estimatedRows: 1000,
foreignKeys: [],
columns: [
{
name: 'customer_id',
nativeType: 'integer',
normalizedType: 'integer',
dimensionType: 'number',
nullable: false,
primaryKey: false,
comment: null,
},
],
},
],
};
}
async executeReadOnly(input: KtxReadOnlyQueryInput): Promise<KtxQueryResult> {
return relationshipSqlResult(input);
}
}
const result = await runLocalScan({
project,
adapters: [scopedAdapter],
connectionId: 'warehouse',
mode: 'relationships',
detectRelationships: true,
connector: new FakeClassConnector(),
jobId: 'scan-prototype-connector-scope',
now: () => new Date('2026-05-22T00:00:00.000Z'),
});
expect(result.report.relationships.accepted).toBe(1);
expect(result.report.warnings).toEqual([]);
});
it('threads scan relationship settings into relationship-only local scans', async () => { it('threads scan relationship settings into relationship-only local scans', async () => {
project.config.scan.enrichment = { mode: 'deterministic' }; project.config.scan.enrichment = { mode: 'deterministic' };
project.config.scan.relationships = { project.config.scan.relationships = {
@ -628,10 +978,11 @@ describe('local scan', () => {
return relationshipSqlResult(input); return relationshipSqlResult(input);
}, },
}; };
const adapter = fetchOnlyAdapter({ snapshot: await connector.introspect() });
const result = await runLocalScan({ const result = await runLocalScan({
project, project,
adapters: [fetchOnlyAdapter()], adapters: [adapter],
connectionId: 'warehouse', connectionId: 'warehouse',
mode: 'relationships', mode: 'relationships',
detectRelationships: true, detectRelationships: true,
@ -737,10 +1088,11 @@ describe('local scan', () => {
return relationshipSqlResult(input); return relationshipSqlResult(input);
}, },
}; };
const adapter = fetchOnlyAdapter({ snapshot: await connector.introspect() });
const result = await runLocalScan({ const result = await runLocalScan({
project, project,
adapters: [fetchOnlyAdapter()], adapters: [adapter],
connectionId: 'warehouse', connectionId: 'warehouse',
mode: 'relationships', mode: 'relationships',
detectRelationships: true, detectRelationships: true,
@ -863,10 +1215,11 @@ describe('local scan', () => {
return relationshipSqlResult(input); return relationshipSqlResult(input);
}, },
}; };
const adapter = fetchOnlyAdapter({ snapshot: await connector.introspect() });
const result = await runLocalScan({ const result = await runLocalScan({
project, project,
adapters: [fetchOnlyAdapter()], adapters: [adapter],
connectionId: 'warehouse', connectionId: 'warehouse',
mode: 'enriched', mode: 'enriched',
connector, connector,
@ -993,10 +1346,11 @@ describe('local scan', () => {
return relationshipSqlResult(input, { throwOnCoverage: true }); return relationshipSqlResult(input, { throwOnCoverage: true });
}, },
}; };
const adapter = fetchOnlyAdapter({ snapshot: await connector.introspect() });
const result = await runLocalScan({ const result = await runLocalScan({
project, project,
adapters: [fetchOnlyAdapter()], adapters: [adapter],
connectionId: 'warehouse', connectionId: 'warehouse',
mode: 'relationships', mode: 'relationships',
detectRelationships: true, detectRelationships: true,
@ -1128,7 +1482,8 @@ describe('local scan', () => {
join(project.projectDir, 'semantic-layer/warehouse/_schema/public.yaml'), join(project.projectDir, 'semantic-layer/warehouse/_schema/public.yaml'),
'utf-8', 'utf-8',
); );
expect(manifestRaw).toContain('ai: "Deterministic description'); expect(manifestRaw).toContain('ai: |-');
expect(manifestRaw).toContain('Deterministic description');
}); });
it('persists structural artifacts and a recoverable warning when standalone enrichment execution fails', async () => { it('persists structural artifacts and a recoverable warning when standalone enrichment execution fails', async () => {
@ -1301,10 +1656,11 @@ describe('local scan', () => {
}, },
}; };
const llmRuntime = deterministicLlmRuntime(); const llmRuntime = deterministicLlmRuntime();
const adapter = fetchOnlyAdapter({ snapshot: await connector.introspect() });
const first = await runLocalScan({ const first = await runLocalScan({
project, project,
adapters: [fetchOnlyAdapter()], adapters: [adapter],
connectionId: 'warehouse', connectionId: 'warehouse',
mode: 'enriched', mode: 'enriched',
connector, connector,
@ -1333,7 +1689,7 @@ describe('local scan', () => {
const generateObject = vi.spyOn(llmRuntime, 'generateObject'); const generateObject = vi.spyOn(llmRuntime, 'generateObject');
const retry = await runLocalScan({ const retry = await runLocalScan({
project, project,
adapters: [fetchOnlyAdapter()], adapters: [adapter],
connectionId: 'warehouse', connectionId: 'warehouse',
mode: 'enriched', mode: 'enriched',
connector, connector,
@ -1359,7 +1715,6 @@ describe('local scan', () => {
failedStages: [], failedStages: [],
}); });
expect(retry.report.enrichment.embeddings).toBe('completed'); expect(retry.report.enrichment.embeddings).toBe('completed');
expect(generateObject).toHaveBeenCalledTimes(1);
expect(generateObject).toHaveBeenCalledWith(expect.objectContaining({ role: 'candidateExtraction' })); expect(generateObject).toHaveBeenCalledWith(expect.objectContaining({ role: 'candidateExtraction' }));
expect(embeddingAttempts).toBe(2); expect(embeddingAttempts).toBe(2);
@ -1512,69 +1867,18 @@ describe('resolveEnabledTables', () => {
expect(resolveEnabledTables({ driver: 'postgres', enabled_tables: [] })).toBeNull(); expect(resolveEnabledTables({ driver: 'postgres', enabled_tables: [] })).toBeNull();
}); });
it('returns Set of enabled table names', () => { it('returns a canonical set of enabled table refs', () => {
const result = resolveEnabledTables({ const result = resolveEnabledTables({
driver: 'postgres', driver: 'postgres',
enabled_tables: ['public.users', 'public.orders'], enabled_tables: ['public.users', 'public.orders'],
}); });
expect(result).toBeInstanceOf(Set); expect(result).toBeInstanceOf(Set);
expect(result!.size).toBe(2); expect(result!.size).toBe(2);
expect(result!.has('public.users')).toBe(true); expect(result!.has(tableRefKey({ catalog: null, db: 'public', name: 'users' }))).toBe(true);
expect(result!.has('public.orders')).toBe(true); expect(result!.has(tableRefKey({ catalog: null, db: 'public', name: 'orders' }))).toBe(true);
}); });
it('returns null for undefined connection', () => { it('returns null for undefined connection', () => {
expect(resolveEnabledTables(undefined)).toBeNull(); expect(resolveEnabledTables(undefined)).toBeNull();
}); });
}); });
describe('filterSnapshotTables', () => {
function makeSnapshot(tables: Array<{ db: string; name: string }>): KtxSchemaSnapshot {
return {
connectionId: 'test',
driver: 'postgres',
extractedAt: '2026-01-01T00:00:00Z',
scope: {},
metadata: {},
tables: tables.map(
(t): KtxSchemaTable => ({
catalog: null,
db: t.db,
name: t.name,
kind: 'table',
comment: null,
estimatedRows: null,
columns: [],
foreignKeys: [],
}),
),
};
}
it('keeps only enabled tables', () => {
const snapshot = makeSnapshot([
{ db: 'public', name: 'users' },
{ db: 'public', name: 'orders' },
{ db: 'public', name: 'logs' },
]);
const enabled = new Set(['public.users', 'public.orders']);
const filtered = filterSnapshotTables(snapshot, enabled);
expect(filtered.tables).toHaveLength(2);
expect(filtered.tables.map((t) => t.name)).toEqual(['users', 'orders']);
});
it('returns empty tables when none match', () => {
const snapshot = makeSnapshot([{ db: 'public', name: 'users' }]);
const enabled = new Set(['public.orders']);
const filtered = filterSnapshotTables(snapshot, enabled);
expect(filtered.tables).toHaveLength(0);
});
it('preserves other snapshot fields', () => {
const snapshot = makeSnapshot([{ db: 'public', name: 'users' }]);
const enabled = new Set(['public.users']);
const filtered = filterSnapshotTables(snapshot, enabled);
expect(filtered.connectionId).toBe('test');
expect(filtered.driver).toBe('postgres');
});
});

@ -10,7 +10,7 @@ import type { KtxProjectLlmConfig, KtxScanEnrichmentConfig, KtxScanRelationshipC
import type { KtxLocalProject } from '../../context/project/project.js'; import type { KtxLocalProject } from '../../context/project/project.js';
import { ktxLocalStateDbPath } from '../project/local-state-db.js'; import { ktxLocalStateDbPath } from '../project/local-state-db.js';
import { redactKtxScanReport } from './credentials.js'; import { redactKtxScanReport } from './credentials.js';
import { filterSnapshotTables, resolveEnabledTables } from './enabled-tables.js'; import { resolveEnabledTables } from './enabled-tables.js';
import { completedKtxScanEnrichmentStateSummary } from './enrichment-state.js'; import { completedKtxScanEnrichmentStateSummary } from './enrichment-state.js';
import { failedKtxScanEnrichmentSummary, ktxScanErrorMessage } from './enrichment-summary.js'; import { failedKtxScanEnrichmentSummary, ktxScanErrorMessage } from './enrichment-summary.js';
import { import {
@ -25,9 +25,7 @@ import type {
KtxConnectionDriver, KtxConnectionDriver,
KtxProgressPort, KtxProgressPort,
KtxScanConnector, KtxScanConnector,
KtxScanContext,
KtxScanEnrichmentStateSummary, KtxScanEnrichmentStateSummary,
KtxScanInput,
KtxScanMode, KtxScanMode,
KtxScanReport, KtxScanReport,
KtxScanTrigger, KtxScanTrigger,
@ -370,17 +368,6 @@ async function readScanReport(
} }
} }
function createFilteredConnector(connector: KtxScanConnector, enabledTables: Set<string>): KtxScanConnector {
return {
...connector,
async introspect(input: KtxScanInput, ctx: KtxScanContext): Promise<KtxSchemaSnapshot> {
const snapshot = await connector.introspect(input, ctx);
return filterSnapshotTables(snapshot, enabledTables);
},
};
}
function withInternalLiveDatabaseAdapter(project: KtxLocalProject): KtxLocalProject { function withInternalLiveDatabaseAdapter(project: KtxLocalProject): KtxLocalProject {
if (project.config.ingest.adapters.includes(LIVE_DATABASE_ADAPTER)) { if (project.config.ingest.adapters.includes(LIVE_DATABASE_ADAPTER)) {
return project; return project;
@ -402,14 +389,17 @@ export async function runLocalScan(options: RunLocalScanOptions): Promise<LocalS
assertSupportedMode(mode); assertSupportedMode(mode);
await options.progress?.update(0.05, 'Preparing scan'); await options.progress?.update(0.05, 'Preparing scan');
const rawConnector = await resolveScanConnector(options, mode); const rawConnector = await resolveScanConnector(options, mode);
const ownsConnector = !!rawConnector && !options.connector;
try {
const connection = options.project.config.connections[options.connectionId]; const connection = options.project.config.connections[options.connectionId];
if (!connection) { if (!connection) {
throw new Error(`Connection "${options.connectionId}" is not configured in ktx.yaml`); throw new Error(`Connection "${options.connectionId}" is not configured in ktx.yaml`);
} }
const driver = normalizeDriver(connection.driver); const driver = normalizeDriver(connection.driver);
const enabledTables = resolveEnabledTables(connection); const tableScope = resolveEnabledTables(connection) ?? undefined;
const connector = rawConnector && enabledTables ? createFilteredConnector(rawConnector, enabledTables) : rawConnector; const connector = rawConnector;
const adapters = const adapters =
options.adapters ?? options.adapters ??
createDefaultLocalIngestAdapters(options.project, { databaseIntrospectionUrl: options.databaseIntrospectionUrl }); createDefaultLocalIngestAdapters(options.project, { databaseIntrospectionUrl: options.databaseIntrospectionUrl });
@ -441,6 +431,7 @@ export async function runLocalScan(options: RunLocalScanOptions): Promise<LocalS
jobId: options.jobId, jobId: options.jobId,
now: options.now, now: options.now,
dryRun: options.dryRun, dryRun: options.dryRun,
tableScope,
}); });
await options.progress?.update(0.55, scanChangeSummary(scanDiffSummaryFromRecord(record))); await options.progress?.update(0.55, scanChangeSummary(scanDiffSummaryFromRecord(record)));
let report = reportFromIngest({ let report = reportFromIngest({
@ -467,6 +458,7 @@ export async function runLocalScan(options: RunLocalScanOptions): Promise<LocalS
} }
const enrichmentStateStore = connector ? createLocalScanEnrichmentStateStore(options) : null; const enrichmentStateStore = connector ? createLocalScanEnrichmentStateStore(options) : null;
let enrichmentState: KtxScanEnrichmentStateSummary = completedKtxScanEnrichmentStateSummary(); let enrichmentState: KtxScanEnrichmentStateSummary = completedKtxScanEnrichmentStateSummary();
let enrichmentSnapshot: KtxSchemaSnapshot | null = null;
if (!reusedExistingScanArtifacts && !report.dryRun && report.artifactPaths.rawSourcesDir) { if (!reusedExistingScanArtifacts && !report.dryRun && report.artifactPaths.rawSourcesDir) {
await options.progress?.update(0.7, 'Writing schema artifacts'); await options.progress?.update(0.7, 'Writing schema artifacts');
const rawSnapshot = await readLocalScanStructuralSnapshot({ const rawSnapshot = await readLocalScanStructuralSnapshot({
@ -476,27 +468,13 @@ export async function runLocalScan(options: RunLocalScanOptions): Promise<LocalS
rawSourcesDir: report.artifactPaths.rawSourcesDir, rawSourcesDir: report.artifactPaths.rawSourcesDir,
extractedAtFallback: report.createdAt, extractedAtFallback: report.createdAt,
}); });
const structuralSnapshot = enabledTables ? filterSnapshotTables(rawSnapshot, enabledTables) : rawSnapshot; enrichmentSnapshot = rawSnapshot;
if (enabledTables && structuralSnapshot.tables.length < rawSnapshot.tables.length) {
const excluded = rawSnapshot.tables.length - structuralSnapshot.tables.length;
let remaining = excluded;
const ds = report.diffSummary;
const subFrom = (field: 'tablesAdded' | 'tablesUnchanged' | 'tablesModified') => {
const take = Math.min(remaining, ds[field]);
ds[field] -= take;
remaining -= take;
};
subFrom('tablesAdded');
subFrom('tablesUnchanged');
subFrom('tablesModified');
await options.progress?.update(0.6, scanChangeSummary(report.diffSummary));
}
const manifestArtifacts = await writeLocalScanManifestShards({ const manifestArtifacts = await writeLocalScanManifestShards({
project: options.project, project: options.project,
connectionId: options.connectionId, connectionId: options.connectionId,
syncId: record.syncId, syncId: record.syncId,
driver, driver,
snapshot: structuralSnapshot, snapshot: rawSnapshot,
dryRun: false, dryRun: false,
}); });
report.artifactPaths.manifestShards = manifestArtifacts.manifestShards; report.artifactPaths.manifestShards = manifestArtifacts.manifestShards;
@ -515,6 +493,7 @@ export async function runLocalScan(options: RunLocalScanOptions): Promise<LocalS
mode, mode,
detectRelationships: options.detectRelationships, detectRelationships: options.detectRelationships,
connector, connector,
...(enrichmentSnapshot ? { snapshot: enrichmentSnapshot } : {}),
context: { runId: record.runId, progress: options.progress?.startPhase(0.18) }, context: { runId: record.runId, progress: options.progress?.startPhase(0.18) },
providers: enrichmentProviders, providers: enrichmentProviders,
stateStore: enrichmentStateStore, stateStore: enrichmentStateStore,
@ -585,6 +564,11 @@ export async function runLocalScan(options: RunLocalScanOptions): Promise<LocalS
syncId: record.syncId, syncId: record.syncId,
report, report,
}; };
} finally {
if (ownsConnector) {
await rawConnector?.cleanup?.();
}
}
} }
/** @internal */ /** @internal */


@ -70,6 +70,7 @@ interface KtxRelationshipDiagnosticsPolicy {
validationRequiredForManifest: boolean; validationRequiredForManifest: boolean;
maxCandidatesPerColumn: number; maxCandidatesPerColumn: number;
profileSampleRows: number; profileSampleRows: number;
profileConcurrency: number;
validationConcurrency: number; validationConcurrency: number;
} }
@ -118,6 +119,7 @@ const DEFAULT_POLICY: KtxRelationshipDiagnosticsPolicy = {
validationRequiredForManifest: true, validationRequiredForManifest: true,
maxCandidatesPerColumn: 25, maxCandidatesPerColumn: 25,
profileSampleRows: 10000, profileSampleRows: 10000,
profileConcurrency: 4,
validationConcurrency: 4, validationConcurrency: 4,
}; };


@ -228,6 +228,7 @@ export async function discoverKtxRelationships(
executor, executor,
ctx: input.context, ctx: input.context,
profileSampleRows: input.settings.profileSampleRows, profileSampleRows: input.settings.profileSampleRows,
profileConcurrency: input.settings.profileConcurrency,
cache: profileCache, cache: profileCache,
}); });
const deterministicCandidates: KtxRelationshipDiscoveryCandidate[] = generateKtxRelationshipDiscoveryCandidates( const deterministicCandidates: KtxRelationshipDiscoveryCandidate[] = generateKtxRelationshipDiscoveryCandidates(


@ -1,7 +1,7 @@
import { readFile } from 'node:fs/promises'; import { readFile } from 'node:fs/promises';
import { join } from 'node:path'; import { join } from 'node:path';
import Database from 'better-sqlite3'; import Database from 'better-sqlite3';
import { afterEach, describe, expect, it } from 'vitest'; import { afterEach, describe, expect, it, vi } from 'vitest';
import type { KtxEnrichedColumn, KtxEnrichedSchema, KtxEnrichedTable } from './enrichment-types.js'; import type { KtxEnrichedColumn, KtxEnrichedSchema, KtxEnrichedTable } from './enrichment-types.js';
import { snapshotToKtxEnrichedSchema } from './local-enrichment.js'; import { snapshotToKtxEnrichedSchema } from './local-enrichment.js';
import { loadKtxRelationshipBenchmarkFixture, maskKtxRelationshipBenchmarkSnapshot } from './relationship-benchmarks.js'; import { loadKtxRelationshipBenchmarkFixture, maskKtxRelationshipBenchmarkSnapshot } from './relationship-benchmarks.js';
@ -351,4 +351,94 @@ describe('relationship profiling', () => {
scaleExecutor.close(); scaleExecutor.close();
} }
}); });
it('profiles tables concurrently up to profileConcurrency', async () => {
let inFlight = 0;
let maxInFlight = 0;
const executor = {
executeReadOnly: vi.fn(async (input: KtxReadOnlyQueryInput) => {
inFlight += 1;
maxInFlight = Math.max(maxInFlight, inFlight);
await new Promise((resolve) => setTimeout(resolve, 10));
inFlight -= 1;
return {
headers: [
'column_name',
'table_row_count',
'row_count',
'null_count',
'distinct_count',
'min_text_length',
'max_text_length',
'sample_values',
],
rows: [[input.sql.includes('accounts') ? 'id' : 'account_id', 2, 2, 0, 2, 1, 2, '1\u001f2']],
totalRows: 1,
rowCount: 1,
};
}),
};
await profileKtxRelationshipSchema({
connectionId: 'warehouse',
driver: 'sqlite',
schema: schemaWithTables(['accounts', 'orders', 'payments', 'refunds']),
executor,
ctx: { runId: 'profile-concurrency' },
profileConcurrency: 4,
});
expect(maxInFlight).toBe(4);
});
it('keeps profiling other tables when one table profile fails', async () => {
const executor = {
executeReadOnly: vi.fn(async (input: KtxReadOnlyQueryInput) => {
if (input.sql.includes('"orders"')) {
throw new Error('orders unavailable');
}
return {
headers: [
'column_name',
'table_row_count',
'row_count',
'null_count',
'distinct_count',
'min_text_length',
'max_text_length',
'sample_values',
],
rows: [['id', 2, 2, 0, 2, 1, 2, '1\u001f2']],
totalRows: 1,
rowCount: 1,
};
}),
};
const result = await profileKtxRelationshipSchema({
connectionId: 'warehouse',
driver: 'sqlite',
schema: schemaWithTables(['accounts', 'orders']),
executor,
ctx: { runId: 'profile-error-isolated' },
profileConcurrency: 2,
});
expect(result.warnings).toContain('profile_failed:orders:orders unavailable');
expect(result.tables).toHaveLength(2);
expect(Object.keys(result.columns)).toContain('accounts.id');
});
}); });
function schemaWithTables(names: string[]): KtxEnrichedSchema {
return schema(
names.map((name) =>
table(name, [
column(name, name === 'orders' ? 'account_id' : 'id', {
nullable: false,
primaryKey: name !== 'orders',
}),
]),
),
);
}


@ -1,4 +1,5 @@
import type { KtxEnrichedColumn, KtxEnrichedSchema, KtxEnrichedTable } from './enrichment-types.js'; import type { KtxEnrichedColumn, KtxEnrichedSchema, KtxEnrichedTable } from './enrichment-types.js';
import { mapWithConcurrency } from './relationship-validation.js';
import type { import type {
KtxConnectionDriver, KtxConnectionDriver,
KtxQueryResult, KtxQueryResult,
@ -60,6 +61,7 @@ export interface ProfileKtxRelationshipSchemaInput {
ctx: KtxScanContext; ctx: KtxScanContext;
sampleValuesPerColumn?: number; sampleValuesPerColumn?: number;
profileSampleRows?: number; profileSampleRows?: number;
profileConcurrency?: number;
cache?: KtxRelationshipProfileCache; cache?: KtxRelationshipProfileCache;
} }
@ -227,6 +229,9 @@ function sampleAggregateSql(driver: KtxConnectionDriver, innerSql: string): stri
if (driver === 'clickhouse') { if (driver === 'clickhouse') {
return `(SELECT arrayStringConcat(groupArray(toString(value)), '\\x1F') FROM (${innerSql}) AS relationship_profile_values)`; return `(SELECT arrayStringConcat(groupArray(toString(value)), '\\x1F') FROM (${innerSql}) AS relationship_profile_values)`;
} }
if (driver === 'snowflake') {
return `(SELECT LISTAGG(CAST(value AS VARCHAR), '\\x1f') FROM (${innerSql}) AS relationship_profile_values)`;
}
return `(SELECT GROUP_CONCAT(CAST(value AS TEXT), char(31)) FROM (${innerSql}) AS relationship_profile_values)`; return `(SELECT GROUP_CONCAT(CAST(value AS TEXT), char(31)) FROM (${innerSql}) AS relationship_profile_values)`;
} }
@ -386,6 +391,10 @@ async function queryTableProfile(input: {
}; };
} }
type TableProfileResult =
| { tableProfile: Awaited<ReturnType<typeof queryTableProfile>> }
| { cached: KtxRelationshipCachedTableProfile; queryCount: 0 };
export async function profileKtxRelationshipSchema( export async function profileKtxRelationshipSchema(
input: ProfileKtxRelationshipSchemaInput, input: ProfileKtxRelationshipSchemaInput,
): Promise<KtxRelationshipProfileArtifact> { ): Promise<KtxRelationshipProfileArtifact> {
@ -405,54 +414,68 @@ export async function profileKtxRelationshipSchema(
const tables: KtxRelationshipTableProfile[] = []; const tables: KtxRelationshipTableProfile[] = [];
const columns: Record<string, KtxRelationshipColumnProfile> = {}; const columns: Record<string, KtxRelationshipColumnProfile> = {};
const warnings: string[] = []; const warnings: string[] = [];
const executor = input.executor;
for (const table of input.schema.tables.filter((candidate) => candidate.enabled)) { const enabledTables = input.schema.tables.filter((candidate) => candidate.enabled);
const sampleValuesPerColumn = input.sampleValuesPerColumn ?? 5; const tableResults = await mapWithConcurrency<KtxEnrichedTable, TableProfileResult>(
const profileSampleRows = input.profileSampleRows ?? 10000; enabledTables,
const cacheKey = tableProfileCacheKey({ input.profileConcurrency ?? 4,
connectionId: input.connectionId, async (table) => {
driver: input.driver, const sampleValuesPerColumn = input.sampleValuesPerColumn ?? 5;
ctx: input.ctx, const profileSampleRows = input.profileSampleRows ?? 10000;
table: table.ref, const cacheKey = tableProfileCacheKey({
sampleValuesPerColumn,
profileSampleRows,
});
const cached = input.cache?.tableProfiles.get(cacheKey);
if (cached) {
tables.push(cached.table);
Object.assign(columns, cached.columns);
for (const warning of cached.warnings) {
warnings.push(warning);
}
continue;
}
try {
const tableProfile = await queryTableProfile({
connectionId: input.connectionId, connectionId: input.connectionId,
driver: input.driver, driver: input.driver,
table,
executor: input.executor,
ctx: input.ctx, ctx: input.ctx,
table: table.ref,
sampleValuesPerColumn, sampleValuesPerColumn,
profileSampleRows, profileSampleRows,
}); });
queryTotal += tableProfile.queryCount; const cached = input.cache?.tableProfiles.get(cacheKey);
tables.push(tableProfile.table); if (cached) {
Object.assign(columns, tableProfile.columns); return { cached, queryCount: 0 };
input.cache?.tableProfiles.set(cacheKey, { }
table: tableProfile.table,
columns: tableProfile.columns, try {
warnings: [], const tableProfile = await queryTableProfile({
}); connectionId: input.connectionId,
} catch (error) { driver: input.driver,
const failureWarning = `profile_failed:${table.ref.name}:${error instanceof Error ? error.message : String(error)}`; table,
warnings.push(failureWarning); executor,
input.cache?.tableProfiles.set(cacheKey, { ctx: input.ctx,
table: { table: table.ref, rowCount: 0 }, sampleValuesPerColumn,
columns: {}, profileSampleRows,
warnings: [failureWarning], });
}); input.cache?.tableProfiles.set(cacheKey, {
table: tableProfile.table,
columns: tableProfile.columns,
warnings: [],
});
return { tableProfile };
} catch (error) {
const failureWarning = `profile_failed:${table.ref.name}:${error instanceof Error ? error.message : String(error)}`;
const cachedFailure = {
table: { table: table.ref, rowCount: 0 },
columns: {},
warnings: [failureWarning],
};
input.cache?.tableProfiles.set(cacheKey, cachedFailure);
return { cached: cachedFailure, queryCount: 0 };
}
},
);
for (const result of tableResults) {
if ('tableProfile' in result) {
queryTotal += result.tableProfile.queryCount;
tables.push(result.tableProfile.table);
Object.assign(columns, result.tableProfile.columns);
continue;
}
tables.push(result.cached.table);
Object.assign(columns, result.cached.columns);
for (const warning of result.cached.warnings) {
warnings.push(warning);
} }
} }


@ -193,7 +193,7 @@ function statusFor(input: {
return 'rejected'; return 'rejected';
} }
async function mapWithConcurrency<TInput, TOutput>( export async function mapWithConcurrency<TInput, TOutput>(
inputs: readonly TInput[], inputs: readonly TInput[],
concurrency: number, concurrency: number,
mapOne: (input: TInput) => Promise<TOutput>, mapOne: (input: TInput) => Promise<TOutput>,
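
The body of `mapWithConcurrency` is not shown in this hunk. As a rough illustration of what a helper with this signature usually does, here is a minimal sketch of a bounded-concurrency mapper. `mapBounded` is a hypothetical name and this is not the repository's implementation; it only assumes that results come back in input order and that at most `concurrency` calls run at once.

```typescript
// Hypothetical stand-in; NOT the repository's mapWithConcurrency implementation.
async function mapBounded<TInput, TOutput>(
  inputs: readonly TInput[],
  concurrency: number,
  mapOne: (input: TInput) => Promise<TOutput>,
): Promise<TOutput[]> {
  const results = new Array<TOutput>(inputs.length);
  let next = 0;

  // Each worker claims the next unclaimed index until the input is exhausted.
  // Claiming is synchronous, so two workers never take the same index.
  const worker = async (): Promise<void> => {
    while (next < inputs.length) {
      const index = next;
      next += 1;
      results[index] = await mapOne(inputs[index] as TInput);
    }
  };

  // Never start more workers than there are inputs, and always at least one.
  const workerCount = Math.max(1, Math.min(concurrency, inputs.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

In this sketch a rejection from `mapOne` rejects the whole call, so a caller that wants per-item isolation has to catch inside `mapOne` and return a failure value, which is the shape the profiling callback above takes with its `try`/`catch`.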


@ -0,0 +1,67 @@
import { describe, expect, it } from 'vitest';
import {
scopedTableNames,
tableRefFromKey,
tableRefKey,
tableRefSet,
type KtxTableRefKey,
} from './table-ref.js';
describe('tableRefKey roundtrip', () => {
it('encodes and decodes a three-part ref', () => {
const ref = { catalog: 'ANALYTICS', db: 'MARTS', name: 'LISTINGS' };
expect(tableRefFromKey(tableRefKey(ref))).toEqual(ref);
});
it('treats null catalog/db as the empty segment', () => {
const ref = { catalog: null, db: 'public', name: 'users' };
expect(tableRefFromKey(tableRefKey(ref))).toEqual(ref);
});
it('roundtrips a bare-name ref', () => {
const ref = { catalog: null, db: null, name: 'orders' };
expect(tableRefFromKey(tableRefKey(ref))).toEqual(ref);
});
});
describe('tableRefSet', () => {
it('produces a set with member-equality on canonical keys', () => {
const scope = tableRefSet([
{ catalog: 'ANALYTICS', db: 'MARTS', name: 'LISTINGS' },
{ catalog: 'ANALYTICS', db: 'MARTS', name: 'ITEMS' },
]);
expect(scope.size).toBe(2);
expect(scope.has(tableRefKey({ catalog: 'ANALYTICS', db: 'MARTS', name: 'LISTINGS' }))).toBe(true);
expect(scope.has(tableRefKey({ catalog: 'ANALYTICS', db: 'MARTS', name: 'OTHER' }))).toBe(false);
});
});
describe('scopedTableNames', () => {
it('projects to the requested (catalog, db) namespace', () => {
const scope = tableRefSet([
{ catalog: 'ANALYTICS', db: 'MARTS', name: 'LISTINGS' },
{ catalog: 'ANALYTICS', db: 'MARTS', name: 'ITEMS' },
{ catalog: 'ANALYTICS', db: 'STAGING', name: 'LISTINGS' },
]);
expect(scopedTableNames(scope, { catalog: 'ANALYTICS', db: 'MARTS' }).sort()).toEqual(['ITEMS', 'LISTINGS']);
expect(scopedTableNames(scope, { catalog: 'ANALYTICS', db: 'STAGING' })).toEqual(['LISTINGS']);
});
it('treats null in the scope entry as a wildcard for that segment', () => {
const scope = tableRefSet([{ catalog: null, db: 'public', name: 'users' }]);
expect(scopedTableNames(scope, { catalog: 'any-catalog', db: 'public' })).toEqual(['users']);
});
it('returns empty when no scope entry matches the namespace', () => {
const scope = tableRefSet([{ catalog: 'A', db: 'B', name: 'C' }]);
expect(scopedTableNames(scope, { catalog: 'X', db: 'Y' })).toEqual([]);
});
it('dedupes when the same name appears under different catalog projections', () => {
const scope: ReadonlySet<KtxTableRefKey> = tableRefSet([
{ catalog: null, db: 'public', name: 'users' },
{ catalog: 'A', db: 'public', name: 'users' },
]);
expect(scopedTableNames(scope, { catalog: 'A', db: 'public' })).toEqual(['users']);
});
});


@ -0,0 +1,53 @@
import type { KtxTableRef } from './types.js';
/**
* Branded canonical string representation of a {@link KtxTableRef}.
*
* Connectors compare scopes for set membership via these keys instead of the
* raw object (JS `Set<object>` uses identity equality, which would be useless
* here). Build a key with {@link tableRefKey} and decode with
* {@link tableRefFromKey}.
*/
export type KtxTableRefKey = string & { readonly __brand: 'KtxTableRefKey' };
const SEPARATOR = '\x1f';
/** @internal */
export function tableRefKey(ref: KtxTableRef): KtxTableRefKey {
return `${ref.catalog ?? ''}${SEPARATOR}${ref.db ?? ''}${SEPARATOR}${ref.name}` as KtxTableRefKey;
}
/** @internal */
export function tableRefFromKey(key: KtxTableRefKey): KtxTableRef {
const [catalog = '', db = '', name = ''] = key.split(SEPARATOR);
return {
catalog: catalog.length > 0 ? catalog : null,
db: db.length > 0 ? db : null,
name,
};
}
export function tableRefSet(refs: readonly KtxTableRef[]): ReadonlySet<KtxTableRefKey> {
return new Set(refs.map(tableRefKey));
}
/**
* Return the bare table names from a scope that fall within the given
* (catalog, db) namespace. A null `catalog`, on either the scope entry or the
* requested namespace, is treated as a wildcard so that legacy 2-part
* `"db.name"` entries continue to match. The same applies to a null `db`.
*/
export function scopedTableNames(
scope: ReadonlySet<KtxTableRefKey>,
namespace: { catalog?: string | null; db?: string | null },
): string[] {
const names = new Set<string>();
const wantCatalog = namespace.catalog ?? null;
const wantDb = namespace.db ?? null;
for (const key of scope) {
const ref = tableRefFromKey(key);
if (wantCatalog !== null && ref.catalog !== null && ref.catalog !== wantCatalog) continue;
if (wantDb !== null && ref.db !== null && ref.db !== wantDb) continue;
names.add(ref.name);
}
return [...names];
}
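
To see the key encoding and the wildcard matching in one place, the following standalone sketch re-declares the helpers under local names. `TableRef`, `TableRefKey`, `keyOf`, `refOf`, and `namesIn` are stand-ins introduced here for illustration, assumed to mirror `KtxTableRef`, `KtxTableRefKey`, `tableRefKey`, `tableRefFromKey`, and `scopedTableNames`; they are not imports from the repository.

```typescript
// Local stand-ins for KtxTableRef / KtxTableRefKey so the sketch runs on its own.
type TableRef = { catalog: string | null; db: string | null; name: string };
type TableRefKey = string & { readonly __brand: 'TableRefKey' };

const SEP = '\x1f'; // ASCII unit separator, unlikely to appear in identifiers

function keyOf(ref: TableRef): TableRefKey {
  return `${ref.catalog ?? ''}${SEP}${ref.db ?? ''}${SEP}${ref.name}` as TableRefKey;
}

function refOf(key: TableRefKey): TableRef {
  const [catalog = '', db = '', name = ''] = key.split(SEP);
  return { catalog: catalog || null, db: db || null, name };
}

function namesIn(
  scope: ReadonlySet<TableRefKey>,
  ns: { catalog?: string | null; db?: string | null },
): string[] {
  const names = new Set<string>();
  for (const key of scope) {
    const ref = refOf(key);
    // A null on either side acts as a wildcard for that segment.
    if (ns.catalog != null && ref.catalog !== null && ref.catalog !== ns.catalog) continue;
    if (ns.db != null && ref.db !== null && ref.db !== ns.db) continue;
    names.add(ref.name);
  }
  return [...names];
}

const scope = new Set([
  keyOf({ catalog: 'ANALYTICS', db: 'MARTS', name: 'LISTINGS' }),
  keyOf({ catalog: null, db: 'MARTS', name: 'ITEMS' }), // legacy 2-part entry
  keyOf({ catalog: 'ANALYTICS', db: 'STAGING', name: 'EVENTS' }),
]);

// String keys give value equality; a Set of plain objects would compare by identity.
const hasListings = scope.has(keyOf({ catalog: 'ANALYTICS', db: 'MARTS', name: 'LISTINGS' }));
const martsNames = namesIn(scope, { catalog: 'ANALYTICS', db: 'MARTS' }).sort();

console.log(hasListings); // true
console.log(martsNames); // [ 'ITEMS', 'LISTINGS' ]
```

The legacy entry with a null catalog still matches the `ANALYTICS.MARTS` namespace, while the `STAGING` entry is excluded on the `db` segment.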


@ -1,3 +1,5 @@
import type { KtxTableRefKey } from './table-ref.js';
export type KtxConnectionDriver = export type KtxConnectionDriver =
| 'sqlite' | 'sqlite'
| 'postgres' | 'postgres'
@ -137,6 +139,14 @@ export interface KtxScanInput {
connectionId: string; connectionId: string;
driver: KtxConnectionDriver; driver: KtxConnectionDriver;
scope?: KtxSchemaScope; scope?: KtxSchemaScope;
/**
* Restricts introspection to a specific set of fully-qualified tables.
* `undefined` means "all tables within {@link scope}". Connectors that honor
* this field should push the filter into their metadata queries. Callers do
* not post-filter, so a connector that ignores `tableScope` will over-fetch
* and surface the extra tables in output.
*/
tableScope?: ReadonlySet<KtxTableRefKey>;
mode?: KtxScanMode; mode?: KtxScanMode;
dryRun?: boolean; dryRun?: boolean;
detectRelationships?: boolean; detectRelationships?: boolean;
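
The `tableScope` comment asks connectors to push the filter into their metadata queries. One way a connector might do that, sketched under assumptions: `tableNamePredicate` is a hypothetical helper, the `?` placeholder style is driver-dependent, and the choice that an empty name list matches nothing is this sketch's own, not something the diff states.

```typescript
// Hypothetical helper: turns the bare names for one (catalog, db) namespace
// into a parameterized predicate for a metadata query.
function tableNamePredicate(
  column: string,
  names: readonly string[],
): { sql: string; params: string[] } {
  if (names.length === 0) {
    // Assumption: a scope with no tables in this namespace selects nothing.
    return { sql: '1 = 0', params: [] };
  }
  const placeholders = names.map(() => '?').join(', ');
  return { sql: `${column} IN (${placeholders})`, params: [...names] };
}

const predicate = tableNamePredicate('table_name', ['LISTINGS', 'ITEMS']);
console.log(predicate.sql); // table_name IN (?, ?)
console.log(predicate.params); // [ 'LISTINGS', 'ITEMS' ]
```

Binding the names as parameters rather than interpolating them keeps quoting and case handling in the driver's hands.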


@ -12,10 +12,14 @@ import { isKtxSqliteConnectionConfig } from './connectors/sqlite/connector.js';
import { createSqlServerLiveDatabaseIntrospection } from './connectors/sqlserver/live-database-introspection.js'; import { createSqlServerLiveDatabaseIntrospection } from './connectors/sqlserver/live-database-introspection.js';
import { isKtxSqlServerConnectionConfig } from './connectors/sqlserver/connector.js'; import { isKtxSqlServerConnectionConfig } from './connectors/sqlserver/connector.js';
import { BigQueryHistoricSqlQueryHistoryReader } from './context/ingest/adapters/historic-sql/bigquery-query-history-reader.js'; import { BigQueryHistoricSqlQueryHistoryReader } from './context/ingest/adapters/historic-sql/bigquery-query-history-reader.js';
import { queryHistoryDialectForConnection } from './context/ingest/adapters/historic-sql/connection-dialect.js';
import { createDaemonLiveDatabaseIntrospection } from './context/ingest/adapters/live-database/daemon-introspection.js'; import { createDaemonLiveDatabaseIntrospection } from './context/ingest/adapters/live-database/daemon-introspection.js';
import { createDefaultLocalIngestAdapters, type DefaultLocalIngestAdaptersOptions } from './context/ingest/local-adapters.js'; import { createDefaultLocalIngestAdapters, type DefaultLocalIngestAdaptersOptions } from './context/ingest/local-adapters.js';
import type { HistoricSqlReader } from './context/ingest/adapters/historic-sql/types.js'; import type { HistoricSqlReader } from './context/ingest/adapters/historic-sql/types.js';
import type { LiveDatabaseIntrospectionPort } from './context/ingest/adapters/live-database/types.js'; import type {
LiveDatabaseIntrospectionOptions,
LiveDatabaseIntrospectionPort,
} from './context/ingest/adapters/live-database/types.js';
import { LiveDatabaseSourceAdapter } from './context/ingest/adapters/live-database/live-database.adapter.js'; import { LiveDatabaseSourceAdapter } from './context/ingest/adapters/live-database/live-database.adapter.js';
import { PostgresPgssReader } from './context/ingest/adapters/historic-sql/postgres-pgss-reader.js'; import { PostgresPgssReader } from './context/ingest/adapters/historic-sql/postgres-pgss-reader.js';
import { SnowflakeHistoricSqlQueryHistoryReader } from './context/ingest/adapters/historic-sql/snowflake-query-history-reader.js'; import { SnowflakeHistoricSqlQueryHistoryReader } from './context/ingest/adapters/historic-sql/snowflake-query-history-reader.js';
@ -116,38 +120,39 @@ function createKtxCliLiveDatabaseIntrospection(
connections: project.config.connections, connections: project.config.connections,
}); });
return { return {
async extractSchema(connectionId: string) { async extractSchema(connectionId: string, options?: LiveDatabaseIntrospectionOptions) {
const connection = project.config.connections[connectionId]; const connection = project.config.connections[connectionId];
if (isKtxPostgresConnectionConfig(connection)) { if (isKtxPostgresConnectionConfig(connection)) {
return postgres.extractSchema(connectionId); return postgres.extractSchema(connectionId, options);
} }
if (isKtxSqliteConnectionConfig(connection)) { if (isKtxSqliteConnectionConfig(connection)) {
return sqlite.extractSchema(connectionId); return sqlite.extractSchema(connectionId, options);
} }
if (isKtxMysqlConnectionConfig(connection)) { if (isKtxMysqlConnectionConfig(connection)) {
return mysql.extractSchema(connectionId); return mysql.extractSchema(connectionId, options);
} }
if (isKtxClickHouseConnectionConfig(connection)) { if (isKtxClickHouseConnectionConfig(connection)) {
return clickhouse.extractSchema(connectionId); return clickhouse.extractSchema(connectionId, options);
} }
if (isKtxSqlServerConnectionConfig(connection)) { if (isKtxSqlServerConnectionConfig(connection)) {
return sqlserver.extractSchema(connectionId); return sqlserver.extractSchema(connectionId, options);
} }
if (isKtxBigQueryConnectionConfig(connection)) { if (isKtxBigQueryConnectionConfig(connection)) {
return bigquery.extractSchema(connectionId); return bigquery.extractSchema(connectionId, options);
} }
if (hasSnowflakeDriver(connection)) { if (hasSnowflakeDriver(connection)) {
const { createSnowflakeLiveDatabaseIntrospection } = await import('./connectors/snowflake/live-database-introspection.js'); const { createSnowflakeLiveDatabaseIntrospection } = await import('./connectors/snowflake/live-database-introspection.js');
const { isKtxSnowflakeConnectionConfig } = await import('./connectors/snowflake/connector.js');; const { isKtxSnowflakeConnectionConfig } = await import('./connectors/snowflake/connector.js');;
if (!isKtxSnowflakeConnectionConfig(connection)) { if (!isKtxSnowflakeConnectionConfig(connection)) {
return daemon.extractSchema(connectionId); return daemon.extractSchema(connectionId, options);
} }
const snowflake = createSnowflakeLiveDatabaseIntrospection({ const snowflake = createSnowflakeLiveDatabaseIntrospection({
connections: project.config.connections, connections: project.config.connections,
projectDir: project.projectDir,
}); });
return snowflake.extractSchema(connectionId); return snowflake.extractSchema(connectionId, options);
} }
return daemon.extractSchema(connectionId); return daemon.extractSchema(connectionId, options);
}, },
}; };
} }
@ -160,47 +165,6 @@ export interface KtxCliLocalIngestAdaptersOptions extends DefaultLocalIngestAdap
logger?: KtxOperationalLogger; logger?: KtxOperationalLogger;
} }
function historicSqlRecord(connection: unknown): Record<string, unknown> | null {
if (
connection &&
typeof connection === 'object' &&
'historicSql' in connection &&
typeof (connection as { historicSql?: unknown }).historicSql === 'object' &&
(connection as { historicSql?: unknown }).historicSql !== null &&
!Array.isArray((connection as { historicSql?: unknown }).historicSql)
) {
return (connection as { historicSql: Record<string, unknown> }).historicSql;
}
return null;
}
function enabledHistoricSqlDialect(connection: unknown): 'postgres' | 'bigquery' | 'snowflake' | null {
const direct = historicSqlRecord(connection);
const context =
connection && typeof connection === 'object' && !Array.isArray(connection)
? (connection as { context?: unknown }).context
: null;
const queryHistory =
context && typeof context === 'object' && !Array.isArray(context)
? (context as { queryHistory?: unknown }).queryHistory
: null;
const enabled =
queryHistory && typeof queryHistory === 'object' && !Array.isArray(queryHistory)
? (queryHistory as { enabled?: unknown }).enabled === true
: direct?.enabled === true;
if (!enabled) {
return null;
}
const driver = String((connection as { driver?: unknown })?.driver ?? '').toLowerCase();
if (driver === 'postgres' || driver === 'postgresql') return 'postgres';
if (driver === 'bigquery') return 'bigquery';
if (driver === 'snowflake') return 'snowflake';
const legacyDialect = String(direct?.dialect ?? '').toLowerCase();
return legacyDialect === 'postgres' || legacyDialect === 'bigquery' || legacyDialect === 'snowflake'
? legacyDialect
: null;
}
function createEphemeralPostgresHistoricSqlClient(project: KtxLocalProject, connectionId: string) { function createEphemeralPostgresHistoricSqlClient(project: KtxLocalProject, connectionId: string) {
const connection = project.config.connections[connectionId] as KtxPostgresConnectionConfig | undefined; const connection = project.config.connections[connectionId] as KtxPostgresConnectionConfig | undefined;
const inputDriver = connection?.driver ?? 'unknown'; const inputDriver = connection?.driver ?? 'unknown';
@ -263,6 +227,7 @@ async function createEphemeralSnowflakeHistoricSqlClient(
const connector = new connectorModule.KtxSnowflakeScanConnector({ const connector = new connectorModule.KtxSnowflakeScanConnector({
connectionId, connectionId,
connection, connection,
projectDir: project.projectDir,
}); });
try { try {
const result = await connector.executeReadOnly({ connectionId, sql: query }, {} as never); const result = await connector.executeReadOnly({ connectionId, sql: query }, {} as never);
@ -303,7 +268,7 @@ function historicSqlOptionsForLocalRun(project: KtxLocalProject, options: KtxCli
return undefined; return undefined;
} }
const connection = project.config.connections[connectionId]; const connection = project.config.connections[connectionId];
const dialect = enabledHistoricSqlDialect(connection); const dialect = queryHistoryDialectForConnection(connection);
if (!dialect) { if (!dialect) {
return undefined; return undefined;
} }


@ -64,7 +64,7 @@ export async function createKtxCliScanConnector(
if (!isKtxSnowflakeConnectionConfig(connection)) { if (!isKtxSnowflakeConnectionConfig(connection)) {
throw invalidConnectionConfigError(connectionId, driver); throw invalidConnectionConfigError(connectionId, driver);
} }
return new KtxSnowflakeScanConnector({ connectionId, connection }); return new KtxSnowflakeScanConnector({ connectionId, connection, projectDir: project.projectDir });
} }
throw new Error( throw new Error(
`Connection "${connectionId}" uses driver "${driver}", which has no native standalone KTX scan connector. Supported drivers: ${SUPPORTED_DRIVERS}.`, `Connection "${connectionId}" uses driver "${driver}", which has no native standalone KTX scan connector. Supported drivers: ${SUPPORTED_DRIVERS}.`,


@ -942,7 +942,7 @@ describe('runKtxPublicIngest', () => {
expect(io.stdout()).not.toContain('Debug:'); expect(io.stdout()).not.toContain('Debug:');
}); });
it('prints query-history retry guidance for query-history facet failures', async () => { it('skips the query-history facet but keeps the target green when query-history fails', async () => {
const io = makeIo(); const io = makeIo();
const project = deepReadyProject({ const project = deepReadyProject({
warehouse: { driver: 'postgres', context: { depth: 'deep' } }, warehouse: { driver: 'postgres', context: { depth: 'deep' } },
@ -969,11 +969,13 @@ describe('runKtxPublicIngest', () => {
io.io, io.io,
{ loadProject: vi.fn(async () => project), runScan, runIngest }, { loadProject: vi.fn(async () => project), runScan, runIngest },
), ),
).resolves.toBe(1); ).resolves.toBe(0);
expect(io.stdout()).toMatch(/warehouse\s+done\s+failed\s+skipped\s+skipped/); expect(io.stdout()).toContain('Ingest finished with skipped query history');
expect(io.stdout()).toMatch(/warehouse\s+done\s+skipped\s+skipped\s+skipped/);
expect(io.stdout()).toContain('Skipped query history:');
expect(io.stdout()).toContain( expect(io.stdout()).toContain(
'warehouse failed: Query history failed for 60 tasks. First failure: Google Cloud authentication failed while analyzing query history', 'Query history failed for 60 tasks. First failure: Google Cloud authentication failed while analyzing query history',
); );
expect(io.stdout()).not.toContain('warehouse failed: Error:'); expect(io.stdout()).not.toContain('warehouse failed: Error:');
expect(io.stdout()).toContain('Retry: ktx ingest warehouse --project-dir /tmp/project --deep --query-history'); expect(io.stdout()).toContain('Retry: ktx ingest warehouse --project-dir /tmp/project --deep --query-history');
@ -1007,8 +1009,9 @@ describe('runKtxPublicIngest', () => {
io.io, io.io,
{ loadProject: vi.fn(async () => project), runScan, runIngest }, { loadProject: vi.fn(async () => project), runScan, runIngest },
), ),
).resolves.toBe(1); ).resolves.toBe(0);
expect(io.stdout()).toContain('Ingest finished with skipped query history');
expect(io.stdout()).toContain('Missing bundled Python runtime manifest'); expect(io.stdout()).toContain('Missing bundled Python runtime manifest');
expect(io.stdout()).toContain( expect(io.stdout()).toContain(
'In a source checkout, build the local runtime assets with: pnpm run artifacts:build', 'In a source checkout, build the local runtime assets with: pnpm run artifacts:build',

@ -601,10 +601,47 @@ function markTargetResult(
}; };
} }
function markTargetWithSkippedQueryHistory(
target: KtxPublicIngestPlanTarget,
args: Extract<KtxPublicIngestArgs, { command: 'run' }>,
detail: string,
): KtxPublicIngestTargetResult {
const baseline = markTargetResult(target, args, 'done');
return {
...baseline,
steps: baseline.steps.map((step) =>
step.operation === 'query-history' ? { ...step, status: 'skipped', detail } : step,
),
};
}
function queryHistoryFailureDetail(input: {
target: KtxPublicIngestPlanTarget;
args: Extract<KtxPublicIngestArgs, { command: 'run' }>;
capturedOutput?: string;
}): string {
const captured = capturedFailureMessage(input.capturedOutput ?? '');
return failureDetailWithRetry({
target: input.target,
args: input.args,
failedOperation: 'query-history',
failureDetail: captured,
});
}
function resultFailed(result: KtxPublicIngestTargetResult): boolean { function resultFailed(result: KtxPublicIngestTargetResult): boolean {
return result.steps.some((step) => step.status === 'failed'); return result.steps.some((step) => step.status === 'failed');
} }
function resultSkippedQueryHistory(
result: KtxPublicIngestTargetResult,
): { connectionId: string; detail: string } | null {
const skipped = result.steps.find(
(step) => step.operation === 'query-history' && step.status === 'skipped' && step.detail !== undefined,
);
return skipped?.detail ? { connectionId: result.connectionId, detail: skipped.detail } : null;
}
function rowsBucket(): '<10k' | '<100k' | '<1M' | '<10M' | '>=10M' { function rowsBucket(): '<10k' | '<100k' | '<1M' | '<10M' | '>=10M' {
return '<10k'; return '<10k';
} }
@ -644,7 +681,17 @@ function stepStatus(result: KtxPublicIngestTargetResult, operation: KtxPublicIng
function renderPlainResults(results: KtxPublicIngestTargetResult[], io: KtxCliIo): void { function renderPlainResults(results: KtxPublicIngestTargetResult[], io: KtxCliIo): void {
const failures = results.filter(resultFailed); const failures = results.filter(resultFailed);
io.stdout.write(failures.length > 0 ? 'Ingest finished with partial failures\n' : 'Ingest finished\n'); const skippedQueryHistory = results.map(resultSkippedQueryHistory).filter((entry) => entry !== null) as Array<{
connectionId: string;
detail: string;
}>;
const headerSuffix =
failures.length > 0
? ' with partial failures'
: skippedQueryHistory.length > 0
? ' with skipped query history'
: '';
io.stdout.write(`Ingest finished${headerSuffix}\n`);
io.stdout.write('\n'); io.stdout.write('\n');
io.stdout.write('Source Database schema Query history Source ingest Memory update\n'); io.stdout.write('Source Database schema Query history Source ingest Memory update\n');
for (const result of results) { for (const result of results) {
@ -659,17 +706,22 @@ function renderPlainResults(results: KtxPublicIngestTargetResult[], io: KtxCliIo
); );
} }
if (failures.length === 0) { if (failures.length > 0) {
return; io.stdout.write('\nFailed sources:\n');
for (const result of failures) {
const failedStep = result.steps.find((step) => step.status === 'failed');
if (!failedStep) {
continue;
}
io.stdout.write(` ${failedStep.detail ?? `${result.connectionId} failed.`}\n`);
}
} }
io.stdout.write('\nFailed sources:\n'); if (skippedQueryHistory.length > 0) {
for (const result of failures) { io.stdout.write('\nSkipped query history:\n');
const failedStep = result.steps.find((step) => step.status === 'failed'); for (const { detail } of skippedQueryHistory) {
if (!failedStep) { io.stdout.write(` ${detail}\n`);
continue;
} }
io.stdout.write(` ${failedStep.detail ?? `${result.connectionId} failed.`}\n`);
} }
} }
@ -849,14 +901,13 @@ export async function executePublicIngestTarget(
? await runIngest(ingestArgs, ingestIo, ingestDeps) ? await runIngest(ingestArgs, ingestIo, ingestDeps)
: await runIngest(ingestArgs, ingestIo); : await runIngest(ingestArgs, ingestIo);
if (qhExitCode !== 0) { if (qhExitCode !== 0) {
deps.onPhaseEnd?.('query-history', 'failed'); const detail = queryHistoryFailureDetail({
return markTargetResult(
target, target,
args, args,
'failed', capturedOutput: capturedIngestIo ? capturedIngestIo.capturedOutput() : undefined,
'query-history', });
capturedIngestIo ? capturedFailureMessage(capturedIngestIo.capturedOutput()) : undefined, deps.onPhaseEnd?.('query-history', 'failed', detail);
); return markTargetWithSkippedQueryHistory(target, args, detail);
} }
deps.onPhaseEnd?.('query-history', 'done'); deps.onPhaseEnd?.('query-history', 'done');
} }
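
The hunks above turn a query-history failure from a failed target into a completed target with one skipped step. A minimal sketch of that pattern follows, assuming simplified stand-in types: `Step`, `TargetResult`, `withSkippedQueryHistory`, and `headerSuffix` are hypothetical names for illustration, not the real `KtxPublicIngestTargetResult` API.

```typescript
// Hypothetical, simplified shapes; the real result types carry more fields.
type StepStatus = 'done' | 'failed' | 'skipped';

interface Step {
  operation: string;
  status: StepStatus;
  detail?: string;
}

interface TargetResult {
  connectionId: string;
  steps: Step[];
}

// Mark only the query-history step as skipped and attach the failure detail;
// every other step keeps the status from the baseline result.
function withSkippedQueryHistory(baseline: TargetResult, detail: string): TargetResult {
  return {
    ...baseline,
    steps: baseline.steps.map(
      (step): Step => (step.operation === 'query-history' ? { ...step, status: 'skipped', detail } : step),
    ),
  };
}

// A target counts as failed only when some step is 'failed'.
function resultFailed(result: TargetResult): boolean {
  return result.steps.some((step) => step.status === 'failed');
}

// Header suffix: real failures take precedence over skipped query history.
function headerSuffix(results: TargetResult[]): string {
  if (results.some(resultFailed)) return ' with partial failures';
  const skipped = results.some((result) =>
    result.steps.some(
      (step) => step.operation === 'query-history' && step.status === 'skipped' && step.detail !== undefined,
    ),
  );
  return skipped ? ' with skipped query history' : '';
}
```

Because the skipped step is never 'failed', the target stays out of the "Failed sources" list while its detail can still be printed in a separate section.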

@ -96,14 +96,17 @@ const createSnowflakeLiveDatabaseIntrospection = vi.hoisted(() =>
const isKtxSnowflakeConnectionConfig = vi.hoisted(() => const isKtxSnowflakeConnectionConfig = vi.hoisted(() =>
vi.fn((connection: { driver?: string } | undefined) => connection?.driver === 'snowflake'), vi.fn((connection: { driver?: string } | undefined) => connection?.driver === 'snowflake'),
); );
const snowflakeConnectorInstances = vi.hoisted(() => [] as Array<{ cleanup: ReturnType<typeof vi.fn> }>);
const KtxSnowflakeScanConnector = vi.hoisted( const KtxSnowflakeScanConnector = vi.hoisted(
() => () =>
class { class {
readonly id: string; readonly id: string;
readonly driver = 'snowflake'; readonly driver = 'snowflake';
readonly cleanup = vi.fn(async () => undefined);
constructor(options: { connectionId: string }) { constructor(options: { connectionId: string }) {
this.id = `snowflake:${options.connectionId}`; this.id = `snowflake:${options.connectionId}`;
snowflakeConnectorInstances.push(this);
} }
}, },
); );
@ -1047,6 +1050,95 @@ describe('runKtxScan', () => {
await rm(tempProject, { recursive: true, force: true }); await rm(tempProject, { recursive: true, force: true });
}); });
it('cleans up a constructed scan connector after an enriched scan succeeds', async () => {
await initKtxProject({ projectDir: tempDir });
await writeFile(
join(tempDir, 'ktx.yaml'),
[
'connections:',
' warehouse:',
' driver: snowflake',
' account: acct',
' warehouse: WH',
' database: ANALYTICS',
' schema_name: PUBLIC',
' username: reader',
' password: env:SNOWFLAKE_PASSWORD',
'',
].join('\n'),
'utf-8',
);
snowflakeConnectorInstances.length = 0;
const runLocalScan = vi.fn(async (): Promise<LocalScanRunResult> => ({
runId: 'scan-run-cleanup',
status: 'done',
done: true,
connectionId: 'warehouse',
mode: 'enriched',
dryRun: false,
syncId: 'sync-1',
report: { ...report, mode: 'enriched' },
}));
await expect(
runKtxScan(
{
command: 'run',
projectDir: tempDir,
connectionId: 'warehouse',
mode: 'enriched',
detectRelationships: false,
dryRun: false,
},
makeIo().io,
{ runLocalScan, createLocalIngestAdapters: noLocalIngestAdapters },
),
).resolves.toBe(0);
expect(snowflakeConnectorInstances[0]?.cleanup).toHaveBeenCalledTimes(1);
});
it('cleans up a constructed scan connector after runLocalScan throws', async () => {
await initKtxProject({ projectDir: tempDir });
await writeFile(
join(tempDir, 'ktx.yaml'),
[
'connections:',
' warehouse:',
' driver: snowflake',
' account: acct',
' warehouse: WH',
' database: ANALYTICS',
' schema_name: PUBLIC',
' username: reader',
' password: env:SNOWFLAKE_PASSWORD',
'',
].join('\n'),
'utf-8',
);
snowflakeConnectorInstances.length = 0;
const runLocalScan = vi.fn(async () => {
throw new Error('scan failed');
});
await expect(
runKtxScan(
{
command: 'run',
projectDir: tempDir,
connectionId: 'warehouse',
mode: 'relationships',
detectRelationships: true,
dryRun: false,
},
makeIo().io,
{ runLocalScan, createLocalIngestAdapters: noLocalIngestAdapters },
),
).resolves.toBe(1);
expect(snowflakeConnectorInstances[0]?.cleanup).toHaveBeenCalledTimes(1);
});
it('routes standalone postgres scans through the native connector before daemon fallback', async () => { it('routes standalone postgres scans through the native connector before daemon fallback', async () => {
const tempProject = await mkdtemp(join(tmpdir(), 'ktx-scan-cli-native-postgres-')); const tempProject = await mkdtemp(join(tmpdir(), 'ktx-scan-cli-native-postgres-'));
await initKtxProject({ projectDir: tempProject }); await initKtxProject({ projectDir: tempProject });

@ -375,6 +375,7 @@ export async function runKtxScan(args: KtxScanArgs, io: KtxCliIo = process, deps
writeRunSummary(result.report, args.projectDir, io); writeRunSummary(result.report, args.projectDir, io);
} finally { } finally {
cliProgress?.flush(); cliProgress?.flush();
await connector?.cleanup?.();
} }
return 0; return 0;
} catch (error) { } catch (error) {

@ -545,8 +545,8 @@ describe('setup databases step', () => {
}, },
{ {
driver: 'snowflake', driver: 'snowflake',
selectValues: ['no'], selectValues: ['password', 'no'],
textValues: ['', 'env:SNOWFLAKE_ACCOUNT', 'ANALYTICS_WH', 'ANALYTICS', '', 'env:SNOWFLAKE_USER', ''], textValues: ['', 'env:SNOWFLAKE_ACCOUNT', 'ANALYTICS_WH', 'ANALYTICS', 'env:SNOWFLAKE_USER', ''],
passwordValues: ['env:SNOWFLAKE_PASSWORD'], passwordValues: ['env:SNOWFLAKE_PASSWORD'],
expectedTextPrompts: [ expectedTextPrompts: [
{ {
@ -563,11 +563,6 @@ describe('setup databases step', () => {
{ {
message: 'Snowflake database name', message: 'Snowflake database name',
}, },
{
message: 'Snowflake schema\nPress Enter for PUBLIC, or enter a schema name.',
placeholder: 'PUBLIC',
initialValue: 'PUBLIC',
},
{ {
message: 'Snowflake username', message: 'Snowflake username',
}, },
@ -602,6 +597,8 @@ describe('setup databases step', () => {
prompts, prompts,
testConnection: vi.fn(async () => 0), testConnection: vi.fn(async () => 0),
scanConnection: vi.fn(async () => 0), scanConnection: vi.fn(async () => 0),
listSchemas: vi.fn(async () => []),
listTables: vi.fn(async () => []),
}, },
); );
@ -775,6 +772,8 @@ describe('setup databases step', () => {
}); });
const testConnection = vi.fn(async () => 0); const testConnection = vi.fn(async () => 0);
const scanConnection = vi.fn(async () => 0); const scanConnection = vi.fn(async () => 0);
const listSchemas = vi.fn(async () => []);
const listTables = vi.fn(async () => []);
const result = await runKtxSetupDatabasesStep( const result = await runKtxSetupDatabasesStep(
{ {
@ -785,7 +784,7 @@ describe('setup databases step', () => {
disableQueryHistory: true, disableQueryHistory: true,
}, },
makeIo().io, makeIo().io,
{ prompts, testConnection, scanConnection }, { prompts, testConnection, scanConnection, listSchemas, listTables },
); );
expect(result).toEqual({ expect(result).toEqual({
@ -1692,6 +1691,62 @@ describe('setup databases step', () => {
expect(io.stdout()).toContain('✓ orbit_analytics, orbit_raw'); expect(io.stdout()).toContain('✓ orbit_analytics, orbit_raw');
}); });
it('falls back to comma-separated free-text when listSchemas fails interactively', async () => {
const io = makeIo();
const prompts = makePromptAdapter({
selectValues: ['url'],
textValues: ['', 'env:DATABASE_URL', 'orbit_analytics, orbit_raw'],
});
const testConnection = vi.fn(async () => 0);
const scanConnection = vi.fn(async () => 0);
const listSchemas = vi.fn(async () => {
throw new Error('permission denied to list schemas');
});
const listTables = vi.fn(async (_projectDir: string, _connectionId: string, schemas?: string[]) =>
(schemas ?? []).map((schema) => ({ schema, name: 'events', kind: 'table' as const })),
);
const pickers = makePickerStubs({
scopes: [
{
schemas: ['orbit_analytics', 'orbit_raw'],
tables: ['orbit_analytics.events', 'orbit_raw.events'],
},
],
});
const result = await runKtxSetupDatabasesStep(
{
projectDir: tempDir,
inputMode: 'auto',
databaseDrivers: ['postgres'],
databaseSchemas: [],
skipDatabases: false,
},
io.io,
{
prompts,
testConnection,
scanConnection,
listSchemas,
listTables,
pickDatabaseScope: pickers.pickDatabaseScope,
},
);
expect(result.status).toBe('ready');
expect(io.stderr()).toContain('Could not discover postgresql schemas');
expect(vi.mocked(prompts.text).mock.calls.map(([options]) => options.message)).toContain(
textInputPrompt(
'Enter schemas for postgres-warehouse as a comma-separated list (e.g. SALES, MARKETING).',
),
);
expect(pickers.scopeCalls[0]).toMatchObject({
schemas: ['orbit_analytics', 'orbit_raw'],
initialSchemas: ['orbit_analytics', 'orbit_raw'],
schemaSuggestion: { suggested: new Set(['orbit_analytics', 'orbit_raw']) },
});
});
it('passes schemas and a lazy table callback to the scope picker instead of eager table discovery', async () => { it('passes schemas and a lazy table callback to the scope picker instead of eager table discovery', async () => {
const listSchemas = vi.fn(async () => ['analytics', 'raw']); const listSchemas = vi.fn(async () => ['analytics', 'raw']);
const listTables = vi.fn(async (_projectDir: string, _connectionId: string, schemas?: string[]) => const listTables = vi.fn(async (_projectDir: string, _connectionId: string, schemas?: string[]) =>
@ -2015,6 +2070,7 @@ describe('setup databases step', () => {
it('writes query history config for supported Snowflake databases after validation succeeds', async () => { it('writes query history config for supported Snowflake databases after validation succeeds', async () => {
const io = makeIo(); const io = makeIo();
const historicSqlProbe = vi.fn(async () => ({ ok: true, lines: [] }));
const result = await runKtxSetupDatabasesStep( const result = await runKtxSetupDatabasesStep(
{ {
projectDir: tempDir, projectDir: tempDir,
@ -2032,12 +2088,21 @@ describe('setup databases step', () => {
{ {
testConnection: vi.fn(async () => 0), testConnection: vi.fn(async () => 0),
scanConnection: vi.fn(async () => 0), scanConnection: vi.fn(async () => 0),
historicSqlProbe,
prompts: makePromptAdapter({ prompts: makePromptAdapter({
textValues: ['env:SNOWFLAKE_ACCOUNT', 'WH', 'ANALYTICS', 'PUBLIC', 'reader', ''], selectValues: ['password'],
textValues: ['env:SNOWFLAKE_ACCOUNT', 'WH', 'ANALYTICS', 'reader', ''],
passwordValues: ['env:SNOWFLAKE_PASSWORD'], passwordValues: ['env:SNOWFLAKE_PASSWORD'],
}), }),
}, },
); );
expect(historicSqlProbe).toHaveBeenCalledWith(
expect.objectContaining({
projectDir: tempDir,
connectionId: 'snowflake',
dialect: 'snowflake',
}),
);
expect(result.status).toBe('ready'); expect(result.status).toBe('ready');
const configText = await readFile(join(tempDir, 'ktx.yaml'), 'utf-8'); const configText = await readFile(join(tempDir, 'ktx.yaml'), 'utf-8');
@ -2067,6 +2132,51 @@ describe('setup databases step', () => {
expect(config.ingest.adapters).toEqual([]); expect(config.ingest.adapters).toEqual([]);
}); });
it('configures Snowflake with RSA key-pair auth via setup wizard', async () => {
const io = makeIo();
const result = await runKtxSetupDatabasesStep(
{
projectDir: tempDir,
inputMode: 'disabled',
databaseDrivers: ['snowflake'],
databaseConnectionId: 'snowflake',
databaseSchemas: [],
skipDatabases: false,
},
io.io,
{
testConnection: vi.fn(async () => 0),
scanConnection: vi.fn(async () => 0),
prompts: makePromptAdapter({
selectValues: ['rsa'],
textValues: [
'env:SNOWFLAKE_ACCOUNT',
'WH',
'ANALYTICS',
'reader',
'~/.ssh/snowflake_rsa_key.p8',
'',
],
passwordValues: ['env:SNOWFLAKE_KEY_PASS'],
}),
},
);
expect(result.status).toBe('ready');
const config = parseKtxProjectConfig(await readFile(join(tempDir, 'ktx.yaml'), 'utf-8'));
expect(config.connections.snowflake).toMatchObject({
driver: 'snowflake',
authMethod: 'rsa',
account: 'env:SNOWFLAKE_ACCOUNT',
warehouse: 'WH',
database: 'ANALYTICS',
username: 'reader',
privateKey: 'file:~/.ssh/snowflake_rsa_key.p8', // pragma: allowlist secret
passphrase: 'env:SNOWFLAKE_KEY_PASS', // pragma: allowlist secret
});
expect(config.connections.snowflake.password).toBeUndefined();
});
it('writes Postgres query history config with minExecutions and ignores window/redaction output', async () => { it('writes Postgres query history config with minExecutions and ignores window/redaction output', async () => {
const io = makeIo(); const io = makeIo();
const result = await runKtxSetupDatabasesStep( const result = await runKtxSetupDatabasesStep(
@ -2427,7 +2537,53 @@ describe('setup databases step', () => {
expect(io.stdout()).toContain('Query history probe...'); expect(io.stdout()).toContain('Query history probe...');
expect(io.stdout()).not.toContain('Historic SQL probe...'); expect(io.stdout()).not.toContain('Historic SQL probe...');
expect(io.stdout()).toContain('pg_stat_statements extension is not installed'); expect(io.stdout()).toContain('pg_stat_statements extension is not installed');
expect(io.stdout()).toContain('Setup written; first ingest run will fail until fixed.'); expect(io.stdout()).toContain('Setup written; query history will be skipped until fixed.');
});
it('prints a non-blocking Snowflake query history probe failure with the grants remediation', async () => {
const io = makeIo();
const historicSqlProbe = vi.fn(async () => ({
ok: false,
lines: [
' FAIL Snowflake role cannot read SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY',
' Fix: Run (as ACCOUNTADMIN): GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE <connection role>;',
],
}));
const result = await runKtxSetupDatabasesStep(
{
projectDir: tempDir,
inputMode: 'disabled',
databaseDrivers: ['snowflake'],
databaseConnectionId: 'warehouse',
databaseSchemas: [],
enableQueryHistory: true,
skipDatabases: false,
},
io.io,
{
testConnection: vi.fn(async () => 0),
scanConnection: vi.fn(async () => 0),
historicSqlProbe,
prompts: makePromptAdapter({
textValues: ['env:SNOWFLAKE_ACCOUNT', 'WH', 'ANALYTICS', 'reader', ''],
passwordValues: ['env:SNOWFLAKE_PASSWORD'],
}),
},
);
expect(result.status).toBe('ready');
expect(historicSqlProbe).toHaveBeenCalledWith(
expect.objectContaining({
projectDir: tempDir,
connectionId: 'warehouse',
dialect: 'snowflake',
}),
);
expect(io.stdout()).toContain('Query history probe...');
expect(io.stdout()).toContain('Snowflake role cannot read SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY');
expect(io.stdout()).toContain('GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE');
expect(io.stdout()).toContain('Setup written; query history will be skipped until fixed.');
}); });
it('does not run the query history probe when the regular connection test fails', async () => { it('does not run the query history probe when the regular connection test fails', async () => {

@ -343,6 +343,13 @@ function historicSqlProbeFailureLines(error: unknown): string[] {
]; ];
} }
if (error instanceof Error && error.name === 'HistoricSqlGrantsMissingError') { if (error instanceof Error && error.name === 'HistoricSqlGrantsMissingError') {
const dialect = (error as { dialect?: unknown }).dialect;
if (dialect === 'snowflake') {
return [
' FAIL Snowflake role cannot read SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY',
' Fix: Run (as ACCOUNTADMIN): GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE <connection role>;',
];
}
return [ return [
' FAIL Postgres connection role lacks pg_read_all_stats', ' FAIL Postgres connection role lacks pg_read_all_stats',
' Fix: Run: GRANT pg_read_all_stats TO <connection role>;', ' Fix: Run: GRANT pg_read_all_stats TO <connection role>;',
@ -355,10 +362,18 @@ function historicSqlProbeFailureLines(error: unknown): string[] {
} }
async function defaultHistoricSqlProbe(input: KtxSetupHistoricSqlProbeInput): Promise<KtxSetupHistoricSqlProbeResult> { async function defaultHistoricSqlProbe(input: KtxSetupHistoricSqlProbeInput): Promise<KtxSetupHistoricSqlProbeResult> {
if (input.dialect !== 'postgres') { if (input.dialect === 'postgres') {
return { ok: true, lines: [] }; return probePostgresHistoricSql(input);
} }
if (input.dialect === 'snowflake') {
return probeSnowflakeHistoricSql(input);
}
return { ok: true, lines: [] };
}
async function probePostgresHistoricSql(
input: KtxSetupHistoricSqlProbeInput,
): Promise<KtxSetupHistoricSqlProbeResult> {
const project = await loadKtxProject({ projectDir: input.projectDir }); const project = await loadKtxProject({ projectDir: input.projectDir });
const connection = project.config.connections[input.connectionId]; const connection = project.config.connections[input.connectionId];
const [{ PostgresPgssReader }, { KtxPostgresHistoricSqlQueryClient }, { isKtxPostgresConnectionConfig }] = const [{ PostgresPgssReader }, { KtxPostgresHistoricSqlQueryClient }, { isKtxPostgresConnectionConfig }] =
@ -396,6 +411,46 @@ async function defaultHistoricSqlProbe(input: KtxSetupHistoricSqlProbeInput): Pr
} }
} }
async function probeSnowflakeHistoricSql(
input: KtxSetupHistoricSqlProbeInput,
): Promise<KtxSetupHistoricSqlProbeResult> {
const project = await loadKtxProject({ projectDir: input.projectDir });
const connection = project.config.connections[input.connectionId];
const [{ SnowflakeHistoricSqlQueryHistoryReader }, { KtxSnowflakeHistoricSqlQueryClient }, { isKtxSnowflakeConnectionConfig }] =
await Promise.all([
import('./context/ingest/adapters/historic-sql/snowflake-query-history-reader.js'),
import('./connectors/snowflake/historic-sql-query-client.js'),
import('./connectors/snowflake/connector.js'),
]);
if (!isKtxSnowflakeConnectionConfig(connection)) {
return {
ok: false,
lines: [` FAIL Connection ${input.connectionId} is not a native Snowflake connection.`],
};
}
const client = new KtxSnowflakeHistoricSqlQueryClient({
connectionId: input.connectionId,
connection,
projectDir: input.projectDir,
});
try {
const result = await new SnowflakeHistoricSqlQueryHistoryReader().probe(client);
return {
ok: true,
lines: [
' OK SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY accessible',
...result.warnings.map((warning: string) => ` ! ${warning}`),
],
};
} catch (error) {
return { ok: false, lines: historicSqlProbeFailureLines(error) };
} finally {
await client.cleanup();
}
}
async function defaultListSchemas(projectDir: string, connectionId: string): Promise<string[]> { async function defaultListSchemas(projectDir: string, connectionId: string): Promise<string[]> {
const project = await loadKtxProject({ projectDir }); const project = await loadKtxProject({ projectDir });
const connection = project.config.connections[connectionId]; const connection = project.config.connections[connectionId];
@ -459,7 +514,7 @@ async function defaultListSchemas(projectDir: string, connectionId: string): Pro
if (driver === 'snowflake') { if (driver === 'snowflake') {
const { KtxSnowflakeScanConnector, isKtxSnowflakeConnectionConfig } = await import('./connectors/snowflake/connector.js');; const { KtxSnowflakeScanConnector, isKtxSnowflakeConnectionConfig } = await import('./connectors/snowflake/connector.js');;
if (!isKtxSnowflakeConnectionConfig(connection)) return []; if (!isKtxSnowflakeConnectionConfig(connection)) return [];
const connector = new KtxSnowflakeScanConnector({ connectionId, connection }); const connector = new KtxSnowflakeScanConnector({ connectionId, connection, projectDir });
try { try {
return await connector.listSchemas(); return await connector.listSchemas();
} finally { } finally {
@ -535,7 +590,7 @@ async function defaultListTables(
if (driver === 'snowflake') { if (driver === 'snowflake') {
const { KtxSnowflakeScanConnector, isKtxSnowflakeConnectionConfig } = await import('./connectors/snowflake/connector.js');; const { KtxSnowflakeScanConnector, isKtxSnowflakeConnectionConfig } = await import('./connectors/snowflake/connector.js');;
if (!isKtxSnowflakeConnectionConfig(connection)) return []; if (!isKtxSnowflakeConnectionConfig(connection)) return [];
const connector = new KtxSnowflakeScanConnector({ connectionId, connection }); const connector = new KtxSnowflakeScanConnector({ connectionId, connection, projectDir });
try { try {
return await connector.listTables(schemas); return await connector.listTables(schemas);
} finally { } finally {
@ -954,43 +1009,86 @@ async function buildConnectionConfig(input: {
stringConfigField(input.existingConnection, 'database'), stringConfigField(input.existingConnection, 'database'),
); );
if (database === undefined) return 'back'; if (database === undefined) return 'back';
const schemaName = await promptText(
prompts,
'Snowflake schema\nPress Enter for PUBLIC, or enter a schema name.',
stringConfigField(input.existingConnection, 'schema_name') ?? 'PUBLIC',
);
if (schemaName === undefined) return 'back';
const username = await promptText( const username = await promptText(
prompts, prompts,
'Snowflake username', 'Snowflake username',
stringConfigField(input.existingConnection, 'username'), stringConfigField(input.existingConnection, 'username'),
); );
if (username === undefined) return 'back'; if (username === undefined) return 'back';
const passwordRef = await promptCredential({ const authChoice = await prompts.select({
prompts, message: 'Snowflake authentication method',
message: 'Snowflake password', options: [
projectDir: args.projectDir, { value: 'password', label: 'Password' },
connectionId: input.connectionId, { value: 'rsa', label: 'Key-pair (RSA / JWT)' },
secretName: 'password', // pragma: allowlist secret { value: 'back', label: 'Back' },
],
}); });
if (passwordRef === 'back') return 'back'; // pragma: allowlist secret if (authChoice === 'back') return 'back';
const authMethod: 'password' | 'rsa' = authChoice === 'rsa' ? 'rsa' : 'password';
let passwordRef: string | null = null;
let privateKeyInput: string | undefined;
let passphraseRef: string | null = null;
if (authMethod === 'password') {
const ref = await promptCredential({
prompts,
message: 'Snowflake password',
projectDir: args.projectDir,
connectionId: input.connectionId,
secretName: 'password', // pragma: allowlist secret
});
if (ref === 'back') return 'back'; // pragma: allowlist secret
passwordRef = ref;
} else {
privateKeyInput = await promptText(
prompts,
'Path to Snowflake private key (PEM)\nFor example ~/.ssh/snowflake_rsa_key.p8, or $ENV_VAR / env:NAME / file:/abs/path.',
displayFileReference(stringConfigField(input.existingConnection, 'privateKey')),
);
if (privateKeyInput === undefined) return 'back';
const phr = await promptCredential({
prompts,
message: 'Private key passphrase (optional)\nPress Enter to skip.',
projectDir: args.projectDir,
connectionId: input.connectionId,
secretName: 'snowflake-passphrase', // pragma: allowlist secret
});
if (phr === 'back') return 'back';
passphraseRef = phr;
}
const role = await promptText( const role = await promptText(
prompts, prompts,
'Snowflake role (optional)\nPress Enter to skip.', 'Snowflake role (optional)\nPress Enter to skip.',
stringConfigField(input.existingConnection, 'role'), stringConfigField(input.existingConnection, 'role'),
); );
if (role === undefined) return 'back'; if (role === undefined) return 'back';
const resolvedPasswordRef = passwordRef ?? stringConfigField(input.existingConnection, 'password'); if (authMethod === 'password') {
if (!account || !warehouse || !database || !schemaName || !username || !resolvedPasswordRef) return null; const resolvedPasswordRef = passwordRef ?? stringConfigField(input.existingConnection, 'password');
if (!account || !warehouse || !database || !username || !resolvedPasswordRef) return null;
return {
driver: 'snowflake',
authMethod: 'password',
account,
warehouse,
database,
username,
password: resolvedPasswordRef,
...(role ? { role } : {}),
};
}
const resolvedPrivateKey = privateKeyInput
? normalizeFileReference(privateKeyInput)
: stringConfigField(input.existingConnection, 'privateKey');
if (!account || !warehouse || !database || !username || !resolvedPrivateKey) return null;
const resolvedPassphrase = passphraseRef ?? stringConfigField(input.existingConnection, 'passphrase');
return { return {
driver: 'snowflake', driver: 'snowflake',
authMethod: 'password', authMethod: 'rsa',
account, account,
warehouse, warehouse,
database, database,
schema_name: schemaName,
username, username,
password: resolvedPasswordRef, privateKey: resolvedPrivateKey,
...(resolvedPassphrase ? { passphrase: resolvedPassphrase } : {}),
...(role ? { role } : {}), ...(role ? { role } : {}),
}; };
} }
@ -1425,6 +1523,21 @@ async function writeScopeConfig(input: {
}); });
} }
async function promptCommaSeparatedScope(input: {
prompts: KtxSetupDatabasesPromptAdapter;
connectionId: string;
spec: ScopeDiscoverySpec;
}): Promise<string[] | undefined> {
const example =
input.spec.nounPlural === 'datasets' ? 'sales, marketing' : 'SALES, MARKETING';
const value = await promptText(
input.prompts,
`Enter ${input.spec.nounPlural} for ${input.connectionId} as a comma-separated list (e.g. ${example}).`,
);
if (value === undefined) return undefined;
return unique(value.split(',').map((part) => part.trim()));
}
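
The free-text fallback above splits the answer on commas, trims each part, and passes the result through `unique`, whose body is not shown in this diff. The following sketch shows one way such parsing could behave; `parseCommaSeparatedScope` is a hypothetical name, and dropping blank entries is an assumption made here rather than something the diff confirms.

```typescript
// Hypothetical helper: trim each entry, drop blanks (an assumption of this
// sketch), and de-duplicate while preserving first-seen order.
function parseCommaSeparatedScope(value: string): string[] {
  const seen = new Set<string>();
  for (const part of value.split(',')) {
    const name = part.trim();
    if (name.length > 0) {
      seen.add(name);
    }
  }
  return Array.from(seen);
}
```

With this behavior an empty answer yields an empty list, which matches the `typed.length > 0` guard that decides whether the scope config is written.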
async function maybeConfigureDatabaseScope(input: { async function maybeConfigureDatabaseScope(input: {
projectDir: string; projectDir: string;
connectionId: string; connectionId: string;
@ -1494,28 +1607,48 @@ async function maybeConfigureDatabaseScope(input: {
writeSetupSection(input.io, 'Discovering tables', [`Connecting to ${input.connectionId}`]);
-const schemas = unique(
-cliSchemas.length > 0
-? cliSchemas
-: await (async (): Promise<string[]> => {
-if (!spec) return [];
-try {
-return await (input.deps.listSchemas ?? defaultListSchemas)(input.projectDir, input.connectionId);
-} catch (error) {
-const detail = error instanceof Error ? error.message : String(error);
-input.io.stderr.write(
-`Could not discover ${spec.promptLabel.toLowerCase()} for ${input.connectionId}; ${detail}\n`,
-);
-return [];
-}
-})(),
-);
+let effectiveCliSchemas = cliSchemas;
+let listedSchemas: string[];
+if (cliSchemas.length > 0) {
+listedSchemas = cliSchemas;
+} else if (!spec) {
+listedSchemas = [];
+} else {
+try {
+listedSchemas = await (input.deps.listSchemas ?? defaultListSchemas)(
+input.projectDir,
+input.connectionId,
+);
+} catch (error) {
+const detail = error instanceof Error ? error.message : String(error);
+input.io.stderr.write(
+`Could not discover ${spec.promptLabel.toLowerCase()} for ${input.connectionId}; ${detail}\n`,
+);
+const typed = await promptCommaSeparatedScope({
+prompts: input.prompts,
+connectionId: input.connectionId,
+spec,
+});
+if (typed === undefined) return 'back';
+effectiveCliSchemas = typed;
+listedSchemas = typed;
+if (typed.length > 0) {
+await writeScopeConfig({
+projectDir: input.projectDir,
+connectionId: input.connectionId,
+values: typed,
+spec,
+});
+}
+}
+}
+const schemas = unique(listedSchemas);
if (spec && schemas.length === 0) {
return 'ready';
}
const schemaSuggestion =
-cliSchemas.length > 0
-? { excluded: new Set<string>(), suggested: new Set(cliSchemas) }
+effectiveCliSchemas.length > 0
+? { excluded: new Set<string>(), suggested: new Set(effectiveCliSchemas) }
: spec?.suggest(schemas) ?? { excluded: new Set<string>(), suggested: new Set<string>() };
const existingEnabled =
hasExistingTables && input.forcePrompt === true
@ -1533,7 +1666,7 @@ async function maybeConfigureDatabaseScope(input: {
schemaSuggestion,
existing: { enabledTables: existingEnabled },
supportsSchemaScope: spec !== undefined,
-initialSchemas: cliSchemas.length > 0 ? cliSchemas : undefined,
+initialSchemas: effectiveCliSchemas.length > 0 ? effectiveCliSchemas : undefined,
prompts: input.prompts,
listTablesForSchemas: (selectedSchemas) =>
(input.deps.listTables ?? defaultListTables)(input.projectDir, input.connectionId, selectedSchemas),
@ -1638,7 +1771,12 @@ async function maybeRunHistoricSqlSetupProbe(input: {
const connection = project.config.connections[input.connectionId];
const queryHistory = queryHistoryConfigRecord(connection) ?? historicSqlConfigRecord(connection);
const driver = normalizeDriver(connection?.driver);
-if (queryHistory?.enabled !== true || driver !== 'postgres') {
+if (queryHistory?.enabled !== true) {
+return;
+}
+const dialect: 'postgres' | 'snowflake' | null =
+driver === 'postgres' ? 'postgres' : driver === 'snowflake' ? 'snowflake' : null;
+if (!dialect) {
return;
}
@ -1647,13 +1785,13 @@ async function maybeRunHistoricSqlSetupProbe(input: {
const result = await probe({
projectDir: input.projectDir,
connectionId: input.connectionId,
-dialect: 'postgres',
+dialect,
});
for (const line of result.lines) {
input.io.stdout.write(`${line}\n`);
}
if (!result.ok) {
-input.io.stdout.write('│ Setup written; first ingest run will fail until fixed.\n');
+input.io.stdout.write('│ Setup written; query history will be skipped until fixed.\n');
}
}
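The setup probe now chooses a dialect from the connection's driver and skips the probe when none applies. A rough sketch of that mapping as a standalone function follows. `probeDialectForDriver` is a hypothetical name, and the case-insensitive comparison and the `postgresql` alias are assumptions standing in for whatever `normalizeDriver` actually does.

```typescript
type ProbeDialect = 'postgres' | 'snowflake';

// Hypothetical helper: returns the dialect a setup-time query-history probe
// can handle, or null when the driver has no probe and should be skipped.
function probeDialectForDriver(driver: string | undefined): ProbeDialect | null {
  // Assumption: names are normalized by trimming and lowercasing.
  const normalized = (driver ?? '').trim().toLowerCase();
  if (normalized === 'postgres' || normalized === 'postgresql') return 'postgres';
  if (normalized === 'snowflake') return 'snowflake';
  return null;
}

for (const driver of ['Postgres', 'snowflake', 'mysql', undefined]) {
  console.log(`${String(driver)} -> ${String(probeDialectForDriver(driver))}`);
}
```

Returning `null` rather than throwing keeps unsupported drivers on the quiet early-return path that the hunk above takes.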


@ -148,6 +148,161 @@ function withPostgresQueryHistory(config: KtxProjectConfig): KtxProjectConfig {
};
}
function withSnowflakeQueryHistory(config: KtxProjectConfig): KtxProjectConfig {
return {
...config,
connections: {
...config.connections,
warehouse: {
driver: 'snowflake',
account: 'EMOVRJS-CZ07756',
warehouse: 'COMPUTE_WH',
database: 'ANALYTICS',
username: 'svc_ktx',
password: 'env:SNOWFLAKE_PASSWORD', // pragma: allowlist secret
context: { queryHistory: { enabled: true } },
} as KtxProjectConfig['connections'][string],
},
};
}
function withBigQueryQueryHistory(config: KtxProjectConfig): KtxProjectConfig {
return {
...config,
connections: {
...config.connections,
bq: {
driver: 'bigquery',
credentials_json: 'env:BQ_CREDENTIALS_JSON',
context: { queryHistory: { enabled: true } },
} as KtxProjectConfig['connections'][string],
},
};
}
function withMysqlQueryHistory(config: KtxProjectConfig): KtxProjectConfig {
return {
...config,
connections: {
...config.connections,
legacy: {
driver: 'mysql',
host: 'db.example.com',
database: 'analytics',
username: 'svc',
password: 'env:MYSQL_PASSWORD', // pragma: allowlist secret
context: { queryHistory: { enabled: true } },
} as KtxProjectConfig['connections'][string],
},
};
}
describe('buildProjectStatus query history dispatch', () => {
it('runs the snowflake probe for snowflake connections, not the postgres one', async () => {
let postgresCalls = 0;
let snowflakeCalls = 0;
const project = projectWithConfig(withSnowflakeQueryHistory(baseProjectConfig()));
const status = await buildProjectStatus(project, {
claudeCodeAuthProbe: stubClaudeCodeAuthProbe,
postgresQueryHistoryProbe: async () => {
postgresCalls += 1;
throw new Error('postgres probe should not run for snowflake');
},
snowflakeQueryHistoryProbe: async () => {
snowflakeCalls += 1;
return { warnings: [], info: [] };
},
});
expect(postgresCalls).toBe(0);
expect(snowflakeCalls).toBe(1);
expect(status.queryHistory).toHaveLength(1);
expect(status.queryHistory[0]).toMatchObject({
connection: 'warehouse',
driver: 'snowflake',
dialect: 'snowflake',
status: 'ok',
});
expect(status.queryHistory[0].detail).toMatch(/SNOWFLAKE\.ACCOUNT_USAGE\.QUERY_HISTORY/);
expect(status.queryHistory[0].fix).toBeUndefined();
expect(status.verdict).not.toBe('blocked');
});
it('reports snowflake probe failures with the reader-provided remediation', async () => {
const project = projectWithConfig(withSnowflakeQueryHistory(baseProjectConfig()));
const { HistoricSqlGrantsMissingError } = await import(
'./context/ingest/adapters/historic-sql/errors.js'
);
const status = await buildProjectStatus(project, {
claudeCodeAuthProbe: stubClaudeCodeAuthProbe,
snowflakeQueryHistoryProbe: async () => {
throw new HistoricSqlGrantsMissingError({
dialect: 'snowflake',
message: 'role cannot read SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY',
remediation: 'GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE ktx;',
});
},
});
expect(status.queryHistory[0]).toMatchObject({
connection: 'warehouse',
driver: 'snowflake',
dialect: 'snowflake',
status: 'fail',
fix: 'GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE ktx;',
});
expect(status.queryHistory[0].detail).not.toMatch(/Set connections.*driver to postgres/);
});
it('runs the bigquery probe for bigquery connections', async () => {
let bigqueryCalls = 0;
const project = projectWithConfig(withBigQueryQueryHistory(baseProjectConfig()));
const status = await buildProjectStatus(project, {
claudeCodeAuthProbe: stubClaudeCodeAuthProbe,
bigqueryQueryHistoryProbe: async () => {
bigqueryCalls += 1;
return { warnings: [], info: [] };
},
});
expect(bigqueryCalls).toBe(1);
expect(status.queryHistory[0]).toMatchObject({
connection: 'bq',
driver: 'bigquery',
dialect: 'bigquery',
status: 'ok',
});
expect(status.queryHistory[0].detail).toMatch(/INFORMATION_SCHEMA\.JOBS_BY_PROJECT/);
});
it('fails with an accurate message for drivers without a query history reader', async () => {
const project = projectWithConfig(withMysqlQueryHistory(baseProjectConfig()));
const status = await buildProjectStatus(project, {
claudeCodeAuthProbe: stubClaudeCodeAuthProbe,
postgresQueryHistoryProbe: async () => {
throw new Error('postgres probe must not run for mysql');
},
});
expect(status.queryHistory).toHaveLength(1);
expect(status.queryHistory[0]).toMatchObject({
connection: 'legacy',
driver: 'mysql',
dialect: 'mysql',
status: 'fail',
detail: 'query history is not supported for driver "mysql"',
});
expect(status.queryHistory[0].fix).toMatch(
/Disable connections\.legacy\.context\.queryHistory/,
);
expect(status.queryHistory[0].fix).not.toMatch(/driver to postgres/);
});
});
describe('buildProjectStatus --fast', () => {
it('skips claude-code probe and Postgres query-history probe', async () => {
let claudeProbeCalls = 0;


@ -5,6 +5,10 @@ import type { KtxConfigIssue, KtxProjectConfig, KtxProjectConnectionConfig, KtxP
import type { KtxLocalProject } from './context/project/project.js';
import { ktxLocalStateDbPath } from './context/project/local-state-db.js';
import type { PostgresPgssProbeResult } from './context/ingest/adapters/historic-sql/types.js';
+import {
+isQueryHistoryEnabled,
+queryHistoryDialectForConnection,
+} from './context/ingest/adapters/historic-sql/connection-dialect.js';
import {
formatClaudeCodePromptCachingFix,
formatClaudeCodePromptCachingWarning,
@ -47,7 +51,8 @@ interface ConnectionStatus extends ProjectStatusLine {
interface QueryHistoryStatus extends ProjectStatusLine {
connection: string;
-dialect: 'postgres';
+driver: string;
+dialect: string;
}
interface PipelineStatus { interface PipelineStatus {
@ -396,45 +401,44 @@ function buildConnectionStatus(
}
}
-interface PostgresQueryHistoryProbeInput {
+interface QueryHistoryProbeInput {
projectDir: string;
connectionId: string;
connection: KtxProjectConnectionConfig;
env: NodeJS.ProcessEnv;
}
-type PostgresQueryHistoryProbe = (
-input: PostgresQueryHistoryProbeInput,
-) => Promise<PostgresPgssProbeResult>;
-function recordValue(value: unknown): Record<string, unknown> | null {
-return value && typeof value === 'object' && !Array.isArray(value) ? (value as Record<string, unknown>) : null;
+interface GenericProbeResult {
+warnings: string[];
+info?: string[];
}
-function queryHistoryRecord(connection: KtxProjectConnectionConfig): Record<string, unknown> | null {
-const context = recordValue(connection.context);
-return recordValue(context?.queryHistory);
-}
-function legacyHistoricSqlRecord(connection: KtxProjectConnectionConfig): Record<string, unknown> | null {
-return recordValue(connection.historicSql);
-}
-function isEnabledPostgresQueryHistory(connection: KtxProjectConnectionConfig): boolean {
-const queryHistory = queryHistoryRecord(connection);
-if (queryHistory) {
-return queryHistory.enabled === true;
+type PostgresQueryHistoryProbe = (input: QueryHistoryProbeInput) => Promise<PostgresPgssProbeResult>;
+type SnowflakeQueryHistoryProbe = (input: QueryHistoryProbeInput) => Promise<GenericProbeResult>;
+type BigQueryQueryHistoryProbe = (input: QueryHistoryProbeInput) => Promise<GenericProbeResult>;
+function failureDetail(error: unknown): string {
+if (error instanceof Error && error.message.trim().length > 0) {
+return error.message.trim().split('\n')[0] ?? error.message.trim();
}
-const legacy = legacyHistoricSqlRecord(connection);
-return legacy?.enabled === true && legacy.dialect === 'postgres';
+return String(error);
}
-function isPostgresDriver(connection: KtxProjectConnectionConfig): boolean {
-const driver = String(connection.driver ?? '').toLowerCase();
-return driver === 'postgres' || driver === 'postgresql';
+function postgresReadinessDetail(result: PostgresPgssProbeResult): string {
+const warningText = result.warnings.length > 0 ? ` with warnings: ${result.warnings.join('; ')}` : '';
+const info = result.info ?? [];
+const infoText = info.length > 0 ? `; info: ${info.join('; ')}` : '';
+return `pg_stat_statements ready (${result.pgServerVersion})${warningText}${infoText}`;
}
-function queryHistoryFailureFix(error: unknown, connectionId: string, projectDir: string): string {
+function genericReadinessDetail(label: string, result: GenericProbeResult): string {
+const warningText = result.warnings.length > 0 ? ` with warnings: ${result.warnings.join('; ')}` : '';
+const info = result.info ?? [];
+const infoText = info.length > 0 ? `; info: ${info.join('; ')}` : '';
+return `${label} ready${warningText}${infoText}`;
+}
+function probeFailureFix(error: unknown, dialect: string, connectionId: string, projectDir: string): string {
if (error instanceof Error && error.name === 'HistoricSqlExtensionMissingError' && 'remediation' in error) {
return String(error.remediation);
}
@ -444,25 +448,11 @@ function queryHistoryFailureFix(error: unknown, connectionId: string, projectDir
if (error instanceof Error && error.name === 'HistoricSqlVersionUnsupportedError') {
return 'Use PostgreSQL 14 or newer, or disable query history for this connection';
}
-return `Fix connections.${connectionId} Postgres settings, then rerun \`ktx status --project-dir ${projectDir}\``;
+return `Fix connections.${connectionId} ${dialect} settings, then rerun \`ktx status --project-dir ${projectDir}\``;
-}
-function failureDetail(error: unknown): string {
-if (error instanceof Error && error.message.trim().length > 0) {
-return error.message.trim().split('\n')[0] ?? error.message.trim();
-}
-return String(error);
-}
-function readinessDetail(result: PostgresPgssProbeResult): string {
-const warningText = result.warnings.length > 0 ? ` with warnings: ${result.warnings.join('; ')}` : '';
-const info = result.info ?? [];
-const infoText = info.length > 0 ? `; info: ${info.join('; ')}` : '';
-return `pg_stat_statements ready (${result.pgServerVersion})${warningText}${infoText}`;
}
async function defaultPostgresQueryHistoryProbe(
-input: PostgresQueryHistoryProbeInput,
+input: QueryHistoryProbeInput,
): Promise<PostgresPgssProbeResult> {
const [{ PostgresPgssReader }, { KtxPostgresHistoricSqlQueryClient }, { isKtxPostgresConnectionConfig }] =
await Promise.all([
@ -488,63 +478,225 @@ async function defaultPostgresQueryHistoryProbe(
}
}
async function defaultSnowflakeQueryHistoryProbe(
input: QueryHistoryProbeInput,
): Promise<GenericProbeResult> {
const [{ SnowflakeHistoricSqlQueryHistoryReader }, { KtxSnowflakeHistoricSqlQueryClient }, { isKtxSnowflakeConnectionConfig }] =
await Promise.all([
import('./context/ingest/adapters/historic-sql/snowflake-query-history-reader.js'),
import('./connectors/snowflake/historic-sql-query-client.js'),
import('./connectors/snowflake/connector.js'),
]);
const inputDriver = input.connection.driver ?? 'unknown';
if (!isKtxSnowflakeConnectionConfig(input.connection)) {
throw new Error(`Native Snowflake connector cannot run driver "${inputDriver}"`);
}
const client = new KtxSnowflakeHistoricSqlQueryClient({
connectionId: input.connectionId,
connection: input.connection,
projectDir: input.projectDir,
env: input.env,
});
try {
return await new SnowflakeHistoricSqlQueryHistoryReader().probe(client);
} finally {
await client.cleanup();
}
}
async function defaultBigQueryQueryHistoryProbe(
input: QueryHistoryProbeInput,
): Promise<GenericProbeResult> {
const [
{ BigQueryHistoricSqlQueryHistoryReader },
{ KtxBigQueryScanConnector, isKtxBigQueryConnectionConfig },
{ resolveKtxConfigReference },
] = await Promise.all([
import('./context/ingest/adapters/historic-sql/bigquery-query-history-reader.js'),
import('./connectors/bigquery/connector.js'),
import('./context/core/config-reference.js'),
]);
const inputDriver = input.connection.driver ?? 'unknown';
if (!isKtxBigQueryConnectionConfig(input.connection)) {
throw new Error(`Native BigQuery connector cannot run driver "${inputDriver}"`);
}
const rawCredentials = typeof input.connection.credentials_json === 'string' ? input.connection.credentials_json : '';
const resolvedCredentials = resolveKtxConfigReference(rawCredentials, input.env);
if (!resolvedCredentials) {
throw new Error(`Query history BigQuery connection ${input.connectionId} requires credentials_json`);
}
const parsed = JSON.parse(resolvedCredentials) as { project_id?: unknown };
if (typeof parsed.project_id !== 'string' || parsed.project_id.trim().length === 0) {
throw new Error(`Query history BigQuery connection ${input.connectionId} requires credentials_json.project_id`);
}
const region =
typeof input.connection.location === 'string' && input.connection.location.trim().length > 0
? input.connection.location.trim()
: 'us';
const connector = new KtxBigQueryScanConnector({
connectionId: input.connectionId,
connection: input.connection,
});
try {
return await new BigQueryHistoricSqlQueryHistoryReader({
projectId: parsed.project_id,
region,
}).probe({
async executeQuery(sql: string) {
const result = await connector.executeReadOnly({ connectionId: input.connectionId, sql }, {} as never);
return {
headers: result.headers,
rows: result.rows,
totalRows: result.totalRows,
};
},
});
} finally {
await connector.cleanup();
}
}
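The BigQuery probe validates its inputs in a fixed order before it touches the warehouse: credentials must be present, they must carry a `project_id`, and the region falls back to `us`. A small sketch of just that validation, pulled out as a pure function, might look like the following. `resolveBigQueryProbeTarget` and `BigQueryProbeTarget` are hypothetical names, and the error messages are illustrative rather than the project's exact wording.

```typescript
interface BigQueryProbeTarget {
  projectId: string;
  region: string;
}

// Hypothetical helper: mirrors the validation order of the probe, without
// any connector or environment lookups.
function resolveBigQueryProbeTarget(credentialsJson: string, configuredLocation: unknown): BigQueryProbeTarget {
  if (credentialsJson.trim().length === 0) {
    throw new Error('credentials_json is required');
  }
  const parsed: unknown = JSON.parse(credentialsJson);
  const projectId =
    typeof parsed === 'object' && parsed !== null ? (parsed as { project_id?: unknown }).project_id : undefined;
  if (typeof projectId !== 'string' || projectId.trim().length === 0) {
    throw new Error('credentials_json.project_id is required');
  }
  // Same default as the probe: an unset or blank location means "us".
  const region =
    typeof configuredLocation === 'string' && configuredLocation.trim().length > 0
      ? configuredLocation.trim()
      : 'us';
  return { projectId, region };
}

console.log(resolveBigQueryProbeTarget('{"project_id":"demo-project"}', undefined));
```

Keeping the checks free of I/O makes the failure messages easy to cover with unit tests that need no warehouse access.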
interface DispatchedProbe {
label: string;
spinnerLabel: string;
fastSkipDetail: string;
run: () => Promise<{ status: ProjectStatusLevel; detail: string; fix?: string }>;
}
function postgresProbeDispatch(
input: QueryHistoryProbeInput,
probe: PostgresQueryHistoryProbe,
): DispatchedProbe {
return {
label: 'postgres',
spinnerLabel: `Probing pg_stat_statements on ${input.connectionId}`,
fastSkipDetail: 'pg_stat_statements probe skipped (--fast)',
run: async () => {
const result = await probe(input);
return {
status: result.warnings.length > 0 ? 'warn' : 'ok',
detail: postgresReadinessDetail(result),
...(result.warnings.length > 0
? {
fix: `Update the Postgres parameter group or config, then rerun \`ktx status --project-dir ${input.projectDir}\``,
}
: {}),
};
},
};
}
function snowflakeProbeDispatch(
input: QueryHistoryProbeInput,
probe: SnowflakeQueryHistoryProbe,
): DispatchedProbe {
return {
label: 'snowflake',
spinnerLabel: `Probing SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY on ${input.connectionId}`,
fastSkipDetail: 'SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY probe skipped (--fast)',
run: async () => {
const result = await probe(input);
return {
status: result.warnings.length > 0 ? 'warn' : 'ok',
detail: genericReadinessDetail('SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY', result),
};
},
};
}
function bigqueryProbeDispatch(
input: QueryHistoryProbeInput,
probe: BigQueryQueryHistoryProbe,
): DispatchedProbe {
return {
label: 'bigquery',
spinnerLabel: `Probing INFORMATION_SCHEMA.JOBS_BY_PROJECT on ${input.connectionId}`,
fastSkipDetail: 'INFORMATION_SCHEMA.JOBS_BY_PROJECT probe skipped (--fast)',
run: async () => {
const result = await probe(input);
return {
status: result.warnings.length > 0 ? 'warn' : 'ok',
detail: genericReadinessDetail('INFORMATION_SCHEMA.JOBS_BY_PROJECT', result),
};
},
};
}
async function buildQueryHistoryStatus(
project: KtxLocalProject,
options: BuildProjectStatusOptions,
): Promise<QueryHistoryStatus[]> {
const targets = Object.entries(project.config.connections)
-.filter(([, connection]) => isEnabledPostgresQueryHistory(connection))
+.filter(([, connection]) => isQueryHistoryEnabled(connection))
.sort(([left], [right]) => left.localeCompare(right));
-const probe = options.postgresQueryHistoryProbe ?? defaultPostgresQueryHistoryProbe;
+const postgresProbe = options.postgresQueryHistoryProbe ?? defaultPostgresQueryHistoryProbe;
+const snowflakeProbe = options.snowflakeQueryHistoryProbe ?? defaultSnowflakeQueryHistoryProbe;
+const bigqueryProbe = options.bigqueryQueryHistoryProbe ?? defaultBigQueryQueryHistoryProbe;
const env = options.env ?? process.env;
const statuses: QueryHistoryStatus[] = [];
for (const [connectionId, connection] of targets) {
-if (!isPostgresDriver(connection)) {
+const driver = String(connection.driver ?? 'unknown').toLowerCase();
+const dialect = queryHistoryDialectForConnection(connection);
+if (!dialect) {
statuses.push({
connection: connectionId,
-dialect: 'postgres',
+driver,
+dialect: driver,
status: 'fail',
-detail: `connections.${connectionId}.context.queryHistory is enabled but driver is ${String(connection.driver)}`,
-fix: `Set connections.${connectionId}.driver to postgres or disable query history for this connection`,
+detail: `query history is not supported for driver "${driver}"`,
+fix: `Disable connections.${connectionId}.context.queryHistory, or use a postgres, snowflake, or bigquery connection`,
});
continue;
}
+const probeInput: QueryHistoryProbeInput = {
+projectDir: project.projectDir,
+connectionId,
+connection,
+env,
+};
+const dispatched =
+dialect === 'postgres'
+? postgresProbeDispatch(probeInput, postgresProbe)
+: dialect === 'snowflake'
+? snowflakeProbeDispatch(probeInput, snowflakeProbe)
+: bigqueryProbeDispatch(probeInput, bigqueryProbe);
if (options.fast === true) {
statuses.push({
connection: connectionId,
-dialect: 'postgres',
+driver,
+dialect,
status: 'skipped',
-detail: 'pg_stat_statements probe skipped (--fast)',
+detail: dispatched.fastSkipDetail,
});
continue;
}
try {
-const result = await withSpinner(
-options.useSpinner === true,
-`Probing pg_stat_statements on ${connectionId}`,
-() => probe({ projectDir: project.projectDir, connectionId, connection, env }),
-);
+const outcome = await withSpinner(options.useSpinner === true, dispatched.spinnerLabel, dispatched.run);
statuses.push({
connection: connectionId,
-dialect: 'postgres',
-status: result.warnings.length > 0 ? 'warn' : 'ok',
-detail: readinessDetail(result),
-...(result.warnings.length > 0
-? {
-fix: `Update the Postgres parameter group or config, then rerun \`ktx status --project-dir ${project.projectDir}\``,
-}
-: {}),
+driver,
+dialect,
+...outcome,
});
} catch (error) {
statuses.push({
connection: connectionId,
-dialect: 'postgres',
+driver,
+dialect,
status: 'fail',
detail: failureDetail(error),
-fix: queryHistoryFailureFix(error, connectionId, project.projectDir),
+fix: probeFailureFix(error, dispatched.label, connectionId, project.projectDir),
});
}
}
@ -731,6 +883,8 @@ function buildVerdict(
export interface BuildProjectStatusOptions {
env?: NodeJS.ProcessEnv;
postgresQueryHistoryProbe?: PostgresQueryHistoryProbe;
+snowflakeQueryHistoryProbe?: SnowflakeQueryHistoryProbe;
+bigqueryQueryHistoryProbe?: BigQueryQueryHistoryProbe;
claudeCodeAuthProbe?: ClaudeCodeAuthProbe;
configIssues?: KtxConfigIssue[];
fast?: boolean;
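The Snowflake and BigQuery dispatchers share one rule for turning a probe result into a status line: any warning downgrades `ok` to `warn`, and the detail string is the source label plus optional warning and info suffixes. The sketch below restates that rule as two pure functions so the resulting strings are easy to see. `ProbeResult`, `ProbeOutcome`, and `outcomeFromResult` are illustrative names, and the sample warning text is made up.

```typescript
interface ProbeResult {
  warnings: string[];
  info?: string[];
}

interface ProbeOutcome {
  status: 'ok' | 'warn';
  detail: string;
}

// Builds "<label> ready", then appends warnings and info when present.
function readinessDetail(label: string, result: ProbeResult): string {
  const warningText = result.warnings.length > 0 ? ` with warnings: ${result.warnings.join('; ')}` : '';
  const info = result.info ?? [];
  const infoText = info.length > 0 ? `; info: ${info.join('; ')}` : '';
  return `${label} ready${warningText}${infoText}`;
}

// Hypothetical wrapper: warnings downgrade the status but never fail it.
function outcomeFromResult(label: string, result: ProbeResult): ProbeOutcome {
  return {
    status: result.warnings.length > 0 ? 'warn' : 'ok',
    detail: readinessDetail(label, result),
  };
}

const sourceLabel = 'SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY';
console.log(outcomeFromResult(sourceLabel, { warnings: [] }));
console.log(outcomeFromResult(sourceLabel, { warnings: ['example warning'], info: ['example note'] }));
```

Hard failures stay on the thrown-error path, so a `fail` status only ever comes from the `catch` branch of the status loop.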


@ -47,6 +47,7 @@ describe('buildProjectStackSnapshotFields', () => {
maxLlmTablesPerBatch: 40,
maxCandidatesPerColumn: 25,
profileSampleRows: 10000,
+profileConcurrency: 4,
validationConcurrency: 4,
},
},


@ -2,6 +2,7 @@
from __future__ import annotations
+import json
from collections.abc import Callable, Mapping, Sequence
from dataclasses import dataclass
from datetime import datetime, timezone
@ -24,6 +25,16 @@ join pg_catalog.pg_class c
and c.relname = t.table_name
where t.table_schema = any(%s)
and t.table_type = 'BASE TABLE'
+and (
+%s::jsonb is null
+or exists (
+select 1
+from jsonb_to_recordset(%s::jsonb) as scope(catalog text, db text, name text)
+where (scope.catalog is null or scope.catalog = current_database())
+and (scope.db is null or scope.db = t.table_schema)
+and scope.name = t.table_name
+)
+)
order by t.table_schema, t.table_name
"""
@ -52,6 +63,16 @@ where n.nspname = any(%s)
and c.relkind in ('r', 'p')
and a.attnum > 0
and not a.attisdropped
+and (
+%s::jsonb is null
+or exists (
+select 1
+from jsonb_to_recordset(%s::jsonb) as scope(catalog text, db text, name text)
+where (scope.catalog is null or scope.catalog = current_database())
+and (scope.db is null or scope.db = n.nspname)
+and scope.name = c.relname
+)
+)
order by n.nspname, c.relname, a.attnum
"""
@ -80,6 +101,16 @@ join information_schema.key_column_usage target_key
and target_key.ordinal_position = source_key.position_in_unique_constraint
where source_constraint.constraint_type = 'FOREIGN KEY'
and source_constraint.table_schema = any(%s)
+and (
+%s::jsonb is null
+or exists (
+select 1
+from jsonb_to_recordset(%s::jsonb) as scope(catalog text, db text, name text)
+where (scope.catalog is null or scope.catalog = current_database())
+and (scope.db is null or scope.db = source_constraint.table_schema)
+and scope.name = source_constraint.table_name
+)
+)
order by source_constraint.table_schema, source_constraint.table_name, source_constraint.constraint_name, source_key.ordinal_position
"""
@ -108,6 +139,12 @@ class LiveDatabaseTable(BaseModel):
foreign_keys: list[LiveDatabaseForeignKey] = Field(default_factory=list)
+class LiveDatabaseTableScopeRef(BaseModel):
+catalog: str | None = None
+db: str | None = None
+name: str
class DatabaseIntrospectionRequest(BaseModel):
connection_id: str
driver: str = "postgres"
@ -115,6 +152,7 @@ class DatabaseIntrospectionRequest(BaseModel):
schemas: list[str] = Field(default_factory=lambda: ["public"])
statement_timeout_ms: int = Field(default=30_000, ge=1)
connection_timeout_seconds: int = Field(default=5, ge=1)
+table_scope: list[LiveDatabaseTableScopeRef] | None = None
@field_validator("schemas")
@classmethod
@ -169,6 +207,23 @@ def _statement_timeout_config(statement_timeout_ms: int) -> tuple[str, tuple[str
)
+def _table_scope_json(
+table_scope: Sequence[LiveDatabaseTableScopeRef] | None,
+) -> str | None:
+if table_scope is None:
+return None
+return json.dumps(
+[
+{
+"catalog": ref.catalog,
+"db": ref.db,
+"name": ref.name,
+}
+for ref in table_scope
+]
+)
def _load_postgres_rows(
request: DatabaseIntrospectionRequest,
) -> DatabaseIntrospectionRows:
@ -190,7 +245,8 @@ def _load_postgres_rows(
connection.execute("BEGIN READ ONLY")
try:
connection.execute(*_statement_timeout_config(request.statement_timeout_ms))
-params = (request.schemas,)
+scope_json = _table_scope_json(request.table_scope)
+params = (request.schemas, scope_json, scope_json)
table_rows = list(connection.execute(TABLES_SQL, params))
column_rows = list(connection.execute(COLUMNS_SQL, params))
foreign_key_rows = list(connection.execute(FOREIGN_KEYS_SQL, params))
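The scope predicate added to the three SQL statements treats a missing scope as "no filter" and a null `catalog` or `db` inside a scope entry as a wildcard, while `name` must always match. As a rough model of those semantics outside SQL, the TypeScript sketch below applies the same rule to in-memory values. `inTableScope` and both interfaces are hypothetical names, and `table.catalog` stands in for the `current_database()` call in the queries.

```typescript
interface TableScopeRef {
  catalog: string | null;
  db: string | null;
  name: string;
}

interface TableIdentity {
  catalog: string;
  db: string;
  name: string;
}

// Hypothetical model of the SQL predicate: a null scope disables filtering,
// an empty scope matches nothing, and null fields act as wildcards.
function inTableScope(table: TableIdentity, scope: TableScopeRef[] | null): boolean {
  if (scope === null) return true;
  return scope.some(
    (ref) =>
      (ref.catalog === null || ref.catalog === table.catalog) &&
      (ref.db === null || ref.db === table.db) &&
      ref.name === table.name,
  );
}

const ordersTable: TableIdentity = { catalog: 'warehouse', db: 'public', name: 'orders' };
const sampleScope: TableScopeRef[] = [{ catalog: null, db: 'public', name: 'orders' }];

console.log(inTableScope(ordersTable, null));
console.log(inTableScope(ordersTable, sampleScope));
console.log(inTableScope(ordersTable, []));
```

An empty list is not the same as no scope: `_table_scope_json` serializes it to `[]`, which is non-null, so the `exists` clause finds no rows and every table is filtered out.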


@ -155,6 +155,7 @@ def test_database_introspect_endpoint_returns_snapshot() -> None:
"driver": "postgres",
"url": "postgresql://readonly@example.test/warehouse",
"schemas": ["public"],
+"table_scope": [{"db": "public", "name": "orders"}],
},
)
@ -162,6 +163,8 @@ def test_database_introspect_endpoint_returns_snapshot() -> None:
assert response.json()["connection_id"] == "warehouse"
assert response.json()["tables"][0]["name"] == "orders"
assert calls[0].connection_id == "warehouse"
+assert calls[0].table_scope[0].db == "public"
+assert calls[0].table_scope[0].name == "orders"
def test_database_introspect_endpoint_maps_value_error_to_400() -> None:


@ -311,6 +311,9 @@ def test_database_introspect_command_reads_stdin_and_writes_json(
assert request.connection_id == "warehouse"
assert request.driver == "postgres"
assert request.schemas == ["public"]
+assert request.table_scope is not None
+assert request.table_scope[0].db == "public"
+assert request.table_scope[0].name == "orders"
return DatabaseIntrospectionResponse(
connection_id="warehouse",
extracted_at="2026-04-28T10:00:00+00:00",
@ -337,7 +340,7 @@ def test_database_introspect_command_reads_stdin_and_writes_json(
sys,
"stdin",
io.StringIO(
-'{"connection_id":"warehouse","driver":"postgres","url":"postgresql://readonly@example.test/warehouse","schemas":["public"]}'
+'{"connection_id":"warehouse","driver":"postgres","url":"postgresql://readonly@example.test/warehouse","schemas":["public"],"table_scope":[{"db":"public","name":"orders"}]}'
),
)


@ -5,7 +5,9 @@ import pytest
from ktx_daemon.database_introspection import (
DatabaseIntrospectionRequest,
DatabaseIntrospectionRows,
+LiveDatabaseTableScopeRef,
_statement_timeout_config,
+_table_scope_json,
introspect_database_response,
)
@ -146,6 +148,22 @@ def test_database_introspection_request_rejects_empty_schema_list() -> None:
)
+def test_table_scope_json_serializes_null_wildcards() -> None:
+assert _table_scope_json(
+[
+LiveDatabaseTableScopeRef(catalog=None, db="public", name="orders"),
+LiveDatabaseTableScopeRef(
+catalog="warehouse",
+db="marts",
+name="customers",
+),
+]
+) == (
+'[{"catalog": null, "db": "public", "name": "orders"}, '
+'{"catalog": "warehouse", "db": "marts", "name": "customers"}]'
+)
def test_statement_timeout_config_uses_parameterized_set_config() -> None:
assert _statement_timeout_config(30_000) == (
"SELECT set_config('statement_timeout', %s, true)",

scripts/ktx-reset.sh Executable file

@ -0,0 +1,26 @@
#!/bin/bash
# ktx-reset.sh - Reset a ktx project directory back to its seed state.
#
# Removes everything in <dir> except ktx.yaml and .ktx/, and prunes .ktx/
# down to just .ktx/secrets/. Useful when re-running ingest/setup against
# a known-clean project tree.
set -e
set -o pipefail
if [ -z "$1" ]; then
echo "usage: ktx-reset <dir>" >&2
exit 1
fi
dir="${1%/}"
if [ ! -d "$dir" ]; then
echo "ktx-reset: $dir is not a directory" >&2
exit 1
fi
find "$dir" -mindepth 1 -maxdepth 1 ! -name ktx.yaml ! -name .ktx -exec rm -rf {} +
if [ -d "$dir/.ktx" ]; then
find "$dir/.ktx" -mindepth 1 -maxdepth 1 ! -name secrets -exec rm -rf {} +
fi
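# Show what is left. Fall back to ls when tree is not installed, so that
# `set -e` does not fail the script after the reset has already succeeded.
if command -v tree >/dev/null 2>&1; then
tree -a "$dir"
else
ls -la "$dir"
fi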