mirror of https://github.com/Kaelio/ktx.git, synced 2026-06-13 08:15:14 +02:00
fix(snowflake): unblock multi-schema ingest and relationship discovery (#204)

* feat(setup): drop redundant Snowflake schema prompt; fall back to free-text on listSchemas failure

Snowflake setup previously asked for a single schema as free text, then ran
a multiselect against the discovered schemas: two schema questions
back-to-back, with the first being only a session bootstrap. The SDK's
`schema` is optional, so the bootstrap step is unnecessary.

- Remove the free-text Snowflake schema prompt; only pass `schema` to
  snowflake-sdk when one is configured.
- When `listSchemas()` fails (e.g. role lacks SHOW SCHEMAS), prompt the user
  for a comma-separated list, persist it as `schema_names`, and use it as
  both the table-list filter and the multiselect default. Applies to every
  driver with a scope-discovery spec, not just Snowflake.
- Update docs to lead with `schema_names`; keep `schema_name` as a
  documented single-schema shorthand.

* fix(snowflake): keep introspecting when primary-key discovery is denied

The PK query joins INFORMATION_SCHEMA.TABLE_CONSTRAINTS and
INFORMATION_SCHEMA.KEY_COLUMN_USAGE, which require grants the connection
role may not have. Previously a 'SQL compilation error: Object
ANALYTICS.INFORMATION_SCHEMA.KEY_COLUMN_USAGE does not exist or not
authorized' aborted the entire introspect: schemas, columns, and row counts
were all discarded over a missing nice-to-have.

Wrap the constraint query in try/catch, log a one-line warning per schema,
and return an empty PK map. Columns end up with primaryKey=false;
relationship inference still has FK and profiling to fall back on.

* fix(scan): unblock relationship discovery on Snowflake

Two adjacent bugs prevented the scan's relationship pipeline from producing
any joins on a Snowflake warehouse:

- relationship-profiling.ts fell through to a default `GROUP_CONCAT` branch
  for unknown drivers. Snowflake has no GROUP_CONCAT, so every per-table
  profile query failed with "Unknown function GROUP_CONCAT". Add an explicit
  Snowflake branch that uses LISTAGG with a literal '\x1f' delimiter
  (Snowflake requires the delimiter to be a constant, so CHR(31) is
  rejected).
- description-generation.ts destructured `connector.sampleTable` and
  `connector.sampleColumn` into bare locals, losing the `this` binding when
  the class-method connectors (Snowflake, Postgres, MySQL) were invoked.
  Every sample call threw "Cannot read properties of undefined (reading
  'assertConnection')" and degraded LLM descriptions to metadata-only
  prompts. Call the methods through the connector instead.

Without these, even after the primary-key probe is allowed to fail softly,
the scan ends up with 0 validated relationships and an empty `joins:` block
in every shard YAML.

* test(scan): cover table-ref helpers

* feat(scan): plumb tableScope through live-database introspection port

* feat(scan): apply tableScope during metadata fetch

* feat(scan): enforce table scope at fetch boundary

* feat(scan): pool Snowflake sessions and batch enrichment for faster ingest (#206)

* feat(cli): add RSA key-pair auth option to Snowflake setup wizard

Extends the interactive Snowflake setup flow with an authentication-method
prompt (password vs RSA/JWT key-pair). The RSA branch collects a private-key
path (env/file/absolute) and an optional passphrase; the resulting
connection config records `authMethod: 'rsa'` with `privateKey` and
`passphrase` instead of `password`.

* feat(scan): pool Snowflake sessions

* fix(scan): reuse structural snapshots and cleanup connectors

* feat(scan): parallelize relationship profiling

* feat(scan): batch table description generation

* docs: document Snowflake ingest concurrency knobs

* fix(scan): close Snowflake ingest perf verification gaps

* fix(scan): keep batched description failure bounded

* feat(scan): dispatch query-history probes by connection driver

Extract historic-sql dialect resolution into a shared helper so the
status-project readiness check and the local ingest factory agree on which
connections enable query history and which probe to run. The status command
now picks the postgres/snowflake/bigquery probe based on the connection's
driver instead of always reporting against postgres, which previously caused
snowflake connections with queryHistory.enabled to surface a misleading
"driver is snowflake" failure.

Also drops a noisy console.warn from Snowflake primary-key discovery;
INFORMATION_SCHEMA.KEY_COLUMN_USAGE is commonly ungranted for read-only
roles and the FK + profiling paths handle the empty PK map already.

* fix(llm): allow StructuredOutput tool and raise maxTurns for generateObject

The Claude Code agent SDK announces an internal pseudo-tool named
StructuredOutput in the system/init message whenever outputFormat is set to
{ type: 'json_schema' }. The runtime's isolation check built its
allowedToolIds set only from MCP tool ids and treated StructuredOutput as an
unexpected host-injected tool, so every generateObject call threw "Claude
Code runtime isolation failed: tools=StructuredOutput ..." and the
table-descriptions and relationship-LLM-proposal enrichment stages recorded
null output across the board.

Whitelist StructuredOutput specifically in generateObject's allowedToolIds;
the check also enforces missing_tools symmetry, so generateText and
runAgentLoop, which do not see StructuredOutput, must not require it.

generateObject also ran with maxTurns: 1, which the model intermittently
breached when it emitted thinking text before the structured response.
Raised to 5 to give the schema-bound call enough headroom without allowing
unbounded loops.

The existing tests now exercise the path with an init message that announces
StructuredOutput so the regression cannot slip back in.

* chore(scripts): add ktx-reset.sh project-cleanup helper

Convenience script for repeatable ingest testing: takes a project directory
and prunes everything except ktx.yaml and .ktx/secrets/, so the next ktx
setup or ktx ingest run starts from a known-clean state.
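
The `this`-binding failure described under "fix(scan): unblock relationship discovery on Snowflake" can be reproduced in isolation. The sketch below is only an illustration, not the ktx code: `DemoConnector` and its fields are hypothetical stand-ins for a class-method connector, and only the method names `sampleTable` and `assertConnection` come from the commit message.

```typescript
// Hypothetical stand-in for a class-method connector (not the real ktx class).
class DemoConnector {
  private connected = true;

  private assertConnection(): void {
    if (!this.connected) throw new Error('not connected');
  }

  sampleTable(name: string): string {
    // Relies on `this`; fails when the method is called without a receiver.
    this.assertConnection();
    return `sample of ${name}`;
  }
}

const connector = new DemoConnector();

// Destructuring copies the function but drops the receiver.
const { sampleTable } = connector;

let detachedThrewTypeError = false;
try {
  // Class bodies run in strict mode, so `this` is undefined in this call.
  sampleTable('orders');
} catch (error) {
  detachedThrewTypeError = error instanceof TypeError;
}

// Calling through the connector keeps `this` bound, as the fix does.
const viaConnector = connector.sampleTable('orders');

console.log({ detachedThrewTypeError, viaConnector });
```

Calling through the object is the smallest change that restores the binding; `connector.sampleTable.bind(connector)` would be an alternative if a bare local is still wanted.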
parent b0dd13ce7c
commit 394a985d2a
72 changed files with 3508 additions and 655 deletions
@@ -1,6 +1,7 @@
 import { describe, expect, it, vi } from 'vitest';
 import { createSqlServerLiveDatabaseIntrospection } from '../../connectors/sqlserver/live-database-introspection.js';
 import { isKtxSqlServerConnectionConfig, KtxSqlServerScanConnector, sqlServerConnectionPoolConfigFromConfig, type KtxSqlServerPoolFactory, type KtxSqlServerQueryResult } from '../../connectors/sqlserver/connector.js';
+import { tableRefSet } from '../../context/scan/table-ref.js';
 
 function recordset<T extends Record<string, unknown>>(
   rows: T[],
@@ -290,6 +291,55 @@ describe('KtxSqlServerScanConnector', () => {
     await connector.cleanup();
   });
 
+  it('limits introspection to tables in tableScope', async () => {
+    const queries: string[] = [];
+    const inputs: Array<{ name: string; value: unknown }> = [];
+    const request = {
+      input: vi.fn((name: string, value: unknown) => {
+        inputs.push({ name, value });
+        return request;
+      }),
+      query: vi.fn(async (sql: string): Promise<KtxSqlServerQueryResult> => {
+        queries.push(sql);
+        if (sql.includes('INFORMATION_SCHEMA.TABLES')) {
+          return result([{ table_name: 'orders', table_type: 'BASE TABLE' }], ['table_name', 'table_type']);
+        }
+        if (sql.includes('INFORMATION_SCHEMA.COLUMNS')) {
+          return result(
+            [{ table_name: 'orders', column_name: 'id', data_type: 'int', is_nullable: 'NO' }],
+            ['table_name', 'column_name', 'data_type', 'is_nullable'],
+          );
+        }
+        return result([], []);
+      }),
+    };
+    const poolFactory: KtxSqlServerPoolFactory = {
+      createPool: vi.fn(async () => ({
+        request: () => request,
+        close: vi.fn(async () => undefined),
+      })),
+    };
+    const connector = new KtxSqlServerScanConnector({
+      connectionId: 'warehouse',
+      connection: {
+        driver: 'sqlserver',
+        host: 'db.example.test',
+        database: 'analytics',
+        username: 'reader',
+        schema: 'dbo',
+      },
+      poolFactory,
+    });
+    const scope = tableRefSet([{ catalog: 'analytics', db: 'dbo', name: 'orders' }]);
+    const snapshot = await connector.introspect(
+      { connectionId: 'warehouse', driver: 'sqlserver', tableScope: scope },
+      { runId: 'scope-test' },
+    );
+    expect(snapshot.tables.map((table) => table.name)).toEqual(['orders']);
+    expect(queries.find((query) => query.includes('INFORMATION_SCHEMA.TABLES'))).toMatch(/TABLE_NAME IN \(@table_0\)/);
+    expect(inputs).toEqual(expect.arrayContaining([{ name: 'table_0', value: 'orders' }]));
+  });
+
   it('adapts native SQL Server snapshots to live-database introspection for local ingest', async () => {
     const introspection = createSqlServerLiveDatabaseIntrospection({
       connections: {

@@ -1,5 +1,6 @@
 import { assertReadOnlySql } from '../../context/connections/read-only-sql.js';
 import { createKtxConnectorCapabilities, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaColumn, type KtxSchemaForeignKey, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableListEntry, type KtxTableRef, type KtxTableSampleInput, type KtxTableSampleResult } from '../../context/scan/types.js';
+import { scopedTableNames } from '../../context/scan/table-ref.js';
 import { readFileSync } from 'node:fs';
 import { homedir } from 'node:os';
 import { resolve } from 'node:path';
@@ -121,6 +122,20 @@ function sqlRecordset(
   return recordset;
 }
 
+function tableScopeSql(
+  scopedNames: readonly string[] | null,
+  columnExpression: string,
+): { clause: string; params: Record<string, unknown> } {
+  if (!scopedNames) return { clause: '', params: {} };
+  const params: Record<string, unknown> = {};
+  const placeholders = scopedNames.map((name, index) => {
+    const key = `table_${index}`;
+    params[key] = name;
+    return `@${key}`;
+  });
+  return { clause: `AND ${columnExpression} IN (${placeholders.join(', ')})`, params };
+}
+
 class DefaultSqlServerPoolFactory implements KtxSqlServerPoolFactory {
   async createPool(config: KtxSqlServerPoolConfig): Promise<KtxSqlServerPool> {
     const pool = await new sql.ConnectionPool(config as sql.config).connect();
@@ -314,7 +329,10 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
     this.assertConnection(input.connectionId);
     const tables: KtxSchemaTable[] = [];
     for (const schemaName of this.schemas) {
-      tables.push(...(await this.introspectSchema(schemaName)));
+      const scopedNames = input.tableScope
+        ? scopedTableNames(input.tableScope, { catalog: this.poolConfig.database, db: schemaName })
+        : null;
+      tables.push(...(await this.introspectSchema(schemaName, scopedNames)));
     }
     return {
       connectionId: this.connectionId,
@@ -461,16 +479,19 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
     }
   }
 
-  private async introspectSchema(schemaName: string): Promise<KtxSchemaTable[]> {
+  private async introspectSchema(schemaName: string, scopedNames: readonly string[] | null): Promise<KtxSchemaTable[]> {
+    if (scopedNames && scopedNames.length === 0) return [];
+    const tableScope = tableScopeSql(scopedNames, 'TABLE_NAME');
     const tables = await this.queryRaw<{ table_name: string; table_type: string }>(
       `
         SELECT TABLE_NAME AS table_name, TABLE_TYPE AS table_type
         FROM INFORMATION_SCHEMA.TABLES
         WHERE TABLE_SCHEMA = @schemaName
           AND TABLE_TYPE IN ('BASE TABLE', 'VIEW')
+          ${tableScope.clause}
         ORDER BY TABLE_NAME
       `,
-      { schemaName },
+      { schemaName, ...tableScope.params },
     );
     const columns = await this.queryRaw<{
       table_name: string;
@@ -482,15 +503,16 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
         SELECT TABLE_NAME AS table_name, COLUMN_NAME AS column_name, DATA_TYPE AS data_type, IS_NULLABLE AS is_nullable
         FROM INFORMATION_SCHEMA.COLUMNS
         WHERE TABLE_SCHEMA = @schemaName
+          ${tableScope.clause}
         ORDER BY TABLE_NAME, ORDINAL_POSITION
       `,
-      { schemaName },
+      { schemaName, ...tableScope.params },
     );
-    const tableComments = await this.tableComments(schemaName);
-    const columnComments = await this.columnComments(schemaName);
-    const primaryKeys = await this.primaryKeys(schemaName);
-    const foreignKeys = await this.foreignKeys(schemaName);
-    const rowCounts = await this.rowCounts(schemaName);
+    const tableComments = await this.tableComments(schemaName, scopedNames);
+    const columnComments = await this.columnComments(schemaName, scopedNames);
+    const primaryKeys = await this.primaryKeys(schemaName, scopedNames);
+    const foreignKeys = await this.foreignKeys(schemaName, scopedNames);
+    const rowCounts = await this.rowCounts(schemaName, scopedNames);
     const columnsByTable = groupByTable(columns);
     const foreignKeysByTable = groupByTable(foreignKeys);
 
@@ -508,7 +530,8 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
     }));
   }
 
-  private async tableComments(schemaName: string): Promise<Map<string, string>> {
+  private async tableComments(schemaName: string, scopedNames: readonly string[] | null): Promise<Map<string, string>> {
+    const tableScope = tableScopeSql(scopedNames, 'o.name');
     const rows = await this.queryRaw<{ table_name: string; table_comment: string }>(
       `
         SELECT o.name AS table_name, CAST(ep.value AS NVARCHAR(MAX)) AS table_comment
@@ -519,13 +542,15 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
           AND ep.name = 'MS_Description'
         WHERE s.name = @schemaName
           AND o.type IN ('U', 'V')
+          ${tableScope.clause}
       `,
-      { schemaName },
+      { schemaName, ...tableScope.params },
     );
     return new Map(rows.map((row) => [row.table_name, row.table_comment]));
   }
 
-  private async columnComments(schemaName: string): Promise<Map<string, string>> {
+  private async columnComments(schemaName: string, scopedNames: readonly string[] | null): Promise<Map<string, string>> {
+    const tableScope = tableScopeSql(scopedNames, 'o.name');
     const rows = await this.queryRaw<{ table_name: string; column_name: string; column_comment: string }>(
       `
         SELECT o.name AS table_name, c.name AS column_name, CAST(ep.value AS NVARCHAR(MAX)) AS column_comment
@@ -537,13 +562,18 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
           AND ep.name = 'MS_Description'
         WHERE s.name = @schemaName
           AND o.type IN ('U', 'V')
+          ${tableScope.clause}
       `,
-      { schemaName },
+      { schemaName, ...tableScope.params },
     );
     return new Map(rows.map((row) => [`${row.table_name}.${row.column_name}`, row.column_comment]));
   }
 
-  private async primaryKeys(schemaName: string): Promise<Map<string, Set<string>>> {
+  private async primaryKeys(
+    schemaName: string,
+    scopedNames: readonly string[] | null,
+  ): Promise<Map<string, Set<string>>> {
+    const tableScope = tableScopeSql(scopedNames, 'tc.TABLE_NAME');
     const rows = await this.queryRaw<{ table_name: string; column_name: string }>(
       `
         SELECT tc.TABLE_NAME AS table_name, kcu.COLUMN_NAME AS column_name
@@ -553,9 +583,10 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
           AND tc.TABLE_SCHEMA = kcu.TABLE_SCHEMA
         WHERE tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
           AND tc.TABLE_SCHEMA = @schemaName
+          ${tableScope.clause}
         ORDER BY tc.TABLE_NAME, kcu.ORDINAL_POSITION
       `,
-      { schemaName },
+      { schemaName, ...tableScope.params },
     );
     const grouped = new Map<string, Set<string>>();
     for (const row of rows) {
@@ -566,7 +597,10 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
     return grouped;
   }
 
-  private async foreignKeys(schemaName: string): Promise<
+  private async foreignKeys(
+    schemaName: string,
+    scopedNames: readonly string[] | null,
+  ): Promise<
     Array<{
       table_name: string;
       column_name: string;
@@ -576,6 +610,7 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
       constraint_name: string;
     }>
   > {
+    const tableScope = tableScopeSql(scopedNames, 'fk.TABLE_NAME');
     return this.queryRaw(
       `
         SELECT
@@ -596,13 +631,15 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
           AND pk.CONSTRAINT_NAME = rc.UNIQUE_CONSTRAINT_NAME
           AND pk.ORDINAL_POSITION = fk.ORDINAL_POSITION
         WHERE fk.TABLE_SCHEMA = @schemaName
+          ${tableScope.clause}
         ORDER BY fk.TABLE_NAME, fk.COLUMN_NAME
       `,
-      { schemaName },
+      { schemaName, ...tableScope.params },
     );
   }
 
-  private async rowCounts(schemaName: string): Promise<Map<string, number>> {
+  private async rowCounts(schemaName: string, scopedNames: readonly string[] | null): Promise<Map<string, number>> {
+    const tableScope = tableScopeSql(scopedNames, 't.name');
     const rows = await this.queryRaw<{ table_name: string; row_count: unknown }>(
       `
         SELECT t.name AS table_name, SUM(p.rows) AS row_count
@@ -611,9 +648,10 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
         INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
         WHERE s.name = @schemaName
           AND p.index_id IN (0, 1)
+          ${tableScope.clause}
         GROUP BY t.name
       `,
-      { schemaName },
+      { schemaName, ...tableScope.params },
     );
     return new Map(rows.map((row) => [row.table_name, firstNumber(row.row_count) ?? 0]));
   }

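
A small standalone sketch of how `tableScopeSql` from the hunks above composes with a query. The function body is copied from the diff; the sample table names, the `dbo` schema value, and the shortened query text are assumptions made for illustration.

```typescript
// Copied from the diff: builds an optional `AND <column> IN (...)` clause
// plus the named parameters that the clause refers to.
function tableScopeSql(
  scopedNames: readonly string[] | null,
  columnExpression: string,
): { clause: string; params: Record<string, unknown> } {
  if (!scopedNames) return { clause: '', params: {} };
  const params: Record<string, unknown> = {};
  const placeholders = scopedNames.map((name, index) => {
    const key = `table_${index}`;
    params[key] = name;
    return `@${key}`;
  });
  return { clause: `AND ${columnExpression} IN (${placeholders.join(', ')})`, params };
}

// No scope: the clause is empty and the query is unchanged.
const unscoped = tableScopeSql(null, 'TABLE_NAME');

// Two scoped tables (illustrative names): one placeholder per table.
const scoped = tableScopeSql(['orders', 'customers'], 'TABLE_NAME');

// Shortened, illustrative query text; the real queries are in the diff.
const sqlText = `SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = @schemaName ${scoped.clause}`;
const params = { schemaName: 'dbo', ...scoped.params };

console.log(unscoped.clause === '', sqlText, params);
```

Table names travel as bound parameters and never as interpolated text; only the caller-chosen column expression is spliced into the SQL. An empty list would render `IN ()`, which is presumably why `introspectSchema` returns early when `scopedNames.length === 0`.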
@@ -1,4 +1,7 @@
-import type { LiveDatabaseIntrospectionPort } from '../../context/ingest/adapters/live-database/types.js';
+import type {
+  LiveDatabaseIntrospectionOptions,
+  LiveDatabaseIntrospectionPort,
+} from '../../context/ingest/adapters/live-database/types.js';
 import type { KtxProjectConnectionConfig } from '../../context/project/config.js';
 import {
   KtxSqlServerScanConnector,
@@ -18,7 +21,7 @@ export function createSqlServerLiveDatabaseIntrospection(
   options: CreateSqlServerLiveDatabaseIntrospectionOptions,
 ): LiveDatabaseIntrospectionPort {
   return {
-    async extractSchema(connectionId: string) {
+    async extractSchema(connectionId: string, introspectionOptions?: LiveDatabaseIntrospectionOptions) {
       const connection = options.connections[connectionId] as KtxSqlServerConnectionConfig | undefined;
       const connector = new KtxSqlServerScanConnector({
         connectionId,
@@ -29,7 +32,11 @@ export function createSqlServerLiveDatabaseIntrospection(
       });
       try {
         return await connector.introspect(
-          { connectionId, driver: 'sqlserver' },
+          {
+            connectionId,
+            driver: 'sqlserver',
+            ...(introspectionOptions?.tableScope ? { tableScope: introspectionOptions.tableScope } : {}),
+          },
           { runId: `sqlserver-${connectionId}` },
         );
       } finally {

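
The conditional spread in the `extractSchema` hunk above forwards `tableScope` only when the caller supplied one, so the key is absent rather than present with the value `undefined`. A minimal sketch of that pattern follows; `DemoScanInput`, `buildScanInput`, and the `ReadonlySet<string>` scope type are hypothetical and do not reflect the real ktx types.

```typescript
// Hypothetical input shape; the real KtxScanInput is not shown in the diff.
interface DemoScanInput {
  connectionId: string;
  driver: string;
  tableScope?: ReadonlySet<string>;
}

// Hypothetical helper mirroring the spread used in extractSchema.
function buildScanInput(
  connectionId: string,
  options?: { tableScope?: ReadonlySet<string> },
): DemoScanInput {
  return {
    connectionId,
    driver: 'sqlserver',
    // Spread an empty object when no scope is given, so no key is added.
    ...(options?.tableScope ? { tableScope: options.tableScope } : {}),
  };
}

const withoutScope = buildScanInput('warehouse');
const withScope = buildScanInput('warehouse', {
  tableScope: new Set(['analytics.dbo.orders']),
});

console.log('tableScope' in withoutScope, 'tableScope' in withScope);
```

Omitting the key keeps the unscoped call shape identical to the pre-change call, which may matter to equality-based test assertions or to projects compiled with `exactOptionalPropertyTypes`.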