mirror of
https://github.com/Kaelio/ktx.git
synced 2026-06-22 08:38:08 +02:00
fix(snowflake): unblock multi-schema ingest and relationship discovery (#204)
* feat(setup): drop redundant Snowflake schema prompt; fall back to free-text on listSchemas failure Snowflake setup previously asked for a single schema as free text, then ran a multiselect against the discovered schemas — two schema questions back-to-back, with the first being only a session bootstrap. The SDK's `schema` is optional, so the bootstrap step is unnecessary. - Remove the free-text Snowflake schema prompt; only pass `schema` to snowflake-sdk when one is configured. - When `listSchemas()` fails (e.g. role lacks SHOW SCHEMAS), prompt the user for a comma-separated list, persist it as `schema_names`, and use it as both the table-list filter and the multiselect default. Applies to every driver with a scope-discovery spec, not just Snowflake. - Update docs to lead with `schema_names`; keep `schema_name` as a documented single-schema shorthand. * fix(snowflake): keep introspecting when primary-key discovery is denied The PK query joins INFORMATION_SCHEMA.TABLE_CONSTRAINTS and INFORMATION_SCHEMA.KEY_COLUMN_USAGE, which require grants the connection role may not have. Previously a 'SQL compilation error: Object ANALYTICS.INFORMATION_SCHEMA.KEY_COLUMN_USAGE does not exist or not authorized' aborted the entire introspect — schemas, columns, and row counts were all discarded over a missing nice-to-have. Wrap the constraint query in try/catch, log a one-line warning per schema, and return an empty PK map. Columns end up with primaryKey=false; relationship inference still has FK and profiling to fall back on. * fix(scan): unblock relationship discovery on Snowflake Two adjacent bugs prevented the scan's relationship pipeline from producing any joins on a Snowflake warehouse: - relationship-profiling.ts fell through to a default `GROUP_CONCAT` branch for unknown drivers. Snowflake has no GROUP_CONCAT, so every per-table profile query failed with "Unknown function GROUP_CONCAT". Add an explicit Snowflake branch that uses LISTAGG with a literal '\x1f' delimiter (Snowflake requires the delimiter to be a constant, so CHR(31) is rejected). - description-generation.ts destructured `connector.sampleTable` and `connector.sampleColumn` into bare locals, losing the `this` binding when the class-method connectors (Snowflake, Postgres, MySQL) were invoked. Every sample call threw "Cannot read properties of undefined (reading 'assertConnection')" and degraded LLM descriptions to metadata-only prompts. Call the methods through the connector instead. Without these, even after the primary-key probe is allowed to fail softly, the scan ends up with 0 validated relationships and an empty `joins:` block in every shard YAML. * test(scan): cover table-ref helpers * feat(scan): plumb tableScope through live-database introspection port * feat(scan): apply tableScope during metadata fetch * feat(scan): enforce table scope at fetch boundary * feat(scan): pool Snowflake sessions and batch enrichment for faster ingest (#206) * feat(cli): add RSA key-pair auth option to Snowflake setup wizard Extends the interactive Snowflake setup flow with an authentication-method prompt (password vs RSA/JWT key-pair). The RSA branch collects a private-key path (env/file/absolute) and an optional passphrase; the resulting connection config records `authMethod: 'rsa'` with `privateKey` and `passphrase` instead of `password`. * feat(scan): pool Snowflake sessions * fix(scan): reuse structural snapshots and cleanup connectors * feat(scan): parallelize relationship profiling * feat(scan): batch table description generation * docs: document Snowflake ingest concurrency knobs * fix(scan): close Snowflake ingest perf verification gaps * fix(scan): keep batched description failure bounded * feat(scan): dispatch query-history probes by connection driver Extract historic-sql dialect resolution into a shared helper so the status-project readiness check and the local ingest factory agree on which connections enable query history and which probe to run. The status command now picks the postgres/snowflake/bigquery probe based on the connection's driver instead of always reporting against postgres, which previously caused snowflake connections with queryHistory.enabled to surface a misleading "driver is snowflake" failure. Also drops a noisy console.warn from Snowflake primary-key discovery — INFORMATION_SCHEMA.KEY_COLUMN_USAGE is commonly ungranted for read-only roles and the FK + profiling paths handle the empty PK map already. * fix(llm): allow StructuredOutput tool and raise maxTurns for generateObject The Claude Code agent SDK announces an internal pseudo-tool named StructuredOutput in the system/init message whenever outputFormat is set to { type: 'json_schema' }. The runtime's isolation check built its allowedToolIds set only from MCP tool ids and treated StructuredOutput as an unexpected host-injected tool, so every generateObject call threw "Claude Code runtime isolation failed: tools=StructuredOutput ..." and the table-descriptions and relationship-LLM-proposal enrichment stages recorded null output across the board. Whitelist StructuredOutput specifically in generateObject's allowedToolIds — the check also enforces missing_tools symmetry, so generateText and runAgentLoop, which do not see StructuredOutput, must not require it. generateObject also ran with maxTurns: 1, which the model intermittently breached when it emitted thinking text before the structured response. Raised to 5 to give the schema-bound call enough headroom without allowing unbounded loops. The existing tests now exercise the path with an init message that announces StructuredOutput so the regression cannot slip back in. * chore(scripts): add ktx-reset.sh project-cleanup helper Convenience script for repeatable ingest testing: takes a project directory and prunes everything except ktx.yaml and .ktx/secrets/, so the next ktx setup or ktx ingest run starts from a known-clean state.
This commit is contained in:
parent
b0dd13ce7c
commit
394a985d2a
72 changed files with 3508 additions and 655 deletions
|
|
@ -0,0 +1,48 @@
|
|||
import type { HistoricSqlDialect } from './types.js';
|
||||
|
||||
const KNOWN_DIALECTS = ['postgres', 'bigquery', 'snowflake'] as const;
|
||||
|
||||
function isKnownDialect(value: string): value is HistoricSqlDialect {
|
||||
return (KNOWN_DIALECTS as readonly string[]).includes(value);
|
||||
}
|
||||
|
||||
function recordOrNull(value: unknown): Record<string, unknown> | null {
|
||||
return value && typeof value === 'object' && !Array.isArray(value) ? (value as Record<string, unknown>) : null;
|
||||
}
|
||||
|
||||
function historicSqlRecord(connection: unknown): Record<string, unknown> | null {
|
||||
const conn = recordOrNull(connection);
|
||||
return conn ? recordOrNull(conn.historicSql) : null;
|
||||
}
|
||||
|
||||
function queryHistoryRecord(connection: unknown): Record<string, unknown> | null {
|
||||
const conn = recordOrNull(connection);
|
||||
const context = conn ? recordOrNull(conn.context) : null;
|
||||
return context ? recordOrNull(context.queryHistory) : null;
|
||||
}
|
||||
|
||||
export function isQueryHistoryEnabled(connection: unknown): boolean {
|
||||
const queryHistory = queryHistoryRecord(connection);
|
||||
if (queryHistory) {
|
||||
return queryHistory.enabled === true;
|
||||
}
|
||||
return historicSqlRecord(connection)?.enabled === true;
|
||||
}
|
||||
|
||||
/**
|
||||
* Resolves the query-history dialect for a connection. Returns null when
|
||||
* query history is disabled, or when the connection's driver has no
|
||||
* query-history reader.
|
||||
*/
|
||||
export function queryHistoryDialectForConnection(connection: unknown): HistoricSqlDialect | null {
|
||||
if (!isQueryHistoryEnabled(connection)) {
|
||||
return null;
|
||||
}
|
||||
const conn = recordOrNull(connection);
|
||||
const driver = String(conn?.driver ?? '').toLowerCase();
|
||||
if (driver === 'postgres' || driver === 'postgresql') return 'postgres';
|
||||
if (driver === 'bigquery') return 'bigquery';
|
||||
if (driver === 'snowflake') return 'snowflake';
|
||||
const legacy = String(historicSqlRecord(connection)?.dialect ?? '').toLowerCase();
|
||||
return isKnownDialect(legacy) ? legacy : null;
|
||||
}
|
||||
|
|
@ -1,6 +1,7 @@
|
|||
import { once } from 'node:events';
|
||||
import { createServer } from 'node:http';
|
||||
import { describe, expect, it, vi } from 'vitest';
|
||||
import { tableRefSet } from '../../../scan/table-ref.js';
|
||||
import { createDaemonLiveDatabaseIntrospection } from './daemon-introspection.js';
|
||||
|
||||
const daemonResponse = {
|
||||
|
|
@ -161,7 +162,11 @@ describe('createDaemonLiveDatabaseIntrospection', () => {
|
|||
baseUrl: `http://127.0.0.1:${address.port}`,
|
||||
});
|
||||
|
||||
await expect(introspection.extractSchema('warehouse')).resolves.toMatchObject({
|
||||
await expect(
|
||||
introspection.extractSchema('warehouse', {
|
||||
tableScope: tableRefSet([{ catalog: 'warehouse', db: 'public', name: 'orders' }]),
|
||||
}),
|
||||
).resolves.toMatchObject({
|
||||
connectionId: 'warehouse',
|
||||
tables: [{ name: 'customers' }, { name: 'orders' }],
|
||||
});
|
||||
|
|
@ -176,6 +181,7 @@ describe('createDaemonLiveDatabaseIntrospection', () => {
|
|||
schemas: ['public'],
|
||||
statement_timeout_ms: 30_000,
|
||||
connection_timeout_seconds: 5,
|
||||
table_scope: [{ catalog: 'warehouse', db: 'public', name: 'orders' }],
|
||||
},
|
||||
},
|
||||
]);
|
||||
|
|
@ -217,7 +223,7 @@ describe('createDaemonLiveDatabaseIntrospection', () => {
|
|||
expect(runJson).not.toHaveBeenCalled();
|
||||
});
|
||||
|
||||
it('filters out tables not on the enabled_tables allowlist', async () => {
|
||||
it('does not use connection enabled_tables as a response filter', async () => {
|
||||
const runJson = vi.fn(async () => daemonResponse);
|
||||
const introspection = createDaemonLiveDatabaseIntrospection({
|
||||
connections: {
|
||||
|
|
@ -232,7 +238,8 @@ describe('createDaemonLiveDatabaseIntrospection', () => {
|
|||
});
|
||||
|
||||
const snapshot = await introspection.extractSchema('warehouse');
|
||||
expect(snapshot.tables.map((table) => `${table.db}.${table.name}`)).toEqual(['public.orders']);
|
||||
expect(snapshot.tables.map((table) => `${table.db}.${table.name}`)).toEqual(['public.customers', 'public.orders']);
|
||||
expect(runJson).toHaveBeenCalledWith('database-introspect', expect.not.objectContaining({ table_scope: expect.anything() }));
|
||||
});
|
||||
|
||||
it('passes through every table when enabled_tables is omitted or empty', async () => {
|
||||
|
|
|
|||
|
|
@ -3,10 +3,10 @@ import { request as httpRequest } from 'node:http';
|
|||
import { request as httpsRequest } from 'node:https';
|
||||
import { URL } from 'node:url';
|
||||
import type { KtxProjectConnectionConfig } from '../../../project/config.js';
|
||||
import { filterSnapshotTables, resolveEnabledTables } from '../../../scan/enabled-tables.js';
|
||||
import { tableRefFromKey } from '../../../scan/table-ref.js';
|
||||
import type { KtxSchemaColumn, KtxSchemaForeignKey, KtxSchemaSnapshot, KtxSchemaTable } from '../../../scan/types.js';
|
||||
import { inferKtxDimensionType, normalizeKtxNativeType } from '../../../scan/type-normalization.js';
|
||||
import type { LiveDatabaseIntrospectionPort } from './types.js';
|
||||
import type { LiveDatabaseIntrospectionOptions, LiveDatabaseIntrospectionPort } from './types.js';
|
||||
|
||||
type KtxDaemonDatabaseIntrospectionCommand = 'database-introspect';
|
||||
|
||||
|
|
@ -220,6 +220,18 @@ function mapDaemonSnapshot(
|
|||
};
|
||||
}
|
||||
|
||||
function serializeTableScope(options: LiveDatabaseIntrospectionOptions | undefined): Array<{
|
||||
catalog: string | null;
|
||||
db: string | null;
|
||||
name: string;
|
||||
}> | undefined {
|
||||
if (!options?.tableScope) return undefined;
|
||||
return [...options.tableScope].map((key) => {
|
||||
const ref = tableRefFromKey(key);
|
||||
return { catalog: ref.catalog, db: ref.db, name: ref.name };
|
||||
});
|
||||
}
|
||||
|
||||
export function createDaemonLiveDatabaseIntrospection(
|
||||
options: DaemonLiveDatabaseIntrospectionOptions,
|
||||
): LiveDatabaseIntrospectionPort {
|
||||
|
|
@ -231,8 +243,9 @@ export function createDaemonLiveDatabaseIntrospection(
|
|||
const now = options.now ?? (() => new Date());
|
||||
|
||||
return {
|
||||
async extractSchema(connectionId: string): Promise<KtxSchemaSnapshot> {
|
||||
async extractSchema(connectionId: string, introspectionOptions?: LiveDatabaseIntrospectionOptions): Promise<KtxSchemaSnapshot> {
|
||||
const connection = requirePostgresConnection(options.connections, connectionId);
|
||||
const tableScope = serializeTableScope(introspectionOptions);
|
||||
const payload = {
|
||||
connection_id: connectionId,
|
||||
driver: normalizeDriver(connection.driver),
|
||||
|
|
@ -240,17 +253,16 @@ export function createDaemonLiveDatabaseIntrospection(
|
|||
schemas,
|
||||
statement_timeout_ms: options.statementTimeoutMs ?? 30_000,
|
||||
connection_timeout_seconds: options.connectionTimeoutSeconds ?? 5,
|
||||
...(tableScope !== undefined ? { table_scope: tableScope } : {}),
|
||||
};
|
||||
const raw = requestJson
|
||||
? await requestJson('/database/introspect', payload)
|
||||
: await runJson('database-introspect', payload);
|
||||
const snapshot = mapDaemonSnapshot(raw, {
|
||||
return mapDaemonSnapshot(raw, {
|
||||
connectionId,
|
||||
extractedAt: now().toISOString(),
|
||||
schemas,
|
||||
});
|
||||
const enabledTables = resolveEnabledTables(connection);
|
||||
return enabledTables ? filterSnapshotTables(snapshot, enabledTables) : snapshot;
|
||||
},
|
||||
};
|
||||
}
|
||||
|
|
|
|||
|
|
@ -1,7 +1,8 @@
|
|||
import { mkdtemp } from 'node:fs/promises';
|
||||
import { mkdtemp, readdir, rm } from 'node:fs/promises';
|
||||
import { tmpdir } from 'node:os';
|
||||
import { join } from 'node:path';
|
||||
import { describe, expect, it, vi } from 'vitest';
|
||||
import { tableRefSet, type KtxTableRefKey } from '../../../scan/table-ref.js';
|
||||
import { LiveDatabaseSourceAdapter } from './live-database.adapter.js';
|
||||
|
||||
describe('LiveDatabaseSourceAdapter', () => {
|
||||
|
|
@ -43,7 +44,7 @@ describe('LiveDatabaseSourceAdapter', () => {
|
|||
|
||||
await adapter.fetch(undefined, dir, { connectionId: 'conn-1', sourceKey: 'live-database' });
|
||||
|
||||
expect(extractSchema).toHaveBeenCalledWith('conn-1');
|
||||
expect(extractSchema).toHaveBeenCalledWith('conn-1', { tableScope: undefined });
|
||||
await expect(adapter.detect(dir)).resolves.toBe(true);
|
||||
const chunked = await adapter.chunk(dir);
|
||||
expect(chunked.workUnits.map((wu) => wu.unitKey)).toEqual(['live-database-public-orders']);
|
||||
|
|
@ -56,4 +57,55 @@ describe('LiveDatabaseSourceAdapter', () => {
|
|||
expect(adapter.source).toBe('live-database');
|
||||
expect(adapter.skillNames).toEqual(['live_database_ingest']);
|
||||
});
|
||||
|
||||
it('threads tableScope from fetch context into the introspection port without post-filtering', async () => {
|
||||
const extractSchema = vi.fn(
|
||||
async (_connectionId: string, _options?: { tableScope?: ReadonlySet<KtxTableRefKey> }) => ({
|
||||
connectionId: 'warehouse',
|
||||
driver: 'snowflake' as const,
|
||||
extractedAt: '2026-05-22T00:00:00.000Z',
|
||||
scope: {},
|
||||
metadata: {},
|
||||
tables: [
|
||||
{
|
||||
catalog: 'A',
|
||||
db: 'MARTS',
|
||||
name: 'IN_SCOPE',
|
||||
kind: 'table' as const,
|
||||
comment: null,
|
||||
estimatedRows: 0,
|
||||
columns: [],
|
||||
foreignKeys: [],
|
||||
},
|
||||
{
|
||||
catalog: 'A',
|
||||
db: 'MARTS',
|
||||
name: 'OUT_OF_SCOPE',
|
||||
kind: 'table' as const,
|
||||
comment: null,
|
||||
estimatedRows: 0,
|
||||
columns: [],
|
||||
foreignKeys: [],
|
||||
},
|
||||
],
|
||||
}),
|
||||
);
|
||||
const scope = tableRefSet([{ catalog: 'A', db: 'MARTS', name: 'IN_SCOPE' }]);
|
||||
const adapter = new LiveDatabaseSourceAdapter({
|
||||
introspection: { extractSchema },
|
||||
});
|
||||
const stagedDir = await mkdtemp(join(tmpdir(), 'ktx-livedb-scope-'));
|
||||
try {
|
||||
await adapter.fetch(undefined, stagedDir, {
|
||||
connectionId: 'warehouse',
|
||||
sourceKey: 'live-database',
|
||||
tableScope: scope,
|
||||
});
|
||||
expect(extractSchema).toHaveBeenCalledWith('warehouse', { tableScope: scope });
|
||||
const tables = await readdir(join(stagedDir, 'tables'));
|
||||
expect(tables).toHaveLength(2);
|
||||
} finally {
|
||||
await rm(stagedDir, { recursive: true, force: true });
|
||||
}
|
||||
});
|
||||
});
|
||||
|
|
|
|||
|
|
@ -14,7 +14,8 @@ export class LiveDatabaseSourceAdapter implements SourceAdapter {
|
|||
}
|
||||
|
||||
async fetch(_pullConfig: unknown, stagedDir: string, ctx: FetchContext): Promise<void> {
|
||||
const snapshot = await this.deps.introspection.extractSchema(ctx.connectionId);
|
||||
const tableScope = ctx.tableScope;
|
||||
const snapshot = await this.deps.introspection.extractSchema(ctx.connectionId, { tableScope });
|
||||
await writeLiveDatabaseSnapshot(stagedDir, {
|
||||
...snapshot,
|
||||
connectionId: ctx.connectionId,
|
||||
|
|
|
|||
|
|
@ -1,7 +1,12 @@
|
|||
import type { KtxSchemaSnapshot } from '../../../scan/types.js';
|
||||
import type { KtxTableRefKey } from '../../../scan/table-ref.js';
|
||||
|
||||
export interface LiveDatabaseIntrospectionOptions {
|
||||
tableScope?: ReadonlySet<KtxTableRefKey>;
|
||||
}
|
||||
|
||||
export interface LiveDatabaseIntrospectionPort {
|
||||
extractSchema(connectionId: string): Promise<KtxSchemaSnapshot>;
|
||||
extractSchema(connectionId: string, options?: LiveDatabaseIntrospectionOptions): Promise<KtxSchemaSnapshot>;
|
||||
}
|
||||
|
||||
export interface LiveDatabaseSourceAdapterDeps {
|
||||
|
|
|
|||
|
|
@ -9,6 +9,7 @@ import { sanitizeMemoryFlowError } from './memory-flow/live-buffer.js';
|
|||
import type { MemoryFlowEventSink, MemoryFlowPlannedWorkUnit } from './memory-flow/types.js';
|
||||
import { buildSyncId } from './raw-sources-paths.js';
|
||||
import { SqliteLocalIngestStore } from './sqlite-local-ingest-store.js';
|
||||
import type { KtxTableRefKey } from '../scan/table-ref.js';
|
||||
import type { IngestTrigger, SourceAdapter, WorkUnit } from './types.js';
|
||||
|
||||
type LocalIngestStatus = 'running' | 'done' | 'error';
|
||||
|
|
@ -62,6 +63,7 @@ export interface RunLocalStageOnlyIngestOptions {
|
|||
now?: () => Date;
|
||||
dryRun?: boolean;
|
||||
memoryFlow?: MemoryFlowEventSink;
|
||||
tableScope?: ReadonlySet<KtxTableRefKey>;
|
||||
}
|
||||
|
||||
const LOCAL_AUTHOR = 'ktx';
|
||||
|
|
@ -225,6 +227,7 @@ async function prepareLocalStagedDir(
|
|||
stagedDir: string,
|
||||
sourceDir: string | undefined,
|
||||
connectionId: string,
|
||||
tableScope: ReadonlySet<KtxTableRefKey> | undefined,
|
||||
): Promise<string | null> {
|
||||
await rm(stagedDir, { recursive: true, force: true });
|
||||
await mkdir(stagedDir, { recursive: true });
|
||||
|
|
@ -242,7 +245,7 @@ async function prepareLocalStagedDir(
|
|||
);
|
||||
}
|
||||
const pullConfig = await localPullConfigForAdapter(project, adapter, connectionId);
|
||||
await adapter.fetch(pullConfig, stagedDir, { connectionId, sourceKey: adapter.source });
|
||||
await adapter.fetch(pullConfig, stagedDir, { connectionId, sourceKey: adapter.source, tableScope });
|
||||
return null;
|
||||
}
|
||||
|
||||
|
|
@ -274,7 +277,14 @@ async function runLocalStageOnlyIngestInner(options: RunLocalStageOnlyIngestOpti
|
|||
assertCompatibleExistingRun(existingRun, runId, adapter.source, connectionId);
|
||||
|
||||
const stagedDir = join(options.project.projectDir, '.ktx/cache/local-ingest', runId, 'staged');
|
||||
const sourceDir = await prepareLocalStagedDir(options.project, adapter, stagedDir, options.sourceDir, connectionId);
|
||||
const sourceDir = await prepareLocalStagedDir(
|
||||
options.project,
|
||||
adapter,
|
||||
stagedDir,
|
||||
options.sourceDir,
|
||||
connectionId,
|
||||
options.tableScope,
|
||||
);
|
||||
|
||||
const detected = await adapter.detect(stagedDir);
|
||||
if (!detected) {
|
||||
|
|
|
|||
|
|
@ -2,6 +2,7 @@ import type { KtxEmbeddingPort } from '../core/embedding.js';
|
|||
import type { MemoryAction } from '../../context/memory/types.js';
|
||||
import type { SemanticLayerService } from '../../context/sl/semantic-layer.service.js';
|
||||
import type { TouchedSlSource } from '../../context/tools/touched-sl-sources.js';
|
||||
import type { KtxTableRefKey } from '../scan/table-ref.js';
|
||||
import type { MemoryFlowEventSink } from './memory-flow/types.js';
|
||||
import type { StageIndex } from './stages/stage-index.types.js';
|
||||
import type { WorkUnitOutcome } from './stages/stage-3-work-units.js';
|
||||
|
|
@ -52,6 +53,7 @@ export interface ChunkResult {
|
|||
export interface FetchContext {
|
||||
connectionId: string;
|
||||
sourceKey: string;
|
||||
tableScope?: ReadonlySet<KtxTableRefKey>;
|
||||
memoryFlow?: MemoryFlowEventSink;
|
||||
}
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue