fix(snowflake): unblock multi-schema ingest and relationship discovery (#204)

* feat(setup): drop redundant Snowflake schema prompt; fall back to free-text on listSchemas failure

  Snowflake setup previously asked for a single schema as free text, then ran a multiselect against the discovered schemas: two schema questions back-to-back, with the first being only a session bootstrap. The SDK's `schema` is optional, so the bootstrap step is unnecessary.

  - Remove the free-text Snowflake schema prompt; only pass `schema` to snowflake-sdk when one is configured.
  - When `listSchemas()` fails (e.g. role lacks SHOW SCHEMAS), prompt the user for a comma-separated list, persist it as `schema_names`, and use it as both the table-list filter and the multiselect default. Applies to every driver with a scope-discovery spec, not just Snowflake.
  - Update docs to lead with `schema_names`; keep `schema_name` as a documented single-schema shorthand.

* fix(snowflake): keep introspecting when primary-key discovery is denied

  The PK query joins INFORMATION_SCHEMA.TABLE_CONSTRAINTS and INFORMATION_SCHEMA.KEY_COLUMN_USAGE, which require grants the connection role may not have. Previously a 'SQL compilation error: Object ANALYTICS.INFORMATION_SCHEMA.KEY_COLUMN_USAGE does not exist or not authorized' aborted the entire introspect: schemas, columns, and row counts were all discarded over a missing nice-to-have.

  Wrap the constraint query in try/catch, log a one-line warning per schema, and return an empty PK map. Columns end up with primaryKey=false; relationship inference still has FK and profiling to fall back on.

* fix(scan): unblock relationship discovery on Snowflake

  Two adjacent bugs prevented the scan's relationship pipeline from producing any joins on a Snowflake warehouse:

  - relationship-profiling.ts fell through to a default `GROUP_CONCAT` branch for unknown drivers. Snowflake has no GROUP_CONCAT, so every per-table profile query failed with "Unknown function GROUP_CONCAT". Add an explicit Snowflake branch that uses LISTAGG with a literal '\x1f' delimiter (Snowflake requires the delimiter to be a constant, so CHR(31) is rejected).
  - description-generation.ts destructured `connector.sampleTable` and `connector.sampleColumn` into bare locals, losing the `this` binding when the class-method connectors (Snowflake, Postgres, MySQL) were invoked. Every sample call threw "Cannot read properties of undefined (reading 'assertConnection')" and degraded LLM descriptions to metadata-only prompts. Call the methods through the connector instead.

  Without these, even after the primary-key probe is allowed to fail softly, the scan ends up with 0 validated relationships and an empty `joins:` block in every shard YAML.

* test(scan): cover table-ref helpers

* feat(scan): plumb tableScope through live-database introspection port

* feat(scan): apply tableScope during metadata fetch

* feat(scan): enforce table scope at fetch boundary

* feat(scan): pool Snowflake sessions and batch enrichment for faster ingest (#206)

* feat(cli): add RSA key-pair auth option to Snowflake setup wizard

  Extends the interactive Snowflake setup flow with an authentication-method prompt (password vs RSA/JWT key-pair). The RSA branch collects a private-key path (env/file/absolute) and an optional passphrase; the resulting connection config records `authMethod: 'rsa'` with `privateKey` and `passphrase` instead of `password`.

* feat(scan): pool Snowflake sessions

* fix(scan): reuse structural snapshots and cleanup connectors

* feat(scan): parallelize relationship profiling

* feat(scan): batch table description generation

* docs: document Snowflake ingest concurrency knobs

* fix(scan): close Snowflake ingest perf verification gaps

* fix(scan): keep batched description failure bounded

* feat(scan): dispatch query-history probes by connection driver

  Extract historic-sql dialect resolution into a shared helper so the status-project readiness check and the local ingest factory agree on which connections enable query history and which probe to run. The status command now picks the postgres/snowflake/bigquery probe based on the connection's driver instead of always reporting against postgres, which previously caused snowflake connections with queryHistory.enabled to surface a misleading "driver is snowflake" failure.

  Also drops a noisy console.warn from Snowflake primary-key discovery: INFORMATION_SCHEMA.KEY_COLUMN_USAGE is commonly ungranted for read-only roles and the FK + profiling paths handle the empty PK map already.

* fix(llm): allow StructuredOutput tool and raise maxTurns for generateObject

  The Claude Code agent SDK announces an internal pseudo-tool named StructuredOutput in the system/init message whenever outputFormat is set to { type: 'json_schema' }. The runtime's isolation check built its allowedToolIds set only from MCP tool ids and treated StructuredOutput as an unexpected host-injected tool, so every generateObject call threw "Claude Code runtime isolation failed: tools=StructuredOutput ..." and the table-descriptions and relationship-LLM-proposal enrichment stages recorded null output across the board.

  Whitelist StructuredOutput specifically in generateObject's allowedToolIds. The check also enforces missing_tools symmetry, so generateText and runAgentLoop, which do not see StructuredOutput, must not require it.

  generateObject also ran with maxTurns: 1, which the model intermittently breached when it emitted thinking text before the structured response. Raised to 5 to give the schema-bound call enough headroom without allowing unbounded loops. The existing tests now exercise the path with an init message that announces StructuredOutput so the regression cannot slip back in.

* chore(scripts): add ktx-reset.sh project-cleanup helper

  Convenience script for repeatable ingest testing: takes a project directory and prunes everything except ktx.yaml and .ktx/secrets/, so the next ktx setup or ktx ingest run starts from a known-clean state.
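
The `this`-binding failure described for description-generation.ts can be reproduced in isolation. The sketch below is illustrative only: `SketchConnector`, `sampleUnbound`, and `sampleBound` are hypothetical names, not the project's classes, and the real connectors presumably do more than return a string. It assumes only that `sampleTable` is a class method that touches `this`, as the commit message states.

```typescript
// Hypothetical stand-in for a class-based connector (not the project's real class).
class SketchConnector {
  private connected = true;

  private assertConnection(): void {
    if (!this.connected) throw new Error('not connected');
  }

  sampleTable(table: string): string {
    // Depends on `this`, so the method must be called with a receiver.
    this.assertConnection();
    return `sample of ${table}`;
  }
}

// Broken pattern: destructuring copies the function but drops the receiver.
function sampleUnbound(connector: SketchConnector, table: string): string {
  const { sampleTable } = connector;
  // Class bodies run in strict mode, so `this` is undefined inside the call
  // and the access to `this.assertConnection` throws a TypeError.
  return sampleTable(table);
}

// Fixed pattern: call the method through the connector so `this` is bound.
function sampleBound(connector: SketchConnector, table: string): string {
  return connector.sampleTable(table);
}

// Small helper so the failure can be observed without crashing the script.
function throwsTypeError(fn: () => unknown): boolean {
  try {
    fn();
    return false;
  } catch (error) {
    return error instanceof TypeError;
  }
}

const sketchConnector = new SketchConnector();
console.log(sampleBound(sketchConnector, 'ORDERS'));
console.log(throwsTypeError(() => sampleUnbound(sketchConnector, 'ORDERS')));
```

Calling through the connector keeps the fix local to the call site; binding in the constructor or using arrow-function properties would also work, but would have to be repeated in every connector class.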
2026-06-13 08:15:14 +02:00 · 2026-05-23 10:41:30 +02:00 · 2026-05-23 10:41:30 +02:00 · 394a985d2a
commit 394a985d2a
parent b0dd13ce7c
72 changed files with 3508 additions and 655 deletions
--- a/packages/cli/src/setup-databases.ts
+++ b/packages/cli/src/setup-databases.ts
@@ -343,6 +343,13 @@ function historicSqlProbeFailureLines(error: unknown): string[] {
    ];
  }
  if (error instanceof Error && error.name === 'HistoricSqlGrantsMissingError') {
+    const dialect = (error as { dialect?: unknown }).dialect;
+    if (dialect === 'snowflake') {
+      return [
+        '  FAIL Snowflake role cannot read SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY',
+        '  Fix: Run (as ACCOUNTADMIN): GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE <connection role>;',
+      ];
+    }
    return [
      '  FAIL Postgres connection role lacks pg_read_all_stats',
      '  Fix: Run: GRANT pg_read_all_stats TO <connection role>;',
@@ -355,10 +362,18 @@ function historicSqlProbeFailureLines(error: unknown): string[] {
 }

 async function defaultHistoricSqlProbe(input: KtxSetupHistoricSqlProbeInput): Promise<KtxSetupHistoricSqlProbeResult> {
-  if (input.dialect !== 'postgres') {
-    return { ok: true, lines: [] };
+  if (input.dialect === 'postgres') {
+    return probePostgresHistoricSql(input);
  }
+  if (input.dialect === 'snowflake') {
+    return probeSnowflakeHistoricSql(input);
+  }
+  return { ok: true, lines: [] };
+}

+async function probePostgresHistoricSql(
+  input: KtxSetupHistoricSqlProbeInput,
+): Promise<KtxSetupHistoricSqlProbeResult> {
  const project = await loadKtxProject({ projectDir: input.projectDir });
  const connection = project.config.connections[input.connectionId];
  const [{ PostgresPgssReader }, { KtxPostgresHistoricSqlQueryClient }, { isKtxPostgresConnectionConfig }] =
@@ -396,6 +411,46 @@ async function defaultHistoricSqlProbe(input: KtxSetupHistoricSqlProbeInput): Pr
  }
 }

+async function probeSnowflakeHistoricSql(
+  input: KtxSetupHistoricSqlProbeInput,
+): Promise<KtxSetupHistoricSqlProbeResult> {
+  const project = await loadKtxProject({ projectDir: input.projectDir });
+  const connection = project.config.connections[input.connectionId];
+  const [{ SnowflakeHistoricSqlQueryHistoryReader }, { KtxSnowflakeHistoricSqlQueryClient }, { isKtxSnowflakeConnectionConfig }] =
+    await Promise.all([
+      import('./context/ingest/adapters/historic-sql/snowflake-query-history-reader.js'),
+      import('./connectors/snowflake/historic-sql-query-client.js'),
+      import('./connectors/snowflake/connector.js'),
+    ]);
+
+  if (!isKtxSnowflakeConnectionConfig(connection)) {
+    return {
+      ok: false,
+      lines: [`  FAIL Connection ${input.connectionId} is not a native Snowflake connection.`],
+    };
+  }
+
+  const client = new KtxSnowflakeHistoricSqlQueryClient({
+    connectionId: input.connectionId,
+    connection,
+    projectDir: input.projectDir,
+  });
+  try {
+    const result = await new SnowflakeHistoricSqlQueryHistoryReader().probe(client);
+    return {
+      ok: true,
+      lines: [
+        '  OK SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY accessible',
+        ...result.warnings.map((warning: string) => `  ! ${warning}`),
+      ],
+    };
+  } catch (error) {
+    return { ok: false, lines: historicSqlProbeFailureLines(error) };
+  } finally {
+    await client.cleanup();
+  }
+}
+
 async function defaultListSchemas(projectDir: string, connectionId: string): Promise<string[]> {
  const project = await loadKtxProject({ projectDir });
  const connection = project.config.connections[connectionId];
@@ -459,7 +514,7 @@ async function defaultListSchemas(projectDir: string, connectionId: string): Pro
  if (driver === 'snowflake') {
    const { KtxSnowflakeScanConnector, isKtxSnowflakeConnectionConfig } = await import('./connectors/snowflake/connector.js');;
    if (!isKtxSnowflakeConnectionConfig(connection)) return [];
-    const connector = new KtxSnowflakeScanConnector({ connectionId, connection });
+    const connector = new KtxSnowflakeScanConnector({ connectionId, connection, projectDir });
    try {
      return await connector.listSchemas();
    } finally {
@@ -535,7 +590,7 @@ async function defaultListTables(
  if (driver === 'snowflake') {
    const { KtxSnowflakeScanConnector, isKtxSnowflakeConnectionConfig } = await import('./connectors/snowflake/connector.js');;
    if (!isKtxSnowflakeConnectionConfig(connection)) return [];
-    const connector = new KtxSnowflakeScanConnector({ connectionId, connection });
+    const connector = new KtxSnowflakeScanConnector({ connectionId, connection, projectDir });
    try {
      return await connector.listTables(schemas);
    } finally {
@@ -954,43 +1009,86 @@ async function buildConnectionConfig(input: {
      stringConfigField(input.existingConnection, 'database'),
    );
    if (database === undefined) return 'back';
-    const schemaName = await promptText(
-      prompts,
-      'Snowflake schema\nPress Enter for PUBLIC, or enter a schema name.',
-      stringConfigField(input.existingConnection, 'schema_name') ?? 'PUBLIC',
-    );
-    if (schemaName === undefined) return 'back';
    const username = await promptText(
      prompts,
      'Snowflake username',
      stringConfigField(input.existingConnection, 'username'),
    );
    if (username === undefined) return 'back';
-    const passwordRef = await promptCredential({
-      prompts,
-      message: 'Snowflake password',
-      projectDir: args.projectDir,
-      connectionId: input.connectionId,
-      secretName: 'password', // pragma: allowlist secret
+    const authChoice = await prompts.select({
+      message: 'Snowflake authentication method',
+      options: [
+        { value: 'password', label: 'Password' },
+        { value: 'rsa', label: 'Key-pair (RSA / JWT)' },
+        { value: 'back', label: 'Back' },
+      ],
    });
-    if (passwordRef === 'back') return 'back'; // pragma: allowlist secret
+    if (authChoice === 'back') return 'back';
+    const authMethod: 'password' | 'rsa' = authChoice === 'rsa' ? 'rsa' : 'password';
+    let passwordRef: string | null = null;
+    let privateKeyInput: string | undefined;
+    let passphraseRef: string | null = null;
+    if (authMethod === 'password') {
+      const ref = await promptCredential({
+        prompts,
+        message: 'Snowflake password',
+        projectDir: args.projectDir,
+        connectionId: input.connectionId,
+        secretName: 'password', // pragma: allowlist secret
+      });
+      if (ref === 'back') return 'back'; // pragma: allowlist secret
+      passwordRef = ref;
+    } else {
+      privateKeyInput = await promptText(
+        prompts,
+        'Path to Snowflake private key (PEM)\nFor example ~/.ssh/snowflake_rsa_key.p8, or $ENV_VAR / env:NAME / file:/abs/path.',
+        displayFileReference(stringConfigField(input.existingConnection, 'privateKey')),
+      );
+      if (privateKeyInput === undefined) return 'back';
+      const phr = await promptCredential({
+        prompts,
+        message: 'Private key passphrase (optional)\nPress Enter to skip.',
+        projectDir: args.projectDir,
+        connectionId: input.connectionId,
+        secretName: 'snowflake-passphrase', // pragma: allowlist secret
+      });
+      if (phr === 'back') return 'back';
+      passphraseRef = phr;
+    }
    const role = await promptText(
      prompts,
      'Snowflake role (optional)\nPress Enter to skip.',
      stringConfigField(input.existingConnection, 'role'),
    );
    if (role === undefined) return 'back';
-    const resolvedPasswordRef = passwordRef ?? stringConfigField(input.existingConnection, 'password');
-    if (!account || !warehouse || !database || !schemaName || !username || !resolvedPasswordRef) return null;
+    if (authMethod === 'password') {
+      const resolvedPasswordRef = passwordRef ?? stringConfigField(input.existingConnection, 'password');
+      if (!account || !warehouse || !database || !username || !resolvedPasswordRef) return null;
+      return {
+        driver: 'snowflake',
+        authMethod: 'password',
+        account,
+        warehouse,
+        database,
+        username,
+        password: resolvedPasswordRef,
+        ...(role ? { role } : {}),
+      };
+    }
+    const resolvedPrivateKey = privateKeyInput
+      ? normalizeFileReference(privateKeyInput)
+      : stringConfigField(input.existingConnection, 'privateKey');
+    if (!account || !warehouse || !database || !username || !resolvedPrivateKey) return null;
+    const resolvedPassphrase = passphraseRef ?? stringConfigField(input.existingConnection, 'passphrase');
    return {
      driver: 'snowflake',
-      authMethod: 'password',
+      authMethod: 'rsa',
      account,
      warehouse,
      database,
-      schema_name: schemaName,
      username,
-      password: resolvedPasswordRef,
+      privateKey: resolvedPrivateKey,
+      ...(resolvedPassphrase ? { passphrase: resolvedPassphrase } : {}),
      ...(role ? { role } : {}),
    };
  }
@@ -1425,6 +1523,21 @@ async function writeScopeConfig(input: {
  });
 }

+async function promptCommaSeparatedScope(input: {
+  prompts: KtxSetupDatabasesPromptAdapter;
+  connectionId: string;
+  spec: ScopeDiscoverySpec;
+}): Promise<string[] | undefined> {
+  const example =
+    input.spec.nounPlural === 'datasets' ? 'sales, marketing' : 'SALES, MARKETING';
+  const value = await promptText(
+    input.prompts,
+    `Enter ${input.spec.nounPlural} for ${input.connectionId} as a comma-separated list (e.g. ${example}).`,
+  );
+  if (value === undefined) return undefined;
+  return unique(value.split(',').map((part) => part.trim()));
+}
+
 async function maybeConfigureDatabaseScope(input: {
  projectDir: string;
  connectionId: string;
@@ -1494,28 +1607,48 @@ async function maybeConfigureDatabaseScope(input: {

  writeSetupSection(input.io, 'Discovering tables', [`Connecting to ${input.connectionId}…`]);

-  const schemas = unique(
-    cliSchemas.length > 0
-      ? cliSchemas
-      : await (async (): Promise<string[]> => {
-          if (!spec) return [];
-          try {
-            return await (input.deps.listSchemas ?? defaultListSchemas)(input.projectDir, input.connectionId);
-          } catch (error) {
-            const detail = error instanceof Error ? error.message : String(error);
-            input.io.stderr.write(
-              `Could not discover ${spec.promptLabel.toLowerCase()} for ${input.connectionId}; ${detail}\n`,
-            );
-            return [];
-          }
-        })(),
-  );
+  let effectiveCliSchemas = cliSchemas;
+  let listedSchemas: string[];
+  if (cliSchemas.length > 0) {
+    listedSchemas = cliSchemas;
+  } else if (!spec) {
+    listedSchemas = [];
+  } else {
+    try {
+      listedSchemas = await (input.deps.listSchemas ?? defaultListSchemas)(
+        input.projectDir,
+        input.connectionId,
+      );
+    } catch (error) {
+      const detail = error instanceof Error ? error.message : String(error);
+      input.io.stderr.write(
+        `Could not discover ${spec.promptLabel.toLowerCase()} for ${input.connectionId}; ${detail}\n`,
+      );
+      const typed = await promptCommaSeparatedScope({
+        prompts: input.prompts,
+        connectionId: input.connectionId,
+        spec,
+      });
+      if (typed === undefined) return 'back';
+      effectiveCliSchemas = typed;
+      listedSchemas = typed;
+      if (typed.length > 0) {
+        await writeScopeConfig({
+          projectDir: input.projectDir,
+          connectionId: input.connectionId,
+          values: typed,
+          spec,
+        });
+      }
+    }
+  }
+  const schemas = unique(listedSchemas);
  if (spec && schemas.length === 0) {
    return 'ready';
  }
  const schemaSuggestion =
-    cliSchemas.length > 0
-      ? { excluded: new Set<string>(), suggested: new Set(cliSchemas) }
+    effectiveCliSchemas.length > 0
+      ? { excluded: new Set<string>(), suggested: new Set(effectiveCliSchemas) }
      : spec?.suggest(schemas) ?? { excluded: new Set<string>(), suggested: new Set<string>() };
  const existingEnabled =
    hasExistingTables && input.forcePrompt === true
@@ -1533,7 +1666,7 @@ async function maybeConfigureDatabaseScope(input: {
        schemaSuggestion,
        existing: { enabledTables: existingEnabled },
        supportsSchemaScope: spec !== undefined,
-        initialSchemas: cliSchemas.length > 0 ? cliSchemas : undefined,
+        initialSchemas: effectiveCliSchemas.length > 0 ? effectiveCliSchemas : undefined,
        prompts: input.prompts,
        listTablesForSchemas: (selectedSchemas) =>
          (input.deps.listTables ?? defaultListTables)(input.projectDir, input.connectionId, selectedSchemas),
@@ -1638,7 +1771,12 @@ async function maybeRunHistoricSqlSetupProbe(input: {
  const connection = project.config.connections[input.connectionId];
  const queryHistory = queryHistoryConfigRecord(connection) ?? historicSqlConfigRecord(connection);
  const driver = normalizeDriver(connection?.driver);
-  if (queryHistory?.enabled !== true || driver !== 'postgres') {
+  if (queryHistory?.enabled !== true) {
+    return;
+  }
+  const dialect: 'postgres' | 'snowflake' | null =
+    driver === 'postgres' ? 'postgres' : driver === 'snowflake' ? 'snowflake' : null;
+  if (!dialect) {
    return;
  }

@@ -1647,13 +1785,13 @@ async function maybeRunHistoricSqlSetupProbe(input: {
  const result = await probe({
    projectDir: input.projectDir,
    connectionId: input.connectionId,
-    dialect: 'postgres',
+    dialect,
  });
  for (const line of result.lines) {
    input.io.stdout.write(`│${line}\n`);
  }
  if (!result.ok) {
-    input.io.stdout.write('│  Setup written; first ingest run will fail until fixed.\n');
+    input.io.stdout.write('│  Setup written; query history will be skipped until fixed.\n');
  }
 }
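
The comma-separated fallback added in `promptCommaSeparatedScope` above splits on commas, trims each part, and de-duplicates through the project's `unique` helper, whose behaviour is not visible in this diff. As a rough sketch of that parsing step, the hypothetical `parseCommaSeparatedScope` below does the same with a `Set`, and additionally drops empty entries. That last step is an assumption on my part rather than something the diff shows; without it, an empty answer would parse to a single empty name.

```typescript
// Hypothetical helper illustrating the split/trim/de-duplicate step.
// Dropping empty entries is an added assumption, not shown in the diff.
function parseCommaSeparatedScope(raw: string): string[] {
  const seen = new Set<string>();
  for (const part of raw.split(',')) {
    const name = part.trim();
    if (name.length > 0) {
      seen.add(name); // a Set keeps first-seen order and ignores repeats
    }
  }
  return Array.from(seen);
}

console.log(parseCommaSeparatedScope('SALES, MARKETING ,SALES'));
console.log(parseCommaSeparatedScope(' , ,'));
```

With empties filtered, a blank answer yields an empty list, which matches the `typed.length > 0` guard that decides whether the scope is persisted.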