chore(workspace): gate dead-code with knip production mode (#196)
* refactor(workspace): relocate @ktx/llm source into packages/cli/src/llm
* refactor(workspace): rewrite @ktx/llm imports to relative paths
* refactor(workspace): fold internal packages into cli
* chore(workspace): gate dead-code with knip production mode
Turn on production-mode knip plus an autofix run in pre-commit and the
`pnpm dead-code` script, document the `/** @internal */` convention for
test-only exports in AGENTS.md, annotate test-only exports across the
CLI with that JSDoc, and drop dead exports/wrappers the new gate
surfaced (e.g. `cli-project.ts`, `lookerRuntimeSourceToFileAdapterSource`,
`createLocalScanEnrichmentProvidersFromConfig`,
`PGLITE_OWNER_PROCESS_BACKEND_CAPABILITIES`, stale type re-exports).
Replace the loose `ignoreIssues` allowlist in `knip.json` with explicit
production entries so cross-package barrel leaks are caught.
* refactor(cli): delete internal barrel index.ts files
The 34 `index.ts` re-export barrels inside `packages/cli/src/` were
holdovers from the pre-fold multi-workspace structure. Post-fold-in they
served no production purpose: external consumers go through the single
package main entry, and in-repo callers mostly imported through them
only because the path was short. Internally, knip flagged most barrel
re-exports as production-dead (only reached via tests).
This change:
- Deletes every internal barrel except `packages/cli/src/index.ts`
(the published package entry).
- Rewrites ~270 source/test files to import each name directly from
the file that defines it.
- Moves `tools/warehouse-verification/index.ts` to
`create-warehouse-verification-tools.ts` (the function it defined
locally) and updates its single consumer.
- Renames `search/backend-conformance.ts` → `.test-utils.ts` to match
the existing test-helper file convention.
- Deletes 13 dead test-only chains (dbt-descriptions/*,
live-database/extracted-schema, live-database/structural-sync,
relationship-* feedback/review chain) plus their tests and a
cascading orphan integration test.
- Updates test mocks that pointed at deleted barrel paths
(notion-client, connector barrels in scan/local-scan-connectors
tests) to mock the source files instead.
- Points the maintainer benchmark script
(`scripts/relationship-benchmark-report.mjs`) at source files
instead of `dist/context/scan/index.js`.
- Drops the barrel `!` entries from `knip.json`; adds explicit
production entries only for the benchmark code reached via dist by
the maintainer script.
Net: 413 files changed, ~1.2k insertions, ~9.4k deletions.
`pnpm run dead-code` (Biome + knip default + knip production) and
`pnpm run type-check` are clean; 2277 tests pass.
* refactor(workspace): rename @ktx/cli to @kaelio/ktx and pack it directly
Promote the CLI workspace package to the public name `@kaelio/ktx` and
drop the separate `scripts/build-public-npm-package.mjs` wrapper. The
CLI package is now publishable in place (`publishConfig.access: public`,
`provenance: true`), so artifact packing uses `pnpm pack` against
`packages/cli/` instead of assembling a parallel package tree.
Updates all workspace filter invocations, docs, tests, and release
readiness checks to reference the new package name, and folds the
tarball-name helper into `scripts/public-npm-release-metadata.mjs`.
* docs: align "agent clients" and "data agents" terminology
Replace "client agents" with "agent clients" and "database agents" with
"data agents" across AGENTS.md, README.md, the docs-site copy, and the
matching setup-agents test description, matching the canonical
vocabulary in docs/terminology.md.
Also moves packages/cli/tsconfig.json's tsBuildInfoFile from
node_modules/.cache/ to dist/.tsbuildinfo so incremental builds survive
node_modules reinstalls.
* refactor(release): single source of truth for package version
Make packages/cli/package.json the single source of truth for the
@kaelio/ktx version. publicNpmPackageVersion() now reads it directly,
so artifact filenames, release-readiness checks, and the Python wheel
version all derive from one field. The duplicate
release-policy.json.publicNpmPackageVersion is removed.
Previously the two fields could drift: tarballs were named
kaelio-ktx-0.4.1.tgz while internally containing
@kaelio/ktx@0.0.0-private.
- update-public-release-version.mjs rewrites both Python pyproject.toml
files (ktx-daemon, ktx-sl) alongside the npm package.jsons,
normalizing the version for PEP 440 (e.g. 0.1.0-rc.2 -> 0.1.0rc2).
- semantic-release-config.cjs adds the two pyproject.toml files to
@semantic-release/git assets so the release commit back to main
carries every version source in lockstep.
- The six "?? '0.0.0-private'" fallback literals across the CLI are
replaced with "?? getKtxCliPackageInfo().version", and
createDefaultKtxMcpServer makes its version arg required.
- docs/release.md describes the actual commit-back model: the dev tree
always reflects the most recent release; no sentinel pin to
maintain.
Verified: pnpm run artifacts:build now produces
kaelio-ktx-0.4.1.tgz and kaelio_ktx-0.4.1-py3-none-any.whl with
@kaelio/ktx@0.4.1 inside. Full type-check, dead-code, and
2287 vitests + 173 script tests pass.
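
The PEP 440 normalization mentioned above (0.1.0-rc.2 -> 0.1.0rc2) could
look roughly like the sketch below. `toPep440Version` is a hypothetical
name and the accepted pre-release labels are an assumption; the real
update-public-release-version.mjs may handle more cases.

```typescript
// Hypothetical helper: maps a semver-style version to its PEP 440 spelling.
// Assumes only alpha/beta/rc pre-release labels with a numeric serial.
function toPep440Version(semver: string): string {
  const match = /^(\d+\.\d+\.\d+)(?:-(alpha|beta|rc)\.(\d+))?$/.exec(semver);
  if (!match) {
    throw new Error(`Unsupported version: ${semver}`);
  }
  const release = match[1];
  const label = match[2] as 'alpha' | 'beta' | 'rc' | undefined;
  const serial = match[3];
  if (!label) {
    // Plain releases are spelled the same in both schemes.
    return release;
  }
  // PEP 440 spells pre-release segments as a, b, or rc with no separators.
  const pep440Label = { alpha: 'a', beta: 'b', rc: 'rc' }[label];
  return `${release}${pep440Label}${serial}`;
}
```

Throwing on unrecognized input, instead of passing it through, keeps a
malformed version from silently reaching pyproject.toml.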
* refactor(cli): inject embedding provider resolution and detect sentence-transformers runtime
Make resolveProjectEmbeddingProvider and runtimeIo injectable in ingest and
scan command entrypoints so tests can stub them, and teach
resolvePublicIngestRuntimeRequirements to flag the local-embeddings runtime
feature when ktx.yaml selects sentence-transformers.
* chore(cli): mark buildLocalStatsStatus and LocalStatsStatus as @internal
Both symbols are consumed only by status-project.test.ts. Annotating with
/** @internal */ keeps knip's production-mode check clean without changing
runtime behavior.
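
A minimal sketch of the `/** @internal */` convention, assuming a module
where one helper is exported only so a test can reach it. Both function
names are illustrative and do not come from the CLI source.

```typescript
// Hypothetical module: the names below are illustrative only.

/** Imported by production code, so the export has a real consumer. */
export function formatPageCount(count: number): string {
  return `${count} ${pluralize('page', count)}`;
}

/** @internal */
export function pluralize(noun: string, count: number): string {
  // Exported only so a unit test can call it directly; the JSDoc tag marks
  // the export as test-only for the production-mode dead-code check.
  return count === 1 ? noun : `${noun}s`;
}
```

The tag changes nothing at runtime; it only documents why the export
exists.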
* fix(cli): use real package metadata in print-command-tree
The stubbed package name embedded a forbidden product identifier that
tripped the boundary check in CI. Read the metadata from package.json
instead — keeps the rendered tree unchanged and removes a duplicate
source of truth.
* feat(cli): show embedding coverage in `ktx status`, drop duplicate disk counts
Inline `(N embedded)` next to the Wiki scope counts and Semantic-layer
source counts, computed with `SUM(embedding_json IS NOT NULL)` over
`knowledge_pages` and `local_sl_sources`. Rename the "Knowledge" label to
"Wiki" (canonical per `docs/terminology.md`) and rename the matching
`localStats.knowledgePages` field to `localStats.wikiPages`.
Drop `wiki=N md` and `semantic-layer=N yaml` from the Disk row — those
duplicated the per-surface rows above. Disk now reports only actual byte
usage (db, cache, raw-sources). The unused `wikiGlobalMarkdownCount` /
`semanticLayerYamlCount` fields, the `isMarkdownEntry` / `isYamlEntry`
helpers, and the `filter` arg on `summarizeDir` are removed.
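
An in-memory equivalent of the `SUM(embedding_json IS NOT NULL)` count,
as a sketch only. The `EmbeddableRow` shape, the `formatCoverage` name,
and the exact output string are assumptions; the real status renderer
may format the row differently.

```typescript
// Hypothetical row shape: only the column used for coverage is modeled.
interface EmbeddableRow {
  embedding_json: string | null;
}

// Counts rows with a non-null embedding, mirroring SUM(embedding_json IS NOT NULL),
// and renders the count inline next to the total.
function formatCoverage(label: string, rows: EmbeddableRow[]): string {
  const embedded = rows.filter((row) => row.embedding_json !== null).length;
  return `${label}: ${rows.length} (${embedded} embedded)`;
}
```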
2026-05-21 15:28:58 +02:00
import { getKtxCliPackageInfo } from './cli-runtime.js' ;
import { loadKtxProject , type KtxLocalProject } from './context/project/project.js' ;
2026-05-29 17:41:04 +02:00
import type { KtxProjectConfig , KtxProjectConnectionConfig } from './context/project/config.js' ;
import type { KtxProgressPort } from './context/scan/types.js' ;
2026-05-10 23:51:24 +02:00
import type { KtxCliIo } from './index.js' ;
2026-05-13 17:01:48 +02:00
import type { KtxIngestArgs , KtxIngestDeps , KtxIngestProgressUpdate } from './ingest.js' ;
2026-05-29 17:41:04 +02:00
import { isDatabaseDriver , normalizeConnectionDriver } from './connection-drivers.js' ;
2026-06-03 17:19:42 +02:00
import { resolveQueryHistoryScopeFloor } from './context/ingest/adapters/historic-sql/scope-floor.js' ;
2026-05-17 10:27:29 +02:00
import {
ensureManagedPythonCommandRuntime ,
type KtxManagedPythonInstallPolicy ,
type ManagedPythonCommandRuntime ,
} from './managed-python-command.js' ;
import type { KtxRuntimeFeature } from './managed-python-runtime.js' ;
2026-06-01 23:31:31 +02:00
import {
publicDatabaseIngestMessage ,
publicIngestOutputLine ,
publicQueryHistoryMessage ,
} from './public-ingest-copy.js' ;
import { createAggregateProgressPort } from './progress-port-adapter.js' ;
2026-05-17 10:27:29 +02:00
import { resolvePublicIngestRuntimeRequirements } from './runtime-requirements.js' ;
2026-05-13 17:01:48 +02:00
import type { KtxScanArgs , KtxScanDeps } from './scan.js' ;
2026-06-03 17:19:42 +02:00
import type { KtxTableRef } from './context/scan/types.js' ;
2026-05-10 23:12:26 +02:00
import { profileMark } from './startup-profile.js' ;
2026-05-22 18:18:47 +02:00
import { isDemoConnection } from './telemetry/demo-detect.js' ;
2026-06-05 19:36:21 +02:00
import { emitProjectStackSnapshot , emitTelemetryEvent , reportException } from './telemetry/index.js' ;
import { collectTelemetryRedactionSecrets } from './telemetry/redaction-secrets.js' ;
2026-06-02 17:23:51 +02:00
import { formatErrorDetail } from './telemetry/scrubber.js' ;
2026-05-10 23:12:26 +02:00
profileMark ( 'module:public-ingest' ) ;
2026-05-14 01:43:06 +02:00
type KtxPublicIngestStepName = 'database-schema' | 'query-history' | 'source-ingest' | 'memory-update' ;
2026-05-13 13:33:28 +02:00
type KtxPublicIngestStepStatus = 'done' | 'skipped' | 'failed' | 'not-run' ;
type KtxPublicIngestInputMode = 'auto' | 'disabled' ;
2026-05-14 01:43:06 +02:00
type KtxPublicIngestQueryHistoryFlag = 'default' | 'enabled' | 'disabled' ;
type HistoricSqlDialect = 'postgres' | 'bigquery' | 'snowflake' ;
2026-05-10 23:12:26 +02:00
2026-05-10 23:51:24 +02:00
export type KtxPublicIngestArgs =
2026-05-14 01:43:06 +02:00
{
command : 'run' ;
projectDir : string ;
targetConnectionId? : string ;
all : boolean ;
json : boolean ;
inputMode : KtxPublicIngestInputMode ;
queryHistory? : KtxPublicIngestQueryHistoryFlag ;
queryHistoryWindowDays? : number ;
scanMode? : Extract < KtxScanArgs , { command : 'run' } > [ 'mode' ] ;
detectRelationships? : boolean ;
cliVersion? : string ;
runtimeInstallPolicy? : KtxManagedPythonInstallPolicy ;
} ;
2026-05-10 23:12:26 +02:00
2026-05-10 23:51:24 +02:00
export interface KtxPublicIngestPlanTarget {
2026-05-10 23:12:26 +02:00
connectionId : string ;
driver : string ;
2026-05-14 01:43:06 +02:00
operation : 'database-ingest' | 'source-ingest' ;
2026-05-10 23:12:26 +02:00
adapter? : string ;
sourceDir? : string ;
debugCommand : string ;
2026-05-10 23:51:24 +02:00
steps : KtxPublicIngestStepName [ ] ;
2026-05-14 01:43:06 +02:00
detectRelationships? : boolean ;
preflightFailure? : string ;
queryHistory ? : {
enabled : boolean ;
dialect? : HistoricSqlDialect ;
windowDays? : number ;
pullConfig? : Record < string , unknown > ;
unsupported? : boolean ;
} ;
2026-05-10 23:12:26 +02:00
}
2026-05-10 23:51:24 +02:00
export interface KtxPublicIngestPlan {
2026-05-10 23:12:26 +02:00
projectDir : string ;
2026-05-10 23:51:24 +02:00
targets : KtxPublicIngestPlanTarget [ ] ;
2026-05-14 01:43:06 +02:00
warnings : string [ ] ;
notices? : string [ ] ;
2026-05-10 23:12:26 +02:00
}
2026-05-10 23:51:24 +02:00
export interface KtxPublicIngestTargetResult {
2026-05-10 23:12:26 +02:00
connectionId : string ;
driver : string ;
steps : Array < {
2026-05-10 23:51:24 +02:00
operation : KtxPublicIngestStepName ;
status : KtxPublicIngestStepStatus ;
2026-05-10 23:12:26 +02:00
detail? : string ;
debugCommand? : string ;
} > ;
}
2026-05-10 23:51:24 +02:00
export type KtxPublicIngestProject = Pick < KtxLocalProject , 'projectDir' | 'config' > ;
2026-05-10 23:12:26 +02:00
2026-05-14 01:43:06 +02:00
type KtxPublicIngestPhaseKey = 'database-schema' | 'query-history' | 'source-ingest' ;
2026-05-10 23:51:24 +02:00
export interface KtxPublicIngestDeps {
fix(cli): resolve managed-embeddings daemon URL at project boundary (#184)
A clean `ktx setup` was failing verification because the managed
local-embeddings daemon URL was passed library-side through
`process.env[KTX_MANAGED_SENTENCE_TRANSFORMERS_BASE_URL]`, and the setup
flow never wrote that variable. With no resolved URL the embedding
provider was null, the deep scan emitted
`scan_enrichment_backend_not_configured`, descriptions + embeddings
stayed `skipped`, and the agent-readiness check exited 1.
Replace the env-var indirection with CLI-side substitution at the
project-load boundary. New `loadKtxCliProject` wraps `loadKtxProject`,
ensures the managed daemon when `managed:local-embeddings` is present in
`config.ingest.embeddings` or `config.scan.enrichment.embeddings`, and
substitutes the resolved baseUrl into the in-memory config. Runtime
entry points (scan, ingest, public-ingest, admin-reindex) use the new
loader; setup-time persistence paths keep raw `loadKtxProject` so the
on-disk `ktx.yaml` keeps the portable sentinel.
Cleanup follows from the new design: drop
`MANAGED_SENTENCE_TRANSFORMERS_BASE_URL_ENV`, remove the env-var lookup
branch in `resolveSentenceTransformersBaseUrl`, drop the `env` field
from `ManagedLocalEmbeddingsDaemon`, and collapse the manual
daemon-ensure dance in `admin-reindex.ts`.
2026-05-20 14:43:02 +02:00
  loadProject?: (options: { projectDir: string }) => Promise<KtxPublicIngestProject>;
  runScan?: (args: KtxScanArgs, io: KtxCliIo, deps?: KtxScanDeps) => Promise<number>;
  runIngest?: (args: KtxIngestArgs, io: KtxCliIo, deps?: KtxIngestDeps) => Promise<number>;
  runContextBuild?: (
    project: KtxPublicIngestProject,
    args: KtxPublicContextBuildArgs,
    io: KtxCliIo,
  ) => Promise<{ exitCode: number }>;
  scanProgress?: KtxProgressPort;
  ingestProgress?: (update: KtxIngestProgressUpdate) => void;
  ensureRuntime?: (options: {
    cliVersion: string;
    installPolicy: KtxManagedPythonInstallPolicy;
    io: KtxCliIo;
    feature: KtxRuntimeFeature;
  }) => Promise<ManagedPythonCommandRuntime>;
  env?: NodeJS.ProcessEnv;
  runtimeIo?: KtxCliIo;
  onPhaseStart?: (phaseKey: KtxPublicIngestPhaseKey) => void;
  onPhaseEnd?: (phaseKey: KtxPublicIngestPhaseKey, status: 'done' | 'failed' | 'skipped', summary?: string) => void;
}
interface KtxPublicContextBuildArgs {
projectDir : string ;
inputMode : 'auto' | 'disabled' ;
targetConnectionId? : string ;
all? : boolean ;
queryHistory? : KtxPublicIngestQueryHistoryFlag ;
queryHistoryWindowDays? : number ;
scanMode? : Extract < KtxScanArgs , { command : 'run' } > [ 'mode' ] ;
detectRelationships? : boolean ;
cliVersion? : string ;
runtimeInstallPolicy? : KtxManagedPythonInstallPolicy ;
2026-05-10 23:12:26 +02:00
}
const sourceAdapterByDriver = new Map < string , string > ( [
[ 'metabase' , 'metabase' ] ,
[ 'local_metabase' , 'metabase' ] ,
[ 'looker' , 'looker' ] ,
[ 'notion' , 'notion' ] ,
2026-06-27 14:41:32 -07:00
[ 'gdrive' , 'gdrive' ] ,
2026-05-10 23:12:26 +02:00
[ 'metricflow' , 'metricflow' ] ,
[ 'dbt' , 'dbt' ] ,
[ 'lookml' , 'lookml' ] ,
] ) ;
2026-06-01 23:31:31 +02:00
export function publicProgressMessage ( message : string , target : KtxPublicIngestPlanTarget ) : string {
let current = message ;
if ( target . operation === 'database-ingest' ) {
current = publicDatabaseIngestMessage ( current ) ;
}
if ( target . steps . includes ( 'query-history' ) ) {
current = publicQueryHistoryMessage ( current , target . connectionId ) ;
}
return current ;
}
2026-05-14 01:43:06 +02:00
const queryHistoryDialectByDriver = new Map < string , HistoricSqlDialect > ( [
[ 'postgres' , 'postgres' ] ,
[ 'bigquery' , 'bigquery' ] ,
[ 'snowflake' , 'snowflake' ] ,
2026-05-10 23:12:26 +02:00
] ) ;
2026-05-14 01:43:06 +02:00
interface KtxUnsupportedQueryHistoryWarning {
connectionId : string ;
driver : string ;
reason : 'explicit' | 'stored' ;
}
interface KtxPublicIngestWarningAccumulator {
warnings : string [ ] ;
ignoredQueryHistoryForSources : string [ ] ;
unsupportedQueryHistoryForDatabases : KtxUnsupportedQueryHistoryWarning [ ] ;
}
function createWarningAccumulator ( ) : KtxPublicIngestWarningAccumulator {
return {
warnings : [ ] ,
ignoredQueryHistoryForSources : [ ] ,
unsupportedQueryHistoryForDatabases : [ ] ,
} ;
}
function sourceIgnoredWarning(option: string, connectionIds: string[], all: boolean): string | null {
  if (connectionIds.length === 0) {
    return null;
  }
  if (all) {
    const sourceLabel =
      connectionIds.length === 1 ? '1 non-database source' : `${connectionIds.length} non-database sources`;
    return `${option} ignored for ${sourceLabel}.`;
  }
  return `${option} affects database ingest only; ignoring it for ${connectionIds[0]}.`;
}
function unsupportedDriverList(entries: KtxUnsupportedQueryHistoryWarning[]): string {
  return [...new Set(entries.map((entry) => entry.driver))]
    .sort((left, right) => left.localeCompare(right))
    .join(', ');
}
function unsupportedQueryHistoryWarnings(
  entries: KtxUnsupportedQueryHistoryWarning[],
  all: boolean,
): string[] {
  if (entries.length === 0) {
    return [];
  }
  const warnings: string[] = [];
  const explicitEntries = entries.filter((entry) => entry.reason === 'explicit');
  const storedEntries = entries.filter((entry) => entry.reason === 'stored');
  if (explicitEntries.length === 1 || (!all && explicitEntries.length > 0)) {
    warnings.push(
      ...explicitEntries.map(
        (entry) =>
          `--query-history is not supported for ${entry.driver}; running schema ingest for ${entry.connectionId}.`,
      ),
    );
  } else if (explicitEntries.length > 1) {
    warnings.push(
      `--query-history is not supported for ${explicitEntries.length} database connections (${unsupportedDriverList(
        explicitEntries,
      )}); running schema ingest for those connections.`,
    );
  }
  if (storedEntries.length === 1 || (!all && storedEntries.length > 0)) {
    warnings.push(
      ...storedEntries.map(
        (entry) =>
          `${entry.connectionId} has query history enabled in ktx.yaml, but ${entry.driver} does not support it; running schema ingest.`,
      ),
    );
  } else if (storedEntries.length > 1) {
    warnings.push(
      `${storedEntries.length} database connections have query history enabled in ktx.yaml, but their drivers do not support it; running schema ingest for those connections.`,
    );
  }
  return warnings;
}
function finalizeWarnings(
  accumulator: KtxPublicIngestWarningAccumulator,
  args: {
    all: boolean;
    queryHistory?: KtxPublicIngestQueryHistoryFlag;
    queryHistoryWindowDays?: number;
  },
): string[] {
  const warnings = [
    ...accumulator.warnings,
    ...unsupportedQueryHistoryWarnings(accumulator.unsupportedQueryHistoryForDatabases, args.all),
  ];
  if (args.queryHistory === 'enabled' || args.queryHistoryWindowDays !== undefined) {
    const warning = sourceIgnoredWarning('--query-history', accumulator.ignoredQueryHistoryForSources, args.all);
    if (warning) warnings.push(warning);
  }
  return warnings;
}
function schemaFirstQueryHistoryNotice(
  targets: KtxPublicIngestPlanTarget[],
  args: { queryHistory?: KtxPublicIngestQueryHistoryFlag },
): string | null {
  if (args.queryHistory !== 'enabled') {
    return null;
  }
  const queryHistoryTargets = targets.filter((target) => target.queryHistory?.enabled === true);
  if (queryHistoryTargets.length === 0) {
    return null;
  }
  if (queryHistoryTargets.length === 1) {
    return `Schema ingest runs before query history for ${queryHistoryTargets[0].connectionId}.`;
  }
  return `Schema ingest runs before query history for ${queryHistoryTargets.length} database connections.`;
}
function storedQueryHistory ( connection : KtxProjectConnectionConfig ) : Record < string , unknown > {
const context = connection . context ;
const contextRecord =
context && typeof context === 'object' && ! Array . isArray ( context ) ? ( context as Record < string , unknown > ) : { } ;
const value = contextRecord . queryHistory ;
return typeof value === 'object' && value !== null && ! Array . isArray ( value ) ? ( value as Record < string , unknown > ) : { } ;
}
function positiveInteger ( value : unknown ) : number | undefined {
return typeof value === 'number' && Number . isInteger ( value ) && value > 0 ? value : undefined ;
}
2026-06-03 17:19:42 +02:00
/** @internal */
export function queryHistoryPullConfig(input: {
  stored: Record<string, unknown>;
  dialect: HistoricSqlDialect;
  windowDays?: number;
  enabledTables?: KtxTableRef[];
  enabledSchemas?: string[];
  modeledTableCatalog?: KtxTableRef[];
  scopeFloorWarnings?: string[];
}): Record<string, unknown> {
  const {
    enabled: _enabled,
    dialect: _dialect,
    enabledTables: _enabledTables,
    enabledSchemas: _enabledSchemas,
    scopeFloorWarnings: _scopeFloorWarnings,
    ...storedConfig
  } = input.stored;
  return {
    ...storedConfig,
    dialect: input.dialect,
    ...(input.enabledTables && input.enabledTables.length > 0 ? { enabledTables: input.enabledTables } : {}),
    ...(input.enabledSchemas && input.enabledSchemas.length > 0 ? { enabledSchemas: input.enabledSchemas } : {}),
    ...(input.modeledTableCatalog && input.modeledTableCatalog.length > 0
      ? { modeledTableCatalog: input.modeledTableCatalog }
      : {}),
    ...(input.scopeFloorWarnings && input.scopeFloorWarnings.length > 0
      ? { scopeFloorWarnings: input.scopeFloorWarnings }
      : {}),
    ...(input.windowDays !== undefined ? { windowDays: input.windowDays } : {}),
  };
}
2026-05-10 23:51:24 +02:00
function sourceDirForConnection ( connection : KtxProjectConnectionConfig ) : string | undefined {
2026-05-13 15:55:00 +02:00
const value = connection . source_dir ;
2026-05-10 23:12:26 +02:00
return typeof value === 'string' && value . trim ( ) . length > 0 ? value . trim ( ) : undefined ;
}
2026-05-14 01:43:06 +02:00
function resolveDatabaseTargetOptions(input: {
  connectionId: string;
  driver: string;
  connection: KtxProjectConnectionConfig;
  args: {
    queryHistory?: KtxPublicIngestQueryHistoryFlag;
    queryHistoryWindowDays?: number;
    scanMode?: Extract<KtxScanArgs, { command: 'run' }>['mode'];
  };
  warnings: KtxPublicIngestWarningAccumulator;
}): Pick<KtxPublicIngestPlanTarget, 'queryHistory' | 'steps'> {
  const storedQh = storedQueryHistory(input.connection);
  const dialect = queryHistoryDialectByDriver.get(input.driver);
  const explicitQueryHistory = input.args.queryHistory ?? 'default';
  const storedEnabled = storedQh.enabled === true;
  const windowOverrideRequested = input.args.queryHistoryWindowDays !== undefined;
  const requestedQh =
    explicitQueryHistory === 'enabled' ||
    (explicitQueryHistory !== 'disabled' && (windowOverrideRequested || storedEnabled));
  const queryHistory = {
    enabled: false,
    ...(input.args.queryHistoryWindowDays !== undefined
      ? { windowDays: input.args.queryHistoryWindowDays }
      : positiveInteger(storedQh.windowDays) !== undefined
        ? { windowDays: positiveInteger(storedQh.windowDays) }
        : {}),
  };
  if (requestedQh && !dialect) {
    input.warnings.unsupportedQueryHistoryForDatabases.push({
      connectionId: input.connectionId,
      driver: input.driver,
      reason:
        explicitQueryHistory === 'enabled' || input.args.queryHistoryWindowDays !== undefined ? 'explicit' : 'stored',
    });
    return {
      queryHistory: { ...queryHistory, unsupported: true },
      steps: ['database-schema'],
    };
  }
  if (requestedQh && dialect) {
    return {
      queryHistory: {
        ...queryHistory,
        enabled: true,
        dialect,
        pullConfig: queryHistoryPullConfig({
          stored: storedQh,
          dialect,
          windowDays: queryHistory.windowDays,
        }),
      },
      steps: ['database-schema', 'query-history'],
    };
  }
  return {
    queryHistory,
    steps: ['database-schema'],
  };
}
2026-06-03 17:19:42 +02:00
async function resolvedQueryHistoryPullConfigForTarget(
  target: KtxPublicIngestPlanTarget,
  project: KtxPublicIngestProject,
): Promise<Record<string, unknown> | null> {
  if (target.operation !== 'database-ingest' || target.queryHistory?.enabled !== true || !target.queryHistory.dialect) {
    return null;
  }
  const connection = project.config.connections[target.connectionId];
  if (!connection) {
    return (
      target.queryHistory.pullConfig ??
      queryHistoryPullConfig({
        stored: {},
        dialect: target.queryHistory.dialect,
        windowDays: target.queryHistory.windowDays,
      })
    );
  }
  const stored = storedQueryHistory(connection);
  const scopeFloor = await resolveQueryHistoryScopeFloor({
    projectDir: project.projectDir,
    connectionId: target.connectionId,
    driver: target.driver,
    connection: connection as Record<string, unknown>,
    storedQueryHistory: stored,
  });
  return queryHistoryPullConfig({
    stored,
    dialect: target.queryHistory.dialect,
    windowDays: target.queryHistory.windowDays,
    enabledTables: scopeFloor.enabledTables,
    enabledSchemas: scopeFloor.enabledSchemas,
    modeledTableCatalog: scopeFloor.modeledTableCatalog,
    scopeFloorWarnings: scopeFloor.warnings,
  });
}
2026-05-29 17:41:04 +02:00
function enrichmentReadinessGaps(config: KtxProjectConfig): string[] {
  const gaps: string[] = [];
  if (config.llm.provider.backend === 'none' || !config.llm.models.default) {
    gaps.push('model configuration');
  }
  if (config.scan.enrichment.mode !== 'llm') {
    gaps.push('scan enrichment mode');
  }
  const embeddings = config.scan.enrichment.embeddings;
  if (!embeddings || embeddings.backend === 'none' || !embeddings.model || embeddings.dimensions <= 0) {
    gaps.push('scan embeddings');
  }
  return gaps;
}
2026-05-14 01:43:06 +02:00
function targetForConnection(
  connectionId: string,
  connection: KtxProjectConnectionConfig,
  projectConfig: KtxPublicIngestProject['config'],
  args: {
    queryHistory?: KtxPublicIngestQueryHistoryFlag;
    queryHistoryWindowDays?: number;
    scanMode?: Extract<KtxScanArgs, { command: 'run' }>['mode'];
  },
  warnings: KtxPublicIngestWarningAccumulator,
): KtxPublicIngestPlanTarget {
  const driver = normalizeConnectionDriver(connection);
  const adapter = sourceAdapterByDriver.get(driver);
  const sourceDir = sourceDirForConnection(connection);
  if (adapter) {
    if (args.queryHistory === 'enabled' || args.queryHistoryWindowDays !== undefined) {
      warnings.ignoredQueryHistoryForSources.push(connectionId);
    }
    return {
      connectionId,
      driver,
      operation: 'source-ingest',
      adapter,
      ...(sourceDir ? { sourceDir } : {}),
      debugCommand: `ktx ingest ${connectionId} --debug`,
      steps: ['source-ingest', 'memory-update'],
    };
  }
  if (isDatabaseDriver(driver)) {
    const options = resolveDatabaseTargetOptions({ connectionId, driver, connection, args, warnings });
    const gaps = enrichmentReadinessGaps(projectConfig);
    return {
      connectionId,
      driver,
      operation: 'database-ingest',
      debugCommand: `ktx ingest ${connectionId} --debug`,
      detectRelationships: projectConfig.scan.relationships.enabled,
      ...(gaps.length > 0
        ? {
            preflightFailure: `${connectionId} cannot be ingested: enrichment is not configured (${gaps.join(
              ', ',
            )}). Run ktx setup to configure a model and embeddings.`,
          }
        : {}),
      ...options,
    };
  }
  throw new Error(`Connection "${connectionId}" uses unsupported public ingest driver "${driver || 'unknown'}"`);
}
export function buildPublicIngestPlan(
  project: KtxPublicIngestProject,
  args: {
    projectDir: string;
    targetConnectionId?: string;
    all: boolean;
    queryHistory?: KtxPublicIngestQueryHistoryFlag;
    queryHistoryWindowDays?: number;
    scanMode?: Extract<KtxScanArgs, { command: 'run' }>['mode'];
  },
): KtxPublicIngestPlan {
  const allConnections = args.all || !args.targetConnectionId;
  const entries = Object.entries(project.config.connections).sort(([a], [b]) => a.localeCompare(b));
  const selected = allConnections ? entries : entries.filter(([connectionId]) => connectionId === args.targetConnectionId);

  if (!allConnections && selected.length === 0) {
    throw new Error(`Connection "${args.targetConnectionId}" is not configured in ktx.yaml`);
  }
  if (selected.length === 0) {
    throw new Error('No configured connections are eligible for ingest');
  }
  const warnings = createWarningAccumulator();
  const targets = selected.map(([connectionId, connection]) =>
    targetForConnection(connectionId, connection, project.config, args, warnings),
  );
  const orderedTargets = [
    ...targets.filter((t) => t.operation === 'database-ingest'),
    ...targets.filter((t) => t.operation === 'source-ingest'),
  ];
  const notice = schemaFirstQueryHistoryNotice(orderedTargets, args);
  return {
    projectDir: args.projectDir,
    targets: orderedTargets,
    warnings: finalizeWarnings(warnings, args),
    ...(notice ? { notices: [notice] } : {}),
  };
}
2026-05-10 23:51:24 +02:00
function defaultSteps(target: KtxPublicIngestPlanTarget): KtxPublicIngestTargetResult['steps'] {
  return [
    {
      operation: 'database-schema',
      status: target.steps.includes('database-schema') ? 'not-run' : 'skipped',
      ...(target.operation === 'database-ingest' ? { debugCommand: target.debugCommand } : {}),
    },
    {
      operation: 'query-history',
      status: target.steps.includes('query-history') ? 'not-run' : 'skipped',
      ...(target.operation === 'database-ingest' ? { debugCommand: target.debugCommand } : {}),
    },
    {
      operation: 'source-ingest',
      status: target.steps.includes('source-ingest') ? 'not-run' : 'skipped',
      ...(target.operation === 'source-ingest' ? { debugCommand: target.debugCommand } : {}),
    },
    {
      operation: 'memory-update',
      status: target.steps.includes('memory-update') ? 'not-run' : 'skipped',
      ...(target.operation === 'source-ingest' ? { debugCommand: target.debugCommand } : {}),
    },
  ];
}
2026-05-14 01:43:06 +02:00
function retryCommandForTarget(
  target: KtxPublicIngestPlanTarget,
  args: Extract<KtxPublicIngestArgs, { command: 'run' }>,
): string {
  const projectPart = ` --project-dir ${args.projectDir}`;
  const queryHistoryPart = target.queryHistory?.enabled === true ? ' --query-history' : '';
  const windowPart =
    target.queryHistory?.enabled === true && target.queryHistory.windowDays !== undefined
      ? ` --query-history-window-days ${target.queryHistory.windowDays}`
      : '';
  return `ktx ingest ${target.connectionId}${projectPart}${queryHistoryPart}${windowPart}`;
}
function trimTrailingPeriod(value: string): string {
  return value.endsWith('.') ? value.slice(0, -1) : value;
}
function failureDetailWithRetry(input: {
  target: KtxPublicIngestPlanTarget;
  args: Extract<KtxPublicIngestArgs, { command: 'run' }>;
  failedOperation: KtxPublicIngestStepName;
  failureDetail?: string;
}): string {
  const detail = input.failureDetail?.trim();
  const base =
    detail && detail.startsWith(`${input.target.connectionId} `)
      ? detail
      : detail
        ? `${input.target.connectionId} failed: ${detail}`
        : `${input.target.connectionId} failed at ${input.failedOperation}.`;
  return `${trimTrailingPeriod(base)}. Retry: ${retryCommandForTarget(input.target, input.args)}`;
}
function markTargetResult(
  target: KtxPublicIngestPlanTarget,
  args: Extract<KtxPublicIngestArgs, { command: 'run' }>,
  status: 'done' | 'failed',
  failedOperation?: KtxPublicIngestStepName,
  failureDetail?: string,
): KtxPublicIngestTargetResult {
  const selectedFailedOperation =
    failedOperation ?? (target.operation === 'database-ingest' ? 'database-schema' : 'source-ingest');
  const selectedFailedOperationIndex = target.steps.indexOf(selectedFailedOperation);
  return {
    connectionId: target.connectionId,
    driver: target.driver,
    steps: defaultSteps(target).map((step) => {
      if (!target.steps.includes(step.operation)) {
        return step;
      }
      if (status === 'done') {
        return { ...step, status: 'done' };
      }
      const stepIndex = target.steps.indexOf(step.operation);
      if (selectedFailedOperationIndex >= 0 && stepIndex >= 0 && stepIndex < selectedFailedOperationIndex) {
        return { ...step, status: 'done' };
      }
      if (step.operation === selectedFailedOperation) {
        return {
          ...step,
          status: 'failed',
          detail: failureDetailWithRetry({
            target,
            args,
            failedOperation: selectedFailedOperation,
            failureDetail,
          }),
        };
      }
      return { ...step, status: 'not-run' };
    }),
  };
}
fix(snowflake): unblock multi-schema ingest and relationship discovery (#204)
* feat(setup): drop redundant Snowflake schema prompt; fall back to free-text on listSchemas failure
Snowflake setup previously asked for a single schema as free text, then
ran a multiselect against the discovered schemas — two schema questions
back-to-back, with the first being only a session bootstrap. The SDK's
`schema` is optional, so the bootstrap step is unnecessary.
- Remove the free-text Snowflake schema prompt; only pass `schema` to
snowflake-sdk when one is configured.
- When `listSchemas()` fails (e.g. role lacks SHOW SCHEMAS), prompt the
user for a comma-separated list, persist it as `schema_names`, and use
it as both the table-list filter and the multiselect default. Applies
to every driver with a scope-discovery spec, not just Snowflake.
- Update docs to lead with `schema_names`; keep `schema_name` as a
documented single-schema shorthand.
* fix(snowflake): keep introspecting when primary-key discovery is denied
The PK query joins INFORMATION_SCHEMA.TABLE_CONSTRAINTS and
INFORMATION_SCHEMA.KEY_COLUMN_USAGE, which require grants the
connection role may not have. Previously a 'SQL compilation error:
Object ANALYTICS.INFORMATION_SCHEMA.KEY_COLUMN_USAGE does not exist
or not authorized' aborted the entire introspect — schemas, columns,
and row counts were all discarded over a missing nice-to-have.
Wrap the constraint query in try/catch, log a one-line warning per
schema, and return an empty PK map. Columns end up with
primaryKey=false; relationship inference still has FK and profiling
to fall back on.
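
A minimal sketch of the soft-fail pattern described above, not the connector's actual code. The names `primaryKeysOrEmpty`, `runConstraintQuery`, and `warn` are hypothetical stand-ins; only the behaviour (catch the error, warn once per schema, return an empty PK map) comes from this message.

```typescript
type PrimaryKeyMap = Map<string, string[]>;

// Hypothetical helper: `runConstraintQuery` stands in for the
// INFORMATION_SCHEMA join, which may be rejected for under-privileged roles.
async function primaryKeysOrEmpty(
  schema: string,
  runConstraintQuery: (schema: string) => Promise<PrimaryKeyMap>,
  warn: (message: string) => void,
): Promise<PrimaryKeyMap> {
  try {
    return await runConstraintQuery(schema);
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    // One warning per schema; the rest of the introspection keeps going.
    warn(`primary-key discovery skipped for ${schema}: ${reason}`);
    return new Map();
  }
}
```

With an empty map, every column is simply reported with primaryKey=false instead of the whole introspect aborting.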
* fix(scan): unblock relationship discovery on Snowflake
Two adjacent bugs prevented the scan's relationship pipeline from producing
any joins on a Snowflake warehouse:
- relationship-profiling.ts fell through to a default `GROUP_CONCAT` branch
for unknown drivers. Snowflake has no GROUP_CONCAT, so every per-table
profile query failed with "Unknown function GROUP_CONCAT". Add an explicit
Snowflake branch that uses LISTAGG with a literal '\x1f' delimiter
(Snowflake requires the delimiter to be a constant, so CHR(31) is rejected).
- description-generation.ts destructured `connector.sampleTable` and
`connector.sampleColumn` into bare locals, losing the `this` binding when
the class-method connectors (Snowflake, Postgres, MySQL) were invoked.
Every sample call threw "Cannot read properties of undefined (reading
'assertConnection')" and degraded LLM descriptions to metadata-only
prompts. Call the methods through the connector instead.
Without these, even after the primary-key probe is allowed to fail softly,
the scan ends up with 0 validated relationships and an empty `joins:` block
in every shard YAML.
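
The lost `this` binding can be reproduced in isolation. The sketch below uses a hypothetical `DemoConnector` class; only the method names `sampleTable` and `assertConnection` are taken from this message, and the real connectors are assumed to be more involved.

```typescript
// Hypothetical connector used only to illustrate the binding problem.
class DemoConnector {
  private connected = true;

  assertConnection(): void {
    if (!this.connected) throw new Error('not connected');
  }

  sampleTable(table: string): string[] {
    this.assertConnection();
    return [`row from ${table}`];
  }
}

const connector = new DemoConnector();

// Destructuring detaches the method from its receiver, so `this` is
// undefined inside it (class bodies always run in strict mode).
const { sampleTable } = connector;
let detachedError = '';
try {
  sampleTable('orders');
} catch (error) {
  detachedError = error instanceof Error ? error.message : String(error);
}

// Calling through the connector keeps `this` bound to the instance.
const rows = connector.sampleTable('orders');
```

Calling through the instance is the smallest change; `connector.sampleTable.bind(connector)` would be an alternative if a bare function reference is really needed.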
* test(scan): cover table-ref helpers
* feat(scan): plumb tableScope through live-database introspection port
* feat(scan): apply tableScope during metadata fetch
* feat(scan): enforce table scope at fetch boundary
* feat(scan): pool Snowflake sessions and batch enrichment for faster ingest (#206)
* feat(cli): add RSA key-pair auth option to Snowflake setup wizard
Extends the interactive Snowflake setup flow with an authentication-method
prompt (password vs RSA/JWT key-pair). The RSA branch collects a private-key
path (env/file/absolute) and an optional passphrase; the resulting connection
config records `authMethod: 'rsa'` with `privateKey` and `passphrase` instead
of `password`.
* feat(scan): pool Snowflake sessions
* fix(scan): reuse structural snapshots and cleanup connectors
* feat(scan): parallelize relationship profiling
* feat(scan): batch table description generation
* docs: document Snowflake ingest concurrency knobs
* fix(scan): close Snowflake ingest perf verification gaps
* fix(scan): keep batched description failure bounded
* feat(scan): dispatch query-history probes by connection driver
Extract historic-sql dialect resolution into a shared helper so the
status-project readiness check and the local ingest factory agree on
which connections enable query history and which probe to run. The
status command now picks the postgres/snowflake/bigquery probe based on
the connection's driver instead of always reporting against postgres,
which previously caused snowflake connections with queryHistory.enabled
to surface a misleading "driver is snowflake" failure.
Also drops a noisy console.warn from Snowflake primary-key discovery —
INFORMATION_SCHEMA.KEY_COLUMN_USAGE is commonly ungranted for read-only
roles and the FK + profiling paths handle the empty PK map already.
* fix(llm): allow StructuredOutput tool and raise maxTurns for generateObject
The Claude Code agent SDK announces an internal pseudo-tool named
StructuredOutput in the system/init message whenever outputFormat is set
to { type: 'json_schema' }. The runtime's isolation check built its
allowedToolIds set only from MCP tool ids and treated StructuredOutput
as an unexpected host-injected tool, so every generateObject call threw
"Claude Code runtime isolation failed: tools=StructuredOutput ..." and
the table-descriptions and relationship-LLM-proposal enrichment stages
recorded null output across the board.
Whitelist StructuredOutput specifically in generateObject's
allowedToolIds — the check also enforces missing_tools symmetry, so
generateText and runAgentLoop, which do not see StructuredOutput, must
not require it.
generateObject also ran with maxTurns: 1, which the model intermittently
breached when it emitted thinking text before the structured response.
Raised to 5 to give the schema-bound call enough headroom without
allowing unbounded loops. The existing tests now exercise the path with
an init message that announces StructuredOutput so the regression cannot
slip back in.
* chore(scripts): add ktx-reset.sh project-cleanup helper
Convenience script for repeatable ingest testing: takes a project
directory and prunes everything except ktx.yaml and .ktx/secrets/, so
the next ktx setup or ktx ingest run starts from a known-clean state.
2026-05-23 10:41:30 +02:00
function markTargetWithSkippedQueryHistory(
  target: KtxPublicIngestPlanTarget,
  args: Extract<KtxPublicIngestArgs, { command: 'run' }>,
  detail: string,
): KtxPublicIngestTargetResult {
  const baseline = markTargetResult(target, args, 'done');
  return {
    ...baseline,
    steps: baseline.steps.map((step) =>
      step.operation === 'query-history' ? { ...step, status: 'skipped', detail } : step,
    ),
  };
}
function queryHistoryFailureDetail(input: {
  target: KtxPublicIngestPlanTarget;
  args: Extract<KtxPublicIngestArgs, { command: 'run' }>;
  capturedOutput?: string;
}): string {
  const captured = capturedFailureMessage(input.capturedOutput ?? '');
  return failureDetailWithRetry({
    target: input.target,
    args: input.args,
    failedOperation: 'query-history',
    failureDetail: captured,
  });
}
2026-05-10 23:51:24 +02:00
function resultFailed(result: KtxPublicIngestTargetResult): boolean {
  return result.steps.some((step) => step.status === 'failed');
}
2026-05-23 10:41:30 +02:00
function resultSkippedQueryHistory(
  result: KtxPublicIngestTargetResult,
): { connectionId: string; detail: string } | null {
  const skipped = result.steps.find(
    (step) => step.operation === 'query-history' && step.status === 'skipped' && step.detail !== undefined,
  );
  return skipped?.detail ? { connectionId: result.connectionId, detail: skipped.detail } : null;
}
2026-05-22 18:18:47 +02:00
function rowsBucket(): '<10k' | '<100k' | '<1M' | '<10M' | '>=10M' {
  return '<10k';
}
async function emitIngestCompleted(input: {
  args: Extract<KtxPublicIngestArgs, { command: 'run' }>;
  project: KtxPublicIngestProject;
  target: KtxPublicIngestPlanTarget;
  result: KtxPublicIngestTargetResult;
  startedAt: number;
  io: KtxCliIo;
}): Promise<void> {
  const failed = resultFailed(input.result);
  const failureDetail = failed
    ? formatErrorDetail(input.result.steps.find((step) => step.status === 'failed')?.detail)
    : undefined;
  await emitTelemetryEvent({
    name: 'ingest_completed',
    projectDir: input.args.projectDir,
    io: input.io,
    fields: {
      driver: input.target.driver,
      isDemoConnection: isDemoConnection(
        input.target.connectionId,
        input.project.config.connections[input.target.connectionId],
      ),
      schemaCount: 0,
      tableCount: 0,
      columnCount: 0,
      rowsBucket: rowsBucket(),
      durationMs: Math.max(0, performance.now() - input.startedAt),
      outcome: failed ? 'error' : 'ok',
      ...(failureDetail ? { errorDetail: failureDetail } : {}),
    },
  });
}
2026-05-10 23:51:24 +02:00
function stepStatus(result: KtxPublicIngestTargetResult, operation: KtxPublicIngestStepName): string {
  return result.steps.find((step) => step.operation === operation)?.status ?? 'not-run';
}
2026-05-10 23:51:24 +02:00
function renderPlainResults(results: KtxPublicIngestTargetResult[], io: KtxCliIo): void {
  const failures = results.filter(resultFailed);
2026-05-23 10:41:30 +02:00
  const skippedQueryHistory = results.map(resultSkippedQueryHistory).filter((entry) => entry !== null) as Array<{
    connectionId: string;
    detail: string;
  }>;
  const headerSuffix =
    failures.length > 0
      ? ' with partial failures'
      : skippedQueryHistory.length > 0
        ? ' with skipped query history'
        : '';
  io.stdout.write(`Ingest finished${headerSuffix}\n`);
  io.stdout.write('\n');
  io.stdout.write('Source        Database schema Query history Source ingest Memory update\n');
  for (const result of results) {
    io.stdout.write(
      `${result.connectionId.padEnd(14)}${stepStatus(result, 'database-schema').padEnd(16)}${stepStatus(
        result,
        'query-history',
      ).padEnd(14)}${stepStatus(
        result,
        'source-ingest',
      ).padEnd(14)}${stepStatus(result, 'memory-update')}\n`,
    );
  }
* feat(scan): pool Snowflake sessions and batch enrichment for faster ingest (#206)
* feat(cli): add RSA key-pair auth option to Snowflake setup wizard
Extends the interactive Snowflake setup flow with an authentication-method
prompt (password vs RSA/JWT key-pair). The RSA branch collects a private-key
path (env/file/absolute) and an optional passphrase; the resulting connection
config records `authMethod: 'rsa'` with `privateKey` and `passphrase` instead
of `password`.
* feat(scan): pool Snowflake sessions
* fix(scan): reuse structural snapshots and cleanup connectors
* feat(scan): parallelize relationship profiling
* feat(scan): batch table description generation
* docs: document Snowflake ingest concurrency knobs
* fix(scan): close Snowflake ingest perf verification gaps
* fix(scan): keep batched description failure bounded
* feat(scan): dispatch query-history probes by connection driver
Extract historic-sql dialect resolution into a shared helper so the
status-project readiness check and the local ingest factory agree on
which connections enable query history and which probe to run. The
status command now picks the postgres/snowflake/bigquery probe based on
the connection's driver instead of always reporting against postgres,
which previously caused snowflake connections with queryHistory.enabled
to surface a misleading "driver is snowflake" failure.
Also drops a noisy console.warn from Snowflake primary-key discovery —
INFORMATION_SCHEMA.KEY_COLUMN_USAGE is commonly ungranted for read-only
roles and the FK + profiling paths handle the empty PK map already.
* fix(llm): allow StructuredOutput tool and raise maxTurns for generateObject
The Claude Code agent SDK announces an internal pseudo-tool named
StructuredOutput in the system/init message whenever outputFormat is set
to { type: 'json_schema' }. The runtime's isolation check built its
allowedToolIds set only from MCP tool ids and treated StructuredOutput
as an unexpected host-injected tool, so every generateObject call threw
"Claude Code runtime isolation failed: tools=StructuredOutput ..." and
the table-descriptions and relationship-LLM-proposal enrichment stages
recorded null output across the board.
Whitelist StructuredOutput specifically in generateObject's
allowedToolIds — the check also enforces missing_tools symmetry, so
generateText and runAgentLoop, which do not see StructuredOutput, must
not require it.
generateObject also ran with maxTurns: 1, which the model intermittently
breached when it emitted thinking text before the structured response.
Raised to 5 to give the schema-bound call enough headroom without
allowing unbounded loops. The existing tests now exercise the path with
an init message that announces StructuredOutput so the regression cannot
slip back in.
* chore(scripts): add ktx-reset.sh project-cleanup helper
Convenience script for repeatable ingest testing: takes a project
directory and prunes everything except ktx.yaml and .ktx/secrets/, so
the next ktx setup or ktx ingest run starts from a known-clean state.
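The lost `this` binding described in the scan fix above can be reproduced in isolation. This is a minimal sketch, assuming a hypothetical `ExampleConnector` class; only the method names `sampleTable` and `assertConnection` come from the commit message, and the real connectors are not shown here.

```typescript
// Hypothetical connector for illustration only; not the project's real class.
class ExampleConnector {
  private connected = true;

  private assertConnection(): void {
    if (!this.connected) throw new Error('not connected');
  }

  sampleTable(table: string): string[] {
    // Relies on `this`, so it only works when called on an instance.
    this.assertConnection();
    return [`${table}:row-1`];
  }
}

const connector = new ExampleConnector();

// Broken pattern: destructuring copies the function but drops the receiver,
// so `this` is undefined inside the (always strict) class method.
const { sampleTable } = connector;
let detachedError: unknown = null;
try {
  sampleTable('orders');
} catch (error) {
  detachedError = error;
}

// Fixed pattern: call the method through the connector so `this` is bound.
const rows = connector.sampleTable('orders');
```

Calling through the instance is the smallest change; binding the method first (`connector.sampleTable.bind(connector)`) would also keep the receiver if a bare local is really needed.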
  if (failures.length > 0) {
    io.stdout.write('\nFailed sources:\n');
    for (const result of failures) {
      const failedStep = result.steps.find((step) => step.status === 'failed');
      if (!failedStep) {
        continue;
      }
      io.stdout.write(`${failedStep.detail ?? `${result.connectionId} failed.`}\n`);
    }
  }
  if (skippedQueryHistory.length > 0) {
    io.stdout.write('\nSkipped query history:\n');
    for (const { detail } of skippedQueryHistory) {
      io.stdout.write(`${detail}\n`);
    }
  }
}
function hasInteractiveInput(io: KtxCliIo): boolean {
  const stdin = (io as { stdin?: { isTTY?: boolean; setRawMode?: (value: boolean) => void } }).stdin;
  return stdin?.isTTY === true && typeof stdin.setRawMode === 'function';
}

function sourceIngestOutputMode(args: Extract<KtxPublicIngestArgs, { command: 'run' }>, io: KtxCliIo): 'plain' | 'viz' {
  return args.inputMode === 'auto' && io.stdout.isTTY === true && hasInteractiveInput(io) ? 'viz' : 'plain';
}

function shouldUseForegroundContextBuildView(
  args: Extract<KtxPublicIngestArgs, { command: 'run' }>,
  io: KtxCliIo,
): boolean {
  return args.inputMode === 'auto' && args.json !== true && io.stdout.isTTY === true && hasInteractiveInput(io);
}
interface CapturedPublicIngestIo extends KtxCliIo {
  capturedOutput(): string;
}

function createCapturedPublicIngestIo(): CapturedPublicIngestIo {
  let output = '';
  return {
    stdout: {
      isTTY: false,
      write(chunk: string) {
        output += chunk;
      },
    },
    stderr: {
      write(chunk: string) {
        output += chunk;
      },
    },
    capturedOutput() {
      return output;
    },
  };
}

function isCapturedPublicIngestIo(io: KtxCliIo): io is CapturedPublicIngestIo {
  return typeof (io as Partial<CapturedPublicIngestIo>).capturedOutput === 'function';
}
const PLAIN_PUBLIC_INGEST_PHASE_LABELS: Record<KtxPublicIngestPhaseKey, string> = {
  'database-schema': 'database schema',
  'query-history': 'query history',
  'source-ingest': 'source ingest',
};

interface PlainPublicIngestProgressOptions {
  target: KtxPublicIngestPlanTarget;
  index: number;
  total: number;
}

function firstSummaryLine(summary: string | undefined): string | undefined {
  if (!summary) return undefined;
  return summary.split(/\r?\n/).find((line) => line.trim().length > 0)?.trim();
}

function plainPhaseHeader(options: PlainPublicIngestProgressOptions, phaseKey: KtxPublicIngestPhaseKey): string {
  const prefix = options.total > 1 ? `[${options.index + 1}/${options.total}] ` : '';
  return `${prefix}${options.target.connectionId} · ${PLAIN_PUBLIC_INGEST_PHASE_LABELS[phaseKey]}`;
}

function plainPhaseEndLine(status: 'done' | 'failed' | 'skipped', summary?: string): string {
  const firstLine = firstSummaryLine(summary);
  return firstLine ? `${status} · ${firstLine}` : `${status}`;
}
function createPlainPublicIngestProgress(io: KtxCliIo, options: PlainPublicIngestProgressOptions): Required<
  Pick<KtxPublicIngestDeps, 'scanProgress' | 'ingestProgress' | 'onPhaseStart' | 'onPhaseEnd'>
> {
  let currentPhase: KtxPublicIngestPhaseKey | null = null;
  const startedPhases = new Set<KtxPublicIngestPhaseKey>();
  const lastPercentByPhase = new Map<KtxPublicIngestPhaseKey, number>();
  const startPhase = (phaseKey: KtxPublicIngestPhaseKey): void => {
    currentPhase = phaseKey;
    startedPhases.add(phaseKey);
    lastPercentByPhase.set(phaseKey, -1);
    io.stderr.write(`${plainPhaseHeader(options, phaseKey)}\n`);
  };
  const ensurePhaseStarted = (phaseKey: KtxPublicIngestPhaseKey): void => {
    if (!startedPhases.has(phaseKey)) {
      startPhase(phaseKey);
      return;
    }
    currentPhase = phaseKey;
  };
  const emitProgress = (update: KtxIngestProgressUpdate): void => {
    if (currentPhase === null) return;
    const rounded = Math.max(0, Math.min(100, Math.round(update.percent)));
    const lastPercent = lastPercentByPhase.get(currentPhase) ?? -1;
    if (rounded <= lastPercent) return;
    lastPercentByPhase.set(currentPhase, rounded);
    io.stderr.write(`[${rounded}%] ${publicProgressMessage(update.message, options.target)}\n`);
  };
  return {
    onPhaseStart: startPhase,
    onPhaseEnd(phaseKey, status, summary) {
      ensurePhaseStarted(phaseKey);
      io.stderr.write(`${plainPhaseEndLine(status, summary)}\n`);
      currentPhase = null;
    },
    scanProgress: createAggregateProgressPort(emitProgress),
    ingestProgress: emitProgress,
  };
}
const INTERNAL_STATUS_LINE_RE =
  /^(Report|Run|Job|Status|Adapter|Connection|Sync|Diff|Tasks|Work units|Failed tasks|Saved memory|Provenance rows):\s*/;
const ACTIONABLE_FAILURE_LINE_RE =
  /^(Missing bundled Python runtime manifest|ktx Python runtime is required|ktx daemon HTTP|Error:|Failed\b|Could not\b|Cannot\b)/;
const RUNTIME_BACKED_RETRY_LINE_RE = /^Then retry the runtime-backed ktx command\.?$/;

function trimErrorPrefix(line: string): string {
  return line.replace(/^Error:\s*/, '');
}

function capturedFailureMessage(output: string): string | undefined {
  const lines = output
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .filter((line) => !line.startsWith('ktx scan completed'))
    .filter((line) => !INTERNAL_STATUS_LINE_RE.test(line))
    .map(publicIngestOutputLine);
  const actionableIndex = lines.findIndex((line) => ACTIONABLE_FAILURE_LINE_RE.test(line));
  if (actionableIndex < 0) {
    const line = lines.find((candidate) => candidate.length > 0);
    return line ? trimErrorPrefix(line) : undefined;
  }
  const firstLine = lines[actionableIndex];
  if (!firstLine?.startsWith('Missing bundled Python runtime manifest')) {
    return trimErrorPrefix(firstLine);
  }
  const followupLines = lines
    .slice(actionableIndex + 1)
    .filter((line) => !RUNTIME_BACKED_RETRY_LINE_RE.test(line))
    .filter((line) => !/\bRetry:\s/.test(line))
    .filter((line) => line.startsWith('In a source checkout, build the local runtime assets with:'));
  return [firstLine, ...followupLines].join('\n');
}
/**
 * Run one ingest target through its scan/ingest steps. The single per-target
 * chokepoint reached by every entrypoint: standalone `ktx ingest` (plain/json
 * and foreground) and `ktx setup` (via `runContextBuild`). The exported
 * `executePublicIngestTarget` wraps this and emits the `ingest_completed`
 * telemetry event exactly once, so every path is counted.
 */
export async function executePublicIngestTarget(
  target: KtxPublicIngestPlanTarget,
  args: Extract<KtxPublicIngestArgs, { command: 'run' }>,
  io: KtxCliIo,
  deps: KtxPublicIngestDeps,
  project: KtxPublicIngestProject,
): Promise<KtxPublicIngestTargetResult> {
  const startedAt = performance.now();
  const result = await runIngestTargetSteps(target, args, io, deps, project);
  // `io` may be a capture buffer for the scan/ingest step output; the telemetry
  // debug echo belongs on the real user-facing stream, which callers expose as
  // `deps.runtimeIo` (falling back to `io` when the step io is already real).
  await emitIngestCompleted({ args, project, target, result, startedAt, io: deps.runtimeIo ?? io });
  return result;
}
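The wrapper pattern described in the comments above (run the steps, emit one completion event, and echo it on the real stream rather than the capture buffer) can be sketched on its own. Every name below (`SketchIo`, `SketchDeps`, `runSteps`, `emitCompleted`, `runTargetWithTelemetry`) is hypothetical and simplified, and the sketch is synchronous for brevity, whereas the real function is async.

```typescript
// Hypothetical, simplified stand-ins; the real KtxCliIo and deps shapes are richer.
interface SketchIo {
  write(line: string): void;
}

interface SketchDeps {
  runtimeIo?: SketchIo;
}

type SketchStatus = 'done' | 'failed';

function createBufferIo(): SketchIo & { lines: string[] } {
  const lines: string[] = [];
  return {
    lines,
    write(line: string): void {
      lines.push(line);
    },
  };
}

// Stand-in for the per-target step runner: writes step output to the io it is given.
function runSteps(connectionId: string, io: SketchIo): SketchStatus {
  io.write(`scan report for ${connectionId}`);
  return 'done';
}

let emittedEvents = 0;

// Stand-in for the telemetry emitter: counts events and echoes on the chosen io.
function emitCompleted(connectionId: string, status: SketchStatus, io: SketchIo): void {
  emittedEvents += 1;
  io.write(`telemetry: ingest_completed ${connectionId} ${status}`);
}

// Single chokepoint: every caller goes through here, so the event fires once per target.
function runTargetWithTelemetry(connectionId: string, stepIo: SketchIo, deps: SketchDeps): SketchStatus {
  const status = runSteps(connectionId, stepIo);
  // Prefer the real user-facing stream; fall back to the step io when none is supplied.
  emitCompleted(connectionId, status, deps.runtimeIo ?? stepIo);
  return status;
}

const captureIo = createBufferIo();
const realIo = createBufferIo();
const status = runTargetWithTelemetry('warehouse', captureIo, { runtimeIo: realIo });
```

Keeping the emit call in the wrapper rather than in the step runner means early returns inside the steps cannot skip or duplicate the event.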
async function runIngestTargetSteps(
  target: KtxPublicIngestPlanTarget,
  args: Extract<KtxPublicIngestArgs, { command: 'run' }>,
  io: KtxCliIo,
  deps: KtxPublicIngestDeps,
  project: KtxPublicIngestProject,
): Promise<KtxPublicIngestTargetResult> {
  if (target.preflightFailure) {
    if (target.operation === 'database-ingest') {
      deps.onPhaseEnd?.('database-schema', 'failed', target.preflightFailure);
      if (target.queryHistory?.enabled === true) {
        deps.onPhaseEnd?.('query-history', 'skipped');
      }
    } else {
      deps.onPhaseEnd?.('source-ingest', 'failed', target.preflightFailure);
    }
    return {
      connectionId: target.connectionId,
      driver: target.driver,
      steps: defaultSteps(target).map((step) =>
        step.operation === 'database-schema'
          ? {
              ...step,
              status: 'failed',
              detail: `${target.connectionId} failed: ${target.preflightFailure}`,
            }
          : step,
      ),
    };
  }
  if (target.operation === 'database-ingest') {
    const { runKtxScan } = await import('./scan.js');
    const scanArgs: KtxScanArgs = {
      command: 'run',
      projectDir: args.projectDir,
      connectionId: target.connectionId,
      mode: 'enriched',
      detectRelationships: target.detectRelationships === true,
      dryRun: false,
      ...(args.cliVersion ? { cliVersion: args.cliVersion } : {}),
      ...(args.runtimeInstallPolicy ? { runtimeInstallPolicy: args.runtimeInstallPolicy } : {}),
    };
    const runScan = deps.runScan ?? runKtxScan;
    const capturedScanIo = deps.scanProgress
      ? isCapturedPublicIngestIo(io)
        ? io
        : null
      : createCapturedPublicIngestIo();
    const scanIo = capturedScanIo ?? io;
    const scanDeps = {
      ...(deps.scanProgress ? { progress: deps.scanProgress } : {}),
      ...(deps.runtimeIo ? { runtimeIo: deps.runtimeIo } : {}),
    };
    deps.onPhaseStart?.('database-schema');
    const scanExitCode =
      Object.keys(scanDeps).length > 0 ? await runScan(scanArgs, scanIo, scanDeps) : await runScan(scanArgs, scanIo);
    if (scanExitCode !== 0) {
      deps.onPhaseEnd?.('database-schema', 'failed');
      if (target.queryHistory?.enabled === true) {
        deps.onPhaseEnd?.('query-history', 'skipped');
      }
      return markTargetResult(
        target,
        args,
        'failed',
        'database-schema',
        capturedScanIo ? capturedFailureMessage(capturedScanIo.capturedOutput()) : undefined,
      );
    }
    deps.onPhaseEnd?.('database-schema', 'done');
    if (target.queryHistory?.enabled === true) {
      const { runKtxIngest } = await import('./ingest.js');
      const runIngest = deps.runIngest ?? runKtxIngest;
      const historicSqlPullConfigOverride =
        (await resolvedQueryHistoryPullConfigForTarget(target, project)) ?? {
          dialect: target.queryHistory.dialect,
          ...(target.queryHistory.windowDays !== undefined ? { windowDays: target.queryHistory.windowDays } : {}),
        };
      const ingestArgs: KtxIngestArgs = {
        command: 'run',
        projectDir: args.projectDir,
        connectionId: target.connectionId,
        adapter: 'historic-sql',
        outputMode: sourceIngestOutputMode(args, io),
        inputMode: args.inputMode,
        ...(args.cliVersion ? { cliVersion: args.cliVersion } : {}),
        ...(args.runtimeInstallPolicy ? { runtimeInstallPolicy: args.runtimeInstallPolicy } : {}),
        allowImplicitAdapter: true,
        historicSqlPullConfigOverride,
      };
      // Query history runs after the schema scan has already written its report
      // into the shared target io, so it needs a phase-local capture. Reusing
      // `io` here would let leftover scan text (e.g. "Mode: enriched") surface as
      // the query-history failure detail. Only skip capture when progress is
      // active and the caller manages its own buffer (io is not a capture).
      const capturedIngestIo =
        deps.ingestProgress && !isCapturedPublicIngestIo(io) ? null : createCapturedPublicIngestIo();
      const ingestIo = capturedIngestIo ?? io;
      const ingestDeps = {
        ...(deps.ingestProgress ? { progress: deps.ingestProgress } : {}),
        ...(deps.runtimeIo ? { runtimeIo: deps.runtimeIo } : {}),
      };
      deps.onPhaseStart?.('query-history');
      const qhExitCode =
        Object.keys(ingestDeps).length > 0
          ? await runIngest(ingestArgs, ingestIo, ingestDeps)
          : await runIngest(ingestArgs, ingestIo);
      if (qhExitCode !== 0) {
        const detail = queryHistoryFailureDetail({
          target,
          args,
          capturedOutput: capturedIngestIo ? capturedIngestIo.capturedOutput() : undefined,
        });
        deps.onPhaseEnd?.('query-history', 'failed', detail);
        return markTargetWithSkippedQueryHistory(target, args, detail);
      }
      deps.onPhaseEnd?.('query-history', 'done');
    }
    return markTargetResult(target, args, 'done');
  }

  const { runKtxIngest } = await import('./ingest.js');
  const ingestArgs: KtxIngestArgs = {
    command: 'run',
    projectDir: args.projectDir,
    connectionId: target.connectionId,
    adapter: target.adapter ?? target.driver,
    ...(target.sourceDir ? { sourceDir: target.sourceDir } : {}),
    outputMode: sourceIngestOutputMode(args, io),
    inputMode: args.inputMode,
    ...(args.cliVersion ? { cliVersion: args.cliVersion } : {}),
    ...(args.runtimeInstallPolicy ? { runtimeInstallPolicy: args.runtimeInstallPolicy } : {}),
    allowImplicitAdapter: true,
  };
  const runIngest = deps.runIngest ?? runKtxIngest;
  const capturedIngestIo = deps.ingestProgress
    ? isCapturedPublicIngestIo(io)
      ? io
      : null
    : createCapturedPublicIngestIo();
  const ingestIo = capturedIngestIo ?? io;
  const ingestDeps = {
    ...(deps.ingestProgress ? { progress: deps.ingestProgress } : {}),
    ...(deps.runtimeIo ? { runtimeIo: deps.runtimeIo } : {}),
  };
  deps.onPhaseStart?.('source-ingest');
  const exitCode =
    Object.keys(ingestDeps).length > 0
      ? await runIngest(ingestArgs, ingestIo, ingestDeps)
      : await runIngest(ingestArgs, ingestIo);
  deps.onPhaseEnd?.('source-ingest', exitCode === 0 ? 'done' : 'failed');
  return markTargetResult(
    target,
    args,
    exitCode === 0 ? 'done' : 'failed',
    'source-ingest',
    capturedIngestIo ? capturedFailureMessage(capturedIngestIo.capturedOutput()) : undefined,
  );
}
export async function runKtxPublicIngest(
  args: KtxPublicIngestArgs,
  io: KtxCliIo,
  deps: KtxPublicIngestDeps = {},
): Promise<number> {
fix(cli): resolve managed-embeddings daemon URL at project boundary (#184)
A clean `ktx setup` was failing verification because the managed
local-embeddings daemon URL was passed library-side through
`process.env[KTX_MANAGED_SENTENCE_TRANSFORMERS_BASE_URL]`, and the setup
flow never wrote that variable. With no resolved URL the embedding
provider was null, the deep scan emitted
`scan_enrichment_backend_not_configured`, descriptions + embeddings
stayed `skipped`, and the agent-readiness check exited 1.
Replace the env-var indirection with CLI-side substitution at the
project-load boundary. New `loadKtxCliProject` wraps `loadKtxProject`,
ensures the managed daemon when `managed:local-embeddings` is present in
`config.ingest.embeddings` or `config.scan.enrichment.embeddings`, and
substitutes the resolved baseUrl into the in-memory config. Runtime
entry points (scan, ingest, public-ingest, admin-reindex) use the new
loader; setup-time persistence paths keep raw `loadKtxProject` so the
on-disk `ktx.yaml` keeps the portable sentinel.
Cleanup follows from the new design: drop
`MANAGED_SENTENCE_TRANSFORMERS_BASE_URL_ENV`, remove the env-var lookup
branch in `resolveSentenceTransformersBaseUrl`, drop the `env` field
from `ManagedLocalEmbeddingsDaemon`, and collapse the manual
daemon-ensure dance in `admin-reindex.ts`.
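The substitution at the project-load boundary could look roughly like the sketch below. It is not the actual `loadKtxCliProject` code: the config shape, the `baseUrl` field that is assumed to carry the sentinel, and the helper name `withResolvedBaseUrl` are all hypothetical. Only the `managed:local-embeddings` sentinel and the two config locations come from the message above.

```typescript
// Sentinel named in the commit message; every other name here is assumed.
const MANAGED_SENTINEL = 'managed:local-embeddings';

interface EmbeddingsConfig {
  // Assumed field: holds either a concrete URL or the portable sentinel.
  baseUrl?: string;
}

interface ProjectConfig {
  ingest?: { embeddings?: EmbeddingsConfig };
  scan?: { enrichment?: { embeddings?: EmbeddingsConfig } };
}

function withResolvedBaseUrl(config: ProjectConfig, resolvedBaseUrl: string): ProjectConfig {
  // Work on a deep copy so the caller's object (and the on-disk ktx.yaml
  // it mirrors) keeps the sentinel. A JSON round-trip is enough for plain data.
  const copy: ProjectConfig = JSON.parse(JSON.stringify(config));
  const candidates = [copy.ingest?.embeddings, copy.scan?.enrichment?.embeddings];
  for (const embeddings of candidates) {
    if (embeddings && embeddings.baseUrl === MANAGED_SENTINEL) {
      embeddings.baseUrl = resolvedBaseUrl;
    }
  }
  return copy;
}
```

Returning a copy rather than mutating in place mirrors the stated goal that setup-time persistence paths still see the raw sentinel, while runtime entry points see the resolved URL.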
2026-05-20 14:43:02 +02:00
  const loadProject =
    deps.loadProject ?? ((options: { projectDir: string }) => loadKtxProject({ projectDir: options.projectDir }));
  const project = await loadProject({ projectDir: args.projectDir });
  if (shouldUseForegroundContextBuildView(args, io)) {
    const plan = buildPublicIngestPlan(project, args);
    const requirements = resolvePublicIngestRuntimeRequirements(plan, {
      config: project.config,
      env: deps.env ?? process.env,
    });
    const ensureRuntime = deps.ensureRuntime ?? ensureManagedPythonCommandRuntime;
    for (const feature of requirements.features) {
      try {
        await ensureRuntime({
          cliVersion: args.cliVersion ?? getKtxCliPackageInfo().version,
          installPolicy: args.runtimeInstallPolicy ?? 'prompt',
          io,
          feature,
        });
      } catch (error) {
        await reportException({
          error,
          context: { source: 'ingest runtime', handled: true, fatal: false },
          projectDir: args.projectDir,
          io,
          redactionSecrets: await collectTelemetryRedactionSecrets({
            project,
            projectDir: args.projectDir,
            connectionId: args.targetConnectionId,
            includeLlm: true,
            includeEmbeddings: true,
            env: deps.env ?? process.env,
          }),
        });
        io.stderr.write(`${error instanceof Error ? error.message : String(error)}\n`);
        return 1;
      }
    }
    const { runContextBuild } = await import('./context-build-view.js');
    const contextBuild = deps.runContextBuild ?? runContextBuild;
    try {
      const result = await contextBuild(
        project,
        {
          projectDir: args.projectDir,
          ...(args.targetConnectionId ? { targetConnectionId: args.targetConnectionId } : {}),
          all: args.all,
          entrypoint: 'ingest',
          inputMode: args.inputMode,
          ...(args.queryHistory ? { queryHistory: args.queryHistory } : {}),
          ...(args.queryHistoryWindowDays !== undefined ? { queryHistoryWindowDays: args.queryHistoryWindowDays } : {}),
          ...(args.scanMode ? { scanMode: args.scanMode } : {}),
          ...(args.detectRelationships !== undefined ? { detectRelationships: args.detectRelationships } : {}),
          ...(args.cliVersion ? { cliVersion: args.cliVersion } : {}),
          ...(args.runtimeInstallPolicy ? { runtimeInstallPolicy: args.runtimeInstallPolicy } : {}),
        },
        io,
      );
      return result.exitCode;
    } catch (error) {
      await reportException({
        error,
        context: { source: 'ingest context-build', handled: true, fatal: false },
        projectDir: args.projectDir,
        io,
        redactionSecrets: await collectTelemetryRedactionSecrets({
          project,
          projectDir: args.projectDir,
          connectionId: args.targetConnectionId,
          includeLlm: true,
          includeEmbeddings: true,
          env: deps.env ?? process.env,
        }),
      });
      io.stderr.write(`${error instanceof Error ? error.message : String(error)}\n`);
      return 1;
    }
  }
  const plan = buildPublicIngestPlan(project, args);
  const results: KtxPublicIngestTargetResult[] = [];

  if (!args.json) {
    for (const notice of plan.notices ?? []) {
      io.stdout.write(`${notice}\n`);
    }
    for (const warning of plan.warnings) {
      io.stderr.write(`Warning: ${warning}\n`);
    }
  }
  for (const [index, target] of plan.targets.entries()) {
    if (args.json) {
      results.push(await executePublicIngestTarget(target, args, io, deps, project));
      continue;
    }
    const capture = createCapturedPublicIngestIo();
    const progress = createPlainPublicIngestProgress(io, {
      target,
      index,
      total: plan.targets.length,
    });
    const targetDeps: KtxPublicIngestDeps = {
      ...deps,
      scanProgress: progress.scanProgress,
      ingestProgress: progress.ingestProgress,
      onPhaseStart: progress.onPhaseStart,
      onPhaseEnd: progress.onPhaseEnd,
      runtimeIo: deps.runtimeIo ?? io,
    };
    results.push(await executePublicIngestTarget(target, args, capture, targetDeps, project));
  }
  if (args.json) {
    io.stdout.write(`${JSON.stringify({ plan, results }, null, 2)}\n`);
  } else {
    renderPlainResults(results, io);
  }
  await emitProjectStackSnapshot({ projectDir: args.projectDir, io });
  return results.some(resultFailed) ? 1 : 0;
}