chore(workspace): gate dead-code with knip production mode (#196)
* refactor(workspace): relocate @ktx/llm source into packages/cli/src/llm
* refactor(workspace): rewrite @ktx/llm imports to relative paths
* refactor(workspace): fold internal packages into cli
* chore(workspace): gate dead-code with knip production mode
Turn on production-mode knip plus an autofix run in pre-commit and the
`pnpm dead-code` script, document the `/** @internal */` convention for
test-only exports in AGENTS.md, annotate test-only exports across the
CLI with that JSDoc, and drop dead exports/wrappers the new gate
surfaced (e.g. `cli-project.ts`, `lookerRuntimeSourceToFileAdapterSource`,
`createLocalScanEnrichmentProvidersFromConfig`,
`PGLITE_OWNER_PROCESS_BACKEND_CAPABILITIES`, stale type re-exports).
Replace the loose `ignoreIssues` allowlist in `knip.json` with explicit
production entries so cross-package barrel leaks are caught.
* refactor(cli): delete internal barrel index.ts files
The 34 `index.ts` re-export barrels inside `packages/cli/src/` were
holdovers from the pre-fold multi-workspace structure. Post-fold-in they
served no production purpose: external consumers go through the single
package main entry, and in-repo callers mostly imported through them
only because the path was short. Internally, knip flagged most barrel
re-exports as production-dead (only reached via tests).
This change:
- Deletes every internal barrel except `packages/cli/src/index.ts`
(the published package entry).
- Rewrites ~270 source/test files to import each name directly from
the file that defines it.
- Moves `tools/warehouse-verification/index.ts` to
`create-warehouse-verification-tools.ts` (the function it defined
locally) and updates its single consumer.
- Renames `search/backend-conformance.ts` → `.test-utils.ts` to match
the existing test-helper file convention.
- Deletes 13 dead test-only chains (dbt-descriptions/*,
live-database/extracted-schema, live-database/structural-sync,
relationship-* feedback/review chain) plus their tests and a
cascading orphan integration test.
- Updates test mocks that pointed at deleted barrel paths
(notion-client, connector barrels in scan/local-scan-connectors
tests) to mock the source files instead.
- Points the maintainer benchmark script
(`scripts/relationship-benchmark-report.mjs`) at source files
instead of `dist/context/scan/index.js`.
- Drops the barrel `!` entries from `knip.json`; adds explicit
production entries only for the benchmark code reached via dist by
the maintainer script.
Net: 413 files changed, ~1.2k insertions, ~9.4k deletions.
`pnpm run dead-code` (Biome + knip default + knip production) and
`pnpm run type-check` are clean; 2277 tests pass.
* refactor(workspace): rename @ktx/cli to @kaelio/ktx and pack it directly
Promote the CLI workspace package to the public name `@kaelio/ktx` and
drop the separate `scripts/build-public-npm-package.mjs` wrapper. The
CLI package is now publishable in place (`publishConfig.access: public`,
`provenance: true`), so artifact packing uses `pnpm pack` against
`packages/cli/` instead of assembling a parallel package tree.
Updates all workspace filter invocations, docs, tests, and release
readiness checks to reference the new package name, and folds the
tarball-name helper into `scripts/public-npm-release-metadata.mjs`.
* docs: align "agent clients" and "data agents" terminology
Replace "client agents" with "agent clients" and "database agents" with
"data agents" across AGENTS.md, README.md, the docs-site copy, and the
matching setup-agents test description, matching the canonical
vocabulary in docs/terminology.md.
Also moves packages/cli/tsconfig.json's tsBuildInfoFile from
node_modules/.cache/ to dist/.tsbuildinfo so incremental builds survive
node_modules reinstalls.
* refactor(release): single source of truth for package version
Make packages/cli/package.json the single source of truth for the
@kaelio/ktx version. publicNpmPackageVersion() now reads it directly,
so artifact filenames, release-readiness checks, and the Python wheel
version all derive from one field. The duplicate
release-policy.json.publicNpmPackageVersion is removed.
Previously the two fields could drift: tarballs were named
kaelio-ktx-0.4.1.tgz while internally containing
@kaelio/ktx@0.0.0-private.
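The sketch below illustrates the single-source idea. It assumes the version is read by parsing `packages/cli/package.json`. `versionFromPackageJson`, `readPackageVersion`, and `tarballName` are hypothetical helpers, not the repo's `publicNpmPackageVersion()`. The tarball pattern mirrors the `kaelio-ktx-0.4.1.tgz` name quoted above.

```typescript
import { readFileSync } from 'node:fs';

// Hypothetical: extract the "version" field from package.json text.
export function versionFromPackageJson(text: string): string {
  const parsed: unknown = JSON.parse(text);
  if (typeof parsed === 'object' && parsed !== null && 'version' in parsed) {
    const { version } = parsed as { version: unknown };
    if (typeof version === 'string' && version.length > 0) {
      return version;
    }
  }
  throw new Error('package.json has no non-empty string "version" field');
}

// Hypothetical: every artifact name derives from this one field on disk.
export function readPackageVersion(packageJsonPath: string): string {
  return versionFromPackageJson(readFileSync(packageJsonPath, 'utf8'));
}

// Scoped names pack as "<scope>-<name>-<version>.tgz": "@kaelio/ktx" -> "kaelio-ktx".
export function tarballName(packageName: string, version: string): string {
  return `${packageName.replace(/^@/, '').replace('/', '-')}-${version}.tgz`;
}
```

The version is derived from one field, so the tarball name and the embedded `version` cannot drift apart.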
- update-public-release-version.mjs rewrites both Python pyproject.toml
files (ktx-daemon, ktx-sl) alongside the npm package.jsons,
normalizing the version for PEP 440 (e.g. 0.1.0-rc.2 -> 0.1.0rc2).
- semantic-release-config.cjs adds the two pyproject.toml files to
@semantic-release/git assets so the release commit back to main
carries every version source in lockstep.
- The six "?? '0.0.0-private'" fallback literals across the CLI are
replaced with "?? getKtxCliPackageInfo().version", and
createDefaultKtxMcpServer makes its version arg required.
- docs/release.md describes the actual commit-back model: the dev tree
always reflects the most recent release; no sentinel pin to
maintain.
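The PEP 440 normalization mentioned in the first bullet can be sketched as follows. This sketch assumes only `alpha`/`beta`/`rc` pre-release tags in `X.Y.Z-tag.N` form need handling. `toPep440` is a hypothetical name; the real logic lives in `update-public-release-version.mjs` and may cover more cases.

```typescript
// Hypothetical mapping from semver pre-release tags to PEP 440 pre-release segments.
const PRE_RELEASE_TAGS: Record<string, string> = { alpha: 'a', beta: 'b', rc: 'rc' };

export function toPep440(version: string): string {
  const match = /^(\d+\.\d+\.\d+)(?:-([A-Za-z]+)\.?(\d+))?$/.exec(version);
  if (!match) {
    throw new Error(`Unsupported version for PEP 440 conversion: ${version}`);
  }
  const [, release = '', tag, num = '0'] = match;
  if (!tag) {
    return release; // plain release: "0.4.1" stays "0.4.1"
  }
  const pep440Tag = PRE_RELEASE_TAGS[tag.toLowerCase()];
  if (!pep440Tag) {
    throw new Error(`Unsupported pre-release tag "${tag}" in ${version}`);
  }
  // "0.1.0-rc.2" -> "0.1.0rc2"; Number() also drops leading zeros, as PEP 440 does.
  return `${release}${pep440Tag}${Number(num)}`;
}
```
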
Verified: pnpm run artifacts:build now produces
kaelio-ktx-0.4.1.tgz and kaelio_ktx-0.4.1-py3-none-any.whl with
@kaelio/ktx@0.4.1 inside. Full type-check, dead-code, and
2287 vitests + 173 script tests pass.
* refactor(cli): inject embedding provider resolution and detect sentence-transformers runtime
Make resolveProjectEmbeddingProvider and runtimeIo injectable in ingest and
scan command entrypoints so tests can stub them, and teach
resolvePublicIngestRuntimeRequirements to flag the local-embeddings runtime
feature when ktx.yaml selects sentence-transformers.
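The injection pattern described here is sketched below: an optional dependency with a production default that tests can override. Every name is a hypothetical stand-in, including the `embeddings.provider` config shape. The real entrypoints take `KtxScanDeps`-style objects with `resolveEmbeddingProvider` and `runtimeIo` fields.

```typescript
// All names below are hypothetical stand-ins for the CLI's real types.
interface EmbeddingProvider {
  name: string;
}

interface ScanDeps {
  // Optional so production callers can omit it; tests pass a stub.
  resolveEmbeddingProvider?: (projectDir: string) => Promise<EmbeddingProvider | null>;
}

async function defaultResolveEmbeddingProvider(_projectDir: string): Promise<EmbeddingProvider | null> {
  // Stand-in for the real resolver, which would read ktx.yaml.
  return null;
}

export async function describeEmbeddings(projectDir: string, deps: ScanDeps = {}): Promise<string> {
  const resolve = deps.resolveEmbeddingProvider ?? defaultResolveEmbeddingProvider;
  const provider = await resolve(projectDir);
  return provider ? `embeddings: ${provider.name}` : 'embeddings: disabled';
}

// Flag the local-embeddings runtime when the (assumed) config shape selects
// sentence-transformers.
export function requiredRuntimeFeatures(config: { embeddings?: { provider?: string } }): string[] {
  return config.embeddings?.provider === 'sentence-transformers' ? ['local-embeddings'] : [];
}
```
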
* chore(cli): mark buildLocalStatsStatus and LocalStatsStatus as @internal
Both symbols are consumed only by status-project.test.ts. Annotating with
/** @internal */ keeps knip's production-mode check clean without changing
runtime behavior.
* fix(cli): use real package metadata in print-command-tree
The stubbed package name embedded a forbidden product identifier that
tripped the boundary check in CI. Read the metadata from package.json
instead — keeps the rendered tree unchanged and removes a duplicate
source of truth.
* feat(cli): show embedding coverage in `ktx status`, drop duplicate disk counts
Inline `(N embedded)` next to the Wiki scope counts and Semantic-layer
source counts, computed with `SUM(embedding_json IS NOT NULL)` over
`knowledge_pages` and `local_sl_sources`. Rename the "Knowledge" label to
"Wiki" (canonical per `docs/terminology.md`) and rename the matching
`localStats.knowledgePages` field to `localStats.wikiPages`.
Drop `wiki=N md` and `semantic-layer=N yaml` from the Disk row — those
duplicated the per-surface rows above. Disk now reports only actual byte
usage (db, cache, raw-sources). The unused `wikiGlobalMarkdownCount` /
`semanticLayerYamlCount` fields, the `isMarkdownEntry` / `isYamlEntry`
helpers, and the `filter` arg on `summarizeDir` are removed.
2026-05-21 15:28:58 +02:00
import type { KtxProgressPort, KtxScanMode, KtxScanReport, KtxScanWarning } from './context/scan/types.js';
import { runLocalScan } from './context/scan/local-scan.js';
import { loadKtxProject } from './context/project/project.js';
import { getKtxCliPackageInfo } from './cli-runtime.js';
2026-05-21 02:21:22 +02:00
import { resolveProjectEmbeddingProvider } from './embedding-resolution.js';
2026-05-10 23:51:24 +02:00
import type { KtxCliIo } from './index.js';
import { createKtxCliLocalIngestAdapters } from './local-adapters.js';
import { createKtxCliScanConnector } from './local-scan-connectors.js';
2026-05-11 15:50:34 +02:00
import type { KtxManagedPythonInstallPolicy } from './managed-python-command.js';
2026-05-10 23:12:26 +02:00
import { profileMark } from './startup-profile.js';
2026-05-22 18:18:47 +02:00
import { emitTelemetryEvent } from './telemetry/index.js';
2026-06-02 17:23:51 +02:00
import { formatErrorDetail, scrubErrorClass } from './telemetry/scrubber.js';
2026-05-10 23:12:26 +02:00
profileMark('module:scan');
2026-05-13 12:00:08 +02:00
export interface KtxScanArgs {
  command: 'run';
  projectDir: string;
  connectionId: string;
  mode: KtxScanMode;
  detectRelationships: boolean;
  dryRun: boolean;
  databaseIntrospectionUrl?: string;
  cliVersion?: string;
  runtimeInstallPolicy?: KtxManagedPythonInstallPolicy;
}
2026-05-10 23:12:26 +02:00
2026-05-13 17:01:48 +02:00
export interface KtxScanDeps {
2026-05-10 23:12:26 +02:00
  runLocalScan?: typeof runLocalScan;
2026-05-10 23:51:24 +02:00
  createLocalIngestAdapters?: typeof createKtxCliLocalIngestAdapters;
2026-05-21 15:28:58 +02:00
  resolveEmbeddingProvider?: typeof resolveProjectEmbeddingProvider;
2026-05-13 17:01:48 +02:00
  progress?: KtxProgressPort;
2026-05-16 11:39:43 +02:00
  runtimeIo?: KtxCliIo;
2026-05-10 23:12:26 +02:00
}
2026-05-10 23:51:24 +02:00
function shouldUseStyledOutput(io: KtxCliIo): boolean {
2026-05-10 23:12:26 +02:00
  return io.stdout.isTTY === true && !process.env.NO_COLOR && process.env.TERM !== 'dumb' && !process.env.CI;
}

function green(text: string): string {
  return `\u001b[32m${text}\u001b[39m`;
}

function dim(text: string): string {
  return `\u001b[2m${text}\u001b[22m`;
}

function quoteCliArg(value: string): string {
  if (/^[A-Za-z0-9_./:@=-]+$/.test(value)) {
    return value;
  }
  return `'${value.replaceAll("'", "'\\''")}'`;
}

function plural(count: number, singular: string, pluralValue = `${singular}s`): string {
  return count === 1 ? singular : pluralValue;
}
2026-05-10 23:51:24 +02:00
function tableChangeCount(report: KtxScanReport): number {
2026-05-10 23:12:26 +02:00
  return report.diffSummary.tablesAdded + report.diffSummary.tablesModified + report.diffSummary.tablesDeleted;
}
2026-05-10 23:51:24 +02:00
function totalTableCount(report: KtxScanReport): number {
2026-05-10 23:12:26 +02:00
  return tableChangeCount(report) + report.diffSummary.tablesUnchanged;
}
2026-05-22 18:18:47 +02:00
function scanColumnCount(report: KtxScanReport): number {
  return report.structuralSyncStats.columnsCreated + report.structuralSyncStats.columnsUpdated;
}

function inferredFkCount(report: KtxScanReport): number {
  return report.relationships.accepted + report.relationships.review + report.relationships.rejected;
}
2026-05-10 23:51:24 +02:00
function writeScanIdentity(report: KtxScanReport, io: KtxCliIo): void {
2026-05-10 23:12:26 +02:00
  io.stdout.write(`Run: ${report.runId}\n`);
  io.stdout.write(`Connection: ${report.connectionId}\n`);
  io.stdout.write(`Mode: ${report.mode}\n`);
  io.stdout.write(`Sync: ${report.syncId}\n`);
  io.stdout.write(`Dry run: ${report.dryRun ? 'yes' : 'no'}\n`);
}
2026-05-10 23:51:24 +02:00
function writeWhatChanged(report: KtxScanReport, io: KtxCliIo): void {
2026-05-10 23:12:26 +02:00
  const changedTables = tableChangeCount(report);
  const totalTables = totalTableCount(report);
  io.stdout.write('\nWhat changed\n');
  const tableNoun = plural(totalTables, 'table');
  const changeNoun = plural(changedTables, 'change');
  io.stdout.write(
    ` Semantic layer comparison found ${changedTables} ${changeNoun} across ${totalTables} ${tableNoun}\n`,
  );
  io.stdout.write(` New tables: ${report.diffSummary.tablesAdded}\n`);
  io.stdout.write(` Changed tables: ${report.diffSummary.tablesModified}\n`);
  io.stdout.write(` Removed tables: ${report.diffSummary.tablesDeleted}\n`);
  io.stdout.write(` Unchanged tables: ${report.diffSummary.tablesUnchanged}\n`);
  if (
    report.diffSummary.columnsAdded > 0 ||
    report.diffSummary.columnsModified > 0 ||
    report.diffSummary.columnsDeleted > 0
  ) {
    io.stdout.write(` New columns: ${report.diffSummary.columnsAdded}\n`);
    io.stdout.write(` Changed columns: ${report.diffSummary.columnsModified}\n`);
    io.stdout.write(` Removed columns: ${report.diffSummary.columnsDeleted}\n`);
  }
}
2026-05-10 23:51:24 +02:00
function hasRelationshipResults(report: KtxScanReport): boolean {
2026-05-10 23:12:26 +02:00
  return (
    report.relationships.accepted > 0 ||
    report.relationships.review > 0 ||
    report.relationships.rejected > 0 ||
    report.relationships.skipped > 0
  );
}
2026-05-10 23:51:24 +02:00
function writeRelationships(report: KtxScanReport, io: KtxCliIo): void {
2026-05-10 23:12:26 +02:00
  if (!hasRelationshipResults(report)) {
    return;
  }
  io.stdout.write('\nRelationships\n');
  io.stdout.write(` Accepted: ${report.relationships.accepted}\n`);
  io.stdout.write(` Review: ${report.relationships.review}\n`);
  io.stdout.write(` Rejected: ${report.relationships.rejected}\n`);
  io.stdout.write(` Skipped: ${report.relationships.skipped}\n`);
}

function capabilityGapMessage(gap: string): string {
  if (gap === 'columnStats') {
    return 'columnStats is unavailable; relationship confidence may be lower.';
  }
  if (gap === 'tableSampling' || gap === 'columnSampling') {
    return `${gap} is unavailable; descriptions may be less specific.`;
  }
  if (gap === 'readOnlySql') {
    return 'readOnlySql is unavailable; relationship and validation checks may be limited.';
  }
  return `${gap} is unavailable; scan results may be less complete.`;
}
2026-05-10 23:51:24 +02:00
function warningLine(warning: KtxScanWarning): string {
2026-05-10 23:12:26 +02:00
  const location = warning.table ? `${warning.table}${warning.column ? `.${warning.column}` : ''}: ` : '';
  return `${warning.code}: ${location}${warning.message}`;
}
fix(context): merge overlay columns onto manifest columns by name (#94)
* fix(context): merge overlay columns onto manifest columns by name
composeOverlay was appending overlay columns to the manifest column list,
producing duplicate entries when dbt/metabase overlays declared a column
just to attach descriptions. The duplicates carried no `type`, so the
pydantic SourceDefinition rejected them at semantic-query time and broke
`ktx sl query` for every overlay-backed measure. Now overlay columns
match base columns by name (case-insensitive): same-name entries merge
onto the manifest (overlay fields win, type/role fall back to the base,
descriptions merge per source key) and only new names append.
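The name-based merge described above can be sketched as follows. This sketch assumes simplified column shapes; the real manifest and overlay types carry more fields. `mergeOverlayColumns` is a hypothetical name, not the repo's `composeOverlay`. Keeping the manifest's spelling of a matched name is also an assumption.

```typescript
interface Column {
  name: string;
  type?: string;
  role?: string;
  descriptions?: Record<string, string>; // keyed by source, e.g. "dbt", "metabase"
}

export function mergeOverlayColumns(base: readonly Column[], overlay: readonly Column[]): Column[] {
  const merged = base.map((column) => ({ ...column }));
  // Case-insensitive lookup from column name to its index in `merged`.
  const indexByName = new Map(merged.map((column, index): [string, number] => [column.name.toLowerCase(), index]));

  for (const patch of overlay) {
    const key = patch.name.toLowerCase();
    const index = indexByName.get(key);
    if (index === undefined) {
      // New name: append instead of duplicating an existing column.
      indexByName.set(key, merged.length);
      merged.push({ ...patch });
      continue;
    }
    const existing = merged[index]!;
    merged[index] = {
      ...existing,
      ...patch, // overlay fields win...
      name: existing.name, // ...but keep the manifest spelling (assumption)
      type: patch.type ?? existing.type, // type/role fall back to the base
      role: patch.role ?? existing.role,
      descriptions: { ...existing.descriptions, ...patch.descriptions }, // merge per source key
    };
  }
  return merged;
}
```

A description-only overlay entry now enriches the typed manifest column instead of producing a second, untyped entry.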
* refactor(sl): split overlay columns from column_overrides and enforce TS/Python wire contract
Overlay sources now have two distinct collections: `columns:` for computed
columns (requiring `expr` + `type`) and `column_overrides:` for metadata
patches to inherited manifest columns. Composing or loading an overlay that
mixes the two — or references an unknown column — fails with a typed error.
Introduce `ResolvedSemanticLayerSource` / `resolvedSourceSchema` /
`toResolvedWire` as the strict shape sent to the Python engine, and add a
schema contract test that diffs Zod against the Pydantic JSON schema dumped
by `python -m semantic_layer dump-schema`. `SourceDefinition` is now
`extra="forbid"` on the Python side.
`loadAllSources` surfaces per-file load errors instead of swallowing them,
so validation/query paths can report manifest shard parse failures.
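The `columns:` / `column_overrides:` split can be sketched as a validation pass. This sketch assumes simplified overlay shapes and case-insensitive name matching. `OverlayValidationError` and `validateOverlay` are hypothetical names, not the repo's typed errors or Zod schemas.

```typescript
class OverlayValidationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'OverlayValidationError';
  }
}

interface OverlaySource {
  columns?: { name: string; expr?: string; type?: string }[]; // computed columns
  column_overrides?: { name: string; description?: string }[]; // patches to inherited columns
}

export function validateOverlay(manifestColumnNames: readonly string[], overlay: OverlaySource): void {
  const inherited = new Set(manifestColumnNames.map((name) => name.toLowerCase()));
  for (const column of overlay.columns ?? []) {
    // Mixing the two collections: an inherited column declared as computed.
    if (inherited.has(column.name.toLowerCase())) {
      throw new OverlayValidationError(`"${column.name}" is inherited; declare it under column_overrides`);
    }
    if (!column.expr || !column.type) {
      throw new OverlayValidationError(`computed column "${column.name}" needs both expr and type`);
    }
  }
  for (const override of overlay.column_overrides ?? []) {
    if (!inherited.has(override.name.toLowerCase())) {
      throw new OverlayValidationError(`column_overrides references unknown column "${override.name}"`);
    }
  }
}
```
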
* fix(context): make scan description generation resilient and quiet
A transient sampleTable failure during ingest used to take out every
table in a connection: generateTableDescription returned a hardcoded
'Table not found' string into descriptions.ai, and KtxDescriptionGenerator
was constructed without a logger, so the failure left no trail anywhere.
- sampleTable / sampleColumn calls retry 3x with 200/400/800ms backoff,
honouring KtxScanContext.signal via a new KtxAbortedError.
- On retry exhaustion or missing capability, table generation falls back
to a metadata-only prompt built from column name / native type / comment
/ rawDescriptions. The column path follows the same rule -- call the
LLM when either samples or rawDescriptions are available; skip only
when both are absent.
- Logger is now threaded from KtxScanContext into the generator. Failures
emit structured KtxScanWarning entries (new description_fallback_used
code, plus existing sampling_failed / enrichment_failed /
connector_capability_missing). ktx scan groups warnings by code so a
batch of identical failures collapses to one summary line plus sample.
- Returns null on failure instead of the 'Table not found' sentinel; the
manifest writer's existing guard already skips empty descriptions, so
schema YAML no longer carries misleading text. SCAN_MANAGED_DESCRIPTION_KEYS
already strips stale 'ai' on merge, so existing YAML clears on next run.
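The retry behaviour in the first bullet can be sketched as follows. This sketch reads "retry 3x with 200/400/800ms backoff" as up to three retries after the first attempt. The `KtxAbortedError` class here is a stand-in rather than the repo's class, and `withRetry` is a hypothetical name. On exhaustion the last error is rethrown, which is where a caller would switch to the metadata-only prompt.

```typescript
class KtxAbortedError extends Error {
  constructor() {
    super('Operation aborted');
    this.name = 'KtxAbortedError';
  }
}

// Sleep that rejects early if the signal aborts.
function sleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise((resolve, reject) => {
    if (signal?.aborted) {
      reject(new KtxAbortedError());
      return;
    }
    let timer: ReturnType<typeof setTimeout> | undefined;
    const onAbort = () => {
      clearTimeout(timer);
      reject(new KtxAbortedError());
    };
    timer = setTimeout(() => {
      signal?.removeEventListener('abort', onAbort);
      resolve();
    }, ms);
    signal?.addEventListener('abort', onAbort, { once: true });
  });
}

export async function withRetry<T>(
  operation: () => Promise<T>,
  options: { delaysMs?: readonly number[]; signal?: AbortSignal } = {},
): Promise<T> {
  const delays = options.delaysMs ?? [200, 400, 800];
  let lastError: unknown;
  // One initial attempt plus one retry per configured delay.
  for (let attempt = 0; attempt <= delays.length; attempt++) {
    if (options.signal?.aborted) throw new KtxAbortedError();
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < delays.length) {
        await sleep(delays[attempt]!, options.signal);
      }
    }
  }
  throw lastError; // retries exhausted: caller falls back (e.g. metadata-only prompt)
}
```
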
Also suppress AI SDK v6 'system in messages' warning: pull system messages
out of KtxMessageBuilder.wrapSimple's output via a new splitKtxSystemMessages
helper and pass them top-level to generateText (preserves cacheControl
providerOptions on the SystemModelMessage). Agent-runner's local
splitSystemPromptMessages dedupes onto the shared helper.
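The system-message split can be sketched as follows. This sketch uses a simplified message type; the real `splitKtxSystemMessages` works on AI SDK message types. `splitSystemMessages` and `Message` are hypothetical names. Whole message objects are moved rather than flattened to strings, so per-message `providerOptions` survive.

```typescript
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
  providerOptions?: Record<string, unknown>;
}

export function splitSystemMessages(messages: readonly Message[]): {
  system: Message[];
  rest: Message[];
} {
  const system: Message[] = [];
  const rest: Message[] = [];
  for (const message of messages) {
    // Keep the whole object so providerOptions (e.g. cache control) are preserved.
    (message.role === 'system' ? system : rest).push(message);
  }
  return { system, rest };
}
```
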
* test(docs): align examples-docs assertions with revamped docs
PR #103 (setup/guide doc revamp) reworded several CLI examples and
connection labels; the assertions in scripts/examples-docs.test.mjs
still referenced the pre-revamp wording and were failing in CI on main.
Update the regexes to match the post-revamp content:
- drop the `--json` flag from the sl-query example expectation
- move the `Driver:` / `Status: ok` probe to the connection reference,
which is where that output now lives (driver id is lowercase
`postgres`, not the display name `PostgreSQL`)
- drop the obsolete `Install \`uv\`...` troubleshooting line
- accept `<connectionId>` everywhere; the docs no longer use the
hyphenated `<connection-id>` form
- match the `warehouse` connection id used in the quickstart instead of
the `postgres-warehouse` id only used in the README and setup ref
* fix(sl): skip TS/Python schema contract test when uv is unavailable
The TypeScript checks CI job does not install uv or Python, so the
module-level `execFileSync('uv', ...)` in schemas.contract.test.ts threw
ENOENT and failed the suite. Wrap the schema dump in a try/catch and
guard the describe block with `describe.skipIf` so the test skips in
environments without uv. Local dev and any CI job that has uv on PATH
still runs the cross-language contract assertion.
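The availability probe can be sketched as follows. This sketch assumes `uv --version` exits zero when uv is installed. `hasExecutable` and `hasUv` are hypothetical names. In a Vitest file, the boolean would gate the suite through `describe.skipIf(!hasUv)(...)`.

```typescript
import { execFileSync } from 'node:child_process';

// Probe once; ENOENT (not installed) and non-zero exits both count as unavailable.
export function hasExecutable(command: string, args: readonly string[] = ['--version']): boolean {
  try {
    execFileSync(command, args, { stdio: 'ignore' });
    return true;
  } catch {
    return false;
  }
}

export const hasUv = hasExecutable('uv');
```

Probing at module load, instead of letting the schema dump throw, keeps collection from failing and turns a missing toolchain into a visible skip.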
2026-05-15 02:11:04 +02:00
function groupWarningsByCode(warnings: readonly KtxScanWarning[]): Map<string, KtxScanWarning[]> {
  const groups = new Map<string, KtxScanWarning[]>();
  for (const warning of warnings) {
    const list = groups.get(warning.code);
    if (list) {
      list.push(warning);
    } else {
      groups.set(warning.code, [warning]);
    }
  }
  return groups;
}

function describeWarningGroup(code: string, count: number): string {
  switch (code) {
    case 'sampling_failed':
      return `${count} ${plural(count, 'table')} could not be sampled (retries exhausted); descriptions used metadata-only fallback or were skipped.`;
    case 'description_fallback_used':
      return `${count} ${plural(count, 'table')} got an AI description from column metadata only (no sample rows available).`;
    case 'enrichment_failed':
      return `${count} ${plural(count, 'table/column', 'tables/columns')} could not be enriched.`;
    case 'connector_capability_missing':
      return `${count} ${plural(count, 'table')} affected by missing connector capability.`;
    case 'statistics_failed':
      return `${count} statistics ${plural(count, 'lookup')} failed.`;
    case 'llm_unavailable':
      return 'LLM provider unavailable; AI enrichment was skipped.';
    case 'embedding_unavailable':
      return 'Embedding provider unavailable; embeddings were skipped.';
    case 'relationship_validation_failed':
      return `${count} relationship ${plural(count, 'validation')} could not run.`;
    case 'relationship_llm_invalid_reference':
      return `${count} LLM-proposed ${plural(count, 'relationship')} referenced unknown columns.`;
    case 'relationship_llm_proposal_failed':
      return `${count} LLM relationship ${plural(count, 'proposal')} failed.`;
    case 'scan_enrichment_backend_not_configured':
      return 'Scan enrichment backend is not configured; AI stages were skipped.';
    case 'credential_redacted':
      return `${count} ${plural(count, 'credential')} ${count === 1 ? 'was' : 'were'} redacted from scan output.`;
    default:
      return `${count} ${plural(count, 'warning')} (${code})`;
  }
}
2026-05-11 15:50:34 +02:00
function managedDaemonOptionsForScanRun(args: Extract<KtxScanArgs, { command: 'run' }>, io: KtxCliIo) {
  if (args.databaseIntrospectionUrl || !args.cliVersion || !args.runtimeInstallPolicy) {
    return undefined;
  }
  return {
    cliVersion: args.cliVersion,
2026-05-14 14:35:55 +02:00
    projectDir: args.projectDir,
2026-05-11 15:50:34 +02:00
    installPolicy: args.runtimeInstallPolicy,
    io,
  };
}
2026-05-10 23:51:24 +02:00
function writeNeedsAttention(report: KtxScanReport, io: KtxCliIo): void {
2026-05-10 23:12:26 +02:00
  io.stdout.write('\nNeeds attention\n');
  if (report.warnings.length === 0 && report.capabilityGaps.length === 0) {
    io.stdout.write(' None\n');
    return;
  }
  if (report.warnings.length > 0) {
    io.stdout.write(` ${report.warnings.length} ${plural(report.warnings.length, 'warning')}\n`);
fix(context): merge overlay columns onto manifest columns by name (#94)
* fix(context): merge overlay columns onto manifest columns by name
composeOverlay was appending overlay columns to the manifest column list,
producing duplicate entries when dbt/metabase overlays declared a column
just to attach descriptions. The duplicates carried no `type`, so the
pydantic SourceDefinition rejected them at semantic-query time and broke
`ktx sl query` for every overlay-backed measure. Now overlay columns
match base columns by name (case-insensitive): same-name entries merge
onto the manifest (overlay fields win, type/role fall back to the base,
descriptions merge per source key) and only new names append.
* refactor(sl): split overlay columns from column_overrides and enforce TS/Python wire contract
Overlay sources now have two distinct collections: `columns:` for computed
columns (requiring `expr` + `type`) and `column_overrides:` for metadata
patches to inherited manifest columns. Composing or loading an overlay that
mixes the two — or references an unknown column — fails with a typed error.
Introduce `ResolvedSemanticLayerSource` / `resolvedSourceSchema` /
`toResolvedWire` as the strict shape sent to the Python engine, and add a
schema contract test that diffs Zod against the Pydantic JSON schema dumped
by `python -m semantic_layer dump-schema`. `SourceDefinition` is now
`extra="forbid"` on the Python side.
`loadAllSources` surfaces per-file load errors instead of swallowing them,
so validation/query paths can report manifest shard parse failures.
* fix(context): make scan description generation resilient and quiet
A transient sampleTable failure during ingest used to take out every
table in a connection: generateTableDescription returned a hardcoded
'Table not found' string into descriptions.ai, and KtxDescriptionGenerator
was constructed without a logger, so the failure left no trail anywhere.
- sampleTable / sampleColumn calls retry 3x with 200/400/800ms backoff,
honouring KtxScanContext.signal via a new KtxAbortedError.
- On retry exhaustion or missing capability, table generation falls back
to a metadata-only prompt built from column name / native type / comment
/ rawDescriptions. The column path follows the same rule -- call the
LLM when any of samples or rawDescriptions are available; skip only
when both are absent.
- Logger is now threaded from KtxScanContext into the generator. Failures
emit structured KtxScanWarning entries (new description_fallback_used
code, plus existing sampling_failed / enrichment_failed /
connector_capability_missing). ktx scan groups warnings by code so a
batch of identical failures collapses to one summary line plus sample.
- Returns null on failure instead of the 'Table not found' sentinel; the
manifest writer's existing guard already skips empty descriptions, so
schema YAML no longer carries misleading text. SCAN_MANAGED_DESCRIPTION_KEYS
already strips stale 'ai' on merge, so existing YAML clears on next run.
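The retry behaviour in the first bullet can be sketched as below. This is a minimal sketch that assumes "retry 3x" means one initial attempt plus three retries, spaced 200, 400 and 800 ms apart. `withSamplingRetry`, `Sleep`, `defaultSleep` and the local `KtxAbortedError` class are hypothetical stand-ins, not the project's code. The sleep function is injectable so tests do not wait on real timers.

```typescript
// Hypothetical local stand-in for the KtxAbortedError named in the commit message.
class KtxAbortedError extends Error {
  constructor() {
    super('Operation aborted');
    this.name = 'KtxAbortedError';
  }
}

const RETRY_DELAYS_MS: readonly number[] = [200, 400, 800];

type Sleep = (ms: number, signal?: AbortSignal) => Promise<void>;

// Resolves after `ms`, or rejects with KtxAbortedError as soon as the signal aborts.
const defaultSleep: Sleep = (ms, signal) =>
  new Promise<void>((resolve, reject) => {
    if (signal?.aborted) {
      reject(new KtxAbortedError());
      return;
    }
    const timer = setTimeout(() => {
      signal?.removeEventListener('abort', onAbort);
      resolve();
    }, ms);
    function onAbort() {
      clearTimeout(timer);
      reject(new KtxAbortedError());
    }
    signal?.addEventListener('abort', onAbort, { once: true });
  });

// Hypothetical wrapper around a sampleTable/sampleColumn-style call.
// Rethrows the last error once retries are exhausted, so the caller can fall back.
async function withSamplingRetry<T>(
  operation: () => Promise<T>,
  signal?: AbortSignal,
  sleep: Sleep = defaultSleep,
): Promise<T> {
  let lastError: unknown;
  // `undefined` marks the final attempt, after which no further delay is scheduled.
  for (const delay of [...RETRY_DELAYS_MS, undefined]) {
    if (signal?.aborted) throw new KtxAbortedError();
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (delay === undefined) break;
      await sleep(delay, signal);
    }
  }
  throw lastError;
}
```

On exhaustion the caller would catch the error and build the metadata-only prompt, rather than writing a sentinel string.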
Also suppress AI SDK v6 'system in messages' warning: pull system messages
out of KtxMessageBuilder.wrapSimple's output via a new splitKtxSystemMessages
helper and pass them top-level to generateText (preserves cacheControl
providerOptions on the SystemModelMessage). Agent-runner's local
splitSystemPromptMessages dedupes onto the shared helper.
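Pulling system messages out of a message list, as described above, amounts to a stable partition that keeps each message object intact. The sketch below uses a simplified, hypothetical `SketchMessage` type rather than the AI SDK's own message types, and `splitSystemMessages` is an illustrative name, not the real `splitKtxSystemMessages`.

```typescript
// Simplified, hypothetical message shape; the real AI SDK message types are richer.
interface SketchMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
  providerOptions?: Record<string, unknown>;
}

interface SplitMessages {
  system: SketchMessage[];
  messages: SketchMessage[];
}

// Separates system messages from the rest, preserving order and object identity,
// so per-message providerOptions (e.g. cache hints) travel with the system entries.
function splitSystemMessages(input: readonly SketchMessage[]): SplitMessages {
  const result: SplitMessages = { system: [], messages: [] };
  for (const message of input) {
    (message.role === 'system' ? result.system : result.messages).push(message);
  }
  return result;
}
```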
* test(docs): align examples-docs assertions with revamped docs
PR #103 (setup/guide doc revamp) reworded several CLI examples and
connection labels; the assertions in scripts/examples-docs.test.mjs
still referenced the pre-revamp wording and were failing in CI on main.
Update the regexes to match the post-revamp content:
- drop the `--json` flag from the sl-query example expectation
- move the `Driver:` / `Status: ok` probe to the connection reference,
which is where that output now lives (driver id is lowercase
`postgres`, not the display name `PostgreSQL`)
- drop the obsolete `Install \`uv\`...` troubleshooting line
- accept `<connectionId>` everywhere; the docs no longer use the
hyphenated `<connection-id>` form
- match the `warehouse` connection id used in the quickstart instead of
the `postgres-warehouse` id only used in the README and setup ref
* fix(sl): skip TS/Python schema contract test when uv is unavailable
The TypeScript checks CI job does not install uv or Python, so the
module-level `execFileSync('uv', ...)` in schemas.contract.test.ts threw
ENOENT and failed the suite. Wrap the schema dump in a try/catch and
guard the describe block with `describe.skipIf` so the test skips in
environments without uv. Local dev and any CI job that has uv on PATH
still runs the cross-language contract assertion.
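A hedged sketch of the try/catch probe follows. `tryExecFile` is a hypothetical helper built on Node's `execFileSync`. By design, it only treats a missing executable (`ENOENT`) as "unavailable" and still throws on other failures, so a broken schema dump is not mistaken for a missing `uv`. That choice may be stricter than the actual test's catch.

```typescript
import { execFileSync } from 'node:child_process';

// Hypothetical helper: runs a command and returns its stdout, or null when the
// executable is not on PATH. Non-zero exits and other errors still throw.
function tryExecFile(command: string, args: readonly string[]): string | null {
  try {
    return execFileSync(command, args, { encoding: 'utf8', stdio: ['ignore', 'pipe', 'pipe'] });
  } catch (error) {
    if ((error as NodeJS.ErrnoException).code === 'ENOENT') {
      return null;
    }
    throw error;
  }
}
```

In a Vitest suite, the result could gate the block with `describe.skipIf(schemaJson === null)(...)`. The exact `uv` arguments are elided in the commit message and are not filled in here.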
2026-05-15 02:11:04 +02:00
    const groups = groupWarningsByCode(report.warnings);
    for (const [code, warnings] of groups) {
      io.stdout.write(` - ${describeWarningGroup(code, warnings.length)}\n`);
      const first = warnings[0];
      if (first) {
        io.stdout.write(` ${warningLine(first)}\n`);
      }
      if (warnings.length > 1) {
        const moreTables = warnings
          .slice(1)
          .map((warning) =>
            warning.table ? (warning.column ? `${warning.table}.${warning.column}` : warning.table) : null,
          )
          .filter((value): value is string => value !== null)
          .slice(0, 3);
        if (moreTables.length > 0) {
          const suffix = warnings.length - 1 > moreTables.length ? `, …` : '';
          io.stdout.write(` also: ${moreTables.join(', ')}${suffix}\n`);
        }
      }
2026-05-10 23:12:26 +02:00
    }
  }
  if (report.capabilityGaps.length > 0) {
    io.stdout.write(` ${report.capabilityGaps.length} capability ${plural(report.capabilityGaps.length, 'gap')}\n`);
    for (const gap of report.capabilityGaps) {
      io.stdout.write(` - ${capabilityGapMessage(gap)}\n`);
    }
  }
}

2026-05-10 23:51:24 +02:00
function writeArtifacts(report: KtxScanReport, io: KtxCliIo): void {
2026-05-10 23:12:26 +02:00
  io.stdout.write('\nArtifacts\n');
  io.stdout.write(` Report: ${report.artifactPaths.reportPath ?? 'none'}\n`);
  io.stdout.write(` Raw sources: ${report.artifactPaths.rawSourcesDir ?? 'none'}\n`);
  if (report.artifactPaths.manifestShards.length > 0) {
    io.stdout.write(` Schema shards: ${report.artifactPaths.manifestShards.length}\n`);
  }
  if (report.artifactPaths.enrichmentArtifacts.length > 0) {
    io.stdout.write(` Enrichment artifacts: ${report.artifactPaths.enrichmentArtifacts.length}\n`);
  }
}

2026-05-10 23:51:24 +02:00
function writeHumanReportBody(report: KtxScanReport, io: KtxCliIo): void {
2026-05-10 23:12:26 +02:00
  writeScanIdentity(report, io);
  writeWhatChanged(report, io);
  writeRelationships(report, io);
  writeNeedsAttention(report, io);
  writeArtifacts(report, io);
}

2026-05-10 23:51:24 +02:00
function writeRunSummary(report: KtxScanReport, projectDir: string, io: KtxCliIo): void {
2026-05-10 23:12:26 +02:00
  const styled = shouldUseStyledOutput(io);
2026-05-10 23:51:24 +02:00
  io.stdout.write(`${styled ? green('✓') : ''}${styled ? ' ' : ''}KTX scan completed\n`);
2026-05-10 23:12:26 +02:00
  io.stdout.write('Status: done\n');
  writeHumanReportBody(report, io);
  const projectDirArg = quoteCliArg(projectDir);
  io.stdout.write('\nNext:\n');
2026-05-13 12:00:08 +02:00
  const statusCommand = styled ? dim('ktx status') : 'ktx status';
  io.stdout.write(` ${statusCommand} --project-dir ${projectDirArg}\n`);
2026-05-10 23:12:26 +02:00
}

2026-05-10 23:51:24 +02:00
interface KtxCliScanProgressState {
2026-05-10 23:12:26 +02:00
  progress: number;
  hasPendingTransient: boolean;
}

2026-05-10 23:51:24 +02:00
interface KtxCliScanProgressUpdateOptions {
2026-05-10 23:12:26 +02:00
  transient?: boolean;
}

2026-05-10 23:51:24 +02:00
interface KtxCliScanProgress extends Omit<KtxProgressPort, 'update'> {
  update(progress: number, message?: string, options?: KtxCliScanProgressUpdateOptions): Promise<void>;
2026-05-10 23:12:26 +02:00
  flush(): void;
}

chore(workspace): gate dead-code with knip production mode (#196)
* refactor(workspace): relocate @ktx/llm source into packages/cli/src/llm
* refactor(workspace): rewrite @ktx/llm imports to relative paths
* refactor(workspace): fold internal packages into cli
* chore(workspace): gate dead-code with knip production mode
Turn on production-mode knip plus an autofix run in pre-commit and the
`pnpm dead-code` script, document the `/** @internal */` convention for
test-only exports in AGENTS.md, annotate test-only exports across the
CLI with that JSDoc, and drop dead exports/wrappers the new gate
surfaced (e.g. `cli-project.ts`, `lookerRuntimeSourceToFileAdapterSource`,
`createLocalScanEnrichmentProvidersFromConfig`,
`PGLITE_OWNER_PROCESS_BACKEND_CAPABILITIES`, stale type re-exports).
Replace the loose `ignoreIssues` allowlist in `knip.json` with explicit
production entries so cross-package barrel leaks are caught.
* refactor(cli): delete internal barrel index.ts files
The 34 `index.ts` re-export barrels inside `packages/cli/src/` were
holdovers from the pre-fold multi-workspace structure. Post-fold-in they
served no production purpose: external consumers go through the single
package main entry, and in-repo callers mostly imported through them
only because the path was short. Internally, knip flagged most barrel
re-exports as production-dead (only reached via tests).
This change:
- Deletes every internal barrel except `packages/cli/src/index.ts`
(the published package entry).
- Rewrites ~270 source/test files to import each name directly from
the file that defines it.
- Moves `tools/warehouse-verification/index.ts` to
`create-warehouse-verification-tools.ts` (the function it defined
locally) and updates its single consumer.
- Renames `search/backend-conformance.ts` → `.test-utils.ts` to match
the existing test-helper file convention.
- Deletes 13 dead test-only chains (dbt-descriptions/*,
live-database/extracted-schema, live-database/structural-sync,
relationship-* feedback/review chain) plus their tests and a
cascading orphan integration test.
- Updates test mocks that pointed at deleted barrel paths
(notion-client, connector barrels in scan/local-scan-connectors
tests) to mock the source files instead.
- Points the maintainer benchmark script
(`scripts/relationship-benchmark-report.mjs`) at source files
instead of `dist/context/scan/index.js`.
- Drops the barrel `!` entries from `knip.json`; adds explicit
production entries only for the benchmark code reached via dist by
the maintainer script.
Net: 413 files changed, ~1.2k insertions, ~9.4k deletions.
`pnpm run dead-code` (Biome + knip default + knip production) and
`pnpm run type-check` are clean; 2277 tests pass.
* refactor(workspace): rename @ktx/cli to @kaelio/ktx and pack it directly
Promote the CLI workspace package to the public name `@kaelio/ktx` and
drop the separate `scripts/build-public-npm-package.mjs` wrapper. The
CLI package is now publishable in place (`publishConfig.access: public`,
`provenance: true`), so artifact packing uses `pnpm pack` against
`packages/cli/` instead of assembling a parallel package tree.
Updates all workspace filter invocations, docs, tests, and release
readiness checks to reference the new package name, and folds the
tarball-name helper into `scripts/public-npm-release-metadata.mjs`.
* docs: align "agent clients" and "data agents" terminology
Replace "client agents" with "agent clients" and "database agents" with
"data agents" across AGENTS.md, README.md, the docs-site copy, and the
matching setup-agents test description, matching the canonical
vocabulary in docs/terminology.md.
Also moves packages/cli/tsconfig.json's tsBuildInfoFile from
node_modules/.cache/ to dist/.tsbuildinfo so incremental builds survive
node_modules reinstalls.
* refactor(release): single source of truth for package version
Make packages/cli/package.json the single source of truth for the
@kaelio/ktx version. publicNpmPackageVersion() now reads it directly,
so artifact filenames, release-readiness checks, and the Python wheel
version all derive from one field. The duplicate
release-policy.json.publicNpmPackageVersion is removed.
Previously the two fields could drift: tarballs were named
kaelio-ktx-0.4.1.tgz while internally containing
@kaelio/ktx@0.0.0-private.
- update-public-release-version.mjs rewrites both Python pyproject.toml
files (ktx-daemon, ktx-sl) alongside the npm package.jsons,
normalizing the version for PEP 440 (e.g. 0.1.0-rc.2 -> 0.1.0rc2).
- semantic-release-config.cjs adds the two pyproject.toml files to
@semantic-release/git assets so the release commit back to main
carries every version source in lockstep.
- The six "?? '0.0.0-private'" fallback literals across the CLI are
replaced with "?? getKtxCliPackageInfo().version", and
createDefaultKtxMcpServer makes its version arg required.
- docs/release.md describes the actual commit-back model: the dev tree
always reflects the most recent release; no sentinel pin to
maintain.
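The PEP 440 normalization mentioned in the first bullet (e.g. `0.1.0-rc.2` -> `0.1.0rc2`) can be sketched as below. This is a hypothetical TypeScript helper, not the contents of `update-public-release-version.mjs`. It only handles `X.Y.Z` and `X.Y.Z-<alpha|beta|rc>.N` and rejects everything else rather than guessing.

```typescript
// Maps semver-style pre-release tags onto PEP 440 pre-release segments.
const PEP440_PRE_RELEASE: Record<string, string> = { alpha: 'a', a: 'a', beta: 'b', b: 'b', rc: 'rc' };

// Hypothetical helper: "0.1.0-rc.2" -> "0.1.0rc2", "0.4.1" -> "0.4.1".
function toPep440Version(version: string): string {
  const match = /^(\d+\.\d+\.\d+)(?:-([A-Za-z]+)\.?(\d+))?$/.exec(version);
  if (!match) throw new Error(`Unsupported version: ${version}`);
  const [, release = '', tag, number = ''] = match;
  if (tag === undefined) return release;
  const pep440Tag = PEP440_PRE_RELEASE[tag.toLowerCase()];
  if (!pep440Tag) throw new Error(`Unsupported pre-release tag: ${tag}`);
  return `${release}${pep440Tag}${number}`;
}
```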
Verified: pnpm run artifacts:build now produces
kaelio-ktx-0.4.1.tgz and kaelio_ktx-0.4.1-py3-none-any.whl with
@kaelio/ktx@0.4.1 inside. Full type-check, dead-code, and
2287 vitests + 173 script tests pass.
* refactor(cli): inject embedding provider resolution and detect sentence-transformers runtime
Make resolveProjectEmbeddingProvider and runtimeIo injectable in ingest and
scan command entrypoints so tests can stub them, and teach
resolvePublicIngestRuntimeRequirements to flag the local-embeddings runtime
feature when ktx.yaml selects sentence-transformers.
* chore(cli): mark buildLocalStatsStatus and LocalStatsStatus as @internal
Both symbols are consumed only by status-project.test.ts. Annotating with
/** @internal */ keeps knip's production-mode check clean without changing
runtime behavior.
* fix(cli): use real package metadata in print-command-tree
The stubbed package name embedded a forbidden product identifier that
tripped the boundary check in CI. Read the metadata from package.json
instead — keeps the rendered tree unchanged and removes a duplicate
source of truth.
* feat(cli): show embedding coverage in `ktx status`, drop duplicate disk counts
Inline `(N embedded)` next to the Wiki scope counts and Semantic-layer
source counts, computed with `SUM(embedding_json IS NOT NULL)` over
`knowledge_pages` and `local_sl_sources`. Rename the "Knowledge" label to
"Wiki" (canonical per `docs/terminology.md`) and rename the matching
`localStats.knowledgePages` field to `localStats.wikiPages`.
Drop `wiki=N md` and `semantic-layer=N yaml` from the Disk row — those
duplicated the per-surface rows above. Disk now reports only actual byte
usage (db, cache, raw-sources). The unused `wikiGlobalMarkdownCount` /
`semanticLayerYamlCount` fields, the `isMarkdownEntry` / `isYamlEntry`
helpers, and the `filter` arg on `summarizeDir` are removed.
2026-05-21 15:28:58 +02:00
/** @internal */
2026-05-10 23:12:26 +02:00
export function createCliScanProgress(
2026-05-10 23:51:24 +02:00
  io: KtxCliIo,
  state: KtxCliScanProgressState = { progress: 0, hasPendingTransient: false },
2026-05-10 23:12:26 +02:00
  start = 0,
  weight = 1,
2026-05-10 23:51:24 +02:00
): KtxCliScanProgress {
2026-05-10 23:12:26 +02:00
  const shouldWrite = io.stdout.isTTY === true && !process.env.CI;
2026-05-10 23:51:24 +02:00
  const progress: KtxCliScanProgress = {
    async update(value: number, message?: string, options?: KtxCliScanProgressUpdateOptions) {
2026-05-10 23:12:26 +02:00
      const absoluteValue = start + Math.max(0, Math.min(1, value)) * weight;
      state.progress = Math.max(state.progress, Math.min(1, absoluteValue));
      if (!shouldWrite || !message) {
        return;
      }
      const percent = Math.max(0, Math.min(100, Math.round(absoluteValue * 100)));
      const line = `[${percent}%] ${message}`;
      if (options?.transient === true) {
        io.stdout.write(`\r${line}\u001b[K`);
        state.hasPendingTransient = true;
        return;
      }
      progress.flush();
      io.stdout.write(`${line}\n`);
    },
    startPhase(phaseWeight: number) {
2026-05-12 14:07:02 +02:00
      return createCliScanProgress(io, state, state.progress, weight * phaseWeight);
2026-05-10 23:12:26 +02:00
    },
    flush() {
      if (!shouldWrite || !state.hasPendingTransient) {
        return;
      }
      io.stdout.write('\n');
      state.hasPendingTransient = false;
    },
  };
  return progress;
}

2026-05-10 23:51:24 +02:00
export async function runKtxScan(args: KtxScanArgs, io: KtxCliIo = process, deps: KtxScanDeps = {}): Promise<number> {
2026-05-22 18:18:47 +02:00
  const startedAt = performance.now();
2026-05-10 23:12:26 +02:00
  try {
2026-05-21 02:21:22 +02:00
    const project = await loadKtxProject({ projectDir: args.projectDir });
2026-05-21 15:28:58 +02:00
    const resolveEmbeddingProvider = deps.resolveEmbeddingProvider ?? resolveProjectEmbeddingProvider;
    const resolution = await resolveEmbeddingProvider(project, {
2026-05-21 02:21:22 +02:00
      mode: 'ensure',
fix(cli): resolve managed-embeddings daemon URL at project boundary (#184)
A clean `ktx setup` was failing verification because the managed
local-embeddings daemon URL was passed library-side through
`process.env[KTX_MANAGED_SENTENCE_TRANSFORMERS_BASE_URL]`, and the setup
flow never wrote that variable. With no resolved URL the embedding
provider was null, the deep scan emitted
`scan_enrichment_backend_not_configured`, descriptions + embeddings
stayed `skipped`, and the agent-readiness check exited 1.
Replace the env-var indirection with CLI-side substitution at the
project-load boundary. New `loadKtxCliProject` wraps `loadKtxProject`,
ensures the managed daemon when `managed:local-embeddings` is present in
`config.ingest.embeddings` or `config.scan.enrichment.embeddings`, and
substitutes the resolved baseUrl into the in-memory config. Runtime
entry points (scan, ingest, public-ingest, admin-reindex) use the new
loader; setup-time persistence paths keep raw `loadKtxProject` so the
on-disk `ktx.yaml` keeps the portable sentinel.
Cleanup follows from the new design: drop
`MANAGED_SENTENCE_TRANSFORMERS_BASE_URL_ENV`, remove the env-var lookup
branch in `resolveSentenceTransformersBaseUrl`, drop the `env` field
from `ManagedLocalEmbeddingsDaemon`, and collapse the manual
daemon-ensure dance in `admin-reindex.ts`.
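The project-boundary substitution described above can be sketched as follows. This is a minimal sketch. It assumes the `managed:local-embeddings` sentinel is stored as the embeddings `baseUrl`, and `ProjectConfig`, `resolveManagedEmbeddings` and `ensureDaemon` are hypothetical names rather than the real `loadKtxCliProject` internals. The key property is that the loaded config is copied, not mutated, so the on-disk `ktx.yaml` keeps the portable sentinel.

```typescript
// Sentinel named in the commit message; storing it in `baseUrl` is an assumption of this sketch.
const MANAGED_LOCAL_EMBEDDINGS = 'managed:local-embeddings';

// Hypothetical config shapes covering only the two paths the commit mentions.
interface EmbeddingsConfig {
  baseUrl?: string;
}

interface ProjectConfig {
  ingest?: { embeddings?: EmbeddingsConfig };
  scan?: { enrichment?: { embeddings?: EmbeddingsConfig } };
}

function isManaged(embeddings: EmbeddingsConfig | undefined): boolean {
  return embeddings?.baseUrl === MANAGED_LOCAL_EMBEDDINGS;
}

function withBaseUrl(embeddings: EmbeddingsConfig | undefined, baseUrl: string): EmbeddingsConfig | undefined {
  return isManaged(embeddings) ? { ...embeddings, baseUrl } : embeddings;
}

// Returns an in-memory copy with the sentinel replaced; the input object is left untouched.
async function resolveManagedEmbeddings(
  config: ProjectConfig,
  ensureDaemon: () => Promise<string>,
): Promise<ProjectConfig> {
  const ingestEmbeddings = config.ingest?.embeddings;
  const scanEmbeddings = config.scan?.enrichment?.embeddings;
  if (!isManaged(ingestEmbeddings) && !isManaged(scanEmbeddings)) {
    return config;
  }
  const baseUrl = await ensureDaemon(); // ensure the daemon once, then substitute everywhere
  return {
    ...config,
    ...(config.ingest && { ingest: { ...config.ingest, embeddings: withBaseUrl(ingestEmbeddings, baseUrl) } }),
    ...(config.scan && {
      scan: {
        ...config.scan,
        ...(config.scan.enrichment && {
          enrichment: { ...config.scan.enrichment, embeddings: withBaseUrl(scanEmbeddings, baseUrl) },
        }),
      },
    }),
  };
}
```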
2026-05-20 14:43:02 +02:00
      installPolicy: args.runtimeInstallPolicy ?? 'never',
2026-05-21 15:28:58 +02:00
      cliVersion: args.cliVersion ?? getKtxCliPackageInfo().version,
      io: deps.runtimeIo ?? io,
2026-05-20 14:43:02 +02:00
|
|
|
});
|
2026-05-21 02:21:22 +02:00
|
|
|
const embeddingProvider =
|
|
|
|
|
resolution.kind === 'disabled' || resolution.kind === 'managed-unavailable' ? null : resolution.provider;
|
2026-05-16 11:39:43 +02:00
|
|
|
const managedDaemon = managedDaemonOptionsForScanRun(args, deps.runtimeIo ?? io);
2026-05-10 23:12:26 +02:00
const connector =
args.mode !== 'structural' || args.detectRelationships
2026-05-10 23:51:24 +02:00
? await createKtxCliScanConnector(project, args.connectionId)
2026-05-10 23:12:26 +02:00
: undefined;
2026-05-13 17:01:48 +02:00
const cliProgress = deps.progress ? null : createCliScanProgress(io);
const progress = deps.progress ?? cliProgress;
2026-05-10 23:12:26 +02:00
try {
const result = await (deps.runLocalScan ?? runLocalScan)({
project,
connectionId: args.connectionId,
mode: args.mode,
detectRelationships: args.detectRelationships,
dryRun: args.dryRun,
trigger: 'cli',
databaseIntrospectionUrl: args.databaseIntrospectionUrl,
connector,
2026-05-21 02:21:22 +02:00
embeddingProvider,
2026-05-10 23:51:24 +02:00
adapters: (deps.createLocalIngestAdapters ?? createKtxCliLocalIngestAdapters)(project, {
2026-05-11 15:50:34 +02:00
...(args.databaseIntrospectionUrl ? { databaseIntrospectionUrl: args.databaseIntrospectionUrl } : {}),
...(managedDaemon ? { managedDaemon } : {}),
2026-05-10 23:12:26 +02:00
}),
2026-05-13 17:01:48 +02:00
...(progress ? { progress } : {}),
2026-05-10 23:12:26 +02:00
});
2026-05-13 17:01:48 +02:00
cliProgress?.flush();
2026-05-22 18:18:47 +02:00
await emitTelemetryEvent({
name: 'scan_completed',
projectDir: args.projectDir,
io,
fields: {
driver: result.report.driver,
tableCount: totalTableCount(result.report),
columnCount: scanColumnCount(result.report),
inferredFkCount: inferredFkCount(result.report),
declaredFkCount: 0,
durationMs: Math.max(0, performance.now() - startedAt),
outcome: 'ok',
},
});
2026-05-10 23:12:26 +02:00
writeRunSummary(result.report, args.projectDir, io);
} finally {
2026-05-13 17:01:48 +02:00
cliProgress?.flush();
fix(snowflake): unblock multi-schema ingest and relationship discovery (#204)
* feat(setup): drop redundant Snowflake schema prompt; fall back to free-text on listSchemas failure
Snowflake setup previously asked for a single schema as free text, then
ran a multiselect against the discovered schemas — two schema questions
back-to-back, with the first being only a session bootstrap. The SDK's
`schema` is optional, so the bootstrap step is unnecessary.
- Remove the free-text Snowflake schema prompt; only pass `schema` to
snowflake-sdk when one is configured.
- When `listSchemas()` fails (e.g. role lacks SHOW SCHEMAS), prompt the
user for a comma-separated list, persist it as `schema_names`, and use
it as both the table-list filter and the multiselect default. Applies
to every driver with a scope-discovery spec, not just Snowflake.
- Update docs to lead with `schema_names`; keep `schema_name` as a
documented single-schema shorthand.
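The fallback path can be sketched as follows. This assumes that `listSchemas` rejects when discovery is denied and that the prompt returns the raw comma-separated answer. `discoverSchemaScope`, `parseSchemaNames`, and `SchemaScopeSketch` are hypothetical names; the returned `schemaNames` stands in for the value persisted as `schema_names`.

```typescript
// Split a comma-separated answer into trimmed, non-empty schema names.
function parseSchemaNames(answer: string): string[] {
  return answer
    .split(',')
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
}

interface SchemaScopeSketch {
  // Names offered (and pre-selected) in the multiselect.
  candidates: string[];
  // Set only on the fallback path; would be persisted as `schema_names`.
  schemaNames?: string[];
}

async function discoverSchemaScope(
  listSchemas: () => Promise<string[]>,
  promptCommaSeparated: () => Promise<string>,
): Promise<SchemaScopeSketch> {
  try {
    return { candidates: await listSchemas() };
  } catch {
    // e.g. the role lacks SHOW SCHEMAS: ask the user instead of aborting setup.
    const schemaNames = parseSchemaNames(await promptCommaSeparated());
    return { candidates: schemaNames, schemaNames };
  }
}
```

Using the same list as both the filter and the multiselect default means the user sees exactly the schemas they typed.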
* fix(snowflake): keep introspecting when primary-key discovery is denied
The PK query joins INFORMATION_SCHEMA.TABLE_CONSTRAINTS and
INFORMATION_SCHEMA.KEY_COLUMN_USAGE, which require grants the
connection role may not have. Previously a 'SQL compilation error:
Object ANALYTICS.INFORMATION_SCHEMA.KEY_COLUMN_USAGE does not exist
or not authorized' aborted the entire introspect — schemas, columns,
and row counts were all discarded over a missing nice-to-have.
Wrap the constraint query in try/catch, log a one-line warning per
schema, and return an empty PK map. Columns end up with
primaryKey=false; relationship inference still has FK and profiling
to fall back on.
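The soft-failure pattern can be sketched in isolation. This assumes a `runQuery` callback that returns rows with hypothetical `TABLE_NAME` and `COLUMN_NAME` fields; the real query text and row shape are not shown in the commit. Only the behavior of catching the error, warning once per schema, and returning an empty map is taken from it.

```typescript
interface PkRowSketch {
  TABLE_NAME: string;
  COLUMN_NAME: string;
}

// table name -> primary-key column names
type PrimaryKeyMap = Map<string, Set<string>>;

async function fetchPrimaryKeysSoft(
  schema: string,
  runQuery: (schema: string) => Promise<PkRowSketch[]>,
  warn: (message: string) => void,
): Promise<PrimaryKeyMap> {
  const keys: PrimaryKeyMap = new Map();
  let rows: PkRowSketch[] = [];
  try {
    rows = await runQuery(schema);
  } catch (error) {
    // Missing grants on INFORMATION_SCHEMA views must not abort introspection.
    const reason = error instanceof Error ? error.message : String(error);
    warn(`primary-key discovery skipped for ${schema}: ${reason}`);
    return keys;
  }
  for (const row of rows) {
    const columns = keys.get(row.TABLE_NAME) ?? new Set<string>();
    columns.add(row.COLUMN_NAME);
    keys.set(row.TABLE_NAME, columns);
  }
  return keys;
}
```

With an empty map, every column is treated as `primaryKey=false`, and the rest of the introspection proceeds unchanged.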
* fix(scan): unblock relationship discovery on Snowflake
Two adjacent bugs prevented the scan's relationship pipeline from producing
any joins on a Snowflake warehouse:
- relationship-profiling.ts fell through to a default `GROUP_CONCAT` branch
for unknown drivers. Snowflake has no GROUP_CONCAT, so every per-table
profile query failed with "Unknown function GROUP_CONCAT". Add an explicit
Snowflake branch that uses LISTAGG with a literal '\x1f' delimiter
(Snowflake requires the delimiter to be a constant, so CHR(31) is rejected).
- description-generation.ts destructured `connector.sampleTable` and
`connector.sampleColumn` into bare locals, losing the `this` binding when
the class-method connectors (Snowflake, Postgres, MySQL) were invoked.
Every sample call threw "Cannot read properties of undefined (reading
'assertConnection')" and degraded LLM descriptions to metadata-only
prompts. Call the methods through the connector instead.
Without these, even after the primary-key probe is allowed to fail softly,
the scan ends up with 0 validated relationships and an empty `joins:` block
in every shard YAML.
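Both fixes can be shown in isolation. `stringAggSql` is a hypothetical helper. Its Snowflake branch follows the commit (LISTAGG with a constant `'\x1f'` literal). Its default branch only reflects the commit's statement that unknown drivers fell through to `GROUP_CONCAT`, and it omits the real delimiter handling. `DemoConnector` is a stand-in class that shows why destructuring a class method loses its `this` binding.

```typescript
// Hypothetical dialect switch for concatenating sampled values.
function stringAggSql(driver: string, expr: string): string {
  if (driver === 'snowflake') {
    // Snowflake has no GROUP_CONCAT, and LISTAGG needs a constant delimiter,
    // so the unit separator is written as a '\x1f' string-literal escape.
    return `LISTAGG(${expr}, '\\x1f')`;
  }
  // Pre-fix behavior for unrecognized drivers (delimiter handling omitted).
  return `GROUP_CONCAT(${expr})`;
}

// Stand-in for a class-method connector such as the Snowflake one.
class DemoConnector {
  private connected = true;

  private assertConnection(): void {
    if (!this.connected) throw new Error('not connected');
  }

  sampleTable(table: string): string {
    this.assertConnection();
    return `sampled ${table}`;
  }
}

const demo = new DemoConnector();

// Bug: destructuring detaches the method, so `this` is undefined when it runs
// and `this.assertConnection` throws a TypeError.
const { sampleTable } = demo;
let detachedError: unknown = null;
try {
  sampleTable('orders');
} catch (error) {
  detachedError = error;
}

// Fix: call through the instance so `this` stays bound.
const sampled = demo.sampleTable('orders');
```

Calling through the instance is the simplest fix. Binding each method (`demo.sampleTable.bind(demo)`) would also work, but it is easy to forget for one method among several.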
* test(scan): cover table-ref helpers
* feat(scan): plumb tableScope through live-database introspection port
* feat(scan): apply tableScope during metadata fetch
* feat(scan): enforce table scope at fetch boundary
* feat(scan): pool Snowflake sessions and batch enrichment for faster ingest (#206)
* feat(cli): add RSA key-pair auth option to Snowflake setup wizard
Extends the interactive Snowflake setup flow with an authentication-method
prompt (password vs RSA/JWT key-pair). The RSA branch collects a private-key
path (env/file/absolute) and an optional passphrase; the resulting connection
config records `authMethod: 'rsa'` with `privateKey` and `passphrase` instead
of `password`.
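The resulting connection shape can be modelled as a discriminated union. Only `authMethod: 'rsa'`, `privateKey`, and `passphrase` come from the commit. The `'password'` tag on the other branch, the wizard-answer shape, and the `buildSnowflakeAuth` helper are assumptions for illustration.

```typescript
type SnowflakeAuthSketch =
  | { authMethod: 'password'; password: string } // assumed tag for the password branch
  | { authMethod: 'rsa'; privateKey: string; passphrase?: string };

interface WizardAnswersSketch {
  method: 'password' | 'rsa';
  password?: string;
  // Private-key reference as entered (env, file, or absolute path).
  privateKey?: string;
  passphrase?: string;
}

function buildSnowflakeAuth(answers: WizardAnswersSketch): SnowflakeAuthSketch {
  if (answers.method === 'rsa') {
    if (!answers.privateKey) throw new Error('RSA auth requires a private key');
    // The passphrase is optional, so it is recorded only when provided.
    return answers.passphrase
      ? { authMethod: 'rsa', privateKey: answers.privateKey, passphrase: answers.passphrase }
      : { authMethod: 'rsa', privateKey: answers.privateKey };
  }
  if (!answers.password) throw new Error('password auth requires a password');
  return { authMethod: 'password', password: answers.password };
}
```

With the union, the RSA config cannot also carry a `password` field, which matches "instead of `password`".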
* feat(scan): pool Snowflake sessions
* fix(scan): reuse structural snapshots and cleanup connectors
* feat(scan): parallelize relationship profiling
* feat(scan): batch table description generation
* docs: document Snowflake ingest concurrency knobs
* fix(scan): close Snowflake ingest perf verification gaps
* fix(scan): keep batched description failure bounded
* feat(scan): dispatch query-history probes by connection driver
Extract historic-sql dialect resolution into a shared helper so the
status-project readiness check and the local ingest factory agree on
which connections enable query history and which probe to run. The
status command now picks the postgres/snowflake/bigquery probe based on
the connection's driver instead of always reporting against postgres,
which previously caused snowflake connections with queryHistory.enabled
to surface a misleading "driver is snowflake" failure.
Also drops a noisy console.warn from Snowflake primary-key discovery —
INFORMATION_SCHEMA.KEY_COLUMN_USAGE is commonly ungranted for read-only
roles and the FK + profiling paths handle the empty PK map already.
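The shared helper can be sketched as one function that both callers consult. `resolveQueryHistoryProbe`, the `ConnectionSketch` shape, and the `queryHistory.enabled` path are assumptions based on the commit text. The point is that the status check and the ingest factory both derive the probe from the connection's driver instead of assuming postgres.

```typescript
type QueryHistoryProbe = 'postgres' | 'snowflake' | 'bigquery';

interface ConnectionSketch {
  driver: string;
  queryHistory?: { enabled?: boolean };
}

// Drivers that have a query-history probe; any other driver gets none.
const PROBE_BY_DRIVER = new Map<string, QueryHistoryProbe>([
  ['postgres', 'postgres'],
  ['snowflake', 'snowflake'],
  ['bigquery', 'bigquery'],
]);

// Shared by the status readiness check and the ingest factory so they agree.
function resolveQueryHistoryProbe(connection: ConnectionSketch): QueryHistoryProbe | null {
  if (!connection.queryHistory?.enabled) return null;
  return PROBE_BY_DRIVER.get(connection.driver) ?? null;
}
```

A `Map` is used rather than a plain object lookup so that driver names such as `toString` cannot match inherited properties.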
* fix(llm): allow StructuredOutput tool and raise maxTurns for generateObject
The Claude Code agent SDK announces an internal pseudo-tool named
StructuredOutput in the system/init message whenever outputFormat is set
to { type: 'json_schema' }. The runtime's isolation check built its
allowedToolIds set only from MCP tool ids and treated StructuredOutput
as an unexpected host-injected tool, so every generateObject call threw
"Claude Code runtime isolation failed: tools=StructuredOutput ..." and
the table-descriptions and relationship-LLM-proposal enrichment stages
recorded null output across the board.
Whitelist StructuredOutput specifically in generateObject's
allowedToolIds — the check also enforces missing_tools symmetry, so
generateText and runAgentLoop, which do not see StructuredOutput, must
not require it.
generateObject also ran with maxTurns: 1, which the model intermittently
breached when it emitted thinking text before the structured response.
Raised to 5 to give the schema-bound call enough headroom without
allowing unbounded loops. The existing tests now exercise the path with
an init message that announces StructuredOutput so the regression cannot
slip back in.
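The allowlist change and the symmetric check can be sketched as follows. `allowedToolIdsFor`, `checkIsolation`, and the tool id `mcp__ktx__lookup` are hypothetical; only the `StructuredOutput` name and the rule that `generateObject` alone allows it come from the commit. The sketch shows why adding `StructuredOutput` to every call's allowlist would break `generateText`: the missing-tools half of the check would then report it.

```typescript
const STRUCTURED_OUTPUT_TOOL = 'StructuredOutput';

type RuntimeCall = 'generateObject' | 'generateText' | 'runAgentLoop';

interface IsolationReport {
  unexpectedTools: string[];
  missingTools: string[];
}

// Only schema-bound calls see the SDK's StructuredOutput pseudo-tool.
function allowedToolIdsFor(call: RuntimeCall, mcpToolIds: readonly string[]): Set<string> {
  const allowed = new Set(mcpToolIds);
  if (call === 'generateObject') allowed.add(STRUCTURED_OUTPUT_TOOL);
  return allowed;
}

// Symmetric check: announced-but-not-allowed and allowed-but-not-announced.
function checkIsolation(announced: readonly string[], allowed: ReadonlySet<string>): IsolationReport {
  const announcedSet = new Set(announced);
  return {
    unexpectedTools: announced.filter((tool) => !allowed.has(tool)),
    missingTools: Array.from(allowed).filter((tool) => !announcedSet.has(tool)),
  };
}
```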
* chore(scripts): add ktx-reset.sh project-cleanup helper
Convenience script for repeatable ingest testing: takes a project
directory and prunes everything except ktx.yaml and .ktx/secrets/, so
the next ktx setup or ktx ingest run starts from a known-clean state.
2026-05-23 10:41:30 +02:00
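The keep rule of the reset helper can be restated as a predicate over project-relative paths. The real helper is a shell script (`ktx-reset.sh`), and its implementation is not shown here. `shouldKeepOnReset` is a hypothetical TypeScript restatement for illustration only. The `.ktx` directory itself is kept only so that `.ktx/secrets/` survives; its other entries would still be pruned.

```typescript
// Normalize separators, a leading "./", and trailing slashes.
function normalizeRelPath(relPath: string): string {
  return relPath.replace(/\\/g, '/').replace(/^\.\//, '').replace(/\/+$/, '');
}

// Keep ktx.yaml and everything under .ktx/secrets/; prune the rest.
function shouldKeepOnReset(relPath: string): boolean {
  const p = normalizeRelPath(relPath);
  return (
    p === 'ktx.yaml' ||
    p === '.ktx' || // parent of the secrets directory; its other entries are not kept
    p === '.ktx/secrets' ||
    p.startsWith('.ktx/secrets/')
  );
}
```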
await connector?.cleanup?.();
2026-05-10 23:12:26 +02:00
}
return 0;
} catch (error) {
2026-05-22 18:18:47 +02:00
const errorClass = scrubErrorClass(error);
2026-06-02 17:23:51 +02:00
const errorDetail = formatErrorDetail(error);
2026-05-22 18:18:47 +02:00
await emitTelemetryEvent({
name: 'scan_completed',
projectDir: args.projectDir,
io,
fields: {
driver: 'unknown',
tableCount: 0,
columnCount: 0,
inferredFkCount: 0,
declaredFkCount: 0,
durationMs: Math.max(0, performance.now() - startedAt),
outcome: 'error',
...(errorClass ? { errorClass } : {}),
2026-06-02 17:23:51 +02:00
...(errorDetail ? { errorDetail } : {}),
2026-05-22 18:18:47 +02:00
},
});
2026-05-10 23:12:26 +02:00
io.stderr.write(`${error instanceof Error ? error.message : String(error)}\n`);
return 1;
}
}
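The error-path `scan_completed` payload above relies on two small idioms. Conditional spreads leave `errorClass` and `errorDetail` out entirely when they are empty, rather than sending them as `undefined`. `Math.max(0, ...)` clamps `durationMs` defensively. The following is a minimal standalone sketch; the `scanErrorFields` helper and the field interface are hypothetical and only modelled on the keys used above.

```typescript
interface ScanCompletedFieldsSketch {
  driver: string;
  durationMs: number;
  outcome: 'ok' | 'error';
  errorClass?: string;
  errorDetail?: string;
}

function scanErrorFields(
  startedAt: number,
  now: number,
  errorClass?: string,
  errorDetail?: string,
): ScanCompletedFieldsSketch {
  return {
    driver: 'unknown',
    // Defensive clamp so an inconsistent timestamp never yields a negative duration.
    durationMs: Math.max(0, now - startedAt),
    outcome: 'error',
    // Falsy values (undefined or '') leave the key out of the object entirely.
    ...(errorClass ? { errorClass } : {}),
    ...(errorDetail ? { errorDetail } : {}),
  };
}
```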