ktx/scripts/installed-live-database-smoke.mjs
Andrey Avtomonov 2366b00301
chore(workspace): gate dead-code with knip production mode (#196)
* refactor(workspace): relocate @ktx/llm source into packages/cli/src/llm

* refactor(workspace): rewrite @ktx/llm imports to relative paths

* refactor(workspace): fold internal packages into cli

* chore(workspace): gate dead-code with knip production mode

Turn on production-mode knip plus an autofix run in pre-commit and the
`pnpm dead-code` script, document the `/** @internal */` convention for
test-only exports in AGENTS.md, annotate test-only exports across the
CLI with that JSDoc, and drop dead exports/wrappers the new gate
surfaced (e.g. `cli-project.ts`, `lookerRuntimeSourceToFileAdapterSource`,
`createLocalScanEnrichmentProvidersFromConfig`,
`PGLITE_OWNER_PROCESS_BACKEND_CAPABILITIES`, stale type re-exports).
Replace the loose `ignoreIssues` allowlist in `knip.json` with explicit
production entries so cross-package barrel leaks are caught.

* refactor(cli): delete internal barrel index.ts files

The 34 `index.ts` re-export barrels inside `packages/cli/src/` were
holdovers from the pre-fold multi-workspace structure. Post-fold-in they
served no production purpose: external consumers go through the single
package main entry, and in-repo callers mostly imported through them
only because the path was short. Internally, knip flagged most barrel
re-exports as production-dead (only reached via tests).

This change:
- Deletes every internal barrel except `packages/cli/src/index.ts`
  (the published package entry).
- Rewrites ~270 source/test files to import each name directly from
  the file that defines it.
- Moves `tools/warehouse-verification/index.ts` to
  `create-warehouse-verification-tools.ts` (the function it defined
  locally) and updates its single consumer.
- Renames `search/backend-conformance.ts` → `.test-utils.ts` to match
  the existing test-helper file convention.
- Deletes 13 dead test-only chains (dbt-descriptions/*,
  live-database/extracted-schema, live-database/structural-sync,
  relationship-* feedback/review chain) plus their tests and a
  cascading orphan integration test.
- Updates test mocks that pointed at deleted barrel paths
  (notion-client, connector barrels in scan/local-scan-connectors
  tests) to mock the source files instead.
- Points the maintainer benchmark script
  (`scripts/relationship-benchmark-report.mjs`) at source files
  instead of `dist/context/scan/index.js`.
- Drops the barrel `!` entries from `knip.json`; adds explicit
  production entries only for the benchmark code reached via dist by
  the maintainer script.

Net: 413 files changed, ~1.2k insertions, ~9.4k deletions.

`pnpm run dead-code` (Biome + knip default + knip production) and
`pnpm run type-check` are clean; 2277 tests pass.

* refactor(workspace): rename @ktx/cli to @kaelio/ktx and pack it directly

Promote the CLI workspace package to the public name `@kaelio/ktx` and
drop the separate `scripts/build-public-npm-package.mjs` wrapper. The
CLI package is now publishable in place (`publishConfig.access: public`,
`provenance: true`), so artifact packing uses `pnpm pack` against
`packages/cli/` instead of assembling a parallel package tree.

Updates all workspace filter invocations, docs, tests, and release
readiness checks to reference the new package name, and folds the
tarball-name helper into `scripts/public-npm-release-metadata.mjs`.

* docs: align "agent clients" and "data agents" terminology

Replace "client agents" with "agent clients" and "database agents" with
"data agents" across AGENTS.md, README.md, the docs-site copy, and the
matching setup-agents test description, matching the canonical
vocabulary in docs/terminology.md.

Also moves packages/cli/tsconfig.json's tsBuildInfoFile from
node_modules/.cache/ to dist/.tsbuildinfo so incremental builds survive
node_modules reinstalls.

* refactor(release): single source of truth for package version

Make packages/cli/package.json the single source of truth for the
@kaelio/ktx version. publicNpmPackageVersion() now reads it directly,
so artifact filenames, release-readiness checks, and the Python wheel
version all derive from one field. The duplicate
release-policy.json.publicNpmPackageVersion is removed.

Previously the two fields could drift: tarballs were named
kaelio-ktx-0.4.1.tgz while internally containing
@kaelio/ktx@0.0.0-private.

- update-public-release-version.mjs rewrites both Python pyproject.toml
  files (ktx-daemon, ktx-sl) alongside the npm package.jsons,
  normalizing the version for PEP 440 (e.g. 0.1.0-rc.2 -> 0.1.0rc2).
- semantic-release-config.cjs adds the two pyproject.toml files to
  @semantic-release/git assets so the release commit back to main
  carries every version source in lockstep.
- The six "?? '0.0.0-private'" fallback literals across the CLI are
  replaced with "?? getKtxCliPackageInfo().version", and
  createDefaultKtxMcpServer makes its version arg required.
- docs/release.md describes the actual commit-back model: the dev tree
  always reflects the most recent release; no sentinel pin to
  maintain.

Verified: pnpm run artifacts:build now produces
kaelio-ktx-0.4.1.tgz and kaelio_ktx-0.4.1-py3-none-any.whl with
@kaelio/ktx@0.4.1 inside. Full type-check, dead-code, and
2287 vitests + 173 script tests pass.

* refactor(cli): inject embedding provider resolution and detect sentence-transformers runtime

Make resolveProjectEmbeddingProvider and runtimeIo injectable in ingest and
scan command entrypoints so tests can stub them, and teach
resolvePublicIngestRuntimeRequirements to flag the local-embeddings runtime
feature when ktx.yaml selects sentence-transformers.

* chore(cli): mark buildLocalStatsStatus and LocalStatsStatus as @internal

Both symbols are consumed only by status-project.test.ts. Annotating with
/** @internal */ keeps knip's production-mode check clean without changing
runtime behavior.

* fix(cli): use real package metadata in print-command-tree

The stubbed package name embedded a forbidden product identifier that
tripped the boundary check in CI. Read the metadata from package.json
instead — keeps the rendered tree unchanged and removes a duplicate
source of truth.

* feat(cli): show embedding coverage in `ktx status`, drop duplicate disk counts

Inline `(N embedded)` next to the Wiki scope counts and Semantic-layer
source counts, computed with `SUM(embedding_json IS NOT NULL)` over
`knowledge_pages` and `local_sl_sources`. Rename the "Knowledge" label to
"Wiki" (canonical per `docs/terminology.md`) and rename the matching
`localStats.knowledgePages` field to `localStats.wikiPages`.

Drop `wiki=N md` and `semantic-layer=N yaml` from the Disk row — those
duplicated the per-surface rows above. Disk now reports only actual byte
usage (db, cache, raw-sources). The unused `wikiGlobalMarkdownCount` /
`semanticLayerYamlCount` fields, the `isMarkdownEntry` / `isYamlEntry`
helpers, and the `filter` arg on `summarizeDir` are removed.
2026-05-21 15:28:58 +02:00

336 lines
10 KiB
JavaScript

#!/usr/bin/env node
import { execFile } from 'node:child_process';
import { once } from 'node:events';
import { access, mkdir, mkdtemp, rm, writeFile } from 'node:fs/promises';
import { createServer } from 'node:net';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { pathToFileURL } from 'node:url';
import {
npmSmokePackageJson,
npmSmokePnpmWorkspaceYaml,
packageArtifactLayout,
} from './package-artifacts.mjs';
const POSTGRES_IMAGE = process.env.KTX_ARTIFACT_POSTGRES_IMAGE ?? 'postgres:16-alpine';
const POSTGRES_USER = 'ktx';
const POSTGRES_PASSWORD = 'postgres'; // pragma: allowlist secret
const POSTGRES_DB = 'warehouse';
export function smokeContainerName(pid = process.pid, now = Date.now()) {
return `ktx-live-db-smoke-${pid}-${now}`;
}
export function buildPostgresUrl(hostPort) {
return `postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@127.0.0.1:${hostPort}/${POSTGRES_DB}`; // pragma: allowlist secret
}
export function buildDockerRunArgs({ containerName, hostPort, image = POSTGRES_IMAGE }) {
return [
'run',
'--rm',
'-d',
'--name',
containerName,
'-e',
`POSTGRES_PASSWORD=${POSTGRES_PASSWORD}`,
'-e',
`POSTGRES_USER=${POSTGRES_USER}`,
'-e',
`POSTGRES_DB=${POSTGRES_DB}`,
'-p',
`127.0.0.1:${hostPort}:5432`,
image,
];
}
export function buildPostgresReadyArgs(containerName) {
return [
'exec',
containerName,
'psql',
'-U',
POSTGRES_USER,
'-d',
POSTGRES_DB,
'-v',
'ON_ERROR_STOP=1',
'-c',
'SELECT 1;',
];
}
export function buildSeedSql() {
return [
'DROP TABLE IF EXISTS orders;',
'DROP TABLE IF EXISTS customers;',
'CREATE TABLE customers (',
' id integer PRIMARY KEY,',
' name text NOT NULL',
');',
"COMMENT ON TABLE customers IS 'Customers captured by the artifact smoke';",
"COMMENT ON COLUMN customers.name IS 'Customer display name';",
'CREATE TABLE orders (',
' id integer PRIMARY KEY,',
' customer_id integer NOT NULL REFERENCES customers(id),',
' status text NOT NULL,',
' amount integer NOT NULL',
');',
"COMMENT ON TABLE orders IS 'Orders captured by the artifact smoke';",
"COMMENT ON COLUMN orders.amount IS 'Order amount in cents';",
"INSERT INTO customers (id, name) VALUES (1, 'Acme'), (2, 'Globex');",
"INSERT INTO orders (id, customer_id, status, amount) VALUES (10, 1, 'paid', 2000), (11, 2, 'open', 3500);",
'',
].join('\n');
}
export function buildKtxYaml(postgresUrl) {
return [
'connections:',
' warehouse:',
' driver: postgres',
` url: "${postgresUrl}"`,
'storage:',
' state: sqlite',
' search: sqlite-fts5',
'',
].join('\n');
}
export function buildLiveDatabaseIngestArgs(projectDir, _databaseIntrospectionUrl, connectionId = 'warehouse') {
return [
'exec',
'ktx',
'ingest',
connectionId,
'--project-dir',
projectDir,
'--fast',
'--no-input',
];
}
async function run(command, args, options = {}) {
process.stdout.write(`$ ${command} ${args.join(' ')}\n`);
return new Promise((resolve) => {
const child = execFile(
command,
args,
{
cwd: options.cwd,
env: options.env ?? process.env,
encoding: 'utf8',
maxBuffer: 1024 * 1024 * 20,
timeout: options.timeout ?? 60_000,
},
(error, stdout, stderr) => {
if (stdout) {
process.stdout.write(stdout);
}
if (stderr) {
process.stderr.write(stderr);
}
resolve({
code: error && typeof error.code === 'number' ? error.code : error ? 1 : 0,
stdout,
stderr: stderr || (error instanceof Error ? error.message : ''),
});
},
);
if (options.input !== undefined) {
child.stdin?.end(options.input);
}
});
}
function requireSuccess(label, result) {
if (result.code !== 0) {
throw new Error(
`${label} failed with code ${result.code}\nstdout:\n${result.stdout}\nstderr:\n${result.stderr}`,
);
}
}
function requireOutput(label, result, pattern) {
if (!pattern.test(result.stdout)) {
throw new Error(`${label} output did not match ${pattern}\nstdout:\n${result.stdout}`);
}
}
function getRunId(stdout) {
const match = stdout.match(/^Run: (.+)$/m);
if (!match) {
throw new Error(`ingest output did not include a run id\nstdout:\n${stdout}`);
}
return match[1];
}
async function requireDocker() {
const result = await run('docker', ['info'], { timeout: 20_000 });
if (result.code !== 0) {
throw new Error(
'Docker is required for the installed live-database artifact smoke. Start Docker and rerun `pnpm run artifacts:live-db-smoke`.',
);
}
}
async function getAvailablePort() {
const server = createServer();
server.listen(0, '127.0.0.1');
await once(server, 'listening');
const address = server.address();
if (!address || typeof address === 'string') {
server.close();
throw new Error('expected TCP server address');
}
const port = address.port;
server.close();
await once(server, 'close');
return port;
}
async function startPostgresContainer(containerName, hostPort) {
await requireDocker();
const result = await run('docker', buildDockerRunArgs({ containerName, hostPort }), { timeout: 120_000 });
requireSuccess('docker run postgres', result);
}
async function stopPostgresContainer(containerName) {
await run('docker', ['rm', '-f', containerName], { timeout: 30_000 });
}
async function waitForPostgres(containerName) {
const deadline = Date.now() + 60_000;
while (Date.now() < deadline) {
const result = await run('docker', buildPostgresReadyArgs(containerName), { timeout: 10_000 });
if (result.code === 0) {
return;
}
await new Promise((resolve) => setTimeout(resolve, 500));
}
throw new Error(`Timed out waiting for Postgres container ${containerName}`);
}
async function seedPostgres(containerName) {
const result = await run(
'docker',
['exec', '-i', containerName, 'psql', '-U', POSTGRES_USER, '-d', POSTGRES_DB, '-v', 'ON_ERROR_STOP=1'],
{ input: buildSeedSql(), timeout: 30_000 },
);
requireSuccess('seed postgres catalog', result);
}
function managedRuntimeEnv(cleanInstallDir) {
return {
...process.env,
KTX_RUNTIME_ROOT: join(cleanInstallDir, 'managed-runtime'),
};
}
function parseDaemonBaseUrl(stdout) {
const match = stdout.match(/^url: (http:\/\/127\.0\.0\.1:\d+)$/m);
if (!match) {
throw new Error(`Daemon URL was not printed by runtime start:\n${stdout}`);
}
return match[1];
}
async function startDaemon(cleanInstallDir) {
const result = await run('pnpm', ['exec', 'ktx', 'admin', 'runtime', 'start'], {
cwd: cleanInstallDir,
env: managedRuntimeEnv(cleanInstallDir),
timeout: 120_000,
});
requireSuccess('ktx admin runtime start', result);
return parseDaemonBaseUrl(result.stdout);
}
async function stopDaemon(cleanInstallDir) {
await run('pnpm', ['exec', 'ktx', 'admin', 'runtime', 'stop'], {
cwd: cleanInstallDir,
env: managedRuntimeEnv(cleanInstallDir),
timeout: 30_000,
});
}
async function assertPathExists(path, label) {
try {
await access(path);
} catch {
throw new Error(`Missing ${label}: ${path}`);
}
}
async function prepareCleanInstall(layout, cleanInstallDir) {
await assertPathExists(layout.cliTarball, 'CLI tarball');
await mkdir(cleanInstallDir, { recursive: true });
await writeFile(join(cleanInstallDir, 'package.json'), `${JSON.stringify(npmSmokePackageJson(layout), null, 2)}\n`);
await writeFile(join(cleanInstallDir, 'pnpm-workspace.yaml'), npmSmokePnpmWorkspaceYaml());
await run('pnpm', ['install'], { cwd: cleanInstallDir, timeout: 120_000 }).then((result) =>
requireSuccess('pnpm install clean artifact project', result),
);
await run('pnpm', ['exec', 'ktx', 'admin', 'runtime', 'install', '--yes'], {
cwd: cleanInstallDir,
env: managedRuntimeEnv(cleanInstallDir),
timeout: 120_000,
}).then((result) => requireSuccess('install managed runtime', result));
}
async function main() {
const layout = packageArtifactLayout();
const root = await mkdtemp(join(tmpdir(), 'ktx-live-db-artifact-smoke-'));
const containerName = smokeContainerName();
let cleanInstallDir;
let daemonStarted = false;
try {
const postgresPort = await getAvailablePort();
const postgresUrl = buildPostgresUrl(postgresPort);
cleanInstallDir = join(root, 'npm-clean-install');
const projectDir = join(root, 'project');
await startPostgresContainer(containerName, postgresPort);
await waitForPostgres(containerName);
await seedPostgres(containerName);
await prepareCleanInstall(layout, cleanInstallDir);
await mkdir(projectDir, { recursive: true });
const init = await run('pnpm', ['exec', 'ktx', 'init', projectDir, '--name', 'artifact-live-database'], {
cwd: cleanInstallDir,
timeout: 30_000,
});
requireSuccess('ktx init', init);
await writeFile(join(projectDir, 'ktx.yaml'), buildKtxYaml(postgresUrl), 'utf8');
const databaseIntrospectionUrl = await startDaemon(cleanInstallDir);
daemonStarted = true;
const ingestRun = await run('pnpm', buildLiveDatabaseIngestArgs(projectDir, databaseIntrospectionUrl), {
cwd: cleanInstallDir,
env: managedRuntimeEnv(cleanInstallDir),
timeout: 120_000,
});
requireSuccess('ktx ingest warehouse --fast', ingestRun);
requireOutput('ktx ingest warehouse --fast', ingestRun, /Ingest finished/);
requireOutput('ktx ingest warehouse --fast', ingestRun, /Database schema/);
const runId = getRunId(ingestRun.stdout);
await assertPathExists(join(projectDir, '.ktx', 'db.sqlite'), 'SQLite local ingest state');
process.stdout.write(`Installed live-database artifact smoke passed: ${runId}\n`);
} finally {
if (daemonStarted && cleanInstallDir) {
await stopDaemon(cleanInstallDir);
}
await stopPostgresContainer(containerName);
await rm(root, { recursive: true, force: true });
}
}
if (import.meta.url === pathToFileURL(process.argv[1] ?? '').href) {
try {
await main();
} catch (error) {
process.stderr.write(`${error instanceof Error ? error.stack : String(error)}\n`);
process.exitCode = 1;
}
}