mirror of
https://github.com/Kaelio/ktx.git
synced 2026-06-07 07:55:13 +02:00
* refactor(workspace): relocate @ktx/llm source into packages/cli/src/llm

* refactor(workspace): rewrite @ktx/llm imports to relative paths

* refactor(workspace): fold internal packages into cli

* chore(workspace): gate dead-code with knip production mode

Turn on production-mode knip plus an autofix run in pre-commit and the `pnpm dead-code` script, document the `/** @internal */` convention for test-only exports in AGENTS.md, annotate test-only exports across the CLI with that JSDoc, and drop dead exports/wrappers the new gate surfaced (e.g. `cli-project.ts`, `lookerRuntimeSourceToFileAdapterSource`, `createLocalScanEnrichmentProvidersFromConfig`, `PGLITE_OWNER_PROCESS_BACKEND_CAPABILITIES`, stale type re-exports). Replace the loose `ignoreIssues` allowlist in `knip.json` with explicit production entries so cross-package barrel leaks are caught.

* refactor(cli): delete internal barrel index.ts files

The 34 `index.ts` re-export barrels inside `packages/cli/src/` were holdovers from the pre-fold multi-workspace structure. Post-fold-in they served no production purpose: external consumers go through the single package main entry, and in-repo callers mostly imported through them only because the path was short. Internally, knip flagged most barrel re-exports as production-dead (only reached via tests).

This change:

- Deletes every internal barrel except `packages/cli/src/index.ts` (the published package entry).
- Rewrites ~270 source/test files to import each name directly from the file that defines it.
- Moves `tools/warehouse-verification/index.ts` to `create-warehouse-verification-tools.ts` (the function it defined locally) and updates its single consumer.
- Renames `search/backend-conformance.ts` → `.test-utils.ts` to match the existing test-helper file convention.
- Deletes 13 dead test-only chains (dbt-descriptions/*, live-database/extracted-schema, live-database/structural-sync, relationship-* feedback/review chain) plus their tests and a cascading orphan integration test.
- Updates test mocks that pointed at deleted barrel paths (notion-client, connector barrels in scan/local-scan-connectors tests) to mock the source files instead.
- Points the maintainer benchmark script (`scripts/relationship-benchmark-report.mjs`) at source files instead of `dist/context/scan/index.js`.
- Drops the barrel `!` entries from `knip.json`; adds explicit production entries only for the benchmark code reached via dist by the maintainer script.

Net: 413 files changed, ~1.2k insertions, ~9.4k deletions. `pnpm run dead-code` (Biome + knip default + knip production) and `pnpm run type-check` are clean; 2277 tests pass.

* refactor(workspace): rename @ktx/cli to @kaelio/ktx and pack it directly

Promote the CLI workspace package to the public name `@kaelio/ktx` and drop the separate `scripts/build-public-npm-package.mjs` wrapper. The CLI package is now publishable in place (`publishConfig.access: public`, `provenance: true`), so artifact packing uses `pnpm pack` against `packages/cli/` instead of assembling a parallel package tree. Updates all workspace filter invocations, docs, tests, and release readiness checks to reference the new package name, and folds the tarball-name helper into `scripts/public-npm-release-metadata.mjs`.

* docs: align "agent clients" and "data agents" terminology

Replace "client agents" with "agent clients" and "database agents" with "data agents" across AGENTS.md, README.md, the docs-site copy, and the matching setup-agents test description, matching the canonical vocabulary in docs/terminology.md.

Also moves packages/cli/tsconfig.json's tsBuildInfoFile from node_modules/.cache/ to dist/.tsbuildinfo so incremental builds survive node_modules reinstalls.

* refactor(release): single source of truth for package version

Make packages/cli/package.json the single source of truth for the @kaelio/ktx version. publicNpmPackageVersion() now reads it directly, so artifact filenames, release-readiness checks, and the Python wheel version all derive from one field. The duplicate release-policy.json.publicNpmPackageVersion is removed. Previously the two fields could drift: tarballs were named kaelio-ktx-0.4.1.tgz while internally containing @kaelio/ktx@0.0.0-private.

- update-public-release-version.mjs rewrites both Python pyproject.toml files (ktx-daemon, ktx-sl) alongside the npm package.jsons, normalizing the version for PEP 440 (e.g. 0.1.0-rc.2 -> 0.1.0rc2).
- semantic-release-config.cjs adds the two pyproject.toml files to @semantic-release/git assets so the release commit back to main carries every version source in lockstep.
- The six "?? '0.0.0-private'" fallback literals across the CLI are replaced with "?? getKtxCliPackageInfo().version", and createDefaultKtxMcpServer makes its version arg required.
- docs/release.md describes the actual commit-back model: the dev tree always reflects the most recent release; no sentinel pin to maintain.

Verified: pnpm run artifacts:build now produces kaelio-ktx-0.4.1.tgz and kaelio_ktx-0.4.1-py3-none-any.whl with @kaelio/ktx@0.4.1 inside. Full type-check, dead-code, and 2287 vitests + 173 script tests pass.

* refactor(cli): inject embedding provider resolution and detect sentence-transformers runtime

Make resolveProjectEmbeddingProvider and runtimeIo injectable in ingest and scan command entrypoints so tests can stub them, and teach resolvePublicIngestRuntimeRequirements to flag the local-embeddings runtime feature when ktx.yaml selects sentence-transformers.

* chore(cli): mark buildLocalStatsStatus and LocalStatsStatus as @internal

Both symbols are consumed only by status-project.test.ts. Annotating with /** @internal */ keeps knip's production-mode check clean without changing runtime behavior.

* fix(cli): use real package metadata in print-command-tree

The stubbed package name embedded a forbidden product identifier that tripped the boundary check in CI. Read the metadata from package.json instead; this keeps the rendered tree unchanged and removes a duplicate source of truth.

* feat(cli): show embedding coverage in `ktx status`, drop duplicate disk counts

Inline `(N embedded)` next to the Wiki scope counts and Semantic-layer source counts, computed with `SUM(embedding_json IS NOT NULL)` over `knowledge_pages` and `local_sl_sources`. Rename the "Knowledge" label to "Wiki" (canonical per `docs/terminology.md`) and rename the matching `localStats.knowledgePages` field to `localStats.wikiPages`. Drop `wiki=N md` and `semantic-layer=N yaml` from the Disk row; those duplicated the per-surface rows above. Disk now reports only actual byte usage (db, cache, raw-sources). The unused `wikiGlobalMarkdownCount` / `semanticLayerYamlCount` fields, the `isMarkdownEntry` / `isYamlEntry` helpers, and the `filter` arg on `summarizeDir` are removed.
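The single-source-of-truth release change derives the npm tarball name, the Python wheel name, and the PEP 440 version from one `package.json` version. A minimal JavaScript sketch of that mapping follows. It is not the code in `update-public-release-version.mjs` or `public-npm-release-metadata.mjs`: the helper names are hypothetical, only `alpha`/`beta`/`rc` prereleases are handled, and the Python distribution name `kaelio-ktx` is an assumption chosen to reproduce the `kaelio_ktx-0.4.1-py3-none-any.whl` filename quoted in the commit message.

```javascript
// Hypothetical helpers: a sketch of the version mapping, not the repository's code.
// PEP 440 spells prereleases as a/b/rc immediately followed by a number.
const PEP440_PRERELEASE = { alpha: 'a', beta: 'b', rc: 'rc' };

// "0.1.0-rc.2" -> "0.1.0rc2"; plain releases such as "0.4.1" pass through.
function pep440FromSemver(version) {
  const match = /^(\d+\.\d+\.\d+)(?:-(alpha|beta|rc)\.?(\d+))?$/.exec(version);
  if (!match) {
    throw new Error(`Unsupported version for PEP 440 mapping: ${version}`);
  }
  const [, release, tag, number] = match;
  return tag ? `${release}${PEP440_PRERELEASE[tag]}${number}` : release;
}

// npm/pnpm pack drop the leading "@" of a scope and join scope and name with "-".
function npmTarballName(packageName, version) {
  return `${packageName.replace(/^@/, '').replace('/', '-')}-${version}.tgz`;
}

// Wheel filenames normalize the distribution name (runs of -, _, . become _),
// use the PEP 440 version, and end in py3-none-any for pure-Python wheels.
function pythonWheelName(distribution, version) {
  const normalized = distribution.toLowerCase().replace(/[-_.]+/g, '_');
  return `${normalized}-${pep440FromSemver(version)}-py3-none-any.whl`;
}

console.log(pep440FromSemver('0.1.0-rc.2')); // 0.1.0rc2
console.log(npmTarballName('@kaelio/ktx', '0.4.1')); // kaelio-ktx-0.4.1.tgz
console.log(pythonWheelName('kaelio-ktx', '0.4.1')); // kaelio_ktx-0.4.1-py3-none-any.whl
```

Because every name is computed from the same input string, the tarball and the wheel cannot disagree on the version the way `kaelio-ktx-0.4.1.tgz` and `@kaelio/ktx@0.0.0-private` once did. The npm name also satisfies the `/^kaelio-ktx-.+\.tgz$/` filter that `publicKtxTarballName()` applies when the smoke script picks its tarball.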
413 lines
13 KiB
JavaScript
import { execFile } from 'node:child_process';
import { mkdir, mkdtemp, readFile, readdir, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { dirname, join, resolve } from 'node:path';
import { fileURLToPath } from 'node:url';
import { promisify } from 'node:util';

import {
  PUBLIC_NPM_PACKAGE_NAME,
  PUBLIC_NPM_PACKAGE_VERSION,
} from './public-npm-release-metadata.mjs';
import { npmSmokePnpmWorkspaceYaml } from './package-artifacts.mjs';

const execFileAsync = promisify(execFile);
const SCRIPT_DIR = dirname(fileURLToPath(import.meta.url));
const DEFAULT_ROOT_DIR = resolve(SCRIPT_DIR, '..');
const PUBLIC_NPM_ARTIFACT_DIR = join('dist', 'artifacts', 'npm');
const OPT_IN_MESSAGE =
  'Set KTX_RUN_LOCAL_EMBEDDINGS_SMOKE=1 or pass --force to run the local embeddings smoke.';

function escapeRegExp(value) {
  return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

export function expectedPublicKtxVersionPattern() {
  return new RegExp(
    `${escapeRegExp(PUBLIC_NPM_PACKAGE_NAME)} ${escapeRegExp(PUBLIC_NPM_PACKAGE_VERSION)}`,
  );
}

export function localEmbeddingsSmokeOptIn(env = process.env, args = process.argv.slice(2)) {
  if (env.KTX_RUN_LOCAL_EMBEDDINGS_SMOKE === '1' || args.includes('--force')) {
    return { run: true };
  }
  return { run: false, message: OPT_IN_MESSAGE };
}

export function publicKtxTarballName(files) {
  const matches = files.filter((file) => /^kaelio-ktx-.+\.tgz$/.test(file)).sort();
  if (matches.length !== 1) {
    throw new Error(
      `Expected exactly one @kaelio/ktx tarball in ${PUBLIC_NPM_ARTIFACT_DIR}, found ${matches.length}: ${
        matches.join(', ') || 'none'
      }. Run pnpm run artifacts:build first.`,
    );
  }
  return matches[0];
}

export async function selectPublicKtxTarball(rootDir = DEFAULT_ROOT_DIR) {
  const npmArtifactDir = join(rootDir, PUBLIC_NPM_ARTIFACT_DIR);
  const files = await readdir(npmArtifactDir);
  return join(npmArtifactDir, publicKtxTarballName(files));
}

// Points the managed runtime and every model cache directory at the temp root,
// so the smoke starts with empty caches and leaves the caller's caches untouched.
export function buildLocalEmbeddingsSmokeEnv(root, baseEnv = process.env) {
  return {
    ...baseEnv,
    KTX_RUN_LOCAL_EMBEDDINGS_SMOKE: '1',
    KTX_RUNTIME_ROOT: join(root, 'managed-runtime'),
    HF_HOME: join(root, 'hf-home'),
    TRANSFORMERS_CACHE: join(root, 'transformers-cache'),
    SENTENCE_TRANSFORMERS_HOME: join(root, 'sentence-transformers-home'),
    TORCH_HOME: join(root, 'torch-home'),
  };
}

// Ordered smoke steps; runLocalEmbeddingsRuntimeSmoke() reads them by index
// (commands[0] through commands[6]).
export function localEmbeddingsSmokeCommands(input) {
  return [
    {
      label: 'ktx public package version',
      command: 'pnpm',
      args: ['exec', 'ktx', '--version'],
      timeoutMs: 60_000,
    },
    {
      label: 'ktx admin runtime status missing',
      command: 'pnpm',
      args: ['exec', 'ktx', 'admin', 'runtime', 'status', '--json'],
      timeoutMs: 60_000,
    },
    {
      label: 'ktx admin runtime install local embeddings',
      command: 'pnpm',
      args: ['exec', 'ktx', 'admin', 'runtime', 'install', '--feature', 'local-embeddings', '--yes'],
      timeoutMs: 1_200_000,
    },
    {
      label: 'ktx admin runtime status local embeddings ready',
      command: 'pnpm',
      args: ['exec', 'ktx', 'admin', 'runtime', 'status', '--json'],
      timeoutMs: 60_000,
    },
    {
      label: 'ktx admin runtime start local embeddings',
      command: 'pnpm',
      args: ['exec', 'ktx', 'admin', 'runtime', 'start', '--feature', 'local-embeddings'],
      timeoutMs: 300_000,
    },
    {
      label: 'ktx setup local embeddings',
      command: 'pnpm',
      args: [
        'exec',
        'ktx',
        'setup',
        '--project-dir',
        input.projectDir,
        '--no-input',
        '--yes',
        '--skip-llm',
        '--embedding-backend',
        'sentence-transformers',
        '--skip-databases',
        '--skip-sources',
        '--skip-agents',
      ],
      timeoutMs: 900_000,
    },
    {
      label: 'ktx admin runtime stop local embeddings',
      command: 'pnpm',
      args: ['exec', 'ktx', 'admin', 'runtime', 'stop'],
      timeoutMs: 60_000,
    },
  ];
}

export function parseDaemonBaseUrl(stdout) {
  const match = stdout.match(/^url: (http:\/\/127\.0\.0\.1:\d+)$/m);
  if (!match) {
    throw new Error(`Daemon URL was not printed by runtime start:\n${stdout}`);
  }
  return match[1];
}

export function validateEmbeddingResponse(raw, expectedDimensions) {
  if (!raw || typeof raw !== 'object' || Array.isArray(raw)) {
    throw new Error('Embedding response must be a JSON object');
  }
  const embedding = raw.embedding;
  if (!Array.isArray(embedding)) {
    throw new Error('Embedding response must include an embedding array');
  }
  if (embedding.length !== expectedDimensions) {
    throw new Error(`Expected embedding dimension ${expectedDimensions}, got ${embedding.length}`);
  }
  for (const [index, value] of embedding.entries()) {
    if (typeof value !== 'number' || !Number.isFinite(value)) {
      throw new Error(`Embedding value at index ${index} is not a finite number`);
    }
  }
}

// Runs a command, echoes its output, and resolves with its exit code instead of
// throwing. Non-numeric error codes (spawn failures such as ENOENT, timeouts,
// signal kills) are reported as code 1.
async function run(command, args, options = {}) {
  process.stdout.write(`$ ${command} ${args.join(' ')}\n`);
  try {
    const result = await execFileAsync(command, args, {
      cwd: options.cwd,
      env: { ...process.env, ...options.env },
      encoding: 'utf8',
      maxBuffer: 1024 * 1024 * 20,
      timeout: options.timeoutMs ?? 120_000,
    });
    if (result.stdout) {
      process.stdout.write(result.stdout);
    }
    if (result.stderr) {
      process.stderr.write(result.stderr);
    }
    return { code: 0, stdout: result.stdout, stderr: result.stderr };
  } catch (error) {
    const stdout = typeof error.stdout === 'string' ? error.stdout : '';
    const stderr = typeof error.stderr === 'string' ? error.stderr : error.message;
    if (stdout) {
      process.stdout.write(stdout);
    }
    if (stderr) {
      process.stderr.write(stderr);
    }
    return {
      code: typeof error.code === 'number' ? error.code : 1,
      stdout,
      stderr,
    };
  }
}

function requireSuccess(label, result, options = {}) {
  if (result.code !== 0) {
    throw new Error(`${label} failed with code ${result.code}\nstdout:\n${result.stdout}\nstderr:\n${result.stderr}`);
  }
  if (options.stderrPattern && !options.stderrPattern.test(result.stderr)) {
    throw new Error(`${label} stderr did not match ${options.stderrPattern}\nstderr:\n${result.stderr}`);
  }
}

function parseJsonStdout(label, result) {
  requireSuccess(label, result);
  try {
    return JSON.parse(result.stdout);
  } catch (error) {
    throw new Error(`${label} did not write JSON stdout: ${error.message}\nstdout:\n${result.stdout}`);
  }
}

function parseJsonStdoutWithExitCode(label, result, expectedCode) {
  if (result.code !== expectedCode) {
    throw new Error(`${label} failed with code ${result.code}\nstdout:\n${result.stdout}\nstderr:\n${result.stderr}`);
  }
  try {
    return JSON.parse(result.stdout);
  } catch (error) {
    throw new Error(`${label} did not write JSON stdout: ${error.message}\nstdout:\n${result.stdout}`);
  }
}

function requireOutput(label, result, pattern) {
  if (!pattern.test(result.stdout)) {
    throw new Error(`${label} stdout did not match ${pattern}\nstdout:\n${result.stdout}`);
  }
}

async function postJson(baseUrl, path, payload, timeoutMs) {
  const response = await fetch(new URL(path, baseUrl), {
    method: 'POST',
    headers: {
      accept: 'application/json',
      'content-type': 'application/json',
    },
    body: JSON.stringify(payload),
    signal: AbortSignal.timeout(timeoutMs),
  });
  const text = await response.text();
  if (!response.ok) {
    throw new Error(`POST ${path} failed with ${response.status}: ${text}`);
  }
  try {
    return JSON.parse(text);
  } catch (error) {
    throw new Error(`POST ${path} returned non-JSON response: ${error.message}\n${text}`);
  }
}

async function writeSmokePackage(projectDir, tarballPath) {
  await mkdir(projectDir, { recursive: true });
  await writeFile(
    join(projectDir, 'package.json'),
    `${JSON.stringify(
      {
        name: 'ktx-local-embeddings-runtime-smoke',
        version: '0.0.0',
        private: true,
        type: 'module',
        dependencies: {
          '@kaelio/ktx': `file:${tarballPath}`,
        },
      },
      null,
      2,
    )}\n`,
  );
  await writeFile(join(projectDir, 'pnpm-workspace.yaml'), npmSmokePnpmWorkspaceYaml());
}

export async function runLocalEmbeddingsRuntimeSmoke(options = {}) {
  const rootDir = options.rootDir ?? DEFAULT_ROOT_DIR;
  const tarballPath = options.tarballPath ?? (await selectPublicKtxTarball(rootDir));
  const root = await mkdtemp(join(tmpdir(), 'ktx-local-embeddings-smoke-'));
  const keepTemp = options.keepTemp ?? process.env.KTX_KEEP_LOCAL_EMBEDDINGS_SMOKE === '1';
  const installDir = join(root, 'installed-package');
  const projectDir = join(root, 'project');
  const smokeEnv = buildLocalEmbeddingsSmokeEnv(root);
  const commands = localEmbeddingsSmokeCommands({ projectDir });
  let daemonStarted = false;

  try {
    await writeSmokePackage(installDir, tarballPath);
    requireSuccess(
      'pnpm install public package',
      await run('pnpm', ['install', '--ignore-scripts=false'], {
        cwd: installDir,
        env: smokeEnv,
        timeoutMs: 300_000,
      }),
    );

    const version = await run(commands[0].command, commands[0].args, {
      cwd: installDir,
      env: smokeEnv,
      timeoutMs: commands[0].timeoutMs,
    });
    requireSuccess(commands[0].label, version);
    requireOutput(commands[0].label, version, expectedPublicKtxVersionPattern());

    // Before the install step, `status --json` is expected to exit 1 and report kind "missing".
    const missingStatus = parseJsonStdoutWithExitCode(
      commands[1].label,
      await run(commands[1].command, commands[1].args, {
        cwd: installDir,
        env: smokeEnv,
        timeoutMs: commands[1].timeoutMs,
      }),
      1,
    );
    if (missingStatus.kind !== 'missing') {
      throw new Error(`Expected missing runtime before install, got ${JSON.stringify(missingStatus)}`);
    }

    const install = await run(commands[2].command, commands[2].args, {
      cwd: installDir,
      env: smokeEnv,
      timeoutMs: commands[2].timeoutMs,
    });
    requireSuccess(commands[2].label, install);
    requireOutput(commands[2].label, install, /Installed KTX Python runtime/);
    requireOutput(commands[2].label, install, /features: core, local-embeddings/);

    const readyStatus = parseJsonStdout(
      commands[3].label,
      await run(commands[3].command, commands[3].args, {
        cwd: installDir,
        env: smokeEnv,
        timeoutMs: commands[3].timeoutMs,
      }),
    );
    if (readyStatus.kind !== 'ready') {
      throw new Error(`Expected ready runtime after install, got ${JSON.stringify(readyStatus)}`);
    }
    if (!readyStatus.manifest?.features?.includes('local-embeddings')) {
      throw new Error(`Runtime manifest did not include local-embeddings: ${JSON.stringify(readyStatus.manifest)}`);
    }

    const start = await run(commands[4].command, commands[4].args, {
      cwd: installDir,
      env: smokeEnv,
      timeoutMs: commands[4].timeoutMs,
    });
    requireSuccess(commands[4].label, start);
    daemonStarted = true;
    const baseUrl = parseDaemonBaseUrl(start.stdout);

    // 384 is the output dimension of all-MiniLM-L6-v2, the model the setup step reports.
    const embeddingResponse = await postJson(
      baseUrl,
      '/embeddings/compute',
      { text: 'KTX local embeddings release smoke' },
      900_000,
    );
    validateEmbeddingResponse(embeddingResponse, 384);
    process.stdout.write('KTX daemon computed a 384-dimensional embedding\n');

    const setup = await run(commands[5].command, commands[5].args, {
      cwd: installDir,
      env: smokeEnv,
      timeoutMs: commands[5].timeoutMs,
    });
    requireSuccess(commands[5].label, setup);
    requireOutput(commands[5].label, setup, /Embeddings ready: yes \(all-MiniLM-L6-v2\)/);

    const config = await readFile(join(projectDir, 'ktx.yaml'), 'utf8');
    if (!/backend:\s*sentence-transformers/.test(config)) {
      throw new Error(`ktx.yaml did not declare sentence-transformers embedding backend:\n${config}`);
    }
    if (/base_url:/.test(config)) {
      throw new Error(`ktx.yaml should omit base_url for managed local embeddings:\n${config}`);
    }
    process.stdout.write('KTX setup persisted managed local embeddings (no base_url)\n');

    const stop = await run(commands[6].command, commands[6].args, {
      cwd: installDir,
      env: smokeEnv,
      timeoutMs: commands[6].timeoutMs,
    });
    requireSuccess(commands[6].label, stop);
    daemonStarted = false;
    requireOutput(commands[6].label, stop, /Stopped KTX daemon/);

    process.stdout.write('KTX local embeddings runtime smoke verified\n');
  } finally {
    // Best-effort cleanup: stop a daemon left running by a failed step, then
    // remove the temp root unless keepTemp is set.
    if (daemonStarted) {
      await run('pnpm', ['exec', 'ktx', 'admin', 'runtime', 'stop'], {
        cwd: installDir,
        env: smokeEnv,
        timeoutMs: 60_000,
      });
    }
    if (!keepTemp) {
      await rm(root, { recursive: true, force: true });
    } else {
      process.stdout.write(`Kept local embeddings smoke root: ${root}\n`);
    }
  }
}

async function main() {
  const args = process.argv.slice(2);
  const optIn = localEmbeddingsSmokeOptIn(process.env, args);
  if (!optIn.run) {
    process.stdout.write(`Skipping KTX local embeddings runtime smoke. ${optIn.message}\n`);
    if (args.includes('--require-opt-in')) {
      process.exitCode = 1;
    }
    return;
  }

  await runLocalEmbeddingsRuntimeSmoke();
}

// Run main() only when this file is executed directly, not when its exports are imported.
if (process.argv[1] && fileURLToPath(import.meta.url) === resolve(process.argv[1])) {
  main().catch((error) => {
    process.stderr.write(`${error instanceof Error ? error.stack ?? error.message : String(error)}\n`);
    process.exitCode = 1;
  });
}
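Two of the script's stdout checks are regular expressions whose details matter: `expectedPublicKtxVersionPattern()` escapes the package name and version before building its pattern, and `parseDaemonBaseUrl()` uses the `m` flag. The standalone sketch below reuses the script's escaping rule to show what each choice prevents. The version `0.4.1` and the daemon output are illustrative values, not captured from a real `ktx` run; only the `url: http://127.0.0.1:<port>` line format is taken from the script's own regex.

```javascript
// Same escaping rule as the script's escapeRegExp(): each regex metacharacter
// is prefixed with a backslash so the input is matched literally.
function escapeRegExp(value) {
  return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Escaped and unescaped variants of the "<name> <version>" check.
const escapedVersion = new RegExp(`${escapeRegExp('@kaelio/ktx')} ${escapeRegExp('0.4.1')}`);
const rawVersion = new RegExp('@kaelio/ktx 0.4.1');

console.log(escapedVersion.test('@kaelio/ktx 0.4.1')); // true
console.log(escapedVersion.test('@kaelio/ktx 0x4y1')); // false: dots are literal
console.log(rawVersion.test('@kaelio/ktx 0x4y1')); // true: "." matches any character

// Hypothetical multi-line stdout from `ktx admin runtime start`; the first line
// is made up for illustration, only the "url:" line format comes from the script.
const startStdout = 'starting daemon\nurl: http://127.0.0.1:8765\n';
const urlLineMultiline = /^url: (http:\/\/127\.0\.0\.1:\d+)$/m;
const urlLineSingle = /^url: (http:\/\/127\.0\.0\.1:\d+)$/;

console.log(startStdout.match(urlLineMultiline)?.[1]); // http://127.0.0.1:8765
console.log(urlLineSingle.test(startStdout)); // false: ^ only matches at index 0
```

Without escaping, a release-smoke run could accept a binary that reports a different version whose digits happen to line up; without the `m` flag, any log line printed before `url:` would make the daemon URL unparseable.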