diff --git a/README.md b/README.md
index 08570ba1..d3d55d03 100644
--- a/README.md
+++ b/README.md
@@ -6,8 +6,6 @@ The context layer for analytics agents
-

by Kaelio

-

npm version Codecov
@@ -18,19 +16,38 @@
---

-KTX turns warehouse metadata, semantic definitions, and business knowledge into
-reviewable project files that agents can use to plan, query, and update
-analytics work.
+KTX is a self-improving context layer that teaches agents how to query your
+warehouse accurately - from approved metric definitions, joinable columns, and
+business knowledge it builds and maintains for you.

-Use KTX when you want agents to:
+Works with PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, and
+SQLite. Integrates with dbt, MetricFlow, LookML, Looker, Metabase, and Notion.

-- Generate SQL from approved measures and joins
-- Repair semantic definitions through reviewable diffs
-- Explain metric provenance with warehouse evidence
-- Work alongside dbt, MetricFlow, LookML, Looker, Metabase, and Notion
+## Why KTX

-Supports PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, and
-SQLite.
+General-purpose agents struggle on data tasks. They re-explore your warehouse
+on every question, invent their own metric logic, and return numbers that
+don't match approved definitions.
+
+Traditional semantic layers don't fix this. They demand constant manual
+upkeep and don't absorb the rest of your company's knowledge.
+
+KTX does both, automatically:
+
+- **Learns from company knowledge.** Ingests wiki content, organizes it,
+  removes duplicates, and flags contradictions for human review.
+- **Maps the data stack.** Samples tables, captures metadata and usage
+  patterns, detects joinable columns, and annotates sources so agents write
+  better queries.
+- **Builds a semantic layer.** Combines raw tables and high-level metrics
+  through a join graph that automatically resolves chasm and fan traps, so
+  agents fetch metrics declaratively instead of rewriting canonical SQL each
+  time.
+- **Serves agents at execution.** Exposes CLI and MCP tools with combined
+  full-text and semantic search across wiki and semantic-layer entities.
+
+Agents can run raw SQL when they need it, or compose semantic-layer queries
+when they want approved metrics with reliable joins.

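Reviewer note: the "fan traps" bullet in the new README text may be easier to evaluate with a concrete case. The sketch below is only an illustration of the row fan-out behind a fan trap, not KTX's join-graph implementation. The `Order` and `LineItem` types, the sample rows, and both function names are hypothetical.

```typescript
// Hypothetical sample data: order 1 has two line items, order 2 has one.
interface Order {
  id: number;
  revenue: number;
}

interface LineItem {
  orderId: number;
  sku: string;
}

const orders: Order[] = [
  { id: 1, revenue: 100 },
  { id: 2, revenue: 50 },
];

const lineItems: LineItem[] = [
  { orderId: 1, sku: 'A' },
  { orderId: 1, sku: 'B' },
  { orderId: 2, sku: 'C' },
];

// Naive approach: join orders to line items, then sum revenue over the joined rows.
// Each order is repeated once per matching line item, so its revenue is over-counted.
function naiveRevenue(allOrders: Order[], items: LineItem[]): number {
  let total = 0;
  for (const order of allOrders) {
    for (const item of items) {
      if (item.orderId === order.id) {
        total += order.revenue;
      }
    }
  }
  return total;
}

// Safer approach: aggregate the measure at its own grain (one row per order)
// before joining to anything with more rows per order.
function revenueAtOrderGrain(allOrders: Order[]): number {
  return allOrders.reduce((sum, order) => sum + order.revenue, 0);
}

console.log(naiveRevenue(orders, lineItems)); // 250 (order 1 counted twice)
console.log(revenueAtOrderGrain(orders)); // 150
```

The point of the comparison is that a measure defined once at the right grain gives agents the same number however the query is joined, which is what the README means by fetching metrics declaratively.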
KTX ingestion flow from source systems through validation to wiki and semantic-layer outputs
@@ -109,17 +126,17 @@ Commit `ktx.yaml`, `semantic-layer/`, and `wiki/`. Keep `.ktx/` local.

## Agent Usage

-Setup can install KTX instructions for Claude Code, Codex, Cursor, OpenCode,
-and universal `.agents` clients:
+Install KTX integration for Claude Code, Claude Desktop, Codex, Cursor,
+OpenCode, and generic `.agents` clients:

```bash
ktx setup --agents
```

-Use `--target ` when you want to install or repair one specific
-integration.
+Pass `--target ` to install or repair one specific integration.

-Agent-facing workflows typically start with:
+A typical agent workflow combines wiki and semantic-layer search before
+querying:

```bash
ktx sl search "revenue" --json
@@ -127,40 +144,14 @@ ktx wiki search "refund policy" --json
ktx sl query --connection-id warehouse --measure orders.revenue --format sql
```

-During agent setup, choose **Ask data questions with KTX MCP** for client
-agents. Choose **Ask data questions + manage KTX with CLI commands** only when
-a developer or operator agent also needs pinned `ktx` admin commands.
+During setup, choose **Ask data questions with KTX MCP** for client agents.
+Choose **Ask data questions + manage KTX with CLI commands** when an operator
+agent also needs pinned `ktx` admin commands.

-After setup, KTX prints **Required before using agents**. Complete those steps
-before opening the configured agent. If it shows `ktx mcp start --project-dir ...`,
-run that command before using Claude Code, Codex, Cursor, OpenCode, or generic
-MCP clients. The same output also prints the matching `ktx mcp stop` command
-for when you want to stop MCP later. Claude Desktop uses its own launcher for
-MCP and prints separate skill upload steps.
-
-The analytics skill teaches client agents the MCP workflow: discover data,
-prefer semantic-layer measures, inspect entity details before raw SQL, and
-capture durable learnings. Admin CLI skills call `ktx` commands directly
-through a skill file installed in your agent's config:
-
-```bash
-ktx sl query --measure orders.revenue --dimension orders.status --format sql
-ktx wiki search "revenue definition"
-ktx sl validate orders
-```
-
-Supported client agents: Claude Code, Claude Desktop, Codex, Cursor, OpenCode,
-and clients that can use the printed MCP endpoint or `.agents` admin skills.
-Claude Desktop setup registers a local `ktx mcp stdio` server in Claude
-Desktop's config and generates one uploadable ZIP per Claude Desktop skill
-under `.ktx/agents/claude/`. Restart Claude Desktop after setup, then upload
-each ZIP from **Customize** > **Skills** > **+** > **Create skill** >
-**Upload a skill**.
-
-The release artifact manifest contains the public npm tarball and the bundled
-`kaelio-ktx` runtime wheel. The `python/ktx-sl` and `python/ktx-daemon`
-directories remain source packages for development, not public release
-artifacts.
+After setup, KTX prints **Required before using agents** with the exact
+commands to run. If the output includes `ktx mcp start --project-dir ...`, run
+it before opening your agent. Claude Desktop uses its own launcher and prints
+separate skill upload steps under `.ktx/agents/claude/`.

## Workspace packages diff --git a/packages/cli/src/managed-local-embeddings.test.ts b/packages/cli/src/managed-local-embeddings.test.ts index cbb9b5f1..85fa00c9 100644 --- a/packages/cli/src/managed-local-embeddings.test.ts +++ b/packages/cli/src/managed-local-embeddings.test.ts @@ -150,6 +150,8 @@ describe('ensureManagedLocalEmbeddingsDaemon', () => { }), ).resolves.toEqual({ baseUrl: 'http://127.0.0.1:61234', + stdoutLog: '/work/proj/.ktx/runtime/daemon.stdout.log', + stderrLog: '/work/proj/.ktx/runtime/daemon.stderr.log', env: { [MANAGED_SENTENCE_TRANSFORMERS_BASE_URL_ENV]: 'http://127.0.0.1:61234', }, diff --git a/packages/cli/src/managed-local-embeddings.ts b/packages/cli/src/managed-local-embeddings.ts index 8c383ef5..f485a942 100644 --- a/packages/cli/src/managed-local-embeddings.ts +++ b/packages/cli/src/managed-local-embeddings.ts @@ -14,6 +14,8 @@ import { startManagedPythonDaemon, type ManagedPythonDaemonStartResult } from '. export interface ManagedLocalEmbeddingsDaemon { baseUrl: string; + stdoutLog: string; + stderrLog: string; env: Record; } @@ -91,6 +93,8 @@ export async function ensureManagedLocalEmbeddingsDaemon( return { baseUrl: daemon.baseUrl, + stdoutLog: daemon.state.stdoutLog, + stderrLog: daemon.state.stderrLog, env: { [MANAGED_SENTENCE_TRANSFORMERS_BASE_URL_ENV]: daemon.baseUrl, }, diff --git a/packages/cli/src/managed-mcp-daemon.test.ts b/packages/cli/src/managed-mcp-daemon.test.ts index e28c4a3e..d72bb6a4 100644 --- a/packages/cli/src/managed-mcp-daemon.test.ts +++ b/packages/cli/src/managed-mcp-daemon.test.ts @@ -11,6 +11,8 @@ import { type KtxMcpDaemonState, } from './managed-mcp-daemon.js'; +type KtxMcpDaemonStartOptions = Parameters[0]; + function child(pid = 4242): KtxMcpDaemonChild { return { pid, unref: vi.fn() }; } @@ -40,6 +42,7 @@ describe('managed MCP daemon lifecycle', () => { }); afterEach(async () => { + vi.unstubAllEnvs(); await rm(tempDir, { recursive: true, force: true }); }); @@ -94,6 +97,33 @@ describe('managed MCP 
daemon lifecycle', () => { ); }); + it('sanitizes IPv6 CIDR entries from child NO_PROXY env', async () => { + vi.stubEnv('NO_PROXY', 'localhost,fd07:b51a:cc66:f0::/64'); + vi.stubEnv('no_proxy', '::1,fd00::/8,*.orb.local'); + const spawnDaemon = vi.fn>(() => child(5555)); + + await startKtxMcpDaemon({ + projectDir, + cliVersion: '0.0.0-test', + host: '127.0.0.1', + port: 7879, + allowedHosts: [], + allowedOrigins: [], + binPath: '/repo/packages/cli/dist/bin.js', + spawnDaemon, + processAlive: vi.fn(() => false), + portAvailable: vi.fn(async () => true), + now: () => new Date('2026-05-14T00:00:00.000Z'), + }); + + const env = spawnDaemon.mock.calls[0]?.[2].env; + if (!env) { + throw new Error('Expected MCP daemon spawn env'); + } + expect(env.NO_PROXY).toBe('localhost,::1,*.orb.local'); + expect(env.no_proxy).toBe(env.NO_PROXY); + }); + it('returns already-running without spawning when the daemon is alive at the same host/port', async () => { await mkdir(join(projectDir, '.ktx'), { recursive: true }); await writeFile(join(projectDir, '.ktx/mcp.json'), `${JSON.stringify(state(projectDir), null, 2)}\n`); diff --git a/packages/cli/src/managed-mcp-daemon.ts b/packages/cli/src/managed-mcp-daemon.ts index ef3df2a9..dd3fb821 100644 --- a/packages/cli/src/managed-mcp-daemon.ts +++ b/packages/cli/src/managed-mcp-daemon.ts @@ -4,6 +4,7 @@ import { createServer } from 'node:net'; import { dirname, join } from 'node:path'; import { setTimeout as delay } from 'node:timers/promises'; import { z } from 'zod'; +import { sanitizeChildProxyEnv } from './proxy-env.js'; export interface KtxMcpDaemonState { schemaVersion: 1; @@ -166,11 +167,11 @@ export async function startKtxMcpDaemon(options: { const child = (options.spawnDaemon ?? defaultSpawnDaemon)(process.execPath, args, { detached: true, stdio: ['ignore', log.fd, log.fd], - env: { + env: sanitizeChildProxyEnv({ ...process.env, KTX_CLI_VERSION: options.cliVersion, ...(options.token ? 
{ KTX_MCP_TOKEN: options.token } : {}), - }, + }), }); if (!child.pid) { throw new Error('Failed to start KTX MCP daemon: child process pid was not available.'); diff --git a/packages/cli/src/managed-python-command.test.ts b/packages/cli/src/managed-python-command.test.ts index a63f162e..767d8dd1 100644 --- a/packages/cli/src/managed-python-command.test.ts +++ b/packages/cli/src/managed-python-command.test.ts @@ -99,6 +99,7 @@ function installResult(features: KtxRuntimeFeature[] = ['core']): ManagedPythonR asset: { manifest: installedManifest.asset, wheelPath: '/assets/python/kaelio_ktx-0.2.0-py3-none-any.whl', + requiresPython: { specifier: '>=3.13', minimumVersion: '3.13' }, }, manifest: installedManifest, }; diff --git a/packages/cli/src/managed-python-daemon.test.ts b/packages/cli/src/managed-python-daemon.test.ts index 09e45fd3..8797fb8f 100644 --- a/packages/cli/src/managed-python-daemon.test.ts +++ b/packages/cli/src/managed-python-daemon.test.ts @@ -79,6 +79,7 @@ function installResult(root: string, features: Array<'core' | 'local-embeddings' asset: { manifest: manifest(root, features).asset, wheelPath: join(root, 'assets', 'python', 'kaelio_ktx-0.2.0-py3-none-any.whl'), + requiresPython: { specifier: '>=3.13', minimumVersion: '3.13' }, }, manifest: manifest(root, features), }; @@ -132,6 +133,7 @@ describe('managed Python daemon lifecycle', () => { }); afterEach(async () => { + vi.unstubAllEnvs(); await rm(tempDir, { recursive: true, force: true }); }); @@ -187,6 +189,27 @@ describe('managed Python daemon lifecycle', () => { }); }); + it('sanitizes IPv6 CIDR entries from child NO_PROXY env', async () => { + vi.stubEnv('NO_PROXY', 'localhost,fd07:b51a:cc66:f0::/64,127.0.0.0/8'); + vi.stubEnv('no_proxy', '::1,fd00::/8,*.orb.local'); + const spawnDaemon = makeSpawn(5555); + + await startManagedPythonDaemon({ + ...daemonOptionsBase(tempDir), + features: ['local-embeddings'], + installRuntime: vi.fn(async () => installResult(tempDir, ['core', 
'local-embeddings'])), + spawnDaemon, + fetch: makeFetch(), + allocatePort: vi.fn(async () => 61234), + now: () => new Date('2026-05-11T00:00:00.000Z'), + pollIntervalMs: 1, + }); + + const env = vi.mocked(spawnDaemon).mock.calls[0]?.[2].env; + expect(env?.NO_PROXY).toBe('localhost,127.0.0.0/8,::1,*.orb.local'); + expect(env?.no_proxy).toBe(env?.NO_PROXY); + }); + it('makes a final health probe before reporting startup failure', async () => { const spawnDaemon = makeSpawn(5556); const installRuntime = vi.fn(async () => installResult(tempDir)); diff --git a/packages/cli/src/managed-python-daemon.ts b/packages/cli/src/managed-python-daemon.ts index 76740554..bcf7b446 100644 --- a/packages/cli/src/managed-python-daemon.ts +++ b/packages/cli/src/managed-python-daemon.ts @@ -14,6 +14,7 @@ import { type ManagedPythonRuntimeInstallOptions, type ManagedPythonRuntimeInstallResult, } from './managed-python-runtime.js'; +import { sanitizeChildProxyEnv } from './proxy-env.js'; export interface ManagedPythonDaemonState { schemaVersion: 1; @@ -696,10 +697,10 @@ export async function startManagedPythonDaemon( { detached: true, stdio: ['ignore', stdout.fd, stderr.fd], - env: { + env: sanitizeChildProxyEnv({ ...process.env, KTX_DAEMON_VERSION: options.cliVersion, - }, + }), }, ); child.unref(); diff --git a/packages/cli/src/managed-python-runtime.test.ts b/packages/cli/src/managed-python-runtime.test.ts index 540df619..13b97a45 100644 --- a/packages/cli/src/managed-python-runtime.test.ts +++ b/packages/cli/src/managed-python-runtime.test.ts @@ -2,6 +2,7 @@ import { createHash } from 'node:crypto'; import { mkdir, mkdtemp, readFile, rm, writeFile } from 'node:fs/promises'; import { tmpdir } from 'node:os'; import { join } from 'node:path'; +import { strToU8, zipSync } from 'fflate'; import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest'; import { MISSING_UV_RUNTIME_INSTALL_MESSAGE, @@ -14,10 +15,33 @@ import { type ManagedPythonRuntimeExec, } from 
'./managed-python-runtime.js'; -async function writeAsset(root: string, contents = 'wheel-bytes') { +function runtimeWheelContents(input: { label?: string; requiresPython?: string | null } = {}): Buffer { + const label = input.label ?? 'runtime-wheel'; + const requiresPython = input.requiresPython === null ? [] : [`Requires-Python: ${input.requiresPython ?? '>=3.13'}`]; + return Buffer.from( + zipSync({ + 'kaelio_ktx-0.1.0.dist-info/METADATA': strToU8( + [ + 'Metadata-Version: 2.4', + 'Name: kaelio-ktx', + 'Version: 0.1.0', + ...requiresPython, + `Summary: ${label}`, + '', + ].join('\n'), + ), + }), + ); +} + +async function writeAsset( + root: string, + options: { label?: string; requiresPython?: string | null; contents?: Buffer } = {}, +) { const assetDir = join(root, 'assets', 'python'); await mkdir(assetDir, { recursive: true }); const wheelPath = join(assetDir, 'kaelio_ktx-0.1.0-py3-none-any.whl'); + const contents = options.contents ?? runtimeWheelContents(options); await writeFile(wheelPath, contents); await writeFile( join(assetDir, 'manifest.json'), @@ -30,7 +54,7 @@ async function writeAsset(root: string, contents = 'wheel-bytes') { wheel: { file: 'kaelio_ktx-0.1.0-py3-none-any.whl', sha256: createHash('sha256').update(contents).digest('hex'), - bytes: Buffer.byteLength(contents), + bytes: contents.byteLength, }, }, null, @@ -145,17 +169,18 @@ describe('verifyRuntimeAsset', () => { }); it('reads the manifest and verifies the wheel checksum', async () => { - const { assetDir, wheelPath } = await writeAsset(tempDir, 'valid-wheel'); + const { assetDir, wheelPath } = await writeAsset(tempDir, { label: 'valid-wheel' }); const asset = await verifyRuntimeAsset({ assetDir }); expect(asset.manifest.distributionName).toBe('kaelio-ktx'); expect(asset.manifest.normalizedName).toBe('kaelio_ktx'); expect(asset.wheelPath).toBe(wheelPath); + expect(asset.requiresPython).toEqual({ specifier: '>=3.13', minimumVersion: '3.13' }); }); it('rejects a wheel whose checksum does 
not match the manifest', async () => { - const { assetDir, wheelPath } = await writeAsset(tempDir, 'original'); + const { assetDir, wheelPath } = await writeAsset(tempDir, { label: 'original' }); await writeFile(wheelPath, 'tampered'); await expect(verifyRuntimeAsset({ assetDir })).rejects.toThrow( @@ -164,7 +189,7 @@ describe('verifyRuntimeAsset', () => { }); it('rejects an unsafe wheel filename in the manifest', async () => { - const { assetDir } = await writeAsset(tempDir, 'valid-wheel'); + const { assetDir } = await writeAsset(tempDir, { label: 'valid-wheel' }); await writeFile( join(assetDir, 'manifest.json'), `${JSON.stringify({ @@ -190,6 +215,22 @@ describe('verifyRuntimeAsset', () => { /Missing bundled Python runtime manifest.*pnpm run artifacts:build/s, ); }); + + it('rejects a bundled wheel without Requires-Python metadata', async () => { + const { assetDir } = await writeAsset(tempDir, { requiresPython: null }); + + await expect(verifyRuntimeAsset({ assetDir })).rejects.toThrow( + /Bundled Python runtime wheel metadata is missing Requires-Python/, + ); + }); + + it('rejects a bundled wheel without a supported minimum Python version', async () => { + const { assetDir } = await writeAsset(tempDir, { requiresPython: '<4' }); + + await expect(verifyRuntimeAsset({ assetDir })).rejects.toThrow( + /Unsupported bundled Python runtime Requires-Python: <4/, + ); + }); }); describe('installManagedPythonRuntime', () => { @@ -204,7 +245,7 @@ describe('installManagedPythonRuntime', () => { }); it('creates a venv, installs the core wheel, and writes a manifest', async () => { - const { assetDir } = await writeAsset(tempDir, 'core-wheel'); + const { assetDir } = await writeAsset(tempDir, { label: 'core-wheel' }); const commands: Array<{ command: string; args: string[] }> = []; const exec: ManagedPythonRuntimeExec = vi.fn(async (command, args) => { commands.push({ command, args }); @@ -222,7 +263,8 @@ describe('installManagedPythonRuntime', () => { 
expect(result.status).toBe('installed'); expect(commands).toEqual([ { command: 'uv', args: ['--version'] }, - { command: 'uv', args: ['venv', result.layout.venvDir] }, + { command: 'uv', args: ['python', 'install', '3.13'] }, + { command: 'uv', args: ['venv', '--python', '3.13', result.layout.venvDir] }, { command: 'uv', args: ['pip', 'install', '--python', result.layout.pythonPath, result.asset.wheelPath], @@ -240,7 +282,7 @@ describe('installManagedPythonRuntime', () => { }); it('disables repo uv config for managed runtime uv commands', async () => { - const { assetDir } = await writeAsset(tempDir, 'core-wheel'); + const { assetDir } = await writeAsset(tempDir, { label: 'core-wheel' }); const commands: Array<{ command: string; args: string[]; env?: NodeJS.ProcessEnv }> = []; const exec: ManagedPythonRuntimeExec = vi.fn(async (command, args, options) => { commands.push({ command, args, env: options?.env }); @@ -258,13 +300,14 @@ describe('installManagedPythonRuntime', () => { expect(commands.map((call) => [call.command, call.args[0], call.env?.UV_NO_CONFIG, call.env?.PATH])).toEqual([ ['uv', '--version', '1', '/opt/homebrew/bin'], + ['uv', 'python', '1', '/opt/homebrew/bin'], ['uv', 'venv', '1', '/opt/homebrew/bin'], ['uv', 'pip', '1', '/opt/homebrew/bin'], ]); }); it('installs the local-embeddings extra when requested', async () => { - const { assetDir } = await writeAsset(tempDir, 'embedding-wheel'); + const { assetDir } = await writeAsset(tempDir, { label: 'embedding-wheel' }); const commands: Array<{ command: string; args: string[] }> = []; const exec: ManagedPythonRuntimeExec = vi.fn(async (command, args) => { commands.push({ command, args }); @@ -288,7 +331,7 @@ describe('installManagedPythonRuntime', () => { }); it('fails with the hard-prerequisite message when uv is missing', async () => { - const { assetDir } = await writeAsset(tempDir, 'core-wheel'); + const { assetDir } = await writeAsset(tempDir, { label: 'core-wheel' }); const commands: Array<{ 
command: string; args: string[] }> = []; const exec: ManagedPythonRuntimeExec = vi.fn(async (command, args) => { commands.push({ command, args }); @@ -309,7 +352,7 @@ describe('installManagedPythonRuntime', () => { }); it('reuses an existing compatible runtime when force is false', async () => { - const { assetDir } = await writeAsset(tempDir, 'core-wheel'); + const { assetDir } = await writeAsset(tempDir, { label: 'core-wheel' }); const exec: ManagedPythonRuntimeExec = vi.fn(async (command, args) => ({ stdout: command === 'uv' && args[0] === '--version' ? 'uv 0.9.5\n' : '', stderr: '', @@ -335,14 +378,17 @@ describe('installManagedPythonRuntime', () => { }); expect(second.status).toBe('ready'); - expect(exec).toHaveBeenCalledTimes(3); + expect(exec).toHaveBeenCalledTimes(4); }); it('keeps failed install logs in the versioned runtime directory', async () => { - const { assetDir } = await writeAsset(tempDir, 'core-wheel'); + const { assetDir } = await writeAsset(tempDir, { label: 'core-wheel' }); const exec: ManagedPythonRuntimeExec = vi.fn(async (command, args) => { if (command === 'uv' && args[0] === 'venv') { - throw Object.assign(new Error('uv venv failed'), { stdout: 'creating\n', stderr: 'bad python\n' }); + throw Object.assign(new Error('uv venv failed'), { + stdout: 'creating\n', + stderr: '× No solution found\n╰─▶ current Python version (3.12.3) does not satisfy Python>=3.13\n', + }); } return { stdout: command === 'uv' && args[0] === '--version' ? 
'uv 0.9.5\n' : '', stderr: '' }; }); @@ -355,11 +401,11 @@ describe('installManagedPythonRuntime', () => { features: ['core'], exec, }), - ).rejects.toThrow(/Python runtime install failed/); + ).rejects.toThrow(/current Python version \(3\.12\.3\) does not satisfy Python>=3\.13/); const log = await readFile(join(tempDir, 'runtime', '0.2.0', 'install.log'), 'utf8'); - expect(log).toContain('$ uv venv'); - expect(log).toContain('bad python'); + expect(log).toContain('$ uv venv --python 3.13'); + expect(log).toContain('current Python version (3.12.3) does not satisfy Python>=3.13'); }); }); @@ -386,7 +432,7 @@ describe('readManagedPythonRuntimeStatus', () => { }); it('reports ready when manifest and executables exist', async () => { - const { assetDir } = await writeAsset(tempDir, 'core-wheel'); + const { assetDir } = await writeAsset(tempDir, { label: 'core-wheel' }); const exec: ManagedPythonRuntimeExec = vi.fn(async (command, args) => ({ stdout: command === 'uv' && args[0] === '--version' ? 'uv 0.9.5\n' : '', stderr: '', @@ -413,7 +459,7 @@ describe('readManagedPythonRuntimeStatus', () => { }); it('reports broken when an executable is missing', async () => { - const { assetDir } = await writeAsset(tempDir, 'core-wheel'); + const { assetDir } = await writeAsset(tempDir, { label: 'core-wheel' }); const exec: ManagedPythonRuntimeExec = vi.fn(async (command, args) => ({ stdout: command === 'uv' && args[0] === '--version' ? 'uv 0.9.5\n' : '', stderr: '', @@ -449,7 +495,7 @@ describe('doctorManagedPythonRuntime', () => { }); it('checks uv, bundled assets, and installed runtime status', async () => { - const { assetDir } = await writeAsset(tempDir, 'core-wheel'); + const { assetDir } = await writeAsset(tempDir, { label: 'core-wheel' }); const exec: ManagedPythonRuntimeExec = vi.fn(async (command, args) => ({ stdout: command === 'uv' && args[0] === '--version' ? 
'uv 0.9.5\n' : '', stderr: '', @@ -471,7 +517,7 @@ describe('doctorManagedPythonRuntime', () => { }); it('reports uv as a hard prerequisite when uv is missing', async () => { - const { assetDir } = await writeAsset(tempDir, 'core-wheel'); + const { assetDir } = await writeAsset(tempDir, { label: 'core-wheel' }); const exec: ManagedPythonRuntimeExec = vi.fn(async () => { throw new Error('spawn uv ENOENT'); }); diff --git a/packages/cli/src/managed-python-runtime.ts b/packages/cli/src/managed-python-runtime.ts index 4e3af013..88b0fa2b 100644 --- a/packages/cli/src/managed-python-runtime.ts +++ b/packages/cli/src/managed-python-runtime.ts @@ -5,6 +5,7 @@ import { homedir } from 'node:os'; import { basename, join } from 'node:path'; import { fileURLToPath } from 'node:url'; import { promisify } from 'node:util'; +import { strFromU8, unzipSync } from 'fflate'; import { z } from 'zod'; const execFileAsync = promisify(execFile); @@ -78,6 +79,10 @@ export interface ManagedPythonDaemonLayout extends ManagedPythonRuntimeLayout { export interface ManagedRuntimeAsset { manifest: KtxRuntimeAssetManifest; wheelPath: string; + requiresPython: { + specifier: string; + minimumVersion: string; + }; } export type ManagedPythonRuntimeExec = ( @@ -196,6 +201,40 @@ function isErrnoException(error: unknown, code: string): boolean { return typeof error === 'object' && error !== null && 'code' in error && error.code === code; } +function parseRequiresPythonFromWheel(input: { wheelPath: string; contents: Buffer }): ManagedRuntimeAsset['requiresPython'] { + let files: Record; + try { + files = unzipSync(new Uint8Array(input.contents)); + } catch (error) { + throw new Error( + `Unable to read bundled Python runtime wheel metadata: ${error instanceof Error ? 
error.message : String(error)}`, + ); + } + const metadataEntry = Object.entries(files).find(([path]) => path.endsWith('.dist-info/METADATA')); + if (!metadataEntry) { + throw new Error(`Bundled Python runtime wheel metadata is missing: ${input.wheelPath}`); + } + + const metadata = strFromU8(metadataEntry[1]); + const requiresPython = metadata + .split(/\r?\n/) + .map((line) => line.match(/^Requires-Python:\s*(.+)\s*$/i)?.[1]?.trim()) + .find((value): value is string => typeof value === 'string' && value.length > 0); + if (!requiresPython) { + throw new Error('Bundled Python runtime wheel metadata is missing Requires-Python'); + } + + const minimumMatch = requiresPython.match(/(?:^|[,\s])>=\s*([0-9]+)\.([0-9]+)(?:\.[0-9]+)?\b/); + if (!minimumMatch) { + throw new Error(`Unsupported bundled Python runtime Requires-Python: ${requiresPython}`); + } + + return { + specifier: requiresPython, + minimumVersion: `${minimumMatch[1]}.${minimumMatch[2]}`, + }; +} + export async function verifyRuntimeAsset(input: { assetDir: string }): Promise { const manifestPath = join(input.assetDir, 'manifest.json'); let manifestData: unknown; @@ -221,7 +260,7 @@ export async function verifyRuntimeAsset(input: { assetDir: string }): Promise part.length > 0).join('\n'); + if (!output) { + return `Python runtime install failed. Install log: ${input.logPath}`; + } + return `Python runtime install failed.\n${output}\nInstall log: ${input.logPath}`; +} + async function runLogged(input: { exec: ManagedPythonRuntimeExec; logPath: string; @@ -288,7 +335,7 @@ async function runLogged(input: { if (output.stderr) { await appendFile(input.logPath, output.stderr.endsWith('\n') ? output.stderr : `${output.stderr}\n`); } - throw new Error(`Python runtime install failed. 
Install log: ${input.logPath}`); + throw new Error(installFailureMessage({ logPath: input.logPath, stdout: output.stdout, stderr: output.stderr })); } } @@ -334,7 +381,14 @@ export async function installManagedPythonRuntime( exec, logPath: layout.installLogPath, command: 'uv', - args: ['venv', layout.venvDir], + args: ['python', 'install', asset.requiresPython.minimumVersion], + env: uvEnv, + }); + await runLogged({ + exec, + logPath: layout.installLogPath, + command: 'uv', + args: ['venv', '--python', asset.requiresPython.minimumVersion, layout.venvDir], env: uvEnv, }); const wheelSpec = features.includes('local-embeddings') ? `${asset.wheelPath}[local-embeddings]` : asset.wheelPath; diff --git a/packages/cli/src/proxy-env.test.ts b/packages/cli/src/proxy-env.test.ts new file mode 100644 index 00000000..1da7bc91 --- /dev/null +++ b/packages/cli/src/proxy-env.test.ts @@ -0,0 +1,21 @@ +import { describe, expect, it } from 'vitest'; +import { sanitizeChildProxyEnv } from './proxy-env.js'; + +describe('sanitizeChildProxyEnv', () => { + it('drops IPv6 CIDR no-proxy entries and normalizes both env keys', () => { + const env = sanitizeChildProxyEnv({ + NO_PROXY: 'localhost,127.0.0.1,127.0.0.0/8,fd07:b51a:cc66:f0::/64,*.orb.local', + no_proxy: '::1,0.250.250.0/24,fd00::/8,*.orb.internal', + }); + + expect(env.NO_PROXY).toBe('localhost,127.0.0.1,127.0.0.0/8,*.orb.local,::1,0.250.250.0/24,*.orb.internal'); + expect(env.no_proxy).toBe(env.NO_PROXY); + }); + + it('preserves the input object and leaves missing proxy env unset', () => { + const input = { PATH: '/usr/bin' }; + + expect(sanitizeChildProxyEnv(input)).toEqual({ PATH: '/usr/bin' }); + expect(input).toEqual({ PATH: '/usr/bin' }); + }); +}); diff --git a/packages/cli/src/proxy-env.ts b/packages/cli/src/proxy-env.ts new file mode 100644 index 00000000..dd47ad8e --- /dev/null +++ b/packages/cli/src/proxy-env.ts @@ -0,0 +1,27 @@ +const NO_PROXY_KEYS = ['NO_PROXY', 'no_proxy'] as const; + +function 
isIpv6CidrNoProxyEntry(entry: string): boolean {
+  return entry.includes('/') && entry.includes(':');
+}
+
+function cleanedNoProxyValue(env: NodeJS.ProcessEnv): string | undefined {
+  const entries = NO_PROXY_KEYS.flatMap((key) => (env[key] ?? '').split(','))
+    .map((entry) => entry.trim())
+    .filter((entry) => entry.length > 0 && !isIpv6CidrNoProxyEntry(entry));
+
+  if (!NO_PROXY_KEYS.some((key) => env[key] !== undefined)) {
+    return undefined;
+  }
+  return [...new Set(entries)].join(',');
+}
+
+export function sanitizeChildProxyEnv(env: NodeJS.ProcessEnv): NodeJS.ProcessEnv {
+  const sanitized = { ...env };
+  const noProxy = cleanedNoProxyValue(env);
+  if (noProxy === undefined) {
+    return sanitized;
+  }
+  sanitized.NO_PROXY = noProxy;
+  sanitized.no_proxy = noProxy;
+  return sanitized;
+}
diff --git a/packages/cli/src/runtime.test.ts b/packages/cli/src/runtime.test.ts
index 5af457de..01a529e7 100644
--- a/packages/cli/src/runtime.test.ts
+++ b/packages/cli/src/runtime.test.ts
@@ -52,6 +52,7 @@ describe('runKtxRuntime', () => {
         },
         asset: {
           wheelPath: '/assets/python/kaelio_ktx-0.1.0-py3-none-any.whl',
+          requiresPython: { specifier: '>=3.13', minimumVersion: '3.13' },
           manifest: {
             schemaVersion: 1,
             distributionName: 'kaelio-ktx',
diff --git a/packages/cli/src/setup-embeddings.test.ts b/packages/cli/src/setup-embeddings.test.ts
index 8d9ca0bc..7e22be26 100644
--- a/packages/cli/src/setup-embeddings.test.ts
+++ b/packages/cli/src/setup-embeddings.test.ts
@@ -46,9 +46,14 @@ function makePromptAdapter(options: {
   };
 }
 
-function managedDaemon(baseUrl = 'http://127.0.0.1:61234') {
+function managedDaemon(
+  baseUrl = 'http://127.0.0.1:61234',
+  logs: { stdoutLog?: string; stderrLog?: string } = {},
+) {
   return {
     baseUrl,
+    stdoutLog: logs.stdoutLog ?? '/tmp/ktx-daemon.stdout.log',
+    stderrLog: logs.stderrLog ?? '/tmp/ktx-daemon.stderr.log',
     env: {
       KTX_MANAGED_SENTENCE_TRANSFORMERS_BASE_URL: baseUrl,
     },
@@ -330,6 +335,65 @@ describe('setup embeddings step', () => {
     expect(io.stderr()).not.toContain('skip for now');
   });
 
+  it('prints the recent daemon stderr tail when local embedding health check fails', async () => {
+    const io = makeIo();
+    const stderrLog = join(tempDir, '.ktx', 'runtime', 'daemon.stderr.log');
+    await mkdir(join(tempDir, '.ktx', 'runtime'), { recursive: true });
+    await writeFile(
+      stderrLog,
+      Array.from({ length: 45 }, (_value, index) => `daemon traceback line ${index + 1}`).join('\n'),
+    );
+
+    const result = await runKtxSetupEmbeddingsStep(
+      {
+        projectDir: tempDir,
+        inputMode: 'disabled',
+        cliVersion: '0.2.0',
+        runtimeInstallPolicy: 'auto',
+        skipEmbeddings: false,
+      },
+      io.io,
+      {
+        env: {},
+        ensureLocalEmbeddings: vi.fn(async () => managedDaemon('http://127.0.0.1:61234', { stderrLog })),
+        healthCheck: vi.fn(async () => ({ ok: false as const, message: 'HTTP 500' })),
+      },
+    );
+
+    expect(result.status).toBe('failed');
+    expect(io.stderr()).toContain('Recent local embeddings daemon stderr:');
+    expect(io.stderr()).toContain('daemon traceback line 6');
+    expect(io.stderr()).toContain('daemon traceback line 45');
+    expect(io.stderr()).not.toContain('daemon traceback line 5');
+  });
+
+  it('does not print daemon stderr diagnostics when the log is unavailable or empty', async () => {
+    const io = makeIo();
+
+    const result = await runKtxSetupEmbeddingsStep(
+      {
+        projectDir: tempDir,
+        inputMode: 'disabled',
+        cliVersion: '0.2.0',
+        runtimeInstallPolicy: 'auto',
+        skipEmbeddings: false,
+      },
+      io.io,
+      {
+        env: {},
+        ensureLocalEmbeddings: vi.fn(async () =>
+          managedDaemon('http://127.0.0.1:61234', {
+            stderrLog: join(tempDir, '.ktx', 'runtime', 'missing.stderr.log'),
+          }),
+        ),
+        healthCheck: vi.fn(async () => ({ ok: false as const, message: 'HTTP 500' })),
+      },
+    );
+
+    expect(result.status).toBe('failed');
+    expect(io.stderr()).not.toContain('Recent local embeddings daemon stderr:');
+  });
+
   it('uses fixed OpenAI defaults and only asks for credentials when OpenAI is selected', async () => {
     const io = makeIo();
     const healthCheck = vi.fn(async () => ({ ok: true as const }));
diff --git a/packages/cli/src/setup-embeddings.ts b/packages/cli/src/setup-embeddings.ts
index 1f3c73ae..475e5126 100644
--- a/packages/cli/src/setup-embeddings.ts
+++ b/packages/cli/src/setup-embeddings.ts
@@ -1,4 +1,4 @@
-import { writeFile } from 'node:fs/promises';
+import { readFile, writeFile } from 'node:fs/promises';
 import { resolveKtxConfigReference } from '@ktx/context/core';
 import {
   type KtxProjectConfig,
@@ -59,6 +59,7 @@ export interface KtxSetupEmbeddingsDeps {
   healthCheck?: (config: KtxEmbeddingConfig) => Promise;
   ensureLocalEmbeddings?: (options: {
     cliVersion: string;
+    projectDir: string;
     installPolicy: KtxManagedPythonInstallPolicy;
     io: KtxCliIo;
   }) => Promise;
@@ -85,6 +86,7 @@ const EMBEDDING_OPTION_PROMPT_CONTEXT =
   'KTX uses embeddings for semantic search over semantic-layer sources, wiki context, schema metadata, ' +
   'and relationship evidence.';
 const LOCAL_EMBEDDING_HEALTH_TIMEOUT_MS = 120_000;
+const LOCAL_EMBEDDING_STDERR_TAIL_LINES = 40;
 
 function createPromptAdapter(): KtxSetupEmbeddingsPromptAdapter {
   return createKtxSetupPromptAdapter({ selectCancelValue: 'back' });
@@ -286,14 +288,33 @@ async function chooseEmbeddingBackend(
   return 'back';
 }
 
-function localEmbeddingSetupMessage(message: string): string {
-  return [
+async function readLocalEmbeddingDaemonStderrTail(stderrLog: string | undefined): Promise {
+  if (!stderrLog) {
+    return [];
+  }
+  try {
+    const lines = (await readFile(stderrLog, 'utf8'))
+      .split(/\r?\n/)
+      .map((line) => line.trimEnd())
+      .filter((line) => line.trim().length > 0);
+    return lines.slice(-LOCAL_EMBEDDING_STDERR_TAIL_LINES);
+  } catch {
+    return [];
+  }
+}
+
+function localEmbeddingSetupMessage(message: string, stderrTail: string[] = []): string {
+  const lines = [
     `Local embedding health check failed: ${message}`,
     'Local embeddings use the KTX-managed Python runtime.',
     'Prepare the runtime with: ktx dev runtime start --feature local-embeddings',
     'Use --yes with setup to install and start the runtime without prompting.',
     'The first run may download Python packages and the all-MiniLM-L6-v2 model.',
-  ].join('\n');
+  ];
+  if (stderrTail.length > 0) {
+    lines.push('Recent local embeddings daemon stderr:', ...stderrTail);
+  }
+  return lines.join('\n');
 }
 
 async function promptAfterLocalEmbeddingFailure(
@@ -447,9 +468,13 @@ export async function runKtxSetupEmbeddingsStep(
     }
 
     progress.fail('Embedding test failed');
+    const stderrTail =
+      selectedBackend === 'sentence-transformers'
+        ? await readLocalEmbeddingDaemonStderrTail(managedLocalEmbeddings?.stderrLog)
+        : [];
     io.stderr.write(
       selectedBackend === 'sentence-transformers'
-        ? `${localEmbeddingSetupMessage(health.message)}\n`
+        ? `${localEmbeddingSetupMessage(health.message, stderrTail)}\n`
         : `Embedding health check failed: ${health.message}\n`,
     );
     if (args.inputMode === 'disabled') {
diff --git a/packages/cli/src/setup-project.ts b/packages/cli/src/setup-project.ts
index 6207974d..04bd54f5 100644
--- a/packages/cli/src/setup-project.ts
+++ b/packages/cli/src/setup-project.ts
@@ -29,8 +29,18 @@ export interface KtxSetupProjectArgs {
   allowBack?: boolean;
 }
 
+export type KtxSetupCreatedProjectCleanup =
+  | { kind: 'remove-project-dir'; projectDir: string }
+  | { kind: 'remove-ktx-scaffold'; projectDir: string };
+
 export type KtxSetupProjectResult =
-  | { status: 'ready'; projectDir: string; project: KtxLocalProject; confirmedCreation?: boolean }
+  | {
+      status: 'ready';
+      projectDir: string;
+      project: KtxLocalProject;
+      confirmedCreation?: boolean;
+      createdProjectCleanup?: KtxSetupCreatedProjectCleanup;
+    }
   | { status: 'back'; projectDir: string }
   | { status: 'cancelled'; projectDir: string }
   | { status: 'missing-input'; projectDir: string };
@@ -49,7 +59,12 @@ export interface KtxSetupProjectDeps {
 }
 
 type PromptProjectDirResult =
-  | { status: 'selected'; projectDir: string; confirmedCreation: boolean }
+  | {
+      status: 'selected';
+      projectDir: string;
+      confirmedCreation: boolean;
+      createdProjectCleanup?: KtxSetupCreatedProjectCleanup;
+    }
   | { status: 'cancelled'; projectDir: string }
   | { status: 'missing-input'; projectDir: string }
   | { status: 'back'; projectDir: string };
@@ -92,12 +107,29 @@ async function existingFolderState(
 }
 
 type ConfirmProjectDirResult =
-  | { status: 'confirmed'; confirmedCreation: boolean }
+  | {
+      status: 'confirmed';
+      confirmedCreation: boolean;
+      createdProjectCleanup?: KtxSetupCreatedProjectCleanup;
+    }
   | { status: 'choose-another' }
   | { status: 'back' }
   | { status: 'cancelled' }
   | { status: 'not-directory' };
 
+function cleanupForFolderState(
+  projectDir: string,
+  state: Awaited>,
+): KtxSetupCreatedProjectCleanup | undefined {
+  if (state === 'missing') {
+    return { kind: 'remove-project-dir', projectDir };
+  }
+  if (state === 'empty-directory') {
+    return { kind: 'remove-ktx-scaffold', projectDir };
+  }
+  return undefined;
+}
+
 async function confirmProjectDir(
   selectedDir: string,
   io: KtxCliIo,
@@ -137,7 +169,7 @@ async function confirmProjectDir(
   if (action === 'choose-another') return { status: 'choose-another' };
   if (action === 'back') return { status: 'back' };
   if (action !== 'create') return { status: 'cancelled' };
-  return { status: 'confirmed', confirmedCreation: true };
+  return { status: 'confirmed', confirmedCreation: true, createdProjectCleanup: cleanupForFolderState(selectedDir, state) };
 }
 
 async function normalizeSetupGitignore(projectDir: string): Promise {
@@ -220,10 +252,28 @@ async function promptForNewProjectDir(
     if (confirmed.status === 'choose-another') continue;
     if (confirmed.status === 'back') return { status: 'back', projectDir };
     if (confirmed.status === 'cancelled') return { status: 'cancelled', projectDir };
-    return { status: 'selected', projectDir: selectedDir, confirmedCreation: confirmed.confirmedCreation };
+    return {
+      status: 'selected',
+      projectDir: selectedDir,
+      confirmedCreation: confirmed.confirmedCreation,
+      ...(confirmed.createdProjectCleanup ? { createdProjectCleanup: confirmed.createdProjectCleanup } : {}),
+    };
   }
 }
 
+async function createProjectWithCleanup(
+  projectDir: string,
+  deps: KtxSetupProjectDeps,
+): Promise<{ project: KtxLocalProject; createdProjectCleanup?: KtxSetupCreatedProjectCleanup }> {
+  const state = await existingFolderState(projectDir);
+  const project = await createProject(projectDir, deps);
+  const createdProjectCleanup = cleanupForFolderState(projectDir, state);
+  return {
+    project,
+    ...(createdProjectCleanup ? { createdProjectCleanup } : {}),
+  };
+}
+
 export async function runKtxSetupProjectStep(
   args: KtxSetupProjectArgs,
   io: KtxCliIo,
@@ -261,6 +311,7 @@ export async function runKtxSetupProjectStep(
       projectDir: selected.projectDir,
       project,
       confirmedCreation: selected.confirmedCreation,
+      ...(selected.createdProjectCleanup ? { createdProjectCleanup: selected.createdProjectCleanup } : {}),
     };
   }
 
@@ -275,9 +326,14 @@ export async function runKtxSetupProjectStep(
       io.stderr.write('Missing setup choice: pass --yes to create a project in non-interactive setup.\n');
       return { status: 'missing-input', projectDir };
     }
-    const project = await createProject(projectDir, deps);
+    const { project, createdProjectCleanup } = await createProjectWithCleanup(projectDir, deps);
     printProjectSummary(io, projectDir);
-    return { status: 'ready', projectDir, project };
+    return {
+      status: 'ready',
+      projectDir,
+      project,
+      ...(createdProjectCleanup ? { createdProjectCleanup } : {}),
+    };
   }
 
   if (!io.stdout.isTTY && !deps.prompts) {
@@ -316,9 +372,14 @@ export async function runKtxSetupProjectStep(
   }
 
   if (choice === 'current') {
-    const project = await createProject(projectDir, deps);
+    const { project, createdProjectCleanup } = await createProjectWithCleanup(projectDir, deps);
     printProjectSummary(io, projectDir);
-    return { status: 'ready', projectDir, project };
+    return {
+      status: 'ready',
+      projectDir,
+      project,
+      ...(createdProjectCleanup ? { createdProjectCleanup } : {}),
+    };
   }
 
   if (choice === 'new-default') {
@@ -333,6 +394,7 @@ export async function runKtxSetupProjectStep(
       projectDir: defaultProjectDir,
       project,
       confirmedCreation: confirmed.confirmedCreation,
+      ...(confirmed.createdProjectCleanup ? { createdProjectCleanup: confirmed.createdProjectCleanup } : {}),
     };
   }
 
@@ -356,7 +418,13 @@ export async function runKtxSetupProjectStep(
   if (confirmed.status === 'cancelled') return { status: 'cancelled', projectDir };
   const project = await createProject(customDir, deps);
   printProjectSummary(io, customDir);
-  return { status: 'ready', projectDir: customDir, project, confirmedCreation: confirmed.confirmedCreation };
+  return {
+    status: 'ready',
+    projectDir: customDir,
+    project,
+    confirmedCreation: confirmed.confirmedCreation,
+    ...(confirmed.createdProjectCleanup ? { createdProjectCleanup: confirmed.createdProjectCleanup } : {}),
+  };
 }
 
 prompts.cancel('Setup cancelled.');
diff --git a/packages/cli/src/setup-runtime.test.ts b/packages/cli/src/setup-runtime.test.ts
index e6046379..ee070fc7 100644
--- a/packages/cli/src/setup-runtime.test.ts
+++ b/packages/cli/src/setup-runtime.test.ts
@@ -101,6 +101,8 @@ describe('runKtxSetupRuntimeStep', () => {
     const io = makeIo();
     const ensureLocalEmbeddings = vi.fn(async () => ({
       baseUrl: 'http://127.0.0.1:61234',
+      stdoutLog: join(tempDir, '.ktx', 'runtime', 'daemon.stdout.log'),
+      stderrLog: join(tempDir, '.ktx', 'runtime', 'daemon.stderr.log'),
       env: { KTX_MANAGED_SENTENCE_TRANSFORMERS_BASE_URL: 'http://127.0.0.1:61234' },
     }));
     const config: KtxProjectConfig = {
diff --git a/packages/cli/src/setup.test.ts b/packages/cli/src/setup.test.ts
index 99ac06d5..9c6bf626 100644
--- a/packages/cli/src/setup.test.ts
+++ b/packages/cli/src/setup.test.ts
@@ -1,5 +1,5 @@
 import { execFile } from 'node:child_process';
-import { mkdir, mkdtemp, readFile, rm, stat, writeFile } from 'node:fs/promises';
+import { mkdir, mkdtemp, readFile, readdir, rm, stat, writeFile } from 'node:fs/promises';
 import { tmpdir } from 'node:os';
 import { join } from 'node:path';
 import { promisify } from 'node:util';
@@ -563,6 +563,112 @@ describe('setup status', () => {
     expect(testIo.stderr()).toBe('');
   });
 
+  it('removes a newly created missing project directory when a later runtime step fails', async () => {
+    const projectDir = join(tempDir, 'missing-project');
+    const testIo = makeIo();
+
+    await expect(
+      runKtxSetup(
+        {
+          command: 'run',
+          projectDir,
+          mode: 'auto',
+          agents: false,
+          skipAgents: true,
+          inputMode: 'disabled',
+          yes: true,
+          cliVersion: '0.2.0',
+          skipLlm: true,
+          skipEmbeddings: true,
+          databaseSchemas: [],
+          skipDatabases: true,
+          skipSources: true,
+        },
+        testIo.io,
+        {
+          model: async () => ({ status: 'skipped', projectDir }),
+          embeddings: async () => ({ status: 'skipped', projectDir }),
+          databases: async () => ({ status: 'skipped', projectDir }),
+          sources: async () => ({ status: 'skipped', projectDir }),
+          runtime: async () => ({ status: 'failed', projectDir, requirements: { features: ['core'], requirements: [] } }),
+        },
+      ),
+    ).resolves.toBe(1);
+
+    await expect(stat(projectDir)).rejects.toThrow();
+  });
+
+  it('removes KTX scaffold files from an initially empty project directory when runtime setup fails', async () => {
+    const testIo = makeIo();
+
+    await expect(
+      runKtxSetup(
+        {
+          command: 'run',
+          projectDir: tempDir,
+          mode: 'auto',
+          agents: false,
+          skipAgents: true,
+          inputMode: 'disabled',
+          yes: true,
+          cliVersion: '0.2.0',
+          skipLlm: true,
+          skipEmbeddings: true,
+          databaseSchemas: [],
+          skipDatabases: true,
+          skipSources: true,
+        },
+        testIo.io,
+        {
+          model: async () => ({ status: 'skipped', projectDir: tempDir }),
+          embeddings: async () => ({ status: 'skipped', projectDir: tempDir }),
+          databases: async () => ({ status: 'skipped', projectDir: tempDir }),
+          sources: async () => ({ status: 'skipped', projectDir: tempDir }),
+          runtime: async () => ({ status: 'failed', projectDir: tempDir, requirements: { features: ['core'], requirements: [] } }),
+        },
+      ),
+    ).resolves.toBe(1);
+
+    await expect(stat(tempDir)).resolves.toBeDefined();
+    expect(await readdir(tempDir)).toEqual([]);
+  });
+
+  it('preserves a pre-existing non-empty project directory when runtime setup fails', async () => {
+    await writeFile(join(tempDir, 'notes.txt'), 'keep me\n', 'utf-8');
+    const testIo = makeIo();
+
+    await expect(
+      runKtxSetup(
+        {
+          command: 'run',
+          projectDir: tempDir,
+          mode: 'auto',
+          agents: false,
+          skipAgents: true,
+          inputMode: 'disabled',
+          yes: true,
+          cliVersion: '0.2.0',
+          skipLlm: true,
+          skipEmbeddings: true,
+          databaseSchemas: [],
+          skipDatabases: true,
+          skipSources: true,
+        },
+        testIo.io,
+        {
+          model: async () => ({ status: 'skipped', projectDir: tempDir }),
+          embeddings: async () => ({ status: 'skipped', projectDir: tempDir }),
+          databases: async () => ({ status: 'skipped', projectDir: tempDir }),
+          sources: async () => ({ status: 'skipped', projectDir: tempDir }),
+          runtime: async () => ({ status: 'failed', projectDir: tempDir, requirements: { features: ['core'], requirements: [] } }),
+        },
+      ),
+    ).resolves.toBe(1);
+
+    await expect(readFile(join(tempDir, 'notes.txt'), 'utf-8')).resolves.toBe('keep me\n');
+    await expect(stat(join(tempDir, 'ktx.yaml'))).resolves.toBeDefined();
+  });
+
   it('shows demo near the bottom of the first setup intent menu before project creation', async () => {
     const testIo = makeIo();
     const select = vi.fn(async (options: { options: Array<{ value: string; label: string }> }) => {
diff --git a/packages/cli/src/setup.ts b/packages/cli/src/setup.ts
index b0d675c1..fbf40aa8 100644
--- a/packages/cli/src/setup.ts
+++ b/packages/cli/src/setup.ts
@@ -1,4 +1,5 @@
 import { existsSync } from 'node:fs';
+import { rm } from 'node:fs/promises';
 import { basename, join, resolve } from 'node:path';
 import { getLatestLocalIngestStatus, savedMemoryCountsForReport } from '@ktx/context/ingest';
 import {
@@ -33,7 +34,11 @@ import {
   isKtxSetupLlmConfigReady,
   runKtxSetupAnthropicModelStep,
 } from './setup-models.js';
-import { type KtxSetupProjectDeps, runKtxSetupProjectStep } from './setup-project.js';
+import {
+  type KtxSetupCreatedProjectCleanup,
+  type KtxSetupProjectDeps,
+  runKtxSetupProjectStep,
+} from './setup-project.js';
 import {
   isKtxPreAgentSetupReady,
   isKtxSetupReady,
@@ -502,6 +507,23 @@ async function commitSetupConfigChanges(projectDir: string): Promise {
   await project.git.commitFile('ktx.yaml', 'setup: update KTX project config', 'ktx setup', 'setup@ktx.local');
 }
 
+const KTX_SETUP_SCAFFOLD_PATHS = ['ktx.yaml', '.ktx', 'wiki', 'semantic-layer', 'raw-sources', '.git'];
+
+async function cleanupCreatedProjectScaffold(cleanup: KtxSetupCreatedProjectCleanup | undefined): Promise {
+  if (!cleanup) {
+    return;
+  }
+  if (cleanup.kind === 'remove-project-dir') {
+    await rm(cleanup.projectDir, { recursive: true, force: true });
+    return;
+  }
+  await Promise.all(
+    KTX_SETUP_SCAFFOLD_PATHS.map((relativePath) =>
+      rm(join(cleanup.projectDir, relativePath), { recursive: true, force: true }),
+    ),
+  );
+}
+
 export async function runKtxSetup(args: KtxSetupArgs, io: KtxCliIo, deps: KtxSetupDeps = {}): Promise {
   try {
     return await runKtxSetupInner(args, io, deps);
@@ -772,7 +794,11 @@ async function runKtxSetupInner(args: KtxSetupArgs, io: KtxCliIo, deps: KtxSetup
       }
     }
 
-    if (stepResult.status === 'failed' || stepResult.status === 'missing-input') {
+    if (stepResult.status === 'failed') {
+      await cleanupCreatedProjectScaffold(projectResult.createdProjectCleanup);
+      return 1;
+    }
+    if (stepResult.status === 'missing-input') {
       return 1;
     }
     if (stepResult.status === 'back') {
diff --git a/packages/llm/src/embedding-provider.test.ts b/packages/llm/src/embedding-provider.test.ts
index 41d11b1a..c649a948 100644
--- a/packages/llm/src/embedding-provider.test.ts
+++ b/packages/llm/src/embedding-provider.test.ts
@@ -111,12 +111,12 @@ describe('createKtxEmbeddingProvider', () => {
     );
   });
 
-  it('falls back to one-shot ktx-daemon inference when the local HTTP daemon is unavailable', async () => {
-    const fetch = vi.fn().mockRejectedValue(new TypeError('fetch failed'));
-    const runSentenceTransformersJson = vi
+  it('reports local HTTP daemon failures without a ktx-daemon spawn fallback cascade', async () => {
+    const fetch = vi
       .fn()
-      .mockResolvedValueOnce({ embedding: [0.1, 0.2] })
-      .mockResolvedValueOnce({ embeddings: [[0.3, 0.4], [0.5, 0.6]] });
+      .mockResolvedValue(
+        new Response('Embedding compute failed: httpx.InvalidURL: Invalid port', { status: 500 }),
+      );
 
     const provider = createKtxEmbeddingProvider(
       {
@@ -125,19 +125,13 @@ describe('createKtxEmbeddingProvider', () => {
         dimensions: 2,
         sentenceTransformers: { baseURL: 'http://127.0.0.1:8765', pathPrefix: '' },
       },
-      { fetch, runSentenceTransformersJson },
+      { fetch },
     );
 
-    await expect(provider.embedMany(['hello', 'world'])).resolves.toEqual([
-      [0.3, 0.4],
-      [0.5, 0.6],
-    ]);
+    await expect(provider.embed('hello')).rejects.toThrow(
+      'Embedding provider sentence-transformers request failed with HTTP 500: Embedding compute failed: httpx.InvalidURL: Invalid port',
+    );
+    await expect(provider.embed('hello')).rejects.not.toThrow('ktx-daemon fallback failed');
     expect(fetch).toHaveBeenCalledTimes(1);
-    expect(runSentenceTransformersJson).toHaveBeenNthCalledWith(1, 'embedding-compute', {
-      text: '__ktx_embedding_probe__',
-    });
-    expect(runSentenceTransformersJson).toHaveBeenNthCalledWith(2, 'embedding-compute-bulk', {
-      texts: ['hello', 'world'],
-    });
   });
 });
diff --git a/packages/llm/src/embedding-provider.ts b/packages/llm/src/embedding-provider.ts
index d24e3749..5290c044 100644
--- a/packages/llm/src/embedding-provider.ts
+++ b/packages/llm/src/embedding-provider.ts
@@ -1,15 +1,7 @@
-import { spawn } from 'node:child_process';
-import { join } from 'node:path';
 import OpenAI from 'openai';
 import type { KtxEmbeddingConfig, KtxEmbeddingProvider } from './types.js';
 
 type FetchFn = typeof fetch;
-type SentenceTransformersCommand = 'embedding-compute' | 'embedding-compute-bulk';
-type SentenceTransformersJsonRunner = (
-  subcommand: SentenceTransformersCommand,
-  payload: Record,
-) => Promise>;
-type SentenceTransformersProcessCommand = { command: string; args: string[] };
 
 export interface KtxEmbeddingProviderDeps {
   createOpenAIClient?: (options: { apiKey?: string; baseURL?: string }) => {
@@ -23,14 +15,10 @@ export interface KtxEmbeddingProviderDeps {
     };
   };
   fetch?: FetchFn;
-  runSentenceTransformersJson?: SentenceTransformersJsonRunner;
-  sentenceTransformersCommand?: string;
-  sentenceTransformersArgs?: string[];
-  sentenceTransformersCwd?: string;
-  sentenceTransformersEnv?: NodeJS.ProcessEnv;
 }
 
 const DEFAULT_BATCH_SIZE = 100;
+const HTTP_ERROR_BODY_MAX_LENGTH = 2_000;
 
 function assertNonEmptyText(text: string): void {
   if (!text.trim()) {
@@ -69,110 +57,12 @@ function joinUrl(baseURL: string, pathPrefix: string, path: string): string {
   return prefix ? `${base}/${prefix}/${suffix}` : `${base}/${suffix}`;
 }
 
-function errorText(error: unknown): string {
-  if (error instanceof Error) {
-    return error.cause
-      ? `${error.name}: ${error.message}; cause: ${errorText(error.cause)}`
-      : `${error.name}: ${error.message}`;
+function boundedHttpBody(text: string): string {
+  const normalized = text.trim();
+  if (normalized.length <= HTTP_ERROR_BODY_MAX_LENGTH) {
+    return normalized;
   }
-  return String(error);
-}
-
-function parseJsonObject(raw: string, subcommand: string): Record {
-  const parsed = JSON.parse(raw) as unknown;
-  if (!parsed || typeof parsed !== 'object' || Array.isArray(parsed)) {
-    throw new Error(`ktx-daemon ${subcommand} returned non-object JSON`);
-  }
-  return parsed as Record;
-}
-
-function isCommandNotFound(error: unknown): boolean {
-  return (
-    error instanceof Error &&
-    ('code' in error || 'errno' in error) &&
-    ((error as { code?: unknown }).code === 'ENOENT' || (error as { errno?: unknown }).errno === 'ENOENT')
-  );
-}
-
-function defaultSentenceTransformersProcessCommands(): SentenceTransformersProcessCommand[] {
-  const venvBin =
-    process.platform === 'win32' ? join('.venv', 'Scripts', 'ktx-daemon.exe') : join('.venv', 'bin', 'ktx-daemon');
-  const repoVenvBin =
-    process.platform === 'win32'
-      ? join('ktx', '.venv', 'Scripts', 'ktx-daemon.exe')
-      : join('ktx', '.venv', 'bin', 'ktx-daemon');
-  return [
-    { command: 'ktx-daemon', args: [] },
-    { command: venvBin, args: [] },
-    { command: repoVenvBin, args: [] },
-  ];
-}
-
-function runSentenceTransformersProcessCommand(
-  options: SentenceTransformersProcessCommand & {
-    cwd?: string;
-    env?: NodeJS.ProcessEnv;
-  },
-): SentenceTransformersJsonRunner {
-  return async (
-    subcommand: SentenceTransformersCommand,
-    payload: Record,
-  ): Promise> =>
-    new Promise((resolve, reject) => {
-      const child = spawn(options.command, [...options.args, subcommand], {
-        cwd: options.cwd,
-        env: { ...process.env, ...options.env },
-        stdio: ['pipe', 'pipe', 'pipe'],
-      });
-      const stdout: Buffer[] = [];
-      const stderr: Buffer[] = [];
-
-      child.stdout.on('data', (chunk: Buffer) => stdout.push(chunk));
-      child.stderr.on('data', (chunk: Buffer) => stderr.push(chunk));
-      child.on('error', reject);
-      child.on('close', (code) => {
-        const stdoutText = Buffer.concat(stdout).toString('utf8').trim();
-        const stderrText = Buffer.concat(stderr).toString('utf8').trim();
-        if (code !== 0) {
-          reject(new Error(`ktx-daemon ${subcommand} failed: ${stderrText || `exit code ${code}`}`));
-          return;
-        }
-        try {
-          resolve(parseJsonObject(stdoutText, subcommand));
-        } catch (error) {
-          reject(error);
-        }
-      });
-      child.stdin.end(`${JSON.stringify(payload)}\n`);
-    });
-}
-
-function runSentenceTransformersProcessJson(options: {
-  commands: SentenceTransformersProcessCommand[];
-  cwd?: string;
-  env?: NodeJS.ProcessEnv;
-}): SentenceTransformersJsonRunner {
-  return async (
-    subcommand: SentenceTransformersCommand,
-    payload: Record,
-  ): Promise> => {
-    const errors: string[] = [];
-    for (const command of options.commands) {
-      try {
-        return await runSentenceTransformersProcessCommand({
-          ...command,
-          cwd: options.cwd,
-          env: options.env,
-        })(subcommand, payload);
-      } catch (error) {
-        errors.push(`${command.command}: ${errorText(error)}`);
-        if (!isCommandNotFound(error)) {
-          break;
-        }
-      }
-    }
-    throw new Error(`ktx-daemon ${subcommand} failed: ${errors.join('; ')}`);
-  };
+  return `${normalized.slice(0, HTTP_ERROR_BODY_MAX_LENGTH)}...`;
 }
 
 class OpenAIEmbeddingProvider implements KtxEmbeddingProvider {
@@ -228,9 +118,7 @@ class SentenceTransformersEmbeddingProvider implements KtxEmbeddingProvider {
   private readonly fetch: FetchFn;
   private readonly baseURL: string;
   private readonly pathPrefix: string;
-  private readonly runJson: SentenceTransformersJsonRunner;
   private readonly startupProbe: Promise;
-  private useProcessRunner = false;
 
   constructor(config: KtxEmbeddingConfig, deps: KtxEmbeddingProviderDeps) {
     if (!config.sentenceTransformers?.baseURL) {
@@ -241,15 +129,6 @@ class SentenceTransformersEmbeddingProvider implements KtxEmbeddingProvider {
     this.fetch = deps.fetch ?? fetch;
     this.baseURL = config.sentenceTransformers.baseURL;
     this.pathPrefix = config.sentenceTransformers.pathPrefix ?? '/api';
-    this.runJson =
-      deps.runSentenceTransformersJson ??
-      runSentenceTransformersProcessJson({
-        commands: deps.sentenceTransformersCommand
-          ? [{ command: deps.sentenceTransformersCommand, args: deps.sentenceTransformersArgs ?? [] }]
-          : defaultSentenceTransformersProcessCommands(),
-        cwd: deps.sentenceTransformersCwd,
-        env: deps.sentenceTransformersEnv,
-      });
     this.startupProbe = this.requestSingle('__ktx_embedding_probe__').then((embedding) => {
       assertVectorDimensions(embedding, this.dimensions, 'sentence-transformers');
     });
@@ -264,7 +143,7 @@ class SentenceTransformersEmbeddingProvider implements KtxEmbeddingProvider {
   async embedMany(texts: string[]): Promise {
     assertBatchSize(texts, this.maxBatchSize);
     await this.startupProbe;
-    const response = await this.requestJson('embedding-compute-bulk', '/embeddings/compute-bulk', { texts });
+    const response = await this.requestJson('/embeddings/compute-bulk', { texts });
     if (
       !response ||
       typeof response !== 'object' ||
@@ -285,37 +164,15 @@ class SentenceTransformersEmbeddingProvider implements KtxEmbeddingProvider {
   }
 
   private async requestSingle(text: string): Promise {
-    const response = await this.requestJson('embedding-compute', '/embeddings/compute', { text });
+    const response = await this.requestJson('/embeddings/compute', { text });
     if (!response || typeof response !== 'object' || !('embedding' in response) || !Array.isArray(response.embedding)) {
       throw new Error('Embedding provider sentence-transformers returned malformed single response');
     }
     return response.embedding;
   }
 
-  private async requestJson(
-    command: SentenceTransformersCommand,
-    path: string,
-    body: Record,
-  ): Promise> {
-    if (this.useProcessRunner) {
-      return this.runJson(command, body);
-    }
-
-    try {
-      return await this.postJson(path, body);
-    } catch (httpError) {
-      try {
-        const response = await this.runJson(command, body);
-        this.useProcessRunner = true;
-        return response;
-      } catch (processError) {
-        throw new Error(
-          `Embedding provider sentence-transformers local HTTP request failed (${errorText(
-            httpError,
-          )}) and ktx-daemon fallback failed (${errorText(processError)})`,
-        );
-      }
-    }
+  private async requestJson(path: string, body: Record): Promise> {
+    return await this.postJson(path, body);
   }
 
   private async postJson(path: string, body: Record): Promise> {
@@ -325,7 +182,12 @@ class SentenceTransformersEmbeddingProvider implements KtxEmbeddingProvider {
       body: JSON.stringify(body),
     });
     if (!response.ok) {
-      throw new Error(`Embedding provider sentence-transformers request failed with HTTP ${response.status}`);
+      const bodyText = boundedHttpBody(await response.text());
+      throw new Error(
+        `Embedding provider sentence-transformers request failed with HTTP ${response.status}${
+          bodyText ? `: ${bodyText}` : ''
+        }`,
+      );
     }
     const parsed = (await response.json()) as unknown;
     if (!parsed || typeof parsed !== 'object' || Array.isArray(parsed)) {