diff --git a/README.md b/README.md
index d44905d5..2c433e0d 100644
--- a/README.md
+++ b/README.md
@@ -30,8 +30,9 @@ warehouse accurately - from approved metric definitions, joinable columns, and
 business knowledge it builds and maintains for you.
 
 > [!NOTE]
-> Run **ktx** with your own LLM API keys or a **Claude Pro/Max** subscription.
-> No extra usage billing from **ktx**.
+> Run **ktx** with your own LLM API keys or a local agent sign-in — a
+> **Claude Pro/Max** subscription through Claude Code, or your local Codex
+> authentication. No extra usage billing from **ktx**.

@@ -175,8 +176,9 @@ then the current directory. Pass `--project-dir ` when scripting.
   No. **ktx** runs locally. The only data leaving your machine is what you send
   to the LLM provider you configured.
 - **Which LLM backends are supported?**
-  Anthropic API, Google Vertex AI, AI Gateway, and the local Claude Code
-  session through the Claude Agent SDK. See
+  Anthropic API, Google Vertex AI, AI Gateway, the local Claude Code session
+  through the Claude Agent SDK, and your local Codex authentication through the
+  Codex SDK. See
   [LLM configuration](https://docs.kaelio.com/ktx/docs/guides/llm-configuration).
 - **How is ktx different from a dbt or MetricFlow semantic layer?**
   **ktx** *ingests* those layers and combines them with raw-table
diff --git a/docs-site/content/docs/cli-reference/ktx-setup.mdx b/docs-site/content/docs/cli-reference/ktx-setup.mdx
index 0da7b339..24469a63 100644
--- a/docs-site/content/docs/cli-reference/ktx-setup.mdx
+++ b/docs-site/content/docs/cli-reference/ktx-setup.mdx
@@ -51,8 +51,9 @@ prompts.
 
 | Flag | Description |
 |------|-------------|
-| `--llm-backend ` | LLM backend: `anthropic`, `vertex`, or `claude-code` |
+| `--llm-backend ` | LLM backend: `anthropic`, `vertex`, `claude-code`, or `codex` |
 | `--llm-backend claude-code` | Use the local Claude Code session for **ktx** LLM calls |
+| `--llm-backend codex` | Use local Codex authentication for **ktx** LLM calls |
 | `--llm-model ` | LLM model ID or backend model alias to validate and save |
 | `--anthropic-api-key-env ` | Environment variable containing the Anthropic API key |
 | `--anthropic-api-key-file ` | File containing the Anthropic API key |
@@ -62,9 +63,14 @@ prompts.
 
 Choose only one Anthropic credential source. Anthropic credential flags are only
 valid with the Anthropic backend; Vertex flags are only valid with the Vertex
-backend. The `claude-code` backend uses local Claude Code authentication instead
+backend. The `claude-code` and `codex` backends use local authentication instead
 of Anthropic API key or Vertex flags. For Claude Code, `--llm-model` accepts
-`sonnet`, `opus`, `haiku`, or a full Claude model ID.
+`sonnet`, `opus`, `haiku`, or a full Claude model ID. For Codex, `--llm-model`
+accepts `codex`, `default`, or a `gpt-*` / `codex-*` model ID such as
+`gpt-5.5`; any other value is rejected before the auth probe. Run `codex` to
+see the models available to your login, and pick a `gpt-*` / `codex-*` id from
+that list. Note that `*-codex` API-billing model IDs (for example
+`gpt-5.3-codex`) are not available to ChatGPT-subscription logins.
 
 ### Embeddings
 
@@ -191,6 +197,17 @@ ktx setup \
   --llm-backend claude-code \
   --llm-model opus
 
+# Configure **ktx** to use local Codex authentication for LLM work
+ktx setup --llm-backend codex --llm-model gpt-5.5 --no-input
+```
+
+When you choose `--llm-backend codex`, setup prints a warning if the public
+Codex SDK and CLI surface cannot prove full Claude-Code-style isolation. The
+backend restricts **ktx** runtime MCP tools to each run, but Codex may still
+load user Codex config and built-in command execution or read-only file
+capabilities.
+
+```bash
 # Script a Postgres connection that reads its URL from the environment
 ktx setup \
   --project-dir ./analytics \
diff --git a/docs-site/content/docs/cli-reference/ktx-status.mdx b/docs-site/content/docs/cli-reference/ktx-status.mdx
index 51c00148..66e4964c 100644
--- a/docs-site/content/docs/cli-reference/ktx-status.mdx
+++ b/docs-site/content/docs/cli-reference/ktx-status.mdx
@@ -21,7 +21,7 @@ ktx status [options]
 | `--json` | Print JSON output | `false` |
 | `-v`, `--verbose` | Show every check, including passing ones | `false` |
 | `--validate` | Only validate the `ktx.yaml` schema; skip readiness checks | `false` |
-| `--fast` | Skip checks that require external communication (query-history readiness probes and Claude Code auth probe) | `false` |
+| `--fast` | Skip checks that require external communication (query-history readiness probes, Claude Code auth probe, and Codex auth probe) | `false` |
 | `--no-input` | Disable interactive terminal input | - |
 
 ## Examples
@@ -39,7 +39,7 @@ ktx status --verbose
 # Validate ktx.yaml without running readiness checks
 ktx status --validate
 
-# Skip slow probes (query-history readiness, Claude Code auth)
+# Skip slow probes (query-history readiness, Claude Code auth, Codex auth)
 ktx status --fast
 
 # Check a project from another directory
@@ -57,6 +57,16 @@ flow, then rerun `ktx status`. Use `--fast` to skip this probe (useful in CI or
 offline contexts); skipped checks render as `-` and carry `"status": "skipped"`
 in JSON output.
 
+For `llm.provider.backend: codex`, `ktx status` runs a minimal non-interactive
+Codex request. If the probe fails, authenticate Codex locally with the Codex CLI
+and verify the Codex CLI installation.
+
+When `llm.provider.backend: codex` is configured, `ktx status` also prints a
+warning when the installed public Codex SDK and CLI surface cannot prove full
+Claude-Code-style isolation. The warning does not block authenticated Codex
+usage, but it marks the project status as partial so you can make an explicit
+runtime-isolation decision.
+
 A `Local data` section summarises what the project has accumulated locally:
 ingest run counts, last completed timestamp per connection, knowledge page
 counts by scope, semantic-layer source and dictionary value counts, and the
diff --git a/docs-site/content/docs/configuration/ktx-yaml.mdx b/docs-site/content/docs/configuration/ktx-yaml.mdx
index 13105851..a9298443 100644
--- a/docs-site/content/docs/configuration/ktx-yaml.mdx
+++ b/docs-site/content/docs/configuration/ktx-yaml.mdx
@@ -376,13 +376,23 @@ llm:
 
 | Field | Type | Default | Purpose |
 |-------|------|---------|---------|
-| `provider.backend` | `none` \| `anthropic` \| `vertex` \| `gateway` \| `claude-code` | `none` | Selected backend. `none` disables LLM features. `claude-code` uses the local Claude Code session and needs no API key. |
+| `provider.backend` | `none` \| `anthropic` \| `vertex` \| `gateway` \| `claude-code` \| `codex` | `none` | Selected backend. `none` disables LLM features. `claude-code` uses the local Claude Code session and needs no API key. `codex` uses local Codex authentication and needs no API key. |
 | `provider.anthropic.api_key` | `string` | - | Anthropic API key. Required when `backend: anthropic`. Accepts `env:` or `file:` references. |
 | `provider.anthropic.base_url` | `string` | - | Override the Anthropic API base URL (proxy, self-hosted gateway). |
 | `provider.gateway.api_key` / `base_url` | `string` | - | Credentials for an AI Gateway provider. Required when `backend: gateway`. |
 | `provider.vertex.project` | `string` | - | Google Cloud project ID hosting the Vertex AI endpoint. |
 | `provider.vertex.location` | `string` | - | Vertex AI region (for example `us-east5`). Required when the `vertex` block is present. |
 
+Use `codex` when local Codex authentication should power **ktx** LLM work:
+
+```yaml
+llm:
+  provider:
+    backend: codex
+  models:
+    default: gpt-5.5
+```
+
 ### Model roles
 
 `models` overrides the per-role model. Keys are fixed; values are
diff --git a/docs-site/content/docs/guides/building-context.mdx b/docs-site/content/docs/guides/building-context.mdx
index b806c424..52179e70 100644
--- a/docs-site/content/docs/guides/building-context.mdx
+++ b/docs-site/content/docs/guides/building-context.mdx
@@ -39,8 +39,20 @@ ktx ingest --all
 Enriched ingest needs a configured model and embeddings. Run `ktx setup` first;
 connections without that configuration fail before any work starts.
 
-With `claude-code`, **ktx** agent loops can invoke only the **ktx** MCP tools for the
-current run.
+Local-auth backends keep provider credentials out of `ktx.yaml`:
+
+```bash
+ktx setup --llm-backend claude-code --no-input
+ktx setup --llm-backend codex --llm-model gpt-5.5 --no-input
+```
+
+With `claude-code`, **ktx** agent loops can invoke only the **ktx** MCP tools
+for the current run. With `codex`, **ktx** restricts the temporary runtime MCP
+server to the current run's tool set, disables Codex web search, requests a
+read-only sandbox, and sets `approval_policy=never`. The public Codex SDK and
+CLI surface may still load user Codex config and built-in command execution or
+read-only file capabilities, so use `claude-code` for stricter runtime tool
+isolation.
 
 ## Query history
 
diff --git a/docs-site/content/docs/guides/llm-configuration.mdx b/docs-site/content/docs/guides/llm-configuration.mdx
index 880df24e..71ab9d80 100644
--- a/docs-site/content/docs/guides/llm-configuration.mdx
+++ b/docs-site/content/docs/guides/llm-configuration.mdx
@@ -16,6 +16,7 @@ Set `llm.provider.backend` to one of these values:
 - `gateway`: Use AI Gateway-compatible Anthropic model ids.
 - `claude-code`: Use your local Claude Code session through the Claude Agent SDK.
**ktx** strips provider-routing environment variables from child processes. +- `codex`: Use your local Codex authentication through the Codex SDK. ## Claude Code @@ -47,6 +48,42 @@ model IDs are also accepted. metadata may still list host slash commands, skills, and subagents; **ktx** does not grant execution access to them. +## Codex backend + +Use `codex` when you want **ktx** to run LLM-backed workflows through your +local Codex authentication instead of a direct provider API key. + +```yaml +llm: + provider: + backend: codex + models: + default: gpt-5.5 +``` + +Configure it non-interactively: + +```bash +ktx setup --llm-backend codex --llm-model gpt-5.5 --no-input +``` + +This is separate from Codex agent-client setup. `ktx setup --agents --target +codex` installs instructions and MCP access for an end-user Codex session. +`ktx setup --llm-backend codex` makes **ktx** itself execute ingest, scan +enrichment, memory, and other LLM-backed work through Codex. + +During runtime loops, **ktx** starts a temporary loopback MCP server for the +current run, exposes only the tools passed to that run, asks Codex to use a +read-only sandbox, sets `approval_policy=never`, auto-approves only those +run-scoped MCP tools, and disables Codex web search. + +Codex backend isolation is currently limited by the public Codex SDK and CLI +surface. Codex may still load user Codex config and built-in command execution +or read-only file capabilities. Use `llm.provider.backend: claude-code` when +you need stricter Claude-Code-style runtime tool isolation, or remove host +Codex MCP and tool config before running untrusted prompts through the `codex` +backend. + ## Prompt caching `llm.promptCaching` has partial parity on `claude-code`. 
Status and doctor warn diff --git a/knip.json b/knip.json index 270c2310..65b1a0a2 100644 --- a/knip.json +++ b/knip.json @@ -37,6 +37,9 @@ "@semantic-release/release-notes-generator", "conventional-changelog-conventionalcommits" ], + "ignore": [ + ".context/**" + ], "ignoreBinaries": [ "uv", "lsof" diff --git a/package.json b/package.json index fee7b745..a9590d70 100644 --- a/package.json +++ b/package.json @@ -32,6 +32,7 @@ "setup:dev": "node scripts/setup-dev.mjs", "release:published-smoke": "node scripts/published-package-smoke.mjs --require-config", "release:local-embeddings-smoke": "node scripts/local-embeddings-runtime-smoke.mjs --require-opt-in", + "release:codex-backend-smoke": "node scripts/codex-backend-live-smoke.mjs", "release:readiness": "node scripts/release-readiness.mjs", "release:update-version": "node scripts/update-public-release-version.mjs", "relationships:acquire-public-fixtures": "node scripts/acquire-public-benchmark-fixtures.mjs", diff --git a/packages/cli/package.json b/packages/cli/package.json index b04fceac..9d3af54c 100644 --- a/packages/cli/package.json +++ b/packages/cli/package.json @@ -56,6 +56,7 @@ "@looker/sdk-rtl": "^21.6.5", "@modelcontextprotocol/sdk": "^1.29.0", "@notionhq/client": "^5.22.0", + "@openai/codex-sdk": "^0.133.0", "ai": "^6.0.188", "better-sqlite3": "^12.10.0", "commander": "14.0.3", diff --git a/packages/cli/src/commands/setup-commands.ts b/packages/cli/src/commands/setup-commands.ts index 19f980bd..1619a80a 100644 --- a/packages/cli/src/commands/setup-commands.ts +++ b/packages/cli/src/commands/setup-commands.ts @@ -29,7 +29,7 @@ function embeddingBackend(value: string): 'openai' | 'sentence-transformers' { } function llmBackend(value: string): KtxSetupLlmBackend { - if (value === 'anthropic' || value === 'vertex' || value === 'claude-code') { + if (value === 'anthropic' || value === 'vertex' || value === 'claude-code' || value === 'codex') { return value; } throw new InvalidArgumentError(`invalid choice 
'${value}'`); diff --git a/packages/cli/src/context/ingest/local-bundle-runtime.ts b/packages/cli/src/context/ingest/local-bundle-runtime.ts index 77f4234e..9d6aba95 100644 --- a/packages/cli/src/context/ingest/local-bundle-runtime.ts +++ b/packages/cli/src/context/ingest/local-bundle-runtime.ts @@ -611,9 +611,10 @@ function nextLocalJobId(): string { function localIngestLlmProviderGuardMessage(projectDir: string): string { return [ - 'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, or claude-code, or an injected agentRunner.', - 'Configure a local Claude Code session or API-backed LLM, then rerun ingest:', + 'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, claude-code, or codex, or an injected agentRunner.', + 'Configure a local Claude Code/Codex session or API-backed LLM, then rerun ingest:', ` ktx setup --project-dir ${projectDir} --llm-backend claude-code --no-input`, + ` ktx setup --project-dir ${projectDir} --llm-backend codex --llm-model gpt-5.5 --no-input`, ` ktx setup --project-dir ${projectDir} --llm-backend anthropic --anthropic-api-key-env ANTHROPIC_API_KEY --llm-model claude-sonnet-4-6 --no-input`, ].join('\n'); } diff --git a/packages/cli/src/context/llm/codex-exec-events.ts b/packages/cli/src/context/llm/codex-exec-events.ts new file mode 100644 index 00000000..86e13694 --- /dev/null +++ b/packages/cli/src/context/llm/codex-exec-events.ts @@ -0,0 +1,194 @@ +import type { LlmTokenUsage, RunLoopStopReason } from './runtime-port.js'; + +export interface CodexExecEventSummary { + finalText: string; + stopReason: RunLoopStopReason; + usage: LlmTokenUsage; + stepCount: number; + stepBoundariesMs: number[]; + toolCallCount: number; + toolFailures: string[]; + error?: Error; +} + +interface CodexEventParseOptions { + startedAt?: number; + now?: () => number; +} + +function record(value: unknown): Record | undefined { + return value && typeof value === 'object' ? 
(value as Record) : undefined; +} + +/** + * Codex thread items that represent a discrete agent action consuming one loop + * step. The step budget caps the total number of these regardless of which + * capability the agent reaches for, so built-in `command_execution` (and any + * file/web action the public Codex surface still exposes) count alongside our + * own `mcp_tool_call` items rather than only the MCP ones. + */ +const AGENT_STEP_ITEM_TYPES = new Set(['command_execution', 'mcp_tool_call', 'file_change', 'web_search']); + +export function isCompletedAgentStep(event: unknown): boolean { + const eventRecord = record(event); + if (eventRecord?.type !== 'item.completed') { + return false; + } + const itemType = record(eventRecord.item)?.type; + return typeof itemType === 'string' && AGENT_STEP_ITEM_TYPES.has(itemType); +} + +function text(value: unknown): string | undefined { + return typeof value === 'string' && value.trim().length > 0 ? value : undefined; +} + +function numberValue(value: unknown): number | undefined { + return typeof value === 'number' && Number.isFinite(value) ? value : undefined; +} + +function usageFrom(value: unknown): LlmTokenUsage { + const usage = record(value); + if (!usage) { + return {}; + } + const inputTokens = numberValue(usage.input_tokens ?? usage.inputTokens); + const outputTokens = numberValue(usage.output_tokens ?? usage.outputTokens); + const explicitTotalTokens = numberValue(usage.total_tokens ?? usage.totalTokens); + const totalTokens = + explicitTotalTokens ?? + (inputTokens !== undefined && outputTokens !== undefined ? inputTokens + outputTokens : undefined); + return { + ...(inputTokens !== undefined ? { inputTokens } : {}), + ...(outputTokens !== undefined ? { outputTokens } : {}), + ...(totalTokens !== undefined ? 
{ totalTokens } : {}), + }; +} + +function stopReasonFrom(value: unknown): RunLoopStopReason { + const reason = text(value)?.toLowerCase(); + if (reason && /(budget|max_turn|max-turn|limit)/.test(reason)) { + return 'budget'; + } + return 'natural'; +} + +function errorMessageFrom(value: unknown): string { + if (value instanceof Error) { + return value.message; + } + const asRecord = record(value); + const message = text(asRecord?.message); + return message ?? text(value) ?? 'Codex turn failed'; +} + +/** + * Codex serializes API failures as a JSON envelope inside the event message + * (e.g. `{"type":"error","status":400,"error":{"message":"…"}}`). Surface the + * human-readable inner message so callers don't leak raw JSON; pass plain + * strings through unchanged. + */ +function unwrapCodexApiErrorMessage(raw: string): string { + const trimmed = raw.trim(); + if (!trimmed.startsWith('{')) { + return raw; + } + try { + const parsed = record(JSON.parse(trimmed)); + return text(record(parsed?.error)?.message) ?? text(parsed?.message) ?? raw; + } catch { + return raw; + } +} + +/** @internal */ +export function parseCodexExecEventLine(line: string): unknown { + try { + return JSON.parse(line) as unknown; + } catch (error) { + throw new Error(`Codex JSONL event stream was malformed: ${error instanceof Error ? error.message : String(error)}`); + } +} + +export function summarizeCodexExecEvents( + events: Iterable, + options: CodexEventParseOptions = {}, +): CodexExecEventSummary { + const startedAt = options.startedAt ?? Date.now(); + const now = options.now ?? 
Date.now; + let finalText = ''; + let stopReason: RunLoopStopReason = 'natural'; + let usage: LlmTokenUsage = {}; + let turnCount = 0; + let completedStepCount = 0; + const stepBoundariesMs: number[] = []; + let toolCallCount = 0; + const toolFailures: string[] = []; + let error: Error | undefined; + + for (const event of events) { + const eventRecord = record(event); + const eventType = text(eventRecord?.type); + if (!eventRecord || !eventType) { + continue; + } + + if (eventType === 'turn.started') { + turnCount += 1; + continue; + } + + const item = record(eventRecord.item); + const itemType = text(item?.type); + + if (eventType === 'item.started' && itemType === 'mcp_tool_call') { + toolCallCount += 1; + continue; + } + + if (isCompletedAgentStep(event)) { + completedStepCount += 1; + stepBoundariesMs.push(now() - startedAt); + // Only MCP tool calls fail the loop: a non-zero `command_execution` exit + // is normal agent exploration, not a runtime error. `status` is the + // authoritative signal (the SDK always sets it); the SDK also serializes + // `error: null` on successful calls, so an explicit-null `error` must NOT + // be read as a failure — only a populated error object counts. + if (itemType === 'mcp_tool_call' && (item?.status === 'failed' || (item?.error !== undefined && item?.error !== null))) { + const name = text(item?.name) ?? text(item?.tool) ?? text(item?.tool_name) ?? 'unknown'; + toolFailures.push(`${name}: ${errorMessageFrom(item?.error)}`); + } + continue; + } + + if (eventType === 'item.completed' && itemType === 'agent_message') { + finalText = text(item?.text) ?? finalText; + continue; + } + + if (eventType === 'turn.completed') { + usage = usageFrom(eventRecord.usage); + if (completedStepCount === 0) { + stepBoundariesMs.push(now() - startedAt); + } + stopReason = stopReasonFrom(eventRecord.reason ?? eventRecord.stop_reason ?? 
eventRecord.terminal_reason); + continue; + } + + if (eventType === 'turn.failed' || eventType === 'error') { + stopReason = 'error'; + error = new Error(unwrapCodexApiErrorMessage(errorMessageFrom(eventRecord.error ?? eventRecord.message))); + continue; + } + } + + return { + finalText, + stopReason, + usage, + stepCount: completedStepCount > 0 ? completedStepCount : turnCount, + stepBoundariesMs, + toolCallCount, + toolFailures, + ...(error ? { error } : {}), + }; +} diff --git a/packages/cli/src/context/llm/codex-isolation.ts b/packages/cli/src/context/llm/codex-isolation.ts new file mode 100644 index 00000000..d54ac1f8 --- /dev/null +++ b/packages/cli/src/context/llm/codex-isolation.ts @@ -0,0 +1,9 @@ +export const CODEX_ISOLATION_WARNING = + 'Codex backend isolation is limited by the public Codex SDK/CLI surface: ktx restricts the runtime MCP server to the current ktx tool set, disables Codex web search, asks for a read-only sandbox, and sets approval_policy=never, but Codex may still load user Codex config and built-in command execution or read-only file capabilities.'; + +export const CODEX_ISOLATION_WARNING_FIX = + 'Use llm.provider.backend: claude-code when you need stricter Claude-Code-style runtime tool isolation, or remove host Codex MCP/tool config before running untrusted prompts through the codex backend.'; + +export function formatCodexIsolationWarning(): string { + return `${CODEX_ISOLATION_WARNING} ${CODEX_ISOLATION_WARNING_FIX}`; +} diff --git a/packages/cli/src/context/llm/codex-mcp-runtime-server.ts b/packages/cli/src/context/llm/codex-mcp-runtime-server.ts new file mode 100644 index 00000000..eacf28f9 --- /dev/null +++ b/packages/cli/src/context/llm/codex-mcp-runtime-server.ts @@ -0,0 +1,87 @@ +import { randomBytes } from 'node:crypto'; +import type { Server } from 'node:http'; +import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js'; +import type { KtxMcpServerLike } from '../mcp/types.js'; +import { runKtxMcpHttpServer, type 
KtxMcpHttpServerHandle } from '../../mcp-http-server.js'; +import type { KtxRuntimeToolSet } from './runtime-port.js'; +import { normalizeKtxRuntimeToolOutput } from './runtime-tools.js'; + +/** @internal */ +export interface CreateCodexRuntimeMcpServerInput { + server?: KtxMcpServerLike; + toolSet: KtxRuntimeToolSet; +} + +export interface CodexRuntimeMcpServerHandle { + url: string; + bearerTokenEnvVar: 'KTX_CODEX_RUNTIME_MCP_TOKEN'; + bearerToken: string; + close(): Promise; +} + +type RunServer = typeof runKtxMcpHttpServer; + +export interface StartCodexRuntimeMcpServerInput { + projectDir: string; + toolSet: KtxRuntimeToolSet; + runServer?: RunServer; +} + +/** @internal */ +export function createCodexRuntimeMcpServer(input: CreateCodexRuntimeMcpServerInput): KtxMcpServerLike { + const server = + input.server ?? + (new McpServer({ + name: 'ktx-runtime', + version: '0.0.0', + }) as KtxMcpServerLike); + + for (const descriptor of Object.values(input.toolSet)) { + server.registerTool( + descriptor.name, + { + description: descriptor.description, + inputSchema: descriptor.inputSchema.shape, + }, + async (toolInput) => { + const normalized = normalizeKtxRuntimeToolOutput(await descriptor.execute(toolInput)); + return { + content: [{ type: 'text', text: normalized.markdown }], + ...(normalized.structured !== undefined && normalized.structured !== null && typeof normalized.structured === 'object' + ? { structuredContent: normalized.structured as object } + : {}), + }; + }, + ); + } + + return server; +} + +function serverPort(server: Server, fallback: number): number { + const address = server.address(); + return typeof address === 'object' && address ? address.port : fallback; +} + +export async function startCodexRuntimeMcpServer( + input: StartCodexRuntimeMcpServerInput, +): Promise { + const bearerToken = randomBytes(32).toString('hex'); + const runServer = input.runServer ?? 
runKtxMcpHttpServer; + const handle = (await runServer({ + projectDir: input.projectDir, + host: '127.0.0.1', + port: 0, + token: bearerToken, + allowedHosts: ['127.0.0.1', 'localhost'], + allowedOrigins: [], + createMcpServer: () => createCodexRuntimeMcpServer({ toolSet: input.toolSet }) as McpServer, + })) as KtxMcpHttpServerHandle; + const port = serverPort(handle.server, 0); + return { + url: `http://127.0.0.1:${port}/mcp`, + bearerTokenEnvVar: 'KTX_CODEX_RUNTIME_MCP_TOKEN', + bearerToken, + close: () => handle.close(), + }; +} diff --git a/packages/cli/src/context/llm/codex-models.ts b/packages/cli/src/context/llm/codex-models.ts new file mode 100644 index 00000000..1a8b9b9d --- /dev/null +++ b/packages/cli/src/context/llm/codex-models.ts @@ -0,0 +1,20 @@ +export const DEFAULT_CODEX_MODEL = 'gpt-5.5'; + +const CODEX_MODEL_ALIASES: Record = { + codex: DEFAULT_CODEX_MODEL, + default: DEFAULT_CODEX_MODEL, +}; + +const EXPLICIT_CODEX_MODEL_ID = /^(?:gpt|codex)-[a-z0-9][a-z0-9._-]*$/i; + +export function resolveCodexModel(model: string): string { + const normalized = model.trim(); + const alias = CODEX_MODEL_ALIASES[normalized]; + if (alias) { + return alias; + } + if (EXPLICIT_CODEX_MODEL_ID.test(normalized)) { + return normalized; + } + throw new Error(`Unsupported Codex model "${model}". 
Use codex, default, or a gpt-* / codex-* model id.`); +} diff --git a/packages/cli/src/context/llm/codex-runtime-config.ts b/packages/cli/src/context/llm/codex-runtime-config.ts new file mode 100644 index 00000000..74de9efe --- /dev/null +++ b/packages/cli/src/context/llm/codex-runtime-config.ts @@ -0,0 +1,38 @@ +interface CodexRuntimeMcpConfig { + url: string; + bearerTokenEnvVar: string; + bearerToken: string; + toolNames: string[]; +} + +export interface BuildCodexRuntimeConfigInput { + model: string; + mcp?: CodexRuntimeMcpConfig; +} + +export interface CodexRuntimeConfig { + configOverrides: Record; + env: Record; +} + +export function buildCodexRuntimeConfig(input: BuildCodexRuntimeConfigInput): CodexRuntimeConfig { + const configOverrides: Record = { + history: { persistence: 'none' }, + }; + const env: Record = {}; + + if (input.mcp) { + configOverrides.mcp_servers = { + ktx: { + url: input.mcp.url, + bearer_token_env_var: input.mcp.bearerTokenEnvVar, + enabled_tools: input.mcp.toolNames, + default_tools_approval_mode: 'approve', + required: true, + }, + }; + env[input.mcp.bearerTokenEnvVar] = input.mcp.bearerToken; + } + + return { configOverrides, env }; +} diff --git a/packages/cli/src/context/llm/codex-runtime.ts b/packages/cli/src/context/llm/codex-runtime.ts new file mode 100644 index 00000000..3535072b --- /dev/null +++ b/packages/cli/src/context/llm/codex-runtime.ts @@ -0,0 +1,371 @@ +import { z } from 'zod'; +import { noopLogger, type KtxLogger } from '../core/config.js'; +import { isCompletedAgentStep, summarizeCodexExecEvents, type CodexExecEventSummary } from './codex-exec-events.js'; +import { + startCodexRuntimeMcpServer, + type CodexRuntimeMcpServerHandle, +} from './codex-mcp-runtime-server.js'; +import { resolveCodexModel } from './codex-models.js'; +import { buildCodexRuntimeConfig } from './codex-runtime-config.js'; +import { CodexSdkCliRunner, type CodexSdkRunner } from './codex-sdk-runner.js'; +import type { + KtxGenerateObjectInput, + 
KtxGenerateTextInput, + KtxLlmRuntimePort, + KtxRuntimeToolSet, + LlmTokenUsage, + RunLoopParams, + RunLoopResult, +} from './runtime-port.js'; + +export interface CodexKtxLlmRuntimeDeps { + projectDir: string; + modelSlots: { default: string } & Partial>; + runner?: CodexSdkRunner; + startMcpServer?: (input: { projectDir: string; toolSet: KtxRuntimeToolSet }) => Promise; + logger?: KtxLogger; +} + +function modelForRole(modelSlots: CodexKtxLlmRuntimeDeps['modelSlots'], role: string): string { + return resolveCodexModel(modelSlots[role] ?? modelSlots.default); +} + +function promptWithSystem(system: string | undefined, prompt: string): string { + return [system, prompt].filter(Boolean).join('\n\n'); +} + +interface CollectCodexEventsOptions { + stepBudget?: number; + abortController?: AbortController; + onStep?: (stepIndex: number) => void | Promise; +} + +interface CollectCodexEventsResult { + events: unknown[]; + budgetExceeded: boolean; + streamError?: Error; +} + +function eventRecord(value: unknown): Record | undefined { + return value && typeof value === 'object' ? (value as Record) : undefined; +} + +function isTurnCompleted(event: unknown): boolean { + return eventRecord(event)?.type === 'turn.completed'; +} + +/** + * Drains the Codex stream once, emitting a step as each agent action completes + * so callers see live progress and the step budget is enforced mid-run. Every + * completed agent-action item counts (see {@link isCompletedAgentStep}), so + * built-in `command_execution` steps decrement the budget the same as + * `mcp_tool_call`s. A turn that produced no actions still counts as one step, + * matching the metrics summary and the AI SDK backend. 
+ */ +async function collectEvents( + events: AsyncIterable, + options: CollectCodexEventsOptions = {}, +): Promise { + const collected: unknown[] = []; + let completedSteps = 0; + let sawActionStep = false; + let budgetExceeded = false; + let streamError: Error | undefined; + + // The SDK yields every stdout event, then throws on a non-zero codex exec + // exit. Catch that throw so the events already collected (which carry the + // real `turn.failed`/`error` reason) survive for the summary; the masked + // exit message is kept only as a fallback when no error event was emitted. + try { + for await (const event of events) { + collected.push(event); + + const isActionStep = isCompletedAgentStep(event); + if (isActionStep) { + sawActionStep = true; + } else if (sawActionStep || !isTurnCompleted(event)) { + // Only fall back to counting a bare turn as a step when the turn produced + // no agent actions; a completed turn is terminal, so it never aborts. + continue; + } + + completedSteps += 1; + await options.onStep?.(completedSteps); + if (isActionStep && options.stepBudget !== undefined && completedSteps >= options.stepBudget) { + budgetExceeded = true; + options.abortController?.abort(); + break; + } + } + } catch (error) { + streamError = error instanceof Error ? error : new Error(String(error)); + } + + return { events: collected, budgetExceeded, ...(streamError ? { streamError } : {}) }; +} + +function metrics(summary: CodexExecEventSummary, startedAt: number): { totalMs: number; usage: LlmTokenUsage } { + return { totalMs: Date.now() - startedAt, usage: summary.usage }; +} + +function summaryError(summary: CodexExecEventSummary, streamError?: Error): Error | undefined { + // A `turn.failed`/`error` event carries the real reason; prefer it over the + // SDK's generic non-zero-exit throw. Fall back to the stream error only when + // no event explained the failure (e.g. spawn failure or auth before a turn). 
+  if (summary.error) {
+    return summary.error;
+  }
+  if (summary.toolFailures.length > 0) {
+    return new Error(`Codex runtime tool call failed: ${summary.toolFailures.join('; ')}`);
+  }
+  return streamError;
+}
+
+function assertSuccessfulText(summary: CodexExecEventSummary, streamError?: Error): string {
+  const error = summaryError(summary, streamError);
+  if (error) {
+    throw error;
+  }
+  if (!summary.finalText.trim()) {
+    throw new Error('Codex completed without an agent message');
+  }
+  return summary.finalText;
+}
+
+function parseStructuredOutput>(schema: TSchema, text: string): TOutput {
+  try {
+    return schema.parse(JSON.parse(text));
+  } catch (error) {
+    const message = error instanceof Error ? error.message : String(error);
+    throw new Error(`Codex structured output failed validation: ${message}`);
+  }
+}
+
+async function mcpForTools(input: {
+  projectDir: string;
+  toolSet?: KtxRuntimeToolSet;
+  startMcpServer: CodexKtxLlmRuntimeDeps['startMcpServer'];
+}): Promise {
+  if (!input.toolSet || Object.keys(input.toolSet).length === 0) {
+    return undefined;
+  }
+  return (input.startMcpServer ?? startCodexRuntimeMcpServer)({
+    projectDir: input.projectDir,
+    toolSet: input.toolSet,
+  });
+}
+
+function runtimeToolNames(toolSet: KtxRuntimeToolSet | undefined): string[] {
+  return Object.values(toolSet ?? {}).map((descriptor) => descriptor.name);
+}
+
+export class CodexKtxLlmRuntime implements KtxLlmRuntimePort {
+  private readonly runner: CodexSdkRunner;
+  private readonly logger: KtxLogger;
+
+  constructor(private readonly deps: CodexKtxLlmRuntimeDeps) {
+    this.runner = deps.runner ?? new CodexSdkCliRunner();
+    this.logger = deps.logger ?? noopLogger;
+  }
+
+  async generateText(input: KtxGenerateTextInput): Promise {
+    const startedAt = Date.now();
+    const model = modelForRole(this.deps.modelSlots, input.role);
+    const mcp = await mcpForTools({
+      projectDir: this.deps.projectDir,
+      toolSet: input.tools,
+      startMcpServer: this.deps.startMcpServer,
+    });
+    try {
+      const config = buildCodexRuntimeConfig({
+        model,
+        ...(mcp
+          ? {
+              mcp: {
+                url: mcp.url,
+                bearerTokenEnvVar: mcp.bearerTokenEnvVar,
+                bearerToken: mcp.bearerToken,
+                toolNames: runtimeToolNames(input.tools),
+              },
+            }
+          : {}),
+      });
+      const collected = await collectEvents(
+        await this.runner.runStreamed({
+          projectDir: this.deps.projectDir,
+          model,
+          prompt: promptWithSystem(input.system, input.prompt),
+          configOverrides: config.configOverrides,
+          env: config.env,
+        }),
+      );
+      const summary = summarizeCodexExecEvents(collected.events, { startedAt });
+      input.onMetrics?.(metrics(summary, startedAt));
+      return assertSuccessfulText(summary, collected.streamError);
+    } finally {
+      await mcp?.close();
+    }
+  }
+
+  async generateObject>(
+    input: KtxGenerateObjectInput,
+  ): Promise {
+    const startedAt = Date.now();
+    const model = modelForRole(this.deps.modelSlots, input.role);
+    const mcp = await mcpForTools({
+      projectDir: this.deps.projectDir,
+      toolSet: input.tools,
+      startMcpServer: this.deps.startMcpServer,
+    });
+    try {
+      const config = buildCodexRuntimeConfig({
+        model,
+        ...(mcp
+          ? {
+              mcp: {
+                url: mcp.url,
+                bearerTokenEnvVar: mcp.bearerTokenEnvVar,
+                bearerToken: mcp.bearerToken,
+                toolNames: runtimeToolNames(input.tools),
+              },
+            }
+          : {}),
+      });
+      const collected = await collectEvents(
+        await this.runner.runStreamed({
+          projectDir: this.deps.projectDir,
+          model,
+          prompt: promptWithSystem(input.system, input.prompt),
+          configOverrides: config.configOverrides,
+          env: config.env,
+          outputSchema: z.toJSONSchema(input.schema, { target: 'draft-7' }) as Record,
+        }),
+      );
+      const summary = summarizeCodexExecEvents(collected.events, { startedAt });
+      input.onMetrics?.(metrics(summary, startedAt));
+      return parseStructuredOutput(input.schema, assertSuccessfulText(summary, collected.streamError));
+    } finally {
+      await mcp?.close();
+    }
+  }
+
+  async runAgentLoop(params: RunLoopParams): Promise {
+    const startedAt = Date.now();
+    const model = modelForRole(this.deps.modelSlots, params.modelRole);
+    let mcp: CodexRuntimeMcpServerHandle | undefined;
+    try {
+      mcp = await mcpForTools({
+        projectDir: this.deps.projectDir,
+        toolSet: params.toolSet,
+        startMcpServer: this.deps.startMcpServer,
+      });
+      const config = buildCodexRuntimeConfig({
+        model,
+        ...(mcp
+          ? {
+              mcp: {
+                url: mcp.url,
+                bearerTokenEnvVar: mcp.bearerTokenEnvVar,
+                bearerToken: mcp.bearerToken,
+                toolNames: runtimeToolNames(params.toolSet),
+              },
+            }
+          : {}),
+      });
+      const abortController = new AbortController();
+      const onStep = async (stepIndex: number): Promise => {
+        try {
+          await params.onStepFinish?.({ stepIndex, stepBudget: params.stepBudget });
+        } catch (error) {
+          this.logger.warn(
+            `[codex-runner] onStepFinish callback threw; ignoring: ${error instanceof Error ? error.message : String(error)}`,
+          );
+        }
+      };
+      const collected = await collectEvents(
+        await this.runner.runStreamed({
+          projectDir: this.deps.projectDir,
+          model,
+          prompt: promptWithSystem(params.systemPrompt, params.userPrompt),
+          configOverrides: config.configOverrides,
+          env: config.env,
+          signal: abortController.signal,
+        }),
+        { stepBudget: params.stepBudget, abortController, onStep },
+      );
+      const summary = summarizeCodexExecEvents(collected.events, { startedAt });
+      const error = summaryError(summary, collected.streamError);
+      const stopReason = collected.budgetExceeded ? 'budget' : error ? 'error' : summary.stopReason;
+      return {
+        stopReason,
+        ...(stopReason === 'error' && error ? { error } : {}),
+        metrics: {
+          totalMs: Date.now() - startedAt,
+          usage: summary.usage,
+          stepCount: summary.stepCount,
+          stepBoundariesMs: summary.stepBoundariesMs,
+        },
+      };
+    } catch (error) {
+      const err = error instanceof Error ? error : new Error(String(error));
+      return {
+        stopReason: 'error',
+        error: err,
+        metrics: { totalMs: Date.now() - startedAt, usage: {}, stepCount: 0, stepBoundariesMs: [] },
+      };
+    } finally {
+      await mcp?.close();
+    }
+  }
+}
+
+// A rejected model is not an auth failure: Codex authenticated, connected, and
+// the API refused the model id. These markers come from the API error envelope
+// (e.g. "model is not supported", "invalid_request_error").
+const MODEL_UNAVAILABLE_MARKERS =
+  /\bnot supported\b|\bnot available\b|\bdoes not exist\b|invalid_request_error|\bunknown model\b|\bunsupported model\b/i;
+
+function describeCodexProbeFailure(model: string, message: string): { message: string; fix: string } {
+  if (MODEL_UNAVAILABLE_MARKERS.test(message)) {
+    const fix = `Run \`codex\` to see the models your account supports, then set llm.models.default in ktx.yaml (or rerun \`ktx setup\`).`;
+    return {
+      message: `Codex is authenticated, but the configured model "${model}" is not available for this Codex account. ${fix} Details: ${message}`,
+      fix,
+    };
+  }
+  const fix = `Authenticate Codex locally with the Codex CLI, verify the Codex CLI is installed, then rerun setup or \`ktx status\`.`;
+  return {
+    message: `Codex authentication is not usable. ${fix} Details: ${message}`,
+    fix,
+  };
+}
+
+export async function runCodexAuthProbe(input: {
+  projectDir: string;
+  model: string;
+  runner?: CodexSdkRunner;
+}): Promise<{ ok: true } | { ok: false; message: string; fix: string }> {
+  let model: string;
+  try {
+    model = resolveCodexModel(input.model);
+  } catch (error) {
+    return {
+      ok: false,
+      message: error instanceof Error ? error.message : String(error),
+      fix: 'Set llm.models.default in ktx.yaml to a supported codex model (codex, default, or a gpt-* / codex-* id), or rerun `ktx setup`.',
+    };
+  }
+
+  const runtime = new CodexKtxLlmRuntime({
+    projectDir: input.projectDir,
+    modelSlots: { default: model },
+    ...(input.runner ? { runner: input.runner } : {}),
+  });
+  try {
+    await runtime.generateText({ role: 'default', prompt: 'Reply with exactly: ok' });
+    return { ok: true };
+  } catch (error) {
+    const message = error instanceof Error ? error.message : String(error);
+    return { ok: false, ...describeCodexProbeFailure(model, message) };
+  }
+}
diff --git a/packages/cli/src/context/llm/codex-sdk-runner.ts b/packages/cli/src/context/llm/codex-sdk-runner.ts
new file mode 100644
index 00000000..58170b3a
--- /dev/null
+++ b/packages/cli/src/context/llm/codex-sdk-runner.ts
@@ -0,0 +1,96 @@
+import { Codex, type CodexOptions, type ThreadOptions, type TurnOptions } from '@openai/codex-sdk';
+
+export interface CodexSdkRunnerInput {
+  projectDir: string;
+  model: string;
+  prompt: string;
+  configOverrides?: Record;
+  env?: Record;
+  outputSchema?: Record;
+  signal?: AbortSignal;
+}
+
+export interface CodexSdkRunner {
+  runStreamed(input: CodexSdkRunnerInput): Promise>;
+}
+
+type CodexThread = {
+  runStreamed(input: string, turnOptions?: TurnOptions): Promise<{ events: AsyncIterable }>;
+};
+
+type CodexClient = {
+  startThread(options: ThreadOptions): CodexThread;
+};
+
+type CodexConstructor = new (options?: CodexOptions) => CodexClient;
+
+export interface CodexSdkCliRunnerOptions {
+  envBase?: NodeJS.ProcessEnv;
+  codexPathOverride?: string;
+}
+
+const CODEX_ENV_ALLOWLIST = new Set([
+  'HOME',
+  'USERPROFILE',
+  'APPDATA',
+  'LOCALAPPDATA',
+  'XDG_CONFIG_HOME',
+  'CODEX_HOME',
+  'CODEX_API_KEY',
+  'OPENAI_API_KEY',
+  'PATH',
+  'Path',
+  'SYSTEMROOT',
+  'COMSPEC',
+  'TMPDIR',
+  'TMP',
+  'TEMP',
+  'SSL_CERT_FILE',
+  'SSL_CERT_DIR',
+  'NODE_EXTRA_CA_CERTS',
+  'HTTPS_PROXY',
+  'HTTP_PROXY',
+  'ALL_PROXY',
+  'NO_PROXY',
+]);
+
+function buildCodexSdkEnv(baseEnv: NodeJS.ProcessEnv, overrides: Record | undefined): Record {
+  const env: Record = {};
+  for (const key of CODEX_ENV_ALLOWLIST) {
+    const value = baseEnv[key];
+    if (typeof value === 'string') {
+      env[key] = value;
+    }
+  }
+  return { ...env, ...(overrides ?? {}) };
+}
+
+export class CodexSdkCliRunner implements CodexSdkRunner {
+  constructor(private readonly options: CodexSdkCliRunnerOptions = {}) {}
+
+  async runStreamed(input: CodexSdkRunnerInput): Promise> {
+    const CodexClass = Codex as CodexConstructor;
+    const codex = new CodexClass({
+      ...(input.configOverrides ? { config: input.configOverrides as CodexOptions['config'] } : {}),
+      env: buildCodexSdkEnv(this.options.envBase ?? process.env, input.env),
+      ...(this.options.codexPathOverride ? { codexPathOverride: this.options.codexPathOverride } : {}),
+    });
+    const thread = codex.startThread({
+      workingDirectory: input.projectDir,
+      skipGitRepoCheck: true,
+      model: input.model,
+      sandboxMode: 'read-only',
+      webSearchMode: 'disabled',
+      approvalPolicy: 'never',
+    });
+    const turnOptions: TurnOptions = {
+      ...(input.outputSchema ? { outputSchema: input.outputSchema } : {}),
+      ...(input.signal ? { signal: input.signal } : {}),
+    };
+    const streamed = await thread.runStreamed(
+      input.prompt,
+      Object.keys(turnOptions).length > 0 ?
turnOptions : undefined, + ); + return streamed.events; + } +} diff --git a/packages/cli/src/context/llm/local-config.ts b/packages/cli/src/context/llm/local-config.ts index c64a85cf..58bd29a5 100644 --- a/packages/cli/src/context/llm/local-config.ts +++ b/packages/cli/src/context/llm/local-config.ts @@ -5,6 +5,7 @@ import { resolveKtxConfigReference } from '../core/config-reference.js'; import type { KtxProjectEmbeddingConfig, KtxProjectLlmConfig } from '../project/config.js'; import { AiSdkKtxLlmRuntime } from './ai-sdk-runtime.js'; import { ClaudeCodeKtxLlmRuntime } from './claude-code-runtime.js'; +import { CodexKtxLlmRuntime } from './codex-runtime.js'; import type { KtxLlmRuntimePort } from './runtime-port.js'; interface LocalConfigDeps { @@ -13,6 +14,7 @@ interface LocalConfigDeps { createKtxLlmProvider?: typeof createKtxLlmProvider; createKtxEmbeddingProvider?: typeof createKtxEmbeddingProvider; createClaudeCodeRuntime?: (deps: ConstructorParameters[0]) => KtxLlmRuntimePort; + createCodexRuntime?: (deps: ConstructorParameters[0]) => KtxLlmRuntimePort; createAiSdkRuntime?: (deps: { llmProvider: KtxLlmProvider }) => KtxLlmRuntimePort; } @@ -104,7 +106,7 @@ export function createLocalKtxLlmProviderFromConfig( deps: LocalConfigDeps = {}, ): KtxLlmProvider | null { const resolved = resolveLocalKtxLlmConfig(config, deps.env ?? process.env); - if (!resolved || resolved.backend === 'claude-code') { + if (!resolved || resolved.backend === 'claude-code' || resolved.backend === 'codex') { return null; } return (deps.createKtxLlmProvider ?? createKtxLlmProvider)(resolved); @@ -129,6 +131,16 @@ export function createLocalKtxLlmRuntimeFromConfig( env: deps.env, }); } + if (resolved.backend === 'codex') { + const projectDir = deps.projectDir; + if (!projectDir) { + throw new Error('projectDir is required when creating the codex LLM runtime'); + } + return (deps.createCodexRuntime ?? 
((runtimeDeps) => new CodexKtxLlmRuntime(runtimeDeps)))({ + projectDir, + modelSlots: resolved.modelSlots, + }); + } const llmProvider = (deps.createKtxLlmProvider ?? createKtxLlmProvider)(resolved); return (deps.createAiSdkRuntime ?? ((runtimeDeps) => new AiSdkKtxLlmRuntime(runtimeDeps)))({ llmProvider }); } diff --git a/packages/cli/src/context/project/config.ts b/packages/cli/src/context/project/config.ts index a8d38d1d..cbea79b6 100644 --- a/packages/cli/src/context/project/config.ts +++ b/packages/cli/src/context/project/config.ts @@ -3,7 +3,7 @@ import YAML from 'yaml'; import * as z from 'zod'; import { connectionConfigSchema } from './driver-schemas.js'; -const KTX_LLM_BACKENDS = ['none', 'anthropic', 'vertex', 'gateway', 'claude-code'] as const; +const KTX_LLM_BACKENDS = ['none', 'anthropic', 'vertex', 'gateway', 'claude-code', 'codex'] as const; const KTX_EMBEDDING_BACKENDS = ['none', 'openai', 'sentence-transformers'] as const; const KTX_PROMPT_CACHE_TTLS = ['5m', '1h'] as const; const KTX_ENRICHMENT_MODES = ['none', 'deterministic', 'llm'] as const; @@ -38,7 +38,7 @@ const llmProviderSchema = z .enum(KTX_LLM_BACKENDS) .default('none') .describe( - 'LLM provider backend. "none" disables LLM features; "anthropic" / "vertex" / "gateway" require the matching nested credentials block; "claude-code" uses the local Claude Code session.', + 'LLM provider backend. 
"none" disables LLM features; "anthropic" / "vertex" / "gateway" require the matching nested credentials block; "claude-code" uses the local Claude Code session; "codex" uses the local Codex session.', ), vertex: vertexProviderSchema.optional().describe('Vertex AI credentials, used when backend is "vertex".'), anthropic: apiCredentialsSchema.optional().describe('Anthropic API credentials, used when backend is "anthropic".'), diff --git a/packages/cli/src/llm/types.ts b/packages/cli/src/llm/types.ts index 3f7f67e2..a190b1c0 100644 --- a/packages/cli/src/llm/types.ts +++ b/packages/cli/src/llm/types.ts @@ -3,7 +3,7 @@ import type { LanguageModel, TelemetrySettings, ToolCallRepairFunction, ToolSet export const KTX_MODEL_ROLES = ['default', 'triage', 'candidateExtraction', 'curator', 'reconcile', 'repair'] as const; export type KtxModelRole = (typeof KTX_MODEL_ROLES)[number]; -type KtxLlmBackend = 'anthropic' | 'vertex' | 'gateway' | 'claude-code'; +type KtxLlmBackend = 'anthropic' | 'vertex' | 'gateway' | 'claude-code' | 'codex'; export type KtxPromptCacheTtl = '5m' | '1h'; type KtxJsonValue = diff --git a/packages/cli/src/setup-models.ts b/packages/cli/src/setup-models.ts index 041eef5c..8e8cf30b 100644 --- a/packages/cli/src/setup-models.ts +++ b/packages/cli/src/setup-models.ts @@ -3,6 +3,9 @@ import { writeFile } from 'node:fs/promises'; import { promisify } from 'node:util'; import { resolveLocalKtxLlmConfig } from './context/llm/local-config.js'; import { runClaudeCodeAuthProbe } from './context/llm/claude-code-runtime.js'; +import { formatCodexIsolationWarning } from './context/llm/codex-isolation.js'; +import { runCodexAuthProbe } from './context/llm/codex-runtime.js'; +import { DEFAULT_CODEX_MODEL } from './context/llm/codex-models.js'; import { resolveKtxConfigReference } from './context/core/config-reference.js'; import { type KtxProjectConfig, type KtxProjectLlmConfig, serializeKtxProjectConfig } from './context/project/config.js'; import { loadKtxProject 
} from './context/project/project.js'; @@ -56,7 +59,7 @@ export interface AnthropicModelChoice { recommended: boolean; } -export type KtxSetupLlmBackend = 'anthropic' | 'vertex' | 'claude-code'; +export type KtxSetupLlmBackend = 'anthropic' | 'vertex' | 'claude-code' | 'codex'; /** @internal */ export interface KtxSetupModelPromptAdapter { @@ -82,6 +85,7 @@ export interface KtxSetupModelDeps { model: string; env?: NodeJS.ProcessEnv; }) => Promise<{ ok: true } | { ok: false; message: string }>; + codexAuthProbe?: (input: { projectDir: string; model: string }) => Promise<{ ok: true } | { ok: false; message: string }>; readGcloudProject?: () => Promise; listGcloudProjects?: () => Promise; spinner?: () => KtxCliSpinner; @@ -110,6 +114,20 @@ const CLAUDE_CODE_MODELS: AnthropicModelChoice[] = [ { id: 'haiku', label: 'Claude Haiku', recommended: false }, ]; +// Curated Codex models from OpenAI's current lineup that work under both +// ChatGPT-account (subscription) and API-key auth. Intentionally omitted: +// the `*-codex` ids (e.g. gpt-5.3-codex, gpt-5.2-codex) are API-key-only and +// fail on ChatGPT-account auth, and gpt-5.3-codex-spark is a ChatGPT-Pro-only +// research preview. Codex resolves real availability per account at runtime +// (its binary remote-fetches the model list), so this is a convenience +// shortlist only — the manual-entry option accepts any id your account's +// `codex` picker exposes, and the auth probe reports an unsupported choice. 
+const CODEX_MODELS: AnthropicModelChoice[] = [ + { id: 'gpt-5.5', label: 'GPT-5.5', recommended: true }, + { id: 'gpt-5.4', label: 'GPT-5.4', recommended: false }, + { id: 'gpt-5.4-mini', label: 'GPT-5.4 mini', recommended: false }, +]; + const HIDDEN_ANTHROPIC_MODEL_PATTERNS = [ /^claude-sonnet-4$/i, /^claude-opus-4$/i, @@ -272,7 +290,12 @@ export function isKtxSetupLlmConfigReady(config: KtxProjectLlmConfig): boolean { return typeof resolved.vertex?.location === 'string' && resolved.vertex.location.trim().length > 0; } - return resolved.backend === 'anthropic' || resolved.backend === 'gateway' || resolved.backend === 'claude-code'; + return ( + resolved.backend === 'anthropic' || + resolved.backend === 'gateway' || + resolved.backend === 'claude-code' || + resolved.backend === 'codex' + ); } function hasUsableConfiguredLlm(config: KtxProjectConfig): boolean { @@ -284,7 +307,8 @@ function buildProjectLlmConfig( provider: | { backend: 'anthropic'; credentialRef: string } | { backend: 'vertex'; vertex: { project?: string; location: string } } - | { backend: 'claude-code' }, + | { backend: 'claude-code' } + | { backend: 'codex' }, model: string, ): KtxProjectLlmConfig { if (provider.backend === 'claude-code') { @@ -295,6 +319,14 @@ function buildProjectLlmConfig( }; } + if (provider.backend === 'codex') { + return { + provider: { backend: 'codex' }, + models: { ...existing.models, default: model }, + promptCaching: existing.promptCaching, + }; + } + if (provider.backend === 'vertex') { return { provider: { @@ -515,6 +547,7 @@ async function chooseBackend( message: 'Which LLM provider should KTX use?', options: [ { value: 'claude-code', label: 'Claude subscription (Pro/Max)' }, + { value: 'codex', label: 'Codex subscription' }, { value: 'anthropic', label: 'Anthropic API key' }, { value: 'vertex', label: 'Google Vertex AI for Anthropic Claude' }, { value: 'back', label: 'Back' }, @@ -525,7 +558,7 @@ async function chooseBackend( } return { status: 'ready', - backend: 
choice === 'vertex' || choice === 'claude-code' ? choice : 'anthropic', + backend: choice === 'vertex' || choice === 'claude-code' || choice === 'codex' ? choice : 'anthropic', prompted: true, }; } @@ -884,12 +917,51 @@ async function chooseClaudeCodeModel(args: KtxSetupModelArgs, deps: KtxSetupMode return { status: 'ready', model: choice }; } +async function chooseCodexModel(args: KtxSetupModelArgs, deps: KtxSetupModelDeps): Promise { + const providedModel = requestedModel(args); + if (providedModel) { + return { status: 'ready', model: providedModel }; + } + if (args.inputMode === 'disabled') { + return { status: 'ready', model: DEFAULT_CODEX_MODEL }; + } + + const prompts = deps.prompts ?? createPromptAdapter(); + const choice = await prompts.select({ + message: `Which Codex model should KTX use?\n\n${ANTHROPIC_MODEL_PROMPT_CONTEXT}`, + options: [ + ...CODEX_MODELS.map((model) => ({ + value: model.id, + label: model.label, + ...(model.recommended ? { hint: 'recommended' } : {}), + })), + { value: 'manual', label: 'Enter a Codex model ID manually' }, + { value: 'back', label: 'Back' }, + ], + }); + if (choice === 'back') { + return { status: 'back' }; + } + if (choice === 'manual') { + const manual = await prompts.text({ + message: withTextInputNavigation('Codex model ID'), + placeholder: CODEX_MODELS.find((model) => model.recommended)?.id ?? CODEX_MODELS[0]?.id, + }); + if (manual === undefined) { + return { status: 'back' }; + } + return manual.trim() ? 
{ status: 'ready', model: manual.trim() } : { status: 'missing-input' }; + } + return { status: 'ready', model: choice }; +} + async function persistLlmConfig( projectDir: string, provider: | { backend: 'anthropic'; credentialRef: string } | { backend: 'vertex'; vertex: { project?: string; location: string } } - | { backend: 'claude-code' }, + | { backend: 'claude-code' } + | { backend: 'codex' }, model: string, ): Promise { const project = await loadKtxProject({ projectDir }); @@ -1031,6 +1103,32 @@ export async function runKtxSetupAnthropicModelStep( return { status: 'ready', projectDir: args.projectDir }; } + if (backendChoice.backend === 'codex') { + const model = await chooseCodexModel(backendArgs, deps); + if (model.status === 'back' && backendChoice.prompted) { + attemptArgs = buildInteractiveRetryArgs(args); + continue; + } + if (model.status === 'invalid-credential') { + return { status: 'failed', projectDir: args.projectDir }; + } + if (model.status !== 'ready') { + return { status: model.status, projectDir: args.projectDir }; + } + const probe = deps.codexAuthProbe ?? runCodexAuthProbe; + const health = await probe({ projectDir: args.projectDir, model: model.model }); + if (!health.ok) { + io.stderr.write(`${health.message}\n`); + return { status: 'failed', projectDir: args.projectDir }; + } + // Prefix the clack gutter so the warning sits inside the setup frame + // instead of breaking out of it; kept on stderr for scripted runs. 
+ io.stderr.write(`│ ${formatCodexIsolationWarning()}\n`); + await persistLlmConfig(args.projectDir, { backend: 'codex' }, model.model); + io.stdout.write(`│ LLM ready: yes (codex, ${model.model})\n`); + return { status: 'ready', projectDir: args.projectDir }; + } + const credential = await chooseCredentialRef(backendArgs, io, deps); if (credential.status === 'back' && backendChoice.prompted) { attemptArgs = buildInteractiveRetryArgs(args); diff --git a/packages/cli/src/status-project.ts b/packages/cli/src/status-project.ts index 097f4091..ff7b98f4 100644 --- a/packages/cli/src/status-project.ts +++ b/packages/cli/src/status-project.ts @@ -1,6 +1,11 @@ import { stat as statAsync, readdir as readdirAsync } from 'node:fs/promises'; import { basename, join } from 'node:path'; import { runClaudeCodeAuthProbe } from './context/llm/claude-code-runtime.js'; +import { + CODEX_ISOLATION_WARNING, + CODEX_ISOLATION_WARNING_FIX, +} from './context/llm/codex-isolation.js'; +import { runCodexAuthProbe } from './context/llm/codex-runtime.js'; import type { KtxConfigIssue, KtxProjectConfig, KtxProjectConnectionConfig, KtxProjectEmbeddingConfig, KtxProjectLlmConfig } from './context/project/config.js'; import type { KtxLocalProject } from './context/project/project.js'; import { ktxLocalStateDbPath } from './context/project/local-state-db.js'; @@ -94,6 +99,11 @@ type ClaudeCodeAuthProbe = (input: { env?: NodeJS.ProcessEnv; }) => Promise<{ ok: true } | { ok: false; message: string }>; +type CodexAuthProbe = (input: { + projectDir: string; + model: string; +}) => Promise<{ ok: true } | { ok: false; message: string; fix: string }>; + const PROJECT_READY_COMMANDS = KTX_NEXT_STEP_DIRECT_COMMANDS.map((step) => step.command); interface LocalStatsIngestPerConnection { @@ -194,6 +204,7 @@ async function buildLlmStatus( projectDir: string; env: NodeJS.ProcessEnv; claudeCodeAuthProbe?: ClaudeCodeAuthProbe; + codexAuthProbe?: CodexAuthProbe; fast?: boolean; useSpinner?: boolean; }, @@ -210,6 
+221,18 @@ async function buildLlmStatus( fix: 'Run: ktx setup (choose an LLM provider)', }; } + // The runtime (resolveModelSlots) hard-requires llm.models.default for every + // non-none backend; without it ingest/scan/memory throw. Report that here so + // status never marks a project ready that the runtime would refuse to run. + if (!model || model.trim().length === 0) { + return { + backend, + model, + status: 'fail', + detail: `llm.models.default is required for backend "${backend}"`, + fix: 'Set llm.models.default in ktx.yaml, then rerun `ktx status` (or rerun `ktx setup`).', + }; + } if (backend === 'anthropic') { const ref = config.provider.anthropic?.api_key; const resolved = resolveRef(ref, env); @@ -251,7 +274,7 @@ async function buildLlmStatus( }; } if (backend === 'claude-code') { - const modelName = model ?? 'sonnet'; + const modelName = model; if (options.fast === true) { return { backend, @@ -280,6 +303,36 @@ async function buildLlmStatus( fix: 'Authenticate Claude Code locally with the Claude Code CLI, then rerun `ktx status`.', }; } + if (backend === 'codex') { + const modelName = model; + if (options.fast === true) { + return { + backend, + model: modelName, + status: 'skipped', + detail: 'auth probe skipped (--fast)', + }; + } + const probe = options.codexAuthProbe ?? 
runCodexAuthProbe; + const auth = await withSpinner(options.useSpinner === true, 'Probing Codex authentication', () => + probe({ projectDir: options.projectDir, model: modelName }), + ); + if (auth.ok) { + return { + backend, + model: modelName, + status: 'ok', + detail: 'local Codex session authenticated', + }; + } + return { + backend, + model: modelName, + status: 'fail', + detail: auth.message, + fix: auth.fix, + }; + } return { backend, model, status: 'warn', detail: 'unknown LLM backend' }; } @@ -572,6 +625,13 @@ function buildWarnings( }); } + if (llm.backend === 'codex') { + warnings.push({ + message: CODEX_ISOLATION_WARNING, + fix: CODEX_ISOLATION_WARNING_FIX, + }); + } + return warnings; } @@ -634,6 +694,7 @@ export interface BuildProjectStatusOptions { env?: NodeJS.ProcessEnv; queryHistoryReadinessProbe?: HistoricSqlReadinessProbe; claudeCodeAuthProbe?: ClaudeCodeAuthProbe; + codexAuthProbe?: CodexAuthProbe; configIssues?: KtxConfigIssue[]; fast?: boolean; useSpinner?: boolean; @@ -882,6 +943,7 @@ export async function buildProjectStatus(project: KtxLocalProject, options: Buil projectDir: project.projectDir, env, claudeCodeAuthProbe: options.claudeCodeAuthProbe, + codexAuthProbe: options.codexAuthProbe, fast: options.fast, useSpinner: options.useSpinner, }); diff --git a/packages/cli/test/context/ingest/local-bundle-runtime.test.ts b/packages/cli/test/context/ingest/local-bundle-runtime.test.ts index 64fad53a..9d1ec9b4 100644 --- a/packages/cli/test/context/ingest/local-bundle-runtime.test.ts +++ b/packages/cli/test/context/ingest/local-bundle-runtime.test.ts @@ -77,9 +77,10 @@ describe('createLocalBundleIngestRuntime', () => { }), ).toThrow( [ - 'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, or claude-code, or an injected agentRunner.', - 'Configure a local Claude Code session or API-backed LLM, then rerun ingest:', + 'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, claude-code, or codex, or an injected 
agentRunner.', + 'Configure a local Claude Code/Codex session or API-backed LLM, then rerun ingest:', ` ktx setup --project-dir ${project.projectDir} --llm-backend claude-code --no-input`, + ` ktx setup --project-dir ${project.projectDir} --llm-backend codex --llm-model gpt-5.5 --no-input`, ` ktx setup --project-dir ${project.projectDir} --llm-backend anthropic --anthropic-api-key-env ANTHROPIC_API_KEY --llm-model claude-sonnet-4-6 --no-input`, ].join('\n'), ); diff --git a/packages/cli/test/context/llm/codex-exec-events.test.ts b/packages/cli/test/context/llm/codex-exec-events.test.ts new file mode 100644 index 00000000..5edcfed8 --- /dev/null +++ b/packages/cli/test/context/llm/codex-exec-events.test.ts @@ -0,0 +1,188 @@ +import { describe, expect, it } from 'vitest'; +import { + parseCodexExecEventLine, + summarizeCodexExecEvents, +} from '../../../src/context/llm/codex-exec-events.js'; + +describe('Codex exec event parsing', () => { + it('uses the completed turn as one step when no MCP tools run', () => { + const summary = summarizeCodexExecEvents( + [ + { type: 'thread.started', thread_id: 'thr_1' }, + { type: 'turn.started' }, + { type: 'item.completed', item: { id: 'item_1', type: 'agent_message', text: 'hello from codex' } }, + { + type: 'turn.completed', + usage: { + input_tokens: 12, + cached_input_tokens: 4, + output_tokens: 5, + reasoning_output_tokens: 2, + }, + }, + ], + { startedAt: 100, now: () => 125 }, + ); + + expect(summary).toEqual({ + finalText: 'hello from codex', + stopReason: 'natural', + usage: { inputTokens: 12, outputTokens: 5, totalTokens: 17 }, + stepCount: 1, + stepBoundariesMs: [25], + toolCallCount: 0, + toolFailures: [], + }); + }); + + it('uses completed MCP tool calls as loop steps', () => { + const offsets = [115, 140, 175]; + const summary = summarizeCodexExecEvents( + [ + { type: 'turn.started' }, + { + type: 'item.started', + item: { id: 'call_1', type: 'mcp_tool_call', server: 'ktx', tool: 'search', arguments: {}, status: 
'in_progress' },
+        },
+        {
+          type: 'item.completed',
+          item: { id: 'call_1', type: 'mcp_tool_call', server: 'ktx', tool: 'search', arguments: {}, status: 'completed' },
+        },
+        {
+          type: 'item.started',
+          item: { id: 'call_2', type: 'mcp_tool_call', server: 'ktx', tool: 'lookup', arguments: {}, status: 'in_progress' },
+        },
+        {
+          type: 'item.completed',
+          item: {
+            id: 'call_2',
+            type: 'mcp_tool_call',
+            server: 'ktx',
+            tool: 'lookup',
+            arguments: {},
+            status: 'failed',
+            error: { message: 'denied' },
+          },
+        },
+        { type: 'item.completed', item: { id: 'item_1', type: 'agent_message', text: 'done' } },
+        { type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1, cached_input_tokens: 0, reasoning_output_tokens: 0 } },
+      ],
+      { startedAt: 100, now: () => offsets.shift() ?? 175 },
+    );
+
+    expect(summary).toEqual({
+      finalText: 'done',
+      stopReason: 'natural',
+      usage: { inputTokens: 1, outputTokens: 1, totalTokens: 2 },
+      stepCount: 2,
+      stepBoundariesMs: [15, 40],
+      toolCallCount: 2,
+      toolFailures: ['lookup: denied'],
+    });
+  });
+
+  it('does not treat a completed MCP tool call as failed when Codex sends error: null', () => {
+    // Captured verbatim from a real @openai/codex-sdk run: successful tool calls
+    // carry `error: null` and `result` alongside `status: "completed"`.
+    const summary = summarizeCodexExecEvents([
+      { type: 'turn.started' },
+      {
+        type: 'item.started',
+        item: {
+          id: 'item_1',
+          type: 'mcp_tool_call',
+          server: 'ktx',
+          tool: 'echo_value',
+          arguments: { value: 'ktx_codex_tool_ok' },
+          result: null,
+          error: null,
+          status: 'in_progress',
+        },
+      },
+      {
+        type: 'item.completed',
+        item: {
+          id: 'item_1',
+          type: 'mcp_tool_call',
+          server: 'ktx',
+          tool: 'echo_value',
+          arguments: { value: 'ktx_codex_tool_ok' },
+          result: { content: [{ type: 'text', text: 'echo:ktx_codex_tool_ok' }], structured_content: null },
+          error: null,
+          status: 'completed',
+        },
+      },
+      { type: 'item.completed', item: { id: 'm1', type: 'agent_message', text: 'done' } },
+      { type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1 } },
+    ]);
+
+    expect(summary.toolFailures).toEqual([]);
+    expect(summary.toolCallCount).toBe(1);
+  });
+
+  it('counts built-in command executions as loop steps without failing the loop', () => {
+    const offsets = [110, 130];
+    const summary = summarizeCodexExecEvents(
+      [
+        { type: 'turn.started' },
+        { type: 'item.completed', item: { id: 'c1', type: 'command_execution', command: 'ls', status: 'completed', exit_code: 0 } },
+        { type: 'item.completed', item: { id: 'c2', type: 'command_execution', command: 'cat missing', status: 'failed', exit_code: 1 } },
+        { type: 'item.completed', item: { id: 'm1', type: 'agent_message', text: 'done' } },
+        { type: 'turn.completed', usage: { input_tokens: 2, output_tokens: 1 } },
+      ],
+      { startedAt: 100, now: () => offsets.shift() ?? 130 },
+    );
+
+    expect(summary.stepCount).toBe(2);
+    expect(summary.stepBoundariesMs).toEqual([10, 30]);
+    // A non-zero command exit is normal agent exploration, not a runtime tool failure.
+    expect(summary.toolFailures).toEqual([]);
+    expect(summary.toolCallCount).toBe(0);
+  });
+
+  it('maps turn failures into error stop reason', () => {
+    const summary = summarizeCodexExecEvents([
+      { type: 'turn.started' },
+      { type: 'turn.failed', error: { message: 'Codex could not connect to required MCP server' } },
+    ]);
+
+    expect(summary.stopReason).toBe('error');
+    expect(summary.error?.message).toContain('Codex could not connect to required MCP server');
+  });
+
+  it('unwraps the Codex API error envelope into its human-readable message', () => {
+    // Codex serializes API errors as a JSON envelope inside the event message.
+    const apiError = JSON.stringify({
+      type: 'error',
+      status: 400,
+      error: {
+        type: 'invalid_request_error',
+        message: "The 'gpt-5.3-codex' model is not supported when using Codex with a ChatGPT account.",
+      },
+    });
+    const summary = summarizeCodexExecEvents([
+      { type: 'thread.started', thread_id: 'thr_1' },
+      { type: 'turn.started' },
+      { type: 'error', message: apiError },
+      { type: 'turn.failed', error: { message: apiError } },
+    ]);
+
+    expect(summary.stopReason).toBe('error');
+    expect(summary.error?.message).toBe(
+      "The 'gpt-5.3-codex' model is not supported when using Codex with a ChatGPT account.",
+    );
+  });
+
+  it('maps max-turns terminal reasons into budget stop reason when Codex emits one', () => {
+    const summary = summarizeCodexExecEvents([
+      { type: 'turn.started' },
+      { type: 'turn.completed', reason: 'max_turns', usage: { input_tokens: 1, output_tokens: 1 } },
+    ]);
+
+    expect(summary.stopReason).toBe('budget');
+  });
+
+  it('throws a clear error for malformed JSONL lines', () => {
+    expect(() => parseCodexExecEventLine('{not-json')).toThrow('Codex JSONL event stream was malformed');
+  });
+});
diff --git a/packages/cli/test/context/llm/codex-isolation.test.ts b/packages/cli/test/context/llm/codex-isolation.test.ts
new file mode 100644
index 00000000..0ef39ee3
--- /dev/null
+++ b/packages/cli/test/context/llm/codex-isolation.test.ts
@@ -0,0 +1,19 @@
+import { describe, expect, it } from 'vitest';
+import {
+  CODEX_ISOLATION_WARNING,
+  CODEX_ISOLATION_WARNING_FIX,
+  formatCodexIsolationWarning,
+} from '../../../src/context/llm/codex-isolation.js';
+
+describe('Codex isolation warning', () => {
+  it('documents the enforced and unenforced Codex isolation boundaries', () => {
+    expect(CODEX_ISOLATION_WARNING).toContain('runtime MCP server to the current ktx tool set');
+    expect(CODEX_ISOLATION_WARNING).toContain('disables Codex web search');
+    expect(CODEX_ISOLATION_WARNING).toContain('may still load user Codex config');
+    expect(CODEX_ISOLATION_WARNING).toContain('built-in command execution');
+    expect(CODEX_ISOLATION_WARNING_FIX).toContain('claude-code');
+    expect(formatCodexIsolationWarning()).toBe(
+      `${CODEX_ISOLATION_WARNING} ${CODEX_ISOLATION_WARNING_FIX}`,
+    );
+  });
+});
diff --git a/packages/cli/test/context/llm/codex-mcp-runtime-server.test.ts b/packages/cli/test/context/llm/codex-mcp-runtime-server.test.ts
new file mode 100644
index 00000000..c793afb7
--- /dev/null
+++ b/packages/cli/test/context/llm/codex-mcp-runtime-server.test.ts
@@ -0,0 +1,73 @@
+import { describe, expect, it, vi } from 'vitest';
+import { z } from 'zod';
+import {
+  createCodexRuntimeMcpServer,
+  startCodexRuntimeMcpServer,
+} from '../../../src/context/llm/codex-mcp-runtime-server.js';
+
+describe('Codex runtime MCP server', () => {
+  it('registers runtime tools with markdown output', async () => {
+    const registered = new Map<
+      string,
+      {
+        config: { description?: string; inputSchema: unknown };
+        handler: (input: Record) => Promise;
+      }
+    >();
+    const server = createCodexRuntimeMcpServer({
+      server: {
+        registerTool(name, config, handler) {
+          registered.set(name, { config, handler });
+        },
+      },
+      toolSet: {
+        wiki_search: {
+          name: 'wiki_search',
+          description: 'Search the wiki',
+          inputSchema: z.object({ query: z.string() }),
+          execute: vi.fn(async ()
=> ({ markdown: 'result markdown', structured: { matches: 1 } })), + }, + }, + }); + + expect(server).toBeDefined(); + expect([...registered.keys()]).toEqual(['wiki_search']); + expect(registered.get('wiki_search')?.config).toMatchObject({ + description: 'Search the wiki', + }); + await expect(registered.get('wiki_search')?.handler({ query: 'revenue' })).resolves.toEqual({ + content: [{ type: 'text', text: 'result markdown' }], + structuredContent: { matches: 1 }, + }); + }); + + it('starts loopback HTTP MCP with a bearer token and reports the runtime URL', async () => { + const close = vi.fn(async () => undefined); + const runServer = vi.fn(async () => ({ + server: { address: () => ({ port: 4321 }) }, + close, + })); + + const handle = await startCodexRuntimeMcpServer({ + projectDir: '/tmp/ktx-project', + toolSet: {}, + runServer: runServer as never, + }); + + expect(handle.url).toBe('http://127.0.0.1:4321/mcp'); + expect(handle.bearerTokenEnvVar).toBe('KTX_CODEX_RUNTIME_MCP_TOKEN'); + expect(handle.bearerToken).toMatch(/^[a-f0-9]{64}$/); + expect(runServer).toHaveBeenCalledWith( + expect.objectContaining({ + projectDir: '/tmp/ktx-project', + host: '127.0.0.1', + port: 0, + token: handle.bearerToken, + allowedHosts: ['127.0.0.1', 'localhost'], + allowedOrigins: [], + }), + ); + await handle.close(); + expect(close).toHaveBeenCalled(); + }); +}); diff --git a/packages/cli/test/context/llm/codex-models.test.ts b/packages/cli/test/context/llm/codex-models.test.ts new file mode 100644 index 00000000..83a1e2c8 --- /dev/null +++ b/packages/cli/test/context/llm/codex-models.test.ts @@ -0,0 +1,17 @@ +import { describe, expect, it } from 'vitest'; +import { resolveCodexModel } from '../../../src/context/llm/codex-models.js'; + +describe('resolveCodexModel', () => { + it.each([ + ['codex', 'gpt-5.5'], + ['default', 'gpt-5.5'], + ['gpt-5.3-codex-spark', 'gpt-5.3-codex-spark'], + ['gpt-5.4', 'gpt-5.4'], + ])('maps %s to %s', (input, expected) => { + 
expect(resolveCodexModel(input)).toBe(expected); + }); + + it.each(['', ' ', 'sonnet', 'claude-sonnet-4-6'])('rejects %s', (input) => { + expect(() => resolveCodexModel(input)).toThrow('Unsupported Codex model'); + }); +}); diff --git a/packages/cli/test/context/llm/codex-runtime-config.test.ts b/packages/cli/test/context/llm/codex-runtime-config.test.ts new file mode 100644 index 00000000..97c80446 --- /dev/null +++ b/packages/cli/test/context/llm/codex-runtime-config.test.ts @@ -0,0 +1,43 @@ +import { describe, expect, it } from 'vitest'; +import { buildCodexRuntimeConfig } from '../../../src/context/llm/codex-runtime-config.js'; + +describe('buildCodexRuntimeConfig', () => { + it('builds generic config without SDK thread-option fields', () => { + expect(buildCodexRuntimeConfig({ model: 'gpt-5.3-codex' })).toEqual({ + configOverrides: { + history: { persistence: 'none' }, + }, + env: {}, + }); + }); + + it('adds only the temporary ktx MCP server and exact enabled tools', () => { + expect( + buildCodexRuntimeConfig({ + model: 'gpt-5.3-codex', + mcp: { + url: 'http://127.0.0.1:4567/mcp', + bearerTokenEnvVar: 'KTX_CODEX_RUNTIME_MCP_TOKEN', + bearerToken: 'secret-token', + toolNames: ['sl_read_source', 'wiki_search'], + }, + }), + ).toEqual({ + configOverrides: { + history: { persistence: 'none' }, + mcp_servers: { + ktx: { + url: 'http://127.0.0.1:4567/mcp', + bearer_token_env_var: 'KTX_CODEX_RUNTIME_MCP_TOKEN', + enabled_tools: ['sl_read_source', 'wiki_search'], + default_tools_approval_mode: 'approve', + required: true, + }, + }, + }, + env: { + KTX_CODEX_RUNTIME_MCP_TOKEN: 'secret-token', + }, + }); + }); +}); diff --git a/packages/cli/test/context/llm/codex-runtime.test.ts b/packages/cli/test/context/llm/codex-runtime.test.ts new file mode 100644 index 00000000..2d408543 --- /dev/null +++ b/packages/cli/test/context/llm/codex-runtime.test.ts @@ -0,0 +1,460 @@ +import { describe, expect, it, vi } from 'vitest'; +import { z } from 'zod'; +import { + 
CodexKtxLlmRuntime, + runCodexAuthProbe, +} from '../../../src/context/llm/codex-runtime.js'; + +async function* events(items: unknown[]) { + for (const item of items) { + yield item; + } +} + +function runner(items: unknown[]) { + return { + runStreamed: vi.fn(async () => events(items)), + }; +} + +/** Yields the given events, then throws — mirroring the SDK throwing on a non-zero codex exec exit. */ +function throwingRunner(items: unknown[], error: Error) { + return { + runStreamed: vi.fn(async () => + (async function* () { + for (const item of items) { + yield item; + } + throw error; + })(), + ), + }; +} + +const MODEL_UNSUPPORTED_API_ERROR = JSON.stringify({ + type: 'error', + status: 400, + error: { + type: 'invalid_request_error', + message: "The 'gpt-5.3-codex' model is not supported when using Codex with a ChatGPT account.", + }, +}); + +function budgetRunner() { + let observedSignal: AbortSignal | undefined; + return { + observedSignal: () => observedSignal, + runStreamed: vi.fn(async (input: { signal?: AbortSignal }) => { + observedSignal = input.signal; + return events([ + { type: 'turn.started' }, + { type: 'item.started', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'first', status: 'in_progress' } }, + { type: 'item.completed', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'first', status: 'completed' } }, + { type: 'item.started', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'second', status: 'in_progress' } }, + { type: 'item.completed', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'second', status: 'completed' } }, + { type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1 } }, + ]); + }), + }; +} + +describe('CodexKtxLlmRuntime', () => { + it('generates text with the role-selected model and metrics', async () => { + const onMetrics = vi.fn(); + const fakeRunner = runner([ + { type: 'turn.started' }, + { type: 'item.completed', item: { type: 'agent_message', text: 'hello' } }, + { type: 'turn.completed', 
usage: { input_tokens: 3, output_tokens: 4, total_tokens: 7 } }, + ]); + const runtime = new CodexKtxLlmRuntime({ + projectDir: '/tmp/project', + modelSlots: { default: 'codex', triage: 'gpt-5.4' }, + runner: fakeRunner, + }); + + await expect(runtime.generateText({ role: 'triage', system: 'system', prompt: 'prompt', onMetrics })).resolves.toBe('hello'); + expect(fakeRunner.runStreamed).toHaveBeenCalledWith( + expect.objectContaining({ + projectDir: '/tmp/project', + model: 'gpt-5.4', + prompt: 'system\n\nprompt', + }), + ); + expect(onMetrics).toHaveBeenCalledWith(expect.objectContaining({ usage: { inputTokens: 3, outputTokens: 4, totalTokens: 7 } })); + }); + + it('generates and validates structured output', async () => { + const fakeRunner = runner([ + { type: 'turn.started' }, + { type: 'item.completed', item: { type: 'agent_message', text: '{"answer":"yes"}' } }, + { type: 'turn.completed' }, + ]); + const runtime = new CodexKtxLlmRuntime({ + projectDir: '/tmp/project', + modelSlots: { default: 'codex' }, + runner: fakeRunner, + }); + + await expect( + runtime.generateObject({ + role: 'default', + prompt: 'json', + schema: z.object({ answer: z.string() }), + }), + ).resolves.toEqual({ answer: 'yes' }); + expect(fakeRunner.runStreamed).toHaveBeenCalledWith( + expect.objectContaining({ + outputSchema: expect.objectContaining({ type: 'object' }), + }), + ); + }); + + it('returns a structured-output error when Codex final text is invalid JSON', async () => { + const fakeRunner = runner([ + { type: 'turn.started' }, + { type: 'item.completed', item: { type: 'agent_message', text: 'not json' } }, + { type: 'turn.completed' }, + ]); + const runtime = new CodexKtxLlmRuntime({ + projectDir: '/tmp/project', + modelSlots: { default: 'codex' }, + runner: fakeRunner, + }); + + await expect( + runtime.generateObject({ + role: 'default', + prompt: 'json', + schema: z.object({ answer: z.string() }), + }), + ).rejects.toThrow('Codex structured output failed validation'); + }); 
+ + it('starts and closes a temporary MCP server for tool-backed agent loops', async () => { + const close = vi.fn(async () => undefined); + const startMcpServer = vi.fn(async () => ({ + url: 'http://127.0.0.1:4321/mcp', + bearerTokenEnvVar: 'KTX_CODEX_RUNTIME_MCP_TOKEN' as const, + bearerToken: 'token', + close, + })); + const fakeRunner = runner([ + { type: 'turn.started' }, + { type: 'item.started', item: { type: 'mcp_tool_call', name: 'wiki_search' } }, + { type: 'item.completed', item: { type: 'agent_message', text: 'done' } }, + { type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1, total_tokens: 2 } }, + ]); + const runtime = new CodexKtxLlmRuntime({ + projectDir: '/tmp/project', + modelSlots: { default: 'codex' }, + runner: fakeRunner, + startMcpServer, + }); + const onStepFinish = vi.fn(); + + const result = await runtime.runAgentLoop({ + modelRole: 'default', + systemPrompt: 'system', + userPrompt: 'user', + stepBudget: 5, + telemetryTags: {}, + onStepFinish, + toolSet: { + aliased_wiki_tool: { + name: 'wiki_search', + description: 'Search wiki', + inputSchema: z.object({ query: z.string() }), + execute: vi.fn(), + }, + }, + }); + + expect(result.stopReason).toBe('natural'); + expect(result.metrics).toMatchObject({ stepCount: 1, usage: { inputTokens: 1, outputTokens: 1, totalTokens: 2 } }); + expect(onStepFinish).toHaveBeenCalledWith({ stepIndex: 1, stepBudget: 5 }); + expect(startMcpServer).toHaveBeenCalledWith({ projectDir: '/tmp/project', toolSet: expect.any(Object) }); + expect(fakeRunner.runStreamed).toHaveBeenCalledWith( + expect.objectContaining({ + env: { KTX_CODEX_RUNTIME_MCP_TOKEN: 'token' }, + configOverrides: expect.objectContaining({ + mcp_servers: expect.objectContaining({ + ktx: expect.objectContaining({ + url: 'http://127.0.0.1:4321/mcp', + enabled_tools: ['wiki_search'], + required: true, + }), + }), + }), + }), + ); + expect(close).toHaveBeenCalled(); + }); + + it('returns error stop reason on turn failure', async () => { 
+ const runtime = new CodexKtxLlmRuntime({ + projectDir: '/tmp/project', + modelSlots: { default: 'codex' }, + runner: runner([{ type: 'turn.failed', error: { message: 'boom' } }]), + }); + + const result = await runtime.runAgentLoop({ + modelRole: 'default', + systemPrompt: 'system', + userPrompt: 'user', + stepBudget: 5, + telemetryTags: {}, + toolSet: {}, + }); + + expect(result.stopReason).toBe('error'); + expect(result.error?.message).toBe('boom'); + }); + + it('surfaces failed MCP tool calls as agent-loop errors', async () => { + const runtime = new CodexKtxLlmRuntime({ + projectDir: '/tmp/project', + modelSlots: { default: 'codex' }, + runner: runner([ + { type: 'turn.started' }, + { type: 'item.started', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'search', status: 'in_progress' } }, + { + type: 'item.completed', + item: { + type: 'mcp_tool_call', + server: 'ktx', + tool: 'search', + status: 'failed', + error: { message: 'denied' }, + }, + }, + { type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1 } }, + ]), + }); + + const result = await runtime.runAgentLoop({ + modelRole: 'default', + systemPrompt: 'system', + userPrompt: 'user', + stepBudget: 5, + telemetryTags: {}, + toolSet: {}, + }); + + expect(result.stopReason).toBe('error'); + expect(result.error?.message).toBe('Codex runtime tool call failed: search: denied'); + expect(result.metrics).toMatchObject({ + stepCount: 1, + usage: { inputTokens: 1, outputTokens: 1, totalTokens: 2 }, + }); + }); + + it('returns budget and aborts the Codex stream when local MCP step budget is reached', async () => { + const fakeRunner = budgetRunner(); + const runtime = new CodexKtxLlmRuntime({ + projectDir: '/tmp/project', + modelSlots: { default: 'codex' }, + runner: fakeRunner, + }); + const onStepFinish = vi.fn(); + + const result = await runtime.runAgentLoop({ + modelRole: 'default', + systemPrompt: 'system', + userPrompt: 'user', + stepBudget: 1, + telemetryTags: {}, + onStepFinish, + toolSet: 
{ + first: { + name: 'first', + description: 'First tool', + inputSchema: z.object({}), + execute: vi.fn(), + }, + }, + }); + + expect(result.stopReason).toBe('budget'); + expect(result.error).toBeUndefined(); + expect(result.metrics).toMatchObject({ stepCount: 1 }); + expect(onStepFinish).toHaveBeenCalledTimes(1); + expect(onStepFinish).toHaveBeenCalledWith({ stepIndex: 1, stepBudget: 1 }); + expect(fakeRunner.observedSignal()?.aborted).toBe(true); + }); + + it('counts built-in command_execution steps against the budget and aborts the stream', async () => { + let observedSignal: AbortSignal | undefined; + const fakeRunner = { + observedSignal: () => observedSignal, + runStreamed: vi.fn(async (input: { signal?: AbortSignal }) => { + observedSignal = input.signal; + return events([ + { type: 'turn.started' }, + { type: 'item.started', item: { type: 'command_execution', command: 'ls', status: 'in_progress' } }, + { type: 'item.completed', item: { type: 'command_execution', command: 'ls', status: 'completed', exit_code: 0 } }, + { type: 'item.started', item: { type: 'command_execution', command: 'cat a', status: 'in_progress' } }, + { type: 'item.completed', item: { type: 'command_execution', command: 'cat a', status: 'completed', exit_code: 0 } }, + { type: 'item.completed', item: { type: 'command_execution', command: 'cat b', status: 'completed', exit_code: 0 } }, + { type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1 } }, + ]); + }), + }; + const runtime = new CodexKtxLlmRuntime({ + projectDir: '/tmp/project', + modelSlots: { default: 'codex' }, + runner: fakeRunner, + }); + const onStepFinish = vi.fn(); + + const result = await runtime.runAgentLoop({ + modelRole: 'default', + systemPrompt: 'system', + userPrompt: 'user', + stepBudget: 2, + telemetryTags: {}, + onStepFinish, + toolSet: {}, + }); + + expect(result.stopReason).toBe('budget'); + expect(result.error).toBeUndefined(); + expect(result.metrics).toMatchObject({ stepCount: 2 }); + 
expect(onStepFinish).toHaveBeenCalledTimes(2); + expect(onStepFinish).toHaveBeenLastCalledWith({ stepIndex: 2, stepBudget: 2 }); + expect(fakeRunner.observedSignal()?.aborted).toBe(true); + }); + + it('fires onStepFinish live as each step completes, before the stream drains', async () => { + const order: string[] = []; + async function* liveEvents() { + yield { type: 'turn.started' }; + yield { type: 'item.completed', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'a', status: 'completed' } }; + order.push('yielded-after-step-1'); + yield { type: 'item.completed', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'b', status: 'completed' } }; + order.push('yielded-after-step-2'); + yield { type: 'item.completed', item: { type: 'agent_message', text: 'done' } }; + yield { type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1 } }; + } + const fakeRunner = { runStreamed: vi.fn(async () => liveEvents()) }; + const runtime = new CodexKtxLlmRuntime({ + projectDir: '/tmp/project', + modelSlots: { default: 'codex' }, + runner: fakeRunner, + }); + + const result = await runtime.runAgentLoop({ + modelRole: 'default', + systemPrompt: 'system', + userPrompt: 'user', + stepBudget: 10, + telemetryTags: {}, + onStepFinish: ({ stepIndex }) => { + order.push(`step-${stepIndex}`); + }, + toolSet: {}, + }); + + expect(result.stopReason).toBe('natural'); + expect(result.metrics).toMatchObject({ stepCount: 2 }); + expect(order).toEqual(['step-1', 'yielded-after-step-1', 'step-2', 'yielded-after-step-2']); + }); + + it('surfaces the real Codex error event even when the SDK stream throws afterward', async () => { + // The SDK yields the error/turn.failed events on stdout, then throws on the + // non-zero exit. The masked exit message must not hide the real API error. 
+ const fakeRunner = throwingRunner( + [ + { type: 'thread.started', thread_id: 't' }, + { type: 'turn.started' }, + { type: 'error', message: MODEL_UNSUPPORTED_API_ERROR }, + { type: 'turn.failed', error: { message: MODEL_UNSUPPORTED_API_ERROR } }, + ], + new Error('Codex Exec exited with code 1: Reading prompt from stdin...'), + ); + const runtime = new CodexKtxLlmRuntime({ + projectDir: '/tmp/project', + modelSlots: { default: 'codex' }, + runner: fakeRunner, + }); + + await expect(runtime.generateText({ role: 'default', prompt: 'hi' })).rejects.toThrow( + 'not supported when using Codex with a ChatGPT account', + ); + }); + + it('probes Codex authentication through a minimal non-interactive turn', async () => { + const fakeRunner = runner([ + { type: 'turn.started' }, + { type: 'item.completed', item: { type: 'agent_message', text: 'ok' } }, + { type: 'turn.completed' }, + ]); + + await expect( + runCodexAuthProbe({ + projectDir: '/tmp/project', + model: 'codex', + runner: fakeRunner, + }), + ).resolves.toEqual({ ok: true }); + }); + + it('reports an unavailable model without blaming auth when Codex rejects the model', async () => { + const fakeRunner = throwingRunner( + [ + { type: 'turn.started' }, + { type: 'turn.failed', error: { message: MODEL_UNSUPPORTED_API_ERROR } }, + ], + new Error('Codex Exec exited with code 1: Reading prompt from stdin...'), + ); + + const result = await runCodexAuthProbe({ + projectDir: '/tmp/project', + model: 'gpt-5.3-codex', + runner: fakeRunner, + }); + + expect(result.ok).toBe(false); + if (!result.ok) { + expect(result.message).not.toContain('authentication is not usable'); + expect(result.message).toContain('not available'); + expect(result.message).toContain('gpt-5.3-codex'); + expect(result.message).toContain('not supported when using Codex with a ChatGPT account'); + // A model-access failure must steer the user at the model config, not auth. 
+ expect(result.fix).toContain('llm.models.default'); + expect(result.fix).not.toContain('Authenticate Codex'); + } + }); + + it('reports an auth failure when Codex exits without an error event', async () => { + const fakeRunner = throwingRunner( + [], + new Error('Codex Exec exited with code 1: Not logged in. Run `codex login`.'), + ); + + const result = await runCodexAuthProbe({ + projectDir: '/tmp/project', + model: 'gpt-5.5', + runner: fakeRunner, + }); + + expect(result.ok).toBe(false); + if (!result.ok) { + expect(result.message).toContain('authentication is not usable'); + expect(result.message).toContain('Not logged in'); + expect(result.fix).toContain('Authenticate Codex'); + } + }); + + it('rejects an unsupported model id before probing, steering at llm.models.default', async () => { + const result = await runCodexAuthProbe({ + projectDir: '/tmp/project', + model: 'not-a-real-model', + }); + + expect(result.ok).toBe(false); + if (!result.ok) { + expect(result.message).toContain('Unsupported Codex model'); + expect(result.fix).toContain('llm.models.default'); + } + }); +}); diff --git a/packages/cli/test/context/llm/codex-sdk-runner.test.ts b/packages/cli/test/context/llm/codex-sdk-runner.test.ts new file mode 100644 index 00000000..fdafc666 --- /dev/null +++ b/packages/cli/test/context/llm/codex-sdk-runner.test.ts @@ -0,0 +1,97 @@ +import { describe, expect, it, vi } from 'vitest'; + +const sdkMock = vi.hoisted(() => { + const events = (async function* () { + yield { type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 2 } }; + })(); + const runStreamed = vi.fn(async () => ({ events })); + const startThread = vi.fn(() => ({ runStreamed })); + const Codex = vi.fn(function Codex(this: { startThread: typeof startThread }, options?: unknown) { + Object.assign(this, { options, startThread }); + }); + return { Codex, startThread, runStreamed }; +}); + +vi.mock('@openai/codex-sdk', () => ({ Codex: sdkMock.Codex })); + +import { CodexSdkCliRunner } 
from '../../../src/context/llm/codex-sdk-runner.js'; + +async function collectAsync(items: AsyncIterable): Promise { + const collected: T[] = []; + for await (const item of items) { + collected.push(item); + } + return collected; +} + +describe('CodexSdkCliRunner', () => { + it('passes isolated env through the SDK and runtime controls through thread options', async () => { + const runner = new CodexSdkCliRunner({ + envBase: { + HOME: '/home/ktx-user', + PATH: '/usr/local/bin:/usr/bin', + CODEX_HOME: '/home/ktx-user/.codex', + HTTPS_PROXY: 'http://proxy.example', + KTX_UNRELATED_SECRET: 'must-not-copy', // pragma: allowlist secret + }, + }); + const previousToken = process.env.KTX_CODEX_RUNTIME_MCP_TOKEN; + process.env.KTX_CODEX_RUNTIME_MCP_TOKEN = 'outer-token'; + const outputSchema = { + type: 'object', + properties: { answer: { type: 'string' } }, + required: ['answer'], + additionalProperties: false, + }; + const controller = new AbortController(); + + try { + const events = await runner.runStreamed({ + projectDir: '/tmp/ktx-project', + model: 'gpt-5.3-codex', + prompt: 'Return JSON.', + configOverrides: { + history: { persistence: 'none' }, + }, + env: { KTX_CODEX_RUNTIME_MCP_TOKEN: 'run-token' }, + outputSchema, + signal: controller.signal, + }); + + expect(sdkMock.Codex).toHaveBeenCalledWith({ + config: { + history: { persistence: 'none' }, + }, + env: { + HOME: '/home/ktx-user', + PATH: '/usr/local/bin:/usr/bin', + CODEX_HOME: '/home/ktx-user/.codex', + HTTPS_PROXY: 'http://proxy.example', + KTX_CODEX_RUNTIME_MCP_TOKEN: 'run-token', + }, + }); + expect(process.env.KTX_CODEX_RUNTIME_MCP_TOKEN).toBe('outer-token'); + expect(sdkMock.startThread).toHaveBeenCalledWith({ + workingDirectory: '/tmp/ktx-project', + skipGitRepoCheck: true, + model: 'gpt-5.3-codex', + sandboxMode: 'read-only', + webSearchMode: 'disabled', + approvalPolicy: 'never', + }); + expect(sdkMock.runStreamed).toHaveBeenCalledWith('Return JSON.', { + outputSchema, + signal: controller.signal, + 
}); + await expect(collectAsync(events)).resolves.toEqual([ + { type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 2 } }, + ]); + } finally { + if (previousToken === undefined) { + delete process.env.KTX_CODEX_RUNTIME_MCP_TOKEN; + } else { + process.env.KTX_CODEX_RUNTIME_MCP_TOKEN = previousToken; + } + } + }); +}); diff --git a/packages/cli/test/context/llm/runtime-local-config.test.ts b/packages/cli/test/context/llm/runtime-local-config.test.ts index 9e432cec..14adca7c 100644 --- a/packages/cli/test/context/llm/runtime-local-config.test.ts +++ b/packages/cli/test/context/llm/runtime-local-config.test.ts @@ -22,4 +22,25 @@ describe('local KTX LLM runtime config', () => { }), ).toBeNull(); }); + + it('creates a Codex runtime for codex backend without creating an AI SDK provider', () => { + const runtime = createLocalKtxLlmRuntimeFromConfig( + { + provider: { backend: 'codex' }, + models: { default: 'codex', triage: 'gpt-5.4' }, + }, + { env: {}, projectDir: '/tmp/project', createCodexRuntime: vi.fn((deps) => ({ deps }) as never) }, + ); + + expect(runtime).toMatchObject({ deps: expect.objectContaining({ projectDir: '/tmp/project' }) }); + }); + + it('returns null from the AI SDK provider factory for codex backend', () => { + expect( + createLocalKtxLlmProviderFromConfig({ + provider: { backend: 'codex' }, + models: { default: 'codex' }, + }), + ).toBeNull(); + }); }); diff --git a/packages/cli/test/context/project/config.test.ts b/packages/cli/test/context/project/config.test.ts index 670e1696..6027d454 100644 --- a/packages/cli/test/context/project/config.test.ts +++ b/packages/cli/test/context/project/config.test.ts @@ -231,6 +231,31 @@ llm: }); }); + it('parses Codex as a first-class LLM backend', () => { + const config = parseKtxProjectConfig(` +llm: + provider: + backend: codex + models: + default: gpt-5.3-codex + triage: gpt-5.3-codex + candidateExtraction: gpt-5.3-codex + curator: gpt-5.3-codex + reconcile: gpt-5.3-codex + repair: gpt-5.3-codex 
+`); + + expect(config.llm.provider.backend).toBe('codex'); + expect(config.llm.models).toEqual({ + default: 'gpt-5.3-codex', + triage: 'gpt-5.3-codex', + candidateExtraction: 'gpt-5.3-codex', + curator: 'gpt-5.3-codex', + reconcile: 'gpt-5.3-codex', + repair: 'gpt-5.3-codex', + }); + }); + it('parses gateway LLM, OpenAI scan embeddings, and sentence-transformers ingest embeddings', () => { const config = parseKtxProjectConfig(` llm: @@ -530,7 +555,7 @@ describe('generateKtxProjectConfigJsonSchema', () => { const llm = (schema.properties as Record }>).llm; const provider = llm?.properties?.provider as { properties?: Record }; const backend = provider?.properties?.backend as { enum?: readonly string[] }; - expect(backend?.enum).toEqual(['none', 'anthropic', 'vertex', 'gateway', 'claude-code']); + expect(backend?.enum).toEqual(['none', 'anthropic', 'vertex', 'gateway', 'claude-code', 'codex']); const storage = (schema.properties as Record }>).storage; const state = storage?.properties?.state as { enum?: readonly string[] }; diff --git a/packages/cli/test/doctor.test.ts b/packages/cli/test/doctor.test.ts index e3871f28..242331e8 100644 --- a/packages/cli/test/doctor.test.ts +++ b/packages/cli/test/doctor.test.ts @@ -422,6 +422,8 @@ describe('runKtxDoctor', () => { 'llm:', ' provider:', ' backend: anthropic', + ' models:', + ' default: claude-sonnet-4-5', '', ].join('\n'), 'utf-8', @@ -543,6 +545,8 @@ describe('runKtxDoctor', () => { 'llm:', ' provider:', ' backend: anthropic', + ' models:', + ' default: claude-sonnet-4-5', 'ingest:', ' adapters:', ' - live-database', @@ -652,6 +656,8 @@ describe('runKtxDoctor', () => { 'llm:', ' provider:', ' backend: anthropic', + ' models:', + ' default: claude-sonnet-4-5', '', ].join('\n'), 'utf-8', @@ -698,6 +704,8 @@ describe('runKtxDoctor', () => { 'llm:', ' provider:', ' backend: anthropic', + ' models:', + ' default: claude-sonnet-4-5', 'ingest:', ' adapters:', ' - live-database', diff --git a/packages/cli/test/ingest.test.ts 
b/packages/cli/test/ingest.test.ts index f5cd1ac5..4fc47d0c 100644 --- a/packages/cli/test/ingest.test.ts +++ b/packages/cli/test/ingest.test.ts @@ -337,10 +337,13 @@ describe('runKtxIngest', () => { expect(runIo.stdout()).toBe(''); expect(runIo.stderr()).toContain( - 'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, or claude-code, or an injected agentRunner.', + 'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, claude-code, or codex, or an injected agentRunner.', ); - expect(runIo.stderr()).toContain('Configure a local Claude Code session or API-backed LLM, then rerun ingest:'); + expect(runIo.stderr()).toContain('Configure a local Claude Code/Codex session or API-backed LLM, then rerun ingest:'); expect(runIo.stderr()).toContain(`ktx setup --project-dir ${projectDir} --llm-backend claude-code --no-input`); + expect(runIo.stderr()).toContain( + `ktx setup --project-dir ${projectDir} --llm-backend codex --llm-model gpt-5.5 --no-input`, + ); expect(runIo.stderr()).toContain( `ktx setup --project-dir ${projectDir} --llm-backend anthropic --anthropic-api-key-env ANTHROPIC_API_KEY --llm-model claude-sonnet-4-6 --no-input`, ); diff --git a/packages/cli/test/llm/model-provider.test.ts b/packages/cli/test/llm/model-provider.test.ts index 0e3ef045..17d47c6a 100644 --- a/packages/cli/test/llm/model-provider.test.ts +++ b/packages/cli/test/llm/model-provider.test.ts @@ -312,4 +312,13 @@ describe('createKtxLlmProvider', () => { }), ).toThrow('claude-code is not an AI SDK LanguageModel backend'); }); + + it('rejects codex as an AI SDK LanguageModel backend', () => { + expect(() => + createKtxLlmProvider({ + backend: 'codex', + modelSlots: { default: 'gpt-5.3-codex' }, + }), + ).toThrow('codex is not an AI SDK LanguageModel backend'); + }); }); diff --git a/packages/cli/test/setup-models.test.ts b/packages/cli/test/setup-models.test.ts index f054beff..dedf03bd 100644 --- a/packages/cli/test/setup-models.test.ts +++ 
b/packages/cli/test/setup-models.test.ts @@ -66,6 +66,7 @@ function makePromptAdapter(options: { nextProviderChoice === 'anthropic' || nextProviderChoice === 'vertex' || nextProviderChoice === 'claude-code' || + nextProviderChoice === 'codex' || nextProviderChoice === 'back' ) { return selectValues.shift() ?? nextProviderChoice; @@ -183,6 +184,7 @@ describe('setup Anthropic model step', () => { message: expect.stringContaining('Which LLM provider should KTX use?'), options: [ { value: 'claude-code', label: 'Claude subscription (Pro/Max)' }, + { value: 'codex', label: 'Codex subscription' }, { value: 'anthropic', label: 'Anthropic API key' }, { value: 'vertex', label: 'Google Vertex AI for Anthropic Claude' }, { value: 'back', label: 'Back' }, @@ -215,6 +217,85 @@ describe('setup Anthropic model step', () => { expect(authProbe).toHaveBeenCalledWith(expect.objectContaining({ projectDir: tempDir, model: 'sonnet' })); }); + it('configures Codex backend and validates local auth', async () => { + const io = makeIo(); + const codexAuthProbe = vi.fn(async () => ({ ok: true as const })); + + const result = await runKtxSetupAnthropicModelStep( + { + projectDir: tempDir, + inputMode: 'disabled', + llmBackend: 'codex', + llmModel: 'gpt-5.5', + skipLlm: false, + }, + io.io, + { codexAuthProbe }, + ); + + expect(result.status).toBe('ready'); + const config = parseKtxProjectConfig(await readFile(join(tempDir, 'ktx.yaml'), 'utf-8')); + expect(config.llm).toMatchObject({ + provider: { backend: 'codex' }, + models: { default: 'gpt-5.5' }, + }); + expect(codexAuthProbe).toHaveBeenCalledWith(expect.objectContaining({ projectDir: tempDir, model: 'gpt-5.5' })); + // The warning carries the clack gutter so it renders inside the setup frame. 
+ expect(io.stderr()).toContain('│ Codex backend isolation is limited'); + expect(io.stderr()).toContain('may still load user Codex config'); + }); + + it('defaults the Codex model to gpt-5.5 when none is provided non-interactively', async () => { + const io = makeIo(); + const codexAuthProbe = vi.fn(async () => ({ ok: true as const })); + + const result = await runKtxSetupAnthropicModelStep( + { + projectDir: tempDir, + inputMode: 'disabled', + llmBackend: 'codex', + skipLlm: false, + }, + io.io, + { codexAuthProbe }, + ); + + expect(result.status).toBe('ready'); + const config = parseKtxProjectConfig(await readFile(join(tempDir, 'ktx.yaml'), 'utf-8')); + expect(config.llm).toMatchObject({ + provider: { backend: 'codex' }, + models: { default: 'gpt-5.5' }, + }); + expect(codexAuthProbe).toHaveBeenCalledWith(expect.objectContaining({ projectDir: tempDir, model: 'gpt-5.5' })); + }); + + it('offers the curated Codex models during interactive setup', async () => { + const io = makeIo(); + const prompts = makePromptAdapter({ selectValues: ['codex', 'gpt-5.5'] }); + const codexAuthProbe = vi.fn(async () => ({ ok: true as const })); + + const result = await runKtxSetupAnthropicModelStep( + { projectDir: tempDir, inputMode: 'auto', skipLlm: false }, + io.io, + { prompts, codexAuthProbe }, + ); + + expect(result.status).toBe('ready'); + expect(prompts.select).toHaveBeenCalledWith( + expect.objectContaining({ + message: expect.stringContaining('Which Codex model should KTX use?'), + options: [ + { value: 'gpt-5.5', label: 'GPT-5.5', hint: 'recommended' }, + { value: 'gpt-5.4', label: 'GPT-5.4' }, + { value: 'gpt-5.4-mini', label: 'GPT-5.4 mini' }, + { value: 'manual', label: 'Enter a Codex model ID manually' }, + { value: 'back', label: 'Back' }, + ], + }), + ); + expect(codexAuthProbe).toHaveBeenCalledWith(expect.objectContaining({ model: 'gpt-5.5' })); + }); + it('prompts for the Claude Code model during interactive setup', async () => { const io = makeIo(); const prompts 
= makePromptAdapter({ selectValues: ['claude-code', 'opus'] });
diff --git a/packages/cli/test/status-project.test.ts b/packages/cli/test/status-project.test.ts
index 38d5aa6f..cd63cf19 100644
--- a/packages/cli/test/status-project.test.ts
+++ b/packages/cli/test/status-project.test.ts
@@ -44,6 +44,17 @@ function withClaudeCodeLlm(config: KtxProjectConfig): KtxProjectConfig {
   };
 }
 
+function withCodexLlm(config: KtxProjectConfig): KtxProjectConfig {
+  return {
+    ...config,
+    llm: {
+      ...config.llm,
+      provider: { backend: 'codex' },
+      models: { ...config.llm.models, default: 'gpt-5.5' },
+    },
+  };
+}
+
 function baseProjectConfig(): KtxProjectConfig {
   return withClaudeCodeLlm(buildDefaultKtxProjectConfig());
 }
@@ -391,6 +402,126 @@ describe('buildProjectStatus --fast', () => {
   });
 });
 
+describe('buildProjectStatus codex', () => {
+  it('reports authenticated local Codex session', async () => {
+    const project = projectWithConfig(withCodexLlm(buildDefaultKtxProjectConfig()));
+    const status = await buildProjectStatus(project, {
+      codexAuthProbe: async () => ({ ok: true as const }),
+    });
+
+    expect(status.llm).toMatchObject({
+      backend: 'codex',
+      model: 'gpt-5.5',
+      status: 'ok',
+      detail: 'local Codex session authenticated',
+    });
+    expect(status.warnings).toEqual(
+      expect.arrayContaining([
+        expect.objectContaining({
+          message: expect.stringContaining('Codex backend isolation is limited'),
+          fix: expect.stringContaining('claude-code'),
+        }),
+      ]),
+    );
+    const rendered = renderProjectStatus(status, { verbose: false, useColor: false });
+    expect(rendered).toContain('Codex backend isolation is limited');
+  });
+
+  it('skips Codex auth probe with --fast', async () => {
+    let probeCalls = 0;
+    const project = projectWithConfig(withCodexLlm(buildDefaultKtxProjectConfig()));
+    const status = await buildProjectStatus(project, {
+      fast: true,
+      codexAuthProbe: async () => {
+        probeCalls += 1;
+        return { ok: true };
+      },
+    });
+
+    expect(probeCalls).toBe(0);
+
expect(status.llm.status).toBe('skipped');
+    expect(status.llm.detail).toMatch(/--fast/);
+  });
+
+  it('surfaces the probe fix for a model-access failure instead of an auth fix', async () => {
+    const project = projectWithConfig(withCodexLlm(buildDefaultKtxProjectConfig()));
+    const status = await buildProjectStatus(project, {
+      codexAuthProbe: async () => ({
+        ok: false,
+        message: 'Codex is authenticated, but the configured model "gpt-5.5" is not available...',
+        fix: 'Run `codex` to see the models your account supports, then set llm.models.default in ktx.yaml (or rerun `ktx setup`).',
+      }),
+    });
+
+    expect(status.llm.status).toBe('fail');
+    expect(status.llm.fix).toContain('llm.models.default');
+    expect(status.llm.fix).not.toContain('Authenticate Codex');
+  });
+});
+
+describe('buildProjectStatus llm models.default requirement', () => {
+  function withBackendNoModel(
+    backend: KtxProjectConfig['llm']['provider']['backend'],
+  ): KtxProjectConfig {
+    const config = buildDefaultKtxProjectConfig();
+    return {
+      ...config,
+      llm: { ...config.llm, provider: { backend }, models: {} },
+    };
+  }
+
+  it('fails codex without llm.models.default and never probes', async () => {
+    let probeCalls = 0;
+    const project = projectWithConfig(withBackendNoModel('codex'));
+    const status = await buildProjectStatus(project, {
+      codexAuthProbe: async () => {
+        probeCalls += 1;
+        return { ok: true };
+      },
+    });
+
+    expect(probeCalls).toBe(0);
+    expect(status.llm.status).toBe('fail');
+    expect(status.llm.detail).toContain('llm.models.default');
+    expect(status.verdict).toBe('blocked');
+  });
+
+  it('fails claude-code without llm.models.default and never probes', async () => {
+    let probeCalls = 0;
+    const project = projectWithConfig(withBackendNoModel('claude-code'));
+    const status = await buildProjectStatus(project, {
+      claudeCodeAuthProbe: async () => {
+        probeCalls += 1;
+        return { ok: true };
+      },
+    });
+
+    expect(probeCalls).toBe(0);
+    expect(status.llm.status).toBe('fail');
+
expect(status.llm.detail).toContain('llm.models.default');
+    expect(status.verdict).toBe('blocked');
+  });
+
+  it('fails anthropic without llm.models.default even when the key is set', async () => {
+    const config = withBackendNoModel('anthropic');
+    const project = projectWithConfig({
+      ...config,
+      llm: {
+        ...config.llm,
+        provider: { backend: 'anthropic', anthropic: { api_key: 'env:ANTHROPIC_API_KEY' } }, // pragma: allowlist secret
+        models: {},
+      },
+    });
+    const status = await buildProjectStatus(project, {
+      env: { ANTHROPIC_API_KEY: 'sk-test' }, // pragma: allowlist secret
+    });
+
+    expect(status.llm.status).toBe('fail');
+    expect(status.llm.detail).toContain('llm.models.default');
+    expect(status.verdict).toBe('blocked');
+  });
+});
+
 describe('buildLocalStatsStatus', () => {
   let tempDir: string;
 
diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml
index 15bc75f3..a3eaad5f 100644
--- a/pnpm-lock.yaml
+++ b/pnpm-lock.yaml
@@ -158,6 +158,9 @@ importers:
       '@notionhq/client':
         specifier: ^5.22.0
         version: 5.22.0
+      '@openai/codex-sdk':
+        specifier: ^0.133.0
+        version: 0.133.0
       ai:
         specifier: ^6.0.188
         version: 6.0.188(zod@4.4.3)
@@ -1288,6 +1291,51 @@ packages:
   '@octokit/types@16.0.0':
     resolution: {integrity: sha512-sKq+9r1Mm4efXW1FCk7hFSeJo4QKreL/tTbR0rz/qx/r1Oa2VV83LTA/H/MuCOX7uCIJmQVRKBcbmWoySjAnSg==}
 
+  '@openai/codex-sdk@0.133.0':
+    resolution: {integrity: sha512-PB82D/1Q0C7nzaV5O+1O4y5LcVwiUvxyHvCUTfz8Cwztv6bOWQ40gFHE5ZFX1EFPJx1cMV0GPVODWuXIKAuayQ==}
+    engines: {node: '>=18'}
+
+  '@openai/codex@0.133.0':
+    resolution: {integrity: sha512-Gh42kLLBo/6gpnHmDzUWDVvyS57ekCB1+1Dz0RG2oIl3Lhk1uwrjSj/PwaJWWh4Rw/rUp1RqkwrMugFfFEOlqQ==}
+    engines: {node: '>=16'}
+    hasBin: true
+
+  '@openai/codex@0.133.0-darwin-arm64':
+    resolution: {integrity: sha512-W7f8+DckLujnqGlptKCzgJU+ooeHKMuk6KYgMFP6A9asn7YUsGUgJqjiBaX8oNcXO6w/pTbKGRARx1kCNS8lIg==}
+    engines: {node: '>=16'}
+    cpu: [arm64]
+    os: [darwin]
+
+  '@openai/codex@0.133.0-darwin-x64':
+    resolution: {integrity:
sha512-Ek8ikvLOiXZ8emcIJVBXxK6fm8ratBy0kaEt3JNisTNszxGshUHf/R4xxDxIyKNcUkYYXjW7A/rMwW3iu3OFlg==}
+    engines: {node: '>=16'}
+    cpu: [x64]
+    os: [darwin]
+
+  '@openai/codex@0.133.0-linux-arm64':
+    resolution: {integrity: sha512-uKXYYSJ3mY16sp4hcG/4BMNRjva/ZS4oARiI1+7k8+NiuoAhdCGWNe5u4KJ3sMuL3tp/IXcmc6B56EFX1+WDBQ==}
+    engines: {node: '>=16'}
+    cpu: [arm64]
+    os: [linux]
+
+  '@openai/codex@0.133.0-linux-x64':
+    resolution: {integrity: sha512-9YfyqrfUj/UZ2+aXE4zBz47t6RXbVni95ZorGsNh857vxYK/asVpUtR2cymo9lB3JaI4mQaKFfV/t7IRItqkuA==}
+    engines: {node: '>=16'}
+    cpu: [x64]
+    os: [linux]
+
+  '@openai/codex@0.133.0-win32-arm64':
+    resolution: {integrity: sha512-mRzND0PSGHRoLk0X41GTSoc3tFjZSF4HgDlfjU5fiQcWVi0/kLb7Ku6/tPFT/X2hOLa3YdJkbIcHC0Hc9ni80g==}
+    engines: {node: '>=16'}
+    cpu: [arm64]
+    os: [win32]
+
+  '@openai/codex@0.133.0-win32-x64':
+    resolution: {integrity: sha512-u3ji78DIPZCGJeELuovsAnaZH+vK9gsA4F6M1y+Uy2s80Sz7/i1S0KL81qGReYji3urSjgBpkQuNP47GXOqxrQ==}
+    engines: {node: '>=16'}
+    cpu: [x64]
+    os: [win32]
+
   '@opentelemetry/api@1.9.1':
     resolution: {integrity: sha512-gLyJlPHPZYdAk1JENA9LeHejZe1Ti77/pTeFm/nMXmQH/HFZlcS/O2XJB+L8fkbrNSqhdtlvjBVjxwUYanNH5Q==}
     engines: {node: '>=8.0.0'}
@@ -7145,6 +7193,37 @@ snapshots:
     dependencies:
       '@octokit/openapi-types': 27.0.0
 
+  '@openai/codex-sdk@0.133.0':
+    dependencies:
+      '@openai/codex': 0.133.0
+
+  '@openai/codex@0.133.0':
+    optionalDependencies:
+      '@openai/codex-darwin-arm64': '@openai/codex@0.133.0-darwin-arm64'
+      '@openai/codex-darwin-x64': '@openai/codex@0.133.0-darwin-x64'
+      '@openai/codex-linux-arm64': '@openai/codex@0.133.0-linux-arm64'
+      '@openai/codex-linux-x64': '@openai/codex@0.133.0-linux-x64'
+      '@openai/codex-win32-arm64': '@openai/codex@0.133.0-win32-arm64'
+      '@openai/codex-win32-x64': '@openai/codex@0.133.0-win32-x64'
+
+  '@openai/codex@0.133.0-darwin-arm64':
+    optional: true
+
+  '@openai/codex@0.133.0-darwin-x64':
+    optional: true
+
+  '@openai/codex@0.133.0-linux-arm64':
+    optional: true
+
+
'@openai/codex@0.133.0-linux-x64':
+    optional: true
+
+  '@openai/codex@0.133.0-win32-arm64':
+    optional: true
+
+  '@openai/codex@0.133.0-win32-x64':
+    optional: true
+
   '@opentelemetry/api@1.9.1': {}
 
   '@orama/orama@3.1.18': {}
diff --git a/scripts/codex-backend-live-smoke.mjs b/scripts/codex-backend-live-smoke.mjs
new file mode 100644
index 00000000..7793fefc
--- /dev/null
+++ b/scripts/codex-backend-live-smoke.mjs
@@ -0,0 +1,160 @@
+import { execFile } from 'node:child_process';
+import { mkdtemp, rm } from 'node:fs/promises';
+import { tmpdir } from 'node:os';
+import { dirname, join, resolve } from 'node:path';
+import { fileURLToPath, pathToFileURL } from 'node:url';
+import { promisify } from 'node:util';
+
+const execFileAsync = promisify(execFile);
+const SCRIPT_DIR = dirname(fileURLToPath(import.meta.url));
+const ROOT_DIR = resolve(SCRIPT_DIR, '..');
+const OPT_IN_MESSAGE =
+  'Set KTX_RUN_CODEX_BACKEND_SMOKE=1 or pass --force to run the Codex backend live smoke.';
+
+export function codexBackendSmokeOptIn(env = process.env, args = process.argv.slice(2)) {
+  if (env.KTX_RUN_CODEX_BACKEND_SMOKE === '1' || args.includes('--force')) {
+    return { run: true };
+  }
+  return { run: false, message: OPT_IN_MESSAGE };
+}
+
+async function run(command, args, options = {}) {
+  process.stdout.write(`$ ${command} ${args.join(' ')}\n`);
+  try {
+    const result = await execFileAsync(command, args, {
+      cwd: options.cwd ?? ROOT_DIR,
+      env: { ...process.env, ...(options.env ?? {}) },
+      encoding: 'utf8',
+      maxBuffer: 1024 * 1024 * 20,
+      timeout: options.timeoutMs ?? 300_000,
+    });
+    if (result.stdout) {
+      process.stdout.write(result.stdout);
+    }
+    if (result.stderr) {
+      process.stderr.write(result.stderr);
+    }
+    return { code: 0, stdout: result.stdout, stderr: result.stderr };
+  } catch (error) {
+    const stdout = typeof error.stdout === 'string' ? error.stdout : '';
+    const stderr = typeof error.stderr === 'string' ?
error.stderr : error.message;
+    if (stdout) {
+      process.stdout.write(stdout);
+    }
+    if (stderr) {
+      process.stderr.write(stderr);
+    }
+    return {
+      code: typeof error.code === 'number' ? error.code : 1,
+      stdout,
+      stderr,
+    };
+  }
+}
+
+function requireSuccess(label, result) {
+  if (result.code !== 0) {
+    throw new Error(`${label} failed with code ${result.code}\nstdout:\n${result.stdout}\nstderr:\n${result.stderr}`);
+  }
+}
+
+async function runSetupSmoke(projectDir) {
+  const result = await run(
+    'node',
+    [
+      join(ROOT_DIR, 'packages/cli/dist/bin.js'),
+      'setup',
+      '--project-dir',
+      projectDir,
+      '--llm-backend',
+      'codex',
+      '--llm-model',
+      'gpt-5.3-codex',
+      '--no-input',
+      '--yes',
+      '--skip-databases',
+      '--skip-sources',
+      '--skip-agents',
+    ],
+    { timeoutMs: 600_000 },
+  );
+  requireSuccess('ktx setup codex backend', result);
+  if (!result.stdout.includes('LLM ready: yes (codex, gpt-5.3-codex)')) {
+    throw new Error(`setup did not report Codex LLM readiness\nstdout:\n${result.stdout}`);
+  }
+}
+
+async function runRuntimeSmoke(projectDir) {
+  const runtimeUrl = pathToFileURL(join(ROOT_DIR, 'packages/cli/dist/context/llm/codex-runtime.js')).href;
+  const zodUrl = pathToFileURL(join(ROOT_DIR, 'packages/cli/node_modules/zod/index.js')).href;
+  const { CodexKtxLlmRuntime } = await import(runtimeUrl);
+  const { z } = await import(zodUrl);
+  const runtime = new CodexKtxLlmRuntime({
+    projectDir,
+    modelSlots: { default: 'gpt-5.3-codex' },
+  });
+
+  const text = await runtime.generateText({
+    role: 'default',
+    prompt: 'Reply with exactly: ktx_codex_text_ok',
+  });
+  if (text.trim() !== 'ktx_codex_text_ok') {
+    throw new Error(`Codex text smoke returned unexpected text: ${text}`);
+  }
+
+  let toolCalls = 0;
+  const loop = await runtime.runAgentLoop({
+    modelRole: 'default',
+    systemPrompt: 'You must use available tools when the user asks for a tool result.',
+    userPrompt:
+      'Call the echo_value tool with {"value":"ktx_codex_tool_ok"}, then finish after the
tool returns.',
+    toolSet: {
+      echo_value: {
+        name: 'echo_value',
+        description: 'Return the provided value as markdown.',
+        inputSchema: z.object({ value: z.string() }),
+        execute: async (input) => {
+          toolCalls += 1;
+          return { markdown: `echo:${input.value}` };
+        },
+      },
+    },
+    stepBudget: 4,
+    telemetryTags: {},
+  });
+
+  if (loop.stopReason !== 'natural') {
+    throw new Error(`Codex tool smoke stopped with ${loop.stopReason}: ${loop.error?.message ?? 'no error'}`);
+  }
+  if (toolCalls !== 1) {
+    throw new Error(`Expected Codex to call echo_value exactly once, got ${toolCalls}`);
+  }
+}
+
+export async function runCodexBackendLiveSmoke() {
+  const projectDir = await mkdtemp(join(tmpdir(), 'ktx-codex-backend-smoke-'));
+  try {
+    requireSuccess(
+      'ktx build',
+      await run('pnpm', ['--filter', '@kaelio/ktx', 'run', 'build'], { timeoutMs: 600_000 }),
+    );
+    await runSetupSmoke(projectDir);
+    await runRuntimeSmoke(projectDir);
+    process.stdout.write(`Codex backend live smoke passed in ${projectDir}\n`);
+  } finally {
+    await rm(projectDir, { recursive: true, force: true });
+  }
+}
+
+async function main() {
+  const optIn = codexBackendSmokeOptIn();
+  if (!optIn.run) {
+    process.stdout.write(`${optIn.message}\n`);
+    return;
+  }
+  await runCodexBackendLiveSmoke();
+}
+
+if (import.meta.url === pathToFileURL(process.argv[1] ??
'').href) {
+  await main();
+}
diff --git a/scripts/codex-backend-live-smoke.test.mjs b/scripts/codex-backend-live-smoke.test.mjs
new file mode 100644
index 00000000..8d8c051f
--- /dev/null
+++ b/scripts/codex-backend-live-smoke.test.mjs
@@ -0,0 +1,18 @@
+import assert from 'node:assert/strict';
+import test from 'node:test';
+import { codexBackendSmokeOptIn } from './codex-backend-live-smoke.mjs';
+
+test('codex backend smoke stays disabled by default', () => {
+  assert.deepEqual(codexBackendSmokeOptIn({}, []), {
+    run: false,
+    message: 'Set KTX_RUN_CODEX_BACKEND_SMOKE=1 or pass --force to run the Codex backend live smoke.',
+  });
+});
+
+test('codex backend smoke runs with env opt-in', () => {
+  assert.deepEqual(codexBackendSmokeOptIn({ KTX_RUN_CODEX_BACKEND_SMOKE: '1' }, []), { run: true });
+});
+
+test('codex backend smoke runs with force flag', () => {
+  assert.deepEqual(codexBackendSmokeOptIn({}, ['--force']), { run: true });
+});
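
Review note: the opt-in gate covered by the three tests in `scripts/codex-backend-live-smoke.test.mjs` can be tried in isolation. The following is a minimal standalone sketch, assuming the gate behaves exactly as written in `scripts/codex-backend-live-smoke.mjs`. It restates `codexBackendSmokeOptIn` with explicit arguments instead of the `process.env` / `process.argv` defaults. The `describeGate` helper is hypothetical and is not part of this change.

```javascript
// Standalone restatement of the opt-in gate from the smoke script in this diff.
const OPT_IN_MESSAGE =
  'Set KTX_RUN_CODEX_BACKEND_SMOKE=1 or pass --force to run the Codex backend live smoke.';

function codexBackendSmokeOptIn(env, args) {
  // Only the exact string '1' or an explicit --force flag enables the live smoke.
  if (env.KTX_RUN_CODEX_BACKEND_SMOKE === '1' || args.includes('--force')) {
    return { run: true };
  }
  return { run: false, message: OPT_IN_MESSAGE };
}

// Hypothetical helper for illustration only: turns the gate result into one line.
function describeGate(env, args) {
  const optIn = codexBackendSmokeOptIn(env, args);
  return optIn.run ? 'run' : `skip: ${optIn.message}`;
}

console.log(describeGate({}, []));
console.log(describeGate({ KTX_RUN_CODEX_BACKEND_SMOKE: '1' }, []));
console.log(describeGate({ KTX_RUN_CODEX_BACKEND_SMOKE: 'true' }, []));
console.log(describeGate({}, ['--force']));
```

Because the comparison is strict, a value such as `KTX_RUN_CODEX_BACKEND_SMOKE=true` leaves the smoke disabled. A fourth test case for that input might be worth adding if that behavior is intentional.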