feat: add codex llm backend for ktx runtime work (#253)

* feat: add codex sdk runner foundation

* feat: parse codex runtime events

* feat: expose codex runtime mcp tools

* feat: add codex llm runtime

* feat: wire codex llm backend

* test: avoid Array.fromAsync in codex runner test

* docs: document codex llm backend

* fix: tighten codex runtime config ownership

* fix: use codex sdk env and thread options

* fix: parse codex sdk event shapes

* test: add codex backend live smoke

* docs: clarify codex backend isolation

* fix: drive codex loop metrics from mcp events

* fix: enforce codex local step budget

* docs: disclose codex isolation limits

* fix: count all codex agent steps and stream step callbacks live

The agent-loop step budget only counted completed mcp_tool_call items, so
built-in command_execution steps (which the public Codex SDK/CLI surface can
still expose) never decremented the budget, letting ingest/reconciliation run
past stepBudget until Codex stopped on its own. onStepFinish was also replayed
only after the whole stream drained, so live work_unit_step / reconciliation
progress appeared stuck until the Codex process exited.

collectEvents is now the single live step accumulator: it counts every
completed agent-action item via a shared isCompletedAgentStep predicate
(command_execution, mcp_tool_call, file_change, web_search), fires onStepFinish
as each step completes, and enforces the budget on that broader count. A
no-tool turn still counts as one step. toolFailures stays MCP-specific, since a
non-zero command exit is normal agent exploration, not a loop failure.
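
The counting rule above can be sketched as follows. This is a minimal illustration, not the actual collectEvents implementation: `countAgentSteps` is a hypothetical helper that walks an in-memory array instead of the live SDK stream, and the `CodexEvent` shape is a simplified assumption.

```typescript
// Item types that consume one loop step (list taken from the commit message).
const AGENT_STEP_ITEM_TYPES = new Set(['command_execution', 'mcp_tool_call', 'file_change', 'web_search']);

// Simplified event shape, assumed for illustration only.
type CodexEvent = { type: string; item?: { type?: string } };

function isCompletedAgentStep(event: CodexEvent): boolean {
  return event.type === 'item.completed' && AGENT_STEP_ITEM_TYPES.has(event.item?.type ?? '');
}

// Hypothetical helper: counts steps and reports whether the budget ended the run.
function countAgentSteps(
  events: CodexEvent[],
  stepBudget?: number,
): { steps: number; budgetExceeded: boolean } {
  let steps = 0;
  let sawAction = false;
  for (const event of events) {
    if (isCompletedAgentStep(event)) {
      sawAction = true;
      steps += 1;
      // The budget applies to every agent action, not only MCP tool calls.
      if (stepBudget !== undefined && steps >= stepBudget) {
        return { steps, budgetExceeded: true };
      }
    } else if (event.type === 'turn.completed' && !sawAction) {
      // A turn that produced no agent actions still counts as one step.
      steps += 1;
    }
  }
  return { steps, budgetExceeded: false };
}
```

With a budget of 2, three `command_execution` completions stop after the second one, which is the behavior the old MCP-only count could not provide.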

* test: align ingest llm-guard assertions with codex backend

The skip-llm ingest guard message now lists codex as a valid backend and
mentions a Claude Code/Codex session plus a codex setup hint, but this slow
suite test still asserted the pre-codex wording. Update it to match the
production message (already covered by the local-bundle-runtime unit test) and
add the codex setup-line assertion.

* fix: treat codex error:null tool calls as success

The Codex SDK serializes error: null on successful mcp_tool_call items, so
the failure check (item.error !== undefined) flagged every successful tool
call as failed with the empty-payload default "Codex turn failed". This
killed every ingest work unit under the codex backend before it could
produce a patch.

Key on status === 'failed' (authoritative, always set) and only treat a
populated error object as a failure. Add a regression test built from a
verbatim real-SDK event capture.
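
A minimal sketch of the corrected failure check, assuming a simplified item shape. `McpToolCallItem` and `isFailedMcpToolCall` are hypothetical names used only for illustration; the rule itself is the one stated above.

```typescript
// Simplified shape of a completed mcp_tool_call item (assumption).
type McpToolCallItem = { status?: string; error?: { message?: string } | null };

// Hypothetical predicate: `status` decides first, then only a populated error counts.
function isFailedMcpToolCall(item: McpToolCallItem): boolean {
  if (item.status === 'failed') {
    return true;
  }
  // A successful call is serialized with `error: null`, so null must not count as a failure.
  return item.error !== undefined && item.error !== null;
}
```

The earlier check, `item.error !== undefined`, returns true for `{ status: 'completed', error: null }`, which is why every successful call was reported as failed.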

* fix: default codex backend to gpt-5.5 and report real probe errors

The previous default gpt-5.3-codex is an API-key-only model that the OpenAI
API rejects under ChatGPT-account (subscription) auth, so codex status/setup
failed with a misleading "authentication is not usable" message even though
auth was fine.

- Default codex model is now gpt-5.5 (works on both subscription and API-key
  auth); the curated setup picker offers gpt-5.5 / gpt-5.4 / gpt-5.4-mini and
  keeps free-form entry for account-specific ids (e.g. gpt-5.3-codex-spark).
- runCodexAuthProbe now distinguishes "model not available" from an auth
  failure and surfaces the real API error: collectEvents retains stream
  events when the SDK throws on a non-zero exit, and the API error JSON
  envelope is unwrapped to its human-readable message.
- The Codex isolation warning now renders inside the clack setup frame.
- Docs updated to gpt-5.5 with a note that *-codex ids require API-key auth.
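
The envelope unwrapping mentioned in the list above could look roughly like this. It is a sketch under assumptions: `unwrapApiErrorMessage` is a hypothetical name, and the envelope shape (`error.message`, with a top-level `message` as fallback) is inferred from the example in this commit's code comments.

```typescript
// Returns the string only when it carries visible text.
function nonEmpty(value: unknown): string | undefined {
  return typeof value === 'string' && value.trim().length > 0 ? value : undefined;
}

// Hypothetical helper: returns the inner message of a JSON error envelope,
// or the input unchanged when it is plain text or cannot be parsed.
function unwrapApiErrorMessage(raw: string): string {
  const trimmed = raw.trim();
  if (!trimmed.startsWith('{')) {
    return raw;
  }
  try {
    const parsed: unknown = JSON.parse(trimmed);
    if (typeof parsed !== 'object' || parsed === null) {
      return raw;
    }
    const envelope = parsed as { error?: { message?: unknown } | null; message?: unknown };
    return nonEmpty(envelope.error?.message) ?? nonEmpty(envelope.message) ?? raw;
  } catch {
    // Malformed JSON: keep the original text rather than hiding it.
    return raw;
  }
}
```

Passing plain strings through unchanged means callers can apply the helper to every error message without first checking its format.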

* fix: require llm.models.default in status and match codex probe remediation

Status reported a project ready when a non-none LLM backend was configured
without llm.models.default, but the runtime (resolveModelSlots) hard-requires
it, so ingest/scan/memory threw after `ktx status` said the project was usable.
buildLlmStatus now fails for any non-none backend missing models.default and no
longer invents a fallback model for claude-code/codex.
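
As a rough illustration of the stricter readiness rule, assuming simplified config and result shapes. `checkLlmModels`, `LlmConfig`, and the `ok` / `fail` status values are hypothetical; the real buildLlmStatus signature is not shown in this message.

```typescript
// Simplified view of the llm block in ktx.yaml (assumption).
type LlmConfig = { provider: { backend: string }; models?: { default?: string } };
type StatusCheck = { status: 'ok' | 'fail' | 'skipped'; message: string };

// Hypothetical check: any non-none backend must set llm.models.default.
function checkLlmModels(llm: LlmConfig): StatusCheck {
  if (llm.provider.backend === 'none') {
    return { status: 'skipped', message: 'LLM features are disabled' };
  }
  const model = llm.models?.default?.trim();
  if (!model) {
    // No fallback model is invented here; the runtime would reject this config anyway.
    return {
      status: 'fail',
      message: `llm.models.default is required for backend "${llm.provider.backend}"`,
    };
  }
  return { status: 'ok', message: `default model: ${model}` };
}
```

Failing in status keeps the readiness report aligned with the runtime, so a config that would throw during ingest is not reported as usable.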

Codex probe failures now carry a category-matched fix: a model-access failure
steers the user to llm.models.default instead of the auth/install remediation.
runCodexAuthProbe returns the fix and status consumes it; the message stays
self-sufficient so setup output is unchanged.

Docs: README now lists the codex backend and local Codex auth; ktx-setup.mdx
states --llm-model only accepts codex/default or gpt-*/codex-* ids.

Repaired four doctor fixtures that configured a backend without models.default
(the now-correctly-blocked config) and added coverage for the new behavior.
Andrey Avtomonov 2026-06-02 13:57:11 +02:00 committed by GitHub
parent 74c6076b72
commit 494618ab14
GPG key ID: B5690EEEBB952194
41 changed files with 2544 additions and 30 deletions


@ -30,8 +30,9 @@ warehouse accurately - from approved metric definitions, joinable columns, and
business knowledge it builds and maintains for you.
> [!NOTE]
-> Run **ktx** with your own LLM API keys or a **Claude Pro/Max** subscription.
-> No extra usage billing from **ktx**.
> Run **ktx** with your own LLM API keys or a local agent sign-in — a
> **Claude Pro/Max** subscription through Claude Code, or your local Codex
> authentication. No extra usage billing from **ktx**.
<p align="center">
<a href="https://youtu.be/5V4TuzYVlrA">
@ -175,8 +176,9 @@ then the current directory. Pass `--project-dir <path>` when scripting.
No. **ktx** runs locally. The only data leaving your machine is what you
send to the LLM provider you configured.
- **Which LLM backends are supported?**
-Anthropic API, Google Vertex AI, AI Gateway, and the local Claude Code
-session through the Claude Agent SDK. See
+Anthropic API, Google Vertex AI, AI Gateway, the local Claude Code session
+through the Claude Agent SDK, and your local Codex authentication through the
+Codex SDK. See
[LLM configuration](https://docs.kaelio.com/ktx/docs/guides/llm-configuration).
- **How is ktx different from a dbt or MetricFlow semantic layer?**
**ktx** *ingests* those layers and combines them with raw-table


@ -51,8 +51,9 @@ prompts.
| Flag | Description |
|------|-------------|
-| `--llm-backend <backend>` | LLM backend: `anthropic`, `vertex`, or `claude-code` |
+| `--llm-backend <backend>` | LLM backend: `anthropic`, `vertex`, `claude-code`, or `codex` |
| `--llm-backend claude-code` | Use the local Claude Code session for **ktx** LLM calls |
+| `--llm-backend codex` | Use local Codex authentication for **ktx** LLM calls |
| `--llm-model <model>` | LLM model ID or backend model alias to validate and save |
| `--anthropic-api-key-env <name>` | Environment variable containing the Anthropic API key |
| `--anthropic-api-key-file <path>` | File containing the Anthropic API key |
@ -62,9 +63,14 @@ prompts.
Choose only one Anthropic credential source. Anthropic credential flags are only
valid with the Anthropic backend; Vertex flags are only valid with the Vertex
-backend. The `claude-code` backend uses local Claude Code authentication instead
+backend. The `claude-code` and `codex` backends use local authentication instead
of Anthropic API key or Vertex flags. For Claude Code, `--llm-model` accepts
-`sonnet`, `opus`, `haiku`, or a full Claude model ID.
+`sonnet`, `opus`, `haiku`, or a full Claude model ID. For Codex, `--llm-model`
+accepts `codex`, `default`, or a `gpt-*` / `codex-*` model ID such as
+`gpt-5.5`; any other value is rejected before the auth probe. Run `codex` to
+see the models available to your login, and pick a `gpt-*` / `codex-*` id from
+that list. Note that `*-codex` API-billing model IDs (for example
+`gpt-5.3-codex`) are not available to ChatGPT-subscription logins.
### Embeddings
@ -191,6 +197,17 @@ ktx setup \
--llm-backend claude-code \
--llm-model opus
# Configure **ktx** to use local Codex authentication for LLM work
ktx setup --llm-backend codex --llm-model gpt-5.5 --no-input
```
When you choose `--llm-backend codex`, setup prints a warning if the public
Codex SDK and CLI surface cannot prove full Claude-Code-style isolation. The
backend restricts **ktx** runtime MCP tools to each run, but Codex may still
load user Codex config and built-in command execution or read-only file
capabilities.
```bash
# Script a Postgres connection that reads its URL from the environment
ktx setup \
--project-dir ./analytics \


@ -21,7 +21,7 @@ ktx status [options]
| `--json` | Print JSON output | `false` |
| `-v`, `--verbose` | Show every check, including passing ones | `false` |
| `--validate` | Only validate the `ktx.yaml` schema; skip readiness checks | `false` |
-| `--fast` | Skip checks that require external communication (query-history readiness probes and Claude Code auth probe) | `false` |
+| `--fast` | Skip checks that require external communication (query-history readiness probes, Claude Code auth probe, and Codex auth probe) | `false` |
| `--no-input` | Disable interactive terminal input | - |
## Examples
@ -39,7 +39,7 @@ ktx status --verbose
# Validate ktx.yaml without running readiness checks
ktx status --validate
-# Skip slow probes (query-history readiness, Claude Code auth)
+# Skip slow probes (query-history readiness, Claude Code auth, Codex auth)
ktx status --fast
# Check a project from another directory
@ -57,6 +57,16 @@ flow, then rerun `ktx status`. Use `--fast` to skip this probe (useful in CI
or offline contexts); skipped checks render as `-` and carry
`"status": "skipped"` in JSON output.
For `llm.provider.backend: codex`, `ktx status` runs a minimal non-interactive
Codex request. If the probe fails, authenticate Codex locally with the Codex CLI
and verify the Codex CLI installation.
When `llm.provider.backend: codex` is configured, `ktx status` also prints a
warning when the installed public Codex SDK and CLI surface cannot prove full
Claude-Code-style isolation. The warning does not block authenticated Codex
usage, but it marks the project status as partial so you can make an explicit
runtime-isolation decision.
A `Local data` section summarises what the project has accumulated locally:
ingest run counts, last completed timestamp per connection, knowledge page
counts by scope, semantic-layer source and dictionary value counts, and the


@ -376,13 +376,23 @@ llm:
| Field | Type | Default | Purpose |
|-------|------|---------|---------|
-| `provider.backend` | `none` \| `anthropic` \| `vertex` \| `gateway` \| `claude-code` | `none` | Selected backend. `none` disables LLM features. `claude-code` uses the local Claude Code session and needs no API key. |
+| `provider.backend` | `none` \| `anthropic` \| `vertex` \| `gateway` \| `claude-code` \| `codex` | `none` | Selected backend. `none` disables LLM features. `claude-code` uses the local Claude Code session and needs no API key. `codex` uses local Codex authentication and needs no API key. |
| `provider.anthropic.api_key` | `string` | - | Anthropic API key. Required when `backend: anthropic`. Accepts `env:` or `file:` references. |
| `provider.anthropic.base_url` | `string` | - | Override the Anthropic API base URL (proxy, self-hosted gateway). |
| `provider.gateway.api_key` / `base_url` | `string` | - | Credentials for an AI Gateway provider. Required when `backend: gateway`. |
| `provider.vertex.project` | `string` | - | Google Cloud project ID hosting the Vertex AI endpoint. |
| `provider.vertex.location` | `string` | - | Vertex AI region (for example `us-east5`). Required when the `vertex` block is present. |
Use `codex` when local Codex authentication should power **ktx** LLM work:
```yaml
llm:
provider:
backend: codex
models:
default: gpt-5.5
```
### Model roles
`models` overrides the per-role model. Keys are fixed; values are


@ -39,8 +39,20 @@ ktx ingest --all
Enriched ingest needs a configured model and embeddings. Run `ktx setup` first;
connections without that configuration fail before any work starts.
-With `claude-code`, **ktx** agent loops can invoke only the **ktx** MCP tools for the
-current run.
+Local-auth backends keep provider credentials out of `ktx.yaml`:
+```bash
+ktx setup --llm-backend claude-code --no-input
+ktx setup --llm-backend codex --llm-model gpt-5.5 --no-input
+```
+With `claude-code`, **ktx** agent loops can invoke only the **ktx** MCP tools
+for the current run. With `codex`, **ktx** restricts the temporary runtime MCP
+server to the current run's tool set, disables Codex web search, requests a
+read-only sandbox, and sets `approval_policy=never`. The public Codex SDK and
+CLI surface may still load user Codex config and built-in command execution or
+read-only file capabilities, so use `claude-code` for stricter runtime tool
+isolation.
## Query history


@ -16,6 +16,7 @@ Set `llm.provider.backend` to one of these values:
- `gateway`: Use AI Gateway-compatible Anthropic model ids.
- `claude-code`: Use your local Claude Code session through the Claude Agent
SDK. **ktx** strips provider-routing environment variables from child processes.
- `codex`: Use your local Codex authentication through the Codex SDK.
## Claude Code
@ -47,6 +48,42 @@ model IDs are also accepted.
metadata may still list host slash commands, skills, and subagents; **ktx** does not
grant execution access to them.
## Codex backend
Use `codex` when you want **ktx** to run LLM-backed workflows through your
local Codex authentication instead of a direct provider API key.
```yaml
llm:
provider:
backend: codex
models:
default: gpt-5.5
```
Configure it non-interactively:
```bash
ktx setup --llm-backend codex --llm-model gpt-5.5 --no-input
```
This is separate from Codex agent-client setup. `ktx setup --agents --target
codex` installs instructions and MCP access for an end-user Codex session.
`ktx setup --llm-backend codex` makes **ktx** itself execute ingest, scan
enrichment, memory, and other LLM-backed work through Codex.
During runtime loops, **ktx** starts a temporary loopback MCP server for the
current run, exposes only the tools passed to that run, asks Codex to use a
read-only sandbox, sets `approval_policy=never`, auto-approves only those
run-scoped MCP tools, and disables Codex web search.
Codex backend isolation is currently limited by the public Codex SDK and CLI
surface. Codex may still load user Codex config and built-in command execution
or read-only file capabilities. Use `llm.provider.backend: claude-code` when
you need stricter Claude-Code-style runtime tool isolation, or remove host
Codex MCP and tool config before running untrusted prompts through the `codex`
backend.
## Prompt caching
`llm.promptCaching` has partial parity on `claude-code`. Status and doctor warn


@ -37,6 +37,9 @@
"@semantic-release/release-notes-generator",
"conventional-changelog-conventionalcommits"
],
"ignore": [
".context/**"
],
"ignoreBinaries": [
"uv",
"lsof"


@ -32,6 +32,7 @@
"setup:dev": "node scripts/setup-dev.mjs",
"release:published-smoke": "node scripts/published-package-smoke.mjs --require-config",
"release:local-embeddings-smoke": "node scripts/local-embeddings-runtime-smoke.mjs --require-opt-in",
"release:codex-backend-smoke": "node scripts/codex-backend-live-smoke.mjs",
"release:readiness": "node scripts/release-readiness.mjs",
"release:update-version": "node scripts/update-public-release-version.mjs",
"relationships:acquire-public-fixtures": "node scripts/acquire-public-benchmark-fixtures.mjs",


@ -56,6 +56,7 @@
"@looker/sdk-rtl": "^21.6.5",
"@modelcontextprotocol/sdk": "^1.29.0",
"@notionhq/client": "^5.22.0",
"@openai/codex-sdk": "^0.133.0",
"ai": "^6.0.188",
"better-sqlite3": "^12.10.0",
"commander": "14.0.3",


@ -29,7 +29,7 @@ function embeddingBackend(value: string): 'openai' | 'sentence-transformers' {
}
function llmBackend(value: string): KtxSetupLlmBackend {
-if (value === 'anthropic' || value === 'vertex' || value === 'claude-code') {
+if (value === 'anthropic' || value === 'vertex' || value === 'claude-code' || value === 'codex') {
return value;
}
throw new InvalidArgumentError(`invalid choice '${value}'`);


@ -611,9 +611,10 @@ function nextLocalJobId(): string {
function localIngestLlmProviderGuardMessage(projectDir: string): string {
return [
-'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, or claude-code, or an injected agentRunner.',
-'Configure a local Claude Code session or API-backed LLM, then rerun ingest:',
+'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, claude-code, or codex, or an injected agentRunner.',
+'Configure a local Claude Code/Codex session or API-backed LLM, then rerun ingest:',
` ktx setup --project-dir ${projectDir} --llm-backend claude-code --no-input`,
+` ktx setup --project-dir ${projectDir} --llm-backend codex --llm-model gpt-5.5 --no-input`,
` ktx setup --project-dir ${projectDir} --llm-backend anthropic --anthropic-api-key-env ANTHROPIC_API_KEY --llm-model claude-sonnet-4-6 --no-input`,
].join('\n');
}


@ -0,0 +1,194 @@
import type { LlmTokenUsage, RunLoopStopReason } from './runtime-port.js';
export interface CodexExecEventSummary {
finalText: string;
stopReason: RunLoopStopReason;
usage: LlmTokenUsage;
stepCount: number;
stepBoundariesMs: number[];
toolCallCount: number;
toolFailures: string[];
error?: Error;
}
interface CodexEventParseOptions {
startedAt?: number;
now?: () => number;
}
function record(value: unknown): Record<string, unknown> | undefined {
return value && typeof value === 'object' ? (value as Record<string, unknown>) : undefined;
}
/**
* Codex thread items that represent a discrete agent action consuming one loop
* step. The step budget caps the total number of these regardless of which
* capability the agent reaches for, so built-in `command_execution` (and any
* file/web action the public Codex surface still exposes) count alongside our
* own `mcp_tool_call` items rather than only the MCP ones.
*/
const AGENT_STEP_ITEM_TYPES = new Set(['command_execution', 'mcp_tool_call', 'file_change', 'web_search']);
export function isCompletedAgentStep(event: unknown): boolean {
const eventRecord = record(event);
if (eventRecord?.type !== 'item.completed') {
return false;
}
const itemType = record(eventRecord.item)?.type;
return typeof itemType === 'string' && AGENT_STEP_ITEM_TYPES.has(itemType);
}
function text(value: unknown): string | undefined {
return typeof value === 'string' && value.trim().length > 0 ? value : undefined;
}
function numberValue(value: unknown): number | undefined {
return typeof value === 'number' && Number.isFinite(value) ? value : undefined;
}
function usageFrom(value: unknown): LlmTokenUsage {
const usage = record(value);
if (!usage) {
return {};
}
const inputTokens = numberValue(usage.input_tokens ?? usage.inputTokens);
const outputTokens = numberValue(usage.output_tokens ?? usage.outputTokens);
const explicitTotalTokens = numberValue(usage.total_tokens ?? usage.totalTokens);
const totalTokens =
explicitTotalTokens ??
(inputTokens !== undefined && outputTokens !== undefined ? inputTokens + outputTokens : undefined);
return {
...(inputTokens !== undefined ? { inputTokens } : {}),
...(outputTokens !== undefined ? { outputTokens } : {}),
...(totalTokens !== undefined ? { totalTokens } : {}),
};
}
function stopReasonFrom(value: unknown): RunLoopStopReason {
const reason = text(value)?.toLowerCase();
if (reason && /(budget|max_turn|max-turn|limit)/.test(reason)) {
return 'budget';
}
return 'natural';
}
function errorMessageFrom(value: unknown): string {
if (value instanceof Error) {
return value.message;
}
const asRecord = record(value);
const message = text(asRecord?.message);
return message ?? text(value) ?? 'Codex turn failed';
}
/**
* Codex serializes API failures as a JSON envelope inside the event message
* (e.g. `{"type":"error","status":400,"error":{"message":"…"}}`). Surface the
* human-readable inner message so callers don't leak raw JSON; pass plain
* strings through unchanged.
*/
function unwrapCodexApiErrorMessage(raw: string): string {
const trimmed = raw.trim();
if (!trimmed.startsWith('{')) {
return raw;
}
try {
const parsed = record(JSON.parse(trimmed));
return text(record(parsed?.error)?.message) ?? text(parsed?.message) ?? raw;
} catch {
return raw;
}
}
/** @internal */
export function parseCodexExecEventLine(line: string): unknown {
try {
return JSON.parse(line) as unknown;
} catch (error) {
throw new Error(`Codex JSONL event stream was malformed: ${error instanceof Error ? error.message : String(error)}`);
}
}
export function summarizeCodexExecEvents(
events: Iterable<unknown>,
options: CodexEventParseOptions = {},
): CodexExecEventSummary {
const startedAt = options.startedAt ?? Date.now();
const now = options.now ?? Date.now;
let finalText = '';
let stopReason: RunLoopStopReason = 'natural';
let usage: LlmTokenUsage = {};
let turnCount = 0;
let completedStepCount = 0;
const stepBoundariesMs: number[] = [];
let toolCallCount = 0;
const toolFailures: string[] = [];
let error: Error | undefined;
for (const event of events) {
const eventRecord = record(event);
const eventType = text(eventRecord?.type);
if (!eventRecord || !eventType) {
continue;
}
if (eventType === 'turn.started') {
turnCount += 1;
continue;
}
const item = record(eventRecord.item);
const itemType = text(item?.type);
if (eventType === 'item.started' && itemType === 'mcp_tool_call') {
toolCallCount += 1;
continue;
}
if (isCompletedAgentStep(event)) {
completedStepCount += 1;
stepBoundariesMs.push(now() - startedAt);
// Only MCP tool calls fail the loop: a non-zero `command_execution` exit
// is normal agent exploration, not a runtime error. `status` is the
// authoritative signal (the SDK always sets it); the SDK also serializes
// `error: null` on successful calls, so an explicit-null `error` must NOT
// be read as a failure — only a populated error object counts.
if (itemType === 'mcp_tool_call' && (item?.status === 'failed' || (item?.error !== undefined && item?.error !== null))) {
const name = text(item?.name) ?? text(item?.tool) ?? text(item?.tool_name) ?? 'unknown';
toolFailures.push(`${name}: ${errorMessageFrom(item?.error)}`);
}
continue;
}
if (eventType === 'item.completed' && itemType === 'agent_message') {
finalText = text(item?.text) ?? finalText;
continue;
}
if (eventType === 'turn.completed') {
usage = usageFrom(eventRecord.usage);
if (completedStepCount === 0) {
stepBoundariesMs.push(now() - startedAt);
}
stopReason = stopReasonFrom(eventRecord.reason ?? eventRecord.stop_reason ?? eventRecord.terminal_reason);
continue;
}
if (eventType === 'turn.failed' || eventType === 'error') {
stopReason = 'error';
error = new Error(unwrapCodexApiErrorMessage(errorMessageFrom(eventRecord.error ?? eventRecord.message)));
continue;
}
}
return {
finalText,
stopReason,
usage,
stepCount: completedStepCount > 0 ? completedStepCount : turnCount,
stepBoundariesMs,
toolCallCount,
toolFailures,
...(error ? { error } : {}),
};
}


@ -0,0 +1,9 @@
export const CODEX_ISOLATION_WARNING =
'Codex backend isolation is limited by the public Codex SDK/CLI surface: ktx restricts the runtime MCP server to the current ktx tool set, disables Codex web search, asks for a read-only sandbox, and sets approval_policy=never, but Codex may still load user Codex config and built-in command execution or read-only file capabilities.';
export const CODEX_ISOLATION_WARNING_FIX =
'Use llm.provider.backend: claude-code when you need stricter Claude-Code-style runtime tool isolation, or remove host Codex MCP/tool config before running untrusted prompts through the codex backend.';
export function formatCodexIsolationWarning(): string {
return `${CODEX_ISOLATION_WARNING} ${CODEX_ISOLATION_WARNING_FIX}`;
}


@ -0,0 +1,87 @@
import { randomBytes } from 'node:crypto';
import type { Server } from 'node:http';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import type { KtxMcpServerLike } from '../mcp/types.js';
import { runKtxMcpHttpServer, type KtxMcpHttpServerHandle } from '../../mcp-http-server.js';
import type { KtxRuntimeToolSet } from './runtime-port.js';
import { normalizeKtxRuntimeToolOutput } from './runtime-tools.js';
/** @internal */
export interface CreateCodexRuntimeMcpServerInput {
server?: KtxMcpServerLike;
toolSet: KtxRuntimeToolSet;
}
export interface CodexRuntimeMcpServerHandle {
url: string;
bearerTokenEnvVar: 'KTX_CODEX_RUNTIME_MCP_TOKEN';
bearerToken: string;
close(): Promise<void>;
}
type RunServer = typeof runKtxMcpHttpServer;
export interface StartCodexRuntimeMcpServerInput {
projectDir: string;
toolSet: KtxRuntimeToolSet;
runServer?: RunServer;
}
/** @internal */
export function createCodexRuntimeMcpServer(input: CreateCodexRuntimeMcpServerInput): KtxMcpServerLike {
const server =
input.server ??
(new McpServer({
name: 'ktx-runtime',
version: '0.0.0',
}) as KtxMcpServerLike);
for (const descriptor of Object.values(input.toolSet)) {
server.registerTool(
descriptor.name,
{
description: descriptor.description,
inputSchema: descriptor.inputSchema.shape,
},
async (toolInput) => {
const normalized = normalizeKtxRuntimeToolOutput(await descriptor.execute(toolInput));
return {
content: [{ type: 'text', text: normalized.markdown }],
...(normalized.structured !== undefined && normalized.structured !== null && typeof normalized.structured === 'object'
? { structuredContent: normalized.structured as object }
: {}),
};
},
);
}
return server;
}
function serverPort(server: Server, fallback: number): number {
const address = server.address();
return typeof address === 'object' && address ? address.port : fallback;
}
export async function startCodexRuntimeMcpServer(
input: StartCodexRuntimeMcpServerInput,
): Promise<CodexRuntimeMcpServerHandle> {
const bearerToken = randomBytes(32).toString('hex');
const runServer = input.runServer ?? runKtxMcpHttpServer;
const handle = (await runServer({
projectDir: input.projectDir,
host: '127.0.0.1',
port: 0,
token: bearerToken,
allowedHosts: ['127.0.0.1', 'localhost'],
allowedOrigins: [],
createMcpServer: () => createCodexRuntimeMcpServer({ toolSet: input.toolSet }) as McpServer,
})) as KtxMcpHttpServerHandle;
const port = serverPort(handle.server, 0);
return {
url: `http://127.0.0.1:${port}/mcp`,
bearerTokenEnvVar: 'KTX_CODEX_RUNTIME_MCP_TOKEN',
bearerToken,
close: () => handle.close(),
};
}


@ -0,0 +1,20 @@
export const DEFAULT_CODEX_MODEL = 'gpt-5.5';
const CODEX_MODEL_ALIASES: Record<string, string> = {
codex: DEFAULT_CODEX_MODEL,
default: DEFAULT_CODEX_MODEL,
};
const EXPLICIT_CODEX_MODEL_ID = /^(?:gpt|codex)-[a-z0-9][a-z0-9._-]*$/i;
export function resolveCodexModel(model: string): string {
const normalized = model.trim();
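// Own-property lookup so inherited keys such as "constructor" are never treated as aliases.
const alias = Object.prototype.hasOwnProperty.call(CODEX_MODEL_ALIASES, normalized)
  ? CODEX_MODEL_ALIASES[normalized]
  : undefined;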
if (alias) {
return alias;
}
if (EXPLICIT_CODEX_MODEL_ID.test(normalized)) {
return normalized;
}
throw new Error(`Unsupported Codex model "${model}". Use codex, default, or a gpt-* / codex-* model id.`);
}


@ -0,0 +1,38 @@
interface CodexRuntimeMcpConfig {
url: string;
bearerTokenEnvVar: string;
bearerToken: string;
toolNames: string[];
}
export interface BuildCodexRuntimeConfigInput {
model: string;
mcp?: CodexRuntimeMcpConfig;
}
export interface CodexRuntimeConfig {
configOverrides: Record<string, unknown>;
env: Record<string, string>;
}
export function buildCodexRuntimeConfig(input: BuildCodexRuntimeConfigInput): CodexRuntimeConfig {
const configOverrides: Record<string, unknown> = {
history: { persistence: 'none' },
};
const env: Record<string, string> = {};
if (input.mcp) {
configOverrides.mcp_servers = {
ktx: {
url: input.mcp.url,
bearer_token_env_var: input.mcp.bearerTokenEnvVar,
enabled_tools: input.mcp.toolNames,
default_tools_approval_mode: 'approve',
required: true,
},
};
env[input.mcp.bearerTokenEnvVar] = input.mcp.bearerToken;
}
return { configOverrides, env };
}


@ -0,0 +1,371 @@
import { z } from 'zod';
import { noopLogger, type KtxLogger } from '../core/config.js';
import { isCompletedAgentStep, summarizeCodexExecEvents, type CodexExecEventSummary } from './codex-exec-events.js';
import {
startCodexRuntimeMcpServer,
type CodexRuntimeMcpServerHandle,
} from './codex-mcp-runtime-server.js';
import { resolveCodexModel } from './codex-models.js';
import { buildCodexRuntimeConfig } from './codex-runtime-config.js';
import { CodexSdkCliRunner, type CodexSdkRunner } from './codex-sdk-runner.js';
import type {
KtxGenerateObjectInput,
KtxGenerateTextInput,
KtxLlmRuntimePort,
KtxRuntimeToolSet,
LlmTokenUsage,
RunLoopParams,
RunLoopResult,
} from './runtime-port.js';
export interface CodexKtxLlmRuntimeDeps {
projectDir: string;
modelSlots: { default: string } & Partial<Record<string, string>>;
runner?: CodexSdkRunner;
startMcpServer?: (input: { projectDir: string; toolSet: KtxRuntimeToolSet }) => Promise<CodexRuntimeMcpServerHandle>;
logger?: KtxLogger;
}
function modelForRole(modelSlots: CodexKtxLlmRuntimeDeps['modelSlots'], role: string): string {
return resolveCodexModel(modelSlots[role] ?? modelSlots.default);
}
function promptWithSystem(system: string | undefined, prompt: string): string {
return [system, prompt].filter(Boolean).join('\n\n');
}
interface CollectCodexEventsOptions {
stepBudget?: number;
abortController?: AbortController;
onStep?: (stepIndex: number) => void | Promise<void>;
}
interface CollectCodexEventsResult {
events: unknown[];
budgetExceeded: boolean;
streamError?: Error;
}
function eventRecord(value: unknown): Record<string, unknown> | undefined {
return value && typeof value === 'object' ? (value as Record<string, unknown>) : undefined;
}
function isTurnCompleted(event: unknown): boolean {
return eventRecord(event)?.type === 'turn.completed';
}
/**
* Drains the Codex stream once, emitting a step as each agent action completes
* so callers see live progress and the step budget is enforced mid-run. Every
* completed agent-action item counts (see {@link isCompletedAgentStep}), so
* built-in `command_execution` steps decrement the budget the same as
* `mcp_tool_call`s. A turn that produced no actions still counts as one step,
* matching the metrics summary and the AI SDK backend.
*/
async function collectEvents(
events: AsyncIterable<unknown>,
options: CollectCodexEventsOptions = {},
): Promise<CollectCodexEventsResult> {
const collected: unknown[] = [];
let completedSteps = 0;
let sawActionStep = false;
let budgetExceeded = false;
let streamError: Error | undefined;
// The SDK yields every stdout event, then throws on a non-zero codex exec
// exit. Catch that throw so the events already collected (which carry the
// real `turn.failed`/`error` reason) survive for the summary; the masked
// exit message is kept only as a fallback when no error event was emitted.
try {
for await (const event of events) {
collected.push(event);
const isActionStep = isCompletedAgentStep(event);
if (isActionStep) {
sawActionStep = true;
} else if (sawActionStep || !isTurnCompleted(event)) {
// Only fall back to counting a bare turn as a step when the turn produced
// no agent actions; a completed turn is terminal, so it never aborts.
continue;
}
completedSteps += 1;
await options.onStep?.(completedSteps);
if (isActionStep && options.stepBudget !== undefined && completedSteps >= options.stepBudget) {
budgetExceeded = true;
options.abortController?.abort();
break;
}
}
} catch (error) {
streamError = error instanceof Error ? error : new Error(String(error));
}
return { events: collected, budgetExceeded, ...(streamError ? { streamError } : {}) };
}
function metrics(summary: CodexExecEventSummary, startedAt: number): { totalMs: number; usage: LlmTokenUsage } {
return { totalMs: Date.now() - startedAt, usage: summary.usage };
}
function summaryError(summary: CodexExecEventSummary, streamError?: Error): Error | undefined {
// A `turn.failed`/`error` event carries the real reason; prefer it over the
// SDK's generic non-zero-exit throw. Fall back to the stream error only when
// no event explained the failure (e.g. spawn failure or auth before a turn).
if (summary.error) {
return summary.error;
}
if (summary.toolFailures.length > 0) {
return new Error(`Codex runtime tool call failed: ${summary.toolFailures.join('; ')}`);
}
return streamError;
}
function assertSuccessfulText(summary: CodexExecEventSummary, streamError?: Error): string {
const error = summaryError(summary, streamError);
if (error) {
throw error;
}
if (!summary.finalText.trim()) {
throw new Error('Codex completed without an agent message');
}
return summary.finalText;
}
function parseStructuredOutput<TOutput, TSchema extends z.ZodType<TOutput>>(schema: TSchema, text: string): TOutput {
try {
return schema.parse(JSON.parse(text));
} catch (error) {
const message = error instanceof Error ? error.message : String(error);
throw new Error(`Codex structured output failed validation: ${message}`);
}
}
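Reviewer sketch: `parseStructuredOutput` folds two different failures (invalid JSON and a schema mismatch) into one error prefix. The standalone version below shows that behaviour without depending on zod. The `Validator` type and `validateVerdict` are hypothetical stand-ins for `schema.parse`, not part of the PR.

```typescript
// Hypothetical validator type standing in for a zod schema's parse method.
type Validator<T> = (value: unknown) => T;

function parseStructured<T>(validate: Validator<T>, text: string): T {
  try {
    return validate(JSON.parse(text));
  } catch (error) {
    // JSON syntax errors and validation errors are reported the same way.
    const message = error instanceof Error ? error.message : String(error);
    throw new Error(`Codex structured output failed validation: ${message}`);
  }
}

// Illustrative validator: accepts only objects with a string `verdict`.
const validateVerdict: Validator<{ verdict: string }> = (value) => {
  if (typeof value !== 'object' || value === null || typeof (value as { verdict?: unknown }).verdict !== 'string') {
    throw new Error('expected { verdict: string }');
  }
  return { verdict: (value as { verdict: string }).verdict };
};
```

A caller that needs to tell the two failure modes apart would have to inspect the message text, since both arrive as a plain `Error`.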
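// Starts the runtime MCP server only when the call supplies at least one tool;
// otherwise returns undefined so no server is spawned.
async function mcpForTools(input: {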
projectDir: string;
toolSet?: KtxRuntimeToolSet;
startMcpServer: CodexKtxLlmRuntimeDeps['startMcpServer'];
}): Promise<CodexRuntimeMcpServerHandle | undefined> {
if (!input.toolSet || Object.keys(input.toolSet).length === 0) {
return undefined;
}
return (input.startMcpServer ?? startCodexRuntimeMcpServer)({
projectDir: input.projectDir,
toolSet: input.toolSet,
});
}
function runtimeToolNames(toolSet: KtxRuntimeToolSet | undefined): string[] {
return Object.values(toolSet ?? {}).map((descriptor) => descriptor.name);
}
export class CodexKtxLlmRuntime implements KtxLlmRuntimePort {
private readonly runner: CodexSdkRunner;
private readonly logger: KtxLogger;
constructor(private readonly deps: CodexKtxLlmRuntimeDeps) {
this.runner = deps.runner ?? new CodexSdkCliRunner();
this.logger = deps.logger ?? noopLogger;
}
async generateText(input: KtxGenerateTextInput): Promise<string> {
const startedAt = Date.now();
const model = modelForRole(this.deps.modelSlots, input.role);
const mcp = await mcpForTools({
projectDir: this.deps.projectDir,
toolSet: input.tools,
startMcpServer: this.deps.startMcpServer,
});
try {
const config = buildCodexRuntimeConfig({
model,
...(mcp
? {
mcp: {
url: mcp.url,
bearerTokenEnvVar: mcp.bearerTokenEnvVar,
bearerToken: mcp.bearerToken,
toolNames: runtimeToolNames(input.tools),
},
}
: {}),
});
const collected = await collectEvents(
await this.runner.runStreamed({
projectDir: this.deps.projectDir,
model,
prompt: promptWithSystem(input.system, input.prompt),
configOverrides: config.configOverrides,
env: config.env,
}),
);
const summary = summarizeCodexExecEvents(collected.events, { startedAt });
input.onMetrics?.(metrics(summary, startedAt));
return assertSuccessfulText(summary, collected.streamError);
} finally {
await mcp?.close();
}
}
async generateObject<TOutput, TSchema extends z.ZodType<TOutput>>(
input: KtxGenerateObjectInput<TOutput, TSchema>,
): Promise<TOutput> {
const startedAt = Date.now();
const model = modelForRole(this.deps.modelSlots, input.role);
const mcp = await mcpForTools({
projectDir: this.deps.projectDir,
toolSet: input.tools,
startMcpServer: this.deps.startMcpServer,
});
try {
const config = buildCodexRuntimeConfig({
model,
...(mcp
? {
mcp: {
url: mcp.url,
bearerTokenEnvVar: mcp.bearerTokenEnvVar,
bearerToken: mcp.bearerToken,
toolNames: runtimeToolNames(input.tools),
},
}
: {}),
});
const collected = await collectEvents(
await this.runner.runStreamed({
projectDir: this.deps.projectDir,
model,
prompt: promptWithSystem(input.system, input.prompt),
configOverrides: config.configOverrides,
env: config.env,
outputSchema: z.toJSONSchema(input.schema, { target: 'draft-7' }) as Record<string, unknown>,
}),
);
const summary = summarizeCodexExecEvents(collected.events, { startedAt });
input.onMetrics?.(metrics(summary, startedAt));
return parseStructuredOutput(input.schema, assertSuccessfulText(summary, collected.streamError));
} finally {
await mcp?.close();
}
}
async runAgentLoop(params: RunLoopParams): Promise<RunLoopResult> {
const startedAt = Date.now();
const model = modelForRole(this.deps.modelSlots, params.modelRole);
let mcp: CodexRuntimeMcpServerHandle | undefined;
try {
mcp = await mcpForTools({
projectDir: this.deps.projectDir,
toolSet: params.toolSet,
startMcpServer: this.deps.startMcpServer,
});
const config = buildCodexRuntimeConfig({
model,
...(mcp
? {
mcp: {
url: mcp.url,
bearerTokenEnvVar: mcp.bearerTokenEnvVar,
bearerToken: mcp.bearerToken,
toolNames: runtimeToolNames(params.toolSet),
},
}
: {}),
});
const abortController = new AbortController();
const onStep = async (stepIndex: number): Promise<void> => {
try {
await params.onStepFinish?.({ stepIndex, stepBudget: params.stepBudget });
} catch (error) {
this.logger.warn(
`[codex-runner] onStepFinish callback threw; ignoring: ${error instanceof Error ? error.message : String(error)}`,
);
}
};
const collected = await collectEvents(
await this.runner.runStreamed({
projectDir: this.deps.projectDir,
model,
prompt: promptWithSystem(params.systemPrompt, params.userPrompt),
configOverrides: config.configOverrides,
env: config.env,
signal: abortController.signal,
}),
{ stepBudget: params.stepBudget, abortController, onStep },
);
const summary = summarizeCodexExecEvents(collected.events, { startedAt });
const error = summaryError(summary, collected.streamError);
const stopReason = collected.budgetExceeded ? 'budget' : error ? 'error' : summary.stopReason;
return {
stopReason,
...(stopReason === 'error' && error ? { error } : {}),
metrics: {
totalMs: Date.now() - startedAt,
usage: summary.usage,
stepCount: summary.stepCount,
stepBoundariesMs: summary.stepBoundariesMs,
},
};
} catch (error) {
const err = error instanceof Error ? error : new Error(String(error));
return {
stopReason: 'error',
error: err,
metrics: { totalMs: Date.now() - startedAt, usage: {}, stepCount: 0, stepBoundariesMs: [] },
};
} finally {
await mcp?.close();
}
}
}
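Reviewer sketch: the stop-reason precedence in `runAgentLoop` (local budget first, then any error, then whatever the event summary reported) is easy to misread inside the nested ternary. The helper below restates it. `StopReason` and `loopOutcome` are hypothetical names, and the union lists only the three values that appear in this diff and its tests.

```typescript
// Assumed union: only the values visible in this diff and its tests.
type StopReason = 'natural' | 'budget' | 'error';

function loopOutcome(
  budgetExceeded: boolean,
  error: Error | undefined,
  summaryStopReason: StopReason,
): { stopReason: StopReason; error?: Error } {
  const stopReason: StopReason = budgetExceeded ? 'budget' : error ? 'error' : summaryStopReason;
  // The error is attached only when it is the reported stop reason.
  return { stopReason, ...(stopReason === 'error' && error ? { error } : {}) };
}
```

One consequence worth noting: if aborting the run causes the stream to throw, that error is dropped from the result because the budget takes precedence.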
// A rejected model is not an auth failure: Codex authenticated, connected, and
// the API refused the model id. These markers come from the API error envelope
// (e.g. "model is not supported", "invalid_request_error").
const MODEL_UNAVAILABLE_MARKERS =
/\bnot supported\b|\bnot available\b|\bdoes not exist\b|invalid_request_error|\bunknown model\b|\bunsupported model\b/i;
function describeCodexProbeFailure(model: string, message: string): { message: string; fix: string } {
if (MODEL_UNAVAILABLE_MARKERS.test(message)) {
const fix = `Run \`codex\` to see the models your account supports, then set llm.models.default in ktx.yaml (or rerun \`ktx setup\`).`;
return {
message: `Codex is authenticated, but the configured model "${model}" is not available for this Codex account. ${fix} Details: ${message}`,
fix,
};
}
const fix = `Authenticate Codex locally with the Codex CLI, verify the Codex CLI is installed, then rerun setup or \`ktx status\`.`;
return {
message: `Codex authentication is not usable. ${fix} Details: ${message}`,
fix,
};
}
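Reviewer sketch: the classification done by `describeCodexProbeFailure` can be tried against sample messages. The regular expression is copied from the diff; `classifyProbeFailure` is a hypothetical helper. The second and third sample messages in the usage note are invented for illustration, and only the first comes from the test suite in this PR.

```typescript
// Copied from the diff above.
const MODEL_UNAVAILABLE_MARKERS =
  /\bnot supported\b|\bnot available\b|\bdoes not exist\b|invalid_request_error|\bunknown model\b|\bunsupported model\b/i;

// Hypothetical helper: everything that does not match a model marker is
// treated as an authentication or installation problem.
function classifyProbeFailure(message: string): 'model-unavailable' | 'auth' {
  return MODEL_UNAVAILABLE_MARKERS.test(message) ? 'model-unavailable' : 'auth';
}
```

For example, `classifyProbeFailure("The 'gpt-5.3-codex' model is not supported when using Codex with a ChatGPT account.")` yields `'model-unavailable'`, while an invented message such as `'spawn codex ENOENT'` falls through to `'auth'`, which is presumably why the fallback fix text also asks the user to verify that the Codex CLI is installed. Note that any message containing `invalid_request_error` is classified as a model problem, even if the request was rejected for another reason.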
export async function runCodexAuthProbe(input: {
projectDir: string;
model: string;
runner?: CodexSdkRunner;
}): Promise<{ ok: true } | { ok: false; message: string; fix: string }> {
let model: string;
try {
model = resolveCodexModel(input.model);
} catch (error) {
return {
ok: false,
message: error instanceof Error ? error.message : String(error),
fix: 'Set llm.models.default in ktx.yaml to a supported codex model (codex, default, or a gpt-* / codex-* id), or rerun `ktx setup`.',
};
}
const runtime = new CodexKtxLlmRuntime({
projectDir: input.projectDir,
modelSlots: { default: model },
...(input.runner ? { runner: input.runner } : {}),
});
try {
await runtime.generateText({ role: 'default', prompt: 'Reply with exactly: ok' });
return { ok: true };
} catch (error) {
const message = error instanceof Error ? error.message : String(error);
return { ok: false, ...describeCodexProbeFailure(model, message) };
}
}


@ -0,0 +1,96 @@
import { Codex, type CodexOptions, type ThreadOptions, type TurnOptions } from '@openai/codex-sdk';
export interface CodexSdkRunnerInput {
projectDir: string;
model: string;
prompt: string;
configOverrides?: Record<string, unknown>;
env?: Record<string, string>;
outputSchema?: Record<string, unknown>;
signal?: AbortSignal;
}
export interface CodexSdkRunner {
runStreamed(input: CodexSdkRunnerInput): Promise<AsyncIterable<unknown>>;
}
type CodexThread = {
runStreamed(input: string, turnOptions?: TurnOptions): Promise<{ events: AsyncIterable<unknown> }>;
};
type CodexClient = {
startThread(options: ThreadOptions): CodexThread;
};
type CodexConstructor = new (options?: CodexOptions) => CodexClient;
export interface CodexSdkCliRunnerOptions {
envBase?: NodeJS.ProcessEnv;
codexPathOverride?: string;
}
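// Only these variables are forwarded from the parent environment to the Codex
// process: home/config locations, Codex/OpenAI credentials, PATH and Windows
// shell basics, temp dirs, CA bundles, and proxy settings.
const CODEX_ENV_ALLOWLIST = new Set([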
'HOME',
'USERPROFILE',
'APPDATA',
'LOCALAPPDATA',
'XDG_CONFIG_HOME',
'CODEX_HOME',
'CODEX_API_KEY',
'OPENAI_API_KEY',
'PATH',
'Path',
'SYSTEMROOT',
'COMSPEC',
'TMPDIR',
'TMP',
'TEMP',
'SSL_CERT_FILE',
'SSL_CERT_DIR',
'NODE_EXTRA_CA_CERTS',
'HTTPS_PROXY',
'HTTP_PROXY',
'ALL_PROXY',
'NO_PROXY',
]);
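// Copies allowlisted string values from baseEnv, then applies overrides, which
// always win over inherited values.
function buildCodexSdkEnv(baseEnv: NodeJS.ProcessEnv, overrides: Record<string, string> | undefined): Record<string, string> {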
const env: Record<string, string> = {};
for (const key of CODEX_ENV_ALLOWLIST) {
const value = baseEnv[key];
if (typeof value === 'string') {
env[key] = value;
}
}
return { ...env, ...(overrides ?? {}) };
}
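Reviewer sketch: the allowlist-then-override behaviour of `buildCodexSdkEnv` in a self-contained form. The allowlist here is deliberately shortened, `buildEnv` is a hypothetical name, and `KTX_MCP_TOKEN` is an invented variable standing in for whatever `config.env` actually carries.

```typescript
// Shortened allowlist for illustration; the real set in the diff is longer.
const ALLOWLIST = ['HOME', 'PATH', 'OPENAI_API_KEY'];

function buildEnv(
  base: Record<string, string | undefined>,
  overrides?: Record<string, string>,
): Record<string, string> {
  const env: Record<string, string> = {};
  for (const key of ALLOWLIST) {
    const value = base[key];
    if (typeof value === 'string') {
      env[key] = value;
    }
  }
  // Overrides are not filtered by the allowlist and replace inherited values.
  return { ...env, ...(overrides ?? {}) };
}

const env = buildEnv(
  { HOME: '/home/dev', PATH: '/usr/bin', AWS_SECRET_ACCESS_KEY: 'do-not-forward' },
  { KTX_MCP_TOKEN: 'example-token', PATH: '/opt/codex/bin' },
);
```

Here `env` keeps `HOME`, takes `PATH` from the overrides, gains `KTX_MCP_TOKEN`, and never sees `AWS_SECRET_ACCESS_KEY`. Because overrides bypass the allowlist, callers are responsible for what they pass in.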
export class CodexSdkCliRunner implements CodexSdkRunner {
constructor(private readonly options: CodexSdkCliRunnerOptions = {}) {}
async runStreamed(input: CodexSdkRunnerInput): Promise<AsyncIterable<unknown>> {
const CodexClass = Codex as CodexConstructor;
const codex = new CodexClass({
...(input.configOverrides ? { config: input.configOverrides as CodexOptions['config'] } : {}),
env: buildCodexSdkEnv(this.options.envBase ?? process.env, input.env),
...(this.options.codexPathOverride ? { codexPathOverride: this.options.codexPathOverride } : {}),
});
const thread = codex.startThread({
workingDirectory: input.projectDir,
skipGitRepoCheck: true,
model: input.model,
sandboxMode: 'read-only',
webSearchMode: 'disabled',
approvalPolicy: 'never',
});
const turnOptions: TurnOptions = {
...(input.outputSchema ? { outputSchema: input.outputSchema } : {}),
...(input.signal ? { signal: input.signal } : {}),
};
const streamed = await thread.runStreamed(
input.prompt,
Object.keys(turnOptions).length > 0 ? turnOptions : undefined,
);
return streamed.events;
}
}


@ -5,6 +5,7 @@ import { resolveKtxConfigReference } from '../core/config-reference.js';
import type { KtxProjectEmbeddingConfig, KtxProjectLlmConfig } from '../project/config.js';
import { AiSdkKtxLlmRuntime } from './ai-sdk-runtime.js';
import { ClaudeCodeKtxLlmRuntime } from './claude-code-runtime.js';
import { CodexKtxLlmRuntime } from './codex-runtime.js';
import type { KtxLlmRuntimePort } from './runtime-port.js';
interface LocalConfigDeps {
@ -13,6 +14,7 @@ interface LocalConfigDeps {
createKtxLlmProvider?: typeof createKtxLlmProvider;
createKtxEmbeddingProvider?: typeof createKtxEmbeddingProvider;
createClaudeCodeRuntime?: (deps: ConstructorParameters<typeof ClaudeCodeKtxLlmRuntime>[0]) => KtxLlmRuntimePort;
createCodexRuntime?: (deps: ConstructorParameters<typeof CodexKtxLlmRuntime>[0]) => KtxLlmRuntimePort;
createAiSdkRuntime?: (deps: { llmProvider: KtxLlmProvider }) => KtxLlmRuntimePort;
}
@ -104,7 +106,7 @@ export function createLocalKtxLlmProviderFromConfig(
deps: LocalConfigDeps = {},
): KtxLlmProvider | null {
const resolved = resolveLocalKtxLlmConfig(config, deps.env ?? process.env);
if (!resolved || resolved.backend === 'claude-code') {
if (!resolved || resolved.backend === 'claude-code' || resolved.backend === 'codex') {
return null;
}
return (deps.createKtxLlmProvider ?? createKtxLlmProvider)(resolved);
@ -129,6 +131,16 @@ export function createLocalKtxLlmRuntimeFromConfig(
env: deps.env,
});
}
if (resolved.backend === 'codex') {
const projectDir = deps.projectDir;
if (!projectDir) {
throw new Error('projectDir is required when creating the codex LLM runtime');
}
return (deps.createCodexRuntime ?? ((runtimeDeps) => new CodexKtxLlmRuntime(runtimeDeps)))({
projectDir,
modelSlots: resolved.modelSlots,
});
}
const llmProvider = (deps.createKtxLlmProvider ?? createKtxLlmProvider)(resolved);
return (deps.createAiSdkRuntime ?? ((runtimeDeps) => new AiSdkKtxLlmRuntime(runtimeDeps)))({ llmProvider });
}
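Reviewer sketch: how the two factory functions treat the new backend, reduced to plain data. This is a rough model under assumptions. `RuntimeSketch`, `hasApiProvider`, and `createRuntimeSketch` are hypothetical, and the `claude-code` branch is only partly visible in the diff, so its shape here is a guess.

```typescript
type Backend = 'anthropic' | 'vertex' | 'gateway' | 'claude-code' | 'codex';

type RuntimeSketch =
  | { kind: 'local-session'; backend: Backend; projectDir?: string }
  | { kind: 'api-provider'; backend: Backend };

// Mirrors createLocalKtxLlmProviderFromConfig returning null for
// session-backed backends: they have no API provider object.
function hasApiProvider(backend: Backend): boolean {
  return backend !== 'claude-code' && backend !== 'codex';
}

function createRuntimeSketch(backend: Backend, projectDir?: string): RuntimeSketch {
  if (backend === 'codex') {
    if (!projectDir) {
      throw new Error('projectDir is required when creating the codex LLM runtime');
    }
    return { kind: 'local-session', backend, projectDir };
  }
  if (backend === 'claude-code') {
    return { kind: 'local-session', backend };
  }
  return { kind: 'api-provider', backend };
}
```

The point of the sketch is the asymmetry: `codex` is the only branch shown that fails fast on a missing `projectDir`, since the Codex thread needs a working directory.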


@ -3,7 +3,7 @@ import YAML from 'yaml';
import * as z from 'zod';
import { connectionConfigSchema } from './driver-schemas.js';
const KTX_LLM_BACKENDS = ['none', 'anthropic', 'vertex', 'gateway', 'claude-code'] as const;
const KTX_LLM_BACKENDS = ['none', 'anthropic', 'vertex', 'gateway', 'claude-code', 'codex'] as const;
const KTX_EMBEDDING_BACKENDS = ['none', 'openai', 'sentence-transformers'] as const;
const KTX_PROMPT_CACHE_TTLS = ['5m', '1h'] as const;
const KTX_ENRICHMENT_MODES = ['none', 'deterministic', 'llm'] as const;
@ -38,7 +38,7 @@ const llmProviderSchema = z
.enum(KTX_LLM_BACKENDS)
.default('none')
.describe(
'LLM provider backend. "none" disables LLM features; "anthropic" / "vertex" / "gateway" require the matching nested credentials block; "claude-code" uses the local Claude Code session.',
'LLM provider backend. "none" disables LLM features; "anthropic" / "vertex" / "gateway" require the matching nested credentials block; "claude-code" uses the local Claude Code session; "codex" uses the local Codex session.',
),
vertex: vertexProviderSchema.optional().describe('Vertex AI credentials, used when backend is "vertex".'),
anthropic: apiCredentialsSchema.optional().describe('Anthropic API credentials, used when backend is "anthropic".'),


@ -3,7 +3,7 @@ import type { LanguageModel, TelemetrySettings, ToolCallRepairFunction, ToolSet
export const KTX_MODEL_ROLES = ['default', 'triage', 'candidateExtraction', 'curator', 'reconcile', 'repair'] as const;
export type KtxModelRole = (typeof KTX_MODEL_ROLES)[number];
type KtxLlmBackend = 'anthropic' | 'vertex' | 'gateway' | 'claude-code';
type KtxLlmBackend = 'anthropic' | 'vertex' | 'gateway' | 'claude-code' | 'codex';
export type KtxPromptCacheTtl = '5m' | '1h';
type KtxJsonValue =


@ -3,6 +3,9 @@ import { writeFile } from 'node:fs/promises';
import { promisify } from 'node:util';
import { resolveLocalKtxLlmConfig } from './context/llm/local-config.js';
import { runClaudeCodeAuthProbe } from './context/llm/claude-code-runtime.js';
import { formatCodexIsolationWarning } from './context/llm/codex-isolation.js';
import { runCodexAuthProbe } from './context/llm/codex-runtime.js';
import { DEFAULT_CODEX_MODEL } from './context/llm/codex-models.js';
import { resolveKtxConfigReference } from './context/core/config-reference.js';
import { type KtxProjectConfig, type KtxProjectLlmConfig, serializeKtxProjectConfig } from './context/project/config.js';
import { loadKtxProject } from './context/project/project.js';
@ -56,7 +59,7 @@ export interface AnthropicModelChoice {
recommended: boolean;
}
export type KtxSetupLlmBackend = 'anthropic' | 'vertex' | 'claude-code';
export type KtxSetupLlmBackend = 'anthropic' | 'vertex' | 'claude-code' | 'codex';
/** @internal */
export interface KtxSetupModelPromptAdapter {
@ -82,6 +85,7 @@ export interface KtxSetupModelDeps {
model: string;
env?: NodeJS.ProcessEnv;
}) => Promise<{ ok: true } | { ok: false; message: string }>;
codexAuthProbe?: (input: { projectDir: string; model: string }) => Promise<{ ok: true } | { ok: false; message: string }>;
readGcloudProject?: () => Promise<string | undefined>;
listGcloudProjects?: () => Promise<GcloudProjectChoice[]>;
spinner?: () => KtxCliSpinner;
@ -110,6 +114,20 @@ const CLAUDE_CODE_MODELS: AnthropicModelChoice[] = [
{ id: 'haiku', label: 'Claude Haiku', recommended: false },
];
// Curated Codex models from OpenAI's current lineup that work under both
// ChatGPT-account (subscription) and API-key auth. Intentionally omitted:
// the `*-codex` ids (e.g. gpt-5.3-codex, gpt-5.2-codex) are API-key-only and
// fail on ChatGPT-account auth, and gpt-5.3-codex-spark is a ChatGPT-Pro-only
// research preview. Codex resolves real availability per account at runtime
// (its binary remote-fetches the model list), so this is a convenience
// shortlist only — the manual-entry option accepts any id your account's
// `codex` picker exposes, and the auth probe reports an unsupported choice.
const CODEX_MODELS: AnthropicModelChoice[] = [
{ id: 'gpt-5.5', label: 'GPT-5.5', recommended: true },
{ id: 'gpt-5.4', label: 'GPT-5.4', recommended: false },
{ id: 'gpt-5.4-mini', label: 'GPT-5.4 mini', recommended: false },
];
const HIDDEN_ANTHROPIC_MODEL_PATTERNS = [
/^claude-sonnet-4$/i,
/^claude-opus-4$/i,
@ -272,7 +290,12 @@ export function isKtxSetupLlmConfigReady(config: KtxProjectLlmConfig): boolean {
return typeof resolved.vertex?.location === 'string' && resolved.vertex.location.trim().length > 0;
}
return resolved.backend === 'anthropic' || resolved.backend === 'gateway' || resolved.backend === 'claude-code';
return (
resolved.backend === 'anthropic' ||
resolved.backend === 'gateway' ||
resolved.backend === 'claude-code' ||
resolved.backend === 'codex'
);
}
function hasUsableConfiguredLlm(config: KtxProjectConfig): boolean {
@ -284,7 +307,8 @@ function buildProjectLlmConfig(
provider:
| { backend: 'anthropic'; credentialRef: string }
| { backend: 'vertex'; vertex: { project?: string; location: string } }
| { backend: 'claude-code' },
| { backend: 'claude-code' }
| { backend: 'codex' },
model: string,
): KtxProjectLlmConfig {
if (provider.backend === 'claude-code') {
@ -295,6 +319,14 @@ function buildProjectLlmConfig(
};
}
if (provider.backend === 'codex') {
return {
provider: { backend: 'codex' },
models: { ...existing.models, default: model },
promptCaching: existing.promptCaching,
};
}
if (provider.backend === 'vertex') {
return {
provider: {
@ -515,6 +547,7 @@ async function chooseBackend(
message: 'Which LLM provider should KTX use?',
options: [
{ value: 'claude-code', label: 'Claude subscription (Pro/Max)' },
{ value: 'codex', label: 'Codex subscription' },
{ value: 'anthropic', label: 'Anthropic API key' },
{ value: 'vertex', label: 'Google Vertex AI for Anthropic Claude' },
{ value: 'back', label: 'Back' },
@ -525,7 +558,7 @@ async function chooseBackend(
}
return {
status: 'ready',
backend: choice === 'vertex' || choice === 'claude-code' ? choice : 'anthropic',
backend: choice === 'vertex' || choice === 'claude-code' || choice === 'codex' ? choice : 'anthropic',
prompted: true,
};
}
@ -884,12 +917,51 @@ async function chooseClaudeCodeModel(args: KtxSetupModelArgs, deps: KtxSetupMode
return { status: 'ready', model: choice };
}
async function chooseCodexModel(args: KtxSetupModelArgs, deps: KtxSetupModelDeps): Promise<ChooseModelResult> {
const providedModel = requestedModel(args);
if (providedModel) {
return { status: 'ready', model: providedModel };
}
if (args.inputMode === 'disabled') {
return { status: 'ready', model: DEFAULT_CODEX_MODEL };
}
const prompts = deps.prompts ?? createPromptAdapter();
const choice = await prompts.select({
message: `Which Codex model should KTX use?\n\n${ANTHROPIC_MODEL_PROMPT_CONTEXT}`,
options: [
...CODEX_MODELS.map((model) => ({
value: model.id,
label: model.label,
...(model.recommended ? { hint: 'recommended' } : {}),
})),
{ value: 'manual', label: 'Enter a Codex model ID manually' },
{ value: 'back', label: 'Back' },
],
});
if (choice === 'back') {
return { status: 'back' };
}
if (choice === 'manual') {
const manual = await prompts.text({
message: withTextInputNavigation('Codex model ID'),
placeholder: CODEX_MODELS.find((model) => model.recommended)?.id ?? CODEX_MODELS[0]?.id,
});
if (manual === undefined) {
return { status: 'back' };
}
return manual.trim() ? { status: 'ready', model: manual.trim() } : { status: 'missing-input' };
}
return { status: 'ready', model: choice };
}
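Reviewer sketch: the decision order in `chooseCodexModel` written as a pure function, so the precedence (explicit flag, then non-interactive default, then prompt result) can be checked without a prompt adapter. `resolveChoice`, `ChoiceInput`, and `FALLBACK_MODEL` are hypothetical. In particular, the value of `DEFAULT_CODEX_MODEL` is not shown in this diff, so the constant below is only a placeholder.

```typescript
// Placeholder only: the real DEFAULT_CODEX_MODEL value is not shown in the diff.
const FALLBACK_MODEL = 'gpt-5.5';

type Choice = { status: 'ready'; model: string } | { status: 'back' } | { status: 'missing-input' };

interface ChoiceInput {
  requested?: string; // model passed on the command line, if any
  interactive: boolean; // false corresponds to inputMode === 'disabled'
  picked: string; // value returned by the select prompt
  manualEntry?: string; // text prompt result; undefined means cancelled
}

function resolveChoice(input: ChoiceInput): Choice {
  if (input.requested) return { status: 'ready', model: input.requested };
  if (!input.interactive) return { status: 'ready', model: FALLBACK_MODEL };
  if (input.picked === 'back') return { status: 'back' };
  if (input.picked === 'manual') {
    if (input.manualEntry === undefined) return { status: 'back' };
    const trimmed = input.manualEntry.trim();
    return trimmed ? { status: 'ready', model: trimmed } : { status: 'missing-input' };
  }
  return { status: 'ready', model: input.picked };
}
```

An explicitly requested model is returned without validation, so an unsupported id is only caught later by the auth probe.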
async function persistLlmConfig(
projectDir: string,
provider:
| { backend: 'anthropic'; credentialRef: string }
| { backend: 'vertex'; vertex: { project?: string; location: string } }
| { backend: 'claude-code' },
| { backend: 'claude-code' }
| { backend: 'codex' },
model: string,
): Promise<void> {
const project = await loadKtxProject({ projectDir });
@ -1031,6 +1103,32 @@ export async function runKtxSetupAnthropicModelStep(
return { status: 'ready', projectDir: args.projectDir };
}
if (backendChoice.backend === 'codex') {
const model = await chooseCodexModel(backendArgs, deps);
if (model.status === 'back' && backendChoice.prompted) {
attemptArgs = buildInteractiveRetryArgs(args);
continue;
}
if (model.status === 'invalid-credential') {
return { status: 'failed', projectDir: args.projectDir };
}
if (model.status !== 'ready') {
return { status: model.status, projectDir: args.projectDir };
}
const probe = deps.codexAuthProbe ?? runCodexAuthProbe;
const health = await probe({ projectDir: args.projectDir, model: model.model });
if (!health.ok) {
io.stderr.write(`${health.message}\n`);
return { status: 'failed', projectDir: args.projectDir };
}
// Prefix the clack gutter so the warning sits inside the setup frame
// instead of breaking out of it; kept on stderr for scripted runs.
io.stderr.write(`│ ${formatCodexIsolationWarning()}\n`);
await persistLlmConfig(args.projectDir, { backend: 'codex' }, model.model);
io.stdout.write(`│ LLM ready: yes (codex, ${model.model})\n`);
return { status: 'ready', projectDir: args.projectDir };
}
const credential = await chooseCredentialRef(backendArgs, io, deps);
if (credential.status === 'back' && backendChoice.prompted) {
attemptArgs = buildInteractiveRetryArgs(args);


@ -1,6 +1,11 @@
import { stat as statAsync, readdir as readdirAsync } from 'node:fs/promises';
import { basename, join } from 'node:path';
import { runClaudeCodeAuthProbe } from './context/llm/claude-code-runtime.js';
import {
CODEX_ISOLATION_WARNING,
CODEX_ISOLATION_WARNING_FIX,
} from './context/llm/codex-isolation.js';
import { runCodexAuthProbe } from './context/llm/codex-runtime.js';
import type { KtxConfigIssue, KtxProjectConfig, KtxProjectConnectionConfig, KtxProjectEmbeddingConfig, KtxProjectLlmConfig } from './context/project/config.js';
import type { KtxLocalProject } from './context/project/project.js';
import { ktxLocalStateDbPath } from './context/project/local-state-db.js';
@ -94,6 +99,11 @@ type ClaudeCodeAuthProbe = (input: {
env?: NodeJS.ProcessEnv;
}) => Promise<{ ok: true } | { ok: false; message: string }>;
type CodexAuthProbe = (input: {
projectDir: string;
model: string;
}) => Promise<{ ok: true } | { ok: false; message: string; fix: string }>;
const PROJECT_READY_COMMANDS = KTX_NEXT_STEP_DIRECT_COMMANDS.map((step) => step.command);
interface LocalStatsIngestPerConnection {
@ -194,6 +204,7 @@ async function buildLlmStatus(
projectDir: string;
env: NodeJS.ProcessEnv;
claudeCodeAuthProbe?: ClaudeCodeAuthProbe;
codexAuthProbe?: CodexAuthProbe;
fast?: boolean;
useSpinner?: boolean;
},
@ -210,6 +221,18 @@ async function buildLlmStatus(
fix: 'Run: ktx setup (choose an LLM provider)',
};
}
// The runtime (resolveModelSlots) hard-requires llm.models.default for every
// non-none backend; without it ingest/scan/memory throw. Report that here so
// status never marks a project ready that the runtime would refuse to run.
if (!model || model.trim().length === 0) {
return {
backend,
model,
status: 'fail',
detail: `llm.models.default is required for backend "${backend}"`,
fix: 'Set llm.models.default in ktx.yaml, then rerun `ktx status` (or rerun `ktx setup`).',
};
}
if (backend === 'anthropic') {
const ref = config.provider.anthropic?.api_key;
const resolved = resolveRef(ref, env);
@ -251,7 +274,7 @@ async function buildLlmStatus(
};
}
if (backend === 'claude-code') {
const modelName = model ?? 'sonnet';
const modelName = model;
if (options.fast === true) {
return {
backend,
@ -280,6 +303,36 @@ async function buildLlmStatus(
fix: 'Authenticate Claude Code locally with the Claude Code CLI, then rerun `ktx status`.',
};
}
if (backend === 'codex') {
const modelName = model;
if (options.fast === true) {
return {
backend,
model: modelName,
status: 'skipped',
detail: 'auth probe skipped (--fast)',
};
}
const probe = options.codexAuthProbe ?? runCodexAuthProbe;
const auth = await withSpinner(options.useSpinner === true, 'Probing Codex authentication', () =>
probe({ projectDir: options.projectDir, model: modelName }),
);
if (auth.ok) {
return {
backend,
model: modelName,
status: 'ok',
detail: 'local Codex session authenticated',
};
}
return {
backend,
model: modelName,
status: 'fail',
detail: auth.message,
fix: auth.fix,
};
}
return { backend, model, status: 'warn', detail: 'unknown LLM backend' };
}
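Reviewer sketch: the order of checks for the `codex` branch of `buildLlmStatus`, with the probe passed as a function so it is visible when the probe is and is not called. The real probe is asynchronous and wrapped in a spinner. This synchronous version, along with the `Probe`, `StatusSketch`, and `codexStatus` names, is a simplification for illustration only.

```typescript
// Synchronous stand-in for the async CodexAuthProbe in the diff.
type Probe = () => { ok: true } | { ok: false; message: string; fix: string };

interface StatusSketch {
  status: 'ok' | 'fail' | 'skipped';
  detail: string;
  fix?: string;
}

function codexStatus(model: string | undefined, fast: boolean, probe: Probe): StatusSketch {
  // 1. A missing model fails before any probe runs.
  if (!model || model.trim().length === 0) {
    return { status: 'fail', detail: 'llm.models.default is required for backend "codex"' };
  }
  // 2. --fast skips the probe entirely.
  if (fast) {
    return { status: 'skipped', detail: 'auth probe skipped (--fast)' };
  }
  // 3. Otherwise the probe result decides, and its fix text is passed through.
  const auth = probe();
  return auth.ok
    ? { status: 'ok', detail: 'local Codex session authenticated' }
    : { status: 'fail', detail: auth.message, fix: auth.fix };
}
```

Passing `fix` through from the probe is what lets the status output distinguish "pick another model" from "log in again".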
@ -572,6 +625,13 @@ function buildWarnings(
});
}
if (llm.backend === 'codex') {
warnings.push({
message: CODEX_ISOLATION_WARNING,
fix: CODEX_ISOLATION_WARNING_FIX,
});
}
return warnings;
}
@ -634,6 +694,7 @@ export interface BuildProjectStatusOptions {
env?: NodeJS.ProcessEnv;
queryHistoryReadinessProbe?: HistoricSqlReadinessProbe;
claudeCodeAuthProbe?: ClaudeCodeAuthProbe;
codexAuthProbe?: CodexAuthProbe;
configIssues?: KtxConfigIssue[];
fast?: boolean;
useSpinner?: boolean;
@ -882,6 +943,7 @@ export async function buildProjectStatus(project: KtxLocalProject, options: Buil
projectDir: project.projectDir,
env,
claudeCodeAuthProbe: options.claudeCodeAuthProbe,
codexAuthProbe: options.codexAuthProbe,
fast: options.fast,
useSpinner: options.useSpinner,
});


@ -77,9 +77,10 @@ describe('createLocalBundleIngestRuntime', () => {
}),
).toThrow(
[
'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, or claude-code, or an injected agentRunner.',
'Configure a local Claude Code session or API-backed LLM, then rerun ingest:',
'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, claude-code, or codex, or an injected agentRunner.',
'Configure a local Claude Code/Codex session or API-backed LLM, then rerun ingest:',
` ktx setup --project-dir ${project.projectDir} --llm-backend claude-code --no-input`,
` ktx setup --project-dir ${project.projectDir} --llm-backend codex --llm-model gpt-5.5 --no-input`,
` ktx setup --project-dir ${project.projectDir} --llm-backend anthropic --anthropic-api-key-env ANTHROPIC_API_KEY --llm-model claude-sonnet-4-6 --no-input`,
].join('\n'),
);


@ -0,0 +1,188 @@
import { describe, expect, it } from 'vitest';
import {
parseCodexExecEventLine,
summarizeCodexExecEvents,
} from '../../../src/context/llm/codex-exec-events.js';
describe('Codex exec event parsing', () => {
it('uses the completed turn as one step when no MCP tools run', () => {
const summary = summarizeCodexExecEvents(
[
{ type: 'thread.started', thread_id: 'thr_1' },
{ type: 'turn.started' },
{ type: 'item.completed', item: { id: 'item_1', type: 'agent_message', text: 'hello from codex' } },
{
type: 'turn.completed',
usage: {
input_tokens: 12,
cached_input_tokens: 4,
output_tokens: 5,
reasoning_output_tokens: 2,
},
},
],
{ startedAt: 100, now: () => 125 },
);
expect(summary).toEqual({
finalText: 'hello from codex',
stopReason: 'natural',
usage: { inputTokens: 12, outputTokens: 5, totalTokens: 17 },
stepCount: 1,
stepBoundariesMs: [25],
toolCallCount: 0,
toolFailures: [],
});
});
it('uses completed MCP tool calls as loop steps', () => {
const offsets = [115, 140, 175];
const summary = summarizeCodexExecEvents(
[
{ type: 'turn.started' },
{
type: 'item.started',
item: { id: 'call_1', type: 'mcp_tool_call', server: 'ktx', tool: 'search', arguments: {}, status: 'in_progress' },
},
{
type: 'item.completed',
item: { id: 'call_1', type: 'mcp_tool_call', server: 'ktx', tool: 'search', arguments: {}, status: 'completed' },
},
{
type: 'item.started',
item: { id: 'call_2', type: 'mcp_tool_call', server: 'ktx', tool: 'lookup', arguments: {}, status: 'in_progress' },
},
{
type: 'item.completed',
item: {
id: 'call_2',
type: 'mcp_tool_call',
server: 'ktx',
tool: 'lookup',
arguments: {},
status: 'failed',
error: { message: 'denied' },
},
},
{ type: 'item.completed', item: { id: 'item_1', type: 'agent_message', text: 'done' } },
{ type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1, cached_input_tokens: 0, reasoning_output_tokens: 0 } },
],
{ startedAt: 100, now: () => offsets.shift() ?? 175 },
);
expect(summary).toEqual({
finalText: 'done',
stopReason: 'natural',
usage: { inputTokens: 1, outputTokens: 1, totalTokens: 2 },
stepCount: 2,
stepBoundariesMs: [15, 40],
toolCallCount: 2,
toolFailures: ['lookup: denied'],
});
});
it('does not treat a completed MCP tool call as failed when Codex sends error: null', () => {
// Captured verbatim from a real @openai/codex-sdk run: successful tool calls
// carry `error: null` and `result` alongside `status: "completed"`.
const summary = summarizeCodexExecEvents([
{ type: 'turn.started' },
{
type: 'item.started',
item: {
id: 'item_1',
type: 'mcp_tool_call',
server: 'ktx',
tool: 'echo_value',
arguments: { value: 'ktx_codex_tool_ok' },
result: null,
error: null,
status: 'in_progress',
},
},
{
type: 'item.completed',
item: {
id: 'item_1',
type: 'mcp_tool_call',
server: 'ktx',
tool: 'echo_value',
arguments: { value: 'ktx_codex_tool_ok' },
result: { content: [{ type: 'text', text: 'echo:ktx_codex_tool_ok' }], structured_content: null },
error: null,
status: 'completed',
},
},
{ type: 'item.completed', item: { id: 'm1', type: 'agent_message', text: 'done' } },
{ type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1 } },
]);
expect(summary.toolFailures).toEqual([]);
expect(summary.toolCallCount).toBe(1);
});
it('counts built-in command executions as loop steps without failing the loop', () => {
const offsets = [110, 130];
const summary = summarizeCodexExecEvents(
[
{ type: 'turn.started' },
{ type: 'item.completed', item: { id: 'c1', type: 'command_execution', command: 'ls', status: 'completed', exit_code: 0 } },
{ type: 'item.completed', item: { id: 'c2', type: 'command_execution', command: 'cat missing', status: 'failed', exit_code: 1 } },
{ type: 'item.completed', item: { id: 'm1', type: 'agent_message', text: 'done' } },
{ type: 'turn.completed', usage: { input_tokens: 2, output_tokens: 1 } },
],
{ startedAt: 100, now: () => offsets.shift() ?? 130 },
);
expect(summary.stepCount).toBe(2);
expect(summary.stepBoundariesMs).toEqual([10, 30]);
// A non-zero command exit is normal agent exploration, not a runtime tool failure.
expect(summary.toolFailures).toEqual([]);
expect(summary.toolCallCount).toBe(0);
});
it('maps turn failures into error stop reason', () => {
const summary = summarizeCodexExecEvents([
{ type: 'turn.started' },
{ type: 'turn.failed', error: { message: 'Codex could not connect to required MCP server' } },
]);
expect(summary.stopReason).toBe('error');
expect(summary.error?.message).toContain('Codex could not connect to required MCP server');
});
it('unwraps the Codex API error envelope into its human-readable message', () => {
// Codex serializes API errors as a JSON envelope inside the event message.
const apiError = JSON.stringify({
type: 'error',
status: 400,
error: {
type: 'invalid_request_error',
message: "The 'gpt-5.3-codex' model is not supported when using Codex with a ChatGPT account.",
},
});
const summary = summarizeCodexExecEvents([
{ type: 'thread.started', thread_id: 'thr_1' },
{ type: 'turn.started' },
{ type: 'error', message: apiError },
{ type: 'turn.failed', error: { message: apiError } },
]);
expect(summary.stopReason).toBe('error');
expect(summary.error?.message).toBe(
"The 'gpt-5.3-codex' model is not supported when using Codex with a ChatGPT account.",
);
});
it('maps max-turns terminal reasons into budget stop reason when Codex emits one', () => {
const summary = summarizeCodexExecEvents([
{ type: 'turn.started' },
{ type: 'turn.completed', reason: 'max_turns', usage: { input_tokens: 1, output_tokens: 1 } },
]);
expect(summary.stopReason).toBe('budget');
});
it('throws a clear error for malformed JSONL lines', () => {
expect(() => parseCodexExecEventLine('{not-json')).toThrow('Codex JSONL event stream was malformed');
});
});
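Reviewer sketch: one way the envelope unwrapping exercised by the "unwraps the Codex API error envelope" test could work. The actual implementation lives in `codex-exec-events.ts`, which is not part of this excerpt, so `unwrapApiErrorMessage` is a hypothetical reconstruction from the test's input and expected output, not the PR's code.

```typescript
// Hypothetical reconstruction: return error.message from a JSON envelope,
// or the raw message when it is not JSON or has no nested message.
function unwrapApiErrorMessage(message: string): string {
  try {
    const parsed: unknown = JSON.parse(message);
    if (typeof parsed === 'object' && parsed !== null) {
      const inner = (parsed as { error?: { message?: unknown } }).error;
      if (inner && typeof inner.message === 'string') {
        return inner.message;
      }
    }
  } catch {
    // Not JSON: fall through and keep the raw message.
  }
  return message;
}
```

Falling back to the raw string keeps plain messages, such as the MCP connection failure in the `turn.failed` test, unchanged.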


@ -0,0 +1,19 @@
import { describe, expect, it } from 'vitest';
import {
CODEX_ISOLATION_WARNING,
CODEX_ISOLATION_WARNING_FIX,
formatCodexIsolationWarning,
} from '../../../src/context/llm/codex-isolation.js';
describe('Codex isolation warning', () => {
it('documents the enforced and unenforced Codex isolation boundaries', () => {
expect(CODEX_ISOLATION_WARNING).toContain('runtime MCP server to the current ktx tool set');
expect(CODEX_ISOLATION_WARNING).toContain('disables Codex web search');
expect(CODEX_ISOLATION_WARNING).toContain('may still load user Codex config');
expect(CODEX_ISOLATION_WARNING).toContain('built-in command execution');
expect(CODEX_ISOLATION_WARNING_FIX).toContain('claude-code');
expect(formatCodexIsolationWarning()).toBe(
`${CODEX_ISOLATION_WARNING} ${CODEX_ISOLATION_WARNING_FIX}`,
);
});
});

@@ -0,0 +1,73 @@
import { describe, expect, it, vi } from 'vitest';
import { z } from 'zod';
import {
createCodexRuntimeMcpServer,
startCodexRuntimeMcpServer,
} from '../../../src/context/llm/codex-mcp-runtime-server.js';
describe('Codex runtime MCP server', () => {
it('registers runtime tools with markdown output', async () => {
const registered = new Map<
string,
{
config: { description?: string; inputSchema: unknown };
handler: (input: Record<string, unknown>) => Promise<unknown>;
}
>();
const server = createCodexRuntimeMcpServer({
server: {
registerTool(name, config, handler) {
registered.set(name, { config, handler });
},
},
toolSet: {
wiki_search: {
name: 'wiki_search',
description: 'Search the wiki',
inputSchema: z.object({ query: z.string() }),
execute: vi.fn(async () => ({ markdown: 'result markdown', structured: { matches: 1 } })),
},
},
});
expect(server).toBeDefined();
expect([...registered.keys()]).toEqual(['wiki_search']);
expect(registered.get('wiki_search')?.config).toMatchObject({
description: 'Search the wiki',
});
await expect(registered.get('wiki_search')?.handler({ query: 'revenue' })).resolves.toEqual({
content: [{ type: 'text', text: 'result markdown' }],
structuredContent: { matches: 1 },
});
});
it('starts loopback HTTP MCP with a bearer token and reports the runtime URL', async () => {
const close = vi.fn(async () => undefined);
const runServer = vi.fn(async () => ({
server: { address: () => ({ port: 4321 }) },
close,
}));
const handle = await startCodexRuntimeMcpServer({
projectDir: '/tmp/ktx-project',
toolSet: {},
runServer: runServer as never,
});
expect(handle.url).toBe('http://127.0.0.1:4321/mcp');
expect(handle.bearerTokenEnvVar).toBe('KTX_CODEX_RUNTIME_MCP_TOKEN');
expect(handle.bearerToken).toMatch(/^[a-f0-9]{64}$/);
expect(runServer).toHaveBeenCalledWith(
expect.objectContaining({
projectDir: '/tmp/ktx-project',
host: '127.0.0.1',
port: 0,
token: handle.bearerToken,
allowedHosts: ['127.0.0.1', 'localhost'],
allowedOrigins: [],
}),
);
await handle.close();
expect(close).toHaveBeenCalled();
});
});
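
Note (illustrative, not part of this diff): the first test above expects a runtime tool result of the shape `{ markdown, structured }` to reach the MCP client as a text content item plus `structuredContent`. A small sketch of that mapping, assuming hypothetical type names (`RuntimeToolResult`, `McpToolResponse`) and a hypothetical function name:

```typescript
// Hypothetical types: only the field names markdown/structured/content/structuredContent
// come from the test; everything else is an assumption.
interface RuntimeToolResult {
  markdown: string;
  structured?: Record<string, unknown>;
}

interface McpToolResponse {
  content: Array<{ type: 'text'; text: string }>;
  structuredContent?: Record<string, unknown>;
}

function toMcpToolResponse(result: RuntimeToolResult): McpToolResponse {
  // The markdown rendering becomes the single text content item.
  const response: McpToolResponse = {
    content: [{ type: 'text', text: result.markdown }],
  };
  // Structured data is attached only when the tool produced some.
  if (result.structured !== undefined) {
    response.structuredContent = result.structured;
  }
  return response;
}
```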

@@ -0,0 +1,17 @@
import { describe, expect, it } from 'vitest';
import { resolveCodexModel } from '../../../src/context/llm/codex-models.js';
describe('resolveCodexModel', () => {
it.each([
['codex', 'gpt-5.5'],
['default', 'gpt-5.5'],
['gpt-5.3-codex-spark', 'gpt-5.3-codex-spark'],
['gpt-5.4', 'gpt-5.4'],
])('maps %s to %s', (input, expected) => {
expect(resolveCodexModel(input)).toBe(expected);
});
it.each(['', ' ', 'sonnet', 'claude-sonnet-4-6'])('rejects %s', (input) => {
expect(() => resolveCodexModel(input)).toThrow('Unsupported Codex model');
});
});
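
Note (illustrative, not part of this diff): the table above maps the aliases `codex` and `default` to `gpt-5.5`, passes `gpt-*` ids through, and rejects empty, blank, and Claude-style ids. One resolver that would satisfy those cases is sketched below. The `gpt-` prefix rule is an assumption made for the sketch; the real `resolveCodexModel` may use a curated list instead.

```typescript
// Assumption: the default model and the two aliases are taken from the test table above.
const DEFAULT_CODEX_MODEL = 'gpt-5.5';
const CODEX_MODEL_ALIASES = new Set(['codex', 'default']);

// Hypothetical stand-in for resolveCodexModel; the prefix check is an assumed rule.
function resolveCodexModelSketch(input: string): string {
  if (CODEX_MODEL_ALIASES.has(input)) {
    return DEFAULT_CODEX_MODEL;
  }
  // Accept ids such as 'gpt-5.4' or 'gpt-5.3-codex-spark'; reject everything else.
  if (/^gpt-[a-z0-9.-]+$/.test(input)) {
    return input;
  }
  throw new Error(`Unsupported Codex model: ${JSON.stringify(input)}`);
}
```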

@@ -0,0 +1,43 @@
import { describe, expect, it } from 'vitest';
import { buildCodexRuntimeConfig } from '../../../src/context/llm/codex-runtime-config.js';
describe('buildCodexRuntimeConfig', () => {
it('builds generic config without SDK thread-option fields', () => {
expect(buildCodexRuntimeConfig({ model: 'gpt-5.3-codex' })).toEqual({
configOverrides: {
history: { persistence: 'none' },
},
env: {},
});
});
it('adds only the temporary ktx MCP server and exact enabled tools', () => {
expect(
buildCodexRuntimeConfig({
model: 'gpt-5.3-codex',
mcp: {
url: 'http://127.0.0.1:4567/mcp',
bearerTokenEnvVar: 'KTX_CODEX_RUNTIME_MCP_TOKEN',
bearerToken: 'secret-token',
toolNames: ['sl_read_source', 'wiki_search'],
},
}),
).toEqual({
configOverrides: {
history: { persistence: 'none' },
mcp_servers: {
ktx: {
url: 'http://127.0.0.1:4567/mcp',
bearer_token_env_var: 'KTX_CODEX_RUNTIME_MCP_TOKEN',
enabled_tools: ['sl_read_source', 'wiki_search'],
default_tools_approval_mode: 'approve',
required: true,
},
},
},
env: {
KTX_CODEX_RUNTIME_MCP_TOKEN: 'secret-token',
},
});
});
});
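
Note (illustrative, not part of this diff): the two expectations above suggest that the builder always disables history persistence, adds a single `ktx` MCP server entry only when MCP details are supplied, and keeps the bearer token out of the config by passing it through `env`. A sketch that reproduces those expected objects follows. The function and interface names are hypothetical.

```typescript
// Hypothetical input/output types; field names mirror the test expectations above.
interface McpRuntimeInput {
  url: string;
  bearerTokenEnvVar: string;
  bearerToken: string;
  toolNames: string[];
}

interface RuntimeConfigSketch {
  configOverrides: Record<string, unknown>;
  env: Record<string, string>;
}

function buildRuntimeConfigSketch(input: { model: string; mcp?: McpRuntimeInput }): RuntimeConfigSketch {
  // The model is deliberately not written here: per the first test, thread-level
  // options stay out of the generic config.
  const configOverrides: Record<string, unknown> = { history: { persistence: 'none' } };
  const env: Record<string, string> = {};
  if (input.mcp) {
    configOverrides['mcp_servers'] = {
      ktx: {
        url: input.mcp.url,
        // Only the variable name is written to config; the secret itself goes to env.
        bearer_token_env_var: input.mcp.bearerTokenEnvVar,
        enabled_tools: [...input.mcp.toolNames],
        default_tools_approval_mode: 'approve',
        required: true,
      },
    };
    env[input.mcp.bearerTokenEnvVar] = input.mcp.bearerToken;
  }
  return { configOverrides, env };
}
```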

@@ -0,0 +1,460 @@
import { describe, expect, it, vi } from 'vitest';
import { z } from 'zod';
import {
CodexKtxLlmRuntime,
runCodexAuthProbe,
} from '../../../src/context/llm/codex-runtime.js';
async function* events(items: unknown[]) {
for (const item of items) {
yield item;
}
}
function runner(items: unknown[]) {
return {
runStreamed: vi.fn(async () => events(items)),
};
}
/** Yields the given events, then throws, mirroring how the SDK throws on a non-zero codex exec exit. */
function throwingRunner(items: unknown[], error: Error) {
return {
runStreamed: vi.fn(async () =>
(async function* () {
for (const item of items) {
yield item;
}
throw error;
})(),
),
};
}
const MODEL_UNSUPPORTED_API_ERROR = JSON.stringify({
type: 'error',
status: 400,
error: {
type: 'invalid_request_error',
message: "The 'gpt-5.3-codex' model is not supported when using Codex with a ChatGPT account.",
},
});
function budgetRunner() {
let observedSignal: AbortSignal | undefined;
return {
observedSignal: () => observedSignal,
runStreamed: vi.fn(async (input: { signal?: AbortSignal }) => {
observedSignal = input.signal;
return events([
{ type: 'turn.started' },
{ type: 'item.started', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'first', status: 'in_progress' } },
{ type: 'item.completed', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'first', status: 'completed' } },
{ type: 'item.started', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'second', status: 'in_progress' } },
{ type: 'item.completed', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'second', status: 'completed' } },
{ type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1 } },
]);
}),
};
}
describe('CodexKtxLlmRuntime', () => {
it('generates text with the role-selected model and metrics', async () => {
const onMetrics = vi.fn();
const fakeRunner = runner([
{ type: 'turn.started' },
{ type: 'item.completed', item: { type: 'agent_message', text: 'hello' } },
{ type: 'turn.completed', usage: { input_tokens: 3, output_tokens: 4, total_tokens: 7 } },
]);
const runtime = new CodexKtxLlmRuntime({
projectDir: '/tmp/project',
modelSlots: { default: 'codex', triage: 'gpt-5.4' },
runner: fakeRunner,
});
await expect(runtime.generateText({ role: 'triage', system: 'system', prompt: 'prompt', onMetrics })).resolves.toBe('hello');
expect(fakeRunner.runStreamed).toHaveBeenCalledWith(
expect.objectContaining({
projectDir: '/tmp/project',
model: 'gpt-5.4',
prompt: 'system\n\nprompt',
}),
);
expect(onMetrics).toHaveBeenCalledWith(expect.objectContaining({ usage: { inputTokens: 3, outputTokens: 4, totalTokens: 7 } }));
});
it('generates and validates structured output', async () => {
const fakeRunner = runner([
{ type: 'turn.started' },
{ type: 'item.completed', item: { type: 'agent_message', text: '{"answer":"yes"}' } },
{ type: 'turn.completed' },
]);
const runtime = new CodexKtxLlmRuntime({
projectDir: '/tmp/project',
modelSlots: { default: 'codex' },
runner: fakeRunner,
});
await expect(
runtime.generateObject({
role: 'default',
prompt: 'json',
schema: z.object({ answer: z.string() }),
}),
).resolves.toEqual({ answer: 'yes' });
expect(fakeRunner.runStreamed).toHaveBeenCalledWith(
expect.objectContaining({
outputSchema: expect.objectContaining({ type: 'object' }),
}),
);
});
it('returns a structured-output error when Codex final text is invalid JSON', async () => {
const fakeRunner = runner([
{ type: 'turn.started' },
{ type: 'item.completed', item: { type: 'agent_message', text: 'not json' } },
{ type: 'turn.completed' },
]);
const runtime = new CodexKtxLlmRuntime({
projectDir: '/tmp/project',
modelSlots: { default: 'codex' },
runner: fakeRunner,
});
await expect(
runtime.generateObject({
role: 'default',
prompt: 'json',
schema: z.object({ answer: z.string() }),
}),
).rejects.toThrow('Codex structured output failed validation');
});
it('starts and closes a temporary MCP server for tool-backed agent loops', async () => {
const close = vi.fn(async () => undefined);
const startMcpServer = vi.fn(async () => ({
url: 'http://127.0.0.1:4321/mcp',
bearerTokenEnvVar: 'KTX_CODEX_RUNTIME_MCP_TOKEN' as const,
bearerToken: 'token',
close,
}));
const fakeRunner = runner([
{ type: 'turn.started' },
{ type: 'item.started', item: { type: 'mcp_tool_call', name: 'wiki_search' } },
{ type: 'item.completed', item: { type: 'agent_message', text: 'done' } },
{ type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1, total_tokens: 2 } },
]);
const runtime = new CodexKtxLlmRuntime({
projectDir: '/tmp/project',
modelSlots: { default: 'codex' },
runner: fakeRunner,
startMcpServer,
});
const onStepFinish = vi.fn();
const result = await runtime.runAgentLoop({
modelRole: 'default',
systemPrompt: 'system',
userPrompt: 'user',
stepBudget: 5,
telemetryTags: {},
onStepFinish,
toolSet: {
aliased_wiki_tool: {
name: 'wiki_search',
description: 'Search wiki',
inputSchema: z.object({ query: z.string() }),
execute: vi.fn(),
},
},
});
expect(result.stopReason).toBe('natural');
expect(result.metrics).toMatchObject({ stepCount: 1, usage: { inputTokens: 1, outputTokens: 1, totalTokens: 2 } });
expect(onStepFinish).toHaveBeenCalledWith({ stepIndex: 1, stepBudget: 5 });
expect(startMcpServer).toHaveBeenCalledWith({ projectDir: '/tmp/project', toolSet: expect.any(Object) });
expect(fakeRunner.runStreamed).toHaveBeenCalledWith(
expect.objectContaining({
env: { KTX_CODEX_RUNTIME_MCP_TOKEN: 'token' },
configOverrides: expect.objectContaining({
mcp_servers: expect.objectContaining({
ktx: expect.objectContaining({
url: 'http://127.0.0.1:4321/mcp',
enabled_tools: ['wiki_search'],
required: true,
}),
}),
}),
}),
);
expect(close).toHaveBeenCalled();
});
it('returns error stop reason on turn failure', async () => {
const runtime = new CodexKtxLlmRuntime({
projectDir: '/tmp/project',
modelSlots: { default: 'codex' },
runner: runner([{ type: 'turn.failed', error: { message: 'boom' } }]),
});
const result = await runtime.runAgentLoop({
modelRole: 'default',
systemPrompt: 'system',
userPrompt: 'user',
stepBudget: 5,
telemetryTags: {},
toolSet: {},
});
expect(result.stopReason).toBe('error');
expect(result.error?.message).toBe('boom');
});
it('surfaces failed MCP tool calls as agent-loop errors', async () => {
const runtime = new CodexKtxLlmRuntime({
projectDir: '/tmp/project',
modelSlots: { default: 'codex' },
runner: runner([
{ type: 'turn.started' },
{ type: 'item.started', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'search', status: 'in_progress' } },
{
type: 'item.completed',
item: {
type: 'mcp_tool_call',
server: 'ktx',
tool: 'search',
status: 'failed',
error: { message: 'denied' },
},
},
{ type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1 } },
]),
});
const result = await runtime.runAgentLoop({
modelRole: 'default',
systemPrompt: 'system',
userPrompt: 'user',
stepBudget: 5,
telemetryTags: {},
toolSet: {},
});
expect(result.stopReason).toBe('error');
expect(result.error?.message).toBe('Codex runtime tool call failed: search: denied');
expect(result.metrics).toMatchObject({
stepCount: 1,
usage: { inputTokens: 1, outputTokens: 1, totalTokens: 2 },
});
});
it('returns budget and aborts the Codex stream when local MCP step budget is reached', async () => {
const fakeRunner = budgetRunner();
const runtime = new CodexKtxLlmRuntime({
projectDir: '/tmp/project',
modelSlots: { default: 'codex' },
runner: fakeRunner,
});
const onStepFinish = vi.fn();
const result = await runtime.runAgentLoop({
modelRole: 'default',
systemPrompt: 'system',
userPrompt: 'user',
stepBudget: 1,
telemetryTags: {},
onStepFinish,
toolSet: {
first: {
name: 'first',
description: 'First tool',
inputSchema: z.object({}),
execute: vi.fn(),
},
},
});
expect(result.stopReason).toBe('budget');
expect(result.error).toBeUndefined();
expect(result.metrics).toMatchObject({ stepCount: 1 });
expect(onStepFinish).toHaveBeenCalledTimes(1);
expect(onStepFinish).toHaveBeenCalledWith({ stepIndex: 1, stepBudget: 1 });
expect(fakeRunner.observedSignal()?.aborted).toBe(true);
});
it('counts built-in command_execution steps against the budget and aborts the stream', async () => {
let observedSignal: AbortSignal | undefined;
const fakeRunner = {
observedSignal: () => observedSignal,
runStreamed: vi.fn(async (input: { signal?: AbortSignal }) => {
observedSignal = input.signal;
return events([
{ type: 'turn.started' },
{ type: 'item.started', item: { type: 'command_execution', command: 'ls', status: 'in_progress' } },
{ type: 'item.completed', item: { type: 'command_execution', command: 'ls', status: 'completed', exit_code: 0 } },
{ type: 'item.started', item: { type: 'command_execution', command: 'cat a', status: 'in_progress' } },
{ type: 'item.completed', item: { type: 'command_execution', command: 'cat a', status: 'completed', exit_code: 0 } },
{ type: 'item.completed', item: { type: 'command_execution', command: 'cat b', status: 'completed', exit_code: 0 } },
{ type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1 } },
]);
}),
};
const runtime = new CodexKtxLlmRuntime({
projectDir: '/tmp/project',
modelSlots: { default: 'codex' },
runner: fakeRunner,
});
const onStepFinish = vi.fn();
const result = await runtime.runAgentLoop({
modelRole: 'default',
systemPrompt: 'system',
userPrompt: 'user',
stepBudget: 2,
telemetryTags: {},
onStepFinish,
toolSet: {},
});
expect(result.stopReason).toBe('budget');
expect(result.error).toBeUndefined();
expect(result.metrics).toMatchObject({ stepCount: 2 });
expect(onStepFinish).toHaveBeenCalledTimes(2);
expect(onStepFinish).toHaveBeenLastCalledWith({ stepIndex: 2, stepBudget: 2 });
expect(fakeRunner.observedSignal()?.aborted).toBe(true);
});
it('fires onStepFinish live as each step completes, before the stream drains', async () => {
const order: string[] = [];
async function* liveEvents() {
yield { type: 'turn.started' };
yield { type: 'item.completed', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'a', status: 'completed' } };
order.push('yielded-after-step-1');
yield { type: 'item.completed', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'b', status: 'completed' } };
order.push('yielded-after-step-2');
yield { type: 'item.completed', item: { type: 'agent_message', text: 'done' } };
yield { type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1 } };
}
const fakeRunner = { runStreamed: vi.fn(async () => liveEvents()) };
const runtime = new CodexKtxLlmRuntime({
projectDir: '/tmp/project',
modelSlots: { default: 'codex' },
runner: fakeRunner,
});
const result = await runtime.runAgentLoop({
modelRole: 'default',
systemPrompt: 'system',
userPrompt: 'user',
stepBudget: 10,
telemetryTags: {},
onStepFinish: ({ stepIndex }) => {
order.push(`step-${stepIndex}`);
},
toolSet: {},
});
expect(result.stopReason).toBe('natural');
expect(result.metrics).toMatchObject({ stepCount: 2 });
expect(order).toEqual(['step-1', 'yielded-after-step-1', 'step-2', 'yielded-after-step-2']);
});
it('surfaces the real Codex error event even when the SDK stream throws afterward', async () => {
// The SDK yields the error/turn.failed events on stdout, then throws on the
// non-zero exit. The masked exit message must not hide the real API error.
const fakeRunner = throwingRunner(
[
{ type: 'thread.started', thread_id: 't' },
{ type: 'turn.started' },
{ type: 'error', message: MODEL_UNSUPPORTED_API_ERROR },
{ type: 'turn.failed', error: { message: MODEL_UNSUPPORTED_API_ERROR } },
],
new Error('Codex Exec exited with code 1: Reading prompt from stdin...'),
);
const runtime = new CodexKtxLlmRuntime({
projectDir: '/tmp/project',
modelSlots: { default: 'codex' },
runner: fakeRunner,
});
await expect(runtime.generateText({ role: 'default', prompt: 'hi' })).rejects.toThrow(
'not supported when using Codex with a ChatGPT account',
);
});
it('probes Codex authentication through a minimal non-interactive turn', async () => {
const fakeRunner = runner([
{ type: 'turn.started' },
{ type: 'item.completed', item: { type: 'agent_message', text: 'ok' } },
{ type: 'turn.completed' },
]);
await expect(
runCodexAuthProbe({
projectDir: '/tmp/project',
model: 'codex',
runner: fakeRunner,
}),
).resolves.toEqual({ ok: true });
});
it('reports an unavailable model without blaming auth when Codex rejects the model', async () => {
const fakeRunner = throwingRunner(
[
{ type: 'turn.started' },
{ type: 'turn.failed', error: { message: MODEL_UNSUPPORTED_API_ERROR } },
],
new Error('Codex Exec exited with code 1: Reading prompt from stdin...'),
);
const result = await runCodexAuthProbe({
projectDir: '/tmp/project',
model: 'gpt-5.3-codex',
runner: fakeRunner,
});
expect(result.ok).toBe(false);
if (!result.ok) {
expect(result.message).not.toContain('authentication is not usable');
expect(result.message).toContain('not available');
expect(result.message).toContain('gpt-5.3-codex');
expect(result.message).toContain('not supported when using Codex with a ChatGPT account');
// A model-access failure must steer the user toward the model config, not toward auth.
expect(result.fix).toContain('llm.models.default');
expect(result.fix).not.toContain('Authenticate Codex');
}
});
it('reports an auth failure when Codex exits without an error event', async () => {
const fakeRunner = throwingRunner(
[],
new Error('Codex Exec exited with code 1: Not logged in. Run `codex login`.'),
);
const result = await runCodexAuthProbe({
projectDir: '/tmp/project',
model: 'gpt-5.5',
runner: fakeRunner,
});
expect(result.ok).toBe(false);
if (!result.ok) {
expect(result.message).toContain('authentication is not usable');
expect(result.message).toContain('Not logged in');
expect(result.fix).toContain('Authenticate Codex');
}
});
it('rejects an unsupported model id before probing, steering at llm.models.default', async () => {
const result = await runCodexAuthProbe({
projectDir: '/tmp/project',
model: 'not-a-real-model',
});
expect(result.ok).toBe(false);
if (!result.ok) {
expect(result.message).toContain('Unsupported Codex model');
expect(result.fix).toContain('llm.models.default');
}
});
});
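
Note (illustrative, not part of this diff): the budget and live-callback tests above, together with the commit message, describe a single live step accumulator. It counts every completed agent-action item (`command_execution`, `mcp_tool_call`, `file_change`, `web_search`), fires `onStepFinish` as each one completes, stops at the budget, and counts a no-tool turn as one step. The sketch below models only that counting logic. `isCompletedAgentStep` is named in the commit message, while `StepAccumulator` and its method names are hypothetical.

```typescript
type CodexEventSketch = { type: string; item?: { type?: string; status?: string } };
type StepCallback = (step: { stepIndex: number; stepBudget: number }) => void;

// Item types taken from the commit message; agent_message is intentionally absent.
const AGENT_STEP_ITEM_TYPES = new Set(['command_execution', 'mcp_tool_call', 'file_change', 'web_search']);

function isCompletedAgentStep(event: CodexEventSketch): boolean {
  return event.type === 'item.completed' && AGENT_STEP_ITEM_TYPES.has(event.item?.type ?? '');
}

// Hypothetical accumulator: feed it events as they arrive from the stream.
class StepAccumulator {
  private count = 0;
  private readonly stepBudget: number;
  private readonly onStepFinish: StepCallback | undefined;

  constructor(stepBudget: number, onStepFinish?: StepCallback) {
    this.stepBudget = stepBudget;
    this.onStepFinish = onStepFinish;
  }

  /** Returns true once the budget is exhausted, so the caller can abort the stream. */
  push(event: CodexEventSketch): boolean {
    if (!isCompletedAgentStep(event)) {
      return false;
    }
    this.record();
    return this.count >= this.stepBudget;
  }

  /** Call after the stream drains naturally; a turn with no tool steps counts as one step. */
  finish(): number {
    if (this.count === 0) {
      this.record();
    }
    return this.count;
  }

  private record(): void {
    this.count += 1;
    // Fired immediately, so progress is reported before the stream ends.
    this.onStepFinish?.({ stepIndex: this.count, stepBudget: this.stepBudget });
  }
}
```

A caller would loop over the event stream with `for await`, call `push` per event, and on a `true` result abort its `AbortController` and report a `budget` stop reason; otherwise it calls `finish` once the stream ends.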

@@ -0,0 +1,97 @@
import { describe, expect, it, vi } from 'vitest';
const sdkMock = vi.hoisted(() => {
const events = (async function* () {
yield { type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 2 } };
})();
const runStreamed = vi.fn(async () => ({ events }));
const startThread = vi.fn(() => ({ runStreamed }));
const Codex = vi.fn(function Codex(this: { startThread: typeof startThread }, options?: unknown) {
Object.assign(this, { options, startThread });
});
return { Codex, startThread, runStreamed };
});
vi.mock('@openai/codex-sdk', () => ({ Codex: sdkMock.Codex }));
import { CodexSdkCliRunner } from '../../../src/context/llm/codex-sdk-runner.js';
async function collectAsync<T>(items: AsyncIterable<T>): Promise<T[]> {
const collected: T[] = [];
for await (const item of items) {
collected.push(item);
}
return collected;
}
describe('CodexSdkCliRunner', () => {
it('passes isolated env through the SDK and runtime controls through thread options', async () => {
const runner = new CodexSdkCliRunner({
envBase: {
HOME: '/home/ktx-user',
PATH: '/usr/local/bin:/usr/bin',
CODEX_HOME: '/home/ktx-user/.codex',
HTTPS_PROXY: 'http://proxy.example',
KTX_UNRELATED_SECRET: 'must-not-copy', // pragma: allowlist secret
},
});
const previousToken = process.env.KTX_CODEX_RUNTIME_MCP_TOKEN;
process.env.KTX_CODEX_RUNTIME_MCP_TOKEN = 'outer-token';
const outputSchema = {
type: 'object',
properties: { answer: { type: 'string' } },
required: ['answer'],
additionalProperties: false,
};
const controller = new AbortController();
try {
const events = await runner.runStreamed({
projectDir: '/tmp/ktx-project',
model: 'gpt-5.3-codex',
prompt: 'Return JSON.',
configOverrides: {
history: { persistence: 'none' },
},
env: { KTX_CODEX_RUNTIME_MCP_TOKEN: 'run-token' },
outputSchema,
signal: controller.signal,
});
expect(sdkMock.Codex).toHaveBeenCalledWith({
config: {
history: { persistence: 'none' },
},
env: {
HOME: '/home/ktx-user',
PATH: '/usr/local/bin:/usr/bin',
CODEX_HOME: '/home/ktx-user/.codex',
HTTPS_PROXY: 'http://proxy.example',
KTX_CODEX_RUNTIME_MCP_TOKEN: 'run-token',
},
});
expect(process.env.KTX_CODEX_RUNTIME_MCP_TOKEN).toBe('outer-token');
expect(sdkMock.startThread).toHaveBeenCalledWith({
workingDirectory: '/tmp/ktx-project',
skipGitRepoCheck: true,
model: 'gpt-5.3-codex',
sandboxMode: 'read-only',
webSearchMode: 'disabled',
approvalPolicy: 'never',
});
expect(sdkMock.runStreamed).toHaveBeenCalledWith('Return JSON.', {
outputSchema,
signal: controller.signal,
});
await expect(collectAsync(events)).resolves.toEqual([
{ type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 2 } },
]);
} finally {
if (previousToken === undefined) {
delete process.env.KTX_CODEX_RUNTIME_MCP_TOKEN;
} else {
process.env.KTX_CODEX_RUNTIME_MCP_TOKEN = previousToken;
}
}
});
});
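
Note (illustrative, not part of this diff): the runner test above expects `HOME`, `PATH`, `CODEX_HOME`, and `HTTPS_PROXY` to be copied from the base environment, `KTX_UNRELATED_SECRET` to be dropped, and the per-run `env` to be merged on top without touching `process.env`. An allowlist-based sketch of that behavior follows. The allowlist contains only the keys seen in the test, so the real runner may pass through more, and the function name is hypothetical.

```typescript
// Assumption: only the keys observed in the test above; the real allowlist may be longer.
const PASSTHROUGH_ENV_KEYS = ['HOME', 'PATH', 'CODEX_HOME', 'HTTPS_PROXY'];

// Hypothetical helper: builds a fresh env object and never mutates its inputs.
function buildIsolatedEnv(
  envBase: Record<string, string | undefined>,
  runEnv: Record<string, string>,
): Record<string, string> {
  const env: Record<string, string> = {};
  for (const key of PASSTHROUGH_ENV_KEYS) {
    const value = envBase[key];
    if (value !== undefined) {
      env[key] = value;
    }
  }
  // Per-run values (such as the MCP bearer token) win over anything inherited.
  return { ...env, ...runEnv };
}
```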

@@ -22,4 +22,25 @@ describe('local KTX LLM runtime config', () => {
}),
).toBeNull();
});
it('creates a Codex runtime for codex backend without creating an AI SDK provider', () => {
const runtime = createLocalKtxLlmRuntimeFromConfig(
{
provider: { backend: 'codex' },
models: { default: 'codex', triage: 'gpt-5.4' },
},
{ env: {}, projectDir: '/tmp/project', createCodexRuntime: vi.fn((deps) => ({ deps }) as never) },
);
expect(runtime).toMatchObject({ deps: expect.objectContaining({ projectDir: '/tmp/project' }) });
});
it('returns null from the AI SDK provider factory for codex backend', () => {
expect(
createLocalKtxLlmProviderFromConfig({
provider: { backend: 'codex' },
models: { default: 'codex' },
}),
).toBeNull();
});
});

@@ -231,6 +231,31 @@ llm:
});
});
it('parses Codex as a first-class LLM backend', () => {
const config = parseKtxProjectConfig(`
llm:
provider:
backend: codex
models:
default: gpt-5.3-codex
triage: gpt-5.3-codex
candidateExtraction: gpt-5.3-codex
curator: gpt-5.3-codex
reconcile: gpt-5.3-codex
repair: gpt-5.3-codex
`);
expect(config.llm.provider.backend).toBe('codex');
expect(config.llm.models).toEqual({
default: 'gpt-5.3-codex',
triage: 'gpt-5.3-codex',
candidateExtraction: 'gpt-5.3-codex',
curator: 'gpt-5.3-codex',
reconcile: 'gpt-5.3-codex',
repair: 'gpt-5.3-codex',
});
});
it('parses gateway LLM, OpenAI scan embeddings, and sentence-transformers ingest embeddings', () => {
const config = parseKtxProjectConfig(`
llm:
@@ -530,7 +555,7 @@ describe('generateKtxProjectConfigJsonSchema', () => {
const llm = (schema.properties as Record<string, { properties?: Record<string, unknown> }>).llm;
const provider = llm?.properties?.provider as { properties?: Record<string, unknown> };
const backend = provider?.properties?.backend as { enum?: readonly string[] };
expect(backend?.enum).toEqual(['none', 'anthropic', 'vertex', 'gateway', 'claude-code']);
expect(backend?.enum).toEqual(['none', 'anthropic', 'vertex', 'gateway', 'claude-code', 'codex']);
const storage = (schema.properties as Record<string, { properties?: Record<string, unknown> }>).storage;
const state = storage?.properties?.state as { enum?: readonly string[] };

@@ -422,6 +422,8 @@ describe('runKtxDoctor', () => {
'llm:',
' provider:',
' backend: anthropic',
' models:',
' default: claude-sonnet-4-5',
'',
].join('\n'),
'utf-8',
@@ -543,6 +545,8 @@ describe('runKtxDoctor', () => {
'llm:',
' provider:',
' backend: anthropic',
' models:',
' default: claude-sonnet-4-5',
'ingest:',
' adapters:',
' - live-database',
@@ -652,6 +656,8 @@ describe('runKtxDoctor', () => {
'llm:',
' provider:',
' backend: anthropic',
' models:',
' default: claude-sonnet-4-5',
'',
].join('\n'),
'utf-8',
@@ -698,6 +704,8 @@ describe('runKtxDoctor', () => {
'llm:',
' provider:',
' backend: anthropic',
' models:',
' default: claude-sonnet-4-5',
'ingest:',
' adapters:',
' - live-database',

@@ -337,10 +337,13 @@ describe('runKtxIngest', () => {
expect(runIo.stdout()).toBe('');
expect(runIo.stderr()).toContain(
'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, or claude-code, or an injected agentRunner.',
'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, claude-code, or codex, or an injected agentRunner.',
);
expect(runIo.stderr()).toContain('Configure a local Claude Code session or API-backed LLM, then rerun ingest:');
expect(runIo.stderr()).toContain('Configure a local Claude Code/Codex session or API-backed LLM, then rerun ingest:');
expect(runIo.stderr()).toContain(`ktx setup --project-dir ${projectDir} --llm-backend claude-code --no-input`);
expect(runIo.stderr()).toContain(
`ktx setup --project-dir ${projectDir} --llm-backend codex --llm-model gpt-5.5 --no-input`,
);
expect(runIo.stderr()).toContain(
`ktx setup --project-dir ${projectDir} --llm-backend anthropic --anthropic-api-key-env ANTHROPIC_API_KEY --llm-model claude-sonnet-4-6 --no-input`,
);

@@ -312,4 +312,13 @@ describe('createKtxLlmProvider', () => {
}),
).toThrow('claude-code is not an AI SDK LanguageModel backend');
});
it('rejects codex as an AI SDK LanguageModel backend', () => {
expect(() =>
createKtxLlmProvider({
backend: 'codex',
modelSlots: { default: 'gpt-5.3-codex' },
}),
).toThrow('codex is not an AI SDK LanguageModel backend');
});
});

@@ -66,6 +66,7 @@ function makePromptAdapter(options: {
nextProviderChoice === 'anthropic' ||
nextProviderChoice === 'vertex' ||
nextProviderChoice === 'claude-code' ||
nextProviderChoice === 'codex' ||
nextProviderChoice === 'back'
) {
return selectValues.shift() ?? nextProviderChoice;
@@ -183,6 +184,7 @@ describe('setup Anthropic model step', () => {
message: expect.stringContaining('Which LLM provider should KTX use?'),
options: [
{ value: 'claude-code', label: 'Claude subscription (Pro/Max)' },
{ value: 'codex', label: 'Codex subscription' },
{ value: 'anthropic', label: 'Anthropic API key' },
{ value: 'vertex', label: 'Google Vertex AI for Anthropic Claude' },
{ value: 'back', label: 'Back' },
@@ -215,6 +217,85 @@ describe('setup Anthropic model step', () => {
expect(authProbe).toHaveBeenCalledWith(expect.objectContaining({ projectDir: tempDir, model: 'sonnet' }));
});
it('configures Codex backend and validates local auth', async () => {
const io = makeIo();
const codexAuthProbe = vi.fn(async () => ({ ok: true as const }));
const result = await runKtxSetupAnthropicModelStep(
{
projectDir: tempDir,
inputMode: 'disabled',
llmBackend: 'codex',
llmModel: 'gpt-5.5',
skipLlm: false,
},
io.io,
{ codexAuthProbe },
);
expect(result.status).toBe('ready');
const config = parseKtxProjectConfig(await readFile(join(tempDir, 'ktx.yaml'), 'utf-8'));
expect(config.llm).toMatchObject({
provider: { backend: 'codex' },
models: { default: 'gpt-5.5' },
});
expect(codexAuthProbe).toHaveBeenCalledWith(expect.objectContaining({ projectDir: tempDir, model: 'gpt-5.5' }));
// The warning carries the clack gutter so it renders inside the setup frame.
expect(io.stderr()).toContain('│ Codex backend isolation is limited');
expect(io.stderr()).toContain('may still load user Codex config');
});
it('defaults the Codex model to gpt-5.5 when none is provided non-interactively', async () => {
const io = makeIo();
const codexAuthProbe = vi.fn(async () => ({ ok: true as const }));
const result = await runKtxSetupAnthropicModelStep(
{
projectDir: tempDir,
inputMode: 'disabled',
llmBackend: 'codex',
skipLlm: false,
},
io.io,
{ codexAuthProbe },
);
expect(result.status).toBe('ready');
const config = parseKtxProjectConfig(await readFile(join(tempDir, 'ktx.yaml'), 'utf-8'));
expect(config.llm).toMatchObject({
provider: { backend: 'codex' },
models: { default: 'gpt-5.5' },
});
expect(codexAuthProbe).toHaveBeenCalledWith(expect.objectContaining({ projectDir: tempDir, model: 'gpt-5.5' }));
});
it('offers the curated Codex models during interactive setup', async () => {
const io = makeIo();
const prompts = makePromptAdapter({ selectValues: ['codex', 'gpt-5.5'] });
const codexAuthProbe = vi.fn(async () => ({ ok: true as const }));
const result = await runKtxSetupAnthropicModelStep(
{ projectDir: tempDir, inputMode: 'auto', skipLlm: false },
io.io,
{ prompts, codexAuthProbe },
);
expect(result.status).toBe('ready');
expect(prompts.select).toHaveBeenCalledWith(
expect.objectContaining({
message: expect.stringContaining('Which Codex model should KTX use?'),
options: [
{ value: 'gpt-5.5', label: 'GPT-5.5', hint: 'recommended' },
{ value: 'gpt-5.4', label: 'GPT-5.4' },
{ value: 'gpt-5.4-mini', label: 'GPT-5.4 mini' },
{ value: 'manual', label: 'Enter a Codex model ID manually' },
{ value: 'back', label: 'Back' },
],
}),
);
expect(codexAuthProbe).toHaveBeenCalledWith(expect.objectContaining({ model: 'gpt-5.5' }));
});
it('prompts for the Claude Code model during interactive setup', async () => {
const io = makeIo();
const prompts = makePromptAdapter({ selectValues: ['claude-code', 'opus'] });

@@ -44,6 +44,17 @@ function withClaudeCodeLlm(config: KtxProjectConfig): KtxProjectConfig {
};
}
function withCodexLlm(config: KtxProjectConfig): KtxProjectConfig {
return {
...config,
llm: {
...config.llm,
provider: { backend: 'codex' },
models: { ...config.llm.models, default: 'gpt-5.5' },
},
};
}
function baseProjectConfig(): KtxProjectConfig {
return withClaudeCodeLlm(buildDefaultKtxProjectConfig());
}
@@ -391,6 +402,126 @@ describe('buildProjectStatus --fast', () => {
});
});
describe('buildProjectStatus codex', () => {
it('reports authenticated local Codex session', async () => {
const project = projectWithConfig(withCodexLlm(buildDefaultKtxProjectConfig()));
const status = await buildProjectStatus(project, {
codexAuthProbe: async () => ({ ok: true as const }),
});
expect(status.llm).toMatchObject({
backend: 'codex',
model: 'gpt-5.5',
status: 'ok',
detail: 'local Codex session authenticated',
});
expect(status.warnings).toEqual(
expect.arrayContaining([
expect.objectContaining({
message: expect.stringContaining('Codex backend isolation is limited'),
fix: expect.stringContaining('claude-code'),
}),
]),
);
const rendered = renderProjectStatus(status, { verbose: false, useColor: false });
expect(rendered).toContain('Codex backend isolation is limited');
});
it('skips Codex auth probe with --fast', async () => {
let probeCalls = 0;
const project = projectWithConfig(withCodexLlm(buildDefaultKtxProjectConfig()));
const status = await buildProjectStatus(project, {
fast: true,
codexAuthProbe: async () => {
probeCalls += 1;
return { ok: true };
},
});
expect(probeCalls).toBe(0);
expect(status.llm.status).toBe('skipped');
expect(status.llm.detail).toMatch(/--fast/);
});
it('surfaces the probe fix for a model-access failure instead of an auth fix', async () => {
const project = projectWithConfig(withCodexLlm(buildDefaultKtxProjectConfig()));
const status = await buildProjectStatus(project, {
codexAuthProbe: async () => ({
ok: false,
message: 'Codex is authenticated, but the configured model "gpt-5.5" is not available...',
fix: 'Run `codex` to see the models your account supports, then set llm.models.default in ktx.yaml (or rerun `ktx setup`).',
}),
});
expect(status.llm.status).toBe('fail');
expect(status.llm.fix).toContain('llm.models.default');
expect(status.llm.fix).not.toContain('Authenticate Codex');
});
});
describe('buildProjectStatus llm models.default requirement', () => {
function withBackendNoModel(
backend: KtxProjectConfig['llm']['provider']['backend'],
): KtxProjectConfig {
const config = buildDefaultKtxProjectConfig();
return {
...config,
llm: { ...config.llm, provider: { backend }, models: {} },
};
}
it('fails codex without llm.models.default and never probes', async () => {
let probeCalls = 0;
const project = projectWithConfig(withBackendNoModel('codex'));
const status = await buildProjectStatus(project, {
codexAuthProbe: async () => {
probeCalls += 1;
return { ok: true };
},
});
expect(probeCalls).toBe(0);
expect(status.llm.status).toBe('fail');
expect(status.llm.detail).toContain('llm.models.default');
expect(status.verdict).toBe('blocked');
});
it('fails claude-code without llm.models.default and never probes', async () => {
let probeCalls = 0;
const project = projectWithConfig(withBackendNoModel('claude-code'));
const status = await buildProjectStatus(project, {
claudeCodeAuthProbe: async () => {
probeCalls += 1;
return { ok: true };
},
});
expect(probeCalls).toBe(0);
expect(status.llm.status).toBe('fail');
expect(status.llm.detail).toContain('llm.models.default');
expect(status.verdict).toBe('blocked');
});
it('fails anthropic without llm.models.default even when the key is set', async () => {
const config = withBackendNoModel('anthropic');
const project = projectWithConfig({
...config,
llm: {
...config.llm,
provider: { backend: 'anthropic', anthropic: { api_key: 'env:ANTHROPIC_API_KEY' } }, // pragma: allowlist secret
models: {},
},
});
const status = await buildProjectStatus(project, {
env: { ANTHROPIC_API_KEY: 'sk-test' }, // pragma: allowlist secret
});
expect(status.llm.status).toBe('fail');
expect(status.llm.detail).toContain('llm.models.default');
expect(status.verdict).toBe('blocked');
});
});
describe('buildLocalStatsStatus', () => {
let tempDir: string;

pnpm-lock.yaml (generated)

@ -158,6 +158,9 @@ importers:
'@notionhq/client':
specifier: ^5.22.0
version: 5.22.0
'@openai/codex-sdk':
specifier: ^0.133.0
version: 0.133.0
ai:
specifier: ^6.0.188
version: 6.0.188(zod@4.4.3)
@ -1288,6 +1291,51 @@ packages:
'@octokit/types@16.0.0':
resolution: {integrity: sha512-sKq+9r1Mm4efXW1FCk7hFSeJo4QKreL/tTbR0rz/qx/r1Oa2VV83LTA/H/MuCOX7uCIJmQVRKBcbmWoySjAnSg==}
'@openai/codex-sdk@0.133.0':
resolution: {integrity: sha512-PB82D/1Q0C7nzaV5O+1O4y5LcVwiUvxyHvCUTfz8Cwztv6bOWQ40gFHE5ZFX1EFPJx1cMV0GPVODWuXIKAuayQ==}
engines: {node: '>=18'}
'@openai/codex@0.133.0':
resolution: {integrity: sha512-Gh42kLLBo/6gpnHmDzUWDVvyS57ekCB1+1Dz0RG2oIl3Lhk1uwrjSj/PwaJWWh4Rw/rUp1RqkwrMugFfFEOlqQ==}
engines: {node: '>=16'}
hasBin: true
'@openai/codex@0.133.0-darwin-arm64':
resolution: {integrity: sha512-W7f8+DckLujnqGlptKCzgJU+ooeHKMuk6KYgMFP6A9asn7YUsGUgJqjiBaX8oNcXO6w/pTbKGRARx1kCNS8lIg==}
engines: {node: '>=16'}
cpu: [arm64]
os: [darwin]
'@openai/codex@0.133.0-darwin-x64':
resolution: {integrity: sha512-Ek8ikvLOiXZ8emcIJVBXxK6fm8ratBy0kaEt3JNisTNszxGshUHf/R4xxDxIyKNcUkYYXjW7A/rMwW3iu3OFlg==}
engines: {node: '>=16'}
cpu: [x64]
os: [darwin]
'@openai/codex@0.133.0-linux-arm64':
resolution: {integrity: sha512-uKXYYSJ3mY16sp4hcG/4BMNRjva/ZS4oARiI1+7k8+NiuoAhdCGWNe5u4KJ3sMuL3tp/IXcmc6B56EFX1+WDBQ==}
engines: {node: '>=16'}
cpu: [arm64]
os: [linux]
'@openai/codex@0.133.0-linux-x64':
resolution: {integrity: sha512-9YfyqrfUj/UZ2+aXE4zBz47t6RXbVni95ZorGsNh857vxYK/asVpUtR2cymo9lB3JaI4mQaKFfV/t7IRItqkuA==}
engines: {node: '>=16'}
cpu: [x64]
os: [linux]
'@openai/codex@0.133.0-win32-arm64':
resolution: {integrity: sha512-mRzND0PSGHRoLk0X41GTSoc3tFjZSF4HgDlfjU5fiQcWVi0/kLb7Ku6/tPFT/X2hOLa3YdJkbIcHC0Hc9ni80g==}
engines: {node: '>=16'}
cpu: [arm64]
os: [win32]
'@openai/codex@0.133.0-win32-x64':
resolution: {integrity: sha512-u3ji78DIPZCGJeELuovsAnaZH+vK9gsA4F6M1y+Uy2s80Sz7/i1S0KL81qGReYji3urSjgBpkQuNP47GXOqxrQ==}
engines: {node: '>=16'}
cpu: [x64]
os: [win32]
'@opentelemetry/api@1.9.1':
resolution: {integrity: sha512-gLyJlPHPZYdAk1JENA9LeHejZe1Ti77/pTeFm/nMXmQH/HFZlcS/O2XJB+L8fkbrNSqhdtlvjBVjxwUYanNH5Q==}
engines: {node: '>=8.0.0'}
@ -7145,6 +7193,37 @@ snapshots:
dependencies:
'@octokit/openapi-types': 27.0.0
'@openai/codex-sdk@0.133.0':
dependencies:
'@openai/codex': 0.133.0
'@openai/codex@0.133.0':
optionalDependencies:
'@openai/codex-darwin-arm64': '@openai/codex@0.133.0-darwin-arm64'
'@openai/codex-darwin-x64': '@openai/codex@0.133.0-darwin-x64'
'@openai/codex-linux-arm64': '@openai/codex@0.133.0-linux-arm64'
'@openai/codex-linux-x64': '@openai/codex@0.133.0-linux-x64'
'@openai/codex-win32-arm64': '@openai/codex@0.133.0-win32-arm64'
'@openai/codex-win32-x64': '@openai/codex@0.133.0-win32-x64'
'@openai/codex@0.133.0-darwin-arm64':
optional: true
'@openai/codex@0.133.0-darwin-x64':
optional: true
'@openai/codex@0.133.0-linux-arm64':
optional: true
'@openai/codex@0.133.0-linux-x64':
optional: true
'@openai/codex@0.133.0-win32-arm64':
optional: true
'@openai/codex@0.133.0-win32-x64':
optional: true
'@opentelemetry/api@1.9.1': {}
'@orama/orama@3.1.18': {}


@ -0,0 +1,160 @@
import { execFile } from 'node:child_process';
import { mkdtemp, rm } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { dirname, join, resolve } from 'node:path';
import { fileURLToPath, pathToFileURL } from 'node:url';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);
const SCRIPT_DIR = dirname(fileURLToPath(import.meta.url));
const ROOT_DIR = resolve(SCRIPT_DIR, '..');
const OPT_IN_MESSAGE =
  'Set KTX_RUN_CODEX_BACKEND_SMOKE=1 or pass --force to run the Codex backend live smoke.';

// The live smoke is opt-in: it runs only when the env var is exactly '1' or --force is passed.
export function codexBackendSmokeOptIn(env = process.env, args = process.argv.slice(2)) {
  if (env.KTX_RUN_CODEX_BACKEND_SMOKE === '1' || args.includes('--force')) {
    return { run: true };
  }
  return { run: false, message: OPT_IN_MESSAGE };
}

// Run a command, echo its output, and resolve with { code, stdout, stderr } instead of throwing.
async function run(command, args, options = {}) {
  process.stdout.write(`$ ${command} ${args.join(' ')}\n`);
  try {
    const result = await execFileAsync(command, args, {
      cwd: options.cwd ?? ROOT_DIR,
      env: { ...process.env, ...(options.env ?? {}) },
      encoding: 'utf8',
      maxBuffer: 1024 * 1024 * 20,
      timeout: options.timeoutMs ?? 300_000,
    });
    if (result.stdout) {
      process.stdout.write(result.stdout);
    }
    if (result.stderr) {
      process.stderr.write(result.stderr);
    }
    return { code: 0, stdout: result.stdout, stderr: result.stderr };
  } catch (error) {
    const stdout = typeof error.stdout === 'string' ? error.stdout : '';
    const stderr = typeof error.stderr === 'string' ? error.stderr : error.message;
    if (stdout) {
      process.stdout.write(stdout);
    }
    if (stderr) {
      process.stderr.write(stderr);
    }
    return {
      // Non-numeric codes (for example the string 'ENOENT' when the command cannot be spawned)
      // are reported as exit code 1.
      code: typeof error.code === 'number' ? error.code : 1,
      stdout,
      stderr,
    };
  }
}

function requireSuccess(label, result) {
  if (result.code !== 0) {
    throw new Error(`${label} failed with code ${result.code}\nstdout:\n${result.stdout}\nstderr:\n${result.stderr}`);
  }
}

// Run `ktx setup` non-interactively against the built CLI and check the readiness line it prints.
async function runSetupSmoke(projectDir) {
  const result = await run(
    'node',
    [
      join(ROOT_DIR, 'packages/cli/dist/bin.js'),
      'setup',
      '--project-dir',
      projectDir,
      '--llm-backend',
      'codex',
      '--llm-model',
      'gpt-5.3-codex',
      '--no-input',
      '--yes',
      '--skip-databases',
      '--skip-sources',
      '--skip-agents',
    ],
    { timeoutMs: 600_000 },
  );
  requireSuccess('ktx setup codex backend', result);
  if (!result.stdout.includes('LLM ready: yes (codex, gpt-5.3-codex)')) {
    throw new Error(`setup did not report Codex LLM readiness\nstdout:\n${result.stdout}`);
  }
}

async function runRuntimeSmoke(projectDir) {
  // Import the built runtime and zod by file URL so the smoke exercises the compiled output.
  const runtimeUrl = pathToFileURL(join(ROOT_DIR, 'packages/cli/dist/context/llm/codex-runtime.js')).href;
  const zodUrl = pathToFileURL(join(ROOT_DIR, 'packages/cli/node_modules/zod/index.js')).href;
  const { CodexKtxLlmRuntime } = await import(runtimeUrl);
  const { z } = await import(zodUrl);
  const runtime = new CodexKtxLlmRuntime({
    projectDir,
    modelSlots: { default: 'gpt-5.3-codex' },
  });

  // Plain text generation: the reply must match the requested token exactly.
  const text = await runtime.generateText({
    role: 'default',
    prompt: 'Reply with exactly: ktx_codex_text_ok',
  });
  if (text.trim() !== 'ktx_codex_text_ok') {
    throw new Error(`Codex text smoke returned unexpected text: ${text}`);
  }

  // Agent loop: Codex must call the echo_value tool exactly once and then stop naturally.
  let toolCalls = 0;
  const loop = await runtime.runAgentLoop({
    modelRole: 'default',
    systemPrompt: 'You must use available tools when the user asks for a tool result.',
    userPrompt:
      'Call the echo_value tool with {"value":"ktx_codex_tool_ok"}, then finish after the tool returns.',
    toolSet: {
      echo_value: {
        name: 'echo_value',
        description: 'Return the provided value as markdown.',
        inputSchema: z.object({ value: z.string() }),
        execute: async (input) => {
          toolCalls += 1;
          return { markdown: `echo:${input.value}` };
        },
      },
    },
    stepBudget: 4,
    telemetryTags: {},
  });
  if (loop.stopReason !== 'natural') {
    throw new Error(`Codex tool smoke stopped with ${loop.stopReason}: ${loop.error?.message ?? 'no error'}`);
  }
  if (toolCalls !== 1) {
    throw new Error(`Expected Codex to call echo_value exactly once, got ${toolCalls}`);
  }
}

// Build the CLI, then run the setup and runtime smokes in a temp project that is always removed.
export async function runCodexBackendLiveSmoke() {
  const projectDir = await mkdtemp(join(tmpdir(), 'ktx-codex-backend-smoke-'));
  try {
    requireSuccess(
      'ktx build',
      await run('pnpm', ['--filter', '@kaelio/ktx', 'run', 'build'], { timeoutMs: 600_000 }),
    );
    await runSetupSmoke(projectDir);
    await runRuntimeSmoke(projectDir);
    process.stdout.write(`Codex backend live smoke passed in ${projectDir}\n`);
  } finally {
    await rm(projectDir, { recursive: true, force: true });
  }
}

async function main() {
  const optIn = codexBackendSmokeOptIn();
  if (!optIn.run) {
    process.stdout.write(`${optIn.message}\n`);
    return;
  }
  await runCodexBackendLiveSmoke();
}

// Run only when executed directly, not when imported (the unit test imports codexBackendSmokeOptIn).
if (import.meta.url === pathToFileURL(process.argv[1] ?? '').href) {
  await main();
}


@ -0,0 +1,18 @@
import assert from 'node:assert/strict';
import test from 'node:test';

import { codexBackendSmokeOptIn } from './codex-backend-live-smoke.mjs';

test('codex backend smoke stays disabled by default', () => {
  assert.deepEqual(codexBackendSmokeOptIn({}, []), {
    run: false,
    message: 'Set KTX_RUN_CODEX_BACKEND_SMOKE=1 or pass --force to run the Codex backend live smoke.',
  });
});

test('codex backend smoke runs with env opt-in', () => {
  assert.deepEqual(codexBackendSmokeOptIn({ KTX_RUN_CODEX_BACKEND_SMOKE: '1' }, []), { run: true });
});

test('codex backend smoke runs with force flag', () => {
  assert.deepEqual(codexBackendSmokeOptIn({}, ['--force']), { run: true });
});
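
The three tests above cover the default, the env opt-in, and the force flag. The opt-in check compares the variable against the exact string '1' and looks for an exact `--force` argument, so near-miss values should leave the smoke disabled. A minimal sketch of those edge cases follows. `optInSketch` is a hypothetical local stand-in that mirrors `codexBackendSmokeOptIn`, so the snippet runs without the smoke script on disk; it is not part of the change.

```javascript
// Hypothetical stand-in mirroring codexBackendSmokeOptIn from the smoke script,
// redefined here so the snippet is self-contained.
const OPT_IN_MESSAGE =
  'Set KTX_RUN_CODEX_BACKEND_SMOKE=1 or pass --force to run the Codex backend live smoke.';

function optInSketch(env = {}, args = []) {
  if (env.KTX_RUN_CODEX_BACKEND_SMOKE === '1' || args.includes('--force')) {
    return { run: true };
  }
  return { run: false, message: OPT_IN_MESSAGE };
}

// Each entry pairs an (env, args) input with whether the smoke should run.
const cases = [
  { env: { KTX_RUN_CODEX_BACKEND_SMOKE: 'true' }, args: [], expected: false }, // not the string '1'
  { env: { KTX_RUN_CODEX_BACKEND_SMOKE: '0' }, args: [], expected: false },
  { env: {}, args: ['--force=true'], expected: false }, // not an exact '--force' argument
  { env: {}, args: ['--verbose', '--force'], expected: true }, // position does not matter
  { env: { KTX_RUN_CODEX_BACKEND_SMOKE: '1' }, args: ['--verbose'], expected: true },
];

for (const { env, args, expected } of cases) {
  const actual = optInSketch(env, args).run;
  if (actual !== expected) {
    throw new Error(`unexpected opt-in result for ${JSON.stringify({ env, args })}: ${actual}`);
  }
}
```

If these cases were added to the real test file, they would call the imported `codexBackendSmokeOptIn` directly instead of the stand-in.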