feat: add codex llm backend for ktx runtime work (#253)

* feat: add codex sdk runner foundation

* feat: parse codex runtime events

* feat: expose codex runtime mcp tools

* feat: add codex llm runtime

* feat: wire codex llm backend

* test: avoid Array.fromAsync in codex runner test

* docs: document codex llm backend

* fix: tighten codex runtime config ownership

* fix: use codex sdk env and thread options

* fix: parse codex sdk event shapes

* test: add codex backend live smoke

* docs: clarify codex backend isolation

* fix: drive codex loop metrics from mcp events

* fix: enforce codex local step budget

* docs: disclose codex isolation limits

* fix: count all codex agent steps and stream step callbacks live

The agent-loop step budget only counted completed mcp_tool_call items, so
built-in command_execution steps (which the public Codex SDK/CLI surface can
still expose) never decremented the budget, letting ingest/reconciliation run
past stepBudget until Codex stopped on its own. onStepFinish was also replayed
only after the whole stream drained, so live work_unit_step / reconciliation
progress appeared stuck until the Codex process exited.

collectEvents is now the single live step accumulator: it counts every
completed agent-action item via a shared isCompletedAgentStep predicate
(command_execution, mcp_tool_call, file_change, web_search), fires onStepFinish
as each step completes, and enforces the budget on that broader count. A
no-tool turn still counts as one step. toolFailures stays MCP-specific, since a
non-zero command exit is normal agent exploration, not a loop failure.
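The step accounting described above can be sketched as a small live accumulator. This is a minimal sketch, not the actual collectEvents. The event shapes, `StepInfo`, `AGENT_STEP_TYPES` and `collectSteps` are hypothetical names. The `abort` callback stands in for aborting the signal handed to the runner. Treating `stepCount >= stepBudget` as the stop point is an assumption that matches the budget tests in this commit.

```typescript
// Hypothetical, simplified event shapes modeled on the Codex SDK events used in the tests.
type CodexItem = { type: string; status?: string };
type CodexEvent = { type: string; item?: CodexItem };
type StepInfo = { stepIndex: number; stepBudget: number };

// Completed items of these types each count as one agent step (per the commit message).
const AGENT_STEP_TYPES = new Set(['command_execution', 'mcp_tool_call', 'file_change', 'web_search']);

function isCompletedAgentStep(event: CodexEvent): boolean {
  return event.type === 'item.completed' && event.item !== undefined && AGENT_STEP_TYPES.has(event.item.type);
}

async function collectSteps(
  events: AsyncIterable<CodexEvent>,
  stepBudget: number,
  onStepFinish: (info: StepInfo) => void,
  abort: () => void, // hypothetical stand-in for aborting the runner's AbortSignal
): Promise<{ stepCount: number; stopReason: 'natural' | 'budget' }> {
  let stepCount = 0;
  let turnCompleted = false;
  for await (const event of events) {
    if (isCompletedAgentStep(event)) {
      stepCount += 1;
      // Fired as each step completes, not replayed after the stream drains.
      onStepFinish({ stepIndex: stepCount, stepBudget });
      if (stepCount >= stepBudget) {
        abort();
        return { stepCount, stopReason: 'budget' };
      }
    } else if (event.type === 'turn.completed') {
      turnCompleted = true;
    }
  }
  // A turn that ran no tools or commands still counts as one step.
  if (stepCount === 0 && turnCompleted) {
    stepCount = 1;
    onStepFinish({ stepIndex: 1, stepBudget });
  }
  return { stepCount, stopReason: 'natural' };
}
```

Counting every agent-action item, rather than only MCP calls, keeps a run that explores with built-in commands inside the same budget as one that uses ktx tools.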

* test: align ingest llm-guard assertions with codex backend

The skip-llm ingest guard message now lists codex as a valid backend and
mentions a Claude Code/Codex session plus a codex setup hint, but this slow
suite test still asserted the pre-codex wording. Update it to match the
production message (already covered by the local-bundle-runtime unit test) and
add the codex setup-line assertion.

* fix: treat codex error:null tool calls as success

The Codex SDK serializes error: null on successful mcp_tool_call items, so
the failure check (item.error !== undefined) flagged every successful tool
call as failed with the empty-payload default "Codex turn failed". This
killed every ingest work unit under the codex backend before it could
produce a patch.

Key on status === 'failed' (authoritative, always set) and only treat a
populated error object as a failure. Add a regression test built from a
verbatim real-SDK event capture.
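A minimal sketch of the corrected failure check follows. The trimmed `McpToolCallItem` shape and the `isFailedToolCall` and `toolFailures` helpers are hypothetical; the real SDK item carries more fields. The `'unknown error'` fallback text is also an assumption.

```typescript
// Hypothetical, trimmed shape of an mcp_tool_call item; the real SDK type has more fields.
interface McpToolCallItem {
  type: 'mcp_tool_call';
  tool: string;
  status: 'in_progress' | 'completed' | 'failed';
  error?: { message?: string } | null;
}

function isFailedToolCall(item: McpToolCallItem): boolean {
  // status is authoritative and always set.
  if (item.status === 'failed') return true;
  // Successful calls carry `error: null`, so `error !== undefined` is the wrong test;
  // only a populated error object counts as a failure.
  return item.error != null && Object.keys(item.error).length > 0;
}

// Formats failures as "tool: message", like the toolFailures strings in the tests.
function toolFailures(items: McpToolCallItem[]): string[] {
  return items
    .filter(isFailedToolCall)
    .map((item) => `${item.tool}: ${item.error?.message ?? 'unknown error'}`);
}
```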

* fix: default codex backend to gpt-5.5 and report real probe errors

The previous default gpt-5.3-codex is an API-key-only model that the OpenAI
API rejects under ChatGPT-account (subscription) auth, so codex status/setup
failed with a misleading "authentication is not usable" message even though
auth was fine.

- Default codex model is now gpt-5.5 (works on both subscription and API-key
  auth); the curated setup picker offers gpt-5.5 / gpt-5.4 / gpt-5.4-mini and
  keeps free-form entry for account-specific ids (e.g. gpt-5.3-codex-spark).
- runCodexAuthProbe now distinguishes "model not available" from an auth
  failure and surfaces the real API error: collectEvents retains stream
  events when the SDK throws on a non-zero exit, and the API error JSON
  envelope is unwrapped to its human-readable message.
- The Codex isolation warning now renders inside the clack setup frame.
- Docs updated to gpt-5.5 with a note that *-codex ids require API-key auth.
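The envelope unwrapping mentioned in the list above can be sketched as follows. The envelope shape is taken from the captured API error in the tests (`{"type":"error","status":400,"error":{...,"message":"..."}}`). The `unwrapCodexApiError` name is hypothetical. Any message that is not such an envelope passes through unchanged.

```typescript
// Returns the inner human-readable message when `message` is a Codex API error envelope,
// otherwise returns `message` unchanged.
function unwrapCodexApiError(message: string): string {
  let parsed: unknown;
  try {
    parsed = JSON.parse(message);
  } catch {
    return message; // plain text, not JSON
  }
  if (typeof parsed === 'object' && parsed !== null) {
    const inner = (parsed as { error?: unknown }).error;
    if (typeof inner === 'object' && inner !== null) {
      const text = (inner as { message?: unknown }).message;
      if (typeof text === 'string' && text.length > 0) return text;
    }
  }
  return message;
}
```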

* fix: require llm.models.default in status and match codex probe remediation

Status reported a project ready when a non-none LLM backend was configured
without llm.models.default, but the runtime (resolveModelSlots) hard-requires
it, so ingest/scan/memory threw after `ktx status` said the project was usable.
buildLlmStatus now fails for any non-none backend missing models.default and no
longer invents a fallback model for claude-code/codex.
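The stricter readiness rule can be sketched as a small check. This is not the real buildLlmStatus; its signature is not shown here. `LlmConfigSketch`, `LlmReadiness` and `checkLlmReadiness` are hypothetical names. Their field names follow `llm.provider.backend` and `llm.models.default`.

```typescript
// Hypothetical shapes mirroring llm.provider.backend and llm.models.default.
type LlmBackend = 'none' | 'anthropic' | 'vertex' | 'gateway' | 'claude-code' | 'codex';
interface LlmConfigSketch {
  provider: { backend: LlmBackend };
  models?: { default?: string };
}
type LlmReadiness = { ok: true; model?: string } | { ok: false; message: string };

function checkLlmReadiness(config: LlmConfigSketch): LlmReadiness {
  const { backend } = config.provider;
  if (backend === 'none') return { ok: true };
  const model = config.models?.default?.trim();
  if (!model) {
    // No fallback model is invented for any backend, including claude-code and codex.
    return { ok: false, message: `llm.provider.backend is ${backend}, but llm.models.default is not set` };
  }
  return { ok: true, model };
}
```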

Codex probe failures now carry a category-matched fix: a model-access failure
steers the user at llm.models.default instead of the auth/install remediation.
runCodexAuthProbe returns the fix and status consumes it; the message stays
self-sufficient so setup output is unchanged.
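A sketch of category-matched remediation follows. `ProbeOutcome` and `classifyProbeFailure` are hypothetical names. The regex deciding what counts as a model-access error is an assumption, not the shipped heuristic. The message and fix fragments are chosen to satisfy the runCodexAuthProbe assertions in this commit.

```typescript
type ProbeOutcome = { ok: true } | { ok: false; message: string; fix: string };

// Assumed heuristic: an API error saying the model is unsupported or unavailable
// is a model-access failure, not an auth failure.
const MODEL_ACCESS_PATTERN = /model.*(not supported|not available|does not exist)/i;

function classifyProbeFailure(model: string, apiError: string | undefined, exitMessage: string): ProbeOutcome {
  if (apiError !== undefined && MODEL_ACCESS_PATTERN.test(apiError)) {
    return {
      ok: false,
      message: `Codex model ${model} is not available for this account: ${apiError}`,
      fix: 'Set llm.models.default to a model your Codex account can use, e.g. gpt-5.5.',
    };
  }
  // No model-specific API error: fall back to the auth/install remediation.
  return {
    ok: false,
    message: `Codex authentication is not usable: ${apiError ?? exitMessage}`,
    fix: 'Authenticate Codex (for example with `codex login`), then rerun.',
  };
}
```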

Docs: README now lists the codex backend and local Codex auth; ktx-setup.mdx
now states that, with the codex backend, --llm-model only accepts codex/default
or gpt-*/codex-* ids.
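The model-id rule can be sketched like this. It mirrors the mapping the resolveCodexModel tests expect. The `resolveCodexModelId` name is hypothetical. The `gpt-*`/`codex-*` regex is an assumption drawn from the docs wording.

```typescript
const DEFAULT_CODEX_MODEL = 'gpt-5.5';

// Maps the codex/default aliases to the default model, passes gpt-*/codex-* ids through,
// and rejects anything else (including Claude model ids).
function resolveCodexModelId(input: string): string {
  const model = input.trim();
  if (model === 'codex' || model === 'default') return DEFAULT_CODEX_MODEL;
  if (/^(gpt|codex)-\S+$/.test(model)) return model;
  throw new Error(`Unsupported Codex model: ${JSON.stringify(input)}`);
}
```

Free-form ids such as gpt-5.3-codex-spark pass through unchanged. Whether the account can actually use them is left to the auth probe.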

Repaired four doctor fixtures that configured a backend without models.default
(the now-correctly-blocked config) and added coverage for the new behavior.
Andrey Avtomonov 2026-06-02 13:57:11 +02:00 committed by GitHub
parent 74c6076b72
commit 494618ab14
GPG key ID: B5690EEEBB952194
41 changed files with 2544 additions and 30 deletions

@@ -77,9 +77,10 @@ describe('createLocalBundleIngestRuntime', () => {
 }),
 ).toThrow(
 [
-'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, or claude-code, or an injected agentRunner.',
-'Configure a local Claude Code session or API-backed LLM, then rerun ingest:',
+'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, claude-code, or codex, or an injected agentRunner.',
+'Configure a local Claude Code/Codex session or API-backed LLM, then rerun ingest:',
 ` ktx setup --project-dir ${project.projectDir} --llm-backend claude-code --no-input`,
+` ktx setup --project-dir ${project.projectDir} --llm-backend codex --llm-model gpt-5.5 --no-input`,
 ` ktx setup --project-dir ${project.projectDir} --llm-backend anthropic --anthropic-api-key-env ANTHROPIC_API_KEY --llm-model claude-sonnet-4-6 --no-input`,
 ].join('\n'),
 );

@@ -0,0 +1,188 @@
import { describe, expect, it } from 'vitest';
import {
parseCodexExecEventLine,
summarizeCodexExecEvents,
} from '../../../src/context/llm/codex-exec-events.js';
describe('Codex exec event parsing', () => {
it('uses the completed turn as one step when no MCP tools run', () => {
const summary = summarizeCodexExecEvents(
[
{ type: 'thread.started', thread_id: 'thr_1' },
{ type: 'turn.started' },
{ type: 'item.completed', item: { id: 'item_1', type: 'agent_message', text: 'hello from codex' } },
{
type: 'turn.completed',
usage: {
input_tokens: 12,
cached_input_tokens: 4,
output_tokens: 5,
reasoning_output_tokens: 2,
},
},
],
{ startedAt: 100, now: () => 125 },
);
expect(summary).toEqual({
finalText: 'hello from codex',
stopReason: 'natural',
usage: { inputTokens: 12, outputTokens: 5, totalTokens: 17 },
stepCount: 1,
stepBoundariesMs: [25],
toolCallCount: 0,
toolFailures: [],
});
});
it('uses completed MCP tool calls as loop steps', () => {
const offsets = [115, 140, 175];
const summary = summarizeCodexExecEvents(
[
{ type: 'turn.started' },
{
type: 'item.started',
item: { id: 'call_1', type: 'mcp_tool_call', server: 'ktx', tool: 'search', arguments: {}, status: 'in_progress' },
},
{
type: 'item.completed',
item: { id: 'call_1', type: 'mcp_tool_call', server: 'ktx', tool: 'search', arguments: {}, status: 'completed' },
},
{
type: 'item.started',
item: { id: 'call_2', type: 'mcp_tool_call', server: 'ktx', tool: 'lookup', arguments: {}, status: 'in_progress' },
},
{
type: 'item.completed',
item: {
id: 'call_2',
type: 'mcp_tool_call',
server: 'ktx',
tool: 'lookup',
arguments: {},
status: 'failed',
error: { message: 'denied' },
},
},
{ type: 'item.completed', item: { id: 'item_1', type: 'agent_message', text: 'done' } },
{ type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1, cached_input_tokens: 0, reasoning_output_tokens: 0 } },
],
{ startedAt: 100, now: () => offsets.shift() ?? 175 },
);
expect(summary).toEqual({
finalText: 'done',
stopReason: 'natural',
usage: { inputTokens: 1, outputTokens: 1, totalTokens: 2 },
stepCount: 2,
stepBoundariesMs: [15, 40],
toolCallCount: 2,
toolFailures: ['lookup: denied'],
});
});
it('does not treat a completed MCP tool call as failed when Codex sends error: null', () => {
// Captured verbatim from a real @openai/codex-sdk run: successful tool calls
// carry `error: null` and `result` alongside `status: "completed"`.
const summary = summarizeCodexExecEvents([
{ type: 'turn.started' },
{
type: 'item.started',
item: {
id: 'item_1',
type: 'mcp_tool_call',
server: 'ktx',
tool: 'echo_value',
arguments: { value: 'ktx_codex_tool_ok' },
result: null,
error: null,
status: 'in_progress',
},
},
{
type: 'item.completed',
item: {
id: 'item_1',
type: 'mcp_tool_call',
server: 'ktx',
tool: 'echo_value',
arguments: { value: 'ktx_codex_tool_ok' },
result: { content: [{ type: 'text', text: 'echo:ktx_codex_tool_ok' }], structured_content: null },
error: null,
status: 'completed',
},
},
{ type: 'item.completed', item: { id: 'm1', type: 'agent_message', text: 'done' } },
{ type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1 } },
]);
expect(summary.toolFailures).toEqual([]);
expect(summary.toolCallCount).toBe(1);
});
it('counts built-in command executions as loop steps without failing the loop', () => {
const offsets = [110, 130];
const summary = summarizeCodexExecEvents(
[
{ type: 'turn.started' },
{ type: 'item.completed', item: { id: 'c1', type: 'command_execution', command: 'ls', status: 'completed', exit_code: 0 } },
{ type: 'item.completed', item: { id: 'c2', type: 'command_execution', command: 'cat missing', status: 'failed', exit_code: 1 } },
{ type: 'item.completed', item: { id: 'm1', type: 'agent_message', text: 'done' } },
{ type: 'turn.completed', usage: { input_tokens: 2, output_tokens: 1 } },
],
{ startedAt: 100, now: () => offsets.shift() ?? 130 },
);
expect(summary.stepCount).toBe(2);
expect(summary.stepBoundariesMs).toEqual([10, 30]);
// A non-zero command exit is normal agent exploration, not a runtime tool failure.
expect(summary.toolFailures).toEqual([]);
expect(summary.toolCallCount).toBe(0);
});
it('maps turn failures into error stop reason', () => {
const summary = summarizeCodexExecEvents([
{ type: 'turn.started' },
{ type: 'turn.failed', error: { message: 'Codex could not connect to required MCP server' } },
]);
expect(summary.stopReason).toBe('error');
expect(summary.error?.message).toContain('Codex could not connect to required MCP server');
});
it('unwraps the Codex API error envelope into its human-readable message', () => {
// Codex serializes API errors as a JSON envelope inside the event message.
const apiError = JSON.stringify({
type: 'error',
status: 400,
error: {
type: 'invalid_request_error',
message: "The 'gpt-5.3-codex' model is not supported when using Codex with a ChatGPT account.",
},
});
const summary = summarizeCodexExecEvents([
{ type: 'thread.started', thread_id: 'thr_1' },
{ type: 'turn.started' },
{ type: 'error', message: apiError },
{ type: 'turn.failed', error: { message: apiError } },
]);
expect(summary.stopReason).toBe('error');
expect(summary.error?.message).toBe(
"The 'gpt-5.3-codex' model is not supported when using Codex with a ChatGPT account.",
);
});
it('maps max-turns terminal reasons into budget stop reason when Codex emits one', () => {
const summary = summarizeCodexExecEvents([
{ type: 'turn.started' },
{ type: 'turn.completed', reason: 'max_turns', usage: { input_tokens: 1, output_tokens: 1 } },
]);
expect(summary.stopReason).toBe('budget');
});
it('throws a clear error for malformed JSONL lines', () => {
expect(() => parseCodexExecEventLine('{not-json')).toThrow('Codex JSONL event stream was malformed');
});
});

@@ -0,0 +1,19 @@
import { describe, expect, it } from 'vitest';
import {
CODEX_ISOLATION_WARNING,
CODEX_ISOLATION_WARNING_FIX,
formatCodexIsolationWarning,
} from '../../../src/context/llm/codex-isolation.js';
describe('Codex isolation warning', () => {
it('documents the enforced and unenforced Codex isolation boundaries', () => {
expect(CODEX_ISOLATION_WARNING).toContain('runtime MCP server to the current ktx tool set');
expect(CODEX_ISOLATION_WARNING).toContain('disables Codex web search');
expect(CODEX_ISOLATION_WARNING).toContain('may still load user Codex config');
expect(CODEX_ISOLATION_WARNING).toContain('built-in command execution');
expect(CODEX_ISOLATION_WARNING_FIX).toContain('claude-code');
expect(formatCodexIsolationWarning()).toBe(
`${CODEX_ISOLATION_WARNING} ${CODEX_ISOLATION_WARNING_FIX}`,
);
});
});

@@ -0,0 +1,73 @@
import { describe, expect, it, vi } from 'vitest';
import { z } from 'zod';
import {
createCodexRuntimeMcpServer,
startCodexRuntimeMcpServer,
} from '../../../src/context/llm/codex-mcp-runtime-server.js';
describe('Codex runtime MCP server', () => {
it('registers runtime tools with markdown output', async () => {
const registered = new Map<
string,
{
config: { description?: string; inputSchema: unknown };
handler: (input: Record<string, unknown>) => Promise<unknown>;
}
>();
const server = createCodexRuntimeMcpServer({
server: {
registerTool(name, config, handler) {
registered.set(name, { config, handler });
},
},
toolSet: {
wiki_search: {
name: 'wiki_search',
description: 'Search the wiki',
inputSchema: z.object({ query: z.string() }),
execute: vi.fn(async () => ({ markdown: 'result markdown', structured: { matches: 1 } })),
},
},
});
expect(server).toBeDefined();
expect([...registered.keys()]).toEqual(['wiki_search']);
expect(registered.get('wiki_search')?.config).toMatchObject({
description: 'Search the wiki',
});
await expect(registered.get('wiki_search')?.handler({ query: 'revenue' })).resolves.toEqual({
content: [{ type: 'text', text: 'result markdown' }],
structuredContent: { matches: 1 },
});
});
it('starts loopback HTTP MCP with a bearer token and reports the runtime URL', async () => {
const close = vi.fn(async () => undefined);
const runServer = vi.fn(async () => ({
server: { address: () => ({ port: 4321 }) },
close,
}));
const handle = await startCodexRuntimeMcpServer({
projectDir: '/tmp/ktx-project',
toolSet: {},
runServer: runServer as never,
});
expect(handle.url).toBe('http://127.0.0.1:4321/mcp');
expect(handle.bearerTokenEnvVar).toBe('KTX_CODEX_RUNTIME_MCP_TOKEN');
expect(handle.bearerToken).toMatch(/^[a-f0-9]{64}$/);
expect(runServer).toHaveBeenCalledWith(
expect.objectContaining({
projectDir: '/tmp/ktx-project',
host: '127.0.0.1',
port: 0,
token: handle.bearerToken,
allowedHosts: ['127.0.0.1', 'localhost'],
allowedOrigins: [],
}),
);
await handle.close();
expect(close).toHaveBeenCalled();
});
});

@@ -0,0 +1,17 @@
import { describe, expect, it } from 'vitest';
import { resolveCodexModel } from '../../../src/context/llm/codex-models.js';
describe('resolveCodexModel', () => {
it.each([
['codex', 'gpt-5.5'],
['default', 'gpt-5.5'],
['gpt-5.3-codex-spark', 'gpt-5.3-codex-spark'],
['gpt-5.4', 'gpt-5.4'],
])('maps %s to %s', (input, expected) => {
expect(resolveCodexModel(input)).toBe(expected);
});
it.each(['', ' ', 'sonnet', 'claude-sonnet-4-6'])('rejects %s', (input) => {
expect(() => resolveCodexModel(input)).toThrow('Unsupported Codex model');
});
});

@@ -0,0 +1,43 @@
import { describe, expect, it } from 'vitest';
import { buildCodexRuntimeConfig } from '../../../src/context/llm/codex-runtime-config.js';
describe('buildCodexRuntimeConfig', () => {
it('builds generic config without SDK thread-option fields', () => {
expect(buildCodexRuntimeConfig({ model: 'gpt-5.3-codex' })).toEqual({
configOverrides: {
history: { persistence: 'none' },
},
env: {},
});
});
it('adds only the temporary ktx MCP server and exact enabled tools', () => {
expect(
buildCodexRuntimeConfig({
model: 'gpt-5.3-codex',
mcp: {
url: 'http://127.0.0.1:4567/mcp',
bearerTokenEnvVar: 'KTX_CODEX_RUNTIME_MCP_TOKEN',
bearerToken: 'secret-token',
toolNames: ['sl_read_source', 'wiki_search'],
},
}),
).toEqual({
configOverrides: {
history: { persistence: 'none' },
mcp_servers: {
ktx: {
url: 'http://127.0.0.1:4567/mcp',
bearer_token_env_var: 'KTX_CODEX_RUNTIME_MCP_TOKEN',
enabled_tools: ['sl_read_source', 'wiki_search'],
default_tools_approval_mode: 'approve',
required: true,
},
},
},
env: {
KTX_CODEX_RUNTIME_MCP_TOKEN: 'secret-token',
},
});
});
});

@@ -0,0 +1,460 @@
import { describe, expect, it, vi } from 'vitest';
import { z } from 'zod';
import {
CodexKtxLlmRuntime,
runCodexAuthProbe,
} from '../../../src/context/llm/codex-runtime.js';
async function* events(items: unknown[]) {
for (const item of items) {
yield item;
}
}
function runner(items: unknown[]) {
return {
runStreamed: vi.fn(async () => events(items)),
};
}
/** Yields the given events, then throws — mirroring the SDK throwing on a non-zero codex exec exit. */
function throwingRunner(items: unknown[], error: Error) {
return {
runStreamed: vi.fn(async () =>
(async function* () {
for (const item of items) {
yield item;
}
throw error;
})(),
),
};
}
const MODEL_UNSUPPORTED_API_ERROR = JSON.stringify({
type: 'error',
status: 400,
error: {
type: 'invalid_request_error',
message: "The 'gpt-5.3-codex' model is not supported when using Codex with a ChatGPT account.",
},
});
function budgetRunner() {
let observedSignal: AbortSignal | undefined;
return {
observedSignal: () => observedSignal,
runStreamed: vi.fn(async (input: { signal?: AbortSignal }) => {
observedSignal = input.signal;
return events([
{ type: 'turn.started' },
{ type: 'item.started', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'first', status: 'in_progress' } },
{ type: 'item.completed', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'first', status: 'completed' } },
{ type: 'item.started', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'second', status: 'in_progress' } },
{ type: 'item.completed', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'second', status: 'completed' } },
{ type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1 } },
]);
}),
};
}
describe('CodexKtxLlmRuntime', () => {
it('generates text with the role-selected model and metrics', async () => {
const onMetrics = vi.fn();
const fakeRunner = runner([
{ type: 'turn.started' },
{ type: 'item.completed', item: { type: 'agent_message', text: 'hello' } },
{ type: 'turn.completed', usage: { input_tokens: 3, output_tokens: 4, total_tokens: 7 } },
]);
const runtime = new CodexKtxLlmRuntime({
projectDir: '/tmp/project',
modelSlots: { default: 'codex', triage: 'gpt-5.4' },
runner: fakeRunner,
});
await expect(runtime.generateText({ role: 'triage', system: 'system', prompt: 'prompt', onMetrics })).resolves.toBe('hello');
expect(fakeRunner.runStreamed).toHaveBeenCalledWith(
expect.objectContaining({
projectDir: '/tmp/project',
model: 'gpt-5.4',
prompt: 'system\n\nprompt',
}),
);
expect(onMetrics).toHaveBeenCalledWith(expect.objectContaining({ usage: { inputTokens: 3, outputTokens: 4, totalTokens: 7 } }));
});
it('generates and validates structured output', async () => {
const fakeRunner = runner([
{ type: 'turn.started' },
{ type: 'item.completed', item: { type: 'agent_message', text: '{"answer":"yes"}' } },
{ type: 'turn.completed' },
]);
const runtime = new CodexKtxLlmRuntime({
projectDir: '/tmp/project',
modelSlots: { default: 'codex' },
runner: fakeRunner,
});
await expect(
runtime.generateObject({
role: 'default',
prompt: 'json',
schema: z.object({ answer: z.string() }),
}),
).resolves.toEqual({ answer: 'yes' });
expect(fakeRunner.runStreamed).toHaveBeenCalledWith(
expect.objectContaining({
outputSchema: expect.objectContaining({ type: 'object' }),
}),
);
});
it('returns a structured-output error when Codex final text is invalid JSON', async () => {
const fakeRunner = runner([
{ type: 'turn.started' },
{ type: 'item.completed', item: { type: 'agent_message', text: 'not json' } },
{ type: 'turn.completed' },
]);
const runtime = new CodexKtxLlmRuntime({
projectDir: '/tmp/project',
modelSlots: { default: 'codex' },
runner: fakeRunner,
});
await expect(
runtime.generateObject({
role: 'default',
prompt: 'json',
schema: z.object({ answer: z.string() }),
}),
).rejects.toThrow('Codex structured output failed validation');
});
it('starts and closes a temporary MCP server for tool-backed agent loops', async () => {
const close = vi.fn(async () => undefined);
const startMcpServer = vi.fn(async () => ({
url: 'http://127.0.0.1:4321/mcp',
bearerTokenEnvVar: 'KTX_CODEX_RUNTIME_MCP_TOKEN' as const,
bearerToken: 'token',
close,
}));
const fakeRunner = runner([
{ type: 'turn.started' },
{ type: 'item.started', item: { type: 'mcp_tool_call', name: 'wiki_search' } },
{ type: 'item.completed', item: { type: 'agent_message', text: 'done' } },
{ type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1, total_tokens: 2 } },
]);
const runtime = new CodexKtxLlmRuntime({
projectDir: '/tmp/project',
modelSlots: { default: 'codex' },
runner: fakeRunner,
startMcpServer,
});
const onStepFinish = vi.fn();
const result = await runtime.runAgentLoop({
modelRole: 'default',
systemPrompt: 'system',
userPrompt: 'user',
stepBudget: 5,
telemetryTags: {},
onStepFinish,
toolSet: {
aliased_wiki_tool: {
name: 'wiki_search',
description: 'Search wiki',
inputSchema: z.object({ query: z.string() }),
execute: vi.fn(),
},
},
});
expect(result.stopReason).toBe('natural');
expect(result.metrics).toMatchObject({ stepCount: 1, usage: { inputTokens: 1, outputTokens: 1, totalTokens: 2 } });
expect(onStepFinish).toHaveBeenCalledWith({ stepIndex: 1, stepBudget: 5 });
expect(startMcpServer).toHaveBeenCalledWith({ projectDir: '/tmp/project', toolSet: expect.any(Object) });
expect(fakeRunner.runStreamed).toHaveBeenCalledWith(
expect.objectContaining({
env: { KTX_CODEX_RUNTIME_MCP_TOKEN: 'token' },
configOverrides: expect.objectContaining({
mcp_servers: expect.objectContaining({
ktx: expect.objectContaining({
url: 'http://127.0.0.1:4321/mcp',
enabled_tools: ['wiki_search'],
required: true,
}),
}),
}),
}),
);
expect(close).toHaveBeenCalled();
});
it('returns error stop reason on turn failure', async () => {
const runtime = new CodexKtxLlmRuntime({
projectDir: '/tmp/project',
modelSlots: { default: 'codex' },
runner: runner([{ type: 'turn.failed', error: { message: 'boom' } }]),
});
const result = await runtime.runAgentLoop({
modelRole: 'default',
systemPrompt: 'system',
userPrompt: 'user',
stepBudget: 5,
telemetryTags: {},
toolSet: {},
});
expect(result.stopReason).toBe('error');
expect(result.error?.message).toBe('boom');
});
it('surfaces failed MCP tool calls as agent-loop errors', async () => {
const runtime = new CodexKtxLlmRuntime({
projectDir: '/tmp/project',
modelSlots: { default: 'codex' },
runner: runner([
{ type: 'turn.started' },
{ type: 'item.started', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'search', status: 'in_progress' } },
{
type: 'item.completed',
item: {
type: 'mcp_tool_call',
server: 'ktx',
tool: 'search',
status: 'failed',
error: { message: 'denied' },
},
},
{ type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1 } },
]),
});
const result = await runtime.runAgentLoop({
modelRole: 'default',
systemPrompt: 'system',
userPrompt: 'user',
stepBudget: 5,
telemetryTags: {},
toolSet: {},
});
expect(result.stopReason).toBe('error');
expect(result.error?.message).toBe('Codex runtime tool call failed: search: denied');
expect(result.metrics).toMatchObject({
stepCount: 1,
usage: { inputTokens: 1, outputTokens: 1, totalTokens: 2 },
});
});
it('returns budget and aborts the Codex stream when local MCP step budget is reached', async () => {
const fakeRunner = budgetRunner();
const runtime = new CodexKtxLlmRuntime({
projectDir: '/tmp/project',
modelSlots: { default: 'codex' },
runner: fakeRunner,
});
const onStepFinish = vi.fn();
const result = await runtime.runAgentLoop({
modelRole: 'default',
systemPrompt: 'system',
userPrompt: 'user',
stepBudget: 1,
telemetryTags: {},
onStepFinish,
toolSet: {
first: {
name: 'first',
description: 'First tool',
inputSchema: z.object({}),
execute: vi.fn(),
},
},
});
expect(result.stopReason).toBe('budget');
expect(result.error).toBeUndefined();
expect(result.metrics).toMatchObject({ stepCount: 1 });
expect(onStepFinish).toHaveBeenCalledTimes(1);
expect(onStepFinish).toHaveBeenCalledWith({ stepIndex: 1, stepBudget: 1 });
expect(fakeRunner.observedSignal()?.aborted).toBe(true);
});
it('counts built-in command_execution steps against the budget and aborts the stream', async () => {
let observedSignal: AbortSignal | undefined;
const fakeRunner = {
observedSignal: () => observedSignal,
runStreamed: vi.fn(async (input: { signal?: AbortSignal }) => {
observedSignal = input.signal;
return events([
{ type: 'turn.started' },
{ type: 'item.started', item: { type: 'command_execution', command: 'ls', status: 'in_progress' } },
{ type: 'item.completed', item: { type: 'command_execution', command: 'ls', status: 'completed', exit_code: 0 } },
{ type: 'item.started', item: { type: 'command_execution', command: 'cat a', status: 'in_progress' } },
{ type: 'item.completed', item: { type: 'command_execution', command: 'cat a', status: 'completed', exit_code: 0 } },
{ type: 'item.completed', item: { type: 'command_execution', command: 'cat b', status: 'completed', exit_code: 0 } },
{ type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1 } },
]);
}),
};
const runtime = new CodexKtxLlmRuntime({
projectDir: '/tmp/project',
modelSlots: { default: 'codex' },
runner: fakeRunner,
});
const onStepFinish = vi.fn();
const result = await runtime.runAgentLoop({
modelRole: 'default',
systemPrompt: 'system',
userPrompt: 'user',
stepBudget: 2,
telemetryTags: {},
onStepFinish,
toolSet: {},
});
expect(result.stopReason).toBe('budget');
expect(result.error).toBeUndefined();
expect(result.metrics).toMatchObject({ stepCount: 2 });
expect(onStepFinish).toHaveBeenCalledTimes(2);
expect(onStepFinish).toHaveBeenLastCalledWith({ stepIndex: 2, stepBudget: 2 });
expect(fakeRunner.observedSignal()?.aborted).toBe(true);
});
it('fires onStepFinish live as each step completes, before the stream drains', async () => {
const order: string[] = [];
async function* liveEvents() {
yield { type: 'turn.started' };
yield { type: 'item.completed', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'a', status: 'completed' } };
order.push('yielded-after-step-1');
yield { type: 'item.completed', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'b', status: 'completed' } };
order.push('yielded-after-step-2');
yield { type: 'item.completed', item: { type: 'agent_message', text: 'done' } };
yield { type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1 } };
}
const fakeRunner = { runStreamed: vi.fn(async () => liveEvents()) };
const runtime = new CodexKtxLlmRuntime({
projectDir: '/tmp/project',
modelSlots: { default: 'codex' },
runner: fakeRunner,
});
const result = await runtime.runAgentLoop({
modelRole: 'default',
systemPrompt: 'system',
userPrompt: 'user',
stepBudget: 10,
telemetryTags: {},
onStepFinish: ({ stepIndex }) => {
order.push(`step-${stepIndex}`);
},
toolSet: {},
});
expect(result.stopReason).toBe('natural');
expect(result.metrics).toMatchObject({ stepCount: 2 });
expect(order).toEqual(['step-1', 'yielded-after-step-1', 'step-2', 'yielded-after-step-2']);
});
it('surfaces the real Codex error event even when the SDK stream throws afterward', async () => {
// The SDK yields the error/turn.failed events on stdout, then throws on the
// non-zero exit. The masked exit message must not hide the real API error.
const fakeRunner = throwingRunner(
[
{ type: 'thread.started', thread_id: 't' },
{ type: 'turn.started' },
{ type: 'error', message: MODEL_UNSUPPORTED_API_ERROR },
{ type: 'turn.failed', error: { message: MODEL_UNSUPPORTED_API_ERROR } },
],
new Error('Codex Exec exited with code 1: Reading prompt from stdin...'),
);
const runtime = new CodexKtxLlmRuntime({
projectDir: '/tmp/project',
modelSlots: { default: 'codex' },
runner: fakeRunner,
});
await expect(runtime.generateText({ role: 'default', prompt: 'hi' })).rejects.toThrow(
'not supported when using Codex with a ChatGPT account',
);
});
it('probes Codex authentication through a minimal non-interactive turn', async () => {
const fakeRunner = runner([
{ type: 'turn.started' },
{ type: 'item.completed', item: { type: 'agent_message', text: 'ok' } },
{ type: 'turn.completed' },
]);
await expect(
runCodexAuthProbe({
projectDir: '/tmp/project',
model: 'codex',
runner: fakeRunner,
}),
).resolves.toEqual({ ok: true });
});
it('reports an unavailable model without blaming auth when Codex rejects the model', async () => {
const fakeRunner = throwingRunner(
[
{ type: 'turn.started' },
{ type: 'turn.failed', error: { message: MODEL_UNSUPPORTED_API_ERROR } },
],
new Error('Codex Exec exited with code 1: Reading prompt from stdin...'),
);
const result = await runCodexAuthProbe({
projectDir: '/tmp/project',
model: 'gpt-5.3-codex',
runner: fakeRunner,
});
expect(result.ok).toBe(false);
if (!result.ok) {
expect(result.message).not.toContain('authentication is not usable');
expect(result.message).toContain('not available');
expect(result.message).toContain('gpt-5.3-codex');
expect(result.message).toContain('not supported when using Codex with a ChatGPT account');
// A model-access failure must steer the user at the model config, not auth.
expect(result.fix).toContain('llm.models.default');
expect(result.fix).not.toContain('Authenticate Codex');
}
});
it('reports an auth failure when Codex exits without an error event', async () => {
const fakeRunner = throwingRunner(
[],
new Error('Codex Exec exited with code 1: Not logged in. Run `codex login`.'),
);
const result = await runCodexAuthProbe({
projectDir: '/tmp/project',
model: 'gpt-5.5',
runner: fakeRunner,
});
expect(result.ok).toBe(false);
if (!result.ok) {
expect(result.message).toContain('authentication is not usable');
expect(result.message).toContain('Not logged in');
expect(result.fix).toContain('Authenticate Codex');
}
});
it('rejects an unsupported model id before probing, steering at llm.models.default', async () => {
const result = await runCodexAuthProbe({
projectDir: '/tmp/project',
model: 'not-a-real-model',
});
expect(result.ok).toBe(false);
if (!result.ok) {
expect(result.message).toContain('Unsupported Codex model');
expect(result.fix).toContain('llm.models.default');
}
});
});

@@ -0,0 +1,97 @@
import { describe, expect, it, vi } from 'vitest';
const sdkMock = vi.hoisted(() => {
const events = (async function* () {
yield { type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 2 } };
})();
const runStreamed = vi.fn(async () => ({ events }));
const startThread = vi.fn(() => ({ runStreamed }));
const Codex = vi.fn(function Codex(this: { startThread: typeof startThread }, options?: unknown) {
Object.assign(this, { options, startThread });
});
return { Codex, startThread, runStreamed };
});
vi.mock('@openai/codex-sdk', () => ({ Codex: sdkMock.Codex }));
import { CodexSdkCliRunner } from '../../../src/context/llm/codex-sdk-runner.js';
async function collectAsync<T>(items: AsyncIterable<T>): Promise<T[]> {
const collected: T[] = [];
for await (const item of items) {
collected.push(item);
}
return collected;
}
describe('CodexSdkCliRunner', () => {
it('passes isolated env through the SDK and runtime controls through thread options', async () => {
const runner = new CodexSdkCliRunner({
envBase: {
HOME: '/home/ktx-user',
PATH: '/usr/local/bin:/usr/bin',
CODEX_HOME: '/home/ktx-user/.codex',
HTTPS_PROXY: 'http://proxy.example',
KTX_UNRELATED_SECRET: 'must-not-copy', // pragma: allowlist secret
},
});
const previousToken = process.env.KTX_CODEX_RUNTIME_MCP_TOKEN;
process.env.KTX_CODEX_RUNTIME_MCP_TOKEN = 'outer-token';
const outputSchema = {
type: 'object',
properties: { answer: { type: 'string' } },
required: ['answer'],
additionalProperties: false,
};
const controller = new AbortController();
try {
const events = await runner.runStreamed({
projectDir: '/tmp/ktx-project',
model: 'gpt-5.3-codex',
prompt: 'Return JSON.',
configOverrides: {
history: { persistence: 'none' },
},
env: { KTX_CODEX_RUNTIME_MCP_TOKEN: 'run-token' },
outputSchema,
signal: controller.signal,
});
expect(sdkMock.Codex).toHaveBeenCalledWith({
config: {
history: { persistence: 'none' },
},
env: {
HOME: '/home/ktx-user',
PATH: '/usr/local/bin:/usr/bin',
CODEX_HOME: '/home/ktx-user/.codex',
HTTPS_PROXY: 'http://proxy.example',
KTX_CODEX_RUNTIME_MCP_TOKEN: 'run-token',
},
});
expect(process.env.KTX_CODEX_RUNTIME_MCP_TOKEN).toBe('outer-token');
expect(sdkMock.startThread).toHaveBeenCalledWith({
workingDirectory: '/tmp/ktx-project',
skipGitRepoCheck: true,
model: 'gpt-5.3-codex',
sandboxMode: 'read-only',
webSearchMode: 'disabled',
approvalPolicy: 'never',
});
expect(sdkMock.runStreamed).toHaveBeenCalledWith('Return JSON.', {
outputSchema,
signal: controller.signal,
});
await expect(collectAsync(events)).resolves.toEqual([
{ type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 2 } },
]);
} finally {
if (previousToken === undefined) {
delete process.env.KTX_CODEX_RUNTIME_MCP_TOKEN;
} else {
process.env.KTX_CODEX_RUNTIME_MCP_TOKEN = previousToken;
}
}
});
});
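The runner test pins down an isolation contract: only a known set of host variables from `envBase` reaches the Codex SDK (the unrelated secret is dropped), per-run `env` entries are layered on top, and `process.env` keeps its outer value. A minimal sketch of that merge, assuming a hypothetical `buildCodexEnv` helper and an allowlist inferred from the test fixture only (the real `CodexSdkCliRunner` may forward more keys):

```typescript
// Hypothetical allowlist inferred from the test fixture; the real runner
// may forward additional variables.
const FORWARDED_ENV_KEYS: readonly string[] = ['HOME', 'PATH', 'CODEX_HOME', 'HTTPS_PROXY'];

type EnvMap = Record<string, string | undefined>;

// Builds a fresh env object for the Codex child process: copies only
// allowlisted keys from the base env, then applies per-run overrides.
// Neither input (nor process.env) is mutated.
function buildCodexEnv(base: EnvMap, overrides: EnvMap = {}): Record<string, string> {
  const env: Record<string, string> = {};
  for (const key of FORWARDED_ENV_KEYS) {
    const value = base[key];
    if (value !== undefined) {
      env[key] = value;
    }
  }
  for (const [key, value] of Object.entries(overrides)) {
    if (value !== undefined) {
      env[key] = value;
    }
  }
  return env;
}

const env = buildCodexEnv(
  {
    HOME: '/home/ktx-user',
    PATH: '/usr/local/bin:/usr/bin',
    KTX_UNRELATED_SECRET: 'must-not-copy',
  },
  { KTX_CODEX_RUNTIME_MCP_TOKEN: 'run-token' },
);
```

Returning a new object rather than writing into `process.env` presumably keeps a per-run MCP token from leaking into concurrent runs, which is what the `outer-token` assertion guards.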


@@ -22,4 +22,25 @@ describe('local KTX LLM runtime config', () => {
      }),
    ).toBeNull();
  });

  it('creates a Codex runtime for codex backend without creating an AI SDK provider', () => {
    const runtime = createLocalKtxLlmRuntimeFromConfig(
      {
        provider: { backend: 'codex' },
        models: { default: 'codex', triage: 'gpt-5.4' },
      },
      { env: {}, projectDir: '/tmp/project', createCodexRuntime: vi.fn((deps) => ({ deps }) as never) },
    );
    expect(runtime).toMatchObject({ deps: expect.objectContaining({ projectDir: '/tmp/project' }) });
  });

  it('returns null from the AI SDK provider factory for codex backend', () => {
    expect(
      createLocalKtxLlmProviderFromConfig({
        provider: { backend: 'codex' },
        models: { default: 'codex' },
      }),
    ).toBeNull();
  });
});
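These tests route the codex backend through an injected `createCodexRuntime` factory instead of the AI SDK provider path. A minimal sketch of that injection seam, covering only the codex branch; `createLocalRuntime`, `CodexRuntimeDeps`, and the config shape are hypothetical names, not the project's API:

```typescript
interface LlmConfig {
  provider: { backend: string };
  models: { default?: string; [slot: string]: string | undefined };
}

// What the Codex runtime receives (hypothetical shape).
interface CodexRuntimeDeps {
  projectDir: string;
  env: Record<string, string | undefined>;
  models: LlmConfig['models'];
}

interface RuntimeFactoryOptions<R> {
  projectDir: string;
  env: Record<string, string | undefined>;
  // Injected so tests can observe the deps without spawning Codex.
  createCodexRuntime: (deps: CodexRuntimeDeps) => R;
}

// Builds a Codex runtime for the codex backend and returns null otherwise,
// leaving AI SDK backends on the provider path.
function createLocalRuntime<R>(config: LlmConfig, options: RuntimeFactoryOptions<R>): R | null {
  if (config.provider.backend !== 'codex') {
    return null;
  }
  return options.createCodexRuntime({
    projectDir: options.projectDir,
    env: options.env,
    models: config.models,
  });
}
```

Injecting the factory is what lets the Vitest case pass `vi.fn((deps) => ({ deps }))` and assert on `projectDir` without a Codex install.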


@@ -231,6 +231,31 @@ llm:
    });
  });

  it('parses Codex as a first-class LLM backend', () => {
    const config = parseKtxProjectConfig(`
llm:
  provider:
    backend: codex
  models:
    default: gpt-5.3-codex
    triage: gpt-5.3-codex
    candidateExtraction: gpt-5.3-codex
    curator: gpt-5.3-codex
    reconcile: gpt-5.3-codex
    repair: gpt-5.3-codex
`);
    expect(config.llm.provider.backend).toBe('codex');
    expect(config.llm.models).toEqual({
      default: 'gpt-5.3-codex',
      triage: 'gpt-5.3-codex',
      candidateExtraction: 'gpt-5.3-codex',
      curator: 'gpt-5.3-codex',
      reconcile: 'gpt-5.3-codex',
      repair: 'gpt-5.3-codex',
    });
  });

  it('parses gateway LLM, OpenAI scan embeddings, and sentence-transformers ingest embeddings', () => {
    const config = parseKtxProjectConfig(`
llm:
@@ -530,7 +555,7 @@ describe('generateKtxProjectConfigJsonSchema', () => {
    const llm = (schema.properties as Record<string, { properties?: Record<string, unknown> }>).llm;
    const provider = llm?.properties?.provider as { properties?: Record<string, unknown> };
    const backend = provider?.properties?.backend as { enum?: readonly string[] };
-    expect(backend?.enum).toEqual(['none', 'anthropic', 'vertex', 'gateway', 'claude-code']);
+    expect(backend?.enum).toEqual(['none', 'anthropic', 'vertex', 'gateway', 'claude-code', 'codex']);
    const storage = (schema.properties as Record<string, { properties?: Record<string, unknown> }>).storage;
    const state = storage?.properties?.state as { enum?: readonly string[] };
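The schema test now expects `codex` as the last `llm.provider.backend` enum value. Keeping that enum in one `as const` tuple lets the JSON schema, the TypeScript type, and a runtime guard share a single source. A minimal sketch, assuming hypothetical `LLM_BACKENDS` and `isLlmBackend` names (the project may organize this differently):

```typescript
// Single source of truth for accepted backends (hypothetical names).
const LLM_BACKENDS = ['none', 'anthropic', 'vertex', 'gateway', 'claude-code', 'codex'] as const;

type LlmBackend = (typeof LLM_BACKENDS)[number];

// Narrows an arbitrary parsed YAML value to a known backend.
function isLlmBackend(value: unknown): value is LlmBackend {
  return typeof value === 'string' && (LLM_BACKENDS as readonly string[]).includes(value);
}

// JSON schema fragment derived from the same tuple, so adding a backend
// updates the type, the guard, and the schema enum together.
const backendSchema = { type: 'string', enum: [...LLM_BACKENDS] };
```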


@@ -422,6 +422,8 @@ describe('runKtxDoctor', () => {
        'llm:',
        '  provider:',
        '    backend: anthropic',
        '  models:',
        '    default: claude-sonnet-4-5',
        '',
      ].join('\n'),
      'utf-8',
@@ -543,6 +545,8 @@ describe('runKtxDoctor', () => {
        'llm:',
        '  provider:',
        '    backend: anthropic',
        '  models:',
        '    default: claude-sonnet-4-5',
        'ingest:',
        '  adapters:',
        '    - live-database',
@@ -652,6 +656,8 @@ describe('runKtxDoctor', () => {
        'llm:',
        '  provider:',
        '    backend: anthropic',
        '  models:',
        '    default: claude-sonnet-4-5',
        '',
      ].join('\n'),
      'utf-8',
@@ -698,6 +704,8 @@ describe('runKtxDoctor', () => {
        'llm:',
        '  provider:',
        '    backend: anthropic',
        '  models:',
        '    default: claude-sonnet-4-5',
        'ingest:',
        '  adapters:',
        '    - live-database',


@@ -337,10 +337,13 @@ describe('runKtxIngest', () => {
    expect(runIo.stdout()).toBe('');
    expect(runIo.stderr()).toContain(
-      'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, or claude-code, or an injected agentRunner.',
+      'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, claude-code, or codex, or an injected agentRunner.',
    );
-    expect(runIo.stderr()).toContain('Configure a local Claude Code session or API-backed LLM, then rerun ingest:');
+    expect(runIo.stderr()).toContain('Configure a local Claude Code/Codex session or API-backed LLM, then rerun ingest:');
    expect(runIo.stderr()).toContain(`ktx setup --project-dir ${projectDir} --llm-backend claude-code --no-input`);
+    expect(runIo.stderr()).toContain(
+      `ktx setup --project-dir ${projectDir} --llm-backend codex --llm-model gpt-5.5 --no-input`,
+    );
    expect(runIo.stderr()).toContain(
      `ktx setup --project-dir ${projectDir} --llm-backend anthropic --anthropic-api-key-env ANTHROPIC_API_KEY --llm-model claude-sonnet-4-6 --no-input`,
    );
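The ingest guard message enumerates the usable backends, so adding `codex` turns `gateway, or claude-code` into `gateway, claude-code, or codex`. Deriving the sentence from a list avoids hand-editing the wording for each new backend. A sketch with a hypothetical `formatBackendList` helper and backend list, not the project's actual implementation:

```typescript
// Joins items as "a, b, or c" (serial comma), matching the guard message wording.
function formatBackendList(items: readonly string[]): string {
  if (items.length === 0) return '';
  if (items.length === 1) return items[0] ?? '';
  if (items.length === 2) return `${items[0]} or ${items[1]}`;
  return `${items.slice(0, -1).join(', ')}, or ${items[items.length - 1]}`;
}

// Hypothetical list of backends that can drive ingest ('none' excluded).
const INGEST_BACKENDS = ['anthropic', 'vertex', 'gateway', 'claude-code', 'codex'];

const message = `ktx ingest requires llm.provider.backend: ${formatBackendList(INGEST_BACKENDS)}, or an injected agentRunner.`;
```

With the pre-codex list of four backends, the same helper reproduces the old wording, which is a quick way to check that only the list changed.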


@@ -312,4 +312,13 @@ describe('createKtxLlmProvider', () => {
      }),
    ).toThrow('claude-code is not an AI SDK LanguageModel backend');
  });

  it('rejects codex as an AI SDK LanguageModel backend', () => {
    expect(() =>
      createKtxLlmProvider({
        backend: 'codex',
        modelSlots: { default: 'gpt-5.3-codex' },
      }),
    ).toThrow('codex is not an AI SDK LanguageModel backend');
  });
});
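Both `claude-code` and `codex` are rejected by `createKtxLlmProvider` with the same `<backend> is not an AI SDK LanguageModel backend` error. A sketch of a guard that produces that wording, assuming a hypothetical `assertAiSdkBackend` function and backend set (how the real provider treats `none` is not shown here):

```typescript
// Backends that expose an AI SDK LanguageModel (hypothetical set for illustration).
const AI_SDK_BACKENDS: ReadonlySet<string> = new Set(['anthropic', 'vertex', 'gateway']);

// Throws the same wording the tests assert for local-agent backends.
function assertAiSdkBackend(backend: string): void {
  if (!AI_SDK_BACKENDS.has(backend)) {
    throw new Error(`${backend} is not an AI SDK LanguageModel backend`);
  }
}

// Usage helper: returns the thrown message, or null when the backend is accepted.
function errorMessageFor(backend: string): string | null {
  try {
    assertAiSdkBackend(backend);
    return null;
  } catch (error) {
    return error instanceof Error ? error.message : String(error);
  }
}
```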


@@ -66,6 +66,7 @@ function makePromptAdapter(options: {
      nextProviderChoice === 'anthropic' ||
      nextProviderChoice === 'vertex' ||
      nextProviderChoice === 'claude-code' ||
      nextProviderChoice === 'codex' ||
      nextProviderChoice === 'back'
    ) {
      return selectValues.shift() ?? nextProviderChoice;
@@ -183,6 +184,7 @@ describe('setup Anthropic model step', () => {
        message: expect.stringContaining('Which LLM provider should KTX use?'),
        options: [
          { value: 'claude-code', label: 'Claude subscription (Pro/Max)' },
          { value: 'codex', label: 'Codex subscription' },
          { value: 'anthropic', label: 'Anthropic API key' },
          { value: 'vertex', label: 'Google Vertex AI for Anthropic Claude' },
          { value: 'back', label: 'Back' },
@@ -215,6 +217,85 @@ describe('setup Anthropic model step', () => {
    expect(authProbe).toHaveBeenCalledWith(expect.objectContaining({ projectDir: tempDir, model: 'sonnet' }));
  });

  it('configures Codex backend and validates local auth', async () => {
    const io = makeIo();
    const codexAuthProbe = vi.fn(async () => ({ ok: true as const }));
    const result = await runKtxSetupAnthropicModelStep(
      {
        projectDir: tempDir,
        inputMode: 'disabled',
        llmBackend: 'codex',
        llmModel: 'gpt-5.5',
        skipLlm: false,
      },
      io.io,
      { codexAuthProbe },
    );
    expect(result.status).toBe('ready');
    const config = parseKtxProjectConfig(await readFile(join(tempDir, 'ktx.yaml'), 'utf-8'));
    expect(config.llm).toMatchObject({
      provider: { backend: 'codex' },
      models: { default: 'gpt-5.5' },
    });
    expect(codexAuthProbe).toHaveBeenCalledWith(expect.objectContaining({ projectDir: tempDir, model: 'gpt-5.5' }));
    // The warning carries the clack gutter so it renders inside the setup frame.
    expect(io.stderr()).toContain('│ Codex backend isolation is limited');
    expect(io.stderr()).toContain('may still load user Codex config');
  });

  it('defaults the Codex model to gpt-5.5 when none is provided non-interactively', async () => {
    const io = makeIo();
    const codexAuthProbe = vi.fn(async () => ({ ok: true as const }));
    const result = await runKtxSetupAnthropicModelStep(
      {
        projectDir: tempDir,
        inputMode: 'disabled',
        llmBackend: 'codex',
        skipLlm: false,
      },
      io.io,
      { codexAuthProbe },
    );
    expect(result.status).toBe('ready');
    const config = parseKtxProjectConfig(await readFile(join(tempDir, 'ktx.yaml'), 'utf-8'));
    expect(config.llm).toMatchObject({
      provider: { backend: 'codex' },
      models: { default: 'gpt-5.5' },
    });
    expect(codexAuthProbe).toHaveBeenCalledWith(expect.objectContaining({ projectDir: tempDir, model: 'gpt-5.5' }));
  });

  it('offers the curated Codex models during interactive setup', async () => {
    const io = makeIo();
    const prompts = makePromptAdapter({ selectValues: ['codex', 'gpt-5.5'] });
    const codexAuthProbe = vi.fn(async () => ({ ok: true as const }));
    const result = await runKtxSetupAnthropicModelStep(
      { projectDir: tempDir, inputMode: 'auto', skipLlm: false },
      io.io,
      { prompts, codexAuthProbe },
    );
    expect(result.status).toBe('ready');
    expect(prompts.select).toHaveBeenCalledWith(
      expect.objectContaining({
        message: expect.stringContaining('Which Codex model should KTX use?'),
        options: [
          { value: 'gpt-5.5', label: 'GPT-5.5', hint: 'recommended' },
          { value: 'gpt-5.4', label: 'GPT-5.4' },
          { value: 'gpt-5.4-mini', label: 'GPT-5.4 mini' },
          { value: 'manual', label: 'Enter a Codex model ID manually' },
          { value: 'back', label: 'Back' },
        ],
      }),
    );
    expect(codexAuthProbe).toHaveBeenCalledWith(expect.objectContaining({ model: 'gpt-5.5' }));
  });

  it('prompts for the Claude Code model during interactive setup', async () => {
    const io = makeIo();
    const prompts = makePromptAdapter({ selectValues: ['claude-code', 'opus'] });
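The setup tests expect a curated Codex model menu (with `gpt-5.5` flagged as recommended) plus escape hatches for a manual ID and going back, and they expect `gpt-5.5` when no `--llm-model` is given non-interactively. A sketch of both pieces, assuming hypothetical `CODEX_MODEL_OPTIONS`, `DEFAULT_CODEX_MODEL`, and `resolveCodexModel` names:

```typescript
interface SelectOption {
  value: string;
  label: string;
  hint?: string;
}

// Hypothetical constant; the setup tests expect gpt-5.5 when no model is given.
const DEFAULT_CODEX_MODEL = 'gpt-5.5';

// Menu contents in the order the interactive setup test asserts.
const CODEX_MODEL_OPTIONS: readonly SelectOption[] = [
  { value: 'gpt-5.5', label: 'GPT-5.5', hint: 'recommended' },
  { value: 'gpt-5.4', label: 'GPT-5.4' },
  { value: 'gpt-5.4-mini', label: 'GPT-5.4 mini' },
  { value: 'manual', label: 'Enter a Codex model ID manually' },
  { value: 'back', label: 'Back' },
];

// Non-interactive resolution: an explicit, non-empty flag wins; otherwise
// (undefined or empty string) the default applies.
function resolveCodexModel(flagValue: string | undefined): string {
  return flagValue || DEFAULT_CODEX_MODEL;
}
```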


@@ -44,6 +44,17 @@ function withClaudeCodeLlm(config: KtxProjectConfig): KtxProjectConfig {
  };
}

function withCodexLlm(config: KtxProjectConfig): KtxProjectConfig {
  return {
    ...config,
    llm: {
      ...config.llm,
      provider: { backend: 'codex' },
      models: { ...config.llm.models, default: 'gpt-5.5' },
    },
  };
}

function baseProjectConfig(): KtxProjectConfig {
  return withClaudeCodeLlm(buildDefaultKtxProjectConfig());
}
@@ -391,6 +402,126 @@ describe('buildProjectStatus --fast', () => {
  });
});

describe('buildProjectStatus codex', () => {
  it('reports authenticated local Codex session', async () => {
    const project = projectWithConfig(withCodexLlm(buildDefaultKtxProjectConfig()));
    const status = await buildProjectStatus(project, {
      codexAuthProbe: async () => ({ ok: true as const }),
    });
    expect(status.llm).toMatchObject({
      backend: 'codex',
      model: 'gpt-5.5',
      status: 'ok',
      detail: 'local Codex session authenticated',
    });
    expect(status.warnings).toEqual(
      expect.arrayContaining([
        expect.objectContaining({
          message: expect.stringContaining('Codex backend isolation is limited'),
          fix: expect.stringContaining('claude-code'),
        }),
      ]),
    );
    const rendered = renderProjectStatus(status, { verbose: false, useColor: false });
    expect(rendered).toContain('Codex backend isolation is limited');
  });

  it('skips Codex auth probe with --fast', async () => {
    let probeCalls = 0;
    const project = projectWithConfig(withCodexLlm(buildDefaultKtxProjectConfig()));
    const status = await buildProjectStatus(project, {
      fast: true,
      codexAuthProbe: async () => {
        probeCalls += 1;
        return { ok: true };
      },
    });
    expect(probeCalls).toBe(0);
    expect(status.llm.status).toBe('skipped');
    expect(status.llm.detail).toMatch(/--fast/);
  });

  it('surfaces the probe fix for a model-access failure instead of an auth fix', async () => {
    const project = projectWithConfig(withCodexLlm(buildDefaultKtxProjectConfig()));
    const status = await buildProjectStatus(project, {
      codexAuthProbe: async () => ({
        ok: false,
        message: 'Codex is authenticated, but the configured model "gpt-5.5" is not available...',
        fix: 'Run `codex` to see the models your account supports, then set llm.models.default in ktx.yaml (or rerun `ktx setup`).',
      }),
    });
    expect(status.llm.status).toBe('fail');
    expect(status.llm.fix).toContain('llm.models.default');
    expect(status.llm.fix).not.toContain('Authenticate Codex');
  });
});

describe('buildProjectStatus llm models.default requirement', () => {
  function withBackendNoModel(
    backend: KtxProjectConfig['llm']['provider']['backend'],
  ): KtxProjectConfig {
    const config = buildDefaultKtxProjectConfig();
    return {
      ...config,
      llm: { ...config.llm, provider: { backend }, models: {} },
    };
  }

  it('fails codex without llm.models.default and never probes', async () => {
    let probeCalls = 0;
    const project = projectWithConfig(withBackendNoModel('codex'));
    const status = await buildProjectStatus(project, {
      codexAuthProbe: async () => {
        probeCalls += 1;
        return { ok: true };
      },
    });
    expect(probeCalls).toBe(0);
    expect(status.llm.status).toBe('fail');
    expect(status.llm.detail).toContain('llm.models.default');
    expect(status.verdict).toBe('blocked');
  });

  it('fails claude-code without llm.models.default and never probes', async () => {
    let probeCalls = 0;
    const project = projectWithConfig(withBackendNoModel('claude-code'));
    const status = await buildProjectStatus(project, {
      claudeCodeAuthProbe: async () => {
        probeCalls += 1;
        return { ok: true };
      },
    });
    expect(probeCalls).toBe(0);
    expect(status.llm.status).toBe('fail');
    expect(status.llm.detail).toContain('llm.models.default');
    expect(status.verdict).toBe('blocked');
  });

  it('fails anthropic without llm.models.default even when the key is set', async () => {
    const config = withBackendNoModel('anthropic');
    const project = projectWithConfig({
      ...config,
      llm: {
        ...config.llm,
        provider: { backend: 'anthropic', anthropic: { api_key: 'env:ANTHROPIC_API_KEY' } }, // pragma: allowlist secret
        models: {},
      },
    });
    const status = await buildProjectStatus(project, {
      env: { ANTHROPIC_API_KEY: 'sk-test' }, // pragma: allowlist secret
    });
    expect(status.llm.status).toBe('fail');
    expect(status.llm.detail).toContain('llm.models.default');
    expect(status.verdict).toBe('blocked');
  });
});
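Taken together, these status tests fix an evaluation order for the local LLM check: a missing `llm.models.default` fails immediately without probing, `--fast` skips the probe, and a failing probe's own `fix` text is surfaced instead of a generic authentication hint. A minimal sketch of that ordering; `deriveLlmStatus`, the `ProbeResult` shape, and the fallback fix wording are assumptions for illustration:

```typescript
type ProbeResult = { ok: true } | { ok: false; message: string; fix?: string };

interface LlmStatus {
  status: 'ok' | 'skipped' | 'fail';
  detail: string;
  fix?: string;
}

interface StatusInput {
  defaultModel: string | undefined;
  fast: boolean;
  probe: (model: string) => Promise<ProbeResult>;
}

async function deriveLlmStatus({ defaultModel, fast, probe }: StatusInput): Promise<LlmStatus> {
  // 1. Configuration errors are reported without touching the local session.
  if (!defaultModel) {
    return { status: 'fail', detail: 'llm.models.default is not set', fix: 'Set llm.models.default in ktx.yaml.' };
  }
  // 2. --fast never spawns the probe.
  if (fast) {
    return { status: 'skipped', detail: 'auth probe skipped (--fast)' };
  }
  // 3. Prefer the probe's own fix (e.g. a model-access hint) over a generic one.
  const result = await probe(defaultModel);
  if (result.ok) {
    return { status: 'ok', detail: 'local Codex session authenticated' };
  }
  return {
    status: 'fail',
    detail: result.message,
    fix: result.fix ?? 'Authenticate Codex, then rerun the status check.', // hypothetical fallback wording
  };
}
```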
describe('buildLocalStatsStatus', () => {
  let tempDir: string;