mirror of https://github.com/Kaelio/ktx.git
synced 2026-07-01 08:59:39 +02:00
feat: add codex llm backend for ktx runtime work (#253)
* feat: add codex sdk runner foundation
* feat: parse codex runtime events
* feat: expose codex runtime mcp tools
* feat: add codex llm runtime
* feat: wire codex llm backend
* test: avoid Array.fromAsync in codex runner test
* docs: document codex llm backend
* fix: tighten codex runtime config ownership
* fix: use codex sdk env and thread options
* fix: parse codex sdk event shapes
* test: add codex backend live smoke
* docs: clarify codex backend isolation
* fix: drive codex loop metrics from mcp events
* fix: enforce codex local step budget
* docs: disclose codex isolation limits
* fix: count all codex agent steps and stream step callbacks live

  The agent-loop step budget only counted completed mcp_tool_call items, so built-in command_execution steps (which the public Codex SDK/CLI surface can still expose) never decremented the budget, letting ingest/reconciliation run past stepBudget until Codex stopped on its own. onStepFinish was also replayed only after the whole stream drained, so live work_unit_step / reconciliation progress appeared stuck until the Codex process exited.

  collectEvents is now the single live step accumulator: it counts every completed agent-action item via a shared isCompletedAgentStep predicate (command_execution, mcp_tool_call, file_change, web_search), fires onStepFinish as each step completes, and enforces the budget on that broader count. A no-tool turn still counts as one step. toolFailures stays MCP-specific, since a non-zero command exit is normal agent exploration, not a loop failure.

* test: align ingest llm-guard assertions with codex backend

  The skip-llm ingest guard message now lists codex as a valid backend and mentions a Claude Code/Codex session plus a codex setup hint, but this slow suite test still asserted the pre-codex wording. Update it to match the production message (already covered by the local-bundle-runtime unit test) and add the codex setup-line assertion.

* fix: treat codex error:null tool calls as success

  The Codex SDK serializes error: null on successful mcp_tool_call items, so the failure check (item.error !== undefined) flagged every successful tool call as failed with the empty-payload default "Codex turn failed". This killed every ingest work unit under the codex backend before it could produce a patch. Key on status === 'failed' (authoritative, always set) and only treat a populated error object as a failure. Add a regression test built from a verbatim real-SDK event capture.

* fix: default codex backend to gpt-5.5 and report real probe errors

  The previous default gpt-5.3-codex is an API-key-only model that the OpenAI API rejects under ChatGPT-account (subscription) auth, so codex status/setup failed with a misleading "authentication is not usable" message even though auth was fine.

  - Default codex model is now gpt-5.5 (works on both subscription and API-key auth); the curated setup picker offers gpt-5.5 / gpt-5.4 / gpt-5.4-mini and keeps free-form entry for account-specific ids (e.g. gpt-5.3-codex-spark).
  - runCodexAuthProbe now distinguishes "model not available" from an auth failure and surfaces the real API error: collectEvents retains stream events when the SDK throws on a non-zero exit, and the API error JSON envelope is unwrapped to its human-readable message.
  - The Codex isolation warning now renders inside the clack setup frame.
  - Docs updated to gpt-5.5 with a note that *-codex ids require API-key auth.

* fix: require llm.models.default in status and match codex probe remediation

  Status reported a project ready when a non-none LLM backend was configured without llm.models.default, but the runtime (resolveModelSlots) hard-requires it, so ingest/scan/memory threw after `ktx status` said the project was usable. buildLlmStatus now fails for any non-none backend missing models.default and no longer invents a fallback model for claude-code/codex.

  Codex probe failures now carry a category-matched fix: a model-access failure steers the user at llm.models.default instead of the auth/install remediation. runCodexAuthProbe returns the fix and status consumes it; the message stays self-sufficient so setup output is unchanged.

  Docs: README now lists the codex backend and local Codex auth; ktx-setup.mdx states --llm-model only accepts codex/default or gpt-*/codex-* ids.

  Repaired four doctor fixtures that configured a backend without models.default (the now-correctly-blocked config) and added coverage for the new behavior.
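The step-accounting rule described in the commit message can be illustrated with a small sketch. It is not the code from this commit: the event and item shapes are simplified assumptions, `createStepAccumulator` is a hypothetical helper name, and only the `isCompletedAgentStep` name and the four item types come from the message itself.

```typescript
// Simplified, assumed event shapes; the real SDK events carry more fields.
type CodexItem = { type: string; status?: string };
type CodexEvent = { type: string; item?: CodexItem };
type StepInfo = { stepIndex: number; stepBudget: number };

// Item types the commit message lists as agent actions.
const AGENT_STEP_ITEM_TYPES = new Set(['command_execution', 'mcp_tool_call', 'file_change', 'web_search']);

// A step is any completed agent-action item, whatever its exit status.
function isCompletedAgentStep(event: CodexEvent): boolean {
  return (
    event.type === 'item.completed' &&
    event.item !== undefined &&
    AGENT_STEP_ITEM_TYPES.has(event.item.type)
  );
}

// Hypothetical accumulator: feed it events as they arrive from the stream.
function createStepAccumulator(stepBudget: number, onStepFinish?: (info: StepInfo) => void) {
  let stepCount = 0;
  return {
    // Returns true when the budget is reached and the caller should abort the stream.
    push(event: CodexEvent): boolean {
      if (!isCompletedAgentStep(event)) return false;
      stepCount += 1;
      // Fire the callback immediately instead of replaying it after the stream drains.
      if (onStepFinish) onStepFinish({ stepIndex: stepCount, stepBudget });
      return stepCount >= stepBudget;
    },
    // Call when the stream ends on its own; a turn without tools still counts as one step.
    finish(): number {
      if (stepCount === 0) {
        stepCount = 1;
        if (onStepFinish) onStepFinish({ stepIndex: 1, stepBudget });
      }
      return stepCount;
    },
  };
}
```

A caller would iterate the event stream, call `push` for each event, and trigger its own abort mechanism (for example an `AbortController`) when `push` returns true. How the real runtime behaves when the budget is reached exactly as the turn ends is not stated in the message, so the sketch simply stops at the first event that reaches the budget.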
parent 74c6076b72
commit 494618ab14
41 changed files with 2544 additions and 30 deletions
@@ -77,9 +77,10 @@ describe('createLocalBundleIngestRuntime', () => {
       }),
     ).toThrow(
       [
-        'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, or claude-code, or an injected agentRunner.',
-        'Configure a local Claude Code session or API-backed LLM, then rerun ingest:',
+        'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, claude-code, or codex, or an injected agentRunner.',
+        'Configure a local Claude Code/Codex session or API-backed LLM, then rerun ingest:',
         ` ktx setup --project-dir ${project.projectDir} --llm-backend claude-code --no-input`,
+        ` ktx setup --project-dir ${project.projectDir} --llm-backend codex --llm-model gpt-5.5 --no-input`,
         ` ktx setup --project-dir ${project.projectDir} --llm-backend anthropic --anthropic-api-key-env ANTHROPIC_API_KEY --llm-model claude-sonnet-4-6 --no-input`,
       ].join('\n'),
     );
188
packages/cli/test/context/llm/codex-exec-events.test.ts
Normal file
@@ -0,0 +1,188 @@
import { describe, expect, it } from 'vitest';
import {
  parseCodexExecEventLine,
  summarizeCodexExecEvents,
} from '../../../src/context/llm/codex-exec-events.js';

describe('Codex exec event parsing', () => {
  it('uses the completed turn as one step when no MCP tools run', () => {
    const summary = summarizeCodexExecEvents(
      [
        { type: 'thread.started', thread_id: 'thr_1' },
        { type: 'turn.started' },
        { type: 'item.completed', item: { id: 'item_1', type: 'agent_message', text: 'hello from codex' } },
        {
          type: 'turn.completed',
          usage: {
            input_tokens: 12,
            cached_input_tokens: 4,
            output_tokens: 5,
            reasoning_output_tokens: 2,
          },
        },
      ],
      { startedAt: 100, now: () => 125 },
    );

    expect(summary).toEqual({
      finalText: 'hello from codex',
      stopReason: 'natural',
      usage: { inputTokens: 12, outputTokens: 5, totalTokens: 17 },
      stepCount: 1,
      stepBoundariesMs: [25],
      toolCallCount: 0,
      toolFailures: [],
    });
  });

  it('uses completed MCP tool calls as loop steps', () => {
    const offsets = [115, 140, 175];
    const summary = summarizeCodexExecEvents(
      [
        { type: 'turn.started' },
        {
          type: 'item.started',
          item: { id: 'call_1', type: 'mcp_tool_call', server: 'ktx', tool: 'search', arguments: {}, status: 'in_progress' },
        },
        {
          type: 'item.completed',
          item: { id: 'call_1', type: 'mcp_tool_call', server: 'ktx', tool: 'search', arguments: {}, status: 'completed' },
        },
        {
          type: 'item.started',
          item: { id: 'call_2', type: 'mcp_tool_call', server: 'ktx', tool: 'lookup', arguments: {}, status: 'in_progress' },
        },
        {
          type: 'item.completed',
          item: {
            id: 'call_2',
            type: 'mcp_tool_call',
            server: 'ktx',
            tool: 'lookup',
            arguments: {},
            status: 'failed',
            error: { message: 'denied' },
          },
        },
        { type: 'item.completed', item: { id: 'item_1', type: 'agent_message', text: 'done' } },
        { type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1, cached_input_tokens: 0, reasoning_output_tokens: 0 } },
      ],
      { startedAt: 100, now: () => offsets.shift() ?? 175 },
    );

    expect(summary).toEqual({
      finalText: 'done',
      stopReason: 'natural',
      usage: { inputTokens: 1, outputTokens: 1, totalTokens: 2 },
      stepCount: 2,
      stepBoundariesMs: [15, 40],
      toolCallCount: 2,
      toolFailures: ['lookup: denied'],
    });
  });

  it('does not treat a completed MCP tool call as failed when Codex sends error: null', () => {
    // Captured verbatim from a real @openai/codex-sdk run: successful tool calls
    // carry `error: null` and `result` alongside `status: "completed"`.
    const summary = summarizeCodexExecEvents([
      { type: 'turn.started' },
      {
        type: 'item.started',
        item: {
          id: 'item_1',
          type: 'mcp_tool_call',
          server: 'ktx',
          tool: 'echo_value',
          arguments: { value: 'ktx_codex_tool_ok' },
          result: null,
          error: null,
          status: 'in_progress',
        },
      },
      {
        type: 'item.completed',
        item: {
          id: 'item_1',
          type: 'mcp_tool_call',
          server: 'ktx',
          tool: 'echo_value',
          arguments: { value: 'ktx_codex_tool_ok' },
          result: { content: [{ type: 'text', text: 'echo:ktx_codex_tool_ok' }], structured_content: null },
          error: null,
          status: 'completed',
        },
      },
      { type: 'item.completed', item: { id: 'm1', type: 'agent_message', text: 'done' } },
      { type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1 } },
    ]);

    expect(summary.toolFailures).toEqual([]);
    expect(summary.toolCallCount).toBe(1);
  });

  it('counts built-in command executions as loop steps without failing the loop', () => {
    const offsets = [110, 130];
    const summary = summarizeCodexExecEvents(
      [
        { type: 'turn.started' },
        { type: 'item.completed', item: { id: 'c1', type: 'command_execution', command: 'ls', status: 'completed', exit_code: 0 } },
        { type: 'item.completed', item: { id: 'c2', type: 'command_execution', command: 'cat missing', status: 'failed', exit_code: 1 } },
        { type: 'item.completed', item: { id: 'm1', type: 'agent_message', text: 'done' } },
        { type: 'turn.completed', usage: { input_tokens: 2, output_tokens: 1 } },
      ],
      { startedAt: 100, now: () => offsets.shift() ?? 130 },
    );

    expect(summary.stepCount).toBe(2);
    expect(summary.stepBoundariesMs).toEqual([10, 30]);
    // A non-zero command exit is normal agent exploration, not a runtime tool failure.
    expect(summary.toolFailures).toEqual([]);
    expect(summary.toolCallCount).toBe(0);
  });

  it('maps turn failures into error stop reason', () => {
    const summary = summarizeCodexExecEvents([
      { type: 'turn.started' },
      { type: 'turn.failed', error: { message: 'Codex could not connect to required MCP server' } },
    ]);

    expect(summary.stopReason).toBe('error');
    expect(summary.error?.message).toContain('Codex could not connect to required MCP server');
  });

  it('unwraps the Codex API error envelope into its human-readable message', () => {
    // Codex serializes API errors as a JSON envelope inside the event message.
    const apiError = JSON.stringify({
      type: 'error',
      status: 400,
      error: {
        type: 'invalid_request_error',
        message: "The 'gpt-5.3-codex' model is not supported when using Codex with a ChatGPT account.",
      },
    });
    const summary = summarizeCodexExecEvents([
      { type: 'thread.started', thread_id: 'thr_1' },
      { type: 'turn.started' },
      { type: 'error', message: apiError },
      { type: 'turn.failed', error: { message: apiError } },
    ]);

    expect(summary.stopReason).toBe('error');
    expect(summary.error?.message).toBe(
      "The 'gpt-5.3-codex' model is not supported when using Codex with a ChatGPT account.",
    );
  });

  it('maps max-turns terminal reasons into budget stop reason when Codex emits one', () => {
    const summary = summarizeCodexExecEvents([
      { type: 'turn.started' },
      { type: 'turn.completed', reason: 'max_turns', usage: { input_tokens: 1, output_tokens: 1 } },
    ]);

    expect(summary.stopReason).toBe('budget');
  });

  it('throws a clear error for malformed JSONL lines', () => {
    expect(() => parseCodexExecEventLine('{not-json')).toThrow('Codex JSONL event stream was malformed');
  });
});
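Two behaviors exercised by the tests above lend themselves to a short illustration: deciding whether an MCP tool call failed, and unwrapping the JSON error envelope. The sketch below is an assumption-laden reading of the tests, not the code under test; `describeToolFailure`, `unwrapCodexErrorMessage`, the item type, and the `'unknown error'` fallback are all hypothetical.

```typescript
// Assumed, simplified shape of an mcp_tool_call item.
type McpToolCallItem = {
  type: 'mcp_tool_call';
  tool: string;
  status: 'in_progress' | 'completed' | 'failed';
  error?: { message?: string } | null;
};

// Key on status, not on the mere presence of an `error` field:
// successful calls can carry `error: null`.
function describeToolFailure(item: McpToolCallItem): string | undefined {
  if (item.status !== 'failed') return undefined;
  // Fallback text is an assumption for the case where no message is populated.
  const detail = item.error?.message ?? 'unknown error';
  return `${item.tool}: ${detail}`;
}

// If the message is a JSON envelope with error.message, return that; otherwise return it unchanged.
function unwrapCodexErrorMessage(raw: string): string {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === 'object' && parsed !== null) {
      const inner = (parsed as { error?: unknown }).error;
      if (typeof inner === 'object' && inner !== null) {
        const message = (inner as { message?: unknown }).message;
        if (typeof message === 'string' && message.length > 0) return message;
      }
    }
  } catch {
    // Not JSON: fall through and keep the raw text.
  }
  return raw;
}
```

With these helpers, a plain message such as `'boom'` passes through untouched, while an envelope like the one in the test reduces to its inner `message` string.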
19
packages/cli/test/context/llm/codex-isolation.test.ts
Normal file
@@ -0,0 +1,19 @@
import { describe, expect, it } from 'vitest';
import {
  CODEX_ISOLATION_WARNING,
  CODEX_ISOLATION_WARNING_FIX,
  formatCodexIsolationWarning,
} from '../../../src/context/llm/codex-isolation.js';

describe('Codex isolation warning', () => {
  it('documents the enforced and unenforced Codex isolation boundaries', () => {
    expect(CODEX_ISOLATION_WARNING).toContain('runtime MCP server to the current ktx tool set');
    expect(CODEX_ISOLATION_WARNING).toContain('disables Codex web search');
    expect(CODEX_ISOLATION_WARNING).toContain('may still load user Codex config');
    expect(CODEX_ISOLATION_WARNING).toContain('built-in command execution');
    expect(CODEX_ISOLATION_WARNING_FIX).toContain('claude-code');
    expect(formatCodexIsolationWarning()).toBe(
      `${CODEX_ISOLATION_WARNING} ${CODEX_ISOLATION_WARNING_FIX}`,
    );
  });
});
@@ -0,0 +1,73 @@
import { describe, expect, it, vi } from 'vitest';
import { z } from 'zod';
import {
  createCodexRuntimeMcpServer,
  startCodexRuntimeMcpServer,
} from '../../../src/context/llm/codex-mcp-runtime-server.js';

describe('Codex runtime MCP server', () => {
  it('registers runtime tools with markdown output', async () => {
    const registered = new Map<
      string,
      {
        config: { description?: string; inputSchema: unknown };
        handler: (input: Record<string, unknown>) => Promise<unknown>;
      }
    >();
    const server = createCodexRuntimeMcpServer({
      server: {
        registerTool(name, config, handler) {
          registered.set(name, { config, handler });
        },
      },
      toolSet: {
        wiki_search: {
          name: 'wiki_search',
          description: 'Search the wiki',
          inputSchema: z.object({ query: z.string() }),
          execute: vi.fn(async () => ({ markdown: 'result markdown', structured: { matches: 1 } })),
        },
      },
    });

    expect(server).toBeDefined();
    expect([...registered.keys()]).toEqual(['wiki_search']);
    expect(registered.get('wiki_search')?.config).toMatchObject({
      description: 'Search the wiki',
    });
    await expect(registered.get('wiki_search')?.handler({ query: 'revenue' })).resolves.toEqual({
      content: [{ type: 'text', text: 'result markdown' }],
      structuredContent: { matches: 1 },
    });
  });

  it('starts loopback HTTP MCP with a bearer token and reports the runtime URL', async () => {
    const close = vi.fn(async () => undefined);
    const runServer = vi.fn(async () => ({
      server: { address: () => ({ port: 4321 }) },
      close,
    }));

    const handle = await startCodexRuntimeMcpServer({
      projectDir: '/tmp/ktx-project',
      toolSet: {},
      runServer: runServer as never,
    });

    expect(handle.url).toBe('http://127.0.0.1:4321/mcp');
    expect(handle.bearerTokenEnvVar).toBe('KTX_CODEX_RUNTIME_MCP_TOKEN');
    expect(handle.bearerToken).toMatch(/^[a-f0-9]{64}$/);
    expect(runServer).toHaveBeenCalledWith(
      expect.objectContaining({
        projectDir: '/tmp/ktx-project',
        host: '127.0.0.1',
        port: 0,
        token: handle.bearerToken,
        allowedHosts: ['127.0.0.1', 'localhost'],
        allowedOrigins: [],
      }),
    );
    await handle.close();
    expect(close).toHaveBeenCalled();
  });
});
17
packages/cli/test/context/llm/codex-models.test.ts
Normal file
@@ -0,0 +1,17 @@
import { describe, expect, it } from 'vitest';
import { resolveCodexModel } from '../../../src/context/llm/codex-models.js';

describe('resolveCodexModel', () => {
  it.each([
    ['codex', 'gpt-5.5'],
    ['default', 'gpt-5.5'],
    ['gpt-5.3-codex-spark', 'gpt-5.3-codex-spark'],
    ['gpt-5.4', 'gpt-5.4'],
  ])('maps %s to %s', (input, expected) => {
    expect(resolveCodexModel(input)).toBe(expected);
  });

  it.each(['', ' ', 'sonnet', 'claude-sonnet-4-6'])('rejects %s', (input) => {
    expect(() => resolveCodexModel(input)).toThrow('Unsupported Codex model');
  });
});
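A resolver that satisfies the cases in this test could look roughly like the sketch below. It is a guess at the rule, built from the test table and the commit message's note that only `codex`/`default` or `gpt-*`/`codex-*` ids are accepted. The function name `resolveCodexModelSketch`, the trimming step, and the exact error suffix are assumptions.

```typescript
// Default taken from the commit message; everything else here is illustrative.
const DEFAULT_CODEX_MODEL = 'gpt-5.5';

function resolveCodexModelSketch(input: string): string {
  const id = input.trim();
  // Aliases map to the default model.
  if (id === 'codex' || id === 'default') return DEFAULT_CODEX_MODEL;
  // Explicit ids pass through unchanged, including account-specific ones.
  if (id.startsWith('gpt-') || id.startsWith('codex-')) return id;
  // Empty, whitespace-only, and non-Codex ids (for example Claude models) are rejected.
  throw new Error(`Unsupported Codex model: ${JSON.stringify(input)}`);
}
```

Passing explicit ids through untouched keeps free-form entry working for ids that are not in any curated list, at the cost of deferring availability checks to the first real request.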
43
packages/cli/test/context/llm/codex-runtime-config.test.ts
Normal file
@@ -0,0 +1,43 @@
import { describe, expect, it } from 'vitest';
import { buildCodexRuntimeConfig } from '../../../src/context/llm/codex-runtime-config.js';

describe('buildCodexRuntimeConfig', () => {
  it('builds generic config without SDK thread-option fields', () => {
    expect(buildCodexRuntimeConfig({ model: 'gpt-5.3-codex' })).toEqual({
      configOverrides: {
        history: { persistence: 'none' },
      },
      env: {},
    });
  });

  it('adds only the temporary ktx MCP server and exact enabled tools', () => {
    expect(
      buildCodexRuntimeConfig({
        model: 'gpt-5.3-codex',
        mcp: {
          url: 'http://127.0.0.1:4567/mcp',
          bearerTokenEnvVar: 'KTX_CODEX_RUNTIME_MCP_TOKEN',
          bearerToken: 'secret-token',
          toolNames: ['sl_read_source', 'wiki_search'],
        },
      }),
    ).toEqual({
      configOverrides: {
        history: { persistence: 'none' },
        mcp_servers: {
          ktx: {
            url: 'http://127.0.0.1:4567/mcp',
            bearer_token_env_var: 'KTX_CODEX_RUNTIME_MCP_TOKEN',
            enabled_tools: ['sl_read_source', 'wiki_search'],
            default_tools_approval_mode: 'approve',
            required: true,
          },
        },
      },
      env: {
        KTX_CODEX_RUNTIME_MCP_TOKEN: 'secret-token',
      },
    });
  });
});
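The expectations in this test suggest a builder that keeps the bearer token out of the config overrides and passes it only through the environment. A minimal sketch of that idea follows; the interface names and `buildRuntimeConfigSketch` are hypothetical, and the field names are copied from the test rather than from any Codex documentation.

```typescript
// Hypothetical input/output shapes inferred from the test expectations.
interface McpRuntimeInput {
  url: string;
  bearerTokenEnvVar: string;
  bearerToken: string;
  toolNames: string[];
}

interface RuntimeConfigSketch {
  configOverrides: Record<string, unknown>;
  env: Record<string, string>;
}

function buildRuntimeConfigSketch(input: { model: string; mcp?: McpRuntimeInput }): RuntimeConfigSketch {
  // The model is deliberately not copied into the overrides, matching the first test case.
  const configOverrides: Record<string, unknown> = { history: { persistence: 'none' } };
  const env: Record<string, string> = {};
  if (input.mcp) {
    configOverrides['mcp_servers'] = {
      ktx: {
        url: input.mcp.url,
        // Only the variable name goes into the config; the secret itself goes into env.
        bearer_token_env_var: input.mcp.bearerTokenEnvVar,
        enabled_tools: input.mcp.toolNames.slice(),
        default_tools_approval_mode: 'approve',
        required: true,
      },
    };
    env[input.mcp.bearerTokenEnvVar] = input.mcp.bearerToken;
  }
  return { configOverrides, env };
}
```

Referencing the token by environment variable name means the overrides can be logged or serialized without leaking the secret, which appears to be the intent behind the `bearer_token_env_var` field in the expected output.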
460
packages/cli/test/context/llm/codex-runtime.test.ts
Normal file
@@ -0,0 +1,460 @@
import { describe, expect, it, vi } from 'vitest';
import { z } from 'zod';
import {
  CodexKtxLlmRuntime,
  runCodexAuthProbe,
} from '../../../src/context/llm/codex-runtime.js';

async function* events(items: unknown[]) {
  for (const item of items) {
    yield item;
  }
}

function runner(items: unknown[]) {
  return {
    runStreamed: vi.fn(async () => events(items)),
  };
}

/** Yields the given events, then throws, mirroring the SDK throwing on a non-zero codex exec exit. */
function throwingRunner(items: unknown[], error: Error) {
  return {
    runStreamed: vi.fn(async () =>
      (async function* () {
        for (const item of items) {
          yield item;
        }
        throw error;
      })(),
    ),
  };
}

const MODEL_UNSUPPORTED_API_ERROR = JSON.stringify({
  type: 'error',
  status: 400,
  error: {
    type: 'invalid_request_error',
    message: "The 'gpt-5.3-codex' model is not supported when using Codex with a ChatGPT account.",
  },
});

function budgetRunner() {
  let observedSignal: AbortSignal | undefined;
  return {
    observedSignal: () => observedSignal,
    runStreamed: vi.fn(async (input: { signal?: AbortSignal }) => {
      observedSignal = input.signal;
      return events([
        { type: 'turn.started' },
        { type: 'item.started', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'first', status: 'in_progress' } },
        { type: 'item.completed', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'first', status: 'completed' } },
        { type: 'item.started', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'second', status: 'in_progress' } },
        { type: 'item.completed', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'second', status: 'completed' } },
        { type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1 } },
      ]);
    }),
  };
}

describe('CodexKtxLlmRuntime', () => {
  it('generates text with the role-selected model and metrics', async () => {
    const onMetrics = vi.fn();
    const fakeRunner = runner([
      { type: 'turn.started' },
      { type: 'item.completed', item: { type: 'agent_message', text: 'hello' } },
      { type: 'turn.completed', usage: { input_tokens: 3, output_tokens: 4, total_tokens: 7 } },
    ]);
    const runtime = new CodexKtxLlmRuntime({
      projectDir: '/tmp/project',
      modelSlots: { default: 'codex', triage: 'gpt-5.4' },
      runner: fakeRunner,
    });

    await expect(runtime.generateText({ role: 'triage', system: 'system', prompt: 'prompt', onMetrics })).resolves.toBe('hello');
    expect(fakeRunner.runStreamed).toHaveBeenCalledWith(
      expect.objectContaining({
        projectDir: '/tmp/project',
        model: 'gpt-5.4',
        prompt: 'system\n\nprompt',
      }),
    );
    expect(onMetrics).toHaveBeenCalledWith(expect.objectContaining({ usage: { inputTokens: 3, outputTokens: 4, totalTokens: 7 } }));
  });

  it('generates and validates structured output', async () => {
    const fakeRunner = runner([
      { type: 'turn.started' },
      { type: 'item.completed', item: { type: 'agent_message', text: '{"answer":"yes"}' } },
      { type: 'turn.completed' },
    ]);
    const runtime = new CodexKtxLlmRuntime({
      projectDir: '/tmp/project',
      modelSlots: { default: 'codex' },
      runner: fakeRunner,
    });

    await expect(
      runtime.generateObject({
        role: 'default',
        prompt: 'json',
        schema: z.object({ answer: z.string() }),
      }),
    ).resolves.toEqual({ answer: 'yes' });
    expect(fakeRunner.runStreamed).toHaveBeenCalledWith(
      expect.objectContaining({
        outputSchema: expect.objectContaining({ type: 'object' }),
      }),
    );
  });

  it('returns a structured-output error when Codex final text is invalid JSON', async () => {
    const fakeRunner = runner([
      { type: 'turn.started' },
      { type: 'item.completed', item: { type: 'agent_message', text: 'not json' } },
      { type: 'turn.completed' },
    ]);
    const runtime = new CodexKtxLlmRuntime({
      projectDir: '/tmp/project',
      modelSlots: { default: 'codex' },
      runner: fakeRunner,
    });

    await expect(
      runtime.generateObject({
        role: 'default',
        prompt: 'json',
        schema: z.object({ answer: z.string() }),
      }),
    ).rejects.toThrow('Codex structured output failed validation');
  });

  it('starts and closes a temporary MCP server for tool-backed agent loops', async () => {
    const close = vi.fn(async () => undefined);
    const startMcpServer = vi.fn(async () => ({
      url: 'http://127.0.0.1:4321/mcp',
      bearerTokenEnvVar: 'KTX_CODEX_RUNTIME_MCP_TOKEN' as const,
      bearerToken: 'token',
      close,
    }));
    const fakeRunner = runner([
      { type: 'turn.started' },
      { type: 'item.started', item: { type: 'mcp_tool_call', name: 'wiki_search' } },
      { type: 'item.completed', item: { type: 'agent_message', text: 'done' } },
      { type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1, total_tokens: 2 } },
    ]);
    const runtime = new CodexKtxLlmRuntime({
      projectDir: '/tmp/project',
      modelSlots: { default: 'codex' },
      runner: fakeRunner,
      startMcpServer,
    });
    const onStepFinish = vi.fn();

    const result = await runtime.runAgentLoop({
      modelRole: 'default',
      systemPrompt: 'system',
      userPrompt: 'user',
      stepBudget: 5,
      telemetryTags: {},
      onStepFinish,
      toolSet: {
        aliased_wiki_tool: {
          name: 'wiki_search',
          description: 'Search wiki',
          inputSchema: z.object({ query: z.string() }),
          execute: vi.fn(),
        },
      },
    });

    expect(result.stopReason).toBe('natural');
    expect(result.metrics).toMatchObject({ stepCount: 1, usage: { inputTokens: 1, outputTokens: 1, totalTokens: 2 } });
    expect(onStepFinish).toHaveBeenCalledWith({ stepIndex: 1, stepBudget: 5 });
    expect(startMcpServer).toHaveBeenCalledWith({ projectDir: '/tmp/project', toolSet: expect.any(Object) });
    expect(fakeRunner.runStreamed).toHaveBeenCalledWith(
      expect.objectContaining({
        env: { KTX_CODEX_RUNTIME_MCP_TOKEN: 'token' },
        configOverrides: expect.objectContaining({
          mcp_servers: expect.objectContaining({
            ktx: expect.objectContaining({
              url: 'http://127.0.0.1:4321/mcp',
              enabled_tools: ['wiki_search'],
              required: true,
            }),
          }),
        }),
      }),
    );
    expect(close).toHaveBeenCalled();
  });

  it('returns error stop reason on turn failure', async () => {
    const runtime = new CodexKtxLlmRuntime({
      projectDir: '/tmp/project',
      modelSlots: { default: 'codex' },
      runner: runner([{ type: 'turn.failed', error: { message: 'boom' } }]),
    });

    const result = await runtime.runAgentLoop({
      modelRole: 'default',
      systemPrompt: 'system',
      userPrompt: 'user',
      stepBudget: 5,
      telemetryTags: {},
      toolSet: {},
    });

    expect(result.stopReason).toBe('error');
    expect(result.error?.message).toBe('boom');
  });

  it('surfaces failed MCP tool calls as agent-loop errors', async () => {
    const runtime = new CodexKtxLlmRuntime({
      projectDir: '/tmp/project',
      modelSlots: { default: 'codex' },
      runner: runner([
        { type: 'turn.started' },
        { type: 'item.started', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'search', status: 'in_progress' } },
        {
          type: 'item.completed',
          item: {
            type: 'mcp_tool_call',
            server: 'ktx',
            tool: 'search',
            status: 'failed',
            error: { message: 'denied' },
          },
        },
        { type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1 } },
      ]),
    });

    const result = await runtime.runAgentLoop({
      modelRole: 'default',
      systemPrompt: 'system',
      userPrompt: 'user',
      stepBudget: 5,
      telemetryTags: {},
      toolSet: {},
    });

    expect(result.stopReason).toBe('error');
    expect(result.error?.message).toBe('Codex runtime tool call failed: search: denied');
    expect(result.metrics).toMatchObject({
      stepCount: 1,
      usage: { inputTokens: 1, outputTokens: 1, totalTokens: 2 },
    });
  });

  it('returns budget and aborts the Codex stream when local MCP step budget is reached', async () => {
    const fakeRunner = budgetRunner();
    const runtime = new CodexKtxLlmRuntime({
      projectDir: '/tmp/project',
      modelSlots: { default: 'codex' },
      runner: fakeRunner,
    });
    const onStepFinish = vi.fn();

    const result = await runtime.runAgentLoop({
      modelRole: 'default',
      systemPrompt: 'system',
      userPrompt: 'user',
      stepBudget: 1,
      telemetryTags: {},
      onStepFinish,
      toolSet: {
        first: {
          name: 'first',
          description: 'First tool',
          inputSchema: z.object({}),
          execute: vi.fn(),
        },
      },
    });

    expect(result.stopReason).toBe('budget');
    expect(result.error).toBeUndefined();
    expect(result.metrics).toMatchObject({ stepCount: 1 });
    expect(onStepFinish).toHaveBeenCalledTimes(1);
    expect(onStepFinish).toHaveBeenCalledWith({ stepIndex: 1, stepBudget: 1 });
    expect(fakeRunner.observedSignal()?.aborted).toBe(true);
  });

  it('counts built-in command_execution steps against the budget and aborts the stream', async () => {
    let observedSignal: AbortSignal | undefined;
    const fakeRunner = {
      observedSignal: () => observedSignal,
      runStreamed: vi.fn(async (input: { signal?: AbortSignal }) => {
        observedSignal = input.signal;
        return events([
          { type: 'turn.started' },
          { type: 'item.started', item: { type: 'command_execution', command: 'ls', status: 'in_progress' } },
          { type: 'item.completed', item: { type: 'command_execution', command: 'ls', status: 'completed', exit_code: 0 } },
          { type: 'item.started', item: { type: 'command_execution', command: 'cat a', status: 'in_progress' } },
          { type: 'item.completed', item: { type: 'command_execution', command: 'cat a', status: 'completed', exit_code: 0 } },
          { type: 'item.completed', item: { type: 'command_execution', command: 'cat b', status: 'completed', exit_code: 0 } },
          { type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1 } },
        ]);
      }),
    };
    const runtime = new CodexKtxLlmRuntime({
      projectDir: '/tmp/project',
      modelSlots: { default: 'codex' },
      runner: fakeRunner,
    });
    const onStepFinish = vi.fn();

    const result = await runtime.runAgentLoop({
      modelRole: 'default',
      systemPrompt: 'system',
      userPrompt: 'user',
      stepBudget: 2,
      telemetryTags: {},
      onStepFinish,
      toolSet: {},
    });

    expect(result.stopReason).toBe('budget');
    expect(result.error).toBeUndefined();
    expect(result.metrics).toMatchObject({ stepCount: 2 });
    expect(onStepFinish).toHaveBeenCalledTimes(2);
    expect(onStepFinish).toHaveBeenLastCalledWith({ stepIndex: 2, stepBudget: 2 });
    expect(fakeRunner.observedSignal()?.aborted).toBe(true);
  });

  it('fires onStepFinish live as each step completes, before the stream drains', async () => {
    const order: string[] = [];
    async function* liveEvents() {
      yield { type: 'turn.started' };
      yield { type: 'item.completed', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'a', status: 'completed' } };
      order.push('yielded-after-step-1');
      yield { type: 'item.completed', item: { type: 'mcp_tool_call', server: 'ktx', tool: 'b', status: 'completed' } };
      order.push('yielded-after-step-2');
      yield { type: 'item.completed', item: { type: 'agent_message', text: 'done' } };
      yield { type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 1 } };
    }
    const fakeRunner = { runStreamed: vi.fn(async () => liveEvents()) };
    const runtime = new CodexKtxLlmRuntime({
      projectDir: '/tmp/project',
      modelSlots: { default: 'codex' },
      runner: fakeRunner,
    });

    const result = await runtime.runAgentLoop({
      modelRole: 'default',
      systemPrompt: 'system',
      userPrompt: 'user',
      stepBudget: 10,
      telemetryTags: {},
      onStepFinish: ({ stepIndex }) => {
        order.push(`step-${stepIndex}`);
      },
      toolSet: {},
    });

    expect(result.stopReason).toBe('natural');
    expect(result.metrics).toMatchObject({ stepCount: 2 });
    expect(order).toEqual(['step-1', 'yielded-after-step-1', 'step-2', 'yielded-after-step-2']);
  });
|
||||
|
||||
it('surfaces the real Codex error event even when the SDK stream throws afterward', async () => {
|
||||
// The SDK yields the error/turn.failed events on stdout, then throws on the
|
||||
// non-zero exit. The masked exit message must not hide the real API error.
|
||||
const fakeRunner = throwingRunner(
|
||||
[
|
||||
{ type: 'thread.started', thread_id: 't' },
|
||||
{ type: 'turn.started' },
|
||||
{ type: 'error', message: MODEL_UNSUPPORTED_API_ERROR },
|
||||
{ type: 'turn.failed', error: { message: MODEL_UNSUPPORTED_API_ERROR } },
|
||||
],
|
||||
new Error('Codex Exec exited with code 1: Reading prompt from stdin...'),
|
||||
);
|
||||
const runtime = new CodexKtxLlmRuntime({
|
||||
projectDir: '/tmp/project',
|
||||
modelSlots: { default: 'codex' },
|
||||
runner: fakeRunner,
|
||||
});
|
||||
|
||||
await expect(runtime.generateText({ role: 'default', prompt: 'hi' })).rejects.toThrow(
|
||||
'not supported when using Codex with a ChatGPT account',
|
||||
);
|
||||
});
|
||||
|
||||
it('probes Codex authentication through a minimal non-interactive turn', async () => {
|
||||
const fakeRunner = runner([
|
||||
{ type: 'turn.started' },
|
||||
{ type: 'item.completed', item: { type: 'agent_message', text: 'ok' } },
|
||||
{ type: 'turn.completed' },
|
||||
]);
|
||||
|
||||
await expect(
|
||||
runCodexAuthProbe({
|
||||
projectDir: '/tmp/project',
|
||||
model: 'codex',
|
||||
runner: fakeRunner,
|
||||
}),
|
||||
).resolves.toEqual({ ok: true });
|
||||
});
|
||||
|
||||
it('reports an unavailable model without blaming auth when Codex rejects the model', async () => {
|
||||
const fakeRunner = throwingRunner(
|
||||
[
|
||||
{ type: 'turn.started' },
|
||||
{ type: 'turn.failed', error: { message: MODEL_UNSUPPORTED_API_ERROR } },
|
||||
],
|
||||
new Error('Codex Exec exited with code 1: Reading prompt from stdin...'),
|
||||
);
|
||||
|
||||
const result = await runCodexAuthProbe({
|
||||
projectDir: '/tmp/project',
|
||||
model: 'gpt-5.3-codex',
|
||||
runner: fakeRunner,
|
||||
});
|
||||
|
||||
expect(result.ok).toBe(false);
|
||||
if (!result.ok) {
|
||||
expect(result.message).not.toContain('authentication is not usable');
|
||||
expect(result.message).toContain('not available');
|
||||
expect(result.message).toContain('gpt-5.3-codex');
|
||||
expect(result.message).toContain('not supported when using Codex with a ChatGPT account');
|
||||
// A model-access failure must steer the user at the model config, not auth.
|
||||
expect(result.fix).toContain('llm.models.default');
|
||||
expect(result.fix).not.toContain('Authenticate Codex');
|
||||
}
|
||||
});
|
||||
|
||||
it('reports an auth failure when Codex exits without an error event', async () => {
|
||||
const fakeRunner = throwingRunner(
|
||||
[],
|
||||
new Error('Codex Exec exited with code 1: Not logged in. Run `codex login`.'),
|
||||
);
|
||||
|
||||
const result = await runCodexAuthProbe({
|
||||
projectDir: '/tmp/project',
|
||||
model: 'gpt-5.5',
|
||||
runner: fakeRunner,
|
||||
});
|
||||
|
||||
expect(result.ok).toBe(false);
|
||||
if (!result.ok) {
|
||||
expect(result.message).toContain('authentication is not usable');
|
||||
expect(result.message).toContain('Not logged in');
|
||||
expect(result.fix).toContain('Authenticate Codex');
|
||||
}
|
||||
});
|
||||
|
||||
it('rejects an unsupported model id before probing, steering at llm.models.default', async () => {
|
||||
const result = await runCodexAuthProbe({
|
||||
projectDir: '/tmp/project',
|
||||
model: 'not-a-real-model',
|
||||
});
|
||||
|
||||
expect(result.ok).toBe(false);
|
||||
if (!result.ok) {
|
||||
expect(result.message).toContain('Unsupported Codex model');
|
||||
expect(result.fix).toContain('llm.models.default');
|
||||
}
|
||||
});
|
||||
});
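The step-budget and live-callback tests in this file can be read against a small model of the accumulator. The following is a minimal sketch of that behaviour, not the actual `collectEvents` implementation: `createStepAccumulator` and `runWithBudget` are hypothetical names, the event types are trimmed to the fields the tests use, and stopping as soon as the count reaches the budget is an assumption. The four counted item types come from the commit message.

```typescript
// Hypothetical, trimmed event shapes; the real SDK events carry more fields.
interface CodexItem { type: string; status?: string }
interface CodexEvent { type: string; item?: CodexItem }
interface StepProgress { stepIndex: number; stepBudget: number }

// Item types the commit message lists as agent actions that consume budget.
const AGENT_STEP_ITEM_TYPES: ReadonlySet<string> = new Set([
  'command_execution',
  'mcp_tool_call',
  'file_change',
  'web_search',
]);

function isCompletedAgentStep(event: CodexEvent): boolean {
  return (
    event.type === 'item.completed' &&
    event.item !== undefined &&
    AGENT_STEP_ITEM_TYPES.has(event.item.type)
  );
}

// Hypothetical helper: counts steps and reports progress as each one completes.
function createStepAccumulator(stepBudget: number, onStepFinish?: (p: StepProgress) => void) {
  let completed = 0;
  return {
    // Returns true once the budget is used up (assumed: count >= budget).
    record(event: CodexEvent): boolean {
      if (!isCompletedAgentStep(event)) return false;
      completed += 1;
      onStepFinish?.({ stepIndex: completed, stepBudget });
      return completed >= stepBudget;
    },
    // A turn without any tool use still counts as one step.
    stepCount(): number {
      return Math.max(completed, 1);
    },
  };
}

// Hypothetical driver: the callback runs inside the loop body, so it fires
// before the stream is asked for its next event.
async function runWithBudget(
  events: AsyncIterable<CodexEvent>,
  stepBudget: number,
  abort: () => void,
  onStepFinish?: (p: StepProgress) => void,
): Promise<{ stopReason: 'natural' | 'budget'; stepCount: number }> {
  const steps = createStepAccumulator(stepBudget, onStepFinish);
  for await (const event of events) {
    if (steps.record(event)) {
      abort();
      return { stopReason: 'budget', stepCount: steps.stepCount() };
    }
  }
  return { stopReason: 'natural', stepCount: steps.stepCount() };
}
```

Because the callback fires inside the `for await` body, a generator that records a marker after each `yield` would observe the callback first, which is the ordering the live-callback test asserts.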
|
||||
97
packages/cli/test/context/llm/codex-sdk-runner.test.ts
Normal file
|
|
@ -0,0 +1,97 @@
|
|||
import { describe, expect, it, vi } from 'vitest';
|
||||
|
||||
const sdkMock = vi.hoisted(() => {
|
||||
const events = (async function* () {
|
||||
yield { type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 2 } };
|
||||
})();
|
||||
const runStreamed = vi.fn(async () => ({ events }));
|
||||
const startThread = vi.fn(() => ({ runStreamed }));
|
||||
const Codex = vi.fn(function Codex(this: { startThread: typeof startThread }, options?: unknown) {
|
||||
Object.assign(this, { options, startThread });
|
||||
});
|
||||
return { Codex, startThread, runStreamed };
|
||||
});
|
||||
|
||||
vi.mock('@openai/codex-sdk', () => ({ Codex: sdkMock.Codex }));
|
||||
|
||||
import { CodexSdkCliRunner } from '../../../src/context/llm/codex-sdk-runner.js';
|
||||
|
||||
async function collectAsync<T>(items: AsyncIterable<T>): Promise<T[]> {
|
||||
const collected: T[] = [];
|
||||
for await (const item of items) {
|
||||
collected.push(item);
|
||||
}
|
||||
return collected;
|
||||
}
|
||||
|
||||
describe('CodexSdkCliRunner', () => {
|
||||
it('passes isolated env through the SDK and runtime controls through thread options', async () => {
|
||||
const runner = new CodexSdkCliRunner({
|
||||
envBase: {
|
||||
HOME: '/home/ktx-user',
|
||||
PATH: '/usr/local/bin:/usr/bin',
|
||||
CODEX_HOME: '/home/ktx-user/.codex',
|
||||
HTTPS_PROXY: 'http://proxy.example',
|
||||
KTX_UNRELATED_SECRET: 'must-not-copy', // pragma: allowlist secret
|
||||
},
|
||||
});
|
||||
const previousToken = process.env.KTX_CODEX_RUNTIME_MCP_TOKEN;
|
||||
process.env.KTX_CODEX_RUNTIME_MCP_TOKEN = 'outer-token';
|
||||
const outputSchema = {
|
||||
type: 'object',
|
||||
properties: { answer: { type: 'string' } },
|
||||
required: ['answer'],
|
||||
additionalProperties: false,
|
||||
};
|
||||
const controller = new AbortController();
|
||||
|
||||
try {
|
||||
const events = await runner.runStreamed({
|
||||
projectDir: '/tmp/ktx-project',
|
||||
model: 'gpt-5.3-codex',
|
||||
prompt: 'Return JSON.',
|
||||
configOverrides: {
|
||||
history: { persistence: 'none' },
|
||||
},
|
||||
env: { KTX_CODEX_RUNTIME_MCP_TOKEN: 'run-token' },
|
||||
outputSchema,
|
||||
signal: controller.signal,
|
||||
});
|
||||
|
||||
expect(sdkMock.Codex).toHaveBeenCalledWith({
|
||||
config: {
|
||||
history: { persistence: 'none' },
|
||||
},
|
||||
env: {
|
||||
HOME: '/home/ktx-user',
|
||||
PATH: '/usr/local/bin:/usr/bin',
|
||||
CODEX_HOME: '/home/ktx-user/.codex',
|
||||
HTTPS_PROXY: 'http://proxy.example',
|
||||
KTX_CODEX_RUNTIME_MCP_TOKEN: 'run-token',
|
||||
},
|
||||
});
|
||||
expect(process.env.KTX_CODEX_RUNTIME_MCP_TOKEN).toBe('outer-token');
|
||||
expect(sdkMock.startThread).toHaveBeenCalledWith({
|
||||
workingDirectory: '/tmp/ktx-project',
|
||||
skipGitRepoCheck: true,
|
||||
model: 'gpt-5.3-codex',
|
||||
sandboxMode: 'read-only',
|
||||
webSearchMode: 'disabled',
|
||||
approvalPolicy: 'never',
|
||||
});
|
||||
expect(sdkMock.runStreamed).toHaveBeenCalledWith('Return JSON.', {
|
||||
outputSchema,
|
||||
signal: controller.signal,
|
||||
});
|
||||
await expect(collectAsync(events)).resolves.toEqual([
|
||||
{ type: 'turn.completed', usage: { input_tokens: 1, output_tokens: 2 } },
|
||||
]);
|
||||
} finally {
|
||||
if (previousToken === undefined) {
|
||||
delete process.env.KTX_CODEX_RUNTIME_MCP_TOKEN;
|
||||
} else {
|
||||
process.env.KTX_CODEX_RUNTIME_MCP_TOKEN = previousToken;
|
||||
}
|
||||
}
|
||||
});
|
||||
});
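The runner test above implies an allowlist approach to the child environment: a handful of base variables pass through, unrelated secrets are dropped, and per-run values are layered on top without touching `process.env`. A minimal sketch of that idea follows. `buildIsolatedEnv` and `PASSTHROUGH_KEYS` are hypothetical names, and the allowlist holds only the four variables the test confirms; the real runner may forward more.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical allowlist: only these four names are confirmed by the test above.
const PASSTHROUGH_KEYS = ['HOME', 'PATH', 'CODEX_HOME', 'HTTPS_PROXY'];

// Builds a fresh env object for the child process. Nothing outside the
// allowlist is copied from envBase, and neither input is mutated.
function buildIsolatedEnv(envBase: Env, runEnv: Record<string, string> = {}): Record<string, string> {
  const isolated: Record<string, string> = {};
  for (const key of PASSTHROUGH_KEYS) {
    const value = envBase[key];
    if (value !== undefined) {
      isolated[key] = value;
    }
  }
  // Per-run values (for example a runtime MCP token) take precedence.
  return { ...isolated, ...runEnv };
}
```

Returning a new object rather than assigning to `process.env` is what lets the test assert that the outer `KTX_CODEX_RUNTIME_MCP_TOKEN` value is unchanged after the run.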
|
||||
|
|
@ -22,4 +22,25 @@ describe('local KTX LLM runtime config', () => {
|
|||
}),
|
||||
).toBeNull();
|
||||
});
|
||||
|
||||
it('creates a Codex runtime for codex backend without creating an AI SDK provider', () => {
|
||||
const runtime = createLocalKtxLlmRuntimeFromConfig(
|
||||
{
|
||||
provider: { backend: 'codex' },
|
||||
models: { default: 'codex', triage: 'gpt-5.4' },
|
||||
},
|
||||
{ env: {}, projectDir: '/tmp/project', createCodexRuntime: vi.fn((deps) => ({ deps }) as never) },
|
||||
);
|
||||
|
||||
expect(runtime).toMatchObject({ deps: expect.objectContaining({ projectDir: '/tmp/project' }) });
|
||||
});
|
||||
|
||||
it('returns null from the AI SDK provider factory for codex backend', () => {
|
||||
expect(
|
||||
createLocalKtxLlmProviderFromConfig({
|
||||
provider: { backend: 'codex' },
|
||||
models: { default: 'codex' },
|
||||
}),
|
||||
).toBeNull();
|
||||
});
|
||||
});
|
||||
|
|
|
|||
|
|
@ -231,6 +231,31 @@ llm:
|
|||
});
|
||||
});
|
||||
|
||||
it('parses Codex as a first-class LLM backend', () => {
|
||||
const config = parseKtxProjectConfig(`
|
||||
llm:
|
||||
provider:
|
||||
backend: codex
|
||||
models:
|
||||
default: gpt-5.3-codex
|
||||
triage: gpt-5.3-codex
|
||||
candidateExtraction: gpt-5.3-codex
|
||||
curator: gpt-5.3-codex
|
||||
reconcile: gpt-5.3-codex
|
||||
repair: gpt-5.3-codex
|
||||
`);
|
||||
|
||||
expect(config.llm.provider.backend).toBe('codex');
|
||||
expect(config.llm.models).toEqual({
|
||||
default: 'gpt-5.3-codex',
|
||||
triage: 'gpt-5.3-codex',
|
||||
candidateExtraction: 'gpt-5.3-codex',
|
||||
curator: 'gpt-5.3-codex',
|
||||
reconcile: 'gpt-5.3-codex',
|
||||
repair: 'gpt-5.3-codex',
|
||||
});
|
||||
});
|
||||
|
||||
it('parses gateway LLM, OpenAI scan embeddings, and sentence-transformers ingest embeddings', () => {
|
||||
const config = parseKtxProjectConfig(`
|
||||
llm:
|
||||
|
|
@ -530,7 +555,7 @@ describe('generateKtxProjectConfigJsonSchema', () => {
|
|||
const llm = (schema.properties as Record<string, { properties?: Record<string, unknown> }>).llm;
|
||||
const provider = llm?.properties?.provider as { properties?: Record<string, unknown> };
|
||||
const backend = provider?.properties?.backend as { enum?: readonly string[] };
|
||||
- expect(backend?.enum).toEqual(['none', 'anthropic', 'vertex', 'gateway', 'claude-code']);
+ expect(backend?.enum).toEqual(['none', 'anthropic', 'vertex', 'gateway', 'claude-code', 'codex']);
|
||||
|
||||
const storage = (schema.properties as Record<string, { properties?: Record<string, unknown> }>).storage;
|
||||
const state = storage?.properties?.state as { enum?: readonly string[] };
|
||||
|
|
|
|||
|
|
@ -422,6 +422,8 @@ describe('runKtxDoctor', () => {
|
|||
'llm:',
|
||||
' provider:',
|
||||
' backend: anthropic',
|
||||
' models:',
|
||||
' default: claude-sonnet-4-5',
|
||||
'',
|
||||
].join('\n'),
|
||||
'utf-8',
|
||||
|
|
@ -543,6 +545,8 @@ describe('runKtxDoctor', () => {
|
|||
'llm:',
|
||||
' provider:',
|
||||
' backend: anthropic',
|
||||
' models:',
|
||||
' default: claude-sonnet-4-5',
|
||||
'ingest:',
|
||||
' adapters:',
|
||||
' - live-database',
|
||||
|
|
@ -652,6 +656,8 @@ describe('runKtxDoctor', () => {
|
|||
'llm:',
|
||||
' provider:',
|
||||
' backend: anthropic',
|
||||
' models:',
|
||||
' default: claude-sonnet-4-5',
|
||||
'',
|
||||
].join('\n'),
|
||||
'utf-8',
|
||||
|
|
@ -698,6 +704,8 @@ describe('runKtxDoctor', () => {
|
|||
'llm:',
|
||||
' provider:',
|
||||
' backend: anthropic',
|
||||
' models:',
|
||||
' default: claude-sonnet-4-5',
|
||||
'ingest:',
|
||||
' adapters:',
|
||||
' - live-database',
|
||||
|
|
|
|||
|
|
@ -337,10 +337,13 @@ describe('runKtxIngest', () => {
|
|||
|
||||
expect(runIo.stdout()).toBe('');
|
||||
expect(runIo.stderr()).toContain(
|
||||
- 'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, or claude-code, or an injected agentRunner.',
+ 'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, claude-code, or codex, or an injected agentRunner.',
);
- expect(runIo.stderr()).toContain('Configure a local Claude Code session or API-backed LLM, then rerun ingest:');
+ expect(runIo.stderr()).toContain('Configure a local Claude Code/Codex session or API-backed LLM, then rerun ingest:');
expect(runIo.stderr()).toContain(`ktx setup --project-dir ${projectDir} --llm-backend claude-code --no-input`);
+ expect(runIo.stderr()).toContain(
+ `ktx setup --project-dir ${projectDir} --llm-backend codex --llm-model gpt-5.5 --no-input`,
+ );
|
||||
expect(runIo.stderr()).toContain(
|
||||
`ktx setup --project-dir ${projectDir} --llm-backend anthropic --anthropic-api-key-env ANTHROPIC_API_KEY --llm-model claude-sonnet-4-6 --no-input`,
|
||||
);
|
||||
|
|
|
|||
|
|
@ -312,4 +312,13 @@ describe('createKtxLlmProvider', () => {
|
|||
}),
|
||||
).toThrow('claude-code is not an AI SDK LanguageModel backend');
|
||||
});
|
||||
|
||||
it('rejects codex as an AI SDK LanguageModel backend', () => {
|
||||
expect(() =>
|
||||
createKtxLlmProvider({
|
||||
backend: 'codex',
|
||||
modelSlots: { default: 'gpt-5.3-codex' },
|
||||
}),
|
||||
).toThrow('codex is not an AI SDK LanguageModel backend');
|
||||
});
|
||||
});
|
||||
|
|
|
|||
|
|
@ -66,6 +66,7 @@ function makePromptAdapter(options: {
|
|||
nextProviderChoice === 'anthropic' ||
|
||||
nextProviderChoice === 'vertex' ||
|
||||
nextProviderChoice === 'claude-code' ||
|
||||
nextProviderChoice === 'codex' ||
|
||||
nextProviderChoice === 'back'
|
||||
) {
|
||||
return selectValues.shift() ?? nextProviderChoice;
|
||||
|
|
@ -183,6 +184,7 @@ describe('setup Anthropic model step', () => {
|
|||
message: expect.stringContaining('Which LLM provider should KTX use?'),
|
||||
options: [
|
||||
{ value: 'claude-code', label: 'Claude subscription (Pro/Max)' },
|
||||
{ value: 'codex', label: 'Codex subscription' },
|
||||
{ value: 'anthropic', label: 'Anthropic API key' },
|
||||
{ value: 'vertex', label: 'Google Vertex AI for Anthropic Claude' },
|
||||
{ value: 'back', label: 'Back' },
|
||||
|
|
@ -215,6 +217,85 @@ describe('setup Anthropic model step', () => {
|
|||
expect(authProbe).toHaveBeenCalledWith(expect.objectContaining({ projectDir: tempDir, model: 'sonnet' }));
|
||||
});
|
||||
|
||||
it('configures Codex backend and validates local auth', async () => {
|
||||
const io = makeIo();
|
||||
const codexAuthProbe = vi.fn(async () => ({ ok: true as const }));
|
||||
|
||||
const result = await runKtxSetupAnthropicModelStep(
|
||||
{
|
||||
projectDir: tempDir,
|
||||
inputMode: 'disabled',
|
||||
llmBackend: 'codex',
|
||||
llmModel: 'gpt-5.5',
|
||||
skipLlm: false,
|
||||
},
|
||||
io.io,
|
||||
{ codexAuthProbe },
|
||||
);
|
||||
|
||||
expect(result.status).toBe('ready');
|
||||
const config = parseKtxProjectConfig(await readFile(join(tempDir, 'ktx.yaml'), 'utf-8'));
|
||||
expect(config.llm).toMatchObject({
|
||||
provider: { backend: 'codex' },
|
||||
models: { default: 'gpt-5.5' },
|
||||
});
|
||||
expect(codexAuthProbe).toHaveBeenCalledWith(expect.objectContaining({ projectDir: tempDir, model: 'gpt-5.5' }));
|
||||
// The warning carries the clack gutter so it renders inside the setup frame.
|
||||
expect(io.stderr()).toContain('│ Codex backend isolation is limited');
|
||||
expect(io.stderr()).toContain('may still load user Codex config');
|
||||
});
|
||||
|
||||
it('defaults the Codex model to gpt-5.5 when none is provided non-interactively', async () => {
|
||||
const io = makeIo();
|
||||
const codexAuthProbe = vi.fn(async () => ({ ok: true as const }));
|
||||
|
||||
const result = await runKtxSetupAnthropicModelStep(
|
||||
{
|
||||
projectDir: tempDir,
|
||||
inputMode: 'disabled',
|
||||
llmBackend: 'codex',
|
||||
skipLlm: false,
|
||||
},
|
||||
io.io,
|
||||
{ codexAuthProbe },
|
||||
);
|
||||
|
||||
expect(result.status).toBe('ready');
|
||||
const config = parseKtxProjectConfig(await readFile(join(tempDir, 'ktx.yaml'), 'utf-8'));
|
||||
expect(config.llm).toMatchObject({
|
||||
provider: { backend: 'codex' },
|
||||
models: { default: 'gpt-5.5' },
|
||||
});
|
||||
expect(codexAuthProbe).toHaveBeenCalledWith(expect.objectContaining({ projectDir: tempDir, model: 'gpt-5.5' }));
|
||||
});
|
||||
|
||||
it('offers the curated Codex models during interactive setup', async () => {
|
||||
const io = makeIo();
|
||||
const prompts = makePromptAdapter({ selectValues: ['codex', 'gpt-5.5'] });
|
||||
const codexAuthProbe = vi.fn(async () => ({ ok: true as const }));
|
||||
|
||||
const result = await runKtxSetupAnthropicModelStep(
|
||||
{ projectDir: tempDir, inputMode: 'auto', skipLlm: false },
|
||||
io.io,
|
||||
{ prompts, codexAuthProbe },
|
||||
);
|
||||
|
||||
expect(result.status).toBe('ready');
|
||||
expect(prompts.select).toHaveBeenCalledWith(
|
||||
expect.objectContaining({
|
||||
message: expect.stringContaining('Which Codex model should KTX use?'),
|
||||
options: [
|
||||
{ value: 'gpt-5.5', label: 'GPT-5.5', hint: 'recommended' },
|
||||
{ value: 'gpt-5.4', label: 'GPT-5.4' },
|
||||
{ value: 'gpt-5.4-mini', label: 'GPT-5.4 mini' },
|
||||
{ value: 'manual', label: 'Enter a Codex model ID manually' },
|
||||
{ value: 'back', label: 'Back' },
|
||||
],
|
||||
}),
|
||||
);
|
||||
expect(codexAuthProbe).toHaveBeenCalledWith(expect.objectContaining({ model: 'gpt-5.5' }));
|
||||
});
|
||||
|
||||
it('prompts for the Claude Code model during interactive setup', async () => {
|
||||
const io = makeIo();
|
||||
const prompts = makePromptAdapter({ selectValues: ['claude-code', 'opus'] });
|
||||
|
|
|
|||
|
|
@ -44,6 +44,17 @@ function withClaudeCodeLlm(config: KtxProjectConfig): KtxProjectConfig {
|
|||
};
|
||||
}
|
||||
|
||||
function withCodexLlm(config: KtxProjectConfig): KtxProjectConfig {
|
||||
return {
|
||||
...config,
|
||||
llm: {
|
||||
...config.llm,
|
||||
provider: { backend: 'codex' },
|
||||
models: { ...config.llm.models, default: 'gpt-5.5' },
|
||||
},
|
||||
};
|
||||
}
|
||||
|
||||
function baseProjectConfig(): KtxProjectConfig {
|
||||
return withClaudeCodeLlm(buildDefaultKtxProjectConfig());
|
||||
}
|
||||
|
|
@ -391,6 +402,126 @@ describe('buildProjectStatus --fast', () => {
|
|||
});
|
||||
});
|
||||
|
||||
describe('buildProjectStatus codex', () => {
|
||||
it('reports authenticated local Codex session', async () => {
|
||||
const project = projectWithConfig(withCodexLlm(buildDefaultKtxProjectConfig()));
|
||||
const status = await buildProjectStatus(project, {
|
||||
codexAuthProbe: async () => ({ ok: true as const }),
|
||||
});
|
||||
|
||||
expect(status.llm).toMatchObject({
|
||||
backend: 'codex',
|
||||
model: 'gpt-5.5',
|
||||
status: 'ok',
|
||||
detail: 'local Codex session authenticated',
|
||||
});
|
||||
expect(status.warnings).toEqual(
|
||||
expect.arrayContaining([
|
||||
expect.objectContaining({
|
||||
message: expect.stringContaining('Codex backend isolation is limited'),
|
||||
fix: expect.stringContaining('claude-code'),
|
||||
}),
|
||||
]),
|
||||
);
|
||||
const rendered = renderProjectStatus(status, { verbose: false, useColor: false });
|
||||
expect(rendered).toContain('Codex backend isolation is limited');
|
||||
});
|
||||
|
||||
it('skips Codex auth probe with --fast', async () => {
|
||||
let probeCalls = 0;
|
||||
const project = projectWithConfig(withCodexLlm(buildDefaultKtxProjectConfig()));
|
||||
const status = await buildProjectStatus(project, {
|
||||
fast: true,
|
||||
codexAuthProbe: async () => {
|
||||
probeCalls += 1;
|
||||
return { ok: true };
|
||||
},
|
||||
});
|
||||
|
||||
expect(probeCalls).toBe(0);
|
||||
expect(status.llm.status).toBe('skipped');
|
||||
expect(status.llm.detail).toMatch(/--fast/);
|
||||
});
|
||||
|
||||
it('surfaces the probe fix for a model-access failure instead of an auth fix', async () => {
|
||||
const project = projectWithConfig(withCodexLlm(buildDefaultKtxProjectConfig()));
|
||||
const status = await buildProjectStatus(project, {
|
||||
codexAuthProbe: async () => ({
|
||||
ok: false,
|
||||
message: 'Codex is authenticated, but the configured model "gpt-5.5" is not available...',
|
||||
fix: 'Run `codex` to see the models your account supports, then set llm.models.default in ktx.yaml (or rerun `ktx setup`).',
|
||||
}),
|
||||
});
|
||||
|
||||
expect(status.llm.status).toBe('fail');
|
||||
expect(status.llm.fix).toContain('llm.models.default');
|
||||
expect(status.llm.fix).not.toContain('Authenticate Codex');
|
||||
});
|
||||
});
|
||||
|
||||
describe('buildProjectStatus llm models.default requirement', () => {
|
||||
function withBackendNoModel(
|
||||
backend: KtxProjectConfig['llm']['provider']['backend'],
|
||||
): KtxProjectConfig {
|
||||
const config = buildDefaultKtxProjectConfig();
|
||||
return {
|
||||
...config,
|
||||
llm: { ...config.llm, provider: { backend }, models: {} },
|
||||
};
|
||||
}
|
||||
|
||||
it('fails codex without llm.models.default and never probes', async () => {
|
||||
let probeCalls = 0;
|
||||
const project = projectWithConfig(withBackendNoModel('codex'));
|
||||
const status = await buildProjectStatus(project, {
|
||||
codexAuthProbe: async () => {
|
||||
probeCalls += 1;
|
||||
return { ok: true };
|
||||
},
|
||||
});
|
||||
|
||||
expect(probeCalls).toBe(0);
|
||||
expect(status.llm.status).toBe('fail');
|
||||
expect(status.llm.detail).toContain('llm.models.default');
|
||||
expect(status.verdict).toBe('blocked');
|
||||
});
|
||||
|
||||
it('fails claude-code without llm.models.default and never probes', async () => {
|
||||
let probeCalls = 0;
|
||||
const project = projectWithConfig(withBackendNoModel('claude-code'));
|
||||
const status = await buildProjectStatus(project, {
|
||||
claudeCodeAuthProbe: async () => {
|
||||
probeCalls += 1;
|
||||
return { ok: true };
|
||||
},
|
||||
});
|
||||
|
||||
expect(probeCalls).toBe(0);
|
||||
expect(status.llm.status).toBe('fail');
|
||||
expect(status.llm.detail).toContain('llm.models.default');
|
||||
expect(status.verdict).toBe('blocked');
|
||||
});
|
||||
|
||||
it('fails anthropic without llm.models.default even when the key is set', async () => {
|
||||
const config = withBackendNoModel('anthropic');
|
||||
const project = projectWithConfig({
|
||||
...config,
|
||||
llm: {
|
||||
...config.llm,
|
||||
provider: { backend: 'anthropic', anthropic: { api_key: 'env:ANTHROPIC_API_KEY' } }, // pragma: allowlist secret
|
||||
models: {},
|
||||
},
|
||||
});
|
||||
const status = await buildProjectStatus(project, {
|
||||
env: { ANTHROPIC_API_KEY: 'sk-test' }, // pragma: allowlist secret
|
||||
});
|
||||
|
||||
expect(status.llm.status).toBe('fail');
|
||||
expect(status.llm.detail).toContain('llm.models.default');
|
||||
expect(status.verdict).toBe('blocked');
|
||||
});
|
||||
});
|
||||
|
||||
describe('buildLocalStatsStatus', () => {
|
||||
let tempDir: string;
|
||||
|
||||
|
|
|
|||