feat: add codex llm backend for ktx runtime work (#253)

* feat: add codex sdk runner foundation

* feat: parse codex runtime events

* feat: expose codex runtime mcp tools

* feat: add codex llm runtime

* feat: wire codex llm backend

* test: avoid Array.fromAsync in codex runner test

* docs: document codex llm backend

* fix: tighten codex runtime config ownership

* fix: use codex sdk env and thread options

* fix: parse codex sdk event shapes

* test: add codex backend live smoke

* docs: clarify codex backend isolation

* fix: drive codex loop metrics from mcp events

* fix: enforce codex local step budget

* docs: disclose codex isolation limits

* fix: count all codex agent steps and stream step callbacks live

The agent-loop step budget only counted completed mcp_tool_call items, so
built-in command_execution steps (which the public Codex SDK/CLI surface can
still expose) never decremented the budget, letting ingest/reconciliation run
past stepBudget until Codex stopped on its own. onStepFinish was also replayed
only after the whole stream drained, so live work_unit_step / reconciliation
progress appeared stuck until the Codex process exited.

collectEvents is now the single live step accumulator: it counts every
completed agent-action item via a shared isCompletedAgentStep predicate
(command_execution, mcp_tool_call, file_change, web_search), fires onStepFinish
as each step completes, and enforces the budget on that broader count. A
no-tool turn still counts as one step. toolFailures stays MCP-specific, since a
non-zero command exit is normal agent exploration, not a loop failure.

* test: align ingest llm-guard assertions with codex backend

The skip-llm ingest guard message now lists codex as a valid backend and
mentions a Claude Code/Codex session plus a codex setup hint, but this slow
suite test still asserted the pre-codex wording. Update it to match the
production message (already covered by the local-bundle-runtime unit test) and
add the codex setup-line assertion.

* fix: treat codex error:null tool calls as success

The Codex SDK serializes error: null on successful mcp_tool_call items, so
the failure check (item.error !== undefined) flagged every successful tool
call as failed with the empty-payload default "Codex turn failed". This
killed every ingest work unit under the codex backend before it could
produce a patch.

Key on status === 'failed' (authoritative, always set) and only treat a
populated error object as a failure. Add a regression test built from a
verbatim real-SDK event capture.

* fix: default codex backend to gpt-5.5 and report real probe errors

The previous default gpt-5.3-codex is an API-key-only model that the OpenAI
API rejects under ChatGPT-account (subscription) auth, so codex status/setup
failed with a misleading "authentication is not usable" message even though
auth was fine.

- Default codex model is now gpt-5.5 (works on both subscription and API-key
  auth); the curated setup picker offers gpt-5.5 / gpt-5.4 / gpt-5.4-mini and
  keeps free-form entry for account-specific ids (e.g. gpt-5.3-codex-spark).
- runCodexAuthProbe now distinguishes "model not available" from an auth
  failure and surfaces the real API error: collectEvents retains stream
  events when the SDK throws on a non-zero exit, and the API error JSON
  envelope is unwrapped to its human-readable message.
- The Codex isolation warning now renders inside the clack setup frame.
- Docs updated to gpt-5.5 with a note that *-codex ids require API-key auth.

* fix: require llm.models.default in status and match codex probe remediation

Status reported a project ready when a non-none LLM backend was configured
without llm.models.default, but the runtime (resolveModelSlots) hard-requires
it, so ingest/scan/memory threw after `ktx status` said the project was usable.
buildLlmStatus now fails for any non-none backend missing models.default and no
longer invents a fallback model for claude-code/codex.

Codex probe failures now carry a category-matched fix: a model-access failure
steers the user at llm.models.default instead of the auth/install remediation.
runCodexAuthProbe returns the fix and status consumes it; the message stays
self-sufficient so setup output is unchanged.

Docs: README now lists the codex backend and local Codex auth; ktx-setup.mdx
states --llm-model only accepts codex/default or gpt-*/codex-* ids.

Repaired four doctor fixtures that configured a backend without models.default
(the now-correctly-blocked config) and added coverage for the new behavior.
This commit is contained in:
Andrey Avtomonov 2026-06-02 13:57:11 +02:00 committed by GitHub
parent 74c6076b72
commit 494618ab14
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
41 changed files with 2544 additions and 30 deletions

View file

@ -3,6 +3,9 @@ import { writeFile } from 'node:fs/promises';
import { promisify } from 'node:util';
import { resolveLocalKtxLlmConfig } from './context/llm/local-config.js';
import { runClaudeCodeAuthProbe } from './context/llm/claude-code-runtime.js';
import { formatCodexIsolationWarning } from './context/llm/codex-isolation.js';
import { runCodexAuthProbe } from './context/llm/codex-runtime.js';
import { DEFAULT_CODEX_MODEL } from './context/llm/codex-models.js';
import { resolveKtxConfigReference } from './context/core/config-reference.js';
import { type KtxProjectConfig, type KtxProjectLlmConfig, serializeKtxProjectConfig } from './context/project/config.js';
import { loadKtxProject } from './context/project/project.js';
@ -56,7 +59,7 @@ export interface AnthropicModelChoice {
recommended: boolean;
}
export type KtxSetupLlmBackend = 'anthropic' | 'vertex' | 'claude-code';
export type KtxSetupLlmBackend = 'anthropic' | 'vertex' | 'claude-code' | 'codex';
/** @internal */
export interface KtxSetupModelPromptAdapter {
@ -82,6 +85,7 @@ export interface KtxSetupModelDeps {
model: string;
env?: NodeJS.ProcessEnv;
}) => Promise<{ ok: true } | { ok: false; message: string }>;
codexAuthProbe?: (input: { projectDir: string; model: string }) => Promise<{ ok: true } | { ok: false; message: string }>;
readGcloudProject?: () => Promise<string | undefined>;
listGcloudProjects?: () => Promise<GcloudProjectChoice[]>;
spinner?: () => KtxCliSpinner;
@ -110,6 +114,20 @@ const CLAUDE_CODE_MODELS: AnthropicModelChoice[] = [
{ id: 'haiku', label: 'Claude Haiku', recommended: false },
];
// Curated Codex models from OpenAI's current lineup that work under both
// ChatGPT-account (subscription) and API-key auth. Intentionally omitted:
// the `*-codex` ids (e.g. gpt-5.3-codex, gpt-5.2-codex) are API-key-only and
// fail on ChatGPT-account auth, and gpt-5.3-codex-spark is a ChatGPT-Pro-only
// research preview. Codex resolves real availability per account at runtime
// (its binary remote-fetches the model list), so this is a convenience
// shortlist only — the manual-entry option accepts any id your account's
// `codex` picker exposes, and the auth probe reports an unsupported choice.
const CODEX_MODELS: AnthropicModelChoice[] = [
{ id: 'gpt-5.5', label: 'GPT-5.5', recommended: true },
{ id: 'gpt-5.4', label: 'GPT-5.4', recommended: false },
{ id: 'gpt-5.4-mini', label: 'GPT-5.4 mini', recommended: false },
];
const HIDDEN_ANTHROPIC_MODEL_PATTERNS = [
/^claude-sonnet-4$/i,
/^claude-opus-4$/i,
@ -272,7 +290,12 @@ export function isKtxSetupLlmConfigReady(config: KtxProjectLlmConfig): boolean {
return typeof resolved.vertex?.location === 'string' && resolved.vertex.location.trim().length > 0;
}
return resolved.backend === 'anthropic' || resolved.backend === 'gateway' || resolved.backend === 'claude-code';
return (
resolved.backend === 'anthropic' ||
resolved.backend === 'gateway' ||
resolved.backend === 'claude-code' ||
resolved.backend === 'codex'
);
}
function hasUsableConfiguredLlm(config: KtxProjectConfig): boolean {
@ -284,7 +307,8 @@ function buildProjectLlmConfig(
provider:
| { backend: 'anthropic'; credentialRef: string }
| { backend: 'vertex'; vertex: { project?: string; location: string } }
| { backend: 'claude-code' },
| { backend: 'claude-code' }
| { backend: 'codex' },
model: string,
): KtxProjectLlmConfig {
if (provider.backend === 'claude-code') {
@ -295,6 +319,14 @@ function buildProjectLlmConfig(
};
}
if (provider.backend === 'codex') {
return {
provider: { backend: 'codex' },
models: { ...existing.models, default: model },
promptCaching: existing.promptCaching,
};
}
if (provider.backend === 'vertex') {
return {
provider: {
@ -515,6 +547,7 @@ async function chooseBackend(
message: 'Which LLM provider should KTX use?',
options: [
{ value: 'claude-code', label: 'Claude subscription (Pro/Max)' },
{ value: 'codex', label: 'Codex subscription' },
{ value: 'anthropic', label: 'Anthropic API key' },
{ value: 'vertex', label: 'Google Vertex AI for Anthropic Claude' },
{ value: 'back', label: 'Back' },
@ -525,7 +558,7 @@ async function chooseBackend(
}
return {
status: 'ready',
backend: choice === 'vertex' || choice === 'claude-code' ? choice : 'anthropic',
backend: choice === 'vertex' || choice === 'claude-code' || choice === 'codex' ? choice : 'anthropic',
prompted: true,
};
}
@ -884,12 +917,51 @@ async function chooseClaudeCodeModel(args: KtxSetupModelArgs, deps: KtxSetupMode
return { status: 'ready', model: choice };
}
async function chooseCodexModel(args: KtxSetupModelArgs, deps: KtxSetupModelDeps): Promise<ChooseModelResult> {
const providedModel = requestedModel(args);
if (providedModel) {
return { status: 'ready', model: providedModel };
}
if (args.inputMode === 'disabled') {
return { status: 'ready', model: DEFAULT_CODEX_MODEL };
}
const prompts = deps.prompts ?? createPromptAdapter();
const choice = await prompts.select({
message: `Which Codex model should KTX use?\n\n${ANTHROPIC_MODEL_PROMPT_CONTEXT}`,
options: [
...CODEX_MODELS.map((model) => ({
value: model.id,
label: model.label,
...(model.recommended ? { hint: 'recommended' } : {}),
})),
{ value: 'manual', label: 'Enter a Codex model ID manually' },
{ value: 'back', label: 'Back' },
],
});
if (choice === 'back') {
return { status: 'back' };
}
if (choice === 'manual') {
const manual = await prompts.text({
message: withTextInputNavigation('Codex model ID'),
placeholder: CODEX_MODELS.find((model) => model.recommended)?.id ?? CODEX_MODELS[0]?.id,
});
if (manual === undefined) {
return { status: 'back' };
}
return manual.trim() ? { status: 'ready', model: manual.trim() } : { status: 'missing-input' };
}
return { status: 'ready', model: choice };
}
async function persistLlmConfig(
projectDir: string,
provider:
| { backend: 'anthropic'; credentialRef: string }
| { backend: 'vertex'; vertex: { project?: string; location: string } }
| { backend: 'claude-code' },
| { backend: 'claude-code' }
| { backend: 'codex' },
model: string,
): Promise<void> {
const project = await loadKtxProject({ projectDir });
@ -1031,6 +1103,32 @@ export async function runKtxSetupAnthropicModelStep(
return { status: 'ready', projectDir: args.projectDir };
}
if (backendChoice.backend === 'codex') {
const model = await chooseCodexModel(backendArgs, deps);
if (model.status === 'back' && backendChoice.prompted) {
attemptArgs = buildInteractiveRetryArgs(args);
continue;
}
if (model.status === 'invalid-credential') {
return { status: 'failed', projectDir: args.projectDir };
}
if (model.status !== 'ready') {
return { status: model.status, projectDir: args.projectDir };
}
const probe = deps.codexAuthProbe ?? runCodexAuthProbe;
const health = await probe({ projectDir: args.projectDir, model: model.model });
if (!health.ok) {
io.stderr.write(`${health.message}\n`);
return { status: 'failed', projectDir: args.projectDir };
}
// Prefix the clack gutter so the warning sits inside the setup frame
// instead of breaking out of it; kept on stderr for scripted runs.
io.stderr.write(`${formatCodexIsolationWarning()}\n`);
await persistLlmConfig(args.projectDir, { backend: 'codex' }, model.model);
io.stdout.write(`│ LLM ready: yes (codex, ${model.model})\n`);
return { status: 'ready', projectDir: args.projectDir };
}
const credential = await chooseCredentialRef(backendArgs, io, deps);
if (credential.status === 'back' && backendChoice.prompted) {
attemptArgs = buildInteractiveRetryArgs(args);