feat: add codex llm backend for ktx runtime work (#253)

* feat: add codex sdk runner foundation
* feat: parse codex runtime events
* feat: expose codex runtime mcp tools
* feat: add codex llm runtime
* feat: wire codex llm backend
* test: avoid `Array.fromAsync` in codex runner test
* docs: document codex llm backend
* fix: tighten codex runtime config ownership
* fix: use codex sdk env and thread options
* fix: parse codex sdk event shapes
* test: add codex backend live smoke
* docs: clarify codex backend isolation
* fix: drive codex loop metrics from mcp events
* fix: enforce codex local step budget
* docs: disclose codex isolation limits
* fix: count all codex agent steps and stream step callbacks live

The agent-loop step budget only counted completed `mcp_tool_call` items, so built-in `command_execution` steps (which the public Codex SDK/CLI surface can still expose) never decremented the budget. Ingest and reconciliation could therefore run past `stepBudget` until Codex stopped on its own. `onStepFinish` was also replayed only after the whole stream had drained, so live `work_unit_step` / reconciliation progress appeared stuck until the Codex process exited.

`collectEvents` is now the single live step accumulator: it counts every completed agent-action item through a shared `isCompletedAgentStep` predicate (`command_execution`, `mcp_tool_call`, `file_change`, `web_search`), fires `onStepFinish` as each step completes, and enforces the budget on that broader count. A turn with no tool calls still counts as one step. `toolFailures` stays MCP-specific, since a non-zero command exit is normal agent exploration, not a loop failure.

* test: align ingest llm-guard assertions with codex backend

The skip-llm ingest guard message now lists codex as a valid backend and mentions a Claude Code/Codex session plus a codex setup hint, but a slow-suite test still asserted the pre-codex wording. Update it to match the production message (already covered by the local-bundle-runtime unit test) and add an assertion for the codex setup line.
* fix: treat codex error:null tool calls as success

The Codex SDK serializes `error: null` on successful `mcp_tool_call` items, so the failure check (`item.error !== undefined`) flagged every successful tool call as failed, with the empty-payload default "Codex turn failed". This killed every ingest work unit under the codex backend before it could produce a patch. Key on `status === 'failed'` (authoritative and always set) and treat an error object as a failure only when it is populated. Add a regression test built from a verbatim capture of real SDK events.

* fix: default codex backend to gpt-5.5 and report real probe errors

The previous default, gpt-5.3-codex, is an API-key-only model that the OpenAI API rejects under ChatGPT-account (subscription) auth, so codex status/setup failed with a misleading "authentication is not usable" message even though auth was fine.

- The default codex model is now gpt-5.5 (works with both subscription and API-key auth); the curated setup picker offers gpt-5.5 / gpt-5.4 / gpt-5.4-mini and keeps free-form entry for account-specific ids (e.g. gpt-5.3-codex-spark).
- `runCodexAuthProbe` now distinguishes "model not available" from an auth failure and surfaces the real API error: `collectEvents` retains stream events when the SDK throws on a non-zero exit, and the API error JSON envelope is unwrapped to its human-readable message.
- The Codex isolation warning now renders inside the clack setup frame.
- Docs updated to gpt-5.5, with a note that `*-codex` ids require API-key auth.

* fix: require llm.models.default in status and match codex probe remediation

Status reported a project as ready when a non-none LLM backend was configured without `llm.models.default`, but the runtime (`resolveModelSlots`) hard-requires it, so ingest/scan/memory threw after `ktx status` said the project was usable. `buildLlmStatus` now fails for any non-none backend missing `models.default` and no longer invents a fallback model for claude-code/codex.

Codex probe failures now carry a category-matched fix: a model-access failure points the user to `llm.models.default` instead of the auth/install remediation. `runCodexAuthProbe` returns the fix and status consumes it; the message stays self-sufficient, so setup output is unchanged. Docs: README now lists the codex backend and local Codex auth; ktx-setup.mdx states that `--llm-model` only accepts `codex/default` or `gpt-*`/`codex-*` ids. Repaired four doctor fixtures that configured a backend without `models.default` (the config that is now correctly blocked) and added coverage for the new behavior.
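The step-accounting change described above can be sketched as a small synchronous accumulator fed from the event stream. This is a minimal TypeScript sketch under stated assumptions, not the shipped `collectEvents`: the `ThreadEvent` / `item.completed` shapes are simplified stand-ins for the Codex SDK's event types, and `StepAccumulator`, `StepBudgetExceededError`, and `collectSteps` are hypothetical names. Only `isCompletedAgentStep`, `onStepFinish`, `stepBudget`, `toolFailures`, and the four counted item types come from the commit message.

```typescript
// Hypothetical, simplified stand-ins for Codex SDK thread events (assumption);
// the real SDK types carry more fields.
type AgentItemType =
  | 'command_execution'
  | 'mcp_tool_call'
  | 'file_change'
  | 'web_search'
  | 'agent_message'
  | 'reasoning';

interface ItemCompletedEvent {
  type: 'item.completed';
  item: { type: AgentItemType; status?: 'completed' | 'failed' };
}

interface TurnCompletedEvent {
  type: 'turn.completed';
}

type ThreadEvent = ItemCompletedEvent | TurnCompletedEvent;

// Item types that represent an agent action and therefore spend budget.
const AGENT_STEP_TYPES = new Set<AgentItemType>(['command_execution', 'mcp_tool_call', 'file_change', 'web_search']);

function isCompletedAgentStep(event: ThreadEvent): event is ItemCompletedEvent {
  return event.type === 'item.completed' && AGENT_STEP_TYPES.has(event.item.type);
}

// Hypothetical error type for an exhausted budget.
class StepBudgetExceededError extends Error {
  constructor(steps: number, budget: number) {
    super(`step budget of ${budget} exceeded at step ${steps}`);
    this.name = 'StepBudgetExceededError';
  }
}

// Single live accumulator: counts steps, reports each one as it completes,
// and enforces the budget on the broad step count.
class StepAccumulator {
  steps = 0;
  toolFailures = 0;
  private readonly stepBudget: number;
  private readonly onStepFinish: (step: number) => void;

  constructor(stepBudget: number, onStepFinish: (step: number) => void) {
    this.stepBudget = stepBudget;
    this.onStepFinish = onStepFinish;
  }

  observe(event: ThreadEvent): void {
    if (!isCompletedAgentStep(event)) return;
    this.steps += 1;
    // Only MCP failures count; a non-zero shell exit is normal exploration.
    if (event.item.type === 'mcp_tool_call' && event.item.status === 'failed') {
      this.toolFailures += 1;
    }
    if (this.steps > this.stepBudget) {
      throw new StepBudgetExceededError(this.steps, this.stepBudget);
    }
    this.onStepFinish(this.steps); // fired live, not after the stream drains
  }

  // A turn that used no tools still counts as one step.
  get countedSteps(): number {
    return Math.max(this.steps, 1);
  }
}

// Hypothetical stream driver: one pass over the events, no replay afterwards.
async function collectSteps(
  events: AsyncIterable<ThreadEvent>,
  stepBudget: number,
  onStepFinish: (step: number) => void,
): Promise<{ steps: number; toolFailures: number }> {
  const acc = new StepAccumulator(stepBudget, onStepFinish);
  for await (const event of events) acc.observe(event);
  return { steps: acc.countedSteps, toolFailures: acc.toolFailures };
}
```

Because each event is handed to the accumulator inside the `for await` loop, the budget check and the `onStepFinish` callback run in the same iteration that receives the event, so progress is visible while Codex is still running instead of being replayed after it exits.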
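The `error: null` bug and its fix come down to a single predicate. A minimal sketch, assuming a simplified `mcp_tool_call` item type (the real SDK type has more fields); `isFailedToolCallOld` and `isFailedToolCall` are hypothetical names. The sketch also treats a populated error object as a failure when `status` is not `'failed'`; whether the shipped check does the same is an assumption.

```typescript
// Simplified mcp_tool_call item (assumption: the exact SDK type differs).
interface McpToolCallItem {
  type: 'mcp_tool_call';
  status: 'in_progress' | 'completed' | 'failed';
  error?: { message?: string } | null;
}

// Old check: a successful call serialized with `error: null` is not
// `undefined`, so every successful call looked like a failure.
function isFailedToolCallOld(item: McpToolCallItem): boolean {
  return item.error !== undefined;
}

// New check: `status` is authoritative; an error object only counts when it
// is actually populated (both `null` and `{}` are ignored).
function isFailedToolCall(item: McpToolCallItem): boolean {
  if (item.status === 'failed') return true;
  return item.error != null && Object.keys(item.error).length > 0;
}
```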
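The probe's error reporting (unwrapping the API's JSON error envelope and attaching a category-matched fix) might look roughly like the following. This is a sketch under assumptions: `unwrapApiErrorMessage`, `classifyProbeFailure`, and the `ProbeFailure` shape are hypothetical. The `{"error": {"message": ...}}` envelope follows OpenAI's documented API error format, but the regex that detects a model-access problem and the fix wording are illustrative guesses, not what `runCodexAuthProbe` actually uses.

```typescript
// Hypothetical failure shape carrying a category-matched fix.
interface ProbeFailure {
  ok: false;
  category: 'model-unavailable' | 'auth';
  message: string;
  fix: string;
}

// Extract `error.message` from an OpenAI-style JSON envelope such as
// {"error":{"message":"...","code":"..."}} embedded in a raw error string.
// Falls back to the trimmed raw text when no envelope can be parsed.
function unwrapApiErrorMessage(raw: string): string {
  const start = raw.indexOf('{');
  if (start === -1) return raw.trim();
  try {
    const parsed = JSON.parse(raw.slice(start)) as { error?: { message?: unknown } };
    const message = parsed?.error?.message;
    if (typeof message === 'string' && message.trim().length > 0) return message.trim();
  } catch {
    // Not a JSON envelope; keep the raw text.
  }
  return raw.trim();
}

// Heuristic (assumption): model_not_found-style wording means a model-access
// problem; anything else is treated as an auth/install problem.
const MODEL_ACCESS_PATTERN = /model_not_found|does not exist|not supported|not available/i;

function classifyProbeFailure(raw: string, model: string): ProbeFailure {
  const detail = unwrapApiErrorMessage(raw);
  if (MODEL_ACCESS_PATTERN.test(raw)) {
    return {
      ok: false,
      category: 'model-unavailable',
      message: `Codex model "${model}" is not available to this account: ${detail}`,
      // Illustrative remediation text: steer at the model setting, not auth.
      fix: 'Set llm.models.default to a model your Codex account can use, e.g. gpt-5.5.',
    };
  }
  return {
    ok: false,
    category: 'auth',
    message: `Codex authentication is not usable: ${detail}`,
    fix: 'Check that the Codex CLI is installed and signed in on this machine.',
  };
}
```

Keeping the unwrapped `detail` inside `message` is what lets the message stay self-sufficient for setup output, while `fix` gives status a separate, category-specific remediation line.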
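The stricter status rule reduces to one check that applies to every configured backend, with no per-backend fallback model. A minimal sketch, assuming hypothetical `LlmConfigSketch` / `LlmStatusSketch` shapes and a hypothetical `buildLlmStatusSketch` function; in particular, reporting the `none` backend as `disabled` is an assumption about how status treats an unconfigured LLM, not something the commit states.

```typescript
// Hypothetical config and status shapes for illustration only.
type LlmBackend = 'none' | 'anthropic' | 'vertex' | 'gateway' | 'claude-code' | 'codex';

interface LlmConfigSketch {
  provider: { backend: LlmBackend };
  models?: { default?: string };
}

type LlmStatusSketch =
  | { state: 'disabled' }
  | { state: 'ready'; backend: LlmBackend; model: string }
  | { state: 'blocked'; reason: string };

function buildLlmStatusSketch(config: LlmConfigSketch): LlmStatusSketch {
  const { backend } = config.provider;
  if (backend === 'none') return { state: 'disabled' };
  const model = config.models?.default?.trim();
  // No invented fallback: the runtime requires models.default, so status
  // must block whenever it is missing or blank, for every backend.
  if (!model) {
    return { state: 'blocked', reason: `llm.models.default is required for the ${backend} backend` };
  }
  return { state: 'ready', backend, model };
}
```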
2026-06-22 08:38:08 +02:00 · 2026-06-02 13:57:11 +02:00 · 2026-06-02 13:57:11 +02:00 · 494618ab14
commit 494618ab14
parent 74c6076b72
41 changed files with 2544 additions and 30 deletions
--- a/packages/cli/src/setup-models.ts
+++ b/packages/cli/src/setup-models.ts
@@ -3,6 +3,9 @@ import { writeFile } from 'node:fs/promises';
 import { promisify } from 'node:util';
 import { resolveLocalKtxLlmConfig } from './context/llm/local-config.js';
 import { runClaudeCodeAuthProbe } from './context/llm/claude-code-runtime.js';
+import { formatCodexIsolationWarning } from './context/llm/codex-isolation.js';
+import { runCodexAuthProbe } from './context/llm/codex-runtime.js';
+import { DEFAULT_CODEX_MODEL } from './context/llm/codex-models.js';
 import { resolveKtxConfigReference } from './context/core/config-reference.js';
 import { type KtxProjectConfig, type KtxProjectLlmConfig, serializeKtxProjectConfig } from './context/project/config.js';
 import { loadKtxProject } from './context/project/project.js';
@@ -56,7 +59,7 @@ export interface AnthropicModelChoice {
  recommended: boolean;
 }

-export type KtxSetupLlmBackend = 'anthropic' | 'vertex' | 'claude-code';
+export type KtxSetupLlmBackend = 'anthropic' | 'vertex' | 'claude-code' | 'codex';

 /** @internal */
 export interface KtxSetupModelPromptAdapter {
@@ -82,6 +85,7 @@ export interface KtxSetupModelDeps {
    model: string;
    env?: NodeJS.ProcessEnv;
  }) => Promise<{ ok: true } | { ok: false; message: string }>;
+  codexAuthProbe?: (input: { projectDir: string; model: string }) => Promise<{ ok: true } | { ok: false; message: string }>;
  readGcloudProject?: () => Promise<string | undefined>;
  listGcloudProjects?: () => Promise<GcloudProjectChoice[]>;
  spinner?: () => KtxCliSpinner;
@@ -110,6 +114,20 @@ const CLAUDE_CODE_MODELS: AnthropicModelChoice[] = [
  { id: 'haiku', label: 'Claude Haiku', recommended: false },
 ];

+// Curated Codex models from OpenAI's current lineup that work under both
+// ChatGPT-account (subscription) and API-key auth. Intentionally omitted:
+// the `*-codex` ids (e.g. gpt-5.3-codex, gpt-5.2-codex) are API-key-only and
+// fail on ChatGPT-account auth, and gpt-5.3-codex-spark is a ChatGPT-Pro-only
+// research preview. Codex resolves real availability per account at runtime
+// (its binary remote-fetches the model list), so this is a convenience
+// shortlist only — the manual-entry option accepts any id your account's
+// `codex` picker exposes, and the auth probe reports an unsupported choice.
+const CODEX_MODELS: AnthropicModelChoice[] = [
+  { id: 'gpt-5.5', label: 'GPT-5.5', recommended: true },
+  { id: 'gpt-5.4', label: 'GPT-5.4', recommended: false },
+  { id: 'gpt-5.4-mini', label: 'GPT-5.4 mini', recommended: false },
+];
+
 const HIDDEN_ANTHROPIC_MODEL_PATTERNS = [
  /^claude-sonnet-4$/i,
  /^claude-opus-4$/i,
@@ -272,7 +290,12 @@ export function isKtxSetupLlmConfigReady(config: KtxProjectLlmConfig): boolean {
    return typeof resolved.vertex?.location === 'string' && resolved.vertex.location.trim().length > 0;
  }

-  return resolved.backend === 'anthropic' || resolved.backend === 'gateway' || resolved.backend === 'claude-code';
+  return (
+    resolved.backend === 'anthropic' ||
+    resolved.backend === 'gateway' ||
+    resolved.backend === 'claude-code' ||
+    resolved.backend === 'codex'
+  );
 }

 function hasUsableConfiguredLlm(config: KtxProjectConfig): boolean {
@@ -284,7 +307,8 @@ function buildProjectLlmConfig(
  provider:
    | { backend: 'anthropic'; credentialRef: string }
    | { backend: 'vertex'; vertex: { project?: string; location: string } }
-    | { backend: 'claude-code' },
+    | { backend: 'claude-code' }
+    | { backend: 'codex' },
  model: string,
 ): KtxProjectLlmConfig {
  if (provider.backend === 'claude-code') {
@@ -295,6 +319,14 @@ function buildProjectLlmConfig(
    };
  }

+  if (provider.backend === 'codex') {
+    return {
+      provider: { backend: 'codex' },
+      models: { ...existing.models, default: model },
+      promptCaching: existing.promptCaching,
+    };
+  }
+
  if (provider.backend === 'vertex') {
    return {
      provider: {
@@ -515,6 +547,7 @@ async function chooseBackend(
    message: 'Which LLM provider should KTX use?',
    options: [
      { value: 'claude-code', label: 'Claude subscription (Pro/Max)' },
+      { value: 'codex', label: 'Codex subscription' },
      { value: 'anthropic', label: 'Anthropic API key' },
      { value: 'vertex', label: 'Google Vertex AI for Anthropic Claude' },
      { value: 'back', label: 'Back' },
@@ -525,7 +558,7 @@ async function chooseBackend(
  }
  return {
    status: 'ready',
-    backend: choice === 'vertex' || choice === 'claude-code' ? choice : 'anthropic',
+    backend: choice === 'vertex' || choice === 'claude-code' || choice === 'codex' ? choice : 'anthropic',
    prompted: true,
  };
 }
@@ -884,12 +917,51 @@ async function chooseClaudeCodeModel(args: KtxSetupModelArgs, deps: KtxSetupMode
  return { status: 'ready', model: choice };
 }

+async function chooseCodexModel(args: KtxSetupModelArgs, deps: KtxSetupModelDeps): Promise<ChooseModelResult> {
+  const providedModel = requestedModel(args);
+  if (providedModel) {
+    return { status: 'ready', model: providedModel };
+  }
+  if (args.inputMode === 'disabled') {
+    return { status: 'ready', model: DEFAULT_CODEX_MODEL };
+  }
+
+  const prompts = deps.prompts ?? createPromptAdapter();
+  const choice = await prompts.select({
+    message: `Which Codex model should KTX use?\n\n${ANTHROPIC_MODEL_PROMPT_CONTEXT}`,
+    options: [
+      ...CODEX_MODELS.map((model) => ({
+        value: model.id,
+        label: model.label,
+        ...(model.recommended ? { hint: 'recommended' } : {}),
+      })),
+      { value: 'manual', label: 'Enter a Codex model ID manually' },
+      { value: 'back', label: 'Back' },
+    ],
+  });
+  if (choice === 'back') {
+    return { status: 'back' };
+  }
+  if (choice === 'manual') {
+    const manual = await prompts.text({
+      message: withTextInputNavigation('Codex model ID'),
+      placeholder: CODEX_MODELS.find((model) => model.recommended)?.id ?? CODEX_MODELS[0]?.id,
+    });
+    if (manual === undefined) {
+      return { status: 'back' };
+    }
+    return manual.trim() ? { status: 'ready', model: manual.trim() } : { status: 'missing-input' };
+  }
+  return { status: 'ready', model: choice };
+}
+
 async function persistLlmConfig(
  projectDir: string,
  provider:
    | { backend: 'anthropic'; credentialRef: string }
    | { backend: 'vertex'; vertex: { project?: string; location: string } }
-    | { backend: 'claude-code' },
+    | { backend: 'claude-code' }
+    | { backend: 'codex' },
  model: string,
 ): Promise<void> {
  const project = await loadKtxProject({ projectDir });
@@ -1031,6 +1103,32 @@ export async function runKtxSetupAnthropicModelStep(
      return { status: 'ready', projectDir: args.projectDir };
    }

+    if (backendChoice.backend === 'codex') {
+      const model = await chooseCodexModel(backendArgs, deps);
+      if (model.status === 'back' && backendChoice.prompted) {
+        attemptArgs = buildInteractiveRetryArgs(args);
+        continue;
+      }
+      if (model.status === 'invalid-credential') {
+        return { status: 'failed', projectDir: args.projectDir };
+      }
+      if (model.status !== 'ready') {
+        return { status: model.status, projectDir: args.projectDir };
+      }
+      const probe = deps.codexAuthProbe ?? runCodexAuthProbe;
+      const health = await probe({ projectDir: args.projectDir, model: model.model });
+      if (!health.ok) {
+        io.stderr.write(`${health.message}\n`);
+        return { status: 'failed', projectDir: args.projectDir };
+      }
+      // Prefix the clack gutter so the warning sits inside the setup frame
+      // instead of breaking out of it; kept on stderr for scripted runs.
+      io.stderr.write(`│  ${formatCodexIsolationWarning()}\n`);
+      await persistLlmConfig(args.projectDir, { backend: 'codex' }, model.model);
+      io.stdout.write(`│  LLM ready: yes (codex, ${model.model})\n`);
+      return { status: 'ready', projectDir: args.projectDir };
+    }
+
    const credential = await chooseCredentialRef(backendArgs, io, deps);
    if (credential.status === 'back' && backendChoice.prompted) {
      attemptArgs = buildInteractiveRetryArgs(args);