feat: add codex llm backend for ktx runtime work (#253)

* feat: add codex sdk runner foundation * feat: parse codex runtime events * feat: expose codex runtime mcp tools * feat: add codex llm runtime * feat: wire codex llm backend * test: avoid Array.fromAsync in codex runner test * docs: document codex llm backend * fix: tighten codex runtime config ownership * fix: use codex sdk env and thread options * fix: parse codex sdk event shapes * test: add codex backend live smoke * docs: clarify codex backend isolation * fix: drive codex loop metrics from mcp events * fix: enforce codex local step budget * docs: disclose codex isolation limits * fix: count all codex agent steps and stream step callbacks live The agent-loop step budget only counted completed mcp_tool_call items, so built-in command_execution steps (which the public Codex SDK/CLI surface can still expose) never decremented the budget, letting ingest/reconciliation run past stepBudget until Codex stopped on its own. onStepFinish was also replayed only after the whole stream drained, so live work_unit_step / reconciliation progress appeared stuck until the Codex process exited. collectEvents is now the single live step accumulator: it counts every completed agent-action item via a shared isCompletedAgentStep predicate (command_execution, mcp_tool_call, file_change, web_search), fires onStepFinish as each step completes, and enforces the budget on that broader count. A no-tool turn still counts as one step. toolFailures stays MCP-specific, since a non-zero command exit is normal agent exploration, not a loop failure. * test: align ingest llm-guard assertions with codex backend The skip-llm ingest guard message now lists codex as a valid backend and mentions a Claude Code/Codex session plus a codex setup hint, but this slow suite test still asserted the pre-codex wording. Update it to match the production message (already covered by the local-bundle-runtime unit test) and add the codex setup-line assertion. * fix: treat codex error:null tool calls as success The Codex SDK serializes error: null on successful mcp_tool_call items, so the failure check (item.error !== undefined) flagged every successful tool call as failed with the empty-payload default "Codex turn failed". This killed every ingest work unit under the codex backend before it could produce a patch. Key on status === 'failed' (authoritative, always set) and only treat a populated error object as a failure. Add a regression test built from a verbatim real-SDK event capture. * fix: default codex backend to gpt-5.5 and report real probe errors The previous default gpt-5.3-codex is an API-key-only model that the OpenAI API rejects under ChatGPT-account (subscription) auth, so codex status/setup failed with a misleading "authentication is not usable" message even though auth was fine. - Default codex model is now gpt-5.5 (works on both subscription and API-key auth); the curated setup picker offers gpt-5.5 / gpt-5.4 / gpt-5.4-mini and keeps free-form entry for account-specific ids (e.g. gpt-5.3-codex-spark). - runCodexAuthProbe now distinguishes "model not available" from an auth failure and surfaces the real API error: collectEvents retains stream events when the SDK throws on a non-zero exit, and the API error JSON envelope is unwrapped to its human-readable message. - The Codex isolation warning now renders inside the clack setup frame. - Docs updated to gpt-5.5 with a note that *-codex ids require API-key auth. * fix: require llm.models.default in status and match codex probe remediation Status reported a project ready when a non-none LLM backend was configured without llm.models.default, but the runtime (resolveModelSlots) hard-requires it, so ingest/scan/memory threw after `ktx status` said the project was usable. buildLlmStatus now fails for any non-none backend missing models.default and no longer invents a fallback model for claude-code/codex. Codex probe failures now carry a category-matched fix: a model-access failure steers the user at llm.models.default instead of the auth/install remediation. runCodexAuthProbe returns the fix and status consumes it; the message stays self-sufficient so setup output is unchanged. Docs: README now lists the codex backend and local Codex auth; ktx-setup.mdx states --llm-model only accepts codex/default or gpt-*/codex-* ids. Repaired four doctor fixtures that configured a backend without models.default (the now-correctly-blocked config) and added coverage for the new behavior.
2026-06-22 08:38:08 +02:00 · 2026-06-02 13:57:11 +02:00 · 2026-06-02 13:57:11 +02:00 · 494618ab14
commit 494618ab14
parent 74c6076b72
41 changed files with 2544 additions and 30 deletions
--- a/packages/cli/src/context/llm/codex-runtime.ts
+++ b/packages/cli/src/context/llm/codex-runtime.ts
@ -0,0 +1,371 @@
+import { z } from 'zod';
+import { noopLogger, type KtxLogger } from '../core/config.js';
+import { isCompletedAgentStep, summarizeCodexExecEvents, type CodexExecEventSummary } from './codex-exec-events.js';
+import {
+  startCodexRuntimeMcpServer,
+  type CodexRuntimeMcpServerHandle,
+} from './codex-mcp-runtime-server.js';
+import { resolveCodexModel } from './codex-models.js';
+import { buildCodexRuntimeConfig } from './codex-runtime-config.js';
+import { CodexSdkCliRunner, type CodexSdkRunner } from './codex-sdk-runner.js';
+import type {
+  KtxGenerateObjectInput,
+  KtxGenerateTextInput,
+  KtxLlmRuntimePort,
+  KtxRuntimeToolSet,
+  LlmTokenUsage,
+  RunLoopParams,
+  RunLoopResult,
+} from './runtime-port.js';
+
+export interface CodexKtxLlmRuntimeDeps {
+  projectDir: string;
+  modelSlots: { default: string } & Partial<Record<string, string>>;
+  runner?: CodexSdkRunner;
+  startMcpServer?: (input: { projectDir: string; toolSet: KtxRuntimeToolSet }) => Promise<CodexRuntimeMcpServerHandle>;
+  logger?: KtxLogger;
+}
+
+function modelForRole(modelSlots: CodexKtxLlmRuntimeDeps['modelSlots'], role: string): string {
+  return resolveCodexModel(modelSlots[role] ?? modelSlots.default);
+}
+
+function promptWithSystem(system: string | undefined, prompt: string): string {
+  return [system, prompt].filter(Boolean).join('\n\n');
+}
+
+interface CollectCodexEventsOptions {
+  stepBudget?: number;
+  abortController?: AbortController;
+  onStep?: (stepIndex: number) => void | Promise<void>;
+}
+
+interface CollectCodexEventsResult {
+  events: unknown[];
+  budgetExceeded: boolean;
+  streamError?: Error;
+}
+
+function eventRecord(value: unknown): Record<string, unknown> | undefined {
+  return value && typeof value === 'object' ? (value as Record<string, unknown>) : undefined;
+}
+
+function isTurnCompleted(event: unknown): boolean {
+  return eventRecord(event)?.type === 'turn.completed';
+}
+
+/**
+ * Drains the Codex stream once, emitting a step as each agent action completes
+ * so callers see live progress and the step budget is enforced mid-run. Every
+ * completed agent-action item counts (see {@link isCompletedAgentStep}), so
+ * built-in `command_execution` steps decrement the budget the same as
+ * `mcp_tool_call`s. A turn that produced no actions still counts as one step,
+ * matching the metrics summary and the AI SDK backend.
+ */
+async function collectEvents(
+  events: AsyncIterable<unknown>,
+  options: CollectCodexEventsOptions = {},
+): Promise<CollectCodexEventsResult> {
+  const collected: unknown[] = [];
+  let completedSteps = 0;
+  let sawActionStep = false;
+  let budgetExceeded = false;
+  let streamError: Error | undefined;
+
+  // The SDK yields every stdout event, then throws on a non-zero codex exec
+  // exit. Catch that throw so the events already collected (which carry the
+  // real `turn.failed`/`error` reason) survive for the summary; the masked
+  // exit message is kept only as a fallback when no error event was emitted.
+  try {
+    for await (const event of events) {
+      collected.push(event);
+
+      const isActionStep = isCompletedAgentStep(event);
+      if (isActionStep) {
+        sawActionStep = true;
+      } else if (sawActionStep || !isTurnCompleted(event)) {
+        // Only fall back to counting a bare turn as a step when the turn produced
+        // no agent actions; a completed turn is terminal, so it never aborts.
+        continue;
+      }
+
+      completedSteps += 1;
+      await options.onStep?.(completedSteps);
+      if (isActionStep && options.stepBudget !== undefined && completedSteps >= options.stepBudget) {
+        budgetExceeded = true;
+        options.abortController?.abort();
+        break;
+      }
+    }
+  } catch (error) {
+    streamError = error instanceof Error ? error : new Error(String(error));
+  }
+
+  return { events: collected, budgetExceeded, ...(streamError ? { streamError } : {}) };
+}
+
+function metrics(summary: CodexExecEventSummary, startedAt: number): { totalMs: number; usage: LlmTokenUsage } {
+  return { totalMs: Date.now() - startedAt, usage: summary.usage };
+}
+
+function summaryError(summary: CodexExecEventSummary, streamError?: Error): Error | undefined {
+  // A `turn.failed`/`error` event carries the real reason; prefer it over the
+  // SDK's generic non-zero-exit throw. Fall back to the stream error only when
+  // no event explained the failure (e.g. spawn failure or auth before a turn).
+  if (summary.error) {
+    return summary.error;
+  }
+  if (summary.toolFailures.length > 0) {
+    return new Error(`Codex runtime tool call failed: ${summary.toolFailures.join('; ')}`);
+  }
+  return streamError;
+}
+
+function assertSuccessfulText(summary: CodexExecEventSummary, streamError?: Error): string {
+  const error = summaryError(summary, streamError);
+  if (error) {
+    throw error;
+  }
+  if (!summary.finalText.trim()) {
+    throw new Error('Codex completed without an agent message');
+  }
+  return summary.finalText;
+}
+
+function parseStructuredOutput<TOutput, TSchema extends z.ZodType<TOutput>>(schema: TSchema, text: string): TOutput {
+  try {
+    return schema.parse(JSON.parse(text));
+  } catch (error) {
+    const message = error instanceof Error ? error.message : String(error);
+    throw new Error(`Codex structured output failed validation: ${message}`);
+  }
+}
+
+async function mcpForTools(input: {
+  projectDir: string;
+  toolSet?: KtxRuntimeToolSet;
+  startMcpServer: CodexKtxLlmRuntimeDeps['startMcpServer'];
+}): Promise<CodexRuntimeMcpServerHandle | undefined> {
+  if (!input.toolSet || Object.keys(input.toolSet).length === 0) {
+    return undefined;
+  }
+  return (input.startMcpServer ?? startCodexRuntimeMcpServer)({
+    projectDir: input.projectDir,
+    toolSet: input.toolSet,
+  });
+}
+
+function runtimeToolNames(toolSet: KtxRuntimeToolSet | undefined): string[] {
+  return Object.values(toolSet ?? {}).map((descriptor) => descriptor.name);
+}
+
+export class CodexKtxLlmRuntime implements KtxLlmRuntimePort {
+  private readonly runner: CodexSdkRunner;
+  private readonly logger: KtxLogger;
+
+  constructor(private readonly deps: CodexKtxLlmRuntimeDeps) {
+    this.runner = deps.runner ?? new CodexSdkCliRunner();
+    this.logger = deps.logger ?? noopLogger;
+  }
+
+  async generateText(input: KtxGenerateTextInput): Promise<string> {
+    const startedAt = Date.now();
+    const model = modelForRole(this.deps.modelSlots, input.role);
+    const mcp = await mcpForTools({
+      projectDir: this.deps.projectDir,
+      toolSet: input.tools,
+      startMcpServer: this.deps.startMcpServer,
+    });
+    try {
+      const config = buildCodexRuntimeConfig({
+        model,
+        ...(mcp
+          ? {
+              mcp: {
+                url: mcp.url,
+                bearerTokenEnvVar: mcp.bearerTokenEnvVar,
+                bearerToken: mcp.bearerToken,
+                toolNames: runtimeToolNames(input.tools),
+              },
+            }
+          : {}),
+      });
+      const collected = await collectEvents(
+        await this.runner.runStreamed({
+          projectDir: this.deps.projectDir,
+          model,
+          prompt: promptWithSystem(input.system, input.prompt),
+          configOverrides: config.configOverrides,
+          env: config.env,
+        }),
+      );
+      const summary = summarizeCodexExecEvents(collected.events, { startedAt });
+      input.onMetrics?.(metrics(summary, startedAt));
+      return assertSuccessfulText(summary, collected.streamError);
+    } finally {
+      await mcp?.close();
+    }
+  }
+
+  async generateObject<TOutput, TSchema extends z.ZodType<TOutput>>(
+    input: KtxGenerateObjectInput<TOutput, TSchema>,
+  ): Promise<TOutput> {
+    const startedAt = Date.now();
+    const model = modelForRole(this.deps.modelSlots, input.role);
+    const mcp = await mcpForTools({
+      projectDir: this.deps.projectDir,
+      toolSet: input.tools,
+      startMcpServer: this.deps.startMcpServer,
+    });
+    try {
+      const config = buildCodexRuntimeConfig({
+        model,
+        ...(mcp
+          ? {
+              mcp: {
+                url: mcp.url,
+                bearerTokenEnvVar: mcp.bearerTokenEnvVar,
+                bearerToken: mcp.bearerToken,
+                toolNames: runtimeToolNames(input.tools),
+              },
+            }
+          : {}),
+      });
+      const collected = await collectEvents(
+        await this.runner.runStreamed({
+          projectDir: this.deps.projectDir,
+          model,
+          prompt: promptWithSystem(input.system, input.prompt),
+          configOverrides: config.configOverrides,
+          env: config.env,
+          outputSchema: z.toJSONSchema(input.schema, { target: 'draft-7' }) as Record<string, unknown>,
+        }),
+      );
+      const summary = summarizeCodexExecEvents(collected.events, { startedAt });
+      input.onMetrics?.(metrics(summary, startedAt));
+      return parseStructuredOutput(input.schema, assertSuccessfulText(summary, collected.streamError));
+    } finally {
+      await mcp?.close();
+    }
+  }
+
+  async runAgentLoop(params: RunLoopParams): Promise<RunLoopResult> {
+    const startedAt = Date.now();
+    const model = modelForRole(this.deps.modelSlots, params.modelRole);
+    let mcp: CodexRuntimeMcpServerHandle | undefined;
+    try {
+      mcp = await mcpForTools({
+        projectDir: this.deps.projectDir,
+        toolSet: params.toolSet,
+        startMcpServer: this.deps.startMcpServer,
+      });
+      const config = buildCodexRuntimeConfig({
+        model,
+        ...(mcp
+          ? {
+              mcp: {
+                url: mcp.url,
+                bearerTokenEnvVar: mcp.bearerTokenEnvVar,
+                bearerToken: mcp.bearerToken,
+                toolNames: runtimeToolNames(params.toolSet),
+              },
+            }
+          : {}),
+      });
+      const abortController = new AbortController();
+      const onStep = async (stepIndex: number): Promise<void> => {
+        try {
+          await params.onStepFinish?.({ stepIndex, stepBudget: params.stepBudget });
+        } catch (error) {
+          this.logger.warn(
+            `[codex-runner] onStepFinish callback threw; ignoring: ${error instanceof Error ? error.message : String(error)}`,
+          );
+        }
+      };
+      const collected = await collectEvents(
+        await this.runner.runStreamed({
+          projectDir: this.deps.projectDir,
+          model,
+          prompt: promptWithSystem(params.systemPrompt, params.userPrompt),
+          configOverrides: config.configOverrides,
+          env: config.env,
+          signal: abortController.signal,
+        }),
+        { stepBudget: params.stepBudget, abortController, onStep },
+      );
+      const summary = summarizeCodexExecEvents(collected.events, { startedAt });
+      const error = summaryError(summary, collected.streamError);
+      const stopReason = collected.budgetExceeded ? 'budget' : error ? 'error' : summary.stopReason;
+      return {
+        stopReason,
+        ...(stopReason === 'error' && error ? { error } : {}),
+        metrics: {
+          totalMs: Date.now() - startedAt,
+          usage: summary.usage,
+          stepCount: summary.stepCount,
+          stepBoundariesMs: summary.stepBoundariesMs,
+        },
+      };
+    } catch (error) {
+      const err = error instanceof Error ? error : new Error(String(error));
+      return {
+        stopReason: 'error',
+        error: err,
+        metrics: { totalMs: Date.now() - startedAt, usage: {}, stepCount: 0, stepBoundariesMs: [] },
+      };
+    } finally {
+      await mcp?.close();
+    }
+  }
+}
+
+// A rejected model is not an auth failure: Codex authenticated, connected, and
+// the API refused the model id. These markers come from the API error envelope
+// (e.g. "model is not supported", "invalid_request_error").
+const MODEL_UNAVAILABLE_MARKERS =
+  /\bnot supported\b|\bnot available\b|\bdoes not exist\b|invalid_request_error|\bunknown model\b|\bunsupported model\b/i;
+
+function describeCodexProbeFailure(model: string, message: string): { message: string; fix: string } {
+  if (MODEL_UNAVAILABLE_MARKERS.test(message)) {
+    const fix = `Run \`codex\` to see the models your account supports, then set llm.models.default in ktx.yaml (or rerun \`ktx setup\`).`;
+    return {
+      message: `Codex is authenticated, but the configured model "${model}" is not available for this Codex account. ${fix} Details: ${message}`,
+      fix,
+    };
+  }
+  const fix = `Authenticate Codex locally with the Codex CLI, verify the Codex CLI is installed, then rerun setup or \`ktx status\`.`;
+  return {
+    message: `Codex authentication is not usable. ${fix} Details: ${message}`,
+    fix,
+  };
+}
+
+export async function runCodexAuthProbe(input: {
+  projectDir: string;
+  model: string;
+  runner?: CodexSdkRunner;
+}): Promise<{ ok: true } | { ok: false; message: string; fix: string }> {
+  let model: string;
+  try {
+    model = resolveCodexModel(input.model);
+  } catch (error) {
+    return {
+      ok: false,
+      message: error instanceof Error ? error.message : String(error),
+      fix: 'Set llm.models.default in ktx.yaml to a supported codex model (codex, default, or a gpt-* / codex-* id), or rerun `ktx setup`.',
+    };
+  }
+
+  const runtime = new CodexKtxLlmRuntime({
+    projectDir: input.projectDir,
+    modelSlots: { default: model },
+    ...(input.runner ? { runner: input.runner } : {}),
+  });
+  try {
+    await runtime.generateText({ role: 'default', prompt: 'Reply with exactly: ok' });
+    return { ok: true };
+  } catch (error) {
+    const message = error instanceof Error ? error.message : String(error);
+    return { ok: false, ...describeCodexProbeFailure(model, message) };
+  }
+}