feat: add codex llm backend for ktx runtime work (#253)

* feat: add codex sdk runner foundation

* feat: parse codex runtime events

* feat: expose codex runtime mcp tools

* feat: add codex llm runtime

* feat: wire codex llm backend

* test: avoid Array.fromAsync in codex runner test

* docs: document codex llm backend

* fix: tighten codex runtime config ownership

* fix: use codex sdk env and thread options

* fix: parse codex sdk event shapes

* test: add codex backend live smoke

* docs: clarify codex backend isolation

* fix: drive codex loop metrics from mcp events

* fix: enforce codex local step budget

* docs: disclose codex isolation limits

* fix: count all codex agent steps and stream step callbacks live

The agent-loop step budget only counted completed mcp_tool_call items, so
built-in command_execution steps (which the public Codex SDK/CLI surface can
still expose) never decremented the budget, letting ingest/reconciliation run
past stepBudget until Codex stopped on its own. onStepFinish was also replayed
only after the whole stream drained, so live work_unit_step / reconciliation
progress appeared stuck until the Codex process exited.

collectEvents is now the single live step accumulator: it counts every
completed agent-action item via a shared isCompletedAgentStep predicate
(command_execution, mcp_tool_call, file_change, web_search), fires onStepFinish
as each step completes, and enforces the budget on that broader count. A
no-tool turn still counts as one step. toolFailures stays MCP-specific, since a
non-zero command exit is normal agent exploration, not a loop failure.

* test: align ingest llm-guard assertions with codex backend

The skip-llm ingest guard message now lists codex as a valid backend and
mentions a Claude Code/Codex session plus a codex setup hint, but this slow
suite test still asserted the pre-codex wording. Update it to match the
production message (already covered by the local-bundle-runtime unit test) and
add the codex setup-line assertion.

* fix: treat codex error:null tool calls as success

The Codex SDK serializes error: null on successful mcp_tool_call items, so
the failure check (item.error !== undefined) flagged every successful tool
call as failed with the empty-payload default "Codex turn failed". This
killed every ingest work unit under the codex backend before it could
produce a patch.

Key on status === 'failed' (authoritative, always set) and only treat a
populated error object as a failure. Add a regression test built from a
verbatim real-SDK event capture.

* fix: default codex backend to gpt-5.5 and report real probe errors

The previous default gpt-5.3-codex is an API-key-only model that the OpenAI
API rejects under ChatGPT-account (subscription) auth, so codex status/setup
failed with a misleading "authentication is not usable" message even though
auth was fine.

- Default codex model is now gpt-5.5 (works on both subscription and API-key
  auth); the curated setup picker offers gpt-5.5 / gpt-5.4 / gpt-5.4-mini and
  keeps free-form entry for account-specific ids (e.g. gpt-5.3-codex-spark).
- runCodexAuthProbe now distinguishes "model not available" from an auth
  failure and surfaces the real API error: collectEvents retains stream
  events when the SDK throws on a non-zero exit, and the API error JSON
  envelope is unwrapped to its human-readable message.
- The Codex isolation warning now renders inside the clack setup frame.
- Docs updated to gpt-5.5 with a note that *-codex ids require API-key auth.

* fix: require llm.models.default in status and match codex probe remediation

Status reported a project ready when a non-none LLM backend was configured
without llm.models.default, but the runtime (resolveModelSlots) hard-requires
it, so ingest/scan/memory threw after `ktx status` said the project was usable.
buildLlmStatus now fails for any non-none backend missing models.default and no
longer invents a fallback model for claude-code/codex.

Codex probe failures now carry a category-matched fix: a model-access failure
steers the user at llm.models.default instead of the auth/install remediation.
runCodexAuthProbe returns the fix and status consumes it; the message stays
self-sufficient so setup output is unchanged.

Docs: README now lists the codex backend and local Codex auth; ktx-setup.mdx
states --llm-model only accepts codex/default or gpt-*/codex-* ids.

Repaired four doctor fixtures that configured a backend without models.default
(the now-correctly-blocked config) and added coverage for the new behavior.
This commit is contained in:
Andrey Avtomonov 2026-06-02 13:57:11 +02:00 committed by GitHub
parent 74c6076b72
commit 494618ab14
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
41 changed files with 2544 additions and 30 deletions

View file

@ -51,8 +51,9 @@ prompts.
| Flag | Description |
|------|-------------|
| `--llm-backend <backend>` | LLM backend: `anthropic`, `vertex`, or `claude-code` |
| `--llm-backend <backend>` | LLM backend: `anthropic`, `vertex`, `claude-code`, or `codex` |
| `--llm-backend claude-code` | Use the local Claude Code session for **ktx** LLM calls |
| `--llm-backend codex` | Use local Codex authentication for **ktx** LLM calls |
| `--llm-model <model>` | LLM model ID or backend model alias to validate and save |
| `--anthropic-api-key-env <name>` | Environment variable containing the Anthropic API key |
| `--anthropic-api-key-file <path>` | File containing the Anthropic API key |
@ -62,9 +63,14 @@ prompts.
Choose only one Anthropic credential source. Anthropic credential flags are only
valid with the Anthropic backend; Vertex flags are only valid with the Vertex
backend. The `claude-code` backend uses local Claude Code authentication instead
backend. The `claude-code` and `codex` backends use local authentication instead
of Anthropic API key or Vertex flags. For Claude Code, `--llm-model` accepts
`sonnet`, `opus`, `haiku`, or a full Claude model ID.
`sonnet`, `opus`, `haiku`, or a full Claude model ID. For Codex, `--llm-model`
accepts `codex`, `default`, or a `gpt-*` / `codex-*` model ID such as
`gpt-5.5`; any other value is rejected before the auth probe. Run `codex` to
see the models available to your login, and pick a `gpt-*` / `codex-*` id from
that list. Note that `*-codex` API-billing model IDs (for example
`gpt-5.3-codex`) are not available to ChatGPT-subscription logins.
### Embeddings
@ -191,6 +197,17 @@ ktx setup \
--llm-backend claude-code \
--llm-model opus
# Configure **ktx** to use local Codex authentication for LLM work
ktx setup --llm-backend codex --llm-model gpt-5.5 --no-input
```
When you choose `--llm-backend codex`, setup prints a warning if the public
Codex SDK and CLI surface cannot prove full Claude-Code-style isolation. The
backend restricts **ktx** runtime MCP tools to each run, but Codex may still
load user Codex config and built-in command execution or read-only file
capabilities.
```bash
# Script a Postgres connection that reads its URL from the environment
ktx setup \
--project-dir ./analytics \

View file

@ -21,7 +21,7 @@ ktx status [options]
| `--json` | Print JSON output | `false` |
| `-v`, `--verbose` | Show every check, including passing ones | `false` |
| `--validate` | Only validate the `ktx.yaml` schema; skip readiness checks | `false` |
| `--fast` | Skip checks that require external communication (query-history readiness probes and Claude Code auth probe) | `false` |
| `--fast` | Skip checks that require external communication (query-history readiness probes, Claude Code auth probe, and Codex auth probe) | `false` |
| `--no-input` | Disable interactive terminal input | - |
## Examples
@ -39,7 +39,7 @@ ktx status --verbose
# Validate ktx.yaml without running readiness checks
ktx status --validate
# Skip slow probes (query-history readiness, Claude Code auth)
# Skip slow probes (query-history readiness, Claude Code auth, Codex auth)
ktx status --fast
# Check a project from another directory
@ -57,6 +57,16 @@ flow, then rerun `ktx status`. Use `--fast` to skip this probe (useful in CI
or offline contexts); skipped checks render as `-` and carry
`"status": "skipped"` in JSON output.
For `llm.provider.backend: codex`, `ktx status` runs a minimal non-interactive
Codex request. If the probe fails, authenticate Codex locally with the Codex CLI
and verify the Codex CLI installation.
When `llm.provider.backend: codex` is configured, `ktx status` also prints a
warning when the installed public Codex SDK and CLI surface cannot prove full
Claude-Code-style isolation. The warning does not block authenticated Codex
usage, but it marks the project status as partial so you can make an explicit
runtime-isolation decision.
A `Local data` section summarises what the project has accumulated locally:
ingest run counts, last completed timestamp per connection, knowledge page
counts by scope, semantic-layer source and dictionary value counts, and the

View file

@ -376,13 +376,23 @@ llm:
| Field | Type | Default | Purpose |
|-------|------|---------|---------|
| `provider.backend` | `none` \| `anthropic` \| `vertex` \| `gateway` \| `claude-code` | `none` | Selected backend. `none` disables LLM features. `claude-code` uses the local Claude Code session and needs no API key. |
| `provider.backend` | `none` \| `anthropic` \| `vertex` \| `gateway` \| `claude-code` \| `codex` | `none` | Selected backend. `none` disables LLM features. `claude-code` uses the local Claude Code session and needs no API key. `codex` uses local Codex authentication and needs no API key. |
| `provider.anthropic.api_key` | `string` | - | Anthropic API key. Required when `backend: anthropic`. Accepts `env:` or `file:` references. |
| `provider.anthropic.base_url` | `string` | - | Override the Anthropic API base URL (proxy, self-hosted gateway). |
| `provider.gateway.api_key` / `base_url` | `string` | - | Credentials for an AI Gateway provider. Required when `backend: gateway`. |
| `provider.vertex.project` | `string` | - | Google Cloud project ID hosting the Vertex AI endpoint. |
| `provider.vertex.location` | `string` | - | Vertex AI region (for example `us-east5`). Required when the `vertex` block is present. |
Use `codex` when local Codex authentication should power **ktx** LLM work:
```yaml
llm:
provider:
backend: codex
models:
default: gpt-5.5
```
### Model roles
`models` overrides the per-role model. Keys are fixed; values are

View file

@ -39,8 +39,20 @@ ktx ingest --all
Enriched ingest needs a configured model and embeddings. Run `ktx setup` first;
connections without that configuration fail before any work starts.
With `claude-code`, **ktx** agent loops can invoke only the **ktx** MCP tools for the
current run.
Local-auth backends keep provider credentials out of `ktx.yaml`:
```bash
ktx setup --llm-backend claude-code --no-input
ktx setup --llm-backend codex --llm-model gpt-5.5 --no-input
```
With `claude-code`, **ktx** agent loops can invoke only the **ktx** MCP tools
for the current run. With `codex`, **ktx** restricts the temporary runtime MCP
server to the current run's tool set, disables Codex web search, requests a
read-only sandbox, and sets `approval_policy=never`. The public Codex SDK and
CLI surface may still load user Codex config and built-in command execution or
read-only file capabilities, so use `claude-code` for stricter runtime tool
isolation.
## Query history

View file

@ -16,6 +16,7 @@ Set `llm.provider.backend` to one of these values:
- `gateway`: Use AI Gateway-compatible Anthropic model ids.
- `claude-code`: Use your local Claude Code session through the Claude Agent
SDK. **ktx** strips provider-routing environment variables from child processes.
- `codex`: Use your local Codex authentication through the Codex SDK.
## Claude Code
@ -47,6 +48,42 @@ model IDs are also accepted.
metadata may still list host slash commands, skills, and subagents; **ktx** does not
grant execution access to them.
## Codex backend
Use `codex` when you want **ktx** to run LLM-backed workflows through your
local Codex authentication instead of a direct provider API key.
```yaml
llm:
provider:
backend: codex
models:
default: gpt-5.5
```
Configure it non-interactively:
```bash
ktx setup --llm-backend codex --llm-model gpt-5.5 --no-input
```
This is separate from Codex agent-client setup. `ktx setup --agents --target
codex` installs instructions and MCP access for an end-user Codex session.
`ktx setup --llm-backend codex` makes **ktx** itself execute ingest, scan
enrichment, memory, and other LLM-backed work through Codex.
During runtime loops, **ktx** starts a temporary loopback MCP server for the
current run, exposes only the tools passed to that run, asks Codex to use a
read-only sandbox, sets `approval_policy=never`, auto-approves only those
run-scoped MCP tools, and disables Codex web search.
Codex backend isolation is currently limited by the public Codex SDK and CLI
surface. Codex may still load user Codex config and built-in command execution
or read-only file capabilities. Use `llm.provider.backend: claude-code` when
you need stricter Claude-Code-style runtime tool isolation, or remove host
Codex MCP and tool config before running untrusted prompts through the `codex`
backend.
## Prompt caching
`llm.promptCaching` has partial parity on `claude-code`. Status and doctor warn