feat: add claude-code llm backend with runtime port (#115)

* docs: revise claude-code ingest backend spec
* docs: keep claude-code spec focused on ingest
* docs: expand claude-code spec to full llm parity
* Refine claude-code backend spec after adversarial review iteration 1
* Refine claude-code backend spec after adversarial review iteration 2
* Refine claude-code backend spec after adversarial review iteration 3
* feat: recognize claude-code llm backend
* feat: add ktx llm runtime port
* feat: add claude-code llm runtime
* feat: route non-agent llm calls through runtime
* feat: run ingest agents through llm runtime
* feat: support claude-code setup and status
* test: verify claude-code backend runtime
* docs: add claude-code backend v1 runtime plan
* fix: close claude-code runtime isolation checks
* fix: warn on claude-code prompt caching during setup
* chore: verify claude-code v1 closure
* docs: add claude-code backend v1 isolation closure plan
* fix: update claude-code ingest setup guidance
* docs: add claude-code backend v1 ingest guidance closure plan
* docs: align claude-code isolation spec with sdk metadata
* test: cover claude-code host discovery metadata
* fix: tolerate claude-code host discovery metadata
* docs: clarify claude-code host discovery metadata
* docs: add claude-code auth-probe isolation fix plan
* chore: prepare kaelio ktx rc1 release
* chore: add semantic release workflow
* fix: unblock ci checks
* chore(release): 0.1.0-rc.1
* feat: add Claude Code model selection to setup
* fix: keep git maintenance attached in local repos
2026-06-22 08:38:08 +02:00 · 2026-05-16 12:06:34 +02:00 · 2026-05-16 12:06:34 +02:00 · b565e44a22
commit b565e44a22
parent e6d578c03f
109 changed files with 10218 additions and 1093 deletions
--- a/docs/release.md
+++ b/docs/release.md
@@ -0,0 +1,99 @@
+# KTX release runbook
+
+This runbook covers the maintainer workflow for publishing `@kaelio/ktx` to
+npm through GitHub Actions. The workflow uses semantic-release to choose the
+next version, update release metadata, publish the package, create the GitHub
+release, and commit the release files back to the repository.
+
+## Release channels
+
+KTX has two npm release channels:
+
+- `rc` publishes prereleases such as `0.1.0-rc.2` to the npm `next` tag.
+- `stable` publishes normal releases such as `0.1.0` to the npm `latest` tag.
+
+Run stable releases only from `main`. The workflow rejects stable releases from
+other branches.
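
The channel rules in the runbook text above can be sketched as a small pure function. This is an illustrative sketch only: `ReleaseKind`, `ChannelPlan`, and `planRelease` are hypothetical names, and the real enforcement lives in the GitHub Actions workflow, which is not shown in this commit excerpt.

```typescript
// Hypothetical model of the two npm release channels described in the runbook.
type ReleaseKind = "rc" | "stable";

interface ChannelPlan {
  distTag: "next" | "latest"; // npm dist-tag the package is published under
  prerelease: boolean;        // true for versions such as 0.1.0-rc.2
}

// Resolves the channel for a release request and enforces the main-only rule
// for stable releases. Throws rather than returning a partial plan.
function planRelease(kind: ReleaseKind, branch: string): ChannelPlan {
  if (kind === "stable" && branch !== "main") {
    throw new Error(`stable releases must run from main, got "${branch}"`);
  }
  return kind === "rc"
    ? { distTag: "next", prerelease: true }
    : { distTag: "latest", prerelease: false };
}
```

Under these assumptions, `planRelease("rc", "feature/x")` resolves to the `next` tag, while `planRelease("stable", "feature/x")` throws before anything is published.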
+
+## Prerequisites
+
+Before you publish, confirm these requirements:
+
+- The repository has an Actions secret named `NPM_TOKEN`.
+- `NPM_TOKEN` is a granular npm token that can publish `@kaelio/ktx`.
+- The token can publish non-interactively if the npm account or package uses
+  two-factor authentication for writes.
+- The repository has a baseline semantic-release tag for the latest published
+  package version, such as `v0.1.0-rc.1`.
+
+If no baseline tag exists, semantic-release treats the run as the first release
+and may choose a version that doesn't match the currently published package.
+
+## Dry-run a release
+
+Use a dry-run to verify the next version and generated release notes without
+publishing to npm.
+
+1. Open **Actions** in GitHub.
+2. Select **KTX Release**.
+3. Select the branch to release from.
+4. Set **release_kind** to `rc` or `stable`.
+5. Leave **publish_live** set to `false`.
+6. Optional: Set **force_release** to `true` when you need a patch release even
+   if semantic-release doesn't find a releasable commit.
+7. Run the workflow.
+
+The dry-run uses the same semantic-release configuration as a live release. It
+doesn't publish to npm and doesn't commit release files.
+
+## Publish an rc release
+
+Publish an rc release when you need a prerelease package for validation before
+promoting to `latest`.
+
+1. Open **Actions** in GitHub.
+2. Select **KTX Release**.
+3. Select the branch to release from.
+4. Set **release_kind** to `rc`.
+5. Set **publish_live** to `true`.
+6. Optional: Set **force_release** to `true`.
+7. Run the workflow.
+
+The workflow publishes `@kaelio/ktx` with `--access public --tag next`, runs the
+published package smoke test, creates a GitHub release, and commits
+`CHANGELOG.md`, `package.json`, and `release-policy.json`.
+
+## Publish a stable release
+
+Publish a stable release from `main` after you have validated an rc package.
+
+1. Open **Actions** in GitHub.
+2. Select **KTX Release**.
+3. Select `main`.
+4. Set **release_kind** to `stable`.
+5. Set **publish_live** to `true`.
+6. Optional: Set **force_release** to `true`.
+7. Run the workflow.
+
+The workflow publishes `@kaelio/ktx` with `--access public --tag latest`, runs
+the published package smoke test, creates a GitHub release, and commits the
+release metadata.
+
+## Release metadata
+
+semantic-release calls `scripts/update-public-release-version.mjs` during the
+prepare step. That script updates:
+
+- `package.json` with the semantic-release version.
+- `release-policy.json` with `publicNpmPackageVersion`, npm publish settings,
+  and the published package smoke-test version.
+
+The artifact packaging and readiness scripts read `publicNpmPackageVersion`
+from `release-policy.json`, so manual version edits in build scripts aren't
+needed for rc releases.
+
+## Trusted Publishing follow-up
+
+This workflow authenticates with `NPM_TOKEN` today. Move to npm Trusted
+Publishing once the final publish command is verified against the package
+manager and workflow filename configured in the npm package settings.
--- a/docs/superpowers/plans/2026-05-15-claude-code-auth-probe-isolation-fix.md
+++ b/docs/superpowers/plans/2026-05-15-claude-code-auth-probe-isolation-fix.md
@@ -0,0 +1,678 @@
+# Claude Code Auth Probe Isolation Fix Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Make the `claude-code` auth probe and runtime tolerate host-discovered
+Claude Code init metadata while preserving KTX-owned tool, MCP, and plugin
+restrictions.
+
+**Architecture:** Keep the existing Claude Code runtime and SDK option tuple.
+Change the init-message assertion from "no host discovery appears" to "only the
+KTX-controlled execution surface is active." Align the design spec and user docs
+with the pinned SDK behavior: `settingSources: []` disables filesystem settings,
+`skills: []` is a context filter, and deny-by-default `canUseTool` is the
+runtime enforcement boundary.
+
+**Tech Stack:** TypeScript, pnpm, Vitest, Markdown, Fumadocs MDX,
+`@anthropic-ai/claude-agent-sdk@0.3.142`.
+
+---
+
+## Audit result
+
+The current strict isolation assertion is a v1-blocking bug. A real authenticated
+Claude Code host can report non-empty `slash_commands`, `skills`, and `agents`
+in the SDK init message even when KTX passes `settingSources: []`, `skills: []`,
+`plugins: []`, `tools: []`, exact KTX MCP `allowedTools`, `disallowedTools`, and
+deny-by-default `canUseTool`.
+
+Spec findings:
+
+- `docs/superpowers/specs/2026-05-15-claude-code-backend-design.md:45-47`
+  requires host-discovered capabilities not to expand the KTX agent-loop tool
+  surface. That requirement is about invocation, not necessarily about zero
+  diagnostic metadata in the init message.
+- `docs/superpowers/specs/2026-05-15-claude-code-backend-design.md:254-265`
+  overreaches by asking the implementation to assert that unexpected
+  settings-derived commands, skills, agents, plugins, or MCP servers are
+  inactive from the SDK init message. In `@anthropic-ai/claude-agent-sdk@0.3.142`,
+  the available SDK controls cannot make `message.slash_commands`,
+  `message.skills`, or `message.agents` reliably empty on an authenticated host.
+- `docs/superpowers/specs/2026-05-15-claude-code-backend-design.md:266-267`
+  says skills are disabled with `skills: []`. The pinned SDK type definitions
+  document `skills` as a context filter, not a sandbox.
+- `docs/superpowers/specs/2026-05-15-claude-code-backend-design.md:543-545`
+  correctly requires the auth probe to pass the isolation option tuple and no
+  MCP servers. It does not require failing when host discovery metadata is
+  present.
+
+SDK evidence from
+`node_modules/.pnpm/@anthropic-ai+claude-agent-sdk@0.3.142_zod@4.4.3/node_modules/@anthropic-ai/claude-agent-sdk/sdk.d.ts`:
+
+- Lines `1686-1695`: `settingSources: []` disables filesystem settings only.
+- Lines `1697-1718`: `skills: []` is a context filter; unlisted skills are
+  hidden from listing and rejected by the Skill tool, but files remain on disk.
+- Lines `1202-1213`: `allowedTools` is auto-approval, while `canUseTool` is the
+  permission handler for controlling tool execution.
+- Lines `1224-1228`: `disallowedTools` removes listed tools from context and
+  prevents use.
+- Lines `1255-1264`: `tools: []` disables built-in tools.
+- Lines `1545-1558`: `plugins` loads plugins when supplied; KTX supplies `[]`.
+- Lines `3465-3489`: the init message reports `agents`, `tools`,
+  `mcp_servers`, `slash_commands`, `skills`, and `plugins`.
+
+Implemented plan audit:
+
+- `2026-05-15-claude-code-backend-v1-runtime.md` is implemented for config,
+  runtime port, SDK dependency, model aliases, environment scrubbing, Claude Code
+  text/object/agent execution, setup/status/doctor support, docs, and LLM
+  call-site migration.
+- `2026-05-15-claude-code-backend-v1-isolation-closure.md` is implemented, but
+  it converted the spec's ambiguous "assert inactive" line into an impossible
+  assertion against non-empty `slash_commands`, `skills`, and `agents`.
+- `2026-05-15-claude-code-backend-v1-ingest-guidance-closure.md` is implemented
+  for the ingest missing-LLM guidance and associated CLI/context tests.
+
+Remaining v1-blocking gaps:
+
+- `packages/context/src/llm/claude-code-runtime.ts:94-101` throws on
+  host-discovered slash commands, skills, and agents.
+- `packages/context/src/llm/claude-code-runtime.test.ts:158-178` encodes the
+  wrong behavior by requiring the runtime to reject any init message with
+  discovered agents.
+- The auth probe has no regression coverage for an authenticated host whose init
+  message reports non-empty `slash_commands`, `skills`, and `agents`.
+- User docs under `docs-site/content/docs/guides/` say KTX "disables" skills,
+  agents, hooks, and slash commands. That wording is stronger than the SDK
+  contract and must be changed to "not invokable by KTX agent loops."
+
+Non-blocking gaps:
+
+- Same-step AI SDK tool-call repair parity remains out of scope for v1.
+- OTEL telemetry parity remains out of scope for v1.
+- Embedding parity remains out of scope because embeddings are configured
+  separately.
+- Full prompt-caching parity remains out of scope. V1 keeps warning on ignored
+  prompt-cache fields and avoids AI SDK cache markers on the Claude Code path.
+
+Decision:
+
+- Choose option (a): relax the assertion in code and align the spec text. Do not
+  rely on an invented SDK mechanism. The pinned type definitions expose
+  `settingSources`, `skills`, `plugins`, `tools`, `allowedTools`,
+  `disallowedTools`, and `canUseTool`, but they do not expose a query option that
+  disables all host-discovered slash commands or user-level subagent names in the
+  init message.
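
The decision above leans on deny-by-default `canUseTool` as the enforcement boundary. A minimal sketch of that idea follows. It assumes a simplified result shape modeled on what the plan's tests assert (`behavior` plus `toolUseID`); `ToolDecision`, `decideToolUse`, and `makeCanUseTool` are hypothetical names, and the SDK's real permission types may carry additional fields.

```typescript
// Simplified stand-in for the SDK permission result. Modeled on the shapes the
// plan's tests assert, not copied from the SDK type definitions.
type ToolDecision =
  | { behavior: "allow"; toolUseID: string }
  | { behavior: "deny"; toolUseID: string; message: string };

// Pure decision: allow only ids in the current KTX MCP tool set, deny the rest.
function decideToolUse(
  toolName: string,
  allowedToolIds: ReadonlySet<string>,
  toolUseID: string,
): ToolDecision {
  if (allowedToolIds.has(toolName)) {
    return { behavior: "allow", toolUseID };
  }
  return {
    behavior: "deny",
    toolUseID,
    message: `Tool ${toolName} is not part of the KTX tool surface for this run.`,
  };
}

// Async wrapper in the call shape the plan's tests use:
// canUseTool(name, input, { toolUseID }).
function makeCanUseTool(allowedToolIds: ReadonlySet<string>) {
  return async (
    toolName: string,
    _input: unknown,
    context: { toolUseID: string },
  ): Promise<ToolDecision> => decideToolUse(toolName, allowedToolIds, context.toolUseID);
}
```

Because the handler consults only the allow set, a host-discovered `Skill`, `Agent`, or `SlashCommand` request is denied without the runtime needing to know which names the host reported.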
+
+## File structure
+
+Modify these files:
+
+- `docs/superpowers/specs/2026-05-15-claude-code-backend-design.md` aligns the
+  design with the real SDK contract.
+- `packages/context/src/llm/claude-code-runtime.test.ts` adds the failing
+  regression tests for auth probe and runtime init metadata.
+- `packages/context/src/llm/claude-code-runtime.ts` relaxes init metadata checks
+  while tightening exact tool equality.
+- `docs-site/content/docs/guides/llm-configuration.mdx` changes user docs from
+  "disabled" to "not invokable."
+- `docs-site/content/docs/guides/building-context.mdx` applies the same
+  user-facing wording at the ingest guide boundary.
+
+### Task 1: Align the design spec with SDK reality
+
+**Files:**
+
+- Modify: `docs/superpowers/specs/2026-05-15-claude-code-backend-design.md`
+
+- [ ] **Step 1: Update the tool-boundary goal**
+
+Replace the goal bullet at lines `45-47` with:
+
+```markdown
+- Preserve KTX's curated tool boundaries. Claude Code built-ins,
+  filesystem-discovered MCP servers, hooks, skills, plugins, agents, and slash
+  commands must not become invokable in KTX agent loops. The Agent SDK init
+  message may still report host-discovered slash commands, skills, and agents;
+  KTX treats that metadata as diagnostic only and restricts execution through
+  `tools: []`, exact KTX MCP `allowedTools`, `disallowedTools`, and
+  deny-by-default `canUseTool`.
+```
+
+- [ ] **Step 2: Replace the over-broad init assertion requirement**
+
+Replace the bullet at lines `254-265` with:
+
+```markdown
+- Filesystem settings are not loaded. The SDK's documented default for an
+  omitted `settingSources` is `["user", "project", "local"]`
+  (`@anthropic-ai/claude-agent-sdk@0.3.142` `sdk.d.ts:1686-1695`),
+  which would inherit the user's Claude Code filesystem settings. Every KTX
+  `query()` call site - agent loops, text generation, object generation, and
+  the auth probe - MUST pass `settingSources: []` explicitly, along with
+  `skills: []`, `plugins: []`, `tools: []`, `persistSession: false`, and no
+  `mcpServers` entries other than the KTX MCP server (omitted entirely when
+  the call site does not expose tools). The implementation MUST assert from
+  the SDK init message that the controlled execution surface matches KTX's
+  expectations:
+
+  - `message.tools` equals the exact generated KTX MCP tool ids for the current
+    call.
+  - `message.mcp_servers` equals the expected KTX MCP server set: `[]` when the
+    call exposes no tools, or `["ktx"]` when it does.
+  - `message.plugins` is empty.
+
+  The implementation MUST NOT reject a run solely because
+  `message.slash_commands`, `message.skills`, or `message.agents` contain
+  host-discovered names. In `@anthropic-ai/claude-agent-sdk@0.3.142`, those
+  fields can report host discovery even when KTX passes the isolation options.
+  They are not part of the KTX execution surface when `tools: []`,
+  `allowedTools`, `disallowedTools`, and deny-by-default `canUseTool` are set.
+```
+
+- [ ] **Step 3: Replace the skills/plugin wording**
+
+Replace the bullets at lines `266-289` with:
+
+```markdown
+- `skills: []` is a context filter in the pinned SDK
+  (`sdk.d.ts:1697-1718`): unlisted skills are hidden from the model's skill
+  listing and rejected by the Skill tool, but discovered skill names may still
+  appear in init metadata. KTX must still pass `skills: []`.
+- Plugins are disabled with `plugins: []`, and the runtime asserts that
+  `message.plugins` is empty in the init message.
+- Built-in tools are disabled by setting `tools: []`. The pinned SDK type
+  (`@anthropic-ai/claude-agent-sdk@0.3.142`, `sdk.d.ts:1255-1264`) documents
+  `tools` as the base set of built-in tools, with `[]` meaning "disable all
+  built-ins"; `tools` does not accept MCP tool ids and cannot be used to
+  restrict MCP availability.
+- MCP tool availability is granted by registering the KTX MCP server through
+  `mcpServers`. The SDK does not document a wildcard like `mcp__ktx__*` for
+  any tool field; KTX must enumerate exact generated MCP tool ids of the form
+  `mcp__ktx__<toolName>` (derived from the tool map handed to
+  `createSdkMcpServer`) wherever a list of tool ids is required.
+- Pre-approval under `permissionMode: "dontAsk"` is configured by listing those
+  same exact `mcp__ktx__<toolName>` ids in `allowedTools` (documented as
+  auto-allow without prompting). Treat `allowedTools` as auto-approval, not
+  restriction.
+- Defense-in-depth restriction uses `canUseTool`. The KTX runtime supplies a
+  `canUseTool` handler that allows only tool names in the current KTX MCP tool
+  map and denies everything else, so host-discovered slash commands, skills,
+  agents, future SDK defaults, or a misconfigured MCP server cannot expand the
+  execution surface.
+- `disallowedTools` MUST additionally list the current built-in tool names
+  (`Agent`, `Task`, `AskUserQuestion`, `Bash`, `Read`, `Edit`, `Write`, `Glob`,
+  `Grep`, `WebFetch`, `WebSearch`, `TodoWrite`) as redundant insurance.
+```
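
The spec text above requires exact `mcp__ktx__<toolName>` ids rather than a wildcard. One way to enumerate them from the keys of a tool map is sketched below. `toMcpToolIds` and the `search_docs` entry are hypothetical and used only for illustration; `load_skill` is the tool name that appears in the plan's tests.

```typescript
// Derives exact MCP tool ids of the form mcp__<server>__<toolName> from the
// keys of a tool map. No wildcard is produced; every id is enumerated.
function toMcpToolIds(serverName: string, toolMap: Record<string, unknown>): string[] {
  return Object.keys(toolMap).map((toolName) => `mcp__${serverName}__${toolName}`);
}

// Example tool map. Only the keys matter for id derivation.
const toolMap = { load_skill: {}, search_docs: {} };

// The same list would feed both `allowedTools` and the deny-by-default check.
const allowedTools = toMcpToolIds("ktx", toolMap);
```

Deriving the list from the tool map keeps `allowedTools` and the permission handler in sync: adding or removing a tool changes both without a separate hand-maintained list.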
+
+- [ ] **Step 4: Update auth probe acceptance text**
+
+After the auth probe option list at lines `543-545`, add:
+
+```markdown
+  The auth probe MUST tolerate init messages with non-empty
+  `slash_commands`, `skills`, and `agents` when `message.tools` is empty,
+  `message.mcp_servers` is empty, `message.plugins` is empty, and the query
+  options contain the KTX isolation tuple. Host discovery metadata is not an
+  auth failure.
+```
+
+- [ ] **Step 5: Update verified evidence and open items**
+
+Replace lines `621-623` with:
+
+```markdown
+- The Agent SDK skills docs say the `skills` option is a context filter rather
+  than a sandbox. KTX must pass `skills: []`, but must not assert that
+  `message.skills` is empty in the SDK init message.
+```
+
+Replace open item `8` at lines `648-649` with:
+
+```markdown
+8. Write tests proving a raw built-in Claude Code tool request is denied,
+   host-discovered Skill/Agent/SlashCommand requests are denied by `canUseTool`,
+   and only exact `mcp__ktx__*` tools are allowed during KTX agent loops.
+```
+
+Replace open item `9` at lines `650-654` with:
+
+```markdown
+9. Write a test that asserts every KTX-originated `query()` invocation
+   (agent loop, text generation, object generation, auth probe) is called
+   with `settingSources: []`, `skills: []`, `plugins: []`, `tools: []`, and
+   `persistSession: false`, by spying on the SDK entry point. The test must
+   fail if any path falls back to SDK defaults for those fields. The test must
+   also prove that non-empty host-discovered `slash_commands`, `skills`, and
+   `agents` in the init message do not fail the auth probe or runtime when the
+   controlled tool, MCP server, and plugin surfaces match KTX expectations.
+```
+
+- [ ] **Step 6: Commit the spec alignment**
+
+Run:
+
+```bash
+git add docs/superpowers/specs/2026-05-15-claude-code-backend-design.md
+git commit -m "docs: align claude-code isolation spec with sdk metadata"
+```
+
+Expected: the design spec no longer requires zero host-discovery metadata in
+the SDK init message.
+
+### Task 2: Add regression tests for host-discovered init metadata
+
+**Files:**
+
+- Modify: `packages/context/src/llm/claude-code-runtime.test.ts`
+
+- [ ] **Step 1: Replace the invalid agent rejection test**
+
+In `packages/context/src/llm/claude-code-runtime.test.ts`, replace the test named
+`rejects settings-derived agents and non-KTX MCP servers from init messages`
+with these tests:
+
+```ts
+  it('treats host-discovered commands skills and agents as non-fatal init metadata for text and auth probe', async () => {
+    const hostDiscoveredInit = initMessage({
+      slash_commands: ['/help', '/compact', '/clear', '/user-command'],
+      skills: ['pdf', 'docx'],
+      agents: ['claude', 'Explore', 'general-purpose'],
+    });
+    const textQuery = vi.fn((_input: any) =>
+      stream([hostDiscoveredInit, resultMessage({ result: 'hello' })]),
+    );
+    const runtime = new ClaudeCodeKtxLlmRuntime({
+      projectDir: '/tmp/project',
+      modelSlots: { default: 'sonnet' },
+      query: textQuery,
+      env: { ANTHROPIC_API_KEY: 'sk-ant-test', PATH: '/usr/bin' }, // pragma: allowlist secret
+    });
+
+    await expect(runtime.generateText({ role: 'default', prompt: 'say hello' })).resolves.toBe('hello');
+    const textOptions = textQuery.mock.calls[0][0].options;
+    expect(textOptions).toMatchObject({
+      settingSources: [],
+      skills: [],
+      plugins: [],
+      tools: [],
+      allowedTools: [],
+      permissionMode: 'dontAsk',
+      persistSession: false,
+      env: expect.not.objectContaining({ ANTHROPIC_API_KEY: 'sk-ant-test' }),
+    });
+    expect(textOptions.disallowedTools).toEqual(expect.arrayContaining(['Agent', 'Task', 'Bash']));
+    expect(await textOptions.canUseTool('Agent', {}, { signal: new AbortController().signal, toolUseID: 'agent' })).toMatchObject({
+      behavior: 'deny',
+      toolUseID: 'agent',
+    });
+    expect(await textOptions.canUseTool('Skill', {}, { signal: new AbortController().signal, toolUseID: 'skill' })).toMatchObject({
+      behavior: 'deny',
+      toolUseID: 'skill',
+    });
+    expect(
+      await textOptions.canUseTool('SlashCommand', {}, { signal: new AbortController().signal, toolUseID: 'slash' }),
+    ).toMatchObject({
+      behavior: 'deny',
+      toolUseID: 'slash',
+    });
+
+    const probeQuery = vi.fn((_input: any) =>
+      stream([hostDiscoveredInit, resultMessage({ result: 'ok' })]),
+    );
+    await expect(
+      runClaudeCodeAuthProbe({
+        projectDir: '/tmp/project',
+        model: 'sonnet',
+        query: probeQuery,
+        env: { ANTHROPIC_AUTH_TOKEN: 'token', HOME: '/Users/test' },
+      }),
+    ).resolves.toEqual({ ok: true });
+    expect(probeQuery.mock.calls[0][0].options).toMatchObject({
+      settingSources: [],
+      skills: [],
+      plugins: [],
+      tools: [],
+      allowedTools: [],
+      permissionMode: 'dontAsk',
+      persistSession: false,
+      env: expect.objectContaining({ HOME: '/Users/test' }),
+    });
+    expect(probeQuery.mock.calls[0][0].options.env).not.toEqual(
+      expect.objectContaining({ ANTHROPIC_AUTH_TOKEN: 'token' }),
+    );
+  });
+
+  it('allows host-discovered context during agent loops while requiring exact KTX MCP tools and servers', async () => {
+    const query = vi.fn((_input: any) =>
+      stream([
+        initMessage({
+          tools: ['mcp__ktx__load_skill'],
+          mcp_servers: [{ name: 'ktx', status: 'connected' }],
+          slash_commands: ['/help', '/compact', '/clear'],
+          skills: ['memory-agent', 'doc-reader'],
+          agents: ['claude', 'Plan', 'Explore'],
+        }),
+        {
+          type: 'assistant',
+          message: { role: 'assistant', content: [] },
+          parent_tool_use_id: null,
+          uuid: '00000000-0000-4000-8000-000000000006',
+          session_id: 'session-id',
+        } as unknown as SDKMessage,
+        resultMessage({ subtype: 'error_max_turns', is_error: true }),
+      ]),
+    );
+    const runtime = new ClaudeCodeKtxLlmRuntime({
+      projectDir: '/tmp/project',
+      modelSlots: { default: 'sonnet' },
+      query,
+      env: {},
+    });
+
+    await expect(
+      runtime.runAgentLoop({
+        modelRole: 'default',
+        systemPrompt: 'system',
+        userPrompt: 'user',
+        toolSet: {
+          load_skill: {
+            name: 'load_skill',
+            description: 'Load skill.',
+            inputSchema: z.object({ name: z.string() }),
+            execute: async () => ({ markdown: 'loaded' }),
+          },
+        },
+        stepBudget: 1,
+        telemetryTags: { operationName: 'test' },
+      }),
+    ).resolves.toEqual({ stopReason: 'budget' });
+
+    const options = query.mock.calls[0][0].options;
+    expect(options.allowedTools).toEqual(['mcp__ktx__load_skill']);
+    expect(await options.canUseTool('mcp__ktx__load_skill', {}, { signal: new AbortController().signal, toolUseID: '1' })).toEqual({
+      behavior: 'allow',
+      toolUseID: '1',
+    });
+    expect(await options.canUseTool('Task', {}, { signal: new AbortController().signal, toolUseID: '2' })).toMatchObject({
+      behavior: 'deny',
+      toolUseID: '2',
+    });
+    expect(await options.canUseTool('Skill', {}, { signal: new AbortController().signal, toolUseID: '3' })).toMatchObject({
+      behavior: 'deny',
+      toolUseID: '3',
+    });
+  });
+
+  it('still rejects unexpected tools, missing KTX tools, plugins, and non-KTX MCP servers from init messages', async () => {
+    const query = vi.fn((_input: any) =>
+      stream([
+        initMessage({
+          tools: ['Bash'],
+          mcp_servers: [{ name: 'filesystem', status: 'connected' }],
+          plugins: [{ name: 'host-plugin', path: '/tmp/plugin' }],
+        }),
+        resultMessage({ result: 'hello' }),
+      ]),
+    );
+    const runtime = new ClaudeCodeKtxLlmRuntime({
+      projectDir: '/tmp/project',
+      modelSlots: { default: 'sonnet' },
+      query,
+      env: {},
+    });
+
+    await expect(
+      runtime.generateText({
+        role: 'default',
+        prompt: 'say hello',
+        tools: {
+          load_skill: {
+            name: 'load_skill',
+            description: 'Load skill.',
+            inputSchema: z.object({ name: z.string() }),
+            execute: async () => ({ markdown: 'loaded' }),
+          },
+        },
+      }),
+    ).rejects.toThrow(
+      /Claude Code runtime isolation failed: .*tools=Bash.*missing_tools=mcp__ktx__load_skill.*mcp_servers=filesystem.*plugins=host-plugin/,
+    );
+  });
+```
+
+- [ ] **Step 2: Run the runtime test to verify it fails**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/llm/claude-code-runtime.test.ts
+```
+
+Expected: FAIL. The first new test fails because `runClaudeCodeAuthProbe(...)`
+returns `{ ok: false, ... }` and `generateText(...)` rejects when init metadata
+contains non-empty `slash_commands`, `skills`, or `agents`. The second new test
+fails because `runAgentLoop(...)` returns `{ stopReason: 'error', ... }` for the
+same reason.
+
+- [ ] **Step 3: Commit the failing regression test**
+
+Run:
+
+```bash
+git add packages/context/src/llm/claude-code-runtime.test.ts
+git commit -m "test: cover claude-code host discovery metadata"
+```
+
+Expected: the commit contains tests that fail before the runtime assertion is
+fixed.
+
+### Task 3: Relax init metadata assertions to the controlled execution surface
+
+**Files:**
+
+- Modify: `packages/context/src/llm/claude-code-runtime.ts`
+
+- [ ] **Step 1: Replace `assertInitIsolation`**
+
+In `packages/context/src/llm/claude-code-runtime.ts`, replace the full
+`assertInitIsolation(...)` function with:
+
+```ts
+function assertInitIsolation(
+  message: SDKMessage,
+  allowedToolIds: Set<string>,
+  expectedMcpServerNames: Set<string>,
+): void {
+  if (message.type !== 'system' || message.subtype !== 'init') {
+    return;
+  }
+  const activeToolIds = new Set(message.tools);
+  const unexpectedTools = message.tools.filter((toolName) => !allowedToolIds.has(toolName));
+  const missingTools = [...allowedToolIds].filter((toolName) => !activeToolIds.has(toolName));
+  const activeMcpServerNames = message.mcp_servers.map((server) => server.name);
+  const unexpectedMcpServers = activeMcpServerNames.filter((name) => !expectedMcpServerNames.has(name));
+  const missingMcpServers = [...expectedMcpServerNames].filter((name) => !activeMcpServerNames.includes(name));
+  const unexpectedPlugins = message.plugins.map((plugin) => plugin.name);
+  if (
+    unexpectedTools.length > 0 ||
+    missingTools.length > 0 ||
+    unexpectedMcpServers.length > 0 ||
+    missingMcpServers.length > 0 ||
+    unexpectedPlugins.length > 0
+  ) {
+    throw new Error(
+      `Claude Code runtime isolation failed: tools=${unexpectedTools.join(',') || '(none)'} missing_tools=${
+        missingTools.join(',') || '(none)'
+      } mcp_servers=${unexpectedMcpServers.join(',') || '(none)'} missing_mcp_servers=${
+        missingMcpServers.join(',') || '(none)'
+      } plugins=${unexpectedPlugins.join(',') || '(none)'} host_slash_commands=${
+        message.slash_commands.length
+      } host_skills=${message.skills.length} host_agents=${message.agents?.join(',') || '(none)'}`,
+    );
+  }
+}
+```
+
+This preserves strict checks for the KTX-controlled execution surface:
+
+- `message.tools` must exactly equal the generated KTX MCP tool ids for the
+  current call.
+- `message.mcp_servers` must exactly equal the expected KTX MCP server names.
+- `message.plugins` must be empty.
+
+It deliberately stops treating `message.slash_commands`, `message.skills`, and
+`message.agents` as fatal because those fields can contain host-discovered
+metadata that KTX cannot disable through the pinned SDK options.
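
To see the relaxed rule in isolation, the sketch below applies the same exact-equality idea to two sample init messages. It is a standalone illustration, not the runtime code: `InitSurface` and `findIsolationViolations` are hypothetical names, and the message is reduced to plain string arrays instead of the SDK's `SDKMessage` type.

```typescript
// Minimal stand-in for the init message fields that matter to the check.
interface InitSurface {
  tools: string[];
  mcpServers: string[];
  plugins: string[];
  slashCommands: string[]; // diagnostic only, never inspected below
  skills: string[];        // diagnostic only, never inspected below
  agents: string[];        // diagnostic only, never inspected below
}

// Returns every mismatch between the reported and expected execution surface.
// An empty array means the controlled surface matches exactly.
function findIsolationViolations(
  init: InitSurface,
  expectedTools: ReadonlySet<string>,
  expectedServers: ReadonlySet<string>,
): string[] {
  const violations: string[] = [];
  const compare = (label: string, actual: string[], expected: ReadonlySet<string>): void => {
    const actualSet = new Set<string>(actual);
    actual.forEach((name) => {
      if (!expected.has(name)) violations.push(`unexpected ${label}: ${name}`);
    });
    expected.forEach((name) => {
      if (!actualSet.has(name)) violations.push(`missing ${label}: ${name}`);
    });
  };
  compare("tool", init.tools, expectedTools);
  compare("mcp server", init.mcpServers, expectedServers);
  compare("plugin", init.plugins, new Set<string>()); // plugins must be empty
  return violations;
}

// Host discovery is present, but the controlled surface matches.
const tolerated: InitSurface = {
  tools: ["mcp__ktx__load_skill"],
  mcpServers: ["ktx"],
  plugins: [],
  slashCommands: ["/help", "/compact"],
  skills: ["pdf"],
  agents: ["Explore"],
};

// A built-in tool, a foreign MCP server, and a host plugin leak into the surface.
const rejectedInit: InitSurface = {
  tools: ["Bash"],
  mcpServers: ["filesystem"],
  plugins: ["host-plugin"],
  slashCommands: [],
  skills: [],
  agents: [],
};
```

With `expectedTools = {"mcp__ktx__load_skill"}` and `expectedServers = {"ktx"}`, the first message produces no violations despite its host-discovered names, while the second reports an unexpected and a missing entry for both tools and servers plus the stray plugin.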
+
+- [ ] **Step 2: Run the runtime test to verify it passes**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/llm/claude-code-runtime.test.ts
+```
+
+Expected: PASS.
+
+- [ ] **Step 3: Commit the runtime fix**
+
+Run:
+
+```bash
+git add packages/context/src/llm/claude-code-runtime.ts packages/context/src/llm/claude-code-runtime.test.ts
+git commit -m "fix: tolerate claude-code host discovery metadata"
+```
+
+Expected: the auth probe and runtime no longer fail solely because the SDK init
+message reports host-discovered slash commands, skills, or agents.
+
+### Task 4: Correct user-facing docs wording
+
+**Files:**
+
+- Modify: `docs-site/content/docs/guides/llm-configuration.mdx`
+- Modify: `docs-site/content/docs/guides/building-context.mdx`
+
+- [ ] **Step 1: Update the LLM configuration guide wording**
+
+In `docs-site/content/docs/guides/llm-configuration.mdx`, replace lines `39-41`
+with:
+
+```mdx
+`claude-code` keeps KTX tool boundaries intact. KTX exposes only the MCP tools
+needed for the current KTX agent loop, disables Claude Code built-in tools,
+keeps plugins empty, and denies every non-KTX tool request through
+`canUseTool`. The Claude Agent SDK may still report host-discovered slash
+commands, skills, and subagent names in init metadata; that metadata is not an
+execution grant for KTX agent loops.
+```
+
+- [ ] **Step 2: Update the building context guide wording**
+
+In `docs-site/content/docs/guides/building-context.mdx`, replace lines `61-63`
+with:
+
+```mdx
+When you use `claude-code`, KTX still controls the tool surface for ingest and
+memory capture. Claude Code built-in tools, discovered MCP servers, plugins,
+skills, agents, and slash commands are not invokable by KTX agent loops unless
+they are exact KTX MCP tools for the current run.
+```
+
+- [ ] **Step 3: Run docs tests**
+
+Run:
+
+```bash
+pnpm --filter ktx-docs run test
+```
+
+Expected: PASS.
+
+- [ ] **Step 4: Commit docs wording**
+
+Run:
+
+```bash
+git add docs-site/content/docs/guides/llm-configuration.mdx docs-site/content/docs/guides/building-context.mdx
+git commit -m "docs: clarify claude-code host discovery metadata"
+```
+
+Expected: user docs describe invocation control rather than promising zero
+host-discovery metadata.
+
+### Task 5: Final verification
+
+**Files:**
+
+- Verify: `docs/superpowers/specs/2026-05-15-claude-code-backend-design.md`
+- Verify: `packages/context/src/llm/claude-code-runtime.ts`
+- Verify: `packages/context/src/llm/claude-code-runtime.test.ts`
+- Verify: `docs-site/content/docs/guides/llm-configuration.mdx`
+- Verify: `docs-site/content/docs/guides/building-context.mdx`
+
+- [ ] **Step 1: Run targeted runtime tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/llm/claude-code-runtime.test.ts src/llm/runtime-tools.test.ts src/llm/claude-code-env.test.ts
+```
+
+Expected: PASS.
+
+- [ ] **Step 2: Run package type-check**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context run type-check
+```
+
+Expected: PASS.
+
+- [ ] **Step 3: Run docs verification**
+
+Run:
+
+```bash
+pnpm --filter ktx-docs run test
+```
+
+Expected: PASS.
+
+- [ ] **Step 4: Run dead-code checks**
+
+Run:
+
+```bash
+pnpm run dead-code
+```
+
+Expected: PASS or only pre-existing unrelated findings. Investigate and fix any
+finding caused by the runtime assertion or test changes.
+
+- [ ] **Step 5: Inspect git status**
+
+Run:
+
+```bash
+git status --short
+```
+
+Expected: only files from this plan are modified, or the working tree is clean
+if each task was committed.
+
+## Self-review
+
+- Spec coverage: This plan addresses the v1-blocking auth probe failure,
+  aligns the spec with the SDK contract, preserves the real KTX execution
+  boundary, and adds regression coverage for non-empty host-discovered
+  `slash_commands`, `skills`, and `agents` in both auth probe and runtime paths.
+- Placeholder scan: No placeholder markers remain. Every code-changing step
+  includes exact file paths, code blocks, commands, and expected results.
+- Type consistency: The plan uses existing names from the codebase:
+  `ClaudeCodeKtxLlmRuntime`, `runClaudeCodeAuthProbe`, `initMessage`,
+  `resultMessage`, `assertInitIsolation`, `mcpToolIds`, `KtxRuntimeToolSet`, and
+  `canUseTool`.
--- a/docs/superpowers/plans/2026-05-15-claude-code-backend-v1-ingest-guidance-closure.md
+++ b/docs/superpowers/plans/2026-05-15-claude-code-backend-v1-ingest-guidance-closure.md
@ -0,0 +1,160 @@
+# Claude Code Backend V1 Ingest Guidance Closure Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Make the `ktx ingest` missing-LLM guidance treat `claude-code` as a first-class setup path and restore the CLI ingest test suite.
+
+**Architecture:** Keep the existing Claude Code runtime implementation unchanged. Update the single local-ingest guard message so users see both the local Claude Code setup path and the Anthropic API setup path, then align the context and CLI tests with that user-facing copy.
+
+**Tech Stack:** TypeScript, pnpm, Vitest.
+
+---
+
+## Audit summary
+
+The May 15 Claude Code backend runtime and isolation plans are implemented for
+the core runtime path: config accepts `claude-code`, runtime calls use
+`KtxLlmRuntimePort`, Claude SDK calls pass isolation options and scrubbed env,
+setup/status/doctor validate Claude Code auth, and docs describe the backend.
+
+One v1-blocking issue remains: `packages/context/src/ingest/local-bundle-runtime.ts`
+lists `claude-code` in the missing-LLM guard line but still tells users only to
+"Configure an Anthropic provider." The full CLI ingest test suite currently
+fails because `packages/cli/src/ingest.test.ts` still expects the old provider
+list without `claude-code`. This is v1-blocking because CI is red and the
+fallback guidance is not first-class for the new backend.
+
+Non-blocking gaps from the original spec remain unchanged:
+
+- Same-step AI SDK tool-call repair parity is out of scope for the Claude Code
+  runtime.
+- OTEL telemetry parity is out of scope for the Claude Code runtime.
+- Embedding parity is out of scope because embeddings stay independently
+  configured.
+- Full prompt-caching parity for tools, history, and per-section TTLs is out of
+  scope; v1 only needs no AI SDK cache markers on `claude-code` and explicit
+  warnings for ignored fields.
+
+## File structure
+
+Modify these files:
+
+- `packages/context/src/ingest/local-bundle-runtime.ts` owns the missing-LLM
+  guard message used by local ingest and MCP-triggered ingest.
+- `packages/context/src/ingest/local-bundle-runtime.test.ts` verifies the guard
+  message at the context boundary.
+- `packages/cli/src/ingest.test.ts` verifies the user-facing CLI output.
+
+No `docs-site/` update is required because the existing public docs already
+document `claude-code` setup and ingest behavior; this plan only fixes an
+inline runtime error message.
+
+### Task 1: Update ingest LLM setup guidance
+
+**Files:**
+
+- Modify: `packages/context/src/ingest/local-bundle-runtime.test.ts`
+- Modify: `packages/cli/src/ingest.test.ts`
+- Modify: `packages/context/src/ingest/local-bundle-runtime.ts`
+
+- [ ] **Step 1: Update the context guard-message test**
+
+In `packages/context/src/ingest/local-bundle-runtime.test.ts`, replace the
+expected message in `requires an agent runner or configured local ingest LLM`
+with this exact array:
+
+```ts
+[
+  'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, or claude-code, or an injected agentRunner.',
+  'Configure a local Claude Code session or API-backed LLM, then rerun ingest:',
+  `  ktx setup --project-dir ${project.projectDir} --llm-backend claude-code --no-input`,
+  `  ktx setup --project-dir ${project.projectDir} --llm-backend anthropic --anthropic-api-key-env ANTHROPIC_API_KEY --anthropic-model claude-sonnet-4-6 --no-input`,
+].join('\n')
+```
+
+- [ ] **Step 2: Update the CLI ingest test**
+
+In `packages/cli/src/ingest.test.ts`, replace the stale provider-list
+assertion in `prints provider setup guidance when a skip-llm setup project runs
+ingest` with:
+
+```ts
+expect(runIo.stderr()).toContain(
+  'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, or claude-code, or an injected agentRunner.',
+);
+expect(runIo.stderr()).toContain('Configure a local Claude Code session or API-backed LLM, then rerun ingest:');
+expect(runIo.stderr()).toContain(`ktx setup --project-dir ${projectDir} --llm-backend claude-code --no-input`);
+expect(runIo.stderr()).toContain(
+  `ktx setup --project-dir ${projectDir} --llm-backend anthropic --anthropic-api-key-env ANTHROPIC_API_KEY --anthropic-model claude-sonnet-4-6 --no-input`,
+);
+```
+
+- [ ] **Step 3: Run tests to verify the new expectations fail**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/ingest/local-bundle-runtime.test.ts
+pnpm --filter @ktx/cli exec vitest run src/ingest.test.ts
+```
+
+Expected: both suites fail because the source message still says
+`Configure an Anthropic provider, then rerun ingest:` and does not include the
+Claude Code setup command.
+
+- [ ] **Step 4: Update the ingest guard message**
+
+In `packages/context/src/ingest/local-bundle-runtime.ts`, replace
+`localIngestLlmProviderGuardMessage` with:
+
+```ts
+function localIngestLlmProviderGuardMessage(projectDir: string): string {
+  return [
+    'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, or claude-code, or an injected agentRunner.',
+    'Configure a local Claude Code session or API-backed LLM, then rerun ingest:',
+    `  ktx setup --project-dir ${projectDir} --llm-backend claude-code --no-input`,
+    `  ktx setup --project-dir ${projectDir} --llm-backend anthropic --anthropic-api-key-env ANTHROPIC_API_KEY --anthropic-model claude-sonnet-4-6 --no-input`,
+  ].join('\n');
+}
+```
+
+- [ ] **Step 5: Run the targeted tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/ingest/local-bundle-runtime.test.ts
+pnpm --filter @ktx/cli exec vitest run src/ingest.test.ts
+```
+
+Expected: both suites pass.
+
+- [ ] **Step 6: Run package type-checks**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context run type-check
+pnpm --filter @ktx/cli run type-check
+```
+
+Expected: both commands pass.
+
+- [ ] **Step 7: Commit**
+
+Run:
+
+```bash
+git add packages/context/src/ingest/local-bundle-runtime.ts packages/context/src/ingest/local-bundle-runtime.test.ts packages/cli/src/ingest.test.ts
+git commit -m "fix: update claude-code ingest setup guidance"
+```
+
+## Self-review
+
+- Spec coverage: This plan closes the only remaining v1-blocking audit finding:
+  ingest setup guidance and CLI test expectations now include `claude-code` as
+  a first-class backend.
+- Placeholder scan: No placeholders remain; every step includes exact paths,
+  code, commands, and expected output.
+- Type consistency: The exact guard string is identical across the source and
+  both test updates.
--- a/docs/superpowers/plans/2026-05-15-claude-code-backend-v1-isolation-closure.md
+++ b/docs/superpowers/plans/2026-05-15-claude-code-backend-v1-isolation-closure.md
@ -0,0 +1,575 @@
+# Claude Code Backend V1 Isolation Closure Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Close the remaining v1-blocking Claude Code backend gaps around SDK
+init isolation assertions and setup-time prompt-caching warnings.
+
+**Architecture:** Keep the existing runtime port and Claude Code runtime. Add
+the missing init-message checks inside the Claude runtime, then share the
+prompt-caching warning formatter between status/doctor and setup so all
+user-facing readiness flows report ignored Claude Code cache knobs consistently.
+
+**Tech Stack:** TypeScript, pnpm, Vitest, Zod, `@anthropic-ai/claude-agent-sdk@0.3.142`.
+
+---
+
+## Audit Summary
+
+The May 15 Claude Code backend v1 plan is mostly implemented. Remaining
+v1-blocking gaps from the original spec are:
+
+- `packages/context/src/llm/claude-code-runtime.ts` asserts init-message tools,
+  slash commands, skills, and plugins, but does not assert `agents` or
+  unexpected `mcp_servers`. The spec requires asserting that settings-derived
+  commands, skills, agents, plugins, and MCP servers are inactive.
+- `packages/cli/src/setup-models.ts` validates Claude Code auth but does not
+  surface ignored `llm.promptCaching` fields during setup. The spec requires
+  setup, status, and doctor to surface ignored prompt-caching fields for the
+  `claude-code` backend. Status and doctor already warn.
+
+Non-blocking gaps:
+
+- Same-step tool-call repair parity remains out of scope for v1.
+- OTEL telemetry parity remains out of scope for v1.
+- Embedding parity remains out of scope because embeddings are configured
+  independently.
+- Full prompt-caching parity for tools, history, and per-section TTLs remains
+  out of scope; v1 only needs explicit warnings and no AI SDK cache markers on
+  the Claude Code path.
+
+## File Structure
+
+Modify these files:
+
+- `packages/context/src/llm/claude-code-runtime.ts` adds complete init-message
+  isolation checks for agents and MCP servers.
+- `packages/context/src/llm/claude-code-runtime.test.ts` adds regression tests
+  for rejected agents/MCP servers, object/agent env scrubbing, and callback
+  error handling.
+- `packages/cli/src/claude-code-prompt-caching.ts` is created as the shared
+  formatter for ignored prompt-caching fields.
+- `packages/cli/src/status-project.ts` imports the shared formatter instead of
+  keeping a local helper.
+- `packages/cli/src/setup-models.ts` emits the shared warning when setup saves
+  `llm.provider.backend: claude-code` and existing prompt-caching fields are
+  present.
+- `packages/cli/src/setup-models.test.ts` covers setup warning output.
+- `packages/cli/src/doctor.test.ts` keeps coverage for doctor output using the
+  shared formatter.
+
+### Task 1: Complete Claude Code init isolation checks
+
+**Files:**
+
+- Modify: `packages/context/src/llm/claude-code-runtime.test.ts`
+- Modify: `packages/context/src/llm/claude-code-runtime.ts`
+
+- [ ] **Step 1: Add failing isolation and runtime behavior tests**
+
+Add these tests inside `describe('ClaudeCodeKtxLlmRuntime', ...)` in
+`packages/context/src/llm/claude-code-runtime.test.ts`:
+
+```ts
+  it('rejects settings-derived agents and non-KTX MCP servers from init messages', async () => {
+    const query = vi.fn((_input: any) =>
+      stream([
+        initMessage({
+          agents: ['project-agent'],
+          mcp_servers: [{ name: 'filesystem', status: 'connected' }],
+        }),
+        resultMessage({ result: 'hello' }),
+      ]),
+    );
+    const runtime = new ClaudeCodeKtxLlmRuntime({
+      projectDir: '/tmp/project',
+      modelSlots: { default: 'sonnet' },
+      query,
+      env: {},
+    });
+
+    await expect(runtime.generateText({ role: 'default', prompt: 'say hello' })).rejects.toThrow(
+      /Claude Code runtime isolation failed: .*mcp_servers=filesystem.*agents=project-agent/,
+    );
+  });
+
+  it('passes scrubbed env to object generation and agent loops', async () => {
+    const schema = z.object({ answer: z.string() });
+    const objectQuery = vi.fn((_input: any) =>
+      stream([initMessage(), resultMessage({ structured_output: { answer: 'yes' } })]),
+    );
+    const objectRuntime = new ClaudeCodeKtxLlmRuntime({
+      projectDir: '/tmp/project',
+      modelSlots: { default: 'sonnet' },
+      query: objectQuery,
+      env: { ANTHROPIC_API_KEY: 'sk-ant-test', AWS_PROFILE: 'prod', PATH: '/usr/bin' }, // pragma: allowlist secret
+    });
+
+    await expect(objectRuntime.generateObject({ role: 'default', prompt: 'json', schema })).resolves.toEqual({
+      answer: 'yes',
+    });
+    expect(objectQuery.mock.calls[0][0].options.env).toEqual(
+      expect.objectContaining({ PATH: '/usr/bin' }),
+    );
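+    // Assert each key separately: a combined `not` + `objectContaining` passes if only one key is scrubbed.
+    expect(objectQuery.mock.calls[0][0].options.env).not.toHaveProperty('ANTHROPIC_API_KEY');
+    expect(objectQuery.mock.calls[0][0].options.env).not.toHaveProperty('AWS_PROFILE');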
+
+    const agentQuery = vi.fn((_input: any) =>
+      stream([
+        initMessage({ tools: ['mcp__ktx__load_skill'], mcp_servers: [{ name: 'ktx', status: 'connected' }] }),
+        {
+          type: 'assistant',
+          message: { role: 'assistant', content: [] },
+          parent_tool_use_id: null,
+          uuid: '00000000-0000-4000-8000-000000000004',
+          session_id: 'session-id',
+        } as unknown as SDKMessage,
+        resultMessage({ subtype: 'error_max_turns', is_error: true }),
+      ]),
+    );
+    const agentRuntime = new ClaudeCodeKtxLlmRuntime({
+      projectDir: '/tmp/project',
+      modelSlots: { default: 'sonnet' },
+      query: agentQuery,
+      env: { ANTHROPIC_AUTH_TOKEN: 'token', CLAUDE_CODE_USE_VERTEX: '1', HOME: '/Users/test' },
+    });
+
+    await agentRuntime.runAgentLoop({
+      modelRole: 'default',
+      systemPrompt: 'system',
+      userPrompt: 'user',
+      toolSet: {
+        load_skill: {
+          name: 'load_skill',
+          description: 'Load skill.',
+          inputSchema: z.object({ name: z.string() }),
+          execute: async () => ({ markdown: 'loaded' }),
+        },
+      },
+      stepBudget: 1,
+      telemetryTags: { operationName: 'test' },
+    });
+    expect(agentQuery.mock.calls[0][0].options.env).toEqual(expect.objectContaining({ HOME: '/Users/test' }));
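+    // Assert each key separately so a single leaked variable fails the test.
+    expect(agentQuery.mock.calls[0][0].options.env).not.toHaveProperty('ANTHROPIC_AUTH_TOKEN');
+    expect(agentQuery.mock.calls[0][0].options.env).not.toHaveProperty('CLAUDE_CODE_USE_VERTEX');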
+  });
+
+  it('logs and ignores onStepFinish callback errors', async () => {
+    const query = vi.fn((_input: any) =>
+      stream([
+        initMessage(),
+        {
+          type: 'assistant',
+          message: { role: 'assistant', content: [] },
+          parent_tool_use_id: null,
+          uuid: '00000000-0000-4000-8000-000000000005',
+          session_id: 'session-id',
+        } as unknown as SDKMessage,
+        resultMessage({ subtype: 'success', terminal_reason: 'completed' }),
+      ]),
+    );
+    const logger = {
+      debug: vi.fn(),
+      log: vi.fn(),
+      warn: vi.fn(),
+      error: vi.fn(),
+    };
+    const runtime = new ClaudeCodeKtxLlmRuntime({
+      projectDir: '/tmp/project',
+      modelSlots: { default: 'sonnet' },
+      query,
+      env: {},
+      logger,
+    });
+
+    await expect(
+      runtime.runAgentLoop({
+        modelRole: 'default',
+        systemPrompt: 'system',
+        userPrompt: 'user',
+        toolSet: {},
+        stepBudget: 1,
+        telemetryTags: { operationName: 'test' },
+        onStepFinish: async () => {
+          throw new Error('callback exploded');
+        },
+      }),
+    ).resolves.toEqual({ stopReason: 'natural' });
+    expect(logger.warn).toHaveBeenCalledWith(expect.stringContaining('callback exploded'));
+  });
+```
+
+- [ ] **Step 2: Run the Claude runtime test to verify it fails**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/llm/claude-code-runtime.test.ts
+```
+
+Expected: FAIL because the new agents/MCP-server isolation test resolves
+successfully instead of throwing.
+
+- [ ] **Step 3: Add expected MCP server metadata and complete init assertions**
+
+In `packages/context/src/llm/claude-code-runtime.ts`, replace
+`assertInitIsolation` and add the helper below it:
+
+```ts
+function assertInitIsolation(
+  message: SDKMessage,
+  allowedToolIds: Set<string>,
+  expectedMcpServerNames: Set<string>,
+): void {
+  if (message.type !== 'system' || message.subtype !== 'init') {
+    return;
+  }
+  const unexpectedTools = message.tools.filter((toolName) => !allowedToolIds.has(toolName));
+  const activeMcpServerNames = message.mcp_servers.map((server) => server.name);
+  const unexpectedMcpServers = activeMcpServerNames.filter((name) => !expectedMcpServerNames.has(name));
+  const missingMcpServers = [...expectedMcpServerNames].filter((name) => !activeMcpServerNames.includes(name));
+  const unexpectedAgents = message.agents ?? [];
+  if (
+    unexpectedTools.length > 0 ||
+    unexpectedMcpServers.length > 0 ||
+    missingMcpServers.length > 0 ||
+    message.slash_commands.length > 0 ||
+    message.skills.length > 0 ||
+    message.plugins.length > 0 ||
+    unexpectedAgents.length > 0
+  ) {
+    throw new Error(
+      `Claude Code runtime isolation failed: tools=${unexpectedTools.join(',') || '(none)'} mcp_servers=${
+        unexpectedMcpServers.join(',') || '(none)'
+      } missing_mcp_servers=${missingMcpServers.join(',') || '(none)'} slash_commands=${
+        message.slash_commands.length
+      } skills=${message.skills.length} plugins=${message.plugins.length} agents=${
+        unexpectedAgents.join(',') || '(none)'
+      }`,
+    );
+  }
+}
+
+function expectedMcpServerNames(tools: KtxRuntimeToolSet | undefined): Set<string> {
+  return tools && Object.keys(tools).length > 0 ? new Set(['ktx']) : new Set();
+}
+```
+
+Update `collectResult` parameters:
+
+```ts
+async function collectResult(params: {
+  query: QueryFn;
+  prompt: string;
+  options: Options;
+  allowedToolIds: Set<string>;
+  expectedMcpServerNames: Set<string>;
+  onAssistantTurn?: () => Promise<void>;
+}): Promise<SDKResultMessage> {
+  let result: SDKResultMessage | undefined;
+  for await (const message of params.query({ prompt: params.prompt, options: params.options })) {
+    assertInitIsolation(message, params.allowedToolIds, params.expectedMcpServerNames);
+```
+
+Update the four `collectResult(...)` calls. For `generateText(...)` and
+`generateObject(...)`, pass the two isolation fields in this shape:
+
+```ts
+    const tools = input.tools ?? {};
+    const result = await collectResult({
+      query: this.runQuery,
+      prompt: [input.system, input.prompt].filter(Boolean).join('\n\n'),
+      options,
+      allowedToolIds: new Set(mcpToolIds(tools)),
+      expectedMcpServerNames: expectedMcpServerNames(input.tools),
+    });
+```
+
+For `runAgentLoop(...)`, use:
+
+```ts
+      const result = await collectResult({
+        query: this.runQuery,
+        prompt: params.userPrompt,
+        options: { ...options, systemPrompt: params.systemPrompt },
+        allowedToolIds: new Set(mcpToolIds(params.toolSet)),
+        expectedMcpServerNames: expectedMcpServerNames(params.toolSet),
+        onAssistantTurn: async () => {
+```
+
+For `runClaudeCodeAuthProbe(...)`, use:
+
+```ts
+    const result = await collectResult({
+      query: input.query ?? defaultQuery,
+      prompt: 'Reply with exactly: ok',
+      options,
+      allowedToolIds: new Set(),
+      expectedMcpServerNames: new Set(),
+    });
+```
+
+- [ ] **Step 4: Run the Claude runtime test to verify it passes**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/llm/claude-code-runtime.test.ts
+```
+
+Expected: PASS.
+
+- [ ] **Step 5: Commit**
+
+Run:
+
+```bash
+git add packages/context/src/llm/claude-code-runtime.ts packages/context/src/llm/claude-code-runtime.test.ts
+git commit -m "fix: close claude-code runtime isolation checks"
+```
+
+### Task 2: Surface Claude Code prompt-caching warnings during setup
+
+**Files:**
+
+- Create: `packages/cli/src/claude-code-prompt-caching.ts`
+- Modify: `packages/cli/src/status-project.ts`
+- Modify: `packages/cli/src/setup-models.ts`
+- Modify: `packages/cli/src/setup-models.test.ts`
+- Modify: `packages/cli/src/doctor.test.ts`
+
+- [ ] **Step 1: Add failing setup warning test**
+
+Add this test to `packages/cli/src/setup-models.test.ts`:
+
+```ts
+  it('warns during Claude Code setup when existing prompt-caching fields will be ignored', async () => {
+    await writeFile(
+      join(tempDir, 'ktx.yaml'),
+      [
+        'llm:',
+        '  provider:',
+        '    backend: anthropic',
+        '  models:',
+        '    default: claude-sonnet-4-6',
+        '  promptCaching:',
+        '    enabled: true',
+        '    systemTtl: 1h',
+        '    toolsTtl: 1h',
+        '    historyTtl: 5m',
+        '',
+      ].join('\n'),
+      'utf-8',
+    );
+    const io = makeIo();
+
+    const result = await runKtxSetupAnthropicModelStep(
+      {
+        projectDir: tempDir,
+        inputMode: 'disabled',
+        llmBackend: 'claude-code',
+        skipLlm: false,
+      },
+      io.io,
+      {
+        claudeCodeAuthProbe: async () => ({ ok: true as const }),
+      },
+    );
+
+    expect(result.status).toBe('ready');
+    expect(io.stderr()).toContain('claude-code ignores llm.promptCaching.systemTtl');
+    expect(io.stderr()).toContain('Claude Agent SDK does not expose KTX prompt-cache TTL, tool, or history markers');
+  });
+```
+
+- [ ] **Step 2: Run setup tests to verify the new test fails**
+
+Run:
+
+```bash
+pnpm --filter @ktx/cli exec vitest run src/setup-models.test.ts
+```
+
+Expected: FAIL because setup does not emit the ignored prompt-caching warning.
+
+- [ ] **Step 3: Create the shared prompt-caching warning helper**
+
+Create `packages/cli/src/claude-code-prompt-caching.ts`:
+
+```ts
+import type { KtxProjectLlmConfig } from '@ktx/context/project';
+
+const CLAUDE_CODE_IGNORED_PROMPT_CACHING_FIELDS = [
+  'systemTtl',
+  'toolsTtl',
+  'historyTtl',
+  'vertexFallbackTo5m',
+] as const;
+
+export function ignoredClaudeCodePromptCachingFields(config: KtxProjectLlmConfig): string[] {
+  const promptCaching = config.promptCaching;
+  if (config.provider.backend !== 'claude-code' || !promptCaching) {
+    return [];
+  }
+  // Read from the local binding: narrowing on `config.promptCaching` does not carry into the callback.
+  return CLAUDE_CODE_IGNORED_PROMPT_CACHING_FIELDS.filter((key) => key in promptCaching).map(
+    (key) => `llm.promptCaching.${key}`,
+  );
+}
+
+export function formatClaudeCodePromptCachingWarning(fields: string[]): string | null {
+  if (fields.length === 0) {
+    return null;
+  }
+  return `claude-code ignores ${fields.join(', ')} because the Claude Agent SDK does not expose KTX prompt-cache TTL, tool, or history markers.`;
+}
+
+export function formatClaudeCodePromptCachingFix(): string {
+  return 'Remove those promptCaching fields or use anthropic, vertex, or gateway when those cache knobs are required.';
+}
+```
+
+- [ ] **Step 4: Update status/doctor to use the shared helper**
+
+In `packages/cli/src/status-project.ts`, add:
+
+```ts
+import {
+  formatClaudeCodePromptCachingFix,
+  formatClaudeCodePromptCachingWarning,
+  ignoredClaudeCodePromptCachingFields,
+} from './claude-code-prompt-caching.js';
+```
+
+Delete the local `ignoredClaudeCodePromptCachingFields(...)` function.
+
+Replace the warning block in `buildWarnings(...)` with:
+
+```ts
+  const warning = formatClaudeCodePromptCachingWarning(ignoredClaudeCodePromptCachingFields(config.llm));
+  if (warning) {
+    warnings.push({
+      message: warning,
+      fix: formatClaudeCodePromptCachingFix(),
+    });
+  }
+```
+
+- [ ] **Step 5: Emit the setup warning before persisting Claude Code config**
+
+In `packages/cli/src/setup-models.ts`, add:
+
+```ts
+import {
+  formatClaudeCodePromptCachingWarning,
+  ignoredClaudeCodePromptCachingFields,
+} from './claude-code-prompt-caching.js';
+```
+
+Inside the `backendChoice.backend === 'claude-code'` branch, immediately before
+`await persistLlmConfig(...)`, add:
+
+```ts
+      const warning = formatClaudeCodePromptCachingWarning(
+        ignoredClaudeCodePromptCachingFields(buildProjectLlmConfig(project.config.llm, { backend: 'claude-code' }, model)),
+      );
+      if (warning) {
+        io.stderr.write(`${warning}\n`);
+      }
+```
+
+- [ ] **Step 6: Run CLI tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/cli exec vitest run src/setup-models.test.ts src/doctor.test.ts
+```
+
+Expected: PASS.
+
+- [ ] **Step 7: Commit**
+
+Run:
+
+```bash
+git add packages/cli/src/claude-code-prompt-caching.ts packages/cli/src/status-project.ts packages/cli/src/setup-models.ts packages/cli/src/setup-models.test.ts packages/cli/src/doctor.test.ts
+git commit -m "fix: warn on claude-code prompt caching during setup"
+```
+
+### Task 3: Final verification
+
+**Files:**
+
+- Verify: `packages/context/src/llm/claude-code-runtime.ts`
+- Verify: `packages/cli/src/setup-models.ts`
+- Verify: `packages/cli/src/status-project.ts`
+
+- [ ] **Step 1: Run targeted tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/llm/claude-code-runtime.test.ts src/llm/runtime-tools.test.ts src/llm/claude-code-env.test.ts src/llm/claude-code-models.test.ts src/llm/runtime-local-config.test.ts
+pnpm --filter @ktx/cli exec vitest run src/setup-models.test.ts src/doctor.test.ts
+```
+
+Expected: PASS.
+
+- [ ] **Step 2: Run package type-checks**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context run type-check
+pnpm --filter @ktx/cli run type-check
+```
+
+Expected: PASS.
+
+- [ ] **Step 3: Run the LLM boundary audit**
+
+Run:
+
+```bash
+rg -n "generateKtxText\\(|generateKtxObject\\(|new AgentRunnerService\\(|AgentRunnerService\\b|llmProvider\\b|getModel\\(|getModelByName\\(" packages/context/src packages/cli/src packages/llm/src --glob '!**/*.test.ts'
+```
+
+Expected: remaining matches are limited to:
+
+- `packages/llm/src/**`
+- `packages/context/src/llm/ai-sdk-runtime.ts`
+- `packages/context/src/llm/local-config.ts`
+- `packages/context/src/agent/agent-runner.service.ts`
+- type/export declarations that intentionally preserve the AI SDK adapter
+  boundary.
+
+- [ ] **Step 4: Run dead-code check**
+
+Run:
+
+```bash
+pnpm run dead-code
+```
+
+Expected: PASS or only pre-existing unrelated findings. Investigate and fix
+any finding caused by the new helper file.
+
+- [ ] **Step 5: Commit verification cleanup if needed**
+
+If verification required small cleanup, run:
+
+```bash
+git add packages/context/src/llm/claude-code-runtime.ts packages/context/src/llm/claude-code-runtime.test.ts packages/cli/src/claude-code-prompt-caching.ts packages/cli/src/status-project.ts packages/cli/src/setup-models.ts packages/cli/src/setup-models.test.ts packages/cli/src/doctor.test.ts
+git commit -m "chore: verify claude-code v1 closure"
+```
+
+If no files changed after verification, skip this commit.
+
+## Self-Review
+
+- Spec coverage: The plan closes the remaining v1-blocking isolation assertion
+  and setup-warning requirements from the original spec.
+- Placeholder scan: No placeholders remain; every task includes file paths,
+  code, commands, and expected output.
+- Type consistency: The helper names and runtime function signatures are used
+  consistently across tasks.
--- a/docs/superpowers/plans/2026-05-15-claude-code-backend-v1-runtime.md
+++ b/docs/superpowers/plans/2026-05-15-claude-code-backend-v1-runtime.md
--- a/docs/superpowers/specs/2026-05-15-claude-code-backend-design.md
+++ b/docs/superpowers/specs/2026-05-15-claude-code-backend-design.md
@ -0,0 +1,698 @@
+# Brainstorm: `claude-code` backend with full KTX LLM parity
+
+Adds a `claude-code` backend that gives KTX full parity with the existing
+`ANTHROPIC_API_KEY`-based `anthropic` backend for **all KTX LLM calls**. The
+backend uses `@anthropic-ai/claude-agent-sdk` and reuses the user's existing
+local Claude Code authentication. Users select it in `ktx.yaml`.
+
+This is not an implementation plan. It is the revised design after expanding
+the requirement from "`ktx ingest` works with Claude Code" to "every KTX LLM
+call works with Claude Code." The follow-up implementation plan should be
+written separately.
+
+## Core decision
+
+`claude-code` is a first-class global LLM backend. Any code path that currently
+works with `llm.provider.backend: anthropic` must work with
+`llm.provider.backend: claude-code`, unless it is not an LLM call at all.
+
+This includes:
+
+- Agent loops implemented through `AgentRunnerService.runLoop(...)`.
+- Text generation through `generateKtxText(...)`.
+- Structured object generation through `generateKtxObject(...)`.
+- Local ingest and MCP-triggered local ingest flows.
+- Page triage and light extraction.
+- Context-candidate curation and reconciliation.
+- Memory capture.
+- Scan/enrichment internals and relationship LLM proposals.
+- Future KTX LLM call sites that use the shared runtime boundary.
+
+Commands that do not use LLMs do not need special Claude Code behavior. There
+must be no silent fallback from `claude-code` to gateway, Anthropic API-key
+execution, or deterministic output.
+
+## Goals
+
+- Let a KTX user run all KTX LLM-backed behavior through their existing local
+  Claude Code session without provisioning `ANTHROPIC_API_KEY`, Vertex
+  credentials, or an AI Gateway key.
+- Preserve the existing user-facing CLI and MCP behavior. `claude-code` changes
+  how LLM calls execute, not which KTX workflows exist.
+- Preserve role-based model selection. `llm.models.default`, `triage`,
+  `candidateExtraction`, `curator`, `reconcile`, and `repair` remain the source
+  of model selection for every LLM call.
+- Preserve KTX's curated tool boundaries. Claude Code built-ins,
+  filesystem-discovered MCP servers, hooks, skills, plugins, agents, and slash
+  commands must not become invokable in KTX agent loops. The Agent SDK init
+  message may still report host-discovered slash commands, skills, and agents;
+  KTX treats that metadata as diagnostic only and restricts execution through
+  `tools: []`, exact KTX MCP `allowedTools`, `disallowedTools`, and
+  deny-by-default `canUseTool`.
+- Keep embeddings independent. Claude does not provide embeddings; users keep
+  configuring `ingest.embeddings` and scan/enrichment embeddings as they do
+  today.
+- Fail fast with a clear message if local Claude Code authentication is not
+  usable.
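+
+The deny-by-default tool boundary described above can be sketched roughly as
+follows. This is an illustration only: `PermissionDecision` and
+`makeDenyByDefault` are hypothetical names and are not the Agent SDK's own
+types. The real `canUseTool` callback takes more arguments than shown here.
+
+```ts
+// Hypothetical local type for illustration; not imported from the Agent SDK.
+type PermissionDecision =
+  | { behavior: 'allow' }
+  | { behavior: 'deny'; message: string };
+
+// Build a permission check that allows only the exact tool IDs for this run.
+// Anything not listed is denied, including host-discovered tools.
+function makeDenyByDefault(allowedToolIds: ReadonlySet<string>) {
+  return (toolName: string): PermissionDecision =>
+    allowedToolIds.has(toolName)
+      ? { behavior: 'allow' }
+      : { behavior: 'deny', message: `Tool ${toolName} is not a KTX tool for this run.` };
+}
+
+const canUse = makeDenyByDefault(new Set(['mcp__ktx__load_skill']));
+const allowed = canUse('mcp__ktx__load_skill');
+const denied = canUse('Bash');
+```
+
+The allow-list is built per run, so a tool name that merely appears in init
+metadata never becomes invokable unless KTX registered it for that run.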
+
+## Non-goals
+
+- **Embedding parity.** Embeddings remain separate from LLM execution.
+- **Tool-call repair parity in the first pass.** The AI SDK runner uses
+  `experimental_repairToolCall` (`packages/llm/src/repair.ts:35-88`). The Claude
+  Agent SDK has no transparent same-step repair hook. MVP behavior is next-turn
+  self-correction from schema errors or a normal tool-failure count.
+- **OTEL telemetry parity in the first pass.** The AI SDK runner uses
+  `experimental_telemetry`. The Agent SDK exposes hooks such as
+  `PostToolUseFailure` and `SessionEnd`, but no drop-in OTEL switch. MVP ships
+  without telemetry parity on this backend.
+- **Productizing Claude subscription limits.** Documentation must frame this as
+  "use your own local Claude Code session," not as a third-party Claude Max or
+  Claude.ai product feature.
+
+## Approaches considered
+
+### Recommended: global LLM runtime port
+
+Introduce a backend-neutral KTX LLM runtime port for operations, not just model
+construction:
+
+```ts
+interface KtxLlmRuntimePort {
+  generateText(input: KtxGenerateTextInput): Promise<string>;
+  generateObject<T>(input: KtxGenerateObjectInput<T>): Promise<T>;
+  runAgentLoop(params: RunLoopParams): Promise<RunLoopResult>;
+}
+```
+
+The existing `anthropic`, `vertex`, and `gateway` backends implement the runtime
+through the AI SDK and existing `KtxLlmProvider`. The new `claude-code` backend
+implements the same runtime through `@anthropic-ai/claude-agent-sdk`.
+
+This is the recommended approach because KTX call sites need operations:
+"generate text," "generate a structured object," and "run an agent loop." They
+do not inherently need direct access to an AI SDK `LanguageModel`. The Agent SDK
+is a session/agent API, not an AI SDK model factory, so the runtime port avoids
+pretending those APIs are the same.
+
+### Rejected: fake AI SDK `LanguageModel` for Claude Code
+
+Trying to make Claude Code look like an AI SDK `LanguageModel` would be brittle.
+The Agent SDK owns session execution, permissions, MCP tools, structured output,
+and result messages. Those semantics do not map cleanly onto a normal
+`getModel(...)` return value.
+
+### Rejected: branch at every call site
+
+Adding `if (backend === "claude-code")` around each LLM call would work briefly
+but would duplicate prompt wrapping, structured output handling, debug logging,
+tool conversion, auth checks, and error mapping. It would also make future LLM
+call sites easy to miss.
+
+## Architecture
+
+```text
+ktx.yaml
+  llm.provider.backend: anthropic | vertex | gateway | claude-code
+  llm.models.<role>: model alias or model ID
+
+createLocalKtxLlmRuntimeFromConfig(project.config.llm)
+  -> AiSdkKtxLlmRuntime
+     - wraps existing KtxLlmProvider
+     - generateText / Output.object / AgentRunnerService
+  -> ClaudeCodeKtxLlmRuntime
+     - uses @anthropic-ai/claude-agent-sdk query()
+     - implements text, object, and agent-loop operations
+
+All KTX LLM call sites
+  -> KtxLlmRuntimePort
+```
+
+The runtime is selected at the same boundaries that currently construct an
+`llmProvider` or `AgentRunnerService`:
+
+- `packages/context/src/llm/local-config.ts`
+- `packages/context/src/ingest/local-bundle-runtime.ts`
+- `packages/context/src/memory/local-memory.ts`
+- `packages/context/src/scan/local-scan.ts`
+- `packages/context/src/mcp/local-project-ports.ts`
+- Any CLI setup/status/doctor code that validates LLM readiness
+
+After the change, services should not need to know whether the configured
+backend is AI SDK based or Claude Code based. They call the runtime operation
+they need.
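+
+A minimal sketch of that selection step, assuming a hypothetical
+`selectRuntimeKind` helper (not an existing KTX function) and assuming that
+`none` means no runtime is built at all. The backend names come from the
+diagram above; everything else is illustrative:
+
+```ts
+// Hypothetical helper: maps a configured backend to the runtime family.
+type RuntimeKind = "ai-sdk" | "claude-code";
+
+function selectRuntimeKind(backend: string): RuntimeKind | null {
+  switch (backend) {
+    case "none":
+      // Assumption: `none` disables LLM features, so no runtime is built.
+      return null;
+    case "anthropic":
+    case "vertex":
+    case "gateway":
+      return "ai-sdk"; // would construct AiSdkKtxLlmRuntime
+    case "claude-code":
+      return "claude-code"; // would construct ClaudeCodeKtxLlmRuntime
+    default:
+      // Unknown values fail loudly instead of falling through to gateway.
+      throw new Error(`Unsupported llm.provider.backend: ${backend}`);
+  }
+}
+```
+
+Keeping the branch in one factory is what lets call sites stay unaware of the
+backend; they only ever receive a `KtxLlmRuntimePort`.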
+
+## LLM call-site migration
+
+The implementation plan must migrate every current KTX LLM call site to the
+runtime port:
+
+- `packages/context/src/llm/generation.ts`: `generateKtxText` and
+  `generateKtxObject` become runtime-backed helpers or are folded into the
+  runtime.
+- `packages/context/src/agent/agent-runner.service.ts`: the AI SDK agent loop
+  becomes the AI SDK implementation of `runAgentLoop`.
+- `packages/context/src/ingest/page-triage/page-triage.service.ts`: page triage
+  and light extraction depend on `KtxLlmRuntimePort`, not raw `KtxLlmProvider`.
+- `packages/context/src/scan/description-generation.ts`: AI descriptions use
+  the runtime text-generation operation.
+- `packages/context/src/scan/relationship-llm-proposal.ts`: relationship
+  proposals use the runtime object-generation operation.
+- `packages/context/src/ingest/stages/stage-3-work-units.ts`,
+  `packages/context/src/ingest/stages/stage-4-reconciliation.ts`,
+  `packages/context/src/ingest/context-candidates/curator-pagination.service.ts`,
+  and `packages/context/src/memory/memory-agent.service.ts`: agent loops use the
+  runtime agent-loop operation or a thin `AgentRunnerPort` backed by it.
+- Test helpers and MCP local project ports that inject `llmProvider` or
+  `agentRunner` must either inject the runtime port or use compatibility test
+  adapters during the migration.
+
+The plan must include a grep-based audit so new or overlooked `getModel(...)`,
+`generateKtxText(...)`, `generateKtxObject(...)`, `AgentRunnerService`, and
+`llmProvider` usages are either migrated or explicitly proven non-runtime.
+
+## Config design
+
+The config should make `claude-code` a first-class backend:
+
+```yaml
+llm:
+  provider:
+    backend: claude-code
+  models:
+    default: sonnet
+    triage: haiku
+    candidateExtraction: sonnet
+    curator: sonnet
+    reconcile: sonnet
+    repair: sonnet
+```
+
+Implementation implications:
+
+- Extend `KTX_LLM_BACKENDS` in `packages/context/src/project/config.ts` and
+  `KtxLlmBackend` in `packages/llm/src/types.ts`.
+- Update setup, status, doctor, schema generation, examples, and docs so
+  `claude-code` is understood everywhere `anthropic` is understood.
+- Update `createKtxLlmProvider` / `createModelFactory` so unsupported backend
+  values throw instead of falling through to gateway.
+- Keep `llm.models` as the per-role binding source. The Claude Code runtime maps
+  each KTX role to the configured model string for the current call.
+- Define accepted model aliases, such as `sonnet`, `opus`, and `haiku`, and full
+  model IDs supported by the pinned SDK version.
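+
+One possible shape for the per-role lookup is sketched below. The role names
+are taken from the YAML example above; the type names and the fallback to
+`default` when a role has no binding are assumptions, not confirmed KTX
+behavior:
+
+```ts
+// Hypothetical role type built from the keys in the config example.
+type KtxModelRole =
+  | "default"
+  | "triage"
+  | "candidateExtraction"
+  | "curator"
+  | "reconcile"
+  | "repair";
+
+type KtxModelBindings = Partial<Record<KtxModelRole, string>>;
+
+// Assumption: a role without its own binding falls back to `default`.
+function resolveModel(models: KtxModelBindings, role: KtxModelRole): string {
+  const model = models[role] ?? models.default;
+  if (model === undefined || model.trim() === "") {
+    throw new Error(`No model configured for llm.models.${role}`);
+  }
+  return model;
+}
+```
+
+Alias validation (`sonnet`, `opus`, `haiku`, or a full model ID) would sit on
+top of this lookup once the accepted set is defined for the pinned SDK.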
+
+## Claude Agent SDK runtime behavior
+
+Every Agent SDK call must be isolated enough for KTX execution. Use explicit
+options even when SDK defaults currently match the desired value.
+
+For agent loops with tools:
+
+```ts
+query({
+  prompt,
+  options: {
+    cwd: project.projectDir,
+    systemPrompt,
+    model: resolveModel(modelRole),
+    maxTurns: stepBudget,
+    settingSources: [],
+    skills: [],
+    plugins: [],
+    mcpServers: { ktx: createSdkMcpServer({ name: "ktx", tools }) },
+    tools: [],
+    allowedTools: [/* exact mcp__ktx__<toolName> ids generated from the tool map */],
+    disallowedTools: [/* current built-in tool names, as redundant insurance */],
+    canUseTool: ktxCanUseTool,
+    permissionMode: "dontAsk",
+    persistSession: false,
+    env: ktxClaudeCodeEnv
+  }
+});
+```
+
+`ktxClaudeCodeEnv` is the controlled environment described in
+"Agent SDK environment and auth boundary" below; it must be passed on every
+KTX `query()` call.
+
+For plain text generation:
+
+- Use the same `query()` runtime with `maxTurns: 1`.
+- Pass `settingSources: []`, `skills: []`, `plugins: []`, `tools: []`,
+  `permissionMode: "dontAsk"`, `persistSession: false`, and
+  `env: ktxClaudeCodeEnv`.
+- Do not expose MCP tools unless the KTX call explicitly passed tools.
+- Return the final result message text.
+
+For structured object generation:
+
+- Use the same `query()` runtime with the Agent SDK structured output option
+  for JSON schema output, plus the same isolation tuple including
+  `env: ktxClaudeCodeEnv`.
+- Convert KTX Zod schemas at the runtime boundary.
+- Parse and validate the returned object with the original KTX schema before
+  returning it to the caller.
+
+The plan must confirm the exact option names against the pinned SDK version, but
+the required outcome is fixed:
+
+- Filesystem settings are not loaded. The SDK's documented default for an
+  omitted `settingSources` is `["user", "project", "local"]`
+  (`@anthropic-ai/claude-agent-sdk@0.3.142` `sdk.d.ts:1686-1695`),
+  which would inherit the user's Claude Code filesystem settings. Every KTX
+  `query()` call site - agent loops, text generation, object generation, and
+  the auth probe - MUST pass `settingSources: []` explicitly, along with
+  `skills: []`, `plugins: []`, `tools: []`, `persistSession: false`, and no
+  `mcpServers` entries other than the KTX MCP server (omitted entirely when
+  the call site does not expose tools). The implementation MUST assert from
+  the SDK init message that the controlled execution surface matches KTX's
+  expectations:
+
+  - `message.tools` equals the exact generated KTX MCP tool ids for the current
+    call.
+  - `message.mcp_servers` equals the expected KTX MCP server set: `[]` when the
+    call exposes no tools, or `["ktx"]` when it does.
+  - `message.plugins` is empty.
+
+  The implementation MUST NOT reject a run solely because
+  `message.slash_commands`, `message.skills`, or `message.agents` contain
+  host-discovered names. In `@anthropic-ai/claude-agent-sdk@0.3.142`, those
+  fields can report host discovery even when KTX passes the isolation options.
+  They are not part of the KTX execution surface when `tools: []`,
+  `allowedTools`, `disallowedTools`, and deny-by-default `canUseTool` are set.
+- `skills: []` is a context filter in the pinned SDK
+  (`sdk.d.ts:1697-1718`): unlisted skills are hidden from the model's skill
+  listing and rejected by the Skill tool, but discovered skill names may still
+  appear in init metadata. KTX must still pass `skills: []`.
+- Plugins are disabled with `plugins: []`, and the runtime asserts that
+  `message.plugins` is empty in the init message.
+- Built-in tools are disabled by setting `tools: []`. The pinned SDK type
+  (`@anthropic-ai/claude-agent-sdk@0.3.142`, `sdk.d.ts`) documents `tools` as
+  the base set of built-in tools, with `[]` meaning "disable all built-ins";
+  `tools` does not accept MCP tool ids and cannot be used to restrict MCP
+  availability.
+- MCP tool availability is granted by registering the KTX MCP server through
+  `mcpServers`. The SDK does not document a wildcard like `mcp__ktx__*` for
+  any tool field; KTX must enumerate exact generated MCP tool ids of the form
+  `mcp__ktx__<toolName>` (derived from the tool map handed to
+  `createSdkMcpServer`) wherever a list of tool ids is required.
+- Pre-approval under `permissionMode: "dontAsk"` is configured by listing those
+  same exact `mcp__ktx__<toolName>` ids in `allowedTools` (documented as
+  auto-allow without prompting). Treat `allowedTools` as auto-approval, not
+  restriction.
+- Defense-in-depth restriction uses `canUseTool`. The KTX runtime supplies a
+  `canUseTool` handler that allows only tool names in the current KTX MCP tool
+  map and denies everything else, so host-discovered slash commands, skills,
+  agents, future SDK defaults, or a misconfigured MCP server cannot expand the
+  execution surface.
+- `disallowedTools` MUST additionally list the current built-in tool names
+  (`Agent`, `Task`, `AskUserQuestion`, `Bash`, `Read`, `Edit`, `Write`, `Glob`,
+  `Grep`, `WebFetch`, `WebSearch`, `TodoWrite`) as redundant insurance.
+- `cwd` is `project.projectDir`, resolved at startup via `resolveKtxProjectDir`,
+  not `process.cwd()`.
+- Sessions are not persisted unless the plan identifies a concrete debugging
+  feature that needs persistence.
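+
+The id enumeration and the deny-by-default handler described above can be
+sketched as follows. The helper names are hypothetical, and the handler
+signature and result shape are simplified; the real `canUseTool` type must be
+taken from the pinned SDK version:
+
+```ts
+const KTX_MCP_SERVER_NAME = "ktx";
+
+// Builds the exact `mcp__ktx__<toolName>` id for one tool in the tool map.
+function toMcpToolId(toolName: string): string {
+  return `mcp__${KTX_MCP_SERVER_NAME}__${toolName}`;
+}
+
+// Exact ids for `allowedTools`; no wildcard form is assumed to exist.
+function buildAllowedToolIds(toolNames: readonly string[]): string[] {
+  return toolNames.map(toMcpToolId).sort();
+}
+
+// Simplified stand-in for the SDK permission result.
+type PermissionDecision =
+  | { behavior: "allow" }
+  | { behavior: "deny"; message: string };
+
+// Allows only ids from the current tool map and denies everything else.
+function createKtxCanUseTool(toolNames: readonly string[]) {
+  const allowed = new Set(buildAllowedToolIds(toolNames));
+  return (requestedTool: string): PermissionDecision =>
+    allowed.has(requestedTool)
+      ? { behavior: "allow" }
+      : {
+          behavior: "deny",
+          message: `${requestedTool} is not in the KTX tool map`,
+        };
+}
+```
+
+Deriving both `allowedTools` and the handler from the same tool-name list
+keeps auto-approval and restriction from drifting apart.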
+
+## Agent SDK environment and auth boundary
+
+The Agent SDK's `query()` option `env` (`@anthropic-ai/claude-agent-sdk@0.3.142`
+`sdk.d.ts:1265-1279`) is the environment passed to the Claude Code child
+process and defaults to `process.env`. Without an explicit `env`, the SDK
+inherits the parent's environment, including any `ANTHROPIC_API_KEY`,
+`ANTHROPIC_AUTH_TOKEN`, `ANTHROPIC_BASE_URL`, gateway/AI-Gateway tokens,
+`GOOGLE_APPLICATION_CREDENTIALS` / `CLOUD_ML_REGION` (Vertex), and
+`AWS_*` (Bedrock) credentials. Any of these can switch the Claude Code CLI's
+authentication source to an API key or another provider, bypassing the user's
+local Claude Code session. That would silently violate two core requirements:
+`claude-code` runs through the user's existing local Claude Code session, and
+there is no silent fallback to gateway, Anthropic API-key, or other provider
+execution.
+
+Every `claude-code` `query()` call site - agent loops, text generation,
+object generation, and the auth probe - MUST pass an explicit `env`
+(`ktxClaudeCodeEnv`) built from `process.env` with every variable on the
+following denylist removed:
+
+- `ANTHROPIC_API_KEY`
+- `ANTHROPIC_AUTH_TOKEN`
+- `ANTHROPIC_BASE_URL`
+- `ANTHROPIC_MODEL` (provider-routing override)
+- `ANTHROPIC_VERTEX_PROJECT_ID`, `CLOUD_ML_REGION`,
+  `GOOGLE_APPLICATION_CREDENTIALS`, `GOOGLE_CLOUD_PROJECT`
+- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`,
+  `AWS_REGION`, `AWS_PROFILE`
+- `CLAUDE_CODE_USE_BEDROCK`, `CLAUDE_CODE_USE_VERTEX`
+- Any future provider-routing variables the pinned SDK version documents
+
+The denylist is the source of truth and lives next to the runtime constructor
+so adding a variable is a single-file change.
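+
+A minimal sketch of the env construction, assuming hypothetical names
+(`KTX_CLAUDE_CODE_ENV_DENYLIST`, `buildKtxClaudeCodeEnv`). The variable list
+is copied from the denylist above; the source environment is a parameter so
+tests can pass a fake `process.env`:
+
+```ts
+// Hypothetical constant; entries mirror the denylist in this section.
+const KTX_CLAUDE_CODE_ENV_DENYLIST: readonly string[] = [
+  "ANTHROPIC_API_KEY",
+  "ANTHROPIC_AUTH_TOKEN",
+  "ANTHROPIC_BASE_URL",
+  "ANTHROPIC_MODEL",
+  "ANTHROPIC_VERTEX_PROJECT_ID",
+  "CLOUD_ML_REGION",
+  "GOOGLE_APPLICATION_CREDENTIALS",
+  "GOOGLE_CLOUD_PROJECT",
+  "AWS_ACCESS_KEY_ID",
+  "AWS_SECRET_ACCESS_KEY",
+  "AWS_SESSION_TOKEN",
+  "AWS_REGION",
+  "AWS_PROFILE",
+  "CLAUDE_CODE_USE_BEDROCK",
+  "CLAUDE_CODE_USE_VERTEX",
+];
+
+type EnvLike = Record<string, string | undefined>;
+
+// Returns a copy of `source` without any denylisted key; `source` itself is
+// left untouched.
+function buildKtxClaudeCodeEnv(source: EnvLike): EnvLike {
+  const denied = new Set(KTX_CLAUDE_CODE_ENV_DENYLIST);
+  const result: EnvLike = {};
+  for (const [key, value] of Object.entries(source)) {
+    if (!denied.has(key)) result[key] = value;
+  }
+  return result;
+}
+```
+
+A call site would pass `buildKtxClaudeCodeEnv(process.env)` as the `env`
+option rather than relying on the SDK default.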
+
+Acceptance criteria:
+
+- The constructed `ktxClaudeCodeEnv` does not contain any denylisted key, and
+  this is verified by a unit test that seeds each denylisted key in a fake
+  `process.env`.
+- The auth probe fails with the same "authenticate Claude Code locally"
+  message even when `ANTHROPIC_API_KEY` (or any other denylisted credential)
+  is present in `process.env` and no valid local Claude Code session exists.
+- Every KTX-originated `query()` invocation is spied to assert that `env`
+  was passed and that it does not contain any denylisted key; the test fails
+  if any code path falls back to the SDK default `process.env`.
+- The "no silent fallback" rule is preserved end-to-end: a machine with
+  `ANTHROPIC_API_KEY` set but no local Claude Code authentication still fails
+  setup/status/doctor on `claude-code`.
+
+## Tool boundary
+
+Agent-loop tools cannot remain only raw AI SDK `Record<string, Tool>` values if
+two backends must consume them. The plan must define a backend-neutral tool
+descriptor for the final tool map handed to an agent loop:
+
+```ts
+interface KtxRuntimeToolDescriptor<TInput, TOutput> {
+  name: string;
+  description: string;
+  inputSchema: z.ZodObject<z.ZodRawShape>;
+  execute(input: TInput): Promise<KtxRuntimeToolOutput<TOutput>>;
+}
+
+interface KtxRuntimeToolOutput<TOutput> {
+  // What the model sees as the tool_result content. Always a markdown string;
+  // never a raw JS object. This matches BaseTool's existing
+  // `toModelOutput` contract (`packages/context/src/tools/base-tool.ts:154-162`)
+  // which sends only markdown to the LLM.
+  markdown: string;
+  // Out-of-band payload preserved for tool callers (transcripts, debug,
+  // verification ledger, downstream KTX consumers). Not sent to the model.
+  structured?: TOutput;
+}
+```
+
+Every composed tool entry must produce this descriptor shape, including:
+
+- `BaseTool` outputs from factory toolsets, which already return
+  `{ markdown, structured }`.
+- Source-specific raw tools such as `emit_historic_sql_evidence` in
+  `packages/context/src/ingest/local-bundle-runtime.ts`.
+- Stage-local tools in `buildWuToolSet` and `buildReconcileToolSet`.
+- Inline `load_skill`, read/raw/span, stage/diff, eviction, and emit tools in
+  `packages/context/src/ingest/ingest-bundle.runner.ts`.
+- Memory-agent `load_skill` in
+  `packages/context/src/memory/memory-agent.service.ts`.
+- The `withVerificationLedger` wrapping layer, whose markdown/structured
+  guard outputs (`packages/context/src/ingest/tools/verification-ledger.tool.ts:40-97`)
+  already match the contract.
+
+### Tool output contract
+
+The runtime defines a single output contract for both backends so the model
+sees the same content regardless of provider:
+
+- **Model-visible content**: the `markdown` field, mapped to the Agent SDK
+  tool handler return as `{ content: [{ type: "text", text: markdown }] }` for
+  `claude-code`, and surfaced through the existing `toModelOutput` markdown
+  path for AI SDK backends. The model never sees raw JS objects.
+- **Structured payload**: the optional `structured` field, preserved on the
+  in-process tool-result envelope for transcript/debug capture, the
+  verification ledger, and any KTX caller that introspects results. The
+  Claude adapter does not put structured JSON into model-visible content
+  unless an individual call site explicitly opts in.
+- **Normalization of existing raw tools**: tools that today return a bare
+  string (e.g. `load_skill` "Skill not available" responses in
+  `packages/context/src/ingest/ingest-bundle.runner.ts:697-721` and
+  `:924-936`, and `packages/context/src/memory/memory-agent.service.ts:128-152`)
+  must be wrapped at the descriptor boundary so `markdown` is the string and
+  `structured` is omitted. Tools that today return a plain object (e.g.
+  skill payload `{ name, content, skillDirectory }`) must be wrapped so
+  `markdown` is a deterministic human-readable rendering (e.g. the skill
+  body with a header) and the original object is preserved on `structured`.
+  No KTX tool may return a raw object as the model-visible payload on the
+  Claude Code backend, because the Agent SDK MCP handler will otherwise
+  stringify it and drop the structured fields.
+- **AI SDK parity**: the AI SDK adapter MUST preserve BaseTool's existing
+  `toModelOutput` markdown-only behavior. Migrating BaseTool-derived tools
+  to the descriptor must not start sending structured JSON to the model.
+
+The AI SDK adapter converts descriptors to `tool(...)` with a `toModelOutput`
+that emits `markdown` only. The Claude Code adapter converts descriptors to
+Agent SDK `tool(name, description, schema.shape, handler)` entries inside
+`createSdkMcpServer(...)` and returns `{ content: [{ type: "text", text:
+markdown }] }`.
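+
+The normalization rule for raw tools and the Claude-side mapping might look
+like the sketch below. `normalizeToolOutput` and `toClaudeToolResult` are
+hypothetical helper names, and the caller supplies the markdown renderer:
+
+```ts
+interface KtxRuntimeToolOutput<TOutput> {
+  markdown: string;
+  structured?: TOutput;
+}
+
+// Wraps a raw tool return value: a bare string becomes `markdown` with no
+// `structured`; a plain object is rendered and preserved on `structured`.
+function normalizeToolOutput<T extends object>(
+  raw: string | T,
+  render: (value: T) => string,
+): KtxRuntimeToolOutput<T> {
+  if (typeof raw === "string") return { markdown: raw };
+  return { markdown: render(raw), structured: raw };
+}
+
+// Model-visible payload for the Claude Code adapter: markdown text only.
+function toClaudeToolResult(output: KtxRuntimeToolOutput<unknown>) {
+  return { content: [{ type: "text" as const, text: output.markdown }] };
+}
+```
+
+For a skill payload `{ name, content, skillDirectory }`, the renderer could
+emit the skill body under a header, while the original object stays available
+to transcripts and the verification ledger.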
+
+Non-object schemas are unsupported for `claude-code` and must be rejected at
+startup with a clear error. In practice KTX tool inputs are already `z.object`.
+
+## Stop reasons and failures
+
+The Claude runner maps the SDK's typed `SDKResultMessage` (union of
+`SDKResultSuccess` and `SDKResultError` in
+`@anthropic-ai/claude-agent-sdk@0.3.142`, `sdk.d.ts`) to
+`RunLoopStopReason = "budget" | "natural" | "error"`. The mapping must consider
+three typed signals in this precedence order, because each successive signal
+may be present where the previous one is absent:
+
+1. `subtype`: `"error_max_turns"` -> `"budget"`; `"success"` -> `"natural"`;
+   other error subtypes (`"error_during_execution"`,
+   `"error_max_budget_usd"`, `"error_max_structured_output_retries"`) ->
+   `"error"`.
+2. `terminal_reason` (optional `TerminalReason` field on both success and
+   error results): `"max_turns"` -> `"budget"`; `"completed"` -> `"natural"`;
+   any other terminal reason such as `"blocking_limit"`,
+   `"rapid_refill_breaker"`, `"prompt_too_long"`, `"image_error"`,
+   `"model_error"`, `"aborted_streaming"`, `"aborted_tools"`,
+   `"stop_hook_prevented"`, `"hook_stopped"`, or `"tool_deferred"` ->
+   `"error"`.
+3. The assistant message `stop_reason`: `"max_turns"` -> `"budget"`; any
+   other non-null unsuccessful stop reason -> `"error"`.
+
+A `max_turns` signal arriving through any of the three sources must map to
+`"budget"`, even when a higher-precedence signal reports success. The runner
+MUST NOT classify a max-turn termination as `"natural"` or as a generic
+`"error"` merely because it was reported via `terminal_reason` instead of
+`subtype`.
+
+`Stop` hooks are not the authoritative stop-reason source because they do not
+carry the terminal reason. They remain useful for lifecycle logging. Tool failure
+counting should use `PostToolUseFailure` and feed the same mechanism that
+`stage-3-work-units.ts` checks through `toolFailureCount?(wu.unitKey)`.
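+
+The precedence rules can be sketched as a pure function. `ResultSignals` is a
+simplified stand-in for the SDK result fields named above, not the real
+`SDKResultMessage` type, and treating `"end_turn"` as the only successful
+assistant stop reason is an assumption of this sketch:
+
+```ts
+type RunLoopStopReason = "budget" | "natural" | "error";
+
+// Simplified stand-in for the three typed signals discussed above.
+interface ResultSignals {
+  subtype?: string;
+  terminal_reason?: string;
+  stop_reason?: string | null;
+}
+
+// Assumption: only "end_turn" counts as a successful stop reason here.
+const SUCCESSFUL_STOP_REASONS = new Set(["end_turn"]);
+
+function mapStopReason(signals: ResultSignals): RunLoopStopReason {
+  const { subtype, terminal_reason, stop_reason } = signals;
+  // A max-turn signal from any of the three sources always means "budget".
+  if (
+    subtype === "error_max_turns" ||
+    terminal_reason === "max_turns" ||
+    stop_reason === "max_turns"
+  ) {
+    return "budget";
+  }
+  // 1. subtype, when present, decides.
+  if (subtype !== undefined) return subtype === "success" ? "natural" : "error";
+  // 2. terminal_reason is consulted only when subtype is absent.
+  if (terminal_reason !== undefined) {
+    return terminal_reason === "completed" ? "natural" : "error";
+  }
+  // 3. assistant stop_reason is the last resort.
+  if (stop_reason != null) {
+    return SUCCESSFUL_STOP_REASONS.has(stop_reason) ? "natural" : "error";
+  }
+  // No recognizable signal: fail closed (assumption).
+  return "error";
+}
+```
+
+Checking the max-turn case before the precedence chain keeps a
+`subtype: "success"` result with `terminal_reason: "max_turns"` from being
+reported as `"natural"`.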
+
+For text and object generation, SDK authentication, billing, rate-limit,
+permission, max-turn, structured-output, and execution errors must map to the
+same error surfaces that KTX uses for the Anthropic API-key backend.
+
+## Agent-loop progress callbacks
+
+`RunLoopParams.onStepFinish`
+(`packages/context/src/agent/agent-runner.service.ts:20`) is part of the
+current agent-loop contract. The AI SDK runner increments `stepIndex` on each
+`generateText` step and invokes the callback
+(`agent-runner.service.ts:83-97`). KTX consumers depend on this:
+`packages/context/src/ingest/ingest-bundle.runner.ts:782` emits
+`work_unit_step` events from it, and `:1036` / `:1089` update reconciliation
+progress for the user-visible "Reconciling results · step N" status.
+
+The `claude-code` runner MUST preserve `onStepFinish` semantics:
+
+- It MUST invoke `onStepFinish` exactly once per assistant turn (i.e. once per
+  step the SDK reports), incrementing `stepIndex` starting at 1.
+- The implementation plan MUST name the concrete SDK stream event used as the
+  step boundary, choosing one of the documented assistant/result message
+  events from the pinned SDK version and justifying the choice. The chosen
+  event must produce the same `stepIndex` count as the AI SDK runner for an
+  equivalent run: N tool-using turns yield N callbacks.
+- Callback errors MUST be caught and logged at `warn` level without aborting
+  the loop, matching `agent-runner.service.ts:90-96`.
+- `stepBudget` passed to the callback MUST equal the `maxTurns` configured on
+  the SDK `query()` call.
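+
+A sketch of the callback wrapper these rules imply. The `StepInfo` shape and
+the `createStepNotifier` name are assumptions; the actual
+`RunLoopParams.onStepFinish` signature lives in `agent-runner.service.ts`.
+The runner would call the returned function once per step-boundary event:
+
+```ts
+// Assumed callback payload; the real type may carry more fields.
+interface StepInfo {
+  stepIndex: number;
+  stepBudget: number;
+}
+
+type OnStepFinish = (info: StepInfo) => void | Promise<void>;
+
+function createStepNotifier(
+  stepBudget: number,
+  onStepFinish: OnStepFinish | undefined,
+  warn: (message: string, error: unknown) => void,
+) {
+  let stepIndex = 0;
+  // Increments the counter (starting at 1) and invokes the callback once.
+  return async function notifyStepFinished(): Promise<number> {
+    stepIndex += 1;
+    if (onStepFinish) {
+      try {
+        await onStepFinish({ stepIndex, stepBudget });
+      } catch (error) {
+        // Callback failures are logged and never abort the agent loop.
+        warn("onStepFinish callback failed", error);
+      }
+    }
+    return stepIndex;
+  };
+}
+```
+
+Passing the same `stepBudget` here and as `maxTurns` on the `query()` call
+keeps the reported `stepIndex / stepBudget` ratio consistent.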
+
+Acceptance criteria:
+
+- A `claude-code` agent loop run with `stepBudget: N` produces N
+  `work_unit_step` events when the loop runs to budget.
+- A reconciliation run under `claude-code` produces the same
+  `updateProgress` calls (count and `stepIndex / stepBudget` ratio) as the
+  Anthropic API-key backend for an equivalent fixture.
+- An `onStepFinish` callback that throws does not surface the error as the
+  loop result.
+
+## Prompt caching parity
+
+`packages/llm/src/types.ts:44, :61` exposes `llm.promptCaching` as a config
+field, and the AI SDK message builder
+(`packages/llm/src/message-builder.ts:62-114, :141-218`) applies
+`anthropic.cacheControl: { type: "ephemeral", ttl }` markers to the system
+message, the last history message, and sorted tools, with TTLs split into
+`systemTtl`, `toolsTtl`, and `historyTtl`. `model-provider.test.ts:276`
+verifies caching is enabled by default with those three TTLs.
+
+The Agent SDK does not expose KTX's marker-based contract. The closest
+mechanism is `systemPrompt: string[]` with
+`SYSTEM_PROMPT_DYNAMIC_BOUNDARY` (`sdk.d.ts:1746-1799`), which marks a static
+prefix as cacheable but provides no per-tool, per-history, or per-TTL knobs.
+
+For the `claude-code` backend, the spec treats `llm.promptCaching` as
+**partial parity**:
+
+- The Claude runtime MAY map a non-empty static system prefix to a cacheable
+  `systemPrompt` array using `SYSTEM_PROMPT_DYNAMIC_BOUNDARY` when
+  `cacheSystem` is enabled in the resolved `KtxPromptCachingConfig`. The
+  implementation plan decides whether to ship this mapping in the first pass
+  or defer it.
+- `cacheTools`, `cacheHistory`, and the `systemTtl` / `toolsTtl` /
+  `historyTtl` fields have no Agent SDK equivalent. The runtime MUST NOT
+  silently drop them: when a user sets non-default values under
+  `llm.promptCaching` and the backend is `claude-code`, status/doctor and the
+  setup wizard MUST surface that these fields are ignored on this backend.
+- Docs under `docs-site/content/docs/` MUST document this divergence in the
+  same pages that describe `claude-code` setup, so users do not assume the
+  TTL/tool/history knobs apply.
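+
+One way to detect the ignored fields for status/doctor output is sketched
+below. The field names follow the text above; the TTL value type, the helper
+name, and passing the defaults in as a parameter are assumptions:
+
+```ts
+// Subset of the prompt-caching config with no Agent SDK equivalent.
+// Assumption: TTLs are represented as strings.
+interface IgnoredPromptCachingFields {
+  cacheTools: boolean;
+  cacheHistory: boolean;
+  systemTtl: string;
+  toolsTtl: string;
+  historyTtl: string;
+}
+
+const CLAUDE_CODE_IGNORED_FIELDS = [
+  "cacheTools",
+  "cacheHistory",
+  "systemTtl",
+  "toolsTtl",
+  "historyTtl",
+] as const;
+
+// Returns the names of fields whose configured value differs from the
+// default, so the caller can warn that they are ignored on `claude-code`.
+function findIgnoredPromptCachingFields(
+  configured: IgnoredPromptCachingFields,
+  defaults: IgnoredPromptCachingFields,
+): string[] {
+  return CLAUDE_CODE_IGNORED_FIELDS.filter(
+    (field) => configured[field] !== defaults[field],
+  );
+}
+```
+
+An empty result means the config uses defaults and no warning is needed.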
+
+Acceptance criteria:
+
+- A `claude-code` runtime constructed from a config with default
+  `promptCaching` does not throw and does not pass KTX `cacheControl`
+  markers to the Agent SDK (the AI-SDK-only markers stay on the AI SDK
+  path).
+- A `claude-code` runtime constructed from a config with non-default
+  `promptCaching` values yields a warning surfaced through doctor/status
+  output identifying the ignored fields.
+
+## Auth and setup
+
+`ktx setup`, status, and doctor flows must validate that Claude Code SDK auth is
+usable, not just that `~/.claude/` exists. Acceptable validation strategies:
+
+- A minimal SDK probe call with `settingSources: []`, `skills: []`,
+  `plugins: []`, `tools: []`, `persistSession: false`, no `mcpServers`,
+  `env: ktxClaudeCodeEnv`, and `maxTurns: 1`. The probe MUST NOT rely on
+  the SDK's documented default for any of these fields, because the default
+  for `settingSources` is `["user", "project", "local"]` (loads filesystem
+  settings) and the default for `env` is `process.env` (can route auth
+  through `ANTHROPIC_API_KEY` or other provider credentials and hide a
+  missing local Claude Code session). See "Agent SDK environment and auth
+  boundary" above for the `env` denylist.
+  The auth probe MUST tolerate init messages with non-empty `slash_commands`,
+  `skills`, and `agents` when `message.tools` is empty, `message.mcp_servers`
+  is empty, `message.plugins` is empty, and the query options contain the KTX
+  isolation tuple. Host discovery metadata is not an auth failure.
+- An SDK-provided account/auth status method if the pinned version exposes one.
+- A docs-endorsed file-presence check only if the official SDK docs explicitly
+  state that it proves auth usability.
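+
+The init-message tolerance rule for the probe can be sketched as a check
+that inspects only the controlled surfaces. `InitSurface` is a simplified
+stand-in for the SDK init message, and it assumes `mcp_servers` has already
+been reduced to server names; the real shape must be confirmed against the
+pinned SDK:
+
+```ts
+// Simplified view of the init-message fields this spec names.
+interface InitSurface {
+  tools: string[];
+  mcp_servers: string[];
+  plugins: string[];
+  slash_commands?: string[];
+  skills?: string[];
+  agents?: string[];
+}
+
+function sameSet(actual: readonly string[], expected: readonly string[]): boolean {
+  const a = [...actual].sort();
+  const b = [...expected].sort();
+  return a.length === b.length && a.every((value, i) => value === b[i]);
+}
+
+// Returns the names of controlled surfaces that do not match expectations.
+// For the auth probe, `expectedToolIds` is empty.
+function findInitSurfaceViolations(
+  init: InitSurface,
+  expectedToolIds: readonly string[],
+): string[] {
+  const violations: string[] = [];
+  const expectedServers = expectedToolIds.length > 0 ? ["ktx"] : [];
+  if (!sameSet(init.tools, expectedToolIds)) violations.push("tools");
+  if (!sameSet(init.mcp_servers, expectedServers)) violations.push("mcp_servers");
+  if (init.plugins.length > 0) violations.push("plugins");
+  // slash_commands, skills, and agents are host discovery metadata and are
+  // deliberately not inspected.
+  return violations;
+}
+```
+
+The same check would serve agent loops by passing the generated
+`mcp__ktx__<toolName>` ids as `expectedToolIds`.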
+
+Failure copy should tell the user to authenticate Claude Code locally with the
+Claude Code CLI, then rerun setup or the command they attempted.
+
+## Documentation impact
+
+Docs updates are required because this changes user-visible setup and LLM
+provider behavior:
+
+- `docs-site/content/docs/getting-started/quickstart.mdx`
+- `docs-site/content/docs/cli-reference/ktx-setup.mdx`
+- `docs-site/content/docs/guides/building-context.mdx`
+- Any config reference page that documents `llm.provider.backend`
+- Any status or doctor docs that describe LLM readiness
+
+The docs must say that `claude-code` uses the user's own local Claude Code
+session. Do not describe it as a way for KTX to resell, pool, or productize
+Claude subscription limits.
+
+## Verified evidence
+
+- Current `KtxLlmProvider` returns AI SDK `LanguageModel` instances and only
+  supports `anthropic`, `vertex`, and `gateway`
+  (`packages/llm/src/types.ts`, `packages/llm/src/model-provider.ts`).
+- Project config currently accepts `llm.provider.backend: none | anthropic |
+  vertex | gateway` (`packages/context/src/project/config.ts`).
+- `generateKtxText` and `generateKtxObject` are shared non-agent generation
+  helpers (`packages/context/src/llm/generation.ts`).
+- `AgentRunnerService` is the shared AI SDK agent-loop implementation
+  (`packages/context/src/agent/agent-runner.service.ts`).
+- Page triage and light extraction currently use raw `KtxLlmProvider`
+  (`packages/context/src/ingest/page-triage/page-triage.service.ts`).
+- Scan/enrichment internals currently use `createLocalKtxLlmProviderFromConfig`,
+  `generateKtxText`, and `generateKtxObject`
+  (`packages/context/src/scan/local-scan.ts`,
+  `packages/context/src/scan/description-generation.ts`,
+  `packages/context/src/scan/relationship-llm-proposal.ts`).
+- Local ingest and MCP local project ports inject `llmProvider` and
+  `agentRunner` today (`packages/context/src/ingest/local-bundle-runtime.ts`,
+  `packages/context/src/mcp/local-project-ports.ts`).
+- The Agent SDK TypeScript reference (`@anthropic-ai/claude-agent-sdk@0.3.142`,
+  `sdk.d.ts:1690-1697` and the `sdk.mjs` runtime default
+  `["user","project","local"]`) documents `settingSources` **defaulting to
+  loading user, project, and local filesystem settings** when omitted; passing
+  `[]` is the explicit opt-out ("SDK isolation mode"). The same reference
+  documents `allowedTools` as auto-approval rather than restriction,
+  `canUseTool` as the programmatic permission handler,
+  `permissionMode: "dontAsk"`, `tools` as the base built-in set with `[]`
+  meaning "disable all built-ins" and no MCP-id support, `disallowedTools`,
+  `maxTurns`, `mcpServers`, `cwd`, `persistSession`, and SDK result/hook
+  message shapes.
+- `SDKResultMessage = SDKResultSuccess | SDKResultError` in
+  `@anthropic-ai/claude-agent-sdk@0.3.142` (`sdk.d.ts`); both variants expose
+  an optional `terminal_reason: TerminalReason`, where `TerminalReason`
+  includes `'max_turns' | 'completed'` alongside other terminal reasons.
+- The Agent SDK MCP docs and SDK examples (e.g. Context7
+  `/nothflare/claude-agent-sdk-docs` custom-tools guide) show registering MCP
+  servers in `query()` options and listing exact `mcp__<server>__<tool>` ids
+  in `allowedTools`; no SDK doc or type currently documents a wildcard form.
+- BaseTool's `toModelOutput` already sends only `markdown` to the model while
+  preserving structured output for callers
+  (`packages/context/src/tools/base-tool.ts:154-162`); some raw AI SDK tools
+  in `packages/context/src/ingest/ingest-bundle.runner.ts:697-721, :924-936`
+  and `packages/context/src/memory/memory-agent.service.ts:128-152` currently
+  return bare strings or plain objects and must be normalized at the
+  descriptor boundary so both backends preserve the contract.
+- The Agent SDK skills docs say the `skills` option is a context filter rather
+  than a sandbox. KTX must pass `skills: []`, but must not assert that
+  `message.skills` is empty in the SDK init message.
+- `Options.env` in `@anthropic-ai/claude-agent-sdk@0.3.142`
+  (`sdk.d.ts:1265-1279`) is the environment passed to the Claude Code
+  process and defaults to `process.env`. Without an explicit `env`, the SDK
+  inherits the parent environment, including any provider-routing variables
+  (`ANTHROPIC_API_KEY`, Vertex/Bedrock credentials, gateway tokens) that
+  could change the active authentication source of the Claude Code CLI and
+  hide a missing local Claude Code session.
+
+## Open items for the implementation plan
+
+1. Confirm exact TypeScript option names and result-message discriminants
+   against the pinned `@anthropic-ai/claude-agent-sdk` version.
+2. Define the final `KtxLlmRuntimePort` file location and package exports.
+3. Define model alias validation for `sonnet`, `opus`, `haiku`, and full model
+   IDs.
+4. Define the auth probe and make setup/status/doctor report actionable
+   messages.
+5. Run a repo-wide audit for all LLM call sites and migrate each one to the
+   runtime boundary.
+6. Write tests proving `claude-code` works for text generation, structured
+   object generation, and agent-loop execution.
+7. Write tests proving page triage, scan/enrichment internals, memory capture,
+   MCP-triggered local ingest, and normal local ingest all use the
+   `claude-code` runtime when configured.
+8. Write tests proving a raw built-in Claude Code tool request is denied,
+   host-discovered Skill/Agent/SlashCommand requests are denied by `canUseTool`,
+   and only exact `mcp__ktx__<toolName>` ids are allowed during KTX agent loops.
+9. Write a test that asserts every KTX-originated `query()` invocation
+   (agent loop, text generation, object generation, auth probe) is called
+   with `settingSources: []`, `skills: []`, `plugins: []`, `tools: []`, and
+   `persistSession: false`, by spying on the SDK entry point. The test must
+   fail if any path falls back to SDK defaults for those fields. The test must
+   also prove that non-empty host-discovered `slash_commands`, `skills`, and
+   `agents` in the init message do not fail the auth probe or runtime when the
+   controlled tool, MCP server, and plugin surfaces match KTX expectations.
+10. Write a test that asserts `onStepFinish` is invoked the expected number
+    of times for a fixed-budget `claude-code` agent loop, including the
+    work-unit and reconciliation progress paths.
+11. Write a test that asserts every KTX-originated `query()` invocation
+    (agent loop, text generation, object generation, auth probe) is called
+    with an explicit `env` and that none of the denylisted provider-routing
+    variables (`ANTHROPIC_API_KEY`, `ANTHROPIC_AUTH_TOKEN`,
+    `ANTHROPIC_BASE_URL`, `ANTHROPIC_MODEL`, `ANTHROPIC_VERTEX_PROJECT_ID`,
+    `CLOUD_ML_REGION`, `GOOGLE_APPLICATION_CREDENTIALS`,
+    `GOOGLE_CLOUD_PROJECT`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`,
+    `AWS_SESSION_TOKEN`, `AWS_REGION`, `AWS_PROFILE`,
+    `CLAUDE_CODE_USE_BEDROCK`, `CLAUDE_CODE_USE_VERTEX`) are present in
+    that env, by seeding each variable in a fake `process.env`. The test
+    must also assert that the auth probe still fails when
+    `ANTHROPIC_API_KEY` is set in `process.env` but no local Claude Code
+    session exists.