mirror of
https://github.com/Kaelio/ktx.git
synced 2026-06-22 08:38:08 +02:00
feat: add claude-code llm backend with runtime port (#115)
* docs: revise claude-code ingest backend spec
* docs: keep claude-code spec focused on ingest
* docs: expand claude-code spec to full llm parity
* Refine claude-code backend spec after adversarial review iteration 1
* Refine claude-code backend spec after adversarial review iteration 2
* Refine claude-code backend spec after adversarial review iteration 3
* feat: recognize claude-code llm backend
* feat: add ktx llm runtime port
* feat: add claude-code llm runtime
* feat: route non-agent llm calls through runtime
* feat: run ingest agents through llm runtime
* feat: support claude-code setup and status
* test: verify claude-code backend runtime
* docs: add claude-code backend v1 runtime plan
* fix: close claude-code runtime isolation checks
* fix: warn on claude-code prompt caching during setup
* chore: verify claude-code v1 closure
* docs: add claude-code backend v1 isolation closure plan
* fix: update claude-code ingest setup guidance
* docs: add claude-code backend v1 ingest guidance closure plan
* docs: align claude-code isolation spec with sdk metadata
* test: cover claude-code host discovery metadata
* fix: tolerate claude-code host discovery metadata
* docs: clarify claude-code host discovery metadata
* docs: add claude-code auth-probe isolation fix plan
* chore: prepare kaelio ktx rc1 release
* chore: add semantic release workflow
* fix: unblock ci checks
* chore(release): 0.1.0-rc.1
* feat: add Claude Code model selection to setup
* fix: keep git maintenance attached in local repos
parent e6d578c03f
commit b565e44a22
109 changed files with 10218 additions and 1093 deletions

99 docs/release.md Normal file
@ -0,0 +1,99 @@
# KTX release runbook

This runbook covers the maintainer workflow for publishing `@kaelio/ktx` to
npm through GitHub Actions. The workflow uses semantic-release to choose the
next version, update release metadata, publish the package, create the GitHub
release, and commit the release files back to the repository.

## Release channels

KTX has two npm release channels:

- `rc` publishes prereleases such as `0.1.0-rc.2` to the npm `next` tag.
- `stable` publishes normal releases such as `0.1.0` to the npm `latest` tag.

Run stable releases only from `main`. The workflow rejects stable releases from
other branches.

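The channel rules above can be summarized in a short TypeScript sketch. The function name `resolveNpmTag` and its error message are hypothetical illustrations, not part of the KTX workflow; only the `rc`/`stable` kinds, the `next`/`latest` tags, and the `main`-only rule come from this runbook.

```typescript
type ReleaseKind = 'rc' | 'stable';
type NpmDistTag = 'next' | 'latest';

// Hypothetical helper: maps a release kind to the npm dist-tag described above.
function resolveNpmTag(kind: ReleaseKind, branch: string): NpmDistTag {
  if (kind === 'stable') {
    // Stable releases are only accepted from main.
    if (branch !== 'main') {
      throw new Error(`stable releases must run from main, got "${branch}"`);
    }
    return 'latest';
  }
  // rc prereleases may run from any branch and go to the next tag.
  return 'next';
}
```

Under these assumptions, `resolveNpmTag('rc', 'some-branch')` yields `next`, while a `stable` request from any branch other than `main` throws instead of publishing.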
## Prerequisites

Before you publish, confirm these requirements:

- The repository has an Actions secret named `NPM_TOKEN`.
- `NPM_TOKEN` is a granular npm token that can publish `@kaelio/ktx`.
- The token can publish non-interactively if the npm account or package uses
  two-factor authentication for writes.
- The repository has a baseline semantic-release tag for the latest published
  package version, such as `v0.1.0-rc.1`.

If no baseline tag exists, semantic-release treats the run as the first release
and may choose a version that doesn't match the currently published package.

## Dry-run a release

Use a dry-run to verify the next version and generated release notes without
publishing to npm.

1. Open **Actions** in GitHub.
2. Select **KTX Release**.
3. Select the branch to release from.
4. Set **release_kind** to `rc` or `stable`.
5. Leave **publish_live** set to `false`.
6. Optional: Set **force_release** to `true` when you need a patch release even
   if semantic-release doesn't find a releasable commit.
7. Run the workflow.

The dry-run uses the same semantic-release configuration as a live release. It
doesn't publish to npm and doesn't commit release files.

## Publish an rc release

Publish an rc release when you need a prerelease package for validation before
promoting to `latest`.

1. Open **Actions** in GitHub.
2. Select **KTX Release**.
3. Select the branch to release from.
4. Set **release_kind** to `rc`.
5. Set **publish_live** to `true`.
6. Optional: Set **force_release** to `true`.
7. Run the workflow.

The workflow publishes `@kaelio/ktx` with `--access public --tag next`, runs the
published package smoke test, creates a GitHub release, and commits
`CHANGELOG.md`, `package.json`, and `release-policy.json`.

## Publish a stable release

Publish a stable release from `main` after you have validated an rc package.

1. Open **Actions** in GitHub.
2. Select **KTX Release**.
3. Select `main`.
4. Set **release_kind** to `stable`.
5. Set **publish_live** to `true`.
6. Optional: Set **force_release** to `true`.
7. Run the workflow.

The workflow publishes `@kaelio/ktx` with `--access public --tag latest`, runs
the published package smoke test, creates a GitHub release, and commits the
release metadata.

## Release metadata

semantic-release calls `scripts/update-public-release-version.mjs` during the
prepare step. That script updates:

- `package.json` with the semantic-release version.
- `release-policy.json` with `publicNpmPackageVersion`, npm publish settings,
  and the published package smoke-test version.

The artifact packaging and readiness scripts read `publicNpmPackageVersion`
from `release-policy.json`, so manual version edits in build scripts aren't
needed for rc releases.

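As a rough illustration of the prepare step described above, the sketch below applies one version to both metadata objects. It is not the contents of `scripts/update-public-release-version.mjs`: `applyReleaseVersion` and the two interfaces are hypothetical, and only the `version` and `publicNpmPackageVersion` fields are taken from this runbook. The npm publish settings and smoke-test version fields are omitted because their names are not given here.

```typescript
// Hypothetical minimal shapes; the real files contain more fields.
interface PackageJson {
  name: string;
  version: string;
  [key: string]: unknown;
}

interface ReleasePolicy {
  publicNpmPackageVersion: string;
  [key: string]: unknown;
}

// Returns updated copies so that both files agree on the release version.
function applyReleaseVersion(
  pkg: PackageJson,
  policy: ReleasePolicy,
  version: string,
): { pkg: PackageJson; policy: ReleasePolicy } {
  return {
    pkg: { ...pkg, version },
    policy: { ...policy, publicNpmPackageVersion: version },
  };
}

const updated = applyReleaseVersion(
  { name: '@kaelio/ktx', version: '0.1.0-rc.1' },
  { publicNpmPackageVersion: '0.1.0-rc.1' },
  '0.1.0-rc.2',
);
```

Keeping a single source for the version is what lets downstream scripts read `publicNpmPackageVersion` instead of carrying their own copies.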
## Trusted Publishing follow-up

This workflow uses `NPM_TOKEN` today. Move to npm Trusted Publishing after the
final publish command path is verified for the package manager and workflow
filename configured in npm package settings.

@ -0,0 +1,678 @@
# Claude Code Auth Probe Isolation Fix Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Make the `claude-code` auth probe and runtime tolerate host-discovered
Claude Code init metadata while preserving KTX-owned tool, MCP, and plugin
restrictions.

**Architecture:** Keep the existing Claude Code runtime and SDK option tuple.
Change the init-message assertion from "no host discovery appears" to "only the
KTX-controlled execution surface is active." Align the design spec and user docs
with the pinned SDK behavior: `settingSources: []` disables filesystem settings,
`skills: []` is a context filter, and deny-by-default `canUseTool` is the
runtime enforcement boundary.

**Tech Stack:** TypeScript, pnpm, Vitest, Markdown, Fumadocs MDX,
`@anthropic-ai/claude-agent-sdk@0.3.142`.

---

## Audit result

The current strict isolation assertion is a v1-blocking bug. A real authenticated
Claude Code host can report non-empty `slash_commands`, `skills`, and `agents`
in the SDK init message even when KTX passes `settingSources: []`, `skills: []`,
`plugins: []`, `tools: []`, exact KTX MCP `allowedTools`, `disallowedTools`, and
deny-by-default `canUseTool`.

Spec findings:

- `docs/superpowers/specs/2026-05-15-claude-code-backend-design.md:45-47`
  requires host-discovered capabilities not to expand the KTX agent-loop tool
  surface. That requirement is about invocation, not necessarily about zero
  diagnostic metadata in the init message.
- `docs/superpowers/specs/2026-05-15-claude-code-backend-design.md:254-265`
  overreaches by asking the implementation to assert that unexpected
  settings-derived commands, skills, agents, plugins, or MCP servers are
  inactive from the SDK init message. In `@anthropic-ai/claude-agent-sdk@0.3.142`,
  the available SDK controls cannot make `message.slash_commands`,
  `message.skills`, or `message.agents` reliably empty on an authenticated host.
- `docs/superpowers/specs/2026-05-15-claude-code-backend-design.md:266-267`
  says skills are disabled with `skills: []`. The pinned SDK type definitions
  document `skills` as a context filter, not a sandbox.
- `docs/superpowers/specs/2026-05-15-claude-code-backend-design.md:543-545`
  correctly requires the auth probe to pass the isolation option tuple and no
  MCP servers. It does not require failing when host discovery metadata is
  present.

SDK evidence from
`node_modules/.pnpm/@anthropic-ai+claude-agent-sdk@0.3.142_zod@4.4.3/node_modules/@anthropic-ai/claude-agent-sdk/sdk.d.ts`:

- Lines `1686-1695`: `settingSources: []` disables filesystem settings only.
- Lines `1697-1718`: `skills: []` is a context filter; unlisted skills are
  hidden from listing and rejected by the Skill tool, but files remain on disk.
- Lines `1202-1213`: `allowedTools` is auto-approval, while `canUseTool` is the
  permission handler for controlling tool execution.
- Lines `1224-1228`: `disallowedTools` removes listed tools from context and
  prevents use.
- Lines `1255-1264`: `tools: []` disables built-in tools.
- Lines `1545-1558`: `plugins` loads plugins when supplied; KTX supplies `[]`.
- Lines `3465-3489`: the init message reports `agents`, `tools`,
  `mcp_servers`, `slash_commands`, `skills`, and `plugins`.

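The option tuple cited in the SDK evidence can be written out as a plain object. This is a sketch only: `IsolationOptions` and `buildIsolationOptions` are hypothetical names with local stand-in types, not the SDK's own option types, and the `canUseTool` handler is left out for brevity.

```typescript
// Local stand-in for the subset of query options named in this plan.
interface IsolationOptions {
  settingSources: string[]; // [] = do not load filesystem settings
  skills: string[]; // [] = context filter that lists no skills
  plugins: unknown[]; // [] = load no plugins
  tools: string[]; // [] = disable built-in tools
  allowedTools: string[]; // auto-approval list, not a restriction
  disallowedTools: string[]; // removed from context and blocked
  permissionMode: 'dontAsk';
  persistSession: false;
}

// Hypothetical builder: every call site gets the same empty isolation fields.
function buildIsolationOptions(
  exactKtxToolIds: string[],
  builtInToolNames: string[],
): IsolationOptions {
  return {
    settingSources: [],
    skills: [],
    plugins: [],
    tools: [],
    allowedTools: [...exactKtxToolIds],
    disallowedTools: [...builtInToolNames],
    permissionMode: 'dontAsk',
    persistSession: false,
  };
}

// Auth probe: no KTX tools exposed. Agent loop: exact MCP tool ids only.
const probeOptions = buildIsolationOptions([], ['Bash', 'Read', 'Write']);
const agentOptions = buildIsolationOptions(['mcp__ktx__load_skill'], ['Bash', 'Read', 'Write']);
```

None of these fields promises an empty `slash_commands`, `skills`, or `agents` list in the init message, which is why the assertion has to target the tool, MCP server, and plugin surfaces instead.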
Implemented plan audit:

- `2026-05-15-claude-code-backend-v1-runtime.md` is implemented for config,
  runtime port, SDK dependency, model aliases, environment scrubbing, Claude Code
  text/object/agent execution, setup/status/doctor support, docs, and LLM
  call-site migration.
- `2026-05-15-claude-code-backend-v1-isolation-closure.md` is implemented, but
  it converted the spec's ambiguous "assert inactive" line into an impossible
  assertion against non-empty `slash_commands`, `skills`, and `agents`.
- `2026-05-15-claude-code-backend-v1-ingest-guidance-closure.md` is implemented
  for the ingest missing-LLM guidance and associated CLI/context tests.

Remaining v1-blocking gaps:

- `packages/context/src/llm/claude-code-runtime.ts:94-101` throws on
  host-discovered slash commands, skills, and agents.
- `packages/context/src/llm/claude-code-runtime.test.ts:158-178` encodes the
  wrong behavior by requiring the runtime to reject any init message with
  discovered agents.
- The auth probe has no regression coverage for an authenticated host whose init
  message reports non-empty `slash_commands`, `skills`, and `agents`.
- User docs under `docs-site/content/docs/guides/` say KTX "disables" skills,
  agents, hooks, and slash commands. That wording is stronger than the SDK
  contract and must be changed to "not invokable by KTX agent loops."

Non-blocking gaps:

- Same-step AI SDK tool-call repair parity remains out of scope for v1.
- OTEL telemetry parity remains out of scope for v1.
- Embedding parity remains out of scope because embeddings are configured
  separately.
- Full prompt-caching parity remains out of scope. V1 keeps warning on ignored
  prompt-cache fields and avoids AI SDK cache markers on the Claude Code path.

Decision:

- Choose option (a): relax the assertion in code and align the spec text. Do not
  rely on an invented SDK mechanism. The pinned type definitions expose
  `settingSources`, `skills`, `plugins`, `tools`, `allowedTools`,
  `disallowedTools`, and `canUseTool`, but they do not expose a query option that
  disables all host-discovered slash commands or user-level subagent names in the
  init message.

## File structure

Modify these files:

- `docs/superpowers/specs/2026-05-15-claude-code-backend-design.md` aligns the
  design with the real SDK contract.
- `packages/context/src/llm/claude-code-runtime.test.ts` adds the failing
  regression tests for auth probe and runtime init metadata.
- `packages/context/src/llm/claude-code-runtime.ts` relaxes init metadata checks
  while tightening exact tool equality.
- `docs-site/content/docs/guides/llm-configuration.mdx` changes user docs from
  "disabled" to "not invokable."
- `docs-site/content/docs/guides/building-context.mdx` applies the same
  user-facing wording at the ingest guide boundary.

### Task 1: Align the design spec with SDK reality

**Files:**

- Modify: `docs/superpowers/specs/2026-05-15-claude-code-backend-design.md`

- [ ] **Step 1: Update the tool-boundary goal**

Replace the goal bullet at lines `45-47` with:

```markdown
- Preserve KTX's curated tool boundaries. Claude Code built-ins,
  filesystem-discovered MCP servers, hooks, skills, plugins, agents, and slash
  commands must not become invokable in KTX agent loops. The Agent SDK init
  message may still report host-discovered slash commands, skills, and agents;
  KTX treats that metadata as diagnostic only and restricts execution through
  `tools: []`, exact KTX MCP `allowedTools`, `disallowedTools`, and
  deny-by-default `canUseTool`.
```

- [ ] **Step 2: Replace the over-broad init assertion requirement**

Replace the bullet at lines `254-265` with:

```markdown
- Filesystem settings are not loaded. The SDK's documented default for an
  omitted `settingSources` is `["user", "project", "local"]`
  (`@anthropic-ai/claude-agent-sdk@0.3.142` `sdk.d.ts:1686-1695`),
  which would inherit the user's Claude Code filesystem settings. Every KTX
  `query()` call site - agent loops, text generation, object generation, and
  the auth probe - MUST pass `settingSources: []` explicitly, along with
  `skills: []`, `plugins: []`, `tools: []`, `persistSession: false`, and no
  `mcpServers` entries other than the KTX MCP server (omitted entirely when
  the call site does not expose tools). The implementation MUST assert from
  the SDK init message that the controlled execution surface matches KTX's
  expectations:

  - `message.tools` equals the exact generated KTX MCP tool ids for the current
    call.
  - `message.mcp_servers` equals the expected KTX MCP server set: `[]` when the
    call exposes no tools, or `["ktx"]` when it does.
  - `message.plugins` is empty.

  The implementation MUST NOT reject a run solely because
  `message.slash_commands`, `message.skills`, or `message.agents` contain
  host-discovered names. In `@anthropic-ai/claude-agent-sdk@0.3.142`, those
  fields can report host discovery even when KTX passes the isolation options.
  They are not part of the KTX execution surface when `tools: []`,
  `allowedTools`, `disallowedTools`, and deny-by-default `canUseTool` are set.
```

- [ ] **Step 3: Replace the skills/plugin wording**

Replace the bullets at lines `266-289` with:

```markdown
- `skills: []` is a context filter in the pinned SDK
  (`sdk.d.ts:1697-1718`): unlisted skills are hidden from the model's skill
  listing and rejected by the Skill tool, but discovered skill names may still
  appear in init metadata. KTX must still pass `skills: []`.
- Plugins are disabled with `plugins: []`, and the runtime asserts that
  `message.plugins` is empty in the init message.
- Built-in tools are disabled by setting `tools: []`. The pinned SDK type
  (`@anthropic-ai/claude-agent-sdk@0.3.142`, `sdk.d.ts:1255-1264`) documents
  `tools` as the base set of built-in tools, with `[]` meaning "disable all
  built-ins"; `tools` does not accept MCP tool ids and cannot be used to
  restrict MCP availability.
- MCP tool availability is granted by registering the KTX MCP server through
  `mcpServers`. The SDK does not document a wildcard like `mcp__ktx__*` for
  any tool field; KTX must enumerate exact generated MCP tool ids of the form
  `mcp__ktx__<toolName>` (derived from the tool map handed to
  `createSdkMcpServer`) wherever a list of tool ids is required.
- Pre-approval under `permissionMode: "dontAsk"` is configured by listing those
  same exact `mcp__ktx__<toolName>` ids in `allowedTools` (documented as
  auto-allow without prompting). Treat `allowedTools` as auto-approval, not
  restriction.
- Defense-in-depth restriction uses `canUseTool`. The KTX runtime supplies a
  `canUseTool` handler that allows only tool names in the current KTX MCP tool
  map and denies everything else, so host-discovered slash commands, skills,
  agents, future SDK defaults, or a misconfigured MCP server cannot expand the
  execution surface.
- `disallowedTools` MUST additionally list the current built-in tool names
  (`Agent`, `Task`, `AskUserQuestion`, `Bash`, `Read`, `Edit`, `Write`, `Glob`,
  `Grep`, `WebFetch`, `WebSearch`, `TodoWrite`) as redundant insurance.
```

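The exact-id enumeration and deny-by-default rule in the replacement text can be sketched as follows. The names `toKtxMcpToolId`, `decideToolUse`, and `createCanUseTool` are hypothetical, and `PermissionDecision` is a simplified local type rather than the SDK's permission result; the real handler also receives the tool input and an options argument.

```typescript
// Simplified stand-in for the SDK's permission result.
type PermissionDecision =
  | { behavior: 'allow' }
  | { behavior: 'deny'; message: string };

// Builds the exact generated id; no wildcard such as mcp__ktx__* is used.
function toKtxMcpToolId(toolName: string): string {
  return `mcp__ktx__${toolName}`;
}

// Deny by default: only ids derived from the current KTX tool map pass.
function decideToolUse(allowedIds: Set<string>, requested: string): PermissionDecision {
  if (allowedIds.has(requested)) {
    return { behavior: 'allow' };
  }
  return { behavior: 'deny', message: `${requested} is not in the current KTX MCP tool map` };
}

// Async wrapper in the shape a canUseTool-style callback would take.
function createCanUseTool(toolNames: string[]) {
  const allowedIds = new Set(toolNames.map(toKtxMcpToolId));
  return async (requested: string): Promise<PermissionDecision> =>
    decideToolUse(allowedIds, requested);
}

const allowedIds = new Set(['load_skill'].map(toKtxMcpToolId));
const ktxDecision = decideToolUse(allowedIds, 'mcp__ktx__load_skill');
const skillDecision = decideToolUse(allowedIds, 'Skill');
```

Because the decision depends only on membership in the generated id set, a host-discovered `Skill`, `Agent`, or `SlashCommand` request is denied without the runtime needing to know those names in advance.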
- [ ] **Step 4: Update auth probe acceptance text**

After the auth probe option list at lines `543-545`, add:

```markdown
The auth probe MUST tolerate init messages with non-empty
`slash_commands`, `skills`, and `agents` when `message.tools` is empty,
`message.mcp_servers` is empty, `message.plugins` is empty, and the query
options contain the KTX isolation tuple. Host discovery metadata is not an
auth failure.
```

- [ ] **Step 5: Update verified evidence and open items**

Replace lines `621-623` with:

```markdown
- The Agent SDK skills docs say the `skills` option is a context filter rather
  than a sandbox. KTX must pass `skills: []`, but must not assert that
  `message.skills` is empty in the SDK init message.
```

Replace open item `8` at lines `648-649` with:

```markdown
8. Write tests proving a raw built-in Claude Code tool request is denied,
   host-discovered Skill/Agent/SlashCommand requests are denied by `canUseTool`,
   and only exact `mcp__ktx__*` tools are allowed during KTX agent loops.
```

Replace open item `9` at lines `650-654` with:

```markdown
9. Write a test that asserts every KTX-originated `query()` invocation
   (agent loop, text generation, object generation, auth probe) is called
   with `settingSources: []`, `skills: []`, `plugins: []`, `tools: []`, and
   `persistSession: false`, by spying on the SDK entry point. The test must
   fail if any path falls back to SDK defaults for those fields. The test must
   also prove that non-empty host-discovered `slash_commands`, `skills`, and
   `agents` in the init message do not fail the auth probe or runtime when the
   controlled tool, MCP server, and plugin surfaces match KTX expectations.
```

- [ ] **Step 6: Commit the spec alignment**

Run:

```bash
git add docs/superpowers/specs/2026-05-15-claude-code-backend-design.md
git commit -m "docs: align claude-code isolation spec with sdk metadata"
```

Expected: the design spec no longer requires zero host-discovery metadata in
the SDK init message.

### Task 2: Add regression tests for host-discovered init metadata

**Files:**

- Modify: `packages/context/src/llm/claude-code-runtime.test.ts`

- [ ] **Step 1: Replace the invalid agent rejection test**

In `packages/context/src/llm/claude-code-runtime.test.ts`, replace the test named
`rejects settings-derived agents and non-KTX MCP servers from init messages`
with these tests:

```ts
it('treats host-discovered commands skills and agents as non-fatal init metadata for text and auth probe', async () => {
  const hostDiscoveredInit = initMessage({
    slash_commands: ['/help', '/compact', '/clear', '/user-command'],
    skills: ['pdf', 'docx'],
    agents: ['claude', 'Explore', 'general-purpose'],
  });
  const textQuery = vi.fn((_input: any) =>
    stream([hostDiscoveredInit, resultMessage({ result: 'hello' })]),
  );
  const runtime = new ClaudeCodeKtxLlmRuntime({
    projectDir: '/tmp/project',
    modelSlots: { default: 'sonnet' },
    query: textQuery,
    env: { ANTHROPIC_API_KEY: 'sk-ant-test', PATH: '/usr/bin' }, // pragma: allowlist secret
  });

  await expect(runtime.generateText({ role: 'default', prompt: 'say hello' })).resolves.toBe('hello');
  const textOptions = textQuery.mock.calls[0][0].options;
  expect(textOptions).toMatchObject({
    settingSources: [],
    skills: [],
    plugins: [],
    tools: [],
    allowedTools: [],
    permissionMode: 'dontAsk',
    persistSession: false,
    env: expect.not.objectContaining({ ANTHROPIC_API_KEY: 'sk-ant-test' }),
  });
  expect(textOptions.disallowedTools).toEqual(expect.arrayContaining(['Agent', 'Task', 'Bash']));
  expect(await textOptions.canUseTool('Agent', {}, { signal: new AbortController().signal, toolUseID: 'agent' })).toMatchObject({
    behavior: 'deny',
    toolUseID: 'agent',
  });
  expect(await textOptions.canUseTool('Skill', {}, { signal: new AbortController().signal, toolUseID: 'skill' })).toMatchObject({
    behavior: 'deny',
    toolUseID: 'skill',
  });
  expect(
    await textOptions.canUseTool('SlashCommand', {}, { signal: new AbortController().signal, toolUseID: 'slash' }),
  ).toMatchObject({
    behavior: 'deny',
    toolUseID: 'slash',
  });

  const probeQuery = vi.fn((_input: any) =>
    stream([hostDiscoveredInit, resultMessage({ result: 'ok' })]),
  );
  await expect(
    runClaudeCodeAuthProbe({
      projectDir: '/tmp/project',
      model: 'sonnet',
      query: probeQuery,
      env: { ANTHROPIC_AUTH_TOKEN: 'token', HOME: '/Users/test' },
    }),
  ).resolves.toEqual({ ok: true });
  expect(probeQuery.mock.calls[0][0].options).toMatchObject({
    settingSources: [],
    skills: [],
    plugins: [],
    tools: [],
    allowedTools: [],
    permissionMode: 'dontAsk',
    persistSession: false,
    env: expect.objectContaining({ HOME: '/Users/test' }),
  });
  expect(probeQuery.mock.calls[0][0].options.env).not.toEqual(
    expect.objectContaining({ ANTHROPIC_AUTH_TOKEN: 'token' }),
  );
});

it('allows host-discovered context during agent loops while requiring exact KTX MCP tools and servers', async () => {
  const query = vi.fn((_input: any) =>
    stream([
      initMessage({
        tools: ['mcp__ktx__load_skill'],
        mcp_servers: [{ name: 'ktx', status: 'connected' }],
        slash_commands: ['/help', '/compact', '/clear'],
        skills: ['memory-agent', 'doc-reader'],
        agents: ['claude', 'Plan', 'Explore'],
      }),
      {
        type: 'assistant',
        message: { role: 'assistant', content: [] },
        parent_tool_use_id: null,
        uuid: '00000000-0000-4000-8000-000000000006',
        session_id: 'session-id',
      } as unknown as SDKMessage,
      resultMessage({ subtype: 'error_max_turns', is_error: true }),
    ]),
  );
  const runtime = new ClaudeCodeKtxLlmRuntime({
    projectDir: '/tmp/project',
    modelSlots: { default: 'sonnet' },
    query,
    env: {},
  });

  await expect(
    runtime.runAgentLoop({
      modelRole: 'default',
      systemPrompt: 'system',
      userPrompt: 'user',
      toolSet: {
        load_skill: {
          name: 'load_skill',
          description: 'Load skill.',
          inputSchema: z.object({ name: z.string() }),
          execute: async () => ({ markdown: 'loaded' }),
        },
      },
      stepBudget: 1,
      telemetryTags: { operationName: 'test' },
    }),
  ).resolves.toEqual({ stopReason: 'budget' });

  const options = query.mock.calls[0][0].options;
  expect(options.allowedTools).toEqual(['mcp__ktx__load_skill']);
  expect(await options.canUseTool('mcp__ktx__load_skill', {}, { signal: new AbortController().signal, toolUseID: '1' })).toEqual({
    behavior: 'allow',
    toolUseID: '1',
  });
  expect(await options.canUseTool('Task', {}, { signal: new AbortController().signal, toolUseID: '2' })).toMatchObject({
    behavior: 'deny',
    toolUseID: '2',
  });
  expect(await options.canUseTool('Skill', {}, { signal: new AbortController().signal, toolUseID: '3' })).toMatchObject({
    behavior: 'deny',
    toolUseID: '3',
  });
});

it('still rejects unexpected tools, missing KTX tools, plugins, and non-KTX MCP servers from init messages', async () => {
  const query = vi.fn((_input: any) =>
    stream([
      initMessage({
        tools: ['Bash'],
        mcp_servers: [{ name: 'filesystem', status: 'connected' }],
        plugins: [{ name: 'host-plugin', path: '/tmp/plugin' }],
      }),
      resultMessage({ result: 'hello' }),
    ]),
  );
  const runtime = new ClaudeCodeKtxLlmRuntime({
    projectDir: '/tmp/project',
    modelSlots: { default: 'sonnet' },
    query,
    env: {},
  });

  await expect(
    runtime.generateText({
      role: 'default',
      prompt: 'say hello',
      tools: {
        load_skill: {
          name: 'load_skill',
          description: 'Load skill.',
          inputSchema: z.object({ name: z.string() }),
          execute: async () => ({ markdown: 'loaded' }),
        },
      },
    }),
  ).rejects.toThrow(
    /Claude Code runtime isolation failed: .*tools=Bash.*missing_tools=mcp__ktx__load_skill.*mcp_servers=filesystem.*plugins=host-plugin/,
  );
});
```

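The tests above rely on `initMessage`, `resultMessage`, and `stream` helpers that this plan does not show. A minimal sketch of what such helpers might look like is given below; the message shapes are simplified local types rather than the SDK's `SDKMessage`, and the defaults (empty arrays, a `'success'` subtype) are assumptions made for illustration.

```typescript
// Simplified local message shapes; the SDK's real types carry more fields.
interface InitMessage {
  type: 'system';
  subtype: 'init';
  tools: string[];
  mcp_servers: { name: string; status: string }[];
  slash_commands: string[];
  skills: string[];
  plugins: { name: string; path: string }[];
  agents: string[];
}

interface ResultMessage {
  type: 'result';
  subtype: string;
  is_error: boolean;
  result?: string;
}

type Message = InitMessage | ResultMessage;

// Init message with an empty execution surface unless a test overrides it.
function initMessage(overrides: Partial<InitMessage> = {}): InitMessage {
  return {
    type: 'system',
    subtype: 'init',
    tools: [],
    mcp_servers: [],
    slash_commands: [],
    skills: [],
    plugins: [],
    agents: [],
    ...overrides,
  };
}

// Result message defaulting to a successful, non-error outcome (assumed default).
function resultMessage(overrides: Partial<ResultMessage> = {}): ResultMessage {
  return { type: 'result', subtype: 'success', is_error: false, ...overrides };
}

// Replays a fixed list of messages as an async iterable, like a fake query().
async function* stream(messages: Message[]): AsyncGenerator<Message> {
  for (const message of messages) {
    yield message;
  }
}
```

With helpers of this shape, each test only states the fields it cares about, so host-discovered `skills` or `agents` can be added without restating the rest of the init message.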
- [ ] **Step 2: Run the runtime test to verify it fails**

Run:

```bash
pnpm --filter @ktx/context exec vitest run src/llm/claude-code-runtime.test.ts
```

Expected: FAIL. The first new test fails because `runClaudeCodeAuthProbe(...)`
returns `{ ok: false, ... }` and `generateText(...)` rejects when init metadata
contains non-empty `slash_commands`, `skills`, or `agents`. The second new test
fails because `runAgentLoop(...)` returns `{ stopReason: 'error', ... }` for the
same reason.

- [ ] **Step 3: Commit the failing regression test**

Run:

```bash
git add packages/context/src/llm/claude-code-runtime.test.ts
git commit -m "test: cover claude-code host discovery metadata"
```

Expected: the commit contains tests that fail before the runtime assertion is
fixed.

### Task 3: Relax init metadata assertions to the controlled execution surface

**Files:**

- Modify: `packages/context/src/llm/claude-code-runtime.ts`

- [ ] **Step 1: Replace `assertInitIsolation`**

In `packages/context/src/llm/claude-code-runtime.ts`, replace the full
`assertInitIsolation(...)` function with:

```ts
function assertInitIsolation(
  message: SDKMessage,
  allowedToolIds: Set<string>,
  expectedMcpServerNames: Set<string>,
): void {
  if (message.type !== 'system' || message.subtype !== 'init') {
    return;
  }
  const activeToolIds = new Set(message.tools);
  const unexpectedTools = message.tools.filter((toolName) => !allowedToolIds.has(toolName));
  const missingTools = [...allowedToolIds].filter((toolName) => !activeToolIds.has(toolName));
  const activeMcpServerNames = message.mcp_servers.map((server) => server.name);
  const unexpectedMcpServers = activeMcpServerNames.filter((name) => !expectedMcpServerNames.has(name));
  const missingMcpServers = [...expectedMcpServerNames].filter((name) => !activeMcpServerNames.includes(name));
  const unexpectedPlugins = message.plugins.map((plugin) => plugin.name);
  if (
    unexpectedTools.length > 0 ||
    missingTools.length > 0 ||
    unexpectedMcpServers.length > 0 ||
    missingMcpServers.length > 0 ||
    unexpectedPlugins.length > 0
  ) {
    throw new Error(
      `Claude Code runtime isolation failed: tools=${unexpectedTools.join(',') || '(none)'} missing_tools=${
        missingTools.join(',') || '(none)'
      } mcp_servers=${unexpectedMcpServers.join(',') || '(none)'} missing_mcp_servers=${
        missingMcpServers.join(',') || '(none)'
      } plugins=${unexpectedPlugins.join(',') || '(none)'} host_slash_commands=${
        message.slash_commands.length
      } host_skills=${message.skills.length} host_agents=${message.agents?.join(',') || '(none)'}`,
    );
  }
}
```

This preserves strict checks for the KTX-controlled execution surface:

- `message.tools` must exactly equal the generated KTX MCP tool ids for the
  current call.
- `message.mcp_servers` must exactly equal the expected KTX MCP server names.
- `message.plugins` must be empty.

It deliberately stops treating `message.slash_commands`, `message.skills`, and
`message.agents` as fatal because those fields can contain host-discovered
metadata that KTX cannot disable through the pinned SDK options.

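To make the difference between the controlled surface and host metadata concrete, the sketch below applies the same three checks to two example init payloads. `InitSurface` and `findIsolationViolations` are hypothetical simplifications that return a list instead of throwing; they are not the runtime's `assertInitIsolation`.

```typescript
// Simplified local view of the init fields used by the checks.
interface InitSurface {
  tools: string[];
  mcp_servers: { name: string }[];
  plugins: { name: string }[];
  slash_commands: string[];
  skills: string[];
  agents?: string[];
}

// Returns one entry per mismatch on the KTX-controlled surface.
function findIsolationViolations(
  init: InitSurface,
  allowedToolIds: string[],
  expectedMcpServers: string[],
): string[] {
  const violations: string[] = [];
  const serverNames = init.mcp_servers.map((server) => server.name);
  for (const tool of init.tools) {
    if (!allowedToolIds.includes(tool)) violations.push(`unexpected tool: ${tool}`);
  }
  for (const id of allowedToolIds) {
    if (!init.tools.includes(id)) violations.push(`missing tool: ${id}`);
  }
  for (const name of serverNames) {
    if (!expectedMcpServers.includes(name)) violations.push(`unexpected MCP server: ${name}`);
  }
  for (const name of expectedMcpServers) {
    if (!serverNames.includes(name)) violations.push(`missing MCP server: ${name}`);
  }
  for (const plugin of init.plugins) {
    violations.push(`unexpected plugin: ${plugin.name}`);
  }
  // slash_commands, skills, and agents are intentionally not inspected.
  return violations;
}

// Host metadata only: acceptable for an auth probe that exposes no tools.
const hostOnly = findIsolationViolations(
  { tools: [], mcp_servers: [], plugins: [], slash_commands: ['/help'], skills: ['pdf'], agents: ['Explore'] },
  [],
  [],
);

// A built-in tool and a foreign MCP server on the surface: must be rejected.
const leaked = findIsolationViolations(
  { tools: ['Bash'], mcp_servers: [{ name: 'filesystem' }], plugins: [], slash_commands: [], skills: [] },
  ['mcp__ktx__load_skill'],
  ['ktx'],
);
```

In this sketch `hostOnly` is empty even though the host reports commands, skills, and agents, while `leaked` collects four mismatches: the unexpected `Bash` tool, the missing KTX tool id, the unexpected `filesystem` server, and the missing `ktx` server.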
- [ ] **Step 2: Run the runtime test to verify it passes**

Run:

```bash
pnpm --filter @ktx/context exec vitest run src/llm/claude-code-runtime.test.ts
```

Expected: PASS.

- [ ] **Step 3: Commit the runtime fix**

Run:

```bash
git add packages/context/src/llm/claude-code-runtime.ts packages/context/src/llm/claude-code-runtime.test.ts
git commit -m "fix: tolerate claude-code host discovery metadata"
```

Expected: the auth probe and runtime no longer fail solely because the SDK init
message reports host-discovered slash commands, skills, or agents.

### Task 4: Correct user-facing docs wording
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `docs-site/content/docs/guides/llm-configuration.mdx`
|
||||
- Modify: `docs-site/content/docs/guides/building-context.mdx`
|
||||
|
||||
- [ ] **Step 1: Update the LLM configuration guide wording**
|
||||
|
||||
In `docs-site/content/docs/guides/llm-configuration.mdx`, replace lines `39-41`
|
||||
with:
|
||||
|
||||
```mdx
|
||||
`claude-code` keeps KTX tool boundaries intact. KTX exposes only the MCP tools
|
||||
needed for the current KTX agent loop, disables Claude Code built-in tools,
|
||||
keeps plugins empty, and denies every non-KTX tool request through
|
||||
`canUseTool`. The Claude Agent SDK may still report host-discovered slash
|
||||
commands, skills, and subagent names in init metadata; that metadata is not an
|
||||
execution grant for KTX agent loops.
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Update the building context guide wording**
|
||||
|
||||
In `docs-site/content/docs/guides/building-context.mdx`, replace lines `61-63`
|
||||
with:
|
||||
|
||||
```mdx
|
||||
When you use `claude-code`, KTX still controls the tool surface for ingest and
|
||||
memory capture. Claude Code built-in tools, discovered MCP servers, plugins,
|
||||
skills, agents, and slash commands are not invokable by KTX agent loops unless
|
||||
they are exact KTX MCP tools for the current run.
|
||||
```
|
||||
|
||||
- [ ] **Step 3: Run docs tests**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm --filter ktx-docs run test
|
||||
```
|
||||
|
||||
Expected: PASS.
|
||||
|
||||
- [ ] **Step 4: Commit docs wording**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
git add docs-site/content/docs/guides/llm-configuration.mdx docs-site/content/docs/guides/building-context.mdx
|
||||
git commit -m "docs: clarify claude-code host discovery metadata"
|
||||
```
|
||||
|
||||
Expected: user docs describe invocation control rather than promising zero
|
||||
host-discovery metadata.
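
The invocation control that the Task 4 wording describes could look something like the sketch below. It is only an illustration under stated assumptions: `decideToolUse`, `makeKtxCanUseTool`, and the `PermissionDecision` type are hypothetical names introduced here, and the allow/deny result shape is an assumption about what a `canUseTool` callback returns rather than a copy of the SDK types.

```typescript
// Assumed stand-in for the permission result a canUseTool callback returns.
type PermissionDecision =
  | { behavior: 'allow'; updatedInput: Record<string, unknown> }
  | { behavior: 'deny'; message: string };

// Pure decision: only exact KTX MCP tool ids for the current run are allowed.
function decideToolUse(
  toolName: string,
  input: Record<string, unknown>,
  allowedToolIds: ReadonlySet<string>,
): PermissionDecision {
  if (allowedToolIds.has(toolName)) {
    return { behavior: 'allow', updatedInput: input };
  }
  return { behavior: 'deny', message: `KTX does not expose ${toolName} for this run.` };
}

// Async wrapper in the shape a permission callback would typically take.
function makeKtxCanUseTool(allowedToolIds: ReadonlySet<string>) {
  return async (toolName: string, input: Record<string, unknown>): Promise<PermissionDecision> =>
    decideToolUse(toolName, input, allowedToolIds);
}

// Example: a run that exposes a single KTX MCP tool.
const canUseTool = makeKtxCanUseTool(new Set(['mcp__ktx__load_skill']));
```

Under this sketch, a host-discovered skill or slash command that appears in init metadata still cannot execute, because any tool id outside the allowed set is denied at call time.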

### Task 5: Final verification

**Files:**

- Verify: `docs/superpowers/specs/2026-05-15-claude-code-backend-design.md`
- Verify: `packages/context/src/llm/claude-code-runtime.ts`
- Verify: `packages/context/src/llm/claude-code-runtime.test.ts`
- Verify: `docs-site/content/docs/guides/llm-configuration.mdx`
- Verify: `docs-site/content/docs/guides/building-context.mdx`

- [ ] **Step 1: Run targeted runtime tests**

Run:

```bash
pnpm --filter @ktx/context exec vitest run src/llm/claude-code-runtime.test.ts src/llm/runtime-tools.test.ts src/llm/claude-code-env.test.ts
```

Expected: PASS.

- [ ] **Step 2: Run package type-check**

Run:

```bash
pnpm --filter @ktx/context run type-check
```

Expected: PASS.

- [ ] **Step 3: Run docs verification**

Run:

```bash
pnpm --filter ktx-docs run test
```

Expected: PASS.

- [ ] **Step 4: Run dead-code checks**

Run:

```bash
pnpm run dead-code
```

Expected: PASS or only pre-existing unrelated findings. Investigate and fix any
finding caused by the runtime assertion or test changes.

- [ ] **Step 5: Inspect git status**

Run:

```bash
git status --short
```

Expected: only files from this plan are modified, or the working tree is clean
if each task was committed.

## Self-review

- Spec coverage: This plan addresses the v1-blocking auth probe failure,
  aligns the spec with the SDK contract, preserves the real KTX execution
  boundary, and adds regression coverage for non-empty host-discovered
  `slash_commands`, `skills`, and `agents` in both auth probe and runtime paths.
- Placeholder scan: No placeholder markers remain. Every code-changing step
  includes exact file paths, code blocks, commands, and expected results.
- Type consistency: The plan uses existing names from the codebase:
  `ClaudeCodeKtxLlmRuntime`, `runClaudeCodeAuthProbe`, `initMessage`,
  `resultMessage`, `assertInitIsolation`, `mcpToolIds`, `KtxRuntimeToolSet`, and
  `canUseTool`.
|
||||
|
|
@ -0,0 +1,160 @@
|
|||
# Claude Code Backend V1 Ingest Guidance Closure Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Make the `ktx ingest` missing-LLM guidance treat `claude-code` as a first-class setup path and restore the CLI ingest test suite.

**Architecture:** Keep the existing Claude Code runtime implementation unchanged. Update the single local-ingest guard message so users see both the local Claude Code setup path and the Anthropic API setup path, then align the context and CLI tests with that user-facing copy.

**Tech Stack:** TypeScript, pnpm, Vitest.

---

## Audit summary

The May 15 Claude Code backend runtime and isolation plans are implemented for
the core runtime path: config accepts `claude-code`, runtime calls use
`KtxLlmRuntimePort`, Claude SDK calls pass isolation options and scrubbed env,
setup/status/doctor validate Claude Code auth, and docs describe the backend.

One v1-blocking issue remains: `packages/context/src/ingest/local-bundle-runtime.ts`
lists `claude-code` in the missing-LLM guard line but still tells users only to
"Configure an Anthropic provider." The full CLI ingest test suite currently
fails because `packages/cli/src/ingest.test.ts` still expects the old provider
list without `claude-code`. This is v1-blocking because CI is red and the
fallback guidance is not first-class for the new backend.

Non-blocking gaps from the original spec remain unchanged:

- Same-step AI SDK tool-call repair parity is out of scope for the Claude Code
  runtime.
- OTEL telemetry parity is out of scope for the Claude Code runtime.
- Embedding parity is out of scope because embeddings stay independently
  configured.
- Full prompt-caching parity for tools, history, and per-section TTLs is out of
  scope; v1 only needs no AI SDK cache markers on `claude-code` and explicit
  warnings for ignored fields.

## File structure

Modify these files:

- `packages/context/src/ingest/local-bundle-runtime.ts` owns the missing-LLM
  guard message used by local ingest and MCP-triggered ingest.
- `packages/context/src/ingest/local-bundle-runtime.test.ts` verifies the guard
  message at the context boundary.
- `packages/cli/src/ingest.test.ts` verifies the user-facing CLI output.

No `docs-site/` update is required because the existing public docs already
document `claude-code` setup and ingest behavior; this plan only fixes an
inline runtime error message.

### Task 1: Update ingest LLM setup guidance

**Files:**

- Modify: `packages/context/src/ingest/local-bundle-runtime.test.ts`
- Modify: `packages/cli/src/ingest.test.ts`
- Modify: `packages/context/src/ingest/local-bundle-runtime.ts`

- [ ] **Step 1: Update the context guard-message test**

In `packages/context/src/ingest/local-bundle-runtime.test.ts`, replace the
expected message in `requires an agent runner or configured local ingest LLM`
with this exact array:

```ts
[
  'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, or claude-code, or an injected agentRunner.',
  'Configure a local Claude Code session or API-backed LLM, then rerun ingest:',
  ` ktx setup --project-dir ${project.projectDir} --llm-backend claude-code --no-input`,
  ` ktx setup --project-dir ${project.projectDir} --llm-backend anthropic --anthropic-api-key-env ANTHROPIC_API_KEY --anthropic-model claude-sonnet-4-6 --no-input`,
].join('\n')
```

- [ ] **Step 2: Update the CLI ingest test**

In `packages/cli/src/ingest.test.ts`, replace the stale provider-list
assertion in `prints provider setup guidance when a skip-llm setup project runs
ingest` with:

```ts
expect(runIo.stderr()).toContain(
  'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, or claude-code, or an injected agentRunner.',
);
expect(runIo.stderr()).toContain('Configure a local Claude Code session or API-backed LLM, then rerun ingest:');
expect(runIo.stderr()).toContain(`ktx setup --project-dir ${projectDir} --llm-backend claude-code --no-input`);
expect(runIo.stderr()).toContain(
  `ktx setup --project-dir ${projectDir} --llm-backend anthropic --anthropic-api-key-env ANTHROPIC_API_KEY --anthropic-model claude-sonnet-4-6 --no-input`,
);
```

- [ ] **Step 3: Run tests to verify the new expectations fail**

Run:

```bash
pnpm --filter @ktx/context exec vitest run src/ingest/local-bundle-runtime.test.ts
pnpm --filter @ktx/cli exec vitest run src/ingest.test.ts
```

Expected: both suites fail because the source message still says
`Configure an Anthropic provider, then rerun ingest:` and does not include the
Claude Code setup command.

- [ ] **Step 4: Update the ingest guard message**

In `packages/context/src/ingest/local-bundle-runtime.ts`, replace
`localIngestLlmProviderGuardMessage` with:

```ts
function localIngestLlmProviderGuardMessage(projectDir: string): string {
  return [
    'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, or claude-code, or an injected agentRunner.',
    'Configure a local Claude Code session or API-backed LLM, then rerun ingest:',
    ` ktx setup --project-dir ${projectDir} --llm-backend claude-code --no-input`,
    ` ktx setup --project-dir ${projectDir} --llm-backend anthropic --anthropic-api-key-env ANTHROPIC_API_KEY --anthropic-model claude-sonnet-4-6 --no-input`,
  ].join('\n');
}
```

- [ ] **Step 5: Run the targeted tests**

Run:

```bash
pnpm --filter @ktx/context exec vitest run src/ingest/local-bundle-runtime.test.ts
pnpm --filter @ktx/cli exec vitest run src/ingest.test.ts
```

Expected: both suites pass.

- [ ] **Step 6: Run package type-checks**

Run:

```bash
pnpm --filter @ktx/context run type-check
pnpm --filter @ktx/cli run type-check
```

Expected: both commands pass.

- [ ] **Step 7: Commit**

Run:

```bash
git add packages/context/src/ingest/local-bundle-runtime.ts packages/context/src/ingest/local-bundle-runtime.test.ts packages/cli/src/ingest.test.ts
git commit -m "fix: update claude-code ingest setup guidance"
```

## Self-review

- Spec coverage: This plan closes the only remaining v1-blocking audit finding:
  ingest setup guidance and CLI test expectations now include `claude-code` as
  a first-class backend.
- Placeholder scan: No placeholders remain; every step includes exact paths,
  code, commands, and expected output.
- Type consistency: The exact guard string is identical across the source and
  both test updates.
|
||||
|
|
@ -0,0 +1,575 @@
|
|||
# Claude Code Backend V1 Isolation Closure Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Close the remaining v1-blocking Claude Code backend gaps around SDK
init isolation assertions and setup-time prompt-caching warnings.

**Architecture:** Keep the existing runtime port and Claude Code runtime. Add
the missing init-message checks inside the Claude runtime, then share the
prompt-caching warning formatter between status/doctor and setup so all
user-facing readiness flows report ignored Claude Code cache knobs consistently.

**Tech Stack:** TypeScript, pnpm, Vitest, Zod, `@anthropic-ai/claude-agent-sdk@0.3.142`.

---

## Audit Summary

The May 15 Claude Code backend v1 plan is mostly implemented. Remaining
v1-blocking gaps from the original spec are:

- `packages/context/src/llm/claude-code-runtime.ts` asserts init-message tools,
  slash commands, skills, and plugins, but does not assert `agents` or
  unexpected `mcp_servers`. The spec requires asserting that settings-derived
  commands, skills, agents, plugins, and MCP servers are inactive.
- `packages/cli/src/setup-models.ts` validates Claude Code auth but does not
  surface ignored `llm.promptCaching` fields during setup. The spec requires
  setup, status, and doctor to surface ignored prompt-caching fields for the
  `claude-code` backend. Status and doctor already warn.

Non-blocking gaps:

- Same-step tool-call repair parity remains out of scope for v1.
- OTEL telemetry parity remains out of scope for v1.
- Embedding parity remains out of scope because embeddings are configured
  independently.
- Full prompt-caching parity for tools, history, and per-section TTLs remains
  out of scope; v1 only needs explicit warnings and no AI SDK cache markers on
  the Claude Code path.

## File Structure

Modify these files:

- `packages/context/src/llm/claude-code-runtime.ts` adds complete init-message
  isolation checks for agents and MCP servers.
- `packages/context/src/llm/claude-code-runtime.test.ts` adds regression tests
  for rejected agents/MCP servers, object/agent env scrubbing, and callback
  error handling.
- `packages/cli/src/claude-code-prompt-caching.ts` is created as the shared
  formatter for ignored prompt-caching fields.
- `packages/cli/src/status-project.ts` imports the shared formatter instead of
  keeping a local helper.
- `packages/cli/src/setup-models.ts` emits the shared warning when setup saves
  `llm.provider.backend: claude-code` and existing prompt-caching fields are
  present.
- `packages/cli/src/setup-models.test.ts` covers setup warning output.
- `packages/cli/src/doctor.test.ts` keeps coverage for doctor output using the
  shared formatter.
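
As a rough picture of the env scrubbing that the regression tests in Task 1 exercise, the sketch below removes provider credentials before the environment reaches the SDK. It is an assumption-laden illustration: `scrubClaudeCodeEnv` and `SCRUBBED_ENV_KEYS` are hypothetical names, and the denylist contains only the keys those tests expect to be absent; the real runtime may scrub more keys or use a different mechanism.

```typescript
// Assumed denylist: only the keys the Task 1 tests expect to be removed.
const SCRUBBED_ENV_KEYS = [
  'ANTHROPIC_API_KEY',
  'ANTHROPIC_AUTH_TOKEN',
  'AWS_PROFILE',
  'CLAUDE_CODE_USE_VERTEX',
] as const;

// Returns a copy of env without the denylisted keys or undefined values.
function scrubClaudeCodeEnv(env: Record<string, string | undefined>): Record<string, string> {
  const blocked = new Set<string>(SCRUBBED_ENV_KEYS);
  const scrubbed: Record<string, string> = {};
  for (const key of Object.keys(env)) {
    const value = env[key];
    if (value !== undefined && !blocked.has(key)) {
      scrubbed[key] = value;
    }
  }
  return scrubbed;
}

// Example: credentials are dropped, ordinary variables survive.
const scrubbedExample = scrubClaudeCodeEnv({
  ANTHROPIC_API_KEY: 'sk-ant-test', // pragma: allowlist secret
  AWS_PROFILE: 'prod',
  PATH: '/usr/bin',
});
```

Copying into a fresh object rather than deleting keys in place keeps the caller's environment untouched, which matters when the same `env` is reused across calls.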

### Task 1: Complete Claude Code init isolation checks

**Files:**

- Modify: `packages/context/src/llm/claude-code-runtime.test.ts`
- Modify: `packages/context/src/llm/claude-code-runtime.ts`

- [ ] **Step 1: Add failing isolation and runtime behavior tests**

Add these tests inside `describe('ClaudeCodeKtxLlmRuntime', ...)` in
`packages/context/src/llm/claude-code-runtime.test.ts`:

```ts
it('rejects settings-derived agents and non-KTX MCP servers from init messages', async () => {
  const query = vi.fn((_input: any) =>
    stream([
      initMessage({
        agents: ['project-agent'],
        mcp_servers: [{ name: 'filesystem', status: 'connected' }],
      }),
      resultMessage({ result: 'hello' }),
    ]),
  );
  const runtime = new ClaudeCodeKtxLlmRuntime({
    projectDir: '/tmp/project',
    modelSlots: { default: 'sonnet' },
    query,
    env: {},
  });

  await expect(runtime.generateText({ role: 'default', prompt: 'say hello' })).rejects.toThrow(
    /Claude Code runtime isolation failed: .*mcp_servers=filesystem.*agents=project-agent/,
  );
});

it('passes scrubbed env to object generation and agent loops', async () => {
  const schema = z.object({ answer: z.string() });
  const objectQuery = vi.fn((_input: any) =>
    stream([initMessage(), resultMessage({ structured_output: { answer: 'yes' } })]),
  );
  const objectRuntime = new ClaudeCodeKtxLlmRuntime({
    projectDir: '/tmp/project',
    modelSlots: { default: 'sonnet' },
    query: objectQuery,
    env: { ANTHROPIC_API_KEY: 'sk-ant-test', AWS_PROFILE: 'prod', PATH: '/usr/bin' }, // pragma: allowlist secret
  });

  await expect(objectRuntime.generateObject({ role: 'default', prompt: 'json', schema })).resolves.toEqual({
    answer: 'yes',
  });
  expect(objectQuery.mock.calls[0][0].options.env).toEqual(
    expect.objectContaining({ PATH: '/usr/bin' }),
  );
  expect(objectQuery.mock.calls[0][0].options.env).not.toEqual(
    expect.objectContaining({ ANTHROPIC_API_KEY: 'sk-ant-test', AWS_PROFILE: 'prod' }), // pragma: allowlist secret
  );

  const agentQuery = vi.fn((_input: any) =>
    stream([
      initMessage({ tools: ['mcp__ktx__load_skill'], mcp_servers: [{ name: 'ktx', status: 'connected' }] }),
      {
        type: 'assistant',
        message: { role: 'assistant', content: [] },
        parent_tool_use_id: null,
        uuid: '00000000-0000-4000-8000-000000000004',
        session_id: 'session-id',
      } as unknown as SDKMessage,
      resultMessage({ subtype: 'error_max_turns', is_error: true }),
    ]),
  );
  const agentRuntime = new ClaudeCodeKtxLlmRuntime({
    projectDir: '/tmp/project',
    modelSlots: { default: 'sonnet' },
    query: agentQuery,
    env: { ANTHROPIC_AUTH_TOKEN: 'token', CLAUDE_CODE_USE_VERTEX: '1', HOME: '/Users/test' },
  });

  await agentRuntime.runAgentLoop({
    modelRole: 'default',
    systemPrompt: 'system',
    userPrompt: 'user',
    toolSet: {
      load_skill: {
        name: 'load_skill',
        description: 'Load skill.',
        inputSchema: z.object({ name: z.string() }),
        execute: async () => ({ markdown: 'loaded' }),
      },
    },
    stepBudget: 1,
    telemetryTags: { operationName: 'test' },
  });
  expect(agentQuery.mock.calls[0][0].options.env).toEqual(expect.objectContaining({ HOME: '/Users/test' }));
  expect(agentQuery.mock.calls[0][0].options.env).not.toEqual(
    expect.objectContaining({ ANTHROPIC_AUTH_TOKEN: 'token', CLAUDE_CODE_USE_VERTEX: '1' }),
  );
});

it('logs and ignores onStepFinish callback errors', async () => {
  const query = vi.fn((_input: any) =>
    stream([
      initMessage(),
      {
        type: 'assistant',
        message: { role: 'assistant', content: [] },
        parent_tool_use_id: null,
        uuid: '00000000-0000-4000-8000-000000000005',
        session_id: 'session-id',
      } as unknown as SDKMessage,
      resultMessage({ subtype: 'success', terminal_reason: 'completed' }),
    ]),
  );
  const logger = {
    debug: vi.fn(),
    log: vi.fn(),
    warn: vi.fn(),
    error: vi.fn(),
  };
  const runtime = new ClaudeCodeKtxLlmRuntime({
    projectDir: '/tmp/project',
    modelSlots: { default: 'sonnet' },
    query,
    env: {},
    logger,
  });

  await expect(
    runtime.runAgentLoop({
      modelRole: 'default',
      systemPrompt: 'system',
      userPrompt: 'user',
      toolSet: {},
      stepBudget: 1,
      telemetryTags: { operationName: 'test' },
      onStepFinish: async () => {
        throw new Error('callback exploded');
      },
    }),
  ).resolves.toEqual({ stopReason: 'natural' });
  expect(logger.warn).toHaveBeenCalledWith(expect.stringContaining('callback exploded'));
});
```

- [ ] **Step 2: Run the Claude runtime test to verify it fails**

Run:

```bash
pnpm --filter @ktx/context exec vitest run src/llm/claude-code-runtime.test.ts
```

Expected: FAIL because the new agents/MCP-server isolation test resolves
successfully instead of throwing.

- [ ] **Step 3: Add expected MCP server metadata and complete init assertions**

In `packages/context/src/llm/claude-code-runtime.ts`, replace
`assertInitIsolation` and add the helper below it:

```ts
function assertInitIsolation(
  message: SDKMessage,
  allowedToolIds: Set<string>,
  expectedMcpServerNames: Set<string>,
): void {
  if (message.type !== 'system' || message.subtype !== 'init') {
    return;
  }
  const unexpectedTools = message.tools.filter((toolName) => !allowedToolIds.has(toolName));
  const activeMcpServerNames = message.mcp_servers.map((server) => server.name);
  const unexpectedMcpServers = activeMcpServerNames.filter((name) => !expectedMcpServerNames.has(name));
  const missingMcpServers = [...expectedMcpServerNames].filter((name) => !activeMcpServerNames.includes(name));
  const unexpectedAgents = message.agents ?? [];
  if (
    unexpectedTools.length > 0 ||
    unexpectedMcpServers.length > 0 ||
    missingMcpServers.length > 0 ||
    message.slash_commands.length > 0 ||
    message.skills.length > 0 ||
    message.plugins.length > 0 ||
    unexpectedAgents.length > 0
  ) {
    throw new Error(
      `Claude Code runtime isolation failed: tools=${unexpectedTools.join(',') || '(none)'} mcp_servers=${
        unexpectedMcpServers.join(',') || '(none)'
      } missing_mcp_servers=${missingMcpServers.join(',') || '(none)'} slash_commands=${
        message.slash_commands.length
      } skills=${message.skills.length} plugins=${message.plugins.length} agents=${
        unexpectedAgents.join(',') || '(none)'
      }`,
    );
  }
}

function expectedMcpServerNames(tools: KtxRuntimeToolSet | undefined): Set<string> {
  return tools && Object.keys(tools).length > 0 ? new Set(['ktx']) : new Set();
}
```

Update `collectResult` parameters:

```ts
async function collectResult(params: {
  query: QueryFn;
  prompt: string;
  options: Options;
  allowedToolIds: Set<string>;
  expectedMcpServerNames: Set<string>;
  onAssistantTurn?: () => Promise<void>;
}): Promise<SDKResultMessage> {
  let result: SDKResultMessage | undefined;
  for await (const message of params.query({ prompt: params.prompt, options: params.options })) {
    assertInitIsolation(message, params.allowedToolIds, params.expectedMcpServerNames);
```

Update the four `collectResult(...)` calls:

```ts
const tools = input.tools ?? {};
const result = await collectResult({
  query: this.runQuery,
  prompt: [input.system, input.prompt].filter(Boolean).join('\n\n'),
  options,
  allowedToolIds: new Set(mcpToolIds(tools)),
  expectedMcpServerNames: expectedMcpServerNames(input.tools),
});
```

For `runAgentLoop(...)`, use:

```ts
const result = await collectResult({
  query: this.runQuery,
  prompt: params.userPrompt,
  options: { ...options, systemPrompt: params.systemPrompt },
  allowedToolIds: new Set(mcpToolIds(params.toolSet)),
  expectedMcpServerNames: expectedMcpServerNames(params.toolSet),
  onAssistantTurn: async () => {
```

For `runClaudeCodeAuthProbe(...)`, use:

```ts
const result = await collectResult({
  query: input.query ?? defaultQuery,
  prompt: 'Reply with exactly: ok',
  options,
  allowedToolIds: new Set(),
  expectedMcpServerNames: new Set(),
});
```

- [ ] **Step 4: Run the Claude runtime test to verify it passes**

Run:

```bash
pnpm --filter @ktx/context exec vitest run src/llm/claude-code-runtime.test.ts
```

Expected: PASS.

- [ ] **Step 5: Commit**

Run:

```bash
git add packages/context/src/llm/claude-code-runtime.ts packages/context/src/llm/claude-code-runtime.test.ts
git commit -m "fix: close claude-code runtime isolation checks"
```
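
To see how the unexpected and missing MCP server sets from Step 3 behave in isolation, the two filters can be pulled into a small standalone function. This is a sketch for illustration only; `diffMcpServers` is a hypothetical name and is not part of the plan's code.

```typescript
// Splits active MCP server names into unexpected and missing sets,
// mirroring the two filters used by the init isolation check.
function diffMcpServers(
  activeNames: string[],
  expectedNames: ReadonlySet<string>,
): { unexpected: string[]; missing: string[] } {
  const unexpected = activeNames.filter((name) => !expectedNames.has(name));
  const missing = Array.from(expectedNames).filter((name) => !activeNames.includes(name));
  return { unexpected, missing };
}

// A tool-bearing call expects exactly the 'ktx' server.
const hostServerLeak = diffMcpServers(['filesystem'], new Set(['ktx']));
// hostServerLeak.unexpected is ['filesystem'] and hostServerLeak.missing is ['ktx'].

// The auth probe passes an empty expected set, so any active server is unexpected.
const authProbeClean = diffMcpServers([], new Set<string>());
// Both arrays are empty, so the check passes.
```

Checking both directions is what lets the runtime fail when the KTX server did not come up, not only when a host-configured server leaked in.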

### Task 2: Surface Claude Code prompt-caching warnings during setup

**Files:**

- Create: `packages/cli/src/claude-code-prompt-caching.ts`
- Modify: `packages/cli/src/status-project.ts`
- Modify: `packages/cli/src/setup-models.ts`
- Modify: `packages/cli/src/setup-models.test.ts`
- Modify: `packages/cli/src/doctor.test.ts`

- [ ] **Step 1: Add failing setup warning test**

Add this test to `packages/cli/src/setup-models.test.ts`:

```ts
it('warns during Claude Code setup when existing prompt-caching fields will be ignored', async () => {
  await writeFile(
    join(tempDir, 'ktx.yaml'),
    [
      'llm:',
      '  provider:',
      '    backend: anthropic',
      '  models:',
      '    default: claude-sonnet-4-6',
      '  promptCaching:',
      '    enabled: true',
      '    systemTtl: 1h',
      '    toolsTtl: 1h',
      '    historyTtl: 5m',
      '',
    ].join('\n'),
    'utf-8',
  );
  const io = makeIo();

  const result = await runKtxSetupAnthropicModelStep(
    {
      projectDir: tempDir,
      inputMode: 'disabled',
      llmBackend: 'claude-code',
      skipLlm: false,
    },
    io.io,
    {
      claudeCodeAuthProbe: async () => ({ ok: true as const }),
    },
  );

  expect(result.status).toBe('ready');
  expect(io.stderr()).toContain('claude-code ignores llm.promptCaching.systemTtl');
  expect(io.stderr()).toContain('Claude Agent SDK does not expose KTX prompt-cache TTL, tool, or history markers');
});
```

- [ ] **Step 2: Run setup tests to verify the new test fails**

Run:

```bash
pnpm --filter @ktx/cli exec vitest run src/setup-models.test.ts
```

Expected: FAIL because setup does not emit the ignored prompt-caching warning.

- [ ] **Step 3: Create the shared prompt-caching warning helper**

Create `packages/cli/src/claude-code-prompt-caching.ts`:

```ts
import type { KtxProjectLlmConfig } from '@ktx/context/project';

const CLAUDE_CODE_IGNORED_PROMPT_CACHING_FIELDS = [
  'systemTtl',
  'toolsTtl',
  'historyTtl',
  'vertexFallbackTo5m',
] as const;

export function ignoredClaudeCodePromptCachingFields(config: KtxProjectLlmConfig): string[] {
  if (config.provider.backend !== 'claude-code' || !config.promptCaching) {
    return [];
  }
  return CLAUDE_CODE_IGNORED_PROMPT_CACHING_FIELDS.filter((key) => key in config.promptCaching).map(
    (key) => `llm.promptCaching.${key}`,
  );
}

export function formatClaudeCodePromptCachingWarning(fields: string[]): string | null {
  if (fields.length === 0) {
    return null;
  }
  return `claude-code ignores ${fields.join(', ')} because the Claude Agent SDK does not expose KTX prompt-cache TTL, tool, or history markers.`;
}

export function formatClaudeCodePromptCachingFix(): string {
  return 'Remove those promptCaching fields or use anthropic, vertex, or gateway when those cache knobs are required.';
}
```

- [ ] **Step 4: Update status/doctor to use the shared helper**

In `packages/cli/src/status-project.ts`, add:

```ts
import {
  formatClaudeCodePromptCachingFix,
  formatClaudeCodePromptCachingWarning,
  ignoredClaudeCodePromptCachingFields,
} from './claude-code-prompt-caching.js';
```

Delete the local `ignoredClaudeCodePromptCachingFields(...)` function.

Replace the warning block in `buildWarnings(...)` with:

```ts
const warning = formatClaudeCodePromptCachingWarning(ignoredClaudeCodePromptCachingFields(config.llm));
if (warning) {
  warnings.push({
    message: warning,
    fix: formatClaudeCodePromptCachingFix(),
  });
}
```

- [ ] **Step 5: Emit the setup warning before persisting Claude Code config**

In `packages/cli/src/setup-models.ts`, add:

```ts
import {
  formatClaudeCodePromptCachingWarning,
  ignoredClaudeCodePromptCachingFields,
} from './claude-code-prompt-caching.js';
```

Inside the `backendChoice.backend === 'claude-code'` branch, immediately before
`await persistLlmConfig(...)`, add:

```ts
const warning = formatClaudeCodePromptCachingWarning(
  ignoredClaudeCodePromptCachingFields(buildProjectLlmConfig(project.config.llm, { backend: 'claude-code' }, model)),
);
if (warning) {
  io.stderr.write(`${warning}\n`);
}
```

- [ ] **Step 6: Run CLI tests**

Run:

```bash
pnpm --filter @ktx/cli exec vitest run src/setup-models.test.ts src/doctor.test.ts
```

Expected: PASS.

- [ ] **Step 7: Commit**

Run:

```bash
git add packages/cli/src/claude-code-prompt-caching.ts packages/cli/src/status-project.ts packages/cli/src/setup-models.ts packages/cli/src/setup-models.test.ts packages/cli/src/doctor.test.ts
git commit -m "fix: warn on claude-code prompt caching during setup"
```
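
For a concrete feel of what the shared formatter from Task 2 produces, the sketch below re-declares the two helpers against a local stand-in type so it runs on its own. `LlmConfigSketch`, `ignoredFields`, and `formatWarning` are hypothetical names used only here; the real helpers take `KtxProjectLlmConfig`, whose exact shape is not shown in this plan.

```typescript
// Local stand-in for the project LLM config (assumption; the real type is richer).
interface LlmConfigSketch {
  provider: { backend: 'anthropic' | 'vertex' | 'gateway' | 'claude-code' };
  promptCaching?: Record<string, unknown>;
}

const IGNORED_FIELDS = ['systemTtl', 'toolsTtl', 'historyTtl', 'vertexFallbackTo5m'] as const;

// Lists the prompt-caching fields that the claude-code backend would ignore.
function ignoredFields(config: LlmConfigSketch): string[] {
  const promptCaching = config.promptCaching;
  if (config.provider.backend !== 'claude-code' || !promptCaching) {
    return [];
  }
  return IGNORED_FIELDS.filter((key) => key in promptCaching).map((key) => `llm.promptCaching.${key}`);
}

// Builds the single-line warning, or null when nothing is ignored.
function formatWarning(fields: string[]): string | null {
  if (fields.length === 0) {
    return null;
  }
  return `claude-code ignores ${fields.join(', ')} because the Claude Agent SDK does not expose KTX prompt-cache TTL, tool, or history markers.`;
}

// Same prompt-caching values as the setup test fixture, after switching backends.
const sampleWarning = formatWarning(
  ignoredFields({
    provider: { backend: 'claude-code' },
    promptCaching: { enabled: true, systemTtl: '1h', toolsTtl: '1h', historyTtl: '5m' },
  }),
);
```

Here `enabled` is not reported because it is not in the ignored-field list, and the same config with `backend: 'anthropic'` yields `null`.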

### Task 3: Final verification

**Files:**

- Verify: `packages/context/src/llm/claude-code-runtime.ts`
- Verify: `packages/cli/src/setup-models.ts`
- Verify: `packages/cli/src/status-project.ts`

- [ ] **Step 1: Run targeted tests**

Run:

```bash
pnpm --filter @ktx/context exec vitest run src/llm/claude-code-runtime.test.ts src/llm/runtime-tools.test.ts src/llm/claude-code-env.test.ts src/llm/claude-code-models.test.ts src/llm/runtime-local-config.test.ts
pnpm --filter @ktx/cli exec vitest run src/setup-models.test.ts src/doctor.test.ts
```

Expected: PASS.

- [ ] **Step 2: Run package type-checks**

Run:

```bash
pnpm --filter @ktx/context run type-check
pnpm --filter @ktx/cli run type-check
```

Expected: PASS.

- [ ] **Step 3: Run the LLM boundary audit**

Run:

```bash
rg -n "generateKtxText\\(|generateKtxObject\\(|new AgentRunnerService\\(|AgentRunnerService\\b|llmProvider\\b|getModel\\(|getModelByName\\(" packages/context/src packages/cli/src packages/llm/src --glob '!**/*.test.ts'
```

Expected: remaining matches are limited to:

- `packages/llm/src/**`
- `packages/context/src/llm/ai-sdk-runtime.ts`
- `packages/context/src/llm/local-config.ts`
- `packages/context/src/agent/agent-runner.service.ts`
- type/export declarations that intentionally preserve the AI SDK adapter
  boundary.

- [ ] **Step 4: Run dead-code check**

Run:

```bash
pnpm run dead-code
```

Expected: PASS or only pre-existing unrelated findings. Investigate and fix
any finding caused by the new helper file.

- [ ] **Step 5: Commit verification cleanup if needed**

If verification required small cleanup, run:

```bash
git add packages/context/src/llm/claude-code-runtime.ts packages/context/src/llm/claude-code-runtime.test.ts packages/cli/src/claude-code-prompt-caching.ts packages/cli/src/status-project.ts packages/cli/src/setup-models.ts packages/cli/src/setup-models.test.ts packages/cli/src/doctor.test.ts
git commit -m "chore: verify claude-code v1 closure"
```

If no files changed after verification, skip this commit.

## Self-Review

- Spec coverage: The plan closes the remaining v1-blocking isolation assertion
  and setup-warning requirements from the original spec.
- Placeholder scan: No placeholders remain; every task includes file paths,
  code, commands, and expected output.
- Type consistency: The helper names and runtime function signatures are used
  consistently across tasks.
|
||||
2483
docs/superpowers/plans/2026-05-15-claude-code-backend-v1-runtime.md
Normal file
File diff suppressed because it is too large
698
docs/superpowers/specs/2026-05-15-claude-code-backend-design.md
Normal file
|
|
@@ -0,0 +1,698 @@
|
|||
# Brainstorm: `claude-code` backend with full KTX LLM parity
|
||||
|
||||
Adds a `claude-code` backend that gives KTX full parity with the existing
|
||||
`ANTHROPIC_API_KEY`-based `anthropic` backend for **all KTX LLM calls**. The
|
||||
backend uses `@anthropic-ai/claude-agent-sdk` and reuses the user's existing
|
||||
local Claude Code authentication. Users select it in `ktx.yaml`.
|
||||
|
||||
This is not an implementation plan. It is the revised design after expanding
|
||||
the requirement from "`ktx ingest` works with Claude Code" to "every KTX LLM
|
||||
call works with Claude Code." The follow-up implementation plan should be
|
||||
written separately.
|
||||
|
||||
## Core decision
|
||||
|
||||
`claude-code` is a first-class global LLM backend. Any code path that currently
|
||||
works with `llm.provider.backend: anthropic` must work with
|
||||
`llm.provider.backend: claude-code`, unless it is not an LLM call at all.
|
||||
|
||||
This includes:
|
||||
|
||||
- Agent loops implemented through `AgentRunnerService.runLoop(...)`.
|
||||
- Text generation through `generateKtxText(...)`.
|
||||
- Structured object generation through `generateKtxObject(...)`.
|
||||
- Local ingest and MCP-triggered local ingest flows.
|
||||
- Page triage and light extraction.
|
||||
- Context-candidate curation and reconciliation.
|
||||
- Memory capture.
|
||||
- Scan/enrichment internals and relationship LLM proposals.
|
||||
- Future KTX LLM call sites that use the shared runtime boundary.
|
||||
|
||||
Commands that do not use LLMs do not need special Claude Code behavior. There
|
||||
must be no silent fallback from `claude-code` to gateway, Anthropic API-key
|
||||
execution, or deterministic output.
|
||||
|
||||
## Goals
|
||||
|
||||
- Let a KTX user run all KTX LLM-backed behavior through their existing local
|
||||
Claude Code session without provisioning `ANTHROPIC_API_KEY`, Vertex
|
||||
credentials, or an AI Gateway key.
|
||||
- Preserve the existing user-facing CLI and MCP behavior. `claude-code` changes
|
||||
how LLM calls execute, not which KTX workflows exist.
|
||||
- Preserve role-based model selection. `llm.models.default`, `triage`,
|
||||
`candidateExtraction`, `curator`, `reconcile`, and `repair` remain the source
|
||||
of model selection for every LLM call.
|
||||
- Preserve KTX's curated tool boundaries. Claude Code built-ins,
|
||||
filesystem-discovered MCP servers, hooks, skills, plugins, agents, and slash
|
||||
commands must not become invokable in KTX agent loops. The Agent SDK init
|
||||
message may still report host-discovered slash commands, skills, and agents;
|
||||
KTX treats that metadata as diagnostic only and restricts execution through
|
||||
`tools: []`, exact KTX MCP `allowedTools`, `disallowedTools`, and
|
||||
deny-by-default `canUseTool`.
|
||||
- Keep embeddings independent. Claude does not provide embeddings; users keep
|
||||
configuring `ingest.embeddings` and scan/enrichment embeddings as they do
|
||||
today.
|
||||
- Fail fast with a clear message if local Claude Code authentication is not
|
||||
usable.
|
||||
|
||||
## Non-goals
|
||||
|
||||
- **Embedding parity.** Embeddings remain separate from LLM execution.
|
||||
- **Tool-call repair parity in the first pass.** The AI SDK runner uses
|
||||
`experimental_repairToolCall` (`packages/llm/src/repair.ts:35-88`). The Claude
|
||||
Agent SDK has no transparent same-step repair hook. In the MVP, the model
self-corrects on the next turn from the schema error; otherwise the call
counts toward the normal tool-failure count.
|
||||
- **OTEL telemetry parity in the first pass.** The AI SDK runner uses
|
||||
`experimental_telemetry`. The Agent SDK exposes hooks such as
|
||||
`PostToolUseFailure` and `SessionEnd`, but no drop-in OTEL switch. MVP ships
|
||||
without telemetry parity on this backend.
|
||||
- **Productizing Claude subscription limits.** Documentation must frame this as
|
||||
"use your own local Claude Code session," not as a third-party Claude Max or
|
||||
Claude.ai product feature.
|
||||
|
||||
## Approaches considered
|
||||
|
||||
### Recommended: global LLM runtime port
|
||||
|
||||
Introduce a backend-neutral KTX LLM runtime port for operations, not just model
|
||||
construction:
|
||||
|
||||
```ts
interface KtxLlmRuntimePort {
  generateText(input: KtxGenerateTextInput): Promise<string>;
  generateObject<T>(input: KtxGenerateObjectInput<T>): Promise<T>;
  runAgentLoop(params: RunLoopParams): Promise<RunLoopResult>;
}
```
|
||||
|
||||
The existing `anthropic`, `vertex`, and `gateway` backends implement the runtime
|
||||
through the AI SDK and existing `KtxLlmProvider`. The new `claude-code` backend
|
||||
implements the same runtime through `@anthropic-ai/claude-agent-sdk`.
|
||||
|
||||
This is the recommended approach because KTX call sites need operations:
|
||||
"generate text," "generate a structured object," and "run an agent loop." They
|
||||
do not inherently need direct access to an AI SDK `LanguageModel`. The Agent SDK
|
||||
is a session/agent API, not an AI SDK model factory, so the runtime port avoids
|
||||
pretending those APIs are the same.
|
||||
|
||||
### Rejected: fake AI SDK `LanguageModel` for Claude Code
|
||||
|
||||
Trying to make Claude Code look like an AI SDK `LanguageModel` would be brittle.
|
||||
The Agent SDK owns session execution, permissions, MCP tools, structured output,
|
||||
and result messages. Those semantics do not map cleanly onto a normal
|
||||
`getModel(...)` return value.
|
||||
|
||||
### Rejected: branch at every call site
|
||||
|
||||
Adding `if backend === "claude-code"` around each LLM call would work briefly
|
||||
but would duplicate prompt wrapping, structured output handling, debug logging,
|
||||
tool conversion, auth checks, and error mapping. It would also make future LLM
|
||||
call sites easy to miss.
|
||||
|
||||
## Architecture
|
||||
|
||||
```text
ktx.yaml
  llm.provider.backend: anthropic | vertex | gateway | claude-code
  llm.models.<role>: model alias or model ID

createLocalKtxLlmRuntimeFromConfig(project.config.llm)
  -> AiSdkKtxLlmRuntime
       - wraps existing KtxLlmProvider
       - generateText / Output.object / AgentRunnerService
  -> ClaudeCodeKtxLlmRuntime
       - uses @anthropic-ai/claude-agent-sdk query()
       - implements text, object, and agent-loop operations

All KTX LLM call sites
  -> KtxLlmRuntimePort
```
|
||||
|
||||
The runtime is selected at the same boundaries that currently construct an
|
||||
`llmProvider` or `AgentRunnerService`:
|
||||
|
||||
- `packages/context/src/llm/local-config.ts`
|
||||
- `packages/context/src/ingest/local-bundle-runtime.ts`
|
||||
- `packages/context/src/memory/local-memory.ts`
|
||||
- `packages/context/src/scan/local-scan.ts`
|
||||
- `packages/context/src/mcp/local-project-ports.ts`
|
||||
- Any CLI setup/status/doctor code that validates LLM readiness
|
||||
|
||||
After the change, services should not need to know whether the configured
|
||||
backend is AI SDK based or Claude Code based. They call the runtime operation
|
||||
they need.
|
||||
|
||||
## LLM call-site migration
|
||||
|
||||
The implementation plan must migrate every current KTX LLM call site to the
|
||||
runtime port:
|
||||
|
||||
- `packages/context/src/llm/generation.ts`: `generateKtxText` and
|
||||
`generateKtxObject` become runtime-backed helpers or are folded into the
|
||||
runtime.
|
||||
- `packages/context/src/agent/agent-runner.service.ts`: the AI SDK agent loop
|
||||
becomes the AI SDK implementation of `runAgentLoop`.
|
||||
- `packages/context/src/ingest/page-triage/page-triage.service.ts`: page triage
|
||||
and light extraction depend on `KtxLlmRuntimePort`, not raw `KtxLlmProvider`.
|
||||
- `packages/context/src/scan/description-generation.ts`: AI descriptions use
|
||||
the runtime text-generation operation.
|
||||
- `packages/context/src/scan/relationship-llm-proposal.ts`: relationship
|
||||
proposals use the runtime object-generation operation.
|
||||
- `packages/context/src/ingest/stages/stage-3-work-units.ts`,
|
||||
`packages/context/src/ingest/stages/stage-4-reconciliation.ts`,
|
||||
`packages/context/src/ingest/context-candidates/curator-pagination.service.ts`,
|
||||
and `packages/context/src/memory/memory-agent.service.ts`: agent loops use the
|
||||
runtime agent-loop operation or a thin `AgentRunnerPort` backed by it.
|
||||
- Test helpers and MCP local project ports that inject `llmProvider` or
|
||||
`agentRunner` must either inject the runtime port or use compatibility test
|
||||
adapters during the migration.
|
||||
|
||||
The plan must include a grep-based audit so new or overlooked `getModel(...)`,
|
||||
`generateKtxText(...)`, `generateKtxObject(...)`, `AgentRunnerService`, and
|
||||
`llmProvider` usages are either migrated or explicitly proven non-runtime.
|
||||
|
||||
## Config design
|
||||
|
||||
The config should make `claude-code` a first-class backend:
|
||||
|
||||
```yaml
llm:
  provider:
    backend: claude-code
  models:
    default: sonnet
    triage: haiku
    candidateExtraction: sonnet
    curator: sonnet
    reconcile: sonnet
    repair: sonnet
```
|
||||
|
||||
Implementation implications:
|
||||
|
||||
- Extend `KTX_LLM_BACKENDS` in `packages/context/src/project/config.ts` and
|
||||
`KtxLlmBackend` in `packages/llm/src/types.ts`.
|
||||
- Update setup, status, doctor, schema generation, examples, and docs so
|
||||
`claude-code` is understood everywhere `anthropic` is understood.
|
||||
- Update `createKtxLlmProvider` / `createModelFactory` so unsupported backend
|
||||
values throw instead of falling through to gateway.
|
||||
- Keep `llm.models` as the per-role binding source. The Claude Code runtime maps
|
||||
each KTX role to the configured model string for the current call.
|
||||
- Define accepted model aliases, such as `sonnet`, `opus`, and `haiku`, and full
|
||||
model IDs supported by the pinned SDK version.
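
The "throw instead of falling through to gateway" rule could look roughly like the sketch below. It is an illustration only, not the KTX implementation: `selectRuntimeKind`, `RuntimeKind`, and `isKtxLlmBackend` are hypothetical names, and treating `none` as an explicit error when a runtime is requested is an assumption.

```ts
// Backend values as described in this spec; `claude-code` is the new entry.
const KTX_LLM_BACKENDS = ["none", "anthropic", "vertex", "gateway", "claude-code"] as const;
type KtxLlmBackend = (typeof KTX_LLM_BACKENDS)[number];

// Hypothetical label for which runtime implementation serves a backend.
type RuntimeKind = "ai-sdk" | "claude-code";

function isKtxLlmBackend(value: string): value is KtxLlmBackend {
  return (KTX_LLM_BACKENDS as readonly string[]).indexOf(value) !== -1;
}

function selectRuntimeKind(backend: string): RuntimeKind {
  // Unknown strings fail loudly; nothing falls through to gateway.
  if (!isKtxLlmBackend(backend)) {
    throw new Error(`Unsupported llm.provider.backend: ${backend}`);
  }
  switch (backend) {
    case "anthropic":
    case "vertex":
    case "gateway":
      return "ai-sdk";
    case "claude-code":
      return "claude-code";
    case "none":
      // Assumption: asking for a runtime with no backend is an error, not a fallback.
      throw new Error("No LLM backend configured (llm.provider.backend: none)");
  }
}
```

Keeping the switch exhaustive over `KtxLlmBackend` means that adding a backend to the list without handling it becomes a compile-time error rather than a silent default.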
|
||||
|
||||
## Claude Agent SDK runtime behavior
|
||||
|
||||
Every Agent SDK call must be isolated enough for KTX execution. Use explicit
|
||||
options even when SDK defaults currently match the desired value.
|
||||
|
||||
For agent loops with tools:
|
||||
|
||||
```ts
query({
  prompt,
  options: {
    cwd: project.projectDir,
    systemPrompt,
    model: resolveModel(modelRole),
    maxTurns: stepBudget,
    settingSources: [],
    skills: [],
    plugins: [],
    mcpServers: { ktx: createSdkMcpServer({ name: "ktx", tools }) },
    tools: [],
    allowedTools: [/* exact mcp__ktx__<toolName> ids generated from the tool map */],
    canUseTool: ktxCanUseTool,
    permissionMode: "dontAsk",
    persistSession: false,
    env: ktxClaudeCodeEnv
  }
});
```
|
||||
|
||||
`ktxClaudeCodeEnv` is the controlled environment described in
|
||||
"Agent SDK environment and auth boundary" below; it must be passed on every
|
||||
KTX `query()` call.
|
||||
|
||||
For plain text generation:
|
||||
|
||||
- Use the same `query()` runtime with `maxTurns: 1`.
|
||||
- Pass `settingSources: []`, `skills: []`, `plugins: []`, `tools: []`,
|
||||
`permissionMode: "dontAsk"`, `persistSession: false`, and
|
||||
`env: ktxClaudeCodeEnv`.
|
||||
- Do not expose MCP tools unless the KTX call explicitly passed tools.
|
||||
- Return the final result message text.
|
||||
|
||||
For structured object generation:
|
||||
|
||||
- Use the same `query()` runtime with the Agent SDK structured output option
|
||||
for JSON schema output, plus the same isolation tuple including
|
||||
`env: ktxClaudeCodeEnv`.
|
||||
- Convert KTX Zod schemas at the runtime boundary.
|
||||
- Parse and validate the returned object with the original KTX schema before
|
||||
returning it to the caller.
|
||||
|
||||
The plan must confirm the exact option names against the pinned SDK version, but
|
||||
the required outcome is fixed:
|
||||
|
||||
- Filesystem settings are not loaded. The SDK's documented default for an
|
||||
omitted `settingSources` is `["user", "project", "local"]`
|
||||
(`@anthropic-ai/claude-agent-sdk@0.3.142` `sdk.d.ts:1686-1695`),
|
||||
which would inherit the user's Claude Code filesystem settings. Every KTX
|
||||
`query()` call site - agent loops, text generation, object generation, and
|
||||
the auth probe - MUST pass `settingSources: []` explicitly, along with
|
||||
`skills: []`, `plugins: []`, `tools: []`, `persistSession: false`, and no
|
||||
`mcpServers` entries other than the KTX MCP server (omitted entirely when
|
||||
the call site does not expose tools). The implementation MUST assert from
|
||||
the SDK init message that the controlled execution surface matches KTX's
|
||||
expectations:
|
||||
|
||||
- `message.tools` equals the exact generated KTX MCP tool ids for the current
|
||||
call.
|
||||
- `message.mcp_servers` equals the expected KTX MCP server set: `[]` when the
|
||||
call exposes no tools, or `["ktx"]` when it does.
|
||||
- `message.plugins` is empty.
|
||||
|
||||
The implementation MUST NOT reject a run solely because
|
||||
`message.slash_commands`, `message.skills`, or `message.agents` contain
|
||||
host-discovered names. In `@anthropic-ai/claude-agent-sdk@0.3.142`, those
|
||||
fields can report host discovery even when KTX passes the isolation options.
|
||||
They are not part of the KTX execution surface when `tools: []`,
|
||||
`allowedTools`, `disallowedTools`, and deny-by-default `canUseTool` are set.
|
||||
- `skills: []` is a context filter in the pinned SDK
|
||||
(`sdk.d.ts:1697-1718`): unlisted skills are hidden from the model's skill
|
||||
listing and rejected by the Skill tool, but discovered skill names may still
|
||||
appear in init metadata. KTX must still pass `skills: []`.
|
||||
- Plugins are disabled with `plugins: []`, and the runtime asserts that
|
||||
`message.plugins` is empty in the init message.
|
||||
- Built-in tools are disabled by setting `tools: []`. The pinned SDK type
|
||||
(`@anthropic-ai/claude-agent-sdk@0.3.142`, `sdk.d.ts`) documents `tools` as
|
||||
the base set of built-in tools, with `[]` meaning "disable all built-ins";
|
||||
`tools` does not accept MCP tool ids and cannot be used to restrict MCP
|
||||
availability.
|
||||
- MCP tool availability is granted by registering the KTX MCP server through
|
||||
`mcpServers`. The SDK does not document a wildcard like `mcp__ktx__*` for
|
||||
any tool field; KTX must enumerate exact generated MCP tool ids of the form
|
||||
`mcp__ktx__<toolName>` (derived from the tool map handed to
|
||||
`createSdkMcpServer`) wherever a list of tool ids is required.
|
||||
- Pre-approval under `permissionMode: "dontAsk"` is configured by listing those
|
||||
same exact `mcp__ktx__<toolName>` ids in `allowedTools` (documented as
|
||||
auto-allow without prompting). Treat `allowedTools` as auto-approval, not
|
||||
restriction.
|
||||
- Defense-in-depth restriction uses `canUseTool`. The KTX runtime supplies a
|
||||
`canUseTool` handler that allows only tool names in the current KTX MCP tool
|
||||
map and denies everything else, so host-discovered slash commands, skills,
|
||||
agents, future SDK defaults, or a misconfigured MCP server cannot expand the
|
||||
execution surface.
|
||||
- `disallowedTools` MUST additionally list the current built-in tool names
|
||||
(`Agent`, `Task`, `AskUserQuestion`, `Bash`, `Read`, `Edit`, `Write`, `Glob`,
|
||||
`Grep`, `WebFetch`, `WebSearch`, `TodoWrite`) as redundant insurance.
|
||||
- `cwd` is `project.projectDir`, resolved at startup via `resolveKtxProjectDir`,
|
||||
not `process.cwd()`.
|
||||
- Sessions are not persisted unless the plan identifies a concrete debugging
|
||||
feature that needs persistence.
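
The tool-id enumeration, init-message assertion, and deny-by-default check described above might be sketched as follows. This is a simplified illustration under stated assumptions: `InitSurface` is a hypothetical flattened view of the SDK init message (names only), and `ToolDecision` is a reduced stand-in for whatever permission result the pinned SDK expects from `canUseTool`; neither is the SDK's actual type.

```ts
const KTX_MCP_SERVER_NAME = "ktx";

// Hypothetical, flattened view of the init-message fields KTX checks.
interface InitSurface {
  tools: string[];
  mcpServers: string[]; // server names only
  plugins: string[]; // plugin names only
  slashCommands?: string[]; // host discovery metadata, diagnostic only
  skills?: string[]; // host discovery metadata, diagnostic only
  agents?: string[]; // host discovery metadata, diagnostic only
}

// Exact ids of the form mcp__ktx__<toolName>; no wildcard is used.
function ktxMcpToolIds(toolNames: readonly string[]): string[] {
  return toolNames.map((name) => `mcp__${KTX_MCP_SERVER_NAME}__${name}`).sort();
}

function sameMembers(a: readonly string[], b: readonly string[]): boolean {
  const x = a.slice().sort();
  const y = b.slice().sort();
  return x.length === y.length && x.every((value, i) => value === y[i]);
}

// Throws when the controlled execution surface differs from what KTX passed.
// slashCommands, skills, and agents are deliberately not inspected.
function assertIsolatedInit(init: InitSurface, toolNames: readonly string[]): void {
  const expectedServers = toolNames.length > 0 ? [KTX_MCP_SERVER_NAME] : [];
  if (!sameMembers(init.tools, ktxMcpToolIds(toolNames))) {
    throw new Error(`Unexpected tools in init message: ${init.tools.join(", ")}`);
  }
  if (!sameMembers(init.mcpServers, expectedServers)) {
    throw new Error(`Unexpected MCP servers in init message: ${init.mcpServers.join(", ")}`);
  }
  if (init.plugins.length > 0) {
    throw new Error(`Unexpected plugins in init message: ${init.plugins.join(", ")}`);
  }
}

// Reduced stand-in for a permission result (assumption, not the SDK type).
type ToolDecision = { behavior: "allow" } | { behavior: "deny"; message: string };

function decideToolUse(toolName: string, toolNames: readonly string[]): ToolDecision {
  return ktxMcpToolIds(toolNames).indexOf(toolName) !== -1
    ? { behavior: "allow" }
    : { behavior: "deny", message: `Tool ${toolName} is not part of the KTX tool map` };
}
```

The same `ktxMcpToolIds` output can feed `allowedTools`, the init assertion, and the permission check, so the three lists cannot drift apart.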
|
||||
|
||||
## Agent SDK environment and auth boundary
|
||||
|
||||
The Agent SDK's `query()` option `env` (`@anthropic-ai/claude-agent-sdk@0.3.142`
|
||||
`sdk.d.ts:1265-1279`) is the environment passed to the Claude Code child
|
||||
process and defaults to `process.env`. Without an explicit `env`, the SDK
|
||||
inherits the parent's environment, including any `ANTHROPIC_API_KEY`,
|
||||
`ANTHROPIC_AUTH_TOKEN`, `ANTHROPIC_BASE_URL`, gateway/AI-Gateway tokens,
|
||||
`GOOGLE_APPLICATION_CREDENTIALS` / `CLOUD_ML_REGION` (Vertex), and
|
||||
`AWS_*` (Bedrock) credentials — any of which can switch the Claude Code CLI's
|
||||
authentication source to API-key or another provider, bypassing the user's
|
||||
local Claude Code session. That would silently violate the core requirement
|
||||
that `claude-code` runs through the user's existing local Claude Code session
|
||||
and that there is no silent fallback to gateway, Anthropic API-key, or other
|
||||
provider execution.
|
||||
|
||||
Every `claude-code` `query()` call site - agent loops, text generation,
|
||||
object generation, and the auth probe - MUST pass an explicit `env`
|
||||
(`ktxClaudeCodeEnv`) constructed from `process.env` with the following
|
||||
denylisted variables removed:
|
||||
|
||||
- `ANTHROPIC_API_KEY`
|
||||
- `ANTHROPIC_AUTH_TOKEN`
|
||||
- `ANTHROPIC_BASE_URL`
|
||||
- `ANTHROPIC_MODEL` (provider-routing override)
|
||||
- `ANTHROPIC_VERTEX_PROJECT_ID`, `CLOUD_ML_REGION`,
|
||||
`GOOGLE_APPLICATION_CREDENTIALS`, `GOOGLE_CLOUD_PROJECT`
|
||||
- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`,
|
||||
`AWS_REGION`, `AWS_PROFILE`
|
||||
- `CLAUDE_CODE_USE_BEDROCK`, `CLAUDE_CODE_USE_VERTEX`
|
||||
- Any future provider-routing variables the pinned SDK version documents
|
||||
|
||||
The denylist is the source of truth and lives next to the runtime constructor
|
||||
so adding a variable is a single-file change.
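
A minimal sketch of the denylist filter, assuming the parent environment is passed in rather than read from a global so the function stays unit-testable. The key list is copied from the bullets above; `buildKtxClaudeCodeEnv` and the constant name are hypothetical.

```ts
// Denylisted keys copied from the list above; extend here when the pinned SDK
// documents new provider-routing variables.
const KTX_CLAUDE_CODE_ENV_DENYLIST: readonly string[] = [
  "ANTHROPIC_API_KEY",
  "ANTHROPIC_AUTH_TOKEN",
  "ANTHROPIC_BASE_URL",
  "ANTHROPIC_MODEL",
  "ANTHROPIC_VERTEX_PROJECT_ID",
  "CLOUD_ML_REGION",
  "GOOGLE_APPLICATION_CREDENTIALS",
  "GOOGLE_CLOUD_PROJECT",
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "AWS_SESSION_TOKEN",
  "AWS_REGION",
  "AWS_PROFILE",
  "CLAUDE_CODE_USE_BEDROCK",
  "CLAUDE_CODE_USE_VERTEX",
];

// Returns a copy of the parent environment without denylisted or unset keys.
// Hypothetical helper name; the parent env is injected instead of read globally.
function buildKtxClaudeCodeEnv(
  parentEnv: Record<string, string | undefined>,
): Record<string, string> {
  const env: Record<string, string> = {};
  for (const key of Object.keys(parentEnv)) {
    const value = parentEnv[key];
    if (value === undefined) continue;
    if (KTX_CLAUDE_CODE_ENV_DENYLIST.indexOf(key) !== -1) continue;
    env[key] = value;
  }
  return env;
}
```

In a caller this would typically be invoked as `buildKtxClaudeCodeEnv(process.env)` and the result passed as `env` on every `query()` call.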
|
||||
|
||||
Acceptance criteria:
|
||||
|
||||
- The constructed `ktxClaudeCodeEnv` does not contain any denylisted key, and
|
||||
this is verified by a unit test that seeds each denylisted key in a fake
|
||||
`process.env`.
|
||||
- The auth probe fails with the same "authenticate Claude Code locally"
|
||||
message even when `ANTHROPIC_API_KEY` (or any other denylisted credential)
|
||||
is present in `process.env` and no valid local Claude Code session exists.
|
||||
- Every KTX-originated `query()` invocation is spied to assert that `env`
|
||||
was passed and that it does not contain any denylisted key; the test fails
|
||||
if any code path falls back to the SDK default `process.env`.
|
||||
- The "no silent fallback" rule is preserved end-to-end: a machine with
|
||||
`ANTHROPIC_API_KEY` set but no local Claude Code authentication still fails
|
||||
setup/status/doctor on `claude-code`.
|
||||
|
||||
## Tool boundary
|
||||
|
||||
Agent-loop tools cannot remain only raw AI SDK `Record<string, Tool>` values if
|
||||
two backends must consume them. The plan must define a backend-neutral tool
|
||||
descriptor for the final tool map handed to an agent loop:
|
||||
|
||||
```ts
interface KtxRuntimeToolDescriptor<TInput, TOutput> {
  name: string;
  description: string;
  inputSchema: z.ZodObject<z.ZodRawShape>;
  execute(input: TInput): Promise<KtxRuntimeToolOutput<TOutput>>;
}

interface KtxRuntimeToolOutput<TOutput> {
  // What the model sees as the tool_result content. Always a markdown string;
  // never a raw JS object. This matches BaseTool's existing
  // `toModelOutput` contract (`packages/context/src/tools/base-tool.ts:154-162`)
  // which sends only markdown to the LLM.
  markdown: string;
  // Out-of-band payload preserved for tool callers (transcripts, debug,
  // verification ledger, downstream KTX consumers). Not sent to the model.
  structured?: TOutput;
}
```
|
||||
|
||||
Every composed tool entry must produce this descriptor shape, including:
|
||||
|
||||
- `BaseTool` outputs from factory toolsets, which already return
|
||||
`{ markdown, structured }`.
|
||||
- Source-specific raw tools such as `emit_historic_sql_evidence` in
|
||||
`packages/context/src/ingest/local-bundle-runtime.ts`.
|
||||
- Stage-local tools in `buildWuToolSet` and `buildReconcileToolSet`.
|
||||
- Inline `load_skill`, read/raw/span, stage/diff, eviction, and emit tools in
|
||||
`packages/context/src/ingest/ingest-bundle.runner.ts`.
|
||||
- Memory-agent `load_skill` in
|
||||
`packages/context/src/memory/memory-agent.service.ts`.
|
||||
- The `withVerificationLedger` wrapping layer, whose markdown/structured
|
||||
guard outputs (`packages/context/src/ingest/tools/verification-ledger.tool.ts:40-97`)
|
||||
already match the contract.
|
||||
|
||||
### Tool output contract
|
||||
|
||||
The runtime defines a single output contract for both backends so the model
|
||||
sees the same content regardless of provider:
|
||||
|
||||
- **Model-visible content**: the `markdown` field, mapped to the Agent SDK
|
||||
tool handler return as `{ content: [{ type: "text", text: markdown }] }` for
|
||||
`claude-code`, and surfaced through the existing `toModelOutput` markdown
|
||||
path for AI SDK backends. The model never sees raw JS objects.
|
||||
- **Structured payload**: the optional `structured` field, preserved on the
|
||||
in-process tool-result envelope for transcript/debug capture, the
|
||||
verification ledger, and any KTX caller that introspects results. The
|
||||
Claude adapter does not put structured JSON into model-visible content
|
||||
unless an individual call site explicitly opts in.
|
||||
- **Normalization of existing raw tools**: tools that today return a bare
|
||||
string (e.g. `load_skill` "Skill not available" responses in
|
||||
`packages/context/src/ingest/ingest-bundle.runner.ts:697-721` and
|
||||
`:924-936`, and `packages/context/src/memory/memory-agent.service.ts:128-152`)
|
||||
must be wrapped at the descriptor boundary so `markdown` is the string and
|
||||
`structured` is omitted. Tools that today return a plain object (e.g.
|
||||
skill payload `{ name, content, skillDirectory }`) must be wrapped so
|
||||
`markdown` is a deterministic human-readable rendering (e.g. the skill
|
||||
body with a header) and the original object is preserved on `structured`.
|
||||
No KTX tool may return a raw object as the model-visible payload on the
|
||||
Claude Code backend, because the Agent SDK MCP handler will otherwise
|
||||
stringify it and drop the structured fields.
|
||||
- **AI SDK parity**: the AI SDK adapter MUST preserve BaseTool's existing
|
||||
`toModelOutput` markdown-only behavior. Migrating BaseTool-derived tools
|
||||
to the descriptor must not start sending structured JSON to the model.
|
||||
|
||||
The AI SDK adapter converts descriptors to `tool(...)` with a `toModelOutput`
|
||||
that emits `markdown` only. The Claude Code adapter converts descriptors to
|
||||
Agent SDK `tool(name, description, schema.shape, handler)` entries inside
|
||||
`createSdkMcpServer(...)` and returns `{ content: [{ type: "text", text:
|
||||
markdown }] }`.
|
||||
|
||||
Non-object schemas are unsupported for `claude-code` and must be rejected at
|
||||
startup with a clear error. In practice KTX tool inputs are already `z.object`.
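
The normalization rule for existing raw tools could be sketched like this. The helper names (`normalizeToolOutput`, `toClaudeToolResult`, `renderSkill`) and the `SkillPayload` type are hypothetical; only the `{ markdown, structured }` shape and the `{ content: [{ type: "text", text }] }` return come from the contract above.

```ts
interface KtxRuntimeToolOutput<TOutput> {
  markdown: string;
  structured?: TOutput;
}

// Wraps a legacy tool result: bare strings become markdown only, plain
// objects get a deterministic rendering plus the original on `structured`.
function normalizeToolOutput<TOutput extends object>(
  raw: string | TOutput,
  render: (value: TOutput) => string,
): KtxRuntimeToolOutput<TOutput> {
  if (typeof raw === "string") {
    return { markdown: raw };
  }
  return { markdown: render(raw), structured: raw };
}

// Model-visible content for the Claude Code adapter: markdown text only.
function toClaudeToolResult(output: KtxRuntimeToolOutput<unknown>) {
  return { content: [{ type: "text" as const, text: output.markdown }] };
}

// Example payload shape taken from the skill case described above.
interface SkillPayload {
  name: string;
  content: string;
  skillDirectory: string;
}

// Hypothetical rendering: skill body with a header.
function renderSkill(skill: SkillPayload): string {
  return `# Skill: ${skill.name}\n\n${skill.content}`;
}
```

With this shape, a "Skill not available" string and a full skill payload both reach the model as text, while the payload object stays available to in-process callers.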
|
||||
|
||||
## Stop reasons and failures
|
||||
|
||||
The Claude runner maps the SDK's typed `SDKResultMessage` (union of
|
||||
`SDKResultSuccess` and `SDKResultError` in
|
||||
`@anthropic-ai/claude-agent-sdk@0.3.142`, `sdk.d.ts`) to
|
||||
`RunLoopStopReason = "budget" | "natural" | "error"`. The mapping must consider
|
||||
three typed signals in this precedence order, because each successive signal
|
||||
may be present where the previous one is absent:
|
||||
|
||||
1. `subtype`: `"error_max_turns"` -> `"budget"`; `"success"` -> `"natural"`;
|
||||
other error subtypes (`"error_during_execution"`,
|
||||
`"error_max_budget_usd"`, `"error_max_structured_output_retries"`) ->
|
||||
`"error"`.
|
||||
2. `terminal_reason` (optional `TerminalReason` field on both success and
|
||||
error results): `"max_turns"` -> `"budget"`; `"completed"` -> `"natural"`;
|
||||
any other terminal reason such as `"blocking_limit"`,
|
||||
`"rapid_refill_breaker"`, `"prompt_too_long"`, `"image_error"`,
|
||||
`"model_error"`, `"aborted_streaming"`, `"aborted_tools"`,
|
||||
`"stop_hook_prevented"`, `"hook_stopped"`, or `"tool_deferred"` ->
|
||||
`"error"`.
|
||||
3. The assistant message `stop_reason`: `"max_turns"` -> `"budget"`; any
|
||||
other non-null unsuccessful stop reason -> `"error"`.
|
||||
|
||||
A `max_turns` signal arriving through any of the three sources must map to
|
||||
`"budget"`; the runner MUST NOT classify a max-turn termination as
|
||||
`"natural"` or as a generic `"error"` because it was reported via
|
||||
`terminal_reason` instead of `subtype`.
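
One way to express the three-signal mapping is sketched below. `ResultSignals` is a hypothetical flattened input rather than the SDK's `SDKResultMessage`, the set of stop reasons treated as successful (`end_turn`, `stop_sequence`) is an assumption, and returning `"error"` when no signal is present is a conservative choice this spec does not mandate.

```ts
type RunLoopStopReason = "budget" | "natural" | "error";

// Hypothetical flattened view of the three typed signals.
interface ResultSignals {
  subtype?: string; // result message subtype
  terminalReason?: string; // result message terminal_reason
  stopReason?: string | null; // assistant message stop_reason
}

// Assumption: which assistant stop reasons count as successful.
const SUCCESSFUL_STOP_REASONS: readonly string[] = ["end_turn", "stop_sequence"];

function mapStopReason(signals: ResultSignals): RunLoopStopReason {
  const { subtype, terminalReason, stopReason } = signals;

  // A max-turn signal from any source is always "budget".
  if (
    subtype === "error_max_turns" ||
    terminalReason === "max_turns" ||
    stopReason === "max_turns"
  ) {
    return "budget";
  }
  // Then apply the signals in precedence order.
  if (subtype !== undefined && subtype !== "success") return "error";
  if (terminalReason !== undefined && terminalReason !== "completed") return "error";
  if (
    stopReason !== undefined &&
    stopReason !== null &&
    SUCCESSFUL_STOP_REASONS.indexOf(stopReason) === -1
  ) {
    return "error";
  }
  if (subtype === "success" || terminalReason === "completed") return "natural";
  // No usable signal: treat as an error rather than guessing success.
  return "error";
}
```

Checking `max_turns` across all three sources first is what keeps a run reported as `subtype: "success"` with `terminal_reason: "max_turns"` from being classified as `"natural"`.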
|
||||
|
||||
`Stop` hooks are not the authoritative stop-reason source because they do not
|
||||
carry the terminal reason. They remain useful for lifecycle logging. Tool failure
|
||||
counting should use `PostToolUseFailure` and feed the same mechanism that
|
||||
`stage-3-work-units.ts` checks through `toolFailureCount?(wu.unitKey)`.
|
||||
|
||||
For text and object generation, SDK authentication, billing, rate-limit,
|
||||
permission, max-turn, structured-output, and execution errors must map to the
|
||||
same error surfaces that KTX uses for the Anthropic API-key backend.
|
||||
|
||||
## Agent-loop progress callbacks
|
||||
|
||||
`RunLoopParams.onStepFinish`
|
||||
(`packages/context/src/agent/agent-runner.service.ts:20`) is part of the
|
||||
current agent-loop contract. The AI SDK runner increments `stepIndex` on each
|
||||
`generateText` step and invokes the callback
|
||||
(`agent-runner.service.ts:83-97`). KTX consumers depend on this:
|
||||
`packages/context/src/ingest/ingest-bundle.runner.ts:782` emits
|
||||
`work_unit_step` events from it, and `:1036` / `:1089` update reconciliation
|
||||
progress for the user-visible "Reconciling results · step N" status.
|
||||
|
||||
The `claude-code` runner MUST preserve `onStepFinish` semantics:
|
||||
|
||||
- It MUST invoke `onStepFinish` exactly once per assistant turn (i.e. once per
|
||||
step the SDK reports), incrementing `stepIndex` starting at 1.
|
||||
- The plan MUST name the concrete SDK stream event used as the step boundary
|
||||
(the implementation plan picks one of the documented assistant/result
|
||||
message events from the pinned SDK version and justifies it). The chosen
|
||||
event must produce the same `stepIndex` count as the AI SDK runner for an
|
||||
equivalent run: N tool-using turns yield N callbacks.
|
||||
- Callback errors MUST be caught and logged at `warn` level without aborting
|
||||
the loop, matching `agent-runner.service.ts:90-96`.
|
||||
- `stepBudget` passed to the callback MUST equal the `maxTurns` configured on
|
||||
the SDK `query()` call.
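
The callback requirements above could be wrapped in a small notifier, sketched here. `createStepNotifier`, `StepInfo`, and the injected `warn` logger are hypothetical names, and the callback payload shape is an assumption; which SDK stream event triggers `notify()` is left to the implementation plan.

```ts
// Hypothetical callback payload; the real RunLoopParams shape may differ.
interface StepInfo {
  stepIndex: number;
  stepBudget: number;
}

type OnStepFinish = (info: StepInfo) => void | PromiseLike<void>;

function isThenable(value: unknown): value is PromiseLike<unknown> {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { then?: unknown }).then === "function"
  );
}

// Returns a function to call once per assistant turn. stepIndex starts at 1,
// and callback failures are logged through `warn` without aborting the loop.
function createStepNotifier(
  stepBudget: number,
  onStepFinish: OnStepFinish | undefined,
  warn: (message: string) => void,
): () => number {
  let stepIndex = 0;
  return function notify(): number {
    stepIndex += 1;
    if (onStepFinish) {
      try {
        const result: unknown = onStepFinish({ stepIndex, stepBudget });
        if (isThenable(result)) {
          // Async callbacks: swallow rejections into a warning as well.
          result.then(undefined, (err: unknown) => warn(`onStepFinish failed: ${String(err)}`));
        }
      } catch (err) {
        warn(`onStepFinish failed: ${String(err)}`);
      }
    }
    return stepIndex;
  };
}
```

The runner would pass the same value for `stepBudget` here and for `maxTurns` on the `query()` call, so progress ratios stay comparable across backends.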
|
||||
|
||||
Acceptance criteria:
|
||||
|
||||
- A `claude-code` agent loop run with `stepBudget: N` produces N
|
||||
`work_unit_step` events when the loop runs to budget.
|
||||
- A reconciliation run under `claude-code` produces the same
|
||||
`updateProgress` calls (count and `stepIndex / stepBudget` ratio) as the
|
||||
Anthropic API-key backend for an equivalent fixture.
|
||||
- An `onStepFinish` callback that throws does not surface the error as the
|
||||
loop result.
|
||||
|
||||
## Prompt caching parity
|
||||
|
||||
`packages/llm/src/types.ts:44, :61` exposes `llm.promptCaching` as a config
|
||||
field, and the AI SDK message builder
|
||||
(`packages/llm/src/message-builder.ts:62-114, :141-218`) applies
|
||||
`anthropic.cacheControl: { type: "ephemeral", ttl }` markers to the system
|
||||
message, the last history message, and sorted tools, with TTLs split into
|
||||
`systemTtl`, `toolsTtl`, and `historyTtl`. `model-provider.test.ts:276`
|
||||
verifies caching is enabled by default with those three TTLs.
|
||||
|
||||
The Agent SDK does not expose KTX's marker-based contract. The closest
|
||||
mechanism is `systemPrompt: string[]` with
|
||||
`SYSTEM_PROMPT_DYNAMIC_BOUNDARY` (`sdk.d.ts:1746-1799`), which marks a static
|
||||
prefix as cacheable but provides no per-tool, per-history, or per-TTL knobs.
|
||||
|
||||
For the `claude-code` backend, the spec treats `llm.promptCaching` as
|
||||
**partial parity**:
|
||||
|
||||
- The Claude runtime MAY map a non-empty static system prefix to a cacheable
|
||||
`systemPrompt` array using `SYSTEM_PROMPT_DYNAMIC_BOUNDARY` when
|
||||
`cacheSystem` is enabled in the resolved `KtxPromptCachingConfig`. The
|
||||
implementation plan decides whether to ship this mapping in the first pass
|
||||
or defer it.
|
||||
- `cacheTools`, `cacheHistory`, and the `systemTtl` / `toolsTtl` /
|
||||
`historyTtl` fields have no Agent SDK equivalent. The runtime MUST NOT
|
||||
silently drop them: when a user sets non-default values under
|
||||
`llm.promptCaching` and the backend is `claude-code`, status/doctor and the
|
||||
setup wizard MUST surface that these fields are ignored on this backend.
|
||||
- Docs under `docs-site/content/docs/` MUST document this divergence in the
|
||||
same pages that describe `claude-code` setup, so users do not assume the
|
||||
TTL/tool/history knobs apply.
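
A sketch of how status/doctor might detect the ignored fields is shown below. The field names come from the text above, but the `PromptCachingLike` shape, the value types, and the idea of comparing against a resolved defaults object are assumptions; the real `KtxPromptCachingConfig` may differ.

```ts
// Hypothetical shape; field names follow the spec, value types are assumed.
interface PromptCachingLike {
  cacheSystem: boolean;
  cacheTools: boolean;
  cacheHistory: boolean;
  systemTtl: string;
  toolsTtl: string;
  historyTtl: string;
}

// Fields with no Agent SDK equivalent, per the list above.
const FIELDS_IGNORED_ON_CLAUDE_CODE: ReadonlyArray<keyof PromptCachingLike> = [
  "cacheTools",
  "cacheHistory",
  "systemTtl",
  "toolsTtl",
  "historyTtl",
];

// Returns the ignored fields the user changed from their defaults, so that
// setup/status/doctor can name them in a warning.
function ignoredPromptCachingFields(
  resolved: PromptCachingLike,
  defaults: PromptCachingLike,
): string[] {
  return FIELDS_IGNORED_ON_CLAUDE_CODE.filter((field) => resolved[field] !== defaults[field]);
}
```

An empty result means no warning is needed; a non-empty result lists exactly the fields to mention in the doctor/status output.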
|
||||
|
||||
Acceptance criteria:
|
||||
|
||||
- A `claude-code` runtime constructed from a config with default
|
||||
`promptCaching` does not throw and does not pass KTX `cacheControl`
|
||||
markers to the Agent SDK (the AI-SDK-only markers stay on the AI SDK
|
||||
path).
|
||||
- A `claude-code` runtime constructed from a config with non-default
|
||||
`promptCaching` values yields a warning surfaced through doctor/status
|
||||
output identifying the ignored fields.
|
||||
|
||||
## Auth and setup

`ktx setup`, status, and doctor flows must validate that Claude Code SDK auth is
usable, not just that `~/.claude/` exists. Acceptable validation strategies:

- A minimal SDK probe call with `settingSources: []`, `skills: []`,
  `plugins: []`, `tools: []`, `persistSession: false`, no `mcpServers`,
  `env: ktxClaudeCodeEnv`, and `maxTurns: 1`. The probe MUST NOT rely on
  the SDK's documented default for any of these fields, because the default
  for `settingSources` is `["user", "project", "local"]` (loads filesystem
  settings) and the default for `env` is `process.env` (can route auth
  through `ANTHROPIC_API_KEY` or other provider credentials and hide a
  missing local Claude Code session). See "Agent SDK environment and auth
  boundary" above for the `env` denylist.
  The auth probe MUST tolerate init messages with non-empty `slash_commands`,
  `skills`, and `agents` when `message.tools` is empty, `message.mcp_servers`
  is empty, `message.plugins` is empty, and the query options contain the KTX
  isolation tuple. Host discovery metadata is not an auth failure.
- An SDK-provided account/auth status method if the pinned version exposes one.
- A docs-endorsed file-presence check only if the official SDK docs explicitly
  state that it proves auth usability.

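The probe strategy in the first bullet might be sketched like this. The option names and values come from the list above; the interfaces and the helpers `buildAuthProbeOptions`, `hasIsolationTuple`, and `initMessageIsAcceptable` are hypothetical, and the sketch deliberately does not call the SDK, so the actual `query()` wiring is left out.

```typescript
// Subset of the query options named in the list above; values mirror the spec.
interface KtxIsolationOptions {
  settingSources: string[];
  skills: string[];
  plugins: unknown[];
  tools: string[];
  persistSession: boolean;
  env: Record<string, string>;
  maxTurns: number;
}

// Builds probe options explicitly so no field falls back to an SDK default.
// Note that `mcpServers` is intentionally absent.
function buildAuthProbeOptions(
  ktxClaudeCodeEnv: Record<string, string>,
): KtxIsolationOptions {
  return {
    settingSources: [],
    skills: [],
    plugins: [],
    tools: [],
    persistSession: false,
    env: ktxClaudeCodeEnv,
    maxTurns: 1,
  };
}

// Hypothetical view of the init message fields this spec mentions.
interface InitMessageView {
  tools: string[];
  mcp_servers: unknown[];
  plugins: unknown[];
  slash_commands?: string[];
  skills?: string[];
  agents?: string[];
}

// True when the options carry the KTX isolation tuple.
function hasIsolationTuple(options: KtxIsolationOptions): boolean {
  return (
    options.settingSources.length === 0 &&
    options.skills.length === 0 &&
    options.plugins.length === 0 &&
    options.tools.length === 0 &&
    options.persistSession === false &&
    !("mcpServers" in options)
  );
}

// Host discovery metadata (slash_commands, skills, agents) is ignored;
// only the controlled surfaces decide whether the init message is acceptable.
function initMessageIsAcceptable(
  message: InitMessageView,
  options: KtxIsolationOptions,
): boolean {
  return (
    message.tools.length === 0 &&
    message.mcp_servers.length === 0 &&
    message.plugins.length === 0 &&
    hasIsolationTuple(options)
  );
}
```

Keeping the acceptance check separate from the option builder means the same predicate could be reused by runtime paths other than the probe, if the implementation chooses to.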
Failure copy should tell the user to authenticate Claude Code locally with the
Claude Code CLI, then rerun setup or the command they attempted.

## Documentation impact

Docs updates are required because this changes user-visible setup and LLM
provider behavior:

- `docs-site/content/docs/getting-started/quickstart.mdx`
- `docs-site/content/docs/cli-reference/ktx-setup.mdx`
- `docs-site/content/docs/guides/building-context.mdx`
- Any config reference page that documents `llm.provider.backend`
- Any status or doctor docs that describe LLM readiness

The docs must say that `claude-code` uses the user's own local Claude Code
session. Do not describe it as a way for KTX to resell, pool, or productize
Claude subscription limits.

## Verified evidence

- Current `KtxLlmProvider` returns AI SDK `LanguageModel` instances and only
  supports `anthropic`, `vertex`, and `gateway`
  (`packages/llm/src/types.ts`, `packages/llm/src/model-provider.ts`).
- Project config currently accepts `llm.provider.backend: none | anthropic |
  vertex | gateway` (`packages/context/src/project/config.ts`).
- `generateKtxText` and `generateKtxObject` are shared non-agent generation
  helpers (`packages/context/src/llm/generation.ts`).
- `AgentRunnerService` is the shared AI SDK agent-loop implementation
  (`packages/context/src/agent/agent-runner.service.ts`).
- Page triage and light extraction currently use raw `KtxLlmProvider`
  (`packages/context/src/ingest/page-triage/page-triage.service.ts`).
- Scan/enrichment internals currently use `createLocalKtxLlmProviderFromConfig`,
  `generateKtxText`, and `generateKtxObject`
  (`packages/context/src/scan/local-scan.ts`,
  `packages/context/src/scan/description-generation.ts`,
  `packages/context/src/scan/relationship-llm-proposal.ts`).
- Local ingest and MCP local project ports inject `llmProvider` and
  `agentRunner` today (`packages/context/src/ingest/local-bundle-runtime.ts`,
  `packages/context/src/mcp/local-project-ports.ts`).
- The Agent SDK TypeScript reference (`@anthropic-ai/claude-agent-sdk@0.3.142`,
  `sdk.d.ts:1690-1697` and the `sdk.mjs` runtime default
  `["user","project","local"]`) documents `settingSources` **defaulting to
  loading user, project, and local filesystem settings** when omitted; passing
  `[]` is the explicit opt-out ("SDK isolation mode"). The same reference
  documents `allowedTools` as auto-approval rather than restriction,
  `canUseTool` as the programmatic permission handler,
  `permissionMode: "dontAsk"`, `tools` as the base built-in set with `[]`
  meaning "disable all built-ins" and no MCP-id support, `disallowedTools`,
  `maxTurns`, `mcpServers`, `cwd`, `persistSession`, and SDK result/hook
  message shapes.
- `SDKResultMessage = SDKResultSuccess | SDKResultError` in
  `@anthropic-ai/claude-agent-sdk@0.3.142` (`sdk.d.ts`); both variants expose
  an optional `terminal_reason: TerminalReason`, where `TerminalReason`
  includes `'max_turns' | 'completed'` alongside other terminal reasons.
- The Agent SDK MCP docs and SDK examples (e.g. Context7
  `/nothflare/claude-agent-sdk-docs` custom-tools guide) show registering MCP
  servers in `query()` options and listing exact `mcp__<server>__<tool>` ids
  in `allowedTools`; no SDK doc or type currently documents a wildcard form.
- BaseTool's `toModelOutput` already sends only `markdown` to the model while
  preserving structured output for callers
  (`packages/context/src/tools/base-tool.ts:154-162`); some raw AI SDK tools
  in `packages/context/src/ingest/ingest-bundle.runner.ts:697-721, :924-936`
  and `packages/context/src/memory/memory-agent.service.ts:128-152` currently
  return bare strings or plain objects and must be normalized at the
  descriptor boundary so both backends preserve the contract.
- The Agent SDK skills docs say the `skills` option is a context filter rather
  than a sandbox. KTX must pass `skills: []`, but must not assert that
  `message.skills` is empty in the SDK init message.
- `Options.env` in `@anthropic-ai/claude-agent-sdk@0.3.142`
  (`sdk.d.ts:1265-1279`) is the environment passed to the Claude Code
  process and defaults to `process.env`. Without an explicit `env`, the SDK
  inherits the parent environment, including any provider-routing variables
  (`ANTHROPIC_API_KEY`, Vertex/Bedrock credentials, gateway tokens) that
  could change the active authentication source of the Claude Code CLI and
  hide a missing local Claude Code session.

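Because the evidence above found only exact `mcp__<server>__<tool>` ids and no wildcard form, an allowlist would have to be expanded tool by tool. The following is a sketch under stated assumptions: the server name `ktx` is taken from the `mcp__ktx__*` ids this spec uses elsewhere, the tool names in the usage note are invented for illustration, and `decideToolUse` is a hypothetical pure helper that a `canUseTool` handler could delegate to, not the SDK's own callback signature.

```typescript
// Server name assumed from the `mcp__ktx__*` ids used in this spec.
const KTX_MCP_SERVER = "ktx";

// Formats one exact MCP tool id; no wildcard form is assumed to exist.
function mcpToolId(server: string, tool: string): string {
  return `mcp__${server}__${tool}`;
}

// Expands KTX tool names into the exact ids to list in `allowedTools`.
function buildAllowedTools(toolNames: readonly string[]): string[] {
  return toolNames.map((name) => mcpToolId(KTX_MCP_SERVER, name));
}

// Hypothetical decision shape for illustration.
type PermissionDecision =
  | { behavior: "allow" }
  | { behavior: "deny"; message: string };

// Allows only exact ids from the allowlist; built-ins and host-discovered
// Skill/Agent/SlashCommand requests fall through to deny.
function decideToolUse(
  toolName: string,
  allowed: ReadonlySet<string>,
): PermissionDecision {
  if (allowed.has(toolName)) return { behavior: "allow" };
  return {
    behavior: "deny",
    message: `Tool ${toolName} is not part of the KTX tool surface`,
  };
}
```

For example, `buildAllowedTools(["search_pages", "write_memory"])` (both names hypothetical) yields `mcp__ktx__search_pages` and `mcp__ktx__write_memory`, and a request for a built-in such as `Bash` is denied because it matches no exact id.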
## Open items for the implementation plan

1. Confirm exact TypeScript option names and result-message discriminants
   against the pinned `@anthropic-ai/claude-agent-sdk` version.
2. Define the final `KtxLlmRuntimePort` file location and package exports.
3. Define model alias validation for `sonnet`, `opus`, `haiku`, and full model
   IDs.
4. Define the auth probe and make setup/status/doctor report actionable
   messages.
5. Run a repo-wide audit for all LLM call sites and migrate each one to the
   runtime boundary.
6. Write tests proving `claude-code` works for text generation, structured
   object generation, and agent-loop execution.
7. Write tests proving page triage, scan/enrichment internals, memory capture,
   MCP-triggered local ingest, and normal local ingest all use the
   `claude-code` runtime when configured.
8. Write tests proving a raw built-in Claude Code tool request is denied,
   host-discovered Skill/Agent/SlashCommand requests are denied by `canUseTool`,
   and only exact `mcp__ktx__*` tools are allowed during KTX agent loops.
9. Write a test that asserts every KTX-originated `query()` invocation
   (agent loop, text generation, object generation, auth probe) is called
   with `settingSources: []`, `skills: []`, `plugins: []`, `tools: []`, and
   `persistSession: false`, by spying on the SDK entry point. The test must
   fail if any path falls back to SDK defaults for those fields. The test must
   also prove that non-empty host-discovered `slash_commands`, `skills`, and
   `agents` in the init message do not fail the auth probe or runtime when the
   controlled tool, MCP server, and plugin surfaces match KTX expectations.
10. Write a test that asserts `onStepFinish` is invoked the expected number
    of times for a fixed-budget `claude-code` agent loop, including the
    work-unit and reconciliation progress paths.
11. Write a test that asserts every KTX-originated `query()` invocation
    (agent loop, text generation, object generation, auth probe) is called
    with an explicit `env` and that none of the denylisted provider-routing
    variables (`ANTHROPIC_API_KEY`, `ANTHROPIC_AUTH_TOKEN`,
    `ANTHROPIC_BASE_URL`, `ANTHROPIC_MODEL`, `ANTHROPIC_VERTEX_PROJECT_ID`,
    `CLOUD_ML_REGION`, `GOOGLE_APPLICATION_CREDENTIALS`,
    `GOOGLE_CLOUD_PROJECT`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`,
    `AWS_SESSION_TOKEN`, `AWS_REGION`, `AWS_PROFILE`,
    `CLAUDE_CODE_USE_BEDROCK`, `CLAUDE_CODE_USE_VERTEX`) are present in
    that env, by seeding each variable in a fake `process.env`. The test
    must also assert that the auth probe still fails when
    `ANTHROPIC_API_KEY` is set in `process.env` but no local Claude Code
    session exists.
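The environment check in item 11 could be built around a filter like the one below. This is a sketch, assuming the explicit `env` is derived by copying the parent environment and dropping denylisted keys. The variable list is copied from item 11, while `PROVIDER_ROUTING_DENYLIST` and `buildKtxClaudeCodeEnv` are hypothetical names; the real implementation may construct `ktxClaudeCodeEnv` differently.

```typescript
// Provider-routing variables listed in item 11 above.
const PROVIDER_ROUTING_DENYLIST: readonly string[] = [
  "ANTHROPIC_API_KEY",
  "ANTHROPIC_AUTH_TOKEN",
  "ANTHROPIC_BASE_URL",
  "ANTHROPIC_MODEL",
  "ANTHROPIC_VERTEX_PROJECT_ID",
  "CLOUD_ML_REGION",
  "GOOGLE_APPLICATION_CREDENTIALS",
  "GOOGLE_CLOUD_PROJECT",
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "AWS_SESSION_TOKEN",
  "AWS_REGION",
  "AWS_PROFILE",
  "CLAUDE_CODE_USE_BEDROCK",
  "CLAUDE_CODE_USE_VERTEX",
];

// Copies the parent environment, dropping undefined values and every
// denylisted key, so the result can be passed as an explicit `env`.
function buildKtxClaudeCodeEnv(
  parentEnv: Record<string, string | undefined>,
): Record<string, string> {
  const denied = new Set(PROVIDER_ROUTING_DENYLIST);
  const env: Record<string, string> = {};
  for (const [key, value] of Object.entries(parentEnv)) {
    if (value !== undefined && !denied.has(key)) {
      env[key] = value;
    }
  }
  return env;
}
```

A test in the spirit of item 11 would seed a fake parent environment with every denylisted variable plus a harmless one such as `PATH`, call the helper, and assert that only the harmless variable survives.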