mirror of
https://github.com/Kaelio/ktx.git
synced 2026-06-16 08:25:14 +02:00
* docs: revise claude-code ingest backend spec * docs: keep claude-code spec focused on ingest * docs: expand claude-code spec to full llm parity * Refine claude-code backend spec after adversarial review iteration 1 * Refine claude-code backend spec after adversarial review iteration 2 * Refine claude-code backend spec after adversarial review iteration 3 * feat: recognize claude-code llm backend * feat: add ktx llm runtime port * feat: add claude-code llm runtime * feat: route non-agent llm calls through runtime * feat: run ingest agents through llm runtime * feat: support claude-code setup and status * test: verify claude-code backend runtime * docs: add claude-code backend v1 runtime plan * fix: close claude-code runtime isolation checks * fix: warn on claude-code prompt caching during setup * chore: verify claude-code v1 closure * docs: add claude-code backend v1 isolation closure plan * fix: update claude-code ingest setup guidance * docs: add claude-code backend v1 ingest guidance closure plan * docs: align claude-code isolation spec with sdk metadata * test: cover claude-code host discovery metadata * fix: tolerate claude-code host discovery metadata * docs: clarify claude-code host discovery metadata * docs: add claude-code auth-probe isolation fix plan * chore: prepare kaelio ktx rc1 release * chore: add semantic release workflow * fix: unblock ci checks * chore(release): 0.1.0-rc.1 * feat: add Claude Code model selection to setup * fix: keep git maintenance attached in local repos
678 lines
26 KiB
Markdown
678 lines
26 KiB
Markdown
# Claude Code Auth Probe Isolation Fix Implementation Plan
|
|
|
|
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
|
|
|
**Goal:** Make the `claude-code` auth probe and runtime tolerate host-discovered
|
|
Claude Code init metadata while preserving KTX-owned tool, MCP, and plugin
|
|
restrictions.
|
|
|
|
**Architecture:** Keep the existing Claude Code runtime and SDK option tuple.
|
|
Change the init-message assertion from "no host discovery appears" to "only the
|
|
KTX-controlled execution surface is active." Align the design spec and user docs
|
|
with the pinned SDK behavior: `settingSources: []` disables filesystem settings,
|
|
`skills: []` is a context filter, and deny-by-default `canUseTool` is the
|
|
runtime enforcement boundary.
|
|
|
|
**Tech Stack:** TypeScript, pnpm, Vitest, Markdown, Fumadocs MDX,
|
|
`@anthropic-ai/claude-agent-sdk@0.3.142`.
|
|
|
|
---
|
|
|
|
## Audit result
|
|
|
|
The current strict isolation assertion is a v1-blocking bug. A real authenticated
|
|
Claude Code host can report non-empty `slash_commands`, `skills`, and `agents`
|
|
in the SDK init message even when KTX passes `settingSources: []`, `skills: []`,
|
|
`plugins: []`, `tools: []`, exact KTX MCP `allowedTools`, `disallowedTools`, and
|
|
deny-by-default `canUseTool`.
|
|
|
|
Spec findings:
|
|
|
|
- `docs/superpowers/specs/2026-05-15-claude-code-backend-design.md:45-47`
|
|
requires host-discovered capabilities not to expand the KTX agent-loop tool
|
|
surface. That requirement is about invocation, not necessarily about zero
|
|
diagnostic metadata in the init message.
|
|
- `docs/superpowers/specs/2026-05-15-claude-code-backend-design.md:254-265`
|
|
overreaches by asking the implementation to assert that unexpected
|
|
settings-derived commands, skills, agents, plugins, or MCP servers are
|
|
inactive from the SDK init message. In `@anthropic-ai/claude-agent-sdk@0.3.142`,
|
|
the available SDK controls cannot make `message.slash_commands`,
|
|
`message.skills`, or `message.agents` reliably empty on an authenticated host.
|
|
- `docs/superpowers/specs/2026-05-15-claude-code-backend-design.md:266-267`
|
|
says skills are disabled with `skills: []`. The pinned SDK type definitions
|
|
document `skills` as a context filter, not a sandbox.
|
|
- `docs/superpowers/specs/2026-05-15-claude-code-backend-design.md:543-545`
|
|
correctly requires the auth probe to pass the isolation option tuple and no
|
|
MCP servers. It does not require failing when host discovery metadata is
|
|
present.
|
|
|
|
SDK evidence from
|
|
`node_modules/.pnpm/@anthropic-ai+claude-agent-sdk@0.3.142_zod@4.4.3/node_modules/@anthropic-ai/claude-agent-sdk/sdk.d.ts`:
|
|
|
|
- Lines `1686-1695`: `settingSources: []` disables filesystem settings only.
|
|
- Lines `1697-1718`: `skills: []` is a context filter; unlisted skills are
|
|
hidden from listing and rejected by the Skill tool, but files remain on disk.
|
|
- Lines `1202-1213`: `allowedTools` is auto-approval, while `canUseTool` is the
|
|
permission handler for controlling tool execution.
|
|
- Lines `1224-1228`: `disallowedTools` removes listed tools from context and
|
|
prevents use.
|
|
- Lines `1255-1264`: `tools: []` disables built-in tools.
|
|
- Lines `1545-1558`: `plugins` loads plugins when supplied; KTX supplies `[]`.
|
|
- Lines `3465-3489`: the init message reports `agents`, `tools`,
|
|
`mcp_servers`, `slash_commands`, `skills`, and `plugins`.
|
|
|
|
Implemented plan audit:
|
|
|
|
- `2026-05-15-claude-code-backend-v1-runtime.md` is implemented for config,
|
|
runtime port, SDK dependency, model aliases, environment scrubbing, Claude Code
|
|
text/object/agent execution, setup/status/doctor support, docs, and LLM
|
|
call-site migration.
|
|
- `2026-05-15-claude-code-backend-v1-isolation-closure.md` is implemented, but
|
|
it converted the spec's ambiguous "assert inactive" line into an impossible
|
|
assertion against non-empty `slash_commands`, `skills`, and `agents`.
|
|
- `2026-05-15-claude-code-backend-v1-ingest-guidance-closure.md` is implemented
|
|
for the ingest missing-LLM guidance and associated CLI/context tests.
|
|
|
|
Remaining v1-blocking gaps:
|
|
|
|
- `packages/context/src/llm/claude-code-runtime.ts:94-101` throws on
|
|
host-discovered slash commands, skills, and agents.
|
|
- `packages/context/src/llm/claude-code-runtime.test.ts:158-178` encodes the
|
|
wrong behavior by requiring the runtime to reject any init message with
|
|
discovered agents.
|
|
- The auth probe has no regression coverage for an authenticated host whose init
|
|
message reports non-empty `slash_commands`, `skills`, and `agents`.
|
|
- User docs under `docs-site/content/docs/guides/` say KTX "disables" skills,
|
|
agents, hooks, and slash commands. That wording is stronger than the SDK
|
|
contract and must be changed to "not invokable by KTX agent loops."
|
|
|
|
Non-blocking gaps:
|
|
|
|
- Same-step AI SDK tool-call repair parity remains out of scope for v1.
|
|
- OTEL telemetry parity remains out of scope for v1.
|
|
- Embedding parity remains out of scope because embeddings are configured
|
|
separately.
|
|
- Full prompt-caching parity remains out of scope. V1 keeps warning on ignored
|
|
prompt-cache fields and avoids AI SDK cache markers on the Claude Code path.
|
|
|
|
Decision:
|
|
|
|
- Choose option (a): relax the assertion in code and align the spec text. Do not
|
|
rely on an invented SDK mechanism. The pinned type definitions expose
|
|
`settingSources`, `skills`, `plugins`, `tools`, `allowedTools`,
|
|
`disallowedTools`, and `canUseTool`, but they do not expose a query option that
|
|
disables all host-discovered slash commands or user-level subagent names in the
|
|
init message.
|
|
|
|
## File structure
|
|
|
|
Modify these files:
|
|
|
|
- `docs/superpowers/specs/2026-05-15-claude-code-backend-design.md` aligns the
|
|
design with the real SDK contract.
|
|
- `packages/context/src/llm/claude-code-runtime.test.ts` adds the failing
|
|
regression tests for auth probe and runtime init metadata.
|
|
- `packages/context/src/llm/claude-code-runtime.ts` relaxes init metadata checks
|
|
while tightening exact tool equality.
|
|
- `docs-site/content/docs/guides/llm-configuration.mdx` changes user docs from
|
|
"disabled" to "not invokable."
|
|
- `docs-site/content/docs/guides/building-context.mdx` applies the same
|
|
user-facing wording at the ingest guide boundary.
|
|
|
|
### Task 1: Align the design spec with SDK reality
|
|
|
|
**Files:**
|
|
|
|
- Modify: `docs/superpowers/specs/2026-05-15-claude-code-backend-design.md`
|
|
|
|
- [ ] **Step 1: Update the tool-boundary goal**
|
|
|
|
Replace the goal bullet at lines `45-47` with:
|
|
|
|
```markdown
|
|
- Preserve KTX's curated tool boundaries. Claude Code built-ins,
|
|
filesystem-discovered MCP servers, hooks, skills, plugins, agents, and slash
|
|
commands must not become invokable in KTX agent loops. The Agent SDK init
|
|
message may still report host-discovered slash commands, skills, and agents;
|
|
KTX treats that metadata as diagnostic only and restricts execution through
|
|
`tools: []`, exact KTX MCP `allowedTools`, `disallowedTools`, and
|
|
deny-by-default `canUseTool`.
|
|
```
|
|
|
|
- [ ] **Step 2: Replace the over-broad init assertion requirement**
|
|
|
|
Replace the bullet at lines `254-265` with:
|
|
|
|
```markdown
|
|
- Filesystem settings are not loaded. The SDK's documented default for an
|
|
omitted `settingSources` is `["user", "project", "local"]`
|
|
(`@anthropic-ai/claude-agent-sdk@0.3.142` `sdk.d.ts:1686-1695`),
|
|
which would inherit the user's Claude Code filesystem settings. Every KTX
|
|
`query()` call site - agent loops, text generation, object generation, and
|
|
the auth probe - MUST pass `settingSources: []` explicitly, along with
|
|
`skills: []`, `plugins: []`, `tools: []`, `persistSession: false`, and no
|
|
`mcpServers` entries other than the KTX MCP server (omitted entirely when
|
|
the call site does not expose tools). The implementation MUST assert from
|
|
the SDK init message that the controlled execution surface matches KTX's
|
|
expectations:
|
|
|
|
- `message.tools` equals the exact generated KTX MCP tool ids for the current
|
|
call.
|
|
- `message.mcp_servers` equals the expected KTX MCP server set: `[]` when the
|
|
call exposes no tools, or `["ktx"]` when it does.
|
|
- `message.plugins` is empty.
|
|
|
|
The implementation MUST NOT reject a run solely because
|
|
`message.slash_commands`, `message.skills`, or `message.agents` contain
|
|
host-discovered names. In `@anthropic-ai/claude-agent-sdk@0.3.142`, those
|
|
fields can report host discovery even when KTX passes the isolation options.
|
|
They are not part of the KTX execution surface when `tools: []`,
|
|
`allowedTools`, `disallowedTools`, and deny-by-default `canUseTool` are set.
|
|
```
|
|
|
|
- [ ] **Step 3: Replace the skills/plugin wording**
|
|
|
|
Replace the bullets at lines `266-289` with:
|
|
|
|
```markdown
|
|
- `skills: []` is a context filter in the pinned SDK
|
|
(`sdk.d.ts:1697-1718`): unlisted skills are hidden from the model's skill
|
|
listing and rejected by the Skill tool, but discovered skill names may still
|
|
appear in init metadata. KTX must still pass `skills: []`.
|
|
- Plugins are disabled with `plugins: []`, and the runtime asserts that
|
|
`message.plugins` is empty in the init message.
|
|
- Built-in tools are disabled by setting `tools: []`. The pinned SDK type
|
|
(`@anthropic-ai/claude-agent-sdk@0.3.142`, `sdk.d.ts:1255-1264`) documents
|
|
`tools` as the base set of built-in tools, with `[]` meaning "disable all
|
|
built-ins"; `tools` does not accept MCP tool ids and cannot be used to
|
|
restrict MCP availability.
|
|
- MCP tool availability is granted by registering the KTX MCP server through
|
|
`mcpServers`. The SDK does not document a wildcard like `mcp__ktx__*` for
|
|
any tool field; KTX must enumerate exact generated MCP tool ids of the form
|
|
`mcp__ktx__<toolName>` (derived from the tool map handed to
|
|
`createSdkMcpServer`) wherever a list of tool ids is required.
|
|
- Pre-approval under `permissionMode: "dontAsk"` is configured by listing those
|
|
same exact `mcp__ktx__<toolName>` ids in `allowedTools` (documented as
|
|
auto-allow without prompting). Treat `allowedTools` as auto-approval, not
|
|
restriction.
|
|
- Defense-in-depth restriction uses `canUseTool`. The KTX runtime supplies a
|
|
`canUseTool` handler that allows only tool names in the current KTX MCP tool
|
|
map and denies everything else, so host-discovered slash commands, skills,
|
|
agents, future SDK defaults, or a misconfigured MCP server cannot expand the
|
|
execution surface.
|
|
- `disallowedTools` MUST additionally list the current built-in tool names
|
|
(`Agent`, `Task`, `AskUserQuestion`, `Bash`, `Read`, `Edit`, `Write`, `Glob`,
|
|
`Grep`, `WebFetch`, `WebSearch`, `TodoWrite`) as redundant insurance.
|
|
```
|
|
|
|
- [ ] **Step 4: Update auth probe acceptance text**
|
|
|
|
After the auth probe option list at lines `543-545`, add:
|
|
|
|
```markdown
|
|
The auth probe MUST tolerate init messages with non-empty
|
|
`slash_commands`, `skills`, and `agents` when `message.tools` is empty,
|
|
`message.mcp_servers` is empty, `message.plugins` is empty, and the query
|
|
options contain the KTX isolation tuple. Host discovery metadata is not an
|
|
auth failure.
|
|
```
|
|
|
|
- [ ] **Step 5: Update verified evidence and open items**
|
|
|
|
Replace lines `621-623` with:
|
|
|
|
```markdown
|
|
- The Agent SDK skills docs say the `skills` option is a context filter rather
|
|
than a sandbox. KTX must pass `skills: []`, but must not assert that
|
|
`message.skills` is empty in the SDK init message.
|
|
```
|
|
|
|
Replace open item `8` at lines `648-649` with:
|
|
|
|
```markdown
|
|
8. Write tests proving a raw built-in Claude Code tool request is denied,
|
|
host-discovered Skill/Agent/SlashCommand requests are denied by `canUseTool`,
|
|
and only exact `mcp__ktx__*` tools are allowed during KTX agent loops.
|
|
```
|
|
|
|
Replace open item `9` at lines `650-654` with:
|
|
|
|
```markdown
|
|
9. Write a test that asserts every KTX-originated `query()` invocation
|
|
(agent loop, text generation, object generation, auth probe) is called
|
|
with `settingSources: []`, `skills: []`, `plugins: []`, `tools: []`, and
|
|
`persistSession: false`, by spying on the SDK entry point. The test must
|
|
fail if any path falls back to SDK defaults for those fields. The test must
|
|
also prove that non-empty host-discovered `slash_commands`, `skills`, and
|
|
`agents` in the init message do not fail the auth probe or runtime when the
|
|
controlled tool, MCP server, and plugin surfaces match KTX expectations.
|
|
```
|
|
|
|
- [ ] **Step 6: Commit the spec alignment**
|
|
|
|
Run:
|
|
|
|
```bash
|
|
git add docs/superpowers/specs/2026-05-15-claude-code-backend-design.md
|
|
git commit -m "docs: align claude-code isolation spec with sdk metadata"
|
|
```
|
|
|
|
Expected: the design spec no longer requires zero host-discovery metadata in
|
|
the SDK init message.
|
|
|
|
### Task 2: Add regression tests for host-discovered init metadata
|
|
|
|
**Files:**
|
|
|
|
- Modify: `packages/context/src/llm/claude-code-runtime.test.ts`
|
|
|
|
- [ ] **Step 1: Replace the invalid agent rejection test**
|
|
|
|
In `packages/context/src/llm/claude-code-runtime.test.ts`, replace the test named
|
|
`rejects settings-derived agents and non-KTX MCP servers from init messages`
|
|
with these tests:
|
|
|
|
```ts
|
|
it('treats host-discovered commands skills and agents as non-fatal init metadata for text and auth probe', async () => {
|
|
const hostDiscoveredInit = initMessage({
|
|
slash_commands: ['/help', '/compact', '/clear', '/user-command'],
|
|
skills: ['pdf', 'docx'],
|
|
agents: ['claude', 'Explore', 'general-purpose'],
|
|
});
|
|
const textQuery = vi.fn((_input: any) =>
|
|
stream([hostDiscoveredInit, resultMessage({ result: 'hello' })]),
|
|
);
|
|
const runtime = new ClaudeCodeKtxLlmRuntime({
|
|
projectDir: '/tmp/project',
|
|
modelSlots: { default: 'sonnet' },
|
|
query: textQuery,
|
|
env: { ANTHROPIC_API_KEY: 'sk-ant-test', PATH: '/usr/bin' }, // pragma: allowlist secret
|
|
});
|
|
|
|
await expect(runtime.generateText({ role: 'default', prompt: 'say hello' })).resolves.toBe('hello');
|
|
const textOptions = textQuery.mock.calls[0][0].options;
|
|
expect(textOptions).toMatchObject({
|
|
settingSources: [],
|
|
skills: [],
|
|
plugins: [],
|
|
tools: [],
|
|
allowedTools: [],
|
|
permissionMode: 'dontAsk',
|
|
persistSession: false,
|
|
env: expect.not.objectContaining({ ANTHROPIC_API_KEY: 'sk-ant-test' }),
|
|
});
|
|
expect(textOptions.disallowedTools).toEqual(expect.arrayContaining(['Agent', 'Task', 'Bash']));
|
|
expect(await textOptions.canUseTool('Agent', {}, { signal: new AbortController().signal, toolUseID: 'agent' })).toMatchObject({
|
|
behavior: 'deny',
|
|
toolUseID: 'agent',
|
|
});
|
|
expect(await textOptions.canUseTool('Skill', {}, { signal: new AbortController().signal, toolUseID: 'skill' })).toMatchObject({
|
|
behavior: 'deny',
|
|
toolUseID: 'skill',
|
|
});
|
|
expect(
|
|
await textOptions.canUseTool('SlashCommand', {}, { signal: new AbortController().signal, toolUseID: 'slash' }),
|
|
).toMatchObject({
|
|
behavior: 'deny',
|
|
toolUseID: 'slash',
|
|
});
|
|
|
|
const probeQuery = vi.fn((_input: any) =>
|
|
stream([hostDiscoveredInit, resultMessage({ result: 'ok' })]),
|
|
);
|
|
await expect(
|
|
runClaudeCodeAuthProbe({
|
|
projectDir: '/tmp/project',
|
|
model: 'sonnet',
|
|
query: probeQuery,
|
|
env: { ANTHROPIC_AUTH_TOKEN: 'token', HOME: '/Users/test' },
|
|
}),
|
|
).resolves.toEqual({ ok: true });
|
|
expect(probeQuery.mock.calls[0][0].options).toMatchObject({
|
|
settingSources: [],
|
|
skills: [],
|
|
plugins: [],
|
|
tools: [],
|
|
allowedTools: [],
|
|
permissionMode: 'dontAsk',
|
|
persistSession: false,
|
|
env: expect.objectContaining({ HOME: '/Users/test' }),
|
|
});
|
|
expect(probeQuery.mock.calls[0][0].options.env).not.toEqual(
|
|
expect.objectContaining({ ANTHROPIC_AUTH_TOKEN: 'token' }),
|
|
);
|
|
});
|
|
|
|
it('allows host-discovered context during agent loops while requiring exact KTX MCP tools and servers', async () => {
|
|
const query = vi.fn((_input: any) =>
|
|
stream([
|
|
initMessage({
|
|
tools: ['mcp__ktx__load_skill'],
|
|
mcp_servers: [{ name: 'ktx', status: 'connected' }],
|
|
slash_commands: ['/help', '/compact', '/clear'],
|
|
skills: ['memory-agent', 'doc-reader'],
|
|
agents: ['claude', 'Plan', 'Explore'],
|
|
}),
|
|
{
|
|
type: 'assistant',
|
|
message: { role: 'assistant', content: [] },
|
|
parent_tool_use_id: null,
|
|
uuid: '00000000-0000-4000-8000-000000000006',
|
|
session_id: 'session-id',
|
|
} as unknown as SDKMessage,
|
|
resultMessage({ subtype: 'error_max_turns', is_error: true }),
|
|
]),
|
|
);
|
|
const runtime = new ClaudeCodeKtxLlmRuntime({
|
|
projectDir: '/tmp/project',
|
|
modelSlots: { default: 'sonnet' },
|
|
query,
|
|
env: {},
|
|
});
|
|
|
|
await expect(
|
|
runtime.runAgentLoop({
|
|
modelRole: 'default',
|
|
systemPrompt: 'system',
|
|
userPrompt: 'user',
|
|
toolSet: {
|
|
load_skill: {
|
|
name: 'load_skill',
|
|
description: 'Load skill.',
|
|
inputSchema: z.object({ name: z.string() }),
|
|
execute: async () => ({ markdown: 'loaded' }),
|
|
},
|
|
},
|
|
stepBudget: 1,
|
|
telemetryTags: { operationName: 'test' },
|
|
}),
|
|
).resolves.toEqual({ stopReason: 'budget' });
|
|
|
|
const options = query.mock.calls[0][0].options;
|
|
expect(options.allowedTools).toEqual(['mcp__ktx__load_skill']);
|
|
expect(await options.canUseTool('mcp__ktx__load_skill', {}, { signal: new AbortController().signal, toolUseID: '1' })).toEqual({
|
|
behavior: 'allow',
|
|
toolUseID: '1',
|
|
});
|
|
expect(await options.canUseTool('Task', {}, { signal: new AbortController().signal, toolUseID: '2' })).toMatchObject({
|
|
behavior: 'deny',
|
|
toolUseID: '2',
|
|
});
|
|
expect(await options.canUseTool('Skill', {}, { signal: new AbortController().signal, toolUseID: '3' })).toMatchObject({
|
|
behavior: 'deny',
|
|
toolUseID: '3',
|
|
});
|
|
});
|
|
|
|
it('still rejects unexpected tools, missing KTX tools, plugins, and non-KTX MCP servers from init messages', async () => {
|
|
const query = vi.fn((_input: any) =>
|
|
stream([
|
|
initMessage({
|
|
tools: ['Bash'],
|
|
mcp_servers: [{ name: 'filesystem', status: 'connected' }],
|
|
plugins: [{ name: 'host-plugin', path: '/tmp/plugin' }],
|
|
}),
|
|
resultMessage({ result: 'hello' }),
|
|
]),
|
|
);
|
|
const runtime = new ClaudeCodeKtxLlmRuntime({
|
|
projectDir: '/tmp/project',
|
|
modelSlots: { default: 'sonnet' },
|
|
query,
|
|
env: {},
|
|
});
|
|
|
|
await expect(
|
|
runtime.generateText({
|
|
role: 'default',
|
|
prompt: 'say hello',
|
|
tools: {
|
|
load_skill: {
|
|
name: 'load_skill',
|
|
description: 'Load skill.',
|
|
inputSchema: z.object({ name: z.string() }),
|
|
execute: async () => ({ markdown: 'loaded' }),
|
|
},
|
|
},
|
|
}),
|
|
).rejects.toThrow(
|
|
/Claude Code runtime isolation failed: .*tools=Bash.*missing_tools=mcp__ktx__load_skill.*mcp_servers=filesystem.*plugins=host-plugin/,
|
|
);
|
|
});
|
|
```
|
|
|
|
- [ ] **Step 2: Run the runtime test to verify it fails**
|
|
|
|
Run:
|
|
|
|
```bash
|
|
pnpm --filter @ktx/context exec vitest run src/llm/claude-code-runtime.test.ts
|
|
```
|
|
|
|
Expected: FAIL. The first new test fails because `runClaudeCodeAuthProbe(...)`
|
|
returns `{ ok: false, ... }` and `generateText(...)` rejects when init metadata
|
|
contains non-empty `slash_commands`, `skills`, or `agents`. The second new test
|
|
fails because `runAgentLoop(...)` returns `{ stopReason: 'error', ... }` for the
|
|
same reason.
|
|
|
|
- [ ] **Step 3: Commit the failing regression test**
|
|
|
|
Run:
|
|
|
|
```bash
|
|
git add packages/context/src/llm/claude-code-runtime.test.ts
|
|
git commit -m "test: cover claude-code host discovery metadata"
|
|
```
|
|
|
|
Expected: the commit contains tests that fail before the runtime assertion is
|
|
fixed.
|
|
|
|
### Task 3: Relax init metadata assertions to the controlled execution surface
|
|
|
|
**Files:**
|
|
|
|
- Modify: `packages/context/src/llm/claude-code-runtime.ts`
|
|
|
|
- [ ] **Step 1: Replace `assertInitIsolation`**
|
|
|
|
In `packages/context/src/llm/claude-code-runtime.ts`, replace the full
|
|
`assertInitIsolation(...)` function with:
|
|
|
|
```ts
|
|
function assertInitIsolation(
|
|
message: SDKMessage,
|
|
allowedToolIds: Set<string>,
|
|
expectedMcpServerNames: Set<string>,
|
|
): void {
|
|
if (message.type !== 'system' || message.subtype !== 'init') {
|
|
return;
|
|
}
|
|
const activeToolIds = new Set(message.tools);
|
|
const unexpectedTools = message.tools.filter((toolName) => !allowedToolIds.has(toolName));
|
|
const missingTools = [...allowedToolIds].filter((toolName) => !activeToolIds.has(toolName));
|
|
const activeMcpServerNames = message.mcp_servers.map((server) => server.name);
|
|
const unexpectedMcpServers = activeMcpServerNames.filter((name) => !expectedMcpServerNames.has(name));
|
|
const missingMcpServers = [...expectedMcpServerNames].filter((name) => !activeMcpServerNames.includes(name));
|
|
const unexpectedPlugins = message.plugins.map((plugin) => plugin.name);
|
|
if (
|
|
unexpectedTools.length > 0 ||
|
|
missingTools.length > 0 ||
|
|
unexpectedMcpServers.length > 0 ||
|
|
missingMcpServers.length > 0 ||
|
|
unexpectedPlugins.length > 0
|
|
) {
|
|
throw new Error(
|
|
`Claude Code runtime isolation failed: tools=${unexpectedTools.join(',') || '(none)'} missing_tools=${
|
|
missingTools.join(',') || '(none)'
|
|
} mcp_servers=${unexpectedMcpServers.join(',') || '(none)'} missing_mcp_servers=${
|
|
missingMcpServers.join(',') || '(none)'
|
|
} plugins=${unexpectedPlugins.join(',') || '(none)'} host_slash_commands=${
|
|
message.slash_commands.length
|
|
} host_skills=${message.skills.length} host_agents=${message.agents?.join(',') || '(none)'}`,
|
|
);
|
|
}
|
|
}
|
|
```
|
|
|
|
This preserves strict checks for the KTX-controlled execution surface:
|
|
|
|
- `message.tools` must exactly equal the generated KTX MCP tool ids for the
|
|
current call.
|
|
- `message.mcp_servers` must exactly equal the expected KTX MCP server names.
|
|
- `message.plugins` must be empty.
|
|
|
|
It deliberately stops treating `message.slash_commands`, `message.skills`, and
|
|
`message.agents` as fatal because those fields can contain host-discovered
|
|
metadata that KTX cannot disable through the pinned SDK options.
|
|
|
|
- [ ] **Step 2: Run the runtime test to verify it passes**
|
|
|
|
Run:
|
|
|
|
```bash
|
|
pnpm --filter @ktx/context exec vitest run src/llm/claude-code-runtime.test.ts
|
|
```
|
|
|
|
Expected: PASS.
|
|
|
|
- [ ] **Step 3: Commit the runtime fix**
|
|
|
|
Run:
|
|
|
|
```bash
|
|
git add packages/context/src/llm/claude-code-runtime.ts packages/context/src/llm/claude-code-runtime.test.ts
|
|
git commit -m "fix: tolerate claude-code host discovery metadata"
|
|
```
|
|
|
|
Expected: the auth probe and runtime no longer fail solely because the SDK init
|
|
message reports host-discovered slash commands, skills, or agents.
|
|
|
|
### Task 4: Correct user-facing docs wording
|
|
|
|
**Files:**
|
|
|
|
- Modify: `docs-site/content/docs/guides/llm-configuration.mdx`
|
|
- Modify: `docs-site/content/docs/guides/building-context.mdx`
|
|
|
|
- [ ] **Step 1: Update the LLM configuration guide wording**
|
|
|
|
In `docs-site/content/docs/guides/llm-configuration.mdx`, replace lines `39-41`
|
|
with:
|
|
|
|
```mdx
|
|
`claude-code` keeps KTX tool boundaries intact. KTX exposes only the MCP tools
|
|
needed for the current KTX agent loop, disables Claude Code built-in tools,
|
|
keeps plugins empty, and denies every non-KTX tool request through
|
|
`canUseTool`. The Claude Agent SDK may still report host-discovered slash
|
|
commands, skills, and subagent names in init metadata; that metadata is not an
|
|
execution grant for KTX agent loops.
|
|
```
|
|
|
|
- [ ] **Step 2: Update the building context guide wording**
|
|
|
|
In `docs-site/content/docs/guides/building-context.mdx`, replace lines `61-63`
|
|
with:
|
|
|
|
```mdx
|
|
When you use `claude-code`, KTX still controls the tool surface for ingest and
|
|
memory capture. Claude Code built-in tools, discovered MCP servers, plugins,
|
|
skills, agents, and slash commands are not invokable by KTX agent loops unless
|
|
they are exact KTX MCP tools for the current run.
|
|
```
|
|
|
|
- [ ] **Step 3: Run docs tests**
|
|
|
|
Run:
|
|
|
|
```bash
|
|
pnpm --filter ktx-docs run test
|
|
```
|
|
|
|
Expected: PASS.
|
|
|
|
- [ ] **Step 4: Commit docs wording**
|
|
|
|
Run:
|
|
|
|
```bash
|
|
git add docs-site/content/docs/guides/llm-configuration.mdx docs-site/content/docs/guides/building-context.mdx
|
|
git commit -m "docs: clarify claude-code host discovery metadata"
|
|
```
|
|
|
|
Expected: user docs describe invocation control rather than promising zero
|
|
host-discovery metadata.
|
|
|
|
### Task 5: Final verification
|
|
|
|
**Files:**
|
|
|
|
- Verify: `docs/superpowers/specs/2026-05-15-claude-code-backend-design.md`
|
|
- Verify: `packages/context/src/llm/claude-code-runtime.ts`
|
|
- Verify: `packages/context/src/llm/claude-code-runtime.test.ts`
|
|
- Verify: `docs-site/content/docs/guides/llm-configuration.mdx`
|
|
- Verify: `docs-site/content/docs/guides/building-context.mdx`
|
|
|
|
- [ ] **Step 1: Run targeted runtime tests**
|
|
|
|
Run:
|
|
|
|
```bash
|
|
pnpm --filter @ktx/context exec vitest run src/llm/claude-code-runtime.test.ts src/llm/runtime-tools.test.ts src/llm/claude-code-env.test.ts
|
|
```
|
|
|
|
Expected: PASS.
|
|
|
|
- [ ] **Step 2: Run package type-check**
|
|
|
|
Run:
|
|
|
|
```bash
|
|
pnpm --filter @ktx/context run type-check
|
|
```
|
|
|
|
Expected: PASS.
|
|
|
|
- [ ] **Step 3: Run docs verification**
|
|
|
|
Run:
|
|
|
|
```bash
|
|
pnpm --filter ktx-docs run test
|
|
```
|
|
|
|
Expected: PASS.
|
|
|
|
- [ ] **Step 4: Run dead-code checks**
|
|
|
|
Run:
|
|
|
|
```bash
|
|
pnpm run dead-code
|
|
```
|
|
|
|
Expected: PASS or only pre-existing unrelated findings. Investigate and fix any
|
|
finding caused by the runtime assertion or test changes.
|
|
|
|
- [ ] **Step 5: Inspect git status**
|
|
|
|
Run:
|
|
|
|
```bash
|
|
git status --short
|
|
```
|
|
|
|
Expected: only files from this plan are modified, or the working tree is clean
|
|
if each task was committed.
|
|
|
|
## Self-review
|
|
|
|
- Spec coverage: This plan addresses the v1-blocking auth probe failure,
|
|
aligns the spec with the SDK contract, preserves the real KTX execution
|
|
boundary, and adds regression coverage for non-empty host-discovered
|
|
`slash_commands`, `skills`, and `agents` in both auth probe and runtime paths.
|
|
- Placeholder scan: No placeholder markers remain. Every code-changing step
|
|
includes exact file paths, code blocks, commands, and expected results.
|
|
- Type consistency: The plan uses existing names from the codebase:
|
|
`ClaudeCodeKtxLlmRuntime`, `runClaudeCodeAuthProbe`, `initMessage`,
|
|
`resultMessage`, `assertInitIsolation`, `mcpToolIds`, `KtxRuntimeToolSet`, and
|
|
`canUseTool`.
|