Brainstorm: claude-code agent-runner backend for KTX
Adds a claude-code selection that routes KTX agent-runner LLM calls through @anthropic-ai/claude-agent-sdk, reusing the user's existing Claude Code authentication. Same KTX UX (ktx ingest, etc.); the backend is selected in ktx.yaml.
Scope of the new backend. The claude-code selection is an agent-runner backend, not a drop-in replacement for the global KtxLlmProvider. Non-agent LLM call sites (page triage, scan enrichment, scan description generation, relationship LLM proposals) all consume KtxLlmProvider directly via generateKtxText / generateKtxObject (packages/context/src/ingest/page-triage/page-triage.service.ts:342-356, packages/context/src/scan/local-scan.ts:377-382, packages/context/src/scan/description-generation.ts:780-788, packages/context/src/scan/relationship-discovery.ts:244-255) and createKtxLlmProvider currently routes any unknown backend to gateway (packages/llm/src/model-provider.ts:155-186). The plan must decide one of: (a) introduce llm.provider.backend: 'claude-code' as a global LLM backend and define behavior for every non-agent consumer (likely by mapping it to an Anthropic provider construction that also reuses local Claude Code credentials, or by failing fast with a clear error), or (b) keep llm.provider.backend to its existing values and add a separate llm.agentRunner.backend (or equivalent) field whose 'claude-code' value only swaps the agent runner. Option (b) preserves existing non-agent behavior with no risk of accidental gateway fall-through; option (a) requires explicit handling at every non-agent site. This decision is open for the plan-writing session — the brainstorm does not lock it.
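To make option (b) concrete, here is a minimal TypeScript sketch of how a separate agent-runner field could be resolved. The type names, the `agentRunner.backend` field path, and the `'ai-sdk'` label for the existing runner are all assumptions for illustration; the brainstorm does not lock any of them.

```typescript
// Hypothetical config shape for option (b). None of these names are KTX's
// actual types; they only illustrate the separation of concerns.
type AgentRunnerBackendSketch = 'ai-sdk' | 'claude-code';

interface LlmConfigSketch {
  // Existing global backend, left untouched under option (b).
  provider: { backend: string };
  // New optional field, read only at the agent-runner DI boundary.
  agentRunner?: { backend?: AgentRunnerBackendSketch };
}

// Decide which agent runner to construct. Non-agent consumers never call
// this, so they keep using the global provider exactly as before.
function resolveAgentRunnerBackend(llm: LlmConfigSketch): AgentRunnerBackendSketch {
  return llm.agentRunner?.backend ?? 'ai-sdk';
}
```

With this shape, a project that never sets the new field keeps today's behavior, which is the "no accidental gateway fall-through" property attributed to option (b) above.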
This is not a plan. It is the decided design after a /brainstorming session. The follow-up plan should be written separately.
Goals
- Let a KTX user run `ktx ingest` (and the other agentic CLI paths) against their existing Claude Code session, without provisioning a new `ANTHROPIC_API_KEY` or Vertex credentials.
- Preserve KTX's per-stage tool curation: each `ktx ingest` stage continues to pass its own curated tool set into the runner; the new backend exposes exactly that set and nothing more, with no Claude Code built-ins (`Bash`/`Read`/`Edit`/`Write`/`Grep`/`Glob`/`WebFetch`/`Task`) reachable by the model.
- Preserve correctness of existing ingest output. The new backend must produce work-unit results equivalent to today's AI-SDK backend on the same inputs.
Non-goals (MVP)
- Tool-call repair parity. Today the AI-SDK runner uses `experimental_repairToolCall` to recover from malformed tool args (extract JSON from string input, or re-prompt the repair model against the schema; see `packages/llm/src/repair.ts:35-88`). The Claude Agent SDK has no transparent same-step equivalent. MVP accepts degraded behavior: the model sees a schema error and self-corrects on the next turn.
- OTEL telemetry. The AI-SDK runner uses `experimental_telemetry` (`packages/context/src/agent/agent-runner.service.ts:76`). The Claude Agent SDK exposes hook events (`PreToolUse`, `PostToolUse`, `SessionStart`, `SessionEnd`, etc.) but no native OTEL plug. MVP ships without telemetry on this backend; an observability follow-up wires OTEL through hooks.
- Promoting this backend as a productized "use your Max subscription" feature. Per Claude Agent SDK docs ("Anthropic generally does not permit third-party developers to offer claude.ai login or rate limits for their products"), this is framed as "use your own local Claude Code session." Documentation and setup messaging should reflect that.
- Embeddings. Embeddings (`createKtxEmbeddingProvider` in `packages/llm/src/embedding-provider.ts`) are independent of the LLM backend and unaffected by this change. Claude does not provide embeddings; users keep their existing embedding config.
Architecture
    ktx ingest
     └─ stage-3-work-units.ts (and other stages)
         └─ deps.agentRunner.runLoop({ systemPrompt, userPrompt, toolSet, stepBudget, modelRole, … })
             │
             ├─ AgentRunnerService (when agent-runner backend !== 'claude-code')
             │    uses generateText() + llmProvider.getModel(role) + experimental_repairToolCall
             │
             └─ ClaudeAgentSdkRunnerService (when agent-runner backend === 'claude-code')
                  uses @anthropic-ai/claude-agent-sdk: query({
                    prompt,
                    options: {
                      cwd: project.projectDir,
                      systemPrompt,
                      mcpServers: { ktx: createSdkMcpServer({ tools: <curated KTX tools> }) },
                      // Make ONLY KTX MCP tools reachable. The exact SDK option that
                      // enforces "agent may not call anything outside this list" is open
                      // for the plan; `allowedTools` alone is documented as an
                      // auto-approval list, not a restriction. Candidate mechanisms
                      // (to be confirmed by plan-writing against current SDK docs):
                      //   - `disallowedTools` covering each built-in tool name, OR
                      //   - the `tools` option configured to expose only the MCP set, OR
                      //   - a `canUseTool` callback / permission mode that denies
                      //     anything not prefixed `mcp__ktx__`.
                      // Whichever mechanism the SDK actually supports for restriction
                      // is what the plan must use; the security boundary is "no Claude
                      // Code built-in reachable", not the literal word `allowedTools`.
                      maxTurns: stepBudget,
                    }
                  })
The two runner classes share the public runLoop(params: RunLoopParams): Promise<RunLoopResult> shape (see packages/context/src/agent/agent-runner.service.ts:13-37 for the interface). Stage code does not change. The runner is selected at the context-runtime DI factories that today construct AgentRunnerService — resolveAgentRunner in packages/context/src/ingest/local-bundle-runtime.ts:580-604 for ingest, and the corresponding agent-runner construction in packages/context/src/memory/local-memory.ts:92-110 for the memory agent. (Note: packages/cli/src/runtime.ts is the Python-runtime command handler, not agent-runner DI; it is not the integration point.)
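The selection described above can be sketched as a small factory over a shared port. This is a simplified illustration: `RunLoopParamsSketch`, `RunLoopResultSketch`, `AgentRunnerPort`, the two stub classes, and `selectAgentRunner` are hypothetical stand-ins, not the real `RunLoopParams`/`RunLoopResult` interfaces or the actual `resolveAgentRunner` implementation.

```typescript
// Trimmed stand-ins for the real interfaces in agent-runner.service.ts.
interface RunLoopParamsSketch {
  systemPrompt: string;
  userPrompt: string;
  stepBudget: number;
}
interface RunLoopResultSketch {
  stopReason: 'budget' | 'natural' | 'error';
}

// Both runners implement the same port, so stage code only sees runLoop().
interface AgentRunnerPort {
  readonly kind: 'ai-sdk' | 'claude-code';
  runLoop(params: RunLoopParamsSketch): Promise<RunLoopResultSketch>;
}

// Stub for the existing AI-SDK runner (real one calls generateText()).
class AiSdkRunnerStub implements AgentRunnerPort {
  readonly kind = 'ai-sdk' as const;
  async runLoop(_params: RunLoopParamsSketch): Promise<RunLoopResultSketch> {
    return { stopReason: 'natural' };
  }
}

// Stub for the new runner (real one would drive the Claude Agent SDK).
class ClaudeAgentSdkRunnerStub implements AgentRunnerPort {
  readonly kind = 'claude-code' as const;
  async runLoop(_params: RunLoopParamsSketch): Promise<RunLoopResultSketch> {
    return { stopReason: 'natural' };
  }
}

// The only place that branches on the backend; call sites stay unchanged.
function selectAgentRunner(backend: 'ai-sdk' | 'claude-code'): AgentRunnerPort {
  return backend === 'claude-code' ? new ClaudeAgentSdkRunnerStub() : new AiSdkRunnerStub();
}
```

Keeping the branch inside the factory is what lets `stage-3-work-units.ts` and the memory agent receive either runner without modification.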
The Claude Agent SDK is documented to reuse local Claude Code authentication automatically when the user has run claude to authenticate (see Verified evidence #1). No KTX-side login flow. No ANTHROPIC_API_KEY.
Decisions
| # | Decision | Rationale |
|---|---|---|
| Q1 | Same `ktx ingest`/`ktx scan`/etc. surface; backend selected via `ktx.yaml` | KTX UX stays unchanged; the new backend is invisible to the user except in config |
| Q2 | `@anthropic-ai/claude-agent-sdk` directly (not the OpenAI proxy, not a `claude -p` subprocess) | Native Anthropic protocol; reuses `~/.claude/` auth; fewest hops |
| Q3 | KTX MCP tools only; Claude Code built-ins (`Bash`/`Read`/`Edit`/`Write`/`Grep`/`Glob`/`WebFetch`/`Task`) must be unreachable to the model. The exact SDK mechanism (e.g. `disallowedTools`, `tools` configuration, `canUseTool` / permission mode) is an open item for the plan; `allowedTools` alone is auto-approval, not a restriction | Preserves current ingest determinism and blast-radius limits; tool set continues to come from each stage's curated set |
| Q4 | New `ClaudeAgentSdkRunnerService` class alongside existing `AgentRunnerService`; both implement the same `runLoop` shape | Avoids polluting the AI-SDK runner with conditional dead deps; clean per-runner deps shape; both call sites in `stage-3-work-units.ts:91` etc. are untouched |
| `cwd` | Explicit `cwd: project.projectDir` (resolved at startup via `resolveKtxProjectDir`, not `process.cwd()`) | SDK's `cwd` is semantic (skills, `CLAUDE.md`, file checkpointing); KTX's existing convention is to anchor on `projectDir` regardless of invocation directory |
| Tool adapter | A backend-neutral tool boundary that preserves enough KTX tool definition data to build an SDK `tool(name, description, zodSchema, handler)` for each entry. Today `RunLoopParams.toolSet` is `Record<string, Tool>` (AI SDK type) and source-specific tools (e.g. `emit_historic_sql_evidence`) are already raw AI SDK `Tool` objects, not `BaseTool` instances (`packages/context/src/ingest/local-bundle-runtime.ts:543-556`); the runner alone cannot recover the original Zod input schema and KTX handler from those. The plan must either (a) extend the toolset port (`createIngestWuToolset`, equivalents in memory/scan toolsets) so it returns a per-tool descriptor with `name`, `description`, `inputSchema` (Zod), and the KTX handler (convertible to either AI SDK or Claude Agent SDK tools at the boundary) and adapt source-specific raw tools to that descriptor, or (b) require both shapes to be produced upstream. `toClaudeAgentSdkTool()` on `BaseTool` is fine for the `BaseTool` subset but is not sufficient on its own | KTX tools return `{ markdown, structured }`; Claude Agent SDK's `tool()` expects `{ content: [{ type: 'text', text }] }`; flattening `markdown` is straightforward once the underlying schema + handler are reachable |
| Q5 | MVP: degraded repair + no telemetry; both documented as known gaps | Fastest path to a working backend; correctness preserved (model self-corrects); follow-up wires both through SDK hooks if/when needed |
| Naming | Config value: `'claude-code'` | Names the user-facing thing (the Claude Code session they already authenticated); fits enum semantics (each value names an auth/API surface); avoids productizing the Max subscription |
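The "Tool adapter" row can be illustrated with a small, dependency-free sketch of the descriptor and the output flattening. `KtxToolDescriptorSketch`, `ClaudeToolResultSketch`, `toClaudeToolResult`, and `wrapHandlerForClaude` are hypothetical names; the input schema is typed as `unknown` here only to avoid importing Zod, whereas the table calls for a real Zod schema.

```typescript
// KTX handler result shape, as described in this document.
interface ToolOutput<T> {
  markdown: string;
  structured: T;
}

// Result shape the document says Claude Agent SDK tool() handlers return.
interface ClaudeToolResultSketch {
  content: Array<{ type: 'text'; text: string }>;
}

// Hypothetical backend-neutral descriptor returned by the toolset port.
interface KtxToolDescriptorSketch<I, T> {
  name: string;
  description: string;
  inputSchema: unknown; // a Zod schema in the real design
  handler: (input: I) => Promise<ToolOutput<T>>;
}

// Flatten a KTX tool result: only the markdown is sent back to the model.
function toClaudeToolResult<T>(output: ToolOutput<T>): ClaudeToolResultSketch {
  return { content: [{ type: 'text', text: output.markdown }] };
}

// Wrap a descriptor's handler so it returns the SDK-facing shape while the
// original KTX handler (and its closure over per-WU state) stays untouched.
function wrapHandlerForClaude<I, T>(
  descriptor: KtxToolDescriptorSketch<I, T>,
): (input: I) => Promise<ClaudeToolResultSketch> {
  return async (input: I) => toClaudeToolResult(await descriptor.handler(input));
}
```

Because the descriptor keeps the handler and schema separate from any SDK type, the same entry could also feed the existing AI SDK conversion; that second conversion is omitted here.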
Implementation surface
The plan should touch (at minimum) these areas. This is a sketch, not the plan.
- Config schema: depending on the open scope decision (see top of doc), either (a) extend `KtxLlmBackend` in `packages/llm/src/types.ts` and explicitly handle the new value in `createKtxLlmProvider` / `createModelFactory` (`packages/llm/src/model-provider.ts:155-186`) so it does not silently fall through to gateway, and at every non-agent LLM consumer (page triage, scan enrichment, scan description generation, relationship LLM proposals); or (b) leave `KtxLlmBackend` alone and add a separate agent-runner backend field to `KtxProjectLlmConfig` whose `'claude-code'` value is consumed only at the agent-runner DI boundary.
- Tool boundary: make the per-stage toolset port return descriptors that preserve `name`, `description`, Zod input schema, and the KTX handler so either an AI SDK tool or a Claude Agent SDK `tool()` can be built at the consumer. Touch `LocalIngestToolsetFactory.createIngestWuToolset` and the memory/scan toolset equivalents (`packages/context/src/ingest/local-bundle-runtime.ts:543-556`, `packages/context/src/memory/types.ts:120-126`). Source-specific raw AI SDK tools must be wrapped to the same descriptor shape. `BaseTool.toAiSdkTool()` (`packages/context/src/tools/base-tool.ts:117-165`) stays; a parallel `toClaudeAgentSdkTool()` may live alongside it but is not the whole solution.
- `packages/context/src/agent/`: add `claude-agent-sdk-runner.service.ts` exposing `ClaudeAgentSdkRunnerService` with the same `runLoop(params: RunLoopParams)` shape as `AgentRunnerService`. Internals: register the curated KTX tools via `createSdkMcpServer` and `mcpServers`, set `cwd`, `systemPrompt`, `maxTurns: stepBudget`, configure the SDK so only `mcp__ktx__*` tools are reachable (mechanism per the open Q3 item, not `allowedTools` alone), and consume the async iterator to detect stop conditions and map onto `RunLoopResult`.
- DI wiring: modify `resolveAgentRunner` in `packages/context/src/ingest/local-bundle-runtime.ts:580-604` and the agent-runner construction path in `packages/context/src/memory/local-memory.ts:92-110` to branch on the resolved agent-runner backend and construct `ClaudeAgentSdkRunnerService` instead of `AgentRunnerService` when applicable. All call sites (`stage-3-work-units.ts:91`, memory agent, etc.) receive the chosen runner via DI and do not change.
- Setup / config validation: when the user selects `claude-code` in `ktx setup`, verify that the local Claude Code SDK auth is usable, not just that `~/.claude/` exists. SDK docs establish that the SDK reuses authentication automatically when the user has run `claude` to authenticate; they do not establish directory probing as a sufficient liveness test. The plan must define a usability check (e.g. a minimal SDK probe call that exercises auth, an SDK-provided auth-status helper if one exists, or a documented file-presence check that the SDK docs explicitly endorse). Pure existence of `~/.claude/` is not sufficient on its own.
- Docs: `docs-site/content/docs/concepts/` and `docs-site/content/docs/getting-started/` need a section on the `claude-code` backend, framed as "use your own local Claude Code session." Avoid productizing-Max-sub language.
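The setup-validation rule above (a usability probe decides, directory presence never does) can be expressed as a small pure function. Everything here is hypothetical: `AuthProbeOutcome` stands in for the result of whichever probe mechanism the plan chooses, and `evaluateClaudeCodeAuth` and its messages are illustrative, not an existing KTX or SDK API.

```typescript
// Hypothetical outcome of the (still undecided) auth usability probe.
type AuthProbeOutcome =
  | { kind: 'not-run' }
  | { kind: 'ok' }
  | { kind: 'failed'; detail: string };

interface SetupCheckResult {
  usable: boolean;
  message: string;
}

// Decide whether the claude-code backend may be selected in setup.
// claudeDirExists only shapes the hint text; it never makes auth "usable".
function evaluateClaudeCodeAuth(
  claudeDirExists: boolean,
  probe: AuthProbeOutcome,
): SetupCheckResult {
  if (probe.kind === 'ok') {
    return { usable: true, message: 'Local Claude Code authentication is usable.' };
  }
  if (probe.kind === 'failed') {
    return {
      usable: false,
      message: `Claude Code authentication probe failed: ${probe.detail}. Run 'claude' to authenticate.`,
    };
  }
  // Probe not run: directory presence alone is not treated as proof.
  return {
    usable: false,
    message: claudeDirExists
      ? 'Found ~/.claude/ but authentication was not verified; run the usability probe.'
      : "No local Claude Code session found. Run 'claude' to authenticate.",
  };
}
```

The same function could back the fail-fast check before the first `query()` call, so setup and runtime report identical messages.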
Verified evidence
Findings cited during the brainstorm (each one already verified in this session):
- Auth reuse. Claude Agent SDK docs (`/nothflare/claude-agent-sdk-docs` via context7): "if you have already authenticated Claude Code by running `claude` in your terminal, the SDK will use that authentication automatically."
- Tool config. SDK uses `createSdkMcpServer({ name, version, tools: [tool(name, description, zodSchema, handler), ...] })` registered via `mcpServers` in `query()` options.
- Disabling Claude Code built-ins. Required outcome: only the registered `mcp__<server>__<tool>` names are reachable; `Bash`/`Read`/`Edit`/`Write`/`Grep`/`Glob`/`WebFetch`/`Task` and any other built-ins must not be invocable by the model. The exact SDK option that enforces this is open and must be confirmed against current SDK docs in the plan-writing session; `allowedTools` is documented as an auto-approval list and is not sufficient as a restriction. Candidate enforcement mechanisms are `disallowedTools`, the `tools` option, or a `canUseTool` / permission-mode callback.
- Step budget. `maxTurns` option in `query()` maps to KTX's `stepBudget`.
- `cwd` semantics. Skill loading (`.claude/skills/`), `CLAUDE.md` discovery (when `settingSources: ['project']`), and file checkpointing all resolve relative to `cwd`. Defaults to `process.cwd()`.
- No transparent repair hook. SDK hook event list: `PreToolUse | PostToolUse | PostToolUseFailure | Notification | UserPromptSubmit | SessionStart | SessionEnd | Stop | SubagentStart | SubagentStop | PreCompact | PermissionRequest`. `PostToolUseFailure` fires on execution failure, not on pre-execution malformed args.
- No native OTEL plug. Telemetry must be wired manually through the above hooks.
- KTX tool shape today. `packages/context/src/tools/base-tool.ts:1` imports `tool` from `ai`. Handlers return `ToolOutput<T> = { markdown: string; structured: T }`. `toAiSdkTool()` at `:117-165` flattens to `{ type: 'content', value: [{ type: 'text', text: markdown }] }`. Tools close over per-WU state via `ToolSession`.
- KTX repair logic is portable. `createKtxToolCallRepairHandler` (`packages/llm/src/repair.ts:35-88`) has no AI-SDK internals dependency; it could be plumbed onto any runner that exposes a comparable hook. For MVP we accept no repair on the `claude-code` backend.
- Projection of `projectDir`. `resolveKtxProjectDir` (`packages/cli/src/project-resolver.ts:34-56`) resolves once at startup from `--project-dir`, `KTX_PROJECT_DIR`, or nearest `ktx.yaml`. Tools never read `process.cwd()` at execution time (`packages/context/src/tools/*.ts` has zero `process.cwd()` calls; the only repo-wide runtime use is a fallback in `packages/context/src/connections/sqlite-query-executor.ts:67` that's reached only if `input.projectDir` is undefined).
- Daemon precedent. `packages/cli/src/managed-python-daemon.ts:182-188` spawns the Python daemon without an explicit `cwd`. State paths are explicitly project-rooted (`packages/cli/src/managed-python-runtime.ts:163-174`). The Agent SDK case is different because `cwd` is semantically meaningful to the SDK, unlike the Python daemon.
- Runtime dir convention. `~/.ktx/runtime/<version>/` holds shared versioned infrastructure (venv, `ktx-daemon` binary). `<projectDir>/.ktx/runtime/` holds per-project execution state. The Claude Agent SDK runner is per-project execution; it runs from `projectDir`. If the plan later introduces shared infrastructure for this backend (e.g. a vendored Claude Code binary), that infrastructure goes under `~/.ktx/runtime/<version>/`.
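Whichever SDK mechanism ends up enforcing the tool restriction, the decision it has to implement is simple: allow a call only if it names a registered `mcp__ktx__` tool. The sketch below captures that rule as a pure function. `ToolDecision`, `decideToolUse`, and the `ktx` server name baked into the prefix are assumptions for illustration; the real permission result type must come from current SDK docs.

```typescript
// Hypothetical decision shape, not the SDK's actual permission result type.
type ToolDecision = { allow: true } | { allow: false; reason: string };

// Prefix implied by registering the MCP server under the key "ktx".
const KTX_MCP_PREFIX = 'mcp__ktx__';

// Deny every built-in (Bash, Read, ...) and every MCP name that was not
// registered for the current stage; allow only the curated KTX tools.
function decideToolUse(toolName: string, registered: ReadonlySet<string>): ToolDecision {
  if (!toolName.startsWith(KTX_MCP_PREFIX)) {
    return { allow: false, reason: `Tool '${toolName}' is outside the KTX MCP namespace.` };
  }
  const shortName = toolName.slice(KTX_MCP_PREFIX.length);
  if (!registered.has(shortName)) {
    return { allow: false, reason: `Tool '${shortName}' is not in this stage's curated set.` };
  }
  return { allow: true };
}
```

A rule like this could sit behind a `canUseTool`-style callback, or serve as a test oracle if the plan instead relies on `disallowedTools` or the `tools` option.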
Open items for the plan-writing session
Real questions the plan will need to answer that we did not lock during brainstorm:
- Model selection per role. Today KTX has `KtxModelRole = 'default' | 'triage' | 'candidateExtraction' | 'curator' | 'reconcile' | 'repair'` with per-role model IDs. Claude Agent SDK's `query()` accepts a single `model` string per call. The plan needs to decide whether the `claude-code` backend (a) maps each role to a specific Claude model ID per call, (b) uses a single configured model for all roles, or (c) reads role-to-model mapping from the same `ktx.yaml` shape used by other backends. The `'repair'` role specifically is degraded under Q5=A, but the rest still need a binding strategy.
- Auth presence check. Before the first `query()` call, KTX should fail fast with a clear message if the local Claude Code SDK auth is not usable. The check must be a usability test, not just `~/.claude/` directory probing; directory presence does not prove the SDK can actually authenticate (see implementation surface). The detection mechanism (SDK probe call, SDK-provided helper, or a docs-endorsed file-presence test) is open.
- `ktx.yaml` schema migration & non-agent LLM consumers. Decide between the two scope options at the top of this document (global `KtxLlmBackend` extension vs. separate agent-runner backend field). If extending `KtxLlmBackend`: update zod schemas under `packages/context/src/project/`, the setup wizard (`packages/cli/src/setup-models.ts`, `packages/cli/src/commands/setup-commands.ts`), `createKtxLlmProvider` / `createModelFactory` (`packages/llm/src/model-provider.ts:155-186`), and define behavior at every non-agent LLM consumer (`packages/context/src/ingest/page-triage/page-triage.service.ts`, `packages/context/src/scan/local-scan.ts`, `packages/context/src/scan/description-generation.ts`, `packages/context/src/scan/relationship-discovery.ts`). If adding a separate field: only the setup wizard, the project zod schema, and the agent-runner DI factories need to change.
- Stop-reason mapping. The Agent SDK exposes session lifecycle via the async iterator and the `Stop` hook event. The plan needs to define how a Claude Agent SDK session maps to KTX's `RunLoopStopReason = 'budget' | 'natural' | 'error'` (`agent-runner.service.ts:6`). In particular: how to detect that `maxTurns` was hit vs natural completion vs error.
- Tool failure counting. `stage-3-work-units.ts:132` reads `toolFailureCount?(wu.unitKey)` to fail a WU when any tool call failed. The new runner needs to surface tool failures via the same counting mechanism. The `PostToolUseFailure` hook is the natural integration point.
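One possible shape for the last two items is sketched below. It assumes the SDK's final message is a result object with a `subtype` string, and that values such as `'success'` and `'error_max_turns'` distinguish natural completion from a turn-limit stop; those subtype names are an assumption to be confirmed against current SDK docs, and `ResultMessageSketch`, `mapStopReason`, and `ToolFailureCounter` are hypothetical names.

```typescript
type RunLoopStopReason = 'budget' | 'natural' | 'error';

// Assumed shape of the final message yielded by the SDK's async iterator.
interface ResultMessageSketch {
  type: 'result';
  subtype: string; // e.g. 'success' or 'error_max_turns' (to be verified)
}

// Map the final SDK message onto KTX's stop reasons. A missing result
// (iterator ended or threw before a result arrived) is treated as an error.
function mapStopReason(result: ResultMessageSketch | undefined): RunLoopStopReason {
  if (!result) return 'error';
  if (result.subtype === 'success') return 'natural';
  if (result.subtype === 'error_max_turns') return 'budget';
  return 'error';
}

// Per-work-unit failure counter that a PostToolUseFailure hook could
// increment, exposing the same toolFailureCount(unitKey) read used by stages.
class ToolFailureCounter {
  private readonly counts = new Map<string, number>();

  record(unitKey: string): void {
    this.counts.set(unitKey, (this.counts.get(unitKey) ?? 0) + 1);
  }

  toolFailureCount(unitKey: string): number {
    return this.counts.get(unitKey) ?? 0;
  }
}
```

Since `PostToolUseFailure` is noted above as firing on execution failure rather than on malformed arguments, the plan may also need to decide whether schema-validation errors should call `record` through some other path.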