From f7fbacf229986fa0746ca8cc0770f2d714f3e1c0 Mon Sep 17 00:00:00 2001
From: Andrey Avtomonov
Date: Fri, 15 May 2026 14:30:02 +0200
Subject: [PATCH] docs: expand claude-code spec to full llm parity

---
 .../2026-05-15-claude-code-backend-design.md | 345 ++++++++++--------
 1 file changed, 197 insertions(+), 148 deletions(-)

diff --git a/docs/superpowers/specs/2026-05-15-claude-code-backend-design.md b/docs/superpowers/specs/2026-05-15-claude-code-backend-design.md
index 510cd977..7930f7ed 100644
--- a/docs/superpowers/specs/2026-05-15-claude-code-backend-design.md
+++ b/docs/superpowers/specs/2026-05-15-claude-code-backend-design.md
@@ -1,150 +1,175 @@
-# Brainstorm: `claude-code` backend for end-to-end KTX ingest
+# Brainstorm: `claude-code` backend with full KTX LLM parity
 
-Adds a `claude-code` selection that makes **all `ktx ingest` LLM work** run
-through `@anthropic-ai/claude-agent-sdk`, reusing the user's existing local
-Claude Code authentication. The user experience stays the same: users run
-`ktx ingest`; the backend is selected in `ktx.yaml`.
+Adds a `claude-code` backend that gives KTX full parity with the existing
+`ANTHROPIC_API_KEY`-based `anthropic` backend for **all KTX LLM calls**. The
+backend uses `@anthropic-ai/claude-agent-sdk` and reuses the user's existing
+local Claude Code authentication. Users select it in `ktx.yaml`.
 
-This is not an implementation plan. It is the revised design after iterating on
-the brainstorm with the requirement that **all KTX ingest capabilities must work
-with `claude-code`**. The follow-up implementation plan should be written
-separately.
+This is not an implementation plan. It is the revised design after expanding
+the requirement from "`ktx ingest` works with Claude Code" to "every KTX LLM
+call works with Claude Code." The follow-up implementation plan should be
+written separately.
 
 ## Core decision
 
-`claude-code` is no longer only an agent-runner backend. It is an ingest-capable
-LLM runtime that covers both kinds of LLM work used by ingest:
+`claude-code` is a first-class global LLM backend. Any code path that currently
+works with `llm.provider.backend: anthropic` must work with
+`llm.provider.backend: claude-code`, unless it is not an LLM call at all.
 
-- **Agent loops**: work-unit execution, reconciliation, context-candidate
-  curation pagination, and memory-agent ingestion paths that call
-  `agentRunner.runLoop(...)`.
-- **Non-agent generation**: page triage and light extraction, which currently
-  call `KtxLlmProvider` directly through `generateText`.
+This includes:
 
-The implementation must not make page triage silently disappear when the user
-chooses `claude-code`. Today `PageTriageService` is only constructed when
-`resolveAgentRunner(...)` returns an AI SDK `llmProvider`
-(`packages/context/src/ingest/local-bundle-runtime.ts:684-693`). Under the new
-design, ingest gets a generation runtime for `claude-code`, so page triage and
-light extraction still run.
+- Agent loops implemented through `AgentRunnerService.runLoop(...)`.
+- Text generation through `generateKtxText(...)`.
+- Structured object generation through `generateKtxObject(...)`.
+- Local ingest and MCP-triggered local ingest flows.
+- Page triage and light extraction.
+- Context-candidate curation and reconciliation.
+- Memory capture.
+- Scan/enrichment internals and relationship LLM proposals.
+- Future KTX LLM call sites that use the shared runtime boundary.
+
+Commands that do not use LLMs do not need special Claude Code behavior. There
+must be no silent fallback from `claude-code` to gateway, Anthropic API-key
+execution, or deterministic output.
 
 ## Goals
 
-- Let a KTX user run every `ktx ingest` mode against their existing local Claude
-  Code session without provisioning `ANTHROPIC_API_KEY`, Vertex credentials, or
-  an AI Gateway key.
-- Cover scheduled pulls, upload ingest, Metabase fan-out, page triage, light
-  extraction, context-candidate curation, work-unit execution, reconciliation,
-  memory capture invoked from ingest, and source-specific tools such as
-  historic-SQL evidence emission.
-- Preserve KTX's per-stage tool curation. Each stage exposes exactly the KTX
-  tools it already selected; Claude Code built-ins, filesystem-discovered MCP
-  servers, hooks, skills, plugins, agents, and slash commands must not expand
-  the tool surface.
+- Let a KTX user run all KTX LLM-backed behavior through their existing local
+  Claude Code session without provisioning `ANTHROPIC_API_KEY`, Vertex
+  credentials, or an AI Gateway key.
+- Preserve the existing user-facing CLI and MCP behavior. `claude-code` changes
+  how LLM calls execute, not which KTX workflows exist.
+- Preserve role-based model selection. `llm.models.default`, `triage`,
+  `candidateExtraction`, `curator`, `reconcile`, and `repair` remain the source
+  of model selection for every LLM call.
+- Preserve KTX's curated tool boundaries. Claude Code built-ins,
+  filesystem-discovered MCP servers, hooks, skills, plugins, agents, and slash
+  commands must not expand the tool surface for KTX agent loops.
 - Keep embeddings independent. Claude does not provide embeddings; users keep
-  configuring `ingest.embeddings` as they do today.
+  configuring `ingest.embeddings` and scan/enrichment embeddings as they do
+  today.
 - Fail fast with a clear message if local Claude Code authentication is not
   usable.
 
 ## Non-goals
 
-- **Non-ingest LLM surfaces.** The required target is end-to-end `ktx ingest`.
-  Other internal LLM consumers are out of scope for this spec. The config and
-  runtime design must still avoid accidental gateway fall-through when
-  `llm.provider.backend` is `claude-code`.
-- **Tool-call repair parity.** The AI SDK runner uses
+- **Embedding parity.** Embeddings remain separate from LLM execution.
+- **Tool-call repair parity in the first pass.** The AI SDK runner uses
   `experimental_repairToolCall` (`packages/llm/src/repair.ts:35-88`). The Claude
   Agent SDK has no transparent same-step repair hook. MVP behavior is next-turn
   self-correction from schema errors or a normal tool-failure count.
-- **OTEL telemetry parity.** The AI SDK runner uses `experimental_telemetry`.
-  The Agent SDK exposes hooks such as `PostToolUseFailure` and `SessionEnd`, but
-  no drop-in OTEL switch. MVP ships without telemetry parity on this backend.
+- **OTEL telemetry parity in the first pass.** The AI SDK runner uses
+  `experimental_telemetry`. The Agent SDK exposes hooks such as
+  `PostToolUseFailure` and `SessionEnd`, but no drop-in OTEL switch. MVP ships
+  without telemetry parity on this backend.
 - **Productizing Claude subscription limits.** Documentation must frame this as
   "use your own local Claude Code session," not as a third-party Claude Max or
   Claude.ai product feature.
 
 ## Approaches considered
 
-### Recommended: Ingest LLM runtime port
+### Recommended: global LLM runtime port
 
 Introduce a backend-neutral KTX LLM runtime port for operations, not just model
 construction:
 
 ```ts
-interface KtxGenerationPort {
+interface KtxLlmRuntimePort {
   generateText(input: KtxGenerateTextInput): Promise;
   generateObject(input: KtxGenerateObjectInput): Promise;
-}
-
-interface AgentRunnerPort {
-  runLoop(params: RunLoopParams): Promise;
+  runAgentLoop(params: RunLoopParams): Promise;
 }
 ```
 
-The existing AI SDK implementation adapts `KtxLlmProvider` to these ports. The
-new Claude Code implementation uses `query()` from
-`@anthropic-ai/claude-agent-sdk`. Ingest services depend on the ports:
+The existing `anthropic`, `vertex`, and `gateway` backends implement the runtime
+through the AI SDK and existing `KtxLlmProvider`. The new `claude-code` backend
+implements the same runtime through `@anthropic-ai/claude-agent-sdk`.
 
-- `PageTriageService` depends on `KtxGenerationPort`, not raw
-  `KtxLlmProvider`.
-- `generateKtxText` / `generateKtxObject` become thin helpers over the
-  generation port or move behind it.
-- `AgentRunnerService` and `ClaudeAgentSdkRunnerService` both implement
-  `AgentRunnerPort`.
+This is the recommended approach because KTX call sites need operations:
+"generate text," "generate a structured object," and "run an agent loop." They
+do not inherently need direct access to an AI SDK `LanguageModel`. The Agent SDK
+is a session/agent API, not an AI SDK model factory, so the runtime port avoids
+pretending those APIs are the same.
 
-This is the recommended approach because it matches the Agent SDK's actual
-shape. The Agent SDK is an agent/session API, not an AI SDK `LanguageModel`
-factory, so forcing it into `KtxLlmProvider.getModel(...)` would create a false
-abstraction and leave page triage broken.
+### Rejected: fake AI SDK `LanguageModel` for Claude Code
 
-### Rejected: agent-runner-only backend
+Trying to make Claude Code look like an AI SDK `LanguageModel` would be brittle.
+The Agent SDK owns session execution, permissions, MCP tools, structured output,
+and result messages. Those semantics do not map cleanly onto a normal
+`getModel(...)` return value.
 
-This was the previous version of the spec. It made work-unit and reconciliation
-agent loops possible, but it did not cover page triage or light extraction.
-Because `ktx ingest` uses those non-agent LLM calls for document-like sources,
-this does not satisfy the updated requirement.
+### Rejected: branch at every call site
 
-### Rejected for MVP: Claude Code OpenAI proxy
-
-Using a proxy or `claude -p` subprocess would avoid some TypeScript adapter work,
-but it would add another protocol boundary, make tool control harder to prove,
-and move away from the official Agent SDK API.
+Adding `if backend === "claude-code"` around each LLM call would work briefly
+but would duplicate prompt wrapping, structured output handling, debug logging,
+tool conversion, auth checks, and error mapping. It would also make future LLM
+call sites easy to miss.
 
 ## Architecture
 
 ```text
-ktx ingest
-  -> createLocalBundleIngestRuntime(...)
-    -> resolveIngestLlmRuntime(...)
-      -> AI SDK runtime
-         - KtxGenerationPort via generateText / Output.object
-         - AgentRunnerPort via current AgentRunnerService
-      -> Claude Code runtime
-         - KtxGenerationPort via Agent SDK query()
-         - AgentRunnerPort via ClaudeAgentSdkRunnerService
+ktx.yaml
+  llm.provider.backend: anthropic | vertex | gateway | claude-code
+  llm.models.: model alias or model ID
 
-  -> PageTriageService
-       -> generation.generateText({ role: "triage", ... })
+createLocalKtxLlmRuntimeFromConfig(project.config.llm)
+  -> AiSdkKtxLlmRuntime
+       - wraps existing KtxLlmProvider
+       - generateText / Output.object / AgentRunnerService
+  -> ClaudeCodeKtxLlmRuntime
+       - uses @anthropic-ai/claude-agent-sdk query()
+       - implements text, object, and agent-loop operations
 
-  -> IngestBundleRunner stages
-       -> agentRunner.runLoop({ modelRole, toolSet, stepBudget, ... })
+All KTX LLM call sites
+  -> KtxLlmRuntimePort
 ```
 
-The runtime is selected once at the context-runtime DI boundary. The main ingest
-integration point remains `resolveAgentRunner` /
-`createLocalBundleIngestRuntime` in
-`packages/context/src/ingest/local-bundle-runtime.ts`, but the function should
-evolve from "resolve an agent runner plus optional AI SDK provider" into
-"resolve the ingest LLM runtime ports." The memory-agent construction path in
-`packages/context/src/memory/local-memory.ts` needs the same port treatment.
+The runtime is selected at the same boundaries that currently construct an
+`llmProvider` or `AgentRunnerService`:
 
-`packages/cli/src/runtime.ts` is the Python-runtime command handler; it is not
-the agent-runner or generation-runtime integration point.
+- `packages/context/src/llm/local-config.ts`
+- `packages/context/src/ingest/local-bundle-runtime.ts`
+- `packages/context/src/memory/local-memory.ts`
+- `packages/context/src/scan/local-scan.ts`
+- `packages/context/src/mcp/local-project-ports.ts`
+- Any CLI setup/status/doctor code that validates LLM readiness
+
+After the change, services should not need to know whether the configured
+backend is AI SDK based or Claude Code based. They call the runtime operation
+they need.
+
+## LLM call-site migration
+
+The implementation plan must migrate every current KTX LLM call site to the
+runtime port:
+
+- `packages/context/src/llm/generation.ts`: `generateKtxText` and
+  `generateKtxObject` become runtime-backed helpers or are folded into the
+  runtime.
+- `packages/context/src/agent/agent-runner.service.ts`: the AI SDK agent loop
+  becomes the AI SDK implementation of `runAgentLoop`.
+- `packages/context/src/ingest/page-triage/page-triage.service.ts`: page triage
+  and light extraction depend on `KtxLlmRuntimePort`, not raw `KtxLlmProvider`.
+- `packages/context/src/scan/description-generation.ts`: AI descriptions use
+  the runtime text-generation operation.
+- `packages/context/src/scan/relationship-llm-proposal.ts`: relationship
+  proposals use the runtime object-generation operation.
+- `packages/context/src/ingest/stages/stage-3-work-units.ts`,
+  `packages/context/src/ingest/stages/stage-4-reconciliation.ts`,
+  `packages/context/src/ingest/context-candidates/curator-pagination.service.ts`,
+  and `packages/context/src/memory/memory-agent.service.ts`: agent loops use the
+  runtime agent-loop operation or a thin `AgentRunnerPort` backed by it.
+- Test helpers and MCP local project ports that inject `llmProvider` or
+  `agentRunner` must either inject the runtime port or use compatibility test
+  adapters during the migration.
+
+The plan must include a grep-based audit so new or overlooked `getModel(...)`,
+`generateKtxText(...)`, `generateKtxObject(...)`, `AgentRunnerService`, and
+`llmProvider` usages are either migrated or explicitly proven non-runtime.
 
 ## Config design
 
-The plan should make `claude-code` a first-class config value, not a hidden
-side-channel. Recommended shape:
+The config should make `claude-code` a first-class backend:
 
 ```yaml
 llm:
@@ -156,28 +181,28 @@ llm:
     candidateExtraction: sonnet
     curator: sonnet
     reconcile: sonnet
+    repair: sonnet
 ```
 
 Implementation implications:
 
 - Extend `KTX_LLM_BACKENDS` in `packages/context/src/project/config.ts` and
   `KtxLlmBackend` in `packages/llm/src/types.ts`.
-- Update setup, status, doctor, and local provider resolution so
-  `claude-code` does not fall through to `gateway`.
-- For `claude-code`, do not construct a fake AI SDK `LanguageModel`. Construct
-  the Claude Code generation/runtime ports.
+- Update setup, status, doctor, schema generation, examples, and docs so
+  `claude-code` is understood everywhere `anthropic` is understood.
+- Update `createKtxLlmProvider` / `createModelFactory` so unsupported backend
+  values throw instead of falling through to gateway.
 - Keep `llm.models` as the per-role binding source. The Claude Code runtime maps
-  each KTX role to the configured model string for the current call. The plan
-  must decide and test the accepted model aliases, for example `sonnet`,
-  `opus`, `haiku`, or full Claude model IDs supported by the SDK.
-- If non-ingest code sees `backend: claude-code` before it has been ported, it
-  must fail fast with a clear unsupported-backend message. It must not silently
-  route to gateway.
+  each KTX role to the configured model string for the current call.
+- Define accepted model aliases, such as `sonnet`, `opus`, and `haiku`, and full
+  model IDs supported by the pinned SDK version.
 
 ## Claude Agent SDK runtime behavior
 
-Every Agent SDK call must be isolated and deterministic enough for KTX ingest.
-Use explicit options even when SDK defaults currently match the desired value.
+Every Agent SDK call must be isolated enough for KTX execution. Use explicit
+options even when SDK defaults currently match the desired value.
+
+For agent loops with tools:
 
 ```ts
 query({
@@ -213,33 +238,43 @@ query({
 });
 ```
 
+For plain text generation:
+
+- Use the same `query()` runtime with `maxTurns: 1`.
+- Pass `settingSources: []`, `skills: []`, `plugins: []`, `tools: []`, and
+  `permissionMode: "dontAsk"`.
+- Do not expose MCP tools unless the KTX call explicitly passed tools.
+- Return the final result message text.
+
+For structured object generation:
+
+- Use the Agent SDK structured output option for JSON schema output.
+- Convert KTX Zod schemas at the runtime boundary.
+- Parse and validate the returned object with the original KTX schema before
+  returning it to the caller.
+
 The plan must confirm the exact option names against the pinned SDK version,
 but the required outcome is fixed:
 
 - Filesystem settings are not loaded. `settingSources: []` is explicit, and the
   implementation should assert from the SDK init message that no unexpected
-  settings-derived commands, skills, agents, or MCP servers are active.
+  settings-derived commands, skills, agents, plugins, or MCP servers are active.
 - Skills are disabled with `skills: []`, and plugins are disabled with
   `plugins: []`.
-- Only KTX MCP tools are available and auto-approved. `allowedTools` alone is
-  not sufficient because the current SDK docs describe it as auto-approval, not
-  restriction. Use `tools`, `permissionMode: "dontAsk"`, and explicit
-  `disallowedTools` for built-ins.
+- `allowedTools` alone is not sufficient because the current SDK docs describe
+  it as auto-approval, not restriction. Use `tools`, `permissionMode:
+  "dontAsk"`, and explicit `disallowedTools` for built-ins.
 - Built-ins are denied even if a future SDK default changes.
 - `cwd` is `project.projectDir`, resolved at startup via `resolveKtxProjectDir`,
   not `process.cwd()`.
-- Sessions are not persisted for ingest unless the plan identifies a concrete
-  debugging feature that needs persistence.
-
-For non-agent text generation, use the same isolated runtime with no MCP tools,
-`maxTurns: 1`, and no filesystem settings. For structured outputs, use the Agent
-SDK's JSON-schema output format and convert KTX's Zod schemas at the boundary.
+- Sessions are not persisted unless the plan identifies a concrete debugging
+  feature that needs persistence.
 
 ## Tool boundary
 
-The final `RunLoopParams.toolSet` cannot remain a raw AI SDK `Record` if two backends must consume it. The plan must define a backend-neutral
-tool descriptor for the **final** tool map handed to the runner:
+Agent-loop tools cannot remain only raw AI SDK `Record` values if
+two backends must consume them. The plan must define a backend-neutral tool
+descriptor for the final tool map handed to an agent loop:
 
 ```ts
 interface KtxRuntimeToolDescriptor {
@@ -254,10 +289,8 @@ Every composed tool entry must preserve the descriptor, including:
 
 - `BaseTool` outputs from factory toolsets.
 - Source-specific raw tools such as `emit_historic_sql_evidence` in
-  `packages/context/src/ingest/local-bundle-runtime.ts:543-556`.
-- Stage-local tools in `buildWuToolSet` and `buildReconcileToolSet`
-  (`packages/context/src/ingest/stages/build-wu-context.ts`,
-  `packages/context/src/ingest/stages/build-reconcile-context.ts`).
+  `packages/context/src/ingest/local-bundle-runtime.ts`.
+- Stage-local tools in `buildWuToolSet` and `buildReconcileToolSet`.
 - Inline `load_skill`, read/raw/span, stage/diff, eviction, and emit tools in
   `packages/context/src/ingest/ingest-bundle.runner.ts`.
 - Memory-agent `load_skill` in
@@ -267,8 +300,8 @@ Every composed tool entry must preserve the descriptor, including:
 The AI SDK adapter converts descriptors to `tool(...)`. The Claude Code adapter
 converts descriptors to Agent SDK `tool(name, description, schema.shape,
 handler)` entries inside `createSdkMcpServer(...)`. KTX tool handlers return
-`{ markdown, structured }`; the Claude adapter returns the markdown as text
-content and may include structured JSON in the text only if a caller needs it.
+`{ markdown, structured }`; the Claude adapter returns markdown as text content
+and may include structured JSON only if a caller needs it.
 
 Non-object schemas are unsupported for `claude-code` and must be rejected at
 startup with a clear error. In practice KTX tool inputs are already `z.object`.
@@ -287,10 +320,14 @@ carry the terminal reason. They remain useful for lifecycle logging. Tool failur
 counting should use `PostToolUseFailure` and feed the same mechanism that
 `stage-3-work-units.ts` checks through `toolFailureCount?(wu.unitKey)`.
 
+For text and object generation, SDK authentication, billing, rate-limit,
+permission, max-turn, structured-output, and execution errors must map to the
+same error surfaces that KTX uses for the Anthropic API-key backend.
+
 ## Auth and setup
 
-`ktx setup` must validate that Claude Code SDK auth is usable, not just that
-`~/.claude/` exists. Acceptable validation strategies:
+`ktx setup`, status, and doctor flows must validate that Claude Code SDK auth is
+usable, not just that `~/.claude/` exists. Acceptable validation strategies:
 
 - A minimal SDK probe call with `settingSources: []`, `tools: []`, and
   `maxTurns: 1`.
@@ -299,17 +336,18 @@ counting should use `PostToolUseFailure` and feed the same mechanism that
   state that it proves auth usability.
 
 Failure copy should tell the user to authenticate Claude Code locally with the
-Claude Code CLI, then rerun setup or ingest.
+Claude Code CLI, then rerun setup or the command they attempted.
 
 ## Documentation impact
 
-Docs updates are required because this changes user-visible setup and ingest
-behavior:
+Docs updates are required because this changes user-visible setup and LLM
+provider behavior:
 
 - `docs-site/content/docs/getting-started/quickstart.mdx`
 - `docs-site/content/docs/cli-reference/ktx-setup.mdx`
 - `docs-site/content/docs/guides/building-context.mdx`
 - Any config reference page that documents `llm.provider.backend`
+- Any status or doctor docs that describe LLM readiness
 
 The docs must say that `claude-code` uses the user's own local Claude Code
 session. Do not describe it as a way for KTX to resell, pool, or productize
@@ -322,17 +360,24 @@ Claude subscription limits.
   (`packages/llm/src/types.ts`, `packages/llm/src/model-provider.ts`).
 - Project config currently accepts `llm.provider.backend: none | anthropic |
   vertex | gateway` (`packages/context/src/project/config.ts`).
-- `resolveAgentRunner(...)` currently requires an AI SDK `llmProvider`, and
-  page triage is only constructed when that provider exists
-  (`packages/context/src/ingest/local-bundle-runtime.ts`).
-- Page triage and light extraction are non-agent LLM calls using
-  `llmProvider.getModel("triage")` and AI SDK `generateText`
+- `generateKtxText` and `generateKtxObject` are shared non-agent generation
+  helpers (`packages/context/src/llm/generation.ts`).
+- `AgentRunnerService` is the shared AI SDK agent-loop implementation
+  (`packages/context/src/agent/agent-runner.service.ts`).
+- Page triage and light extraction currently use raw `KtxLlmProvider`
   (`packages/context/src/ingest/page-triage/page-triage.service.ts`).
-- The Agent SDK TypeScript reference documents `settingSources` defaulting to
-  no filesystem settings, `allowedTools` as auto-approval rather than
-  restriction, `permissionMode: "dontAsk"`, `tools`, `disallowedTools`,
-  `maxTurns`, `mcpServers`, `cwd`, `persistSession`, and SDK result/hook
-  message shapes.
+- Scan/enrichment internals currently use `createLocalKtxLlmProviderFromConfig`,
+  `generateKtxText`, and `generateKtxObject`
+  (`packages/context/src/scan/local-scan.ts`,
+  `packages/context/src/scan/description-generation.ts`,
+  `packages/context/src/scan/relationship-llm-proposal.ts`).
+- Local ingest and MCP local project ports inject `llmProvider` and
+  `agentRunner` today (`packages/context/src/ingest/local-bundle-runtime.ts`,
+  `packages/context/src/mcp/local-project-ports.ts`).
+- The Agent SDK TypeScript reference documents `settingSources` defaulting to no
+  filesystem settings, `allowedTools` as auto-approval rather than restriction,
+  `permissionMode: "dontAsk"`, `tools`, `disallowedTools`, `maxTurns`,
+  `mcpServers`, `cwd`, `persistSession`, and SDK result/hook message shapes.
 - The Agent SDK MCP docs show registering MCP servers in `query()` options and
   using `allowedTools` for MCP tool access.
 - The Agent SDK skills docs say discovered skills can be controlled with the
@@ -343,13 +388,17 @@ Claude subscription limits.
 
 1. Confirm exact TypeScript option names and result-message discriminants
    against the pinned `@anthropic-ai/claude-agent-sdk` version.
-2. Define the final `KtxGenerationPort` and `AgentRunnerPort` file locations and
-   package exports.
+2. Define the final `KtxLlmRuntimePort` file location and package exports.
 3. Define model alias validation for `sonnet`, `opus`, `haiku`, and full model
    IDs.
 4. Define the auth probe and make setup/status/doctor report actionable
    messages.
-5. Write tests that prove page triage is constructed and called under
-   `llm.provider.backend: claude-code`.
-6. Write tests that prove a raw built-in Claude Code tool request is denied and
-   only `mcp__ktx__*` tools are available during ingest.
+5. Run a repo-wide audit for all LLM call sites and migrate each one to the
+   runtime boundary.
+6. Write tests proving `claude-code` works for text generation, structured
+   object generation, and agent-loop execution.
+7. Write tests proving page triage, scan/enrichment internals, memory capture,
+   MCP-triggered local ingest, and normal local ingest all use the
+   `claude-code` runtime when configured.
+8. Write tests proving a raw built-in Claude Code tool request is denied and
+   only `mcp__ktx__*` tools are available during KTX agent loops.
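The spec requires that `claude-code` never silently falls through to gateway and that unsupported backend values throw. A minimal TypeScript sketch of that resolution step, assuming a hypothetical `resolveRuntimeKind` helper and a hypothetical two-way split between AI SDK and Agent SDK runtimes; the real logic belongs in `createKtxLlmProvider` / `createModelFactory`, whose signatures this patch does not show. Treating an LLM call under `backend: none` as an error is also an assumption.

```ts
// Backend values from the spec's config section; "none" means no LLM is configured.
const KTX_LLM_BACKENDS = ["none", "anthropic", "vertex", "gateway", "claude-code"] as const;
type KtxLlmBackend = (typeof KTX_LLM_BACKENDS)[number];

// Hypothetical runtime families: AI SDK backed vs Claude Agent SDK backed.
type KtxRuntimeKind = "ai-sdk" | "claude-code";

function isKtxLlmBackend(value: string): value is KtxLlmBackend {
  return (KTX_LLM_BACKENDS as readonly string[]).indexOf(value) !== -1;
}

// Resolves the runtime family for a configured backend. There is deliberately
// no fallback branch: anything unknown throws instead of routing to gateway.
function resolveRuntimeKind(backend: string): KtxRuntimeKind {
  if (!isKtxLlmBackend(backend)) {
    throw new Error(`Unsupported llm.provider.backend "${backend}"`);
  }
  switch (backend) {
    case "anthropic":
    case "vertex":
    case "gateway":
      return "ai-sdk";
    case "claude-code":
      return "claude-code";
    case "none":
      // Assumption: an LLM call with backend "none" is a configuration error.
      throw new Error('llm.provider.backend is "none", but this command needs an LLM');
    default: {
      // Compile-time exhaustiveness check: a new backend value must be handled here.
      const unhandled: never = backend;
      throw new Error(`Unhandled backend ${String(unhandled)}`);
    }
  }
}
```

The `never` check is the design point: adding a value to `KTX_LLM_BACKENDS` without handling it becomes a compile error instead of a runtime fallthrough.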
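The runtime-behavior section lists isolation options that every Agent SDK call must set explicitly. The sketch below collects them into a plain object so tests can assert on them before anything reaches `query()`. `KtxQueryOptions` and the helper functions are hypothetical, the denied built-in tool list is illustrative rather than exhaustive, `tools: []` for agent loops assumes MCP tools arrive only through `mcpServers` (omitted here because it holds an SDK server instance), and mapping a stage's step budget onto `maxTurns` is an assumption. The option names come from the spec and still have to be confirmed against the pinned `@anthropic-ai/claude-agent-sdk` version.

```ts
// Hypothetical option shape mirroring option names listed in the spec.
// This is a plain object, not the SDK's own Options type.
interface KtxQueryOptions {
  cwd: string;
  settingSources: string[];
  skills: string[];
  plugins: string[];
  tools: string[];
  allowedTools: string[];
  disallowedTools: string[];
  permissionMode: "dontAsk";
  maxTurns: number;
  persistSession: boolean;
}

// Illustrative, non-exhaustive list of Claude Code built-in tools to deny.
const DENIED_BUILT_INS = ["Bash", "Read", "Write", "Edit", "Glob", "Grep", "WebFetch", "WebSearch"];

function isolatedOptions(projectDir: string, maxTurns: number): KtxQueryOptions {
  return {
    cwd: projectDir, // resolved project dir, never process.cwd()
    settingSources: [], // load no filesystem settings
    skills: [],
    plugins: [],
    tools: [], // assumption: no built-ins; KTX tools come from mcpServers
    allowedTools: [],
    disallowedTools: [...DENIED_BUILT_INS], // deny even if SDK defaults change
    permissionMode: "dontAsk",
    maxTurns,
    persistSession: false,
  };
}

// Text and structured-object generation: one turn, no tools.
function generationOptions(projectDir: string): KtxQueryOptions {
  return isolatedOptions(projectDir, 1);
}

// Agent loops: only this stage's KTX MCP tools are auto-approved.
function agentLoopOptions(projectDir: string, ktxToolNames: string[], stepBudget: number): KtxQueryOptions {
  return {
    ...isolatedOptions(projectDir, stepBudget),
    allowedTools: ktxToolNames.map((name) => `mcp__ktx__${name}`),
  };
}
```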
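The tool-boundary section asks for a backend-neutral descriptor that the Claude Code adapter turns into MCP tool entries, returning markdown as text content and rejecting non-object schemas at startup. This sketch uses hypothetical descriptor fields (the patch elides the real `KtxRuntimeToolDescriptor` body) and a tiny stand-in for Zod object schemas so it runs without dependencies. The `mcp__<server>__<tool>` naming matches the `mcp__ktx__*` pattern the spec's tests check for.

```ts
// What KTX tool handlers return, per the spec.
interface KtxToolResult {
  markdown: string;
  structured?: unknown;
}

// Stand-in for a Zod schema: only object schemas expose a `shape`.
type KtxToolInputSchema =
  | { kind: "object"; shape: Record<string, unknown> }
  | { kind: "non-object" };

// Hypothetical descriptor fields; not the spec's actual interface body.
interface KtxRuntimeToolDescriptor {
  name: string;
  description: string;
  inputSchema: KtxToolInputSchema;
  execute(input: Record<string, unknown>): Promise<KtxToolResult>;
}

// MCP-style text content handed back to the Agent SDK.
interface McpTextResult {
  content: { type: "text"; text: string }[];
}

interface ClaudeToolEntry {
  qualifiedName: string;
  description: string;
  shape: Record<string, unknown>;
  handler(input: Record<string, unknown>): Promise<McpTextResult>;
}

// Converts the final tool map for one agent loop, collecting every
// non-object schema and failing once, before the loop starts.
function toClaudeToolEntries(serverName: string, tools: KtxRuntimeToolDescriptor[]): ClaudeToolEntry[] {
  const rejected: string[] = [];
  const entries: ClaudeToolEntry[] = [];
  for (const descriptor of tools) {
    const schema = descriptor.inputSchema;
    if (schema.kind !== "object") {
      rejected.push(descriptor.name);
      continue;
    }
    entries.push({
      qualifiedName: `mcp__${serverName}__${descriptor.name}`,
      description: descriptor.description,
      shape: schema.shape,
      handler: async (input) => {
        const result = await descriptor.execute(input);
        // Only the markdown is surfaced; structured data stays out unless a caller needs it.
        return { content: [{ type: "text", text: result.markdown }] };
      },
    });
  }
  if (rejected.length > 0) {
    throw new Error(`claude-code requires object tool input schemas; rejected: ${rejected.join(", ")}`);
  }
  return entries;
}
```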
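Model selection stays role-based: each `llm.models` role binds to an alias such as `sonnet`, `opus`, or `haiku`, or to a full model ID supported by the pinned SDK. A sketch of validation plus per-role lookup, where the full-ID regex is a placeholder heuristic and falling back to `default` for an unset role is an assumption rather than something the spec states:

```ts
// Roles named in the spec's goals; `llm.models.<role>` holds a model string.
const KTX_MODEL_ROLES = ["default", "triage", "candidateExtraction", "curator", "reconcile", "repair"] as const;
type KtxModelRole = (typeof KTX_MODEL_ROLES)[number];
type KtxModelsConfig = Partial<Record<KtxModelRole, string>>;

const CLAUDE_CODE_ALIASES = ["sonnet", "opus", "haiku"];
// Placeholder heuristic for full model IDs; the real list must come from the
// pinned SDK version.
const FULL_MODEL_ID = /^claude-[a-z0-9][a-z0-9.-]*$/;

// Returns one message per invalid role so setup/doctor can report all of them.
function validateClaudeCodeModels(models: KtxModelsConfig): string[] {
  const errors: string[] = [];
  for (const role of KTX_MODEL_ROLES) {
    const value = models[role];
    if (value === undefined) continue;
    if (CLAUDE_CODE_ALIASES.indexOf(value) === -1 && !FULL_MODEL_ID.test(value)) {
      errors.push(`llm.models.${role}: "${value}" is not a supported claude-code model`);
    }
  }
  return errors;
}

// Assumption: a role without its own entry falls back to `default`.
function modelForRole(models: KtxModelsConfig, role: KtxModelRole): string {
  const value = models[role] ?? models.default;
  if (value === undefined) {
    throw new Error(`No model configured for role "${role}"`);
  }
  return value;
}
```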
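For structured object generation the spec requires parsing and validating the returned object with the original KTX schema before it reaches the caller. A dependency-free sketch, with a hypothetical type guard standing in for the Zod schema and a hypothetical `KtxStructuredOutputError` standing in for whichever existing KTX error surface the plan maps structured-output failures onto:

```ts
// Hypothetical type guard standing in for validation with the original Zod schema.
type KtxValidator<T> = (value: unknown) => value is T;

// Hypothetical error type; the spec only requires mapping onto existing KTX error surfaces.
class KtxStructuredOutputError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "KtxStructuredOutputError";
  }
}

// Accepts already-parsed structured output or JSON text, then validates it
// so callers only ever receive schema-conforming values.
function parseStructuredResult<T>(raw: unknown, validate: KtxValidator<T>): T {
  let value: unknown = raw;
  if (typeof raw === "string") {
    try {
      value = JSON.parse(raw);
    } catch (err) {
      throw new KtxStructuredOutputError(`Structured output is not valid JSON: ${String(err)}`);
    }
  }
  if (!validate(value)) {
    throw new KtxStructuredOutputError("Structured output does not match the requested schema");
  }
  return value;
}
```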
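Tool-failure counting on this backend should come from `PostToolUseFailure` and feed the same per-work-unit count that stage 3 reads through `toolFailureCount?(wu.unitKey)`. A sketch of that counter, assuming a hypothetical per-unit hook factory; the real Agent SDK hook callback signature and its return value are not modeled here and must be checked against the pinned version.

```ts
// Per-work-unit failure counter that a PostToolUseFailure hook could feed.
class ToolFailureTracker {
  private readonly counts = new Map<string, number>();

  recordFailure(unitKey: string): void {
    this.counts.set(unitKey, (this.counts.get(unitKey) ?? 0) + 1);
  }

  // Same name and shape that the stage-3 check calls as toolFailureCount?(wu.unitKey).
  toolFailureCount(unitKey: string): number {
    return this.counts.get(unitKey) ?? 0;
  }
}

// Hypothetical: builds the callback registered for one work unit's agent loop,
// so every tool failure in that loop increments that unit's count.
function makeToolFailureHook(tracker: ToolFailureTracker, unitKey: string): () => void {
  return () => {
    tracker.recordFailure(unitKey);
  };
}
```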
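The plan must include a grep-based audit for leftover `getModel(...)`, `generateKtxText(...)`, `generateKtxObject(...)`, `AgentRunnerService`, and `llmProvider` usages. A pure sketch over in-memory sources so it can run inside a unit test; the function name and the `reviewed` set (files already migrated or proven non-runtime) are hypothetical, and reading the repository from disk is left to the caller.

```ts
// Patterns named in the spec's audit requirement.
const AUDIT_PATTERNS = ["getModel(", "generateKtxText(", "generateKtxObject(", "AgentRunnerService", "llmProvider"];

interface AuditFinding {
  path: string;
  line: number; // 1-based line number
  pattern: string;
}

// Reports every pattern match in files that are not yet marked as reviewed.
function auditLlmCallSites(files: Map<string, string>, reviewed: Set<string>): AuditFinding[] {
  const findings: AuditFinding[] = [];
  files.forEach((source, path) => {
    if (reviewed.has(path)) return;
    source.split("\n").forEach((text, index) => {
      for (const pattern of AUDIT_PATTERNS) {
        if (text.indexOf(pattern) !== -1) {
          findings.push({ path, line: index + 1, pattern });
        }
      }
    });
  });
  return findings;
}
```

A CI check could fail whenever the returned list is non-empty, which forces every new call site to be either migrated or explicitly added to the reviewed set.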