docs: revise claude-code ingest backend spec

2026-06-13 08:15:14 +02:00 · 2026-05-15 14:06:53 +02:00 · 2026-05-15 14:06:53 +02:00 · 02c70953da
commit 02c70953da
parent beeeda4437
1 changed files with 360 additions and 0 deletions
--- a/docs/superpowers/specs/2026-05-15-claude-code-backend-design.md
+++ b/docs/superpowers/specs/2026-05-15-claude-code-backend-design.md
@ -0,0 +1,360 @@
+# Brainstorm: `claude-code` backend for end-to-end KTX ingest
+
+Adds a `claude-code` selection that makes **all `ktx ingest` LLM work** run
+through `@anthropic-ai/claude-agent-sdk`, reusing the user's existing local
+Claude Code authentication. The user experience stays the same: users run
+`ktx ingest`; the backend is selected in `ktx.yaml`.
+
+This is not an implementation plan. It is the revised design after iterating on
+the brainstorm with the requirement that **all KTX ingest capabilities must work
+with `claude-code`**. The follow-up implementation plan should be written
+separately.
+
+## Core decision
+
+`claude-code` is no longer only an agent-runner backend. It is an ingest-capable
+LLM runtime that covers both kinds of LLM work used by ingest:
+
+- **Agent loops**: work-unit execution, reconciliation, context-candidate
+  curation pagination, and memory-agent ingestion paths that call
+  `agentRunner.runLoop(...)`.
+- **Non-agent generation**: page triage and light extraction, which currently
+  call `KtxLlmProvider` directly through `generateText`.
+
+The implementation must not make page triage silently disappear when the user
+chooses `claude-code`. Today `PageTriageService` is only constructed when
+`resolveAgentRunner(...)` returns an AI SDK `llmProvider`
+(`packages/context/src/ingest/local-bundle-runtime.ts:684-693`). Under the new
+design, ingest gets a generation runtime for `claude-code`, so page triage and
+light extraction still run.
+
+## Goals
+
+- Let a KTX user run every `ktx ingest` mode against their existing local Claude
+  Code session without provisioning `ANTHROPIC_API_KEY`, Vertex credentials, or
+  an AI Gateway key.
+- Cover scheduled pulls, upload ingest, Metabase fan-out, page triage, light
+  extraction, context-candidate curation, work-unit execution, reconciliation,
+  memory capture invoked from ingest, and source-specific tools such as
+  historic-SQL evidence emission.
+- Preserve KTX's per-stage tool curation. Each stage exposes exactly the KTX
+  tools it already selected; Claude Code built-ins, filesystem-discovered MCP
+  servers, hooks, skills, plugins, agents, and slash commands must not expand
+  the tool surface.
+- Keep embeddings independent. Claude does not provide embeddings; users keep
+  configuring `ingest.embeddings` as they do today.
+- Fail fast with a clear message if local Claude Code authentication is not
+  usable.
+
+## Non-goals
+
+- **Global `ktx scan` parity in the first plan.** The required target is
+  end-to-end `ktx ingest`. `ktx scan` has separate non-agent LLM consumers
+  (`packages/context/src/scan/description-generation.ts`,
+  `packages/context/src/scan/relationship-llm-proposal.ts`,
+  `packages/context/src/scan/local-scan.ts`). The config and runtime design must
+  avoid accidental gateway fall-through for these paths, but the plan may either
+  adapt them in the same pass or fail them clearly until scan support is planned.
+- **Tool-call repair parity.** The AI SDK runner uses
+  `experimental_repairToolCall` (`packages/llm/src/repair.ts:35-88`). The Claude
+  Agent SDK has no transparent same-step repair hook. MVP behavior is next-turn
+  self-correction from schema errors or a normal tool-failure count.
+- **OTEL telemetry parity.** The AI SDK runner uses `experimental_telemetry`.
+  The Agent SDK exposes hooks such as `PostToolUseFailure` and `SessionEnd`, but
+  no drop-in OTEL switch. MVP ships without telemetry parity on this backend.
+- **Productizing Claude subscription limits.** Documentation must frame this as
+  "use your own local Claude Code session," not as a third-party Claude Max or
+  Claude.ai product feature.
+
+## Approaches considered
+
+### Recommended: Ingest LLM runtime port
+
+Introduce a backend-neutral KTX LLM runtime port for operations, not just model
+construction:
+
+```ts
+interface KtxGenerationPort {
+  generateText(input: KtxGenerateTextInput): Promise<string>;
+  generateObject<T>(input: KtxGenerateObjectInput<T>): Promise<T>;
+}
+
+interface AgentRunnerPort {
+  runLoop(params: RunLoopParams): Promise<RunLoopResult>;
+}
+```
+
+The existing AI SDK implementation adapts `KtxLlmProvider` to these ports. The
+new Claude Code implementation uses `query()` from
+`@anthropic-ai/claude-agent-sdk`. Ingest services depend on the ports:
+
+- `PageTriageService` depends on `KtxGenerationPort`, not raw
+  `KtxLlmProvider`.
+- `generateKtxText` / `generateKtxObject` become thin helpers over the
+  generation port or move behind it.
+- `AgentRunnerService` and `ClaudeAgentSdkRunnerService` both implement
+  `AgentRunnerPort`.
+
+This is the recommended approach because it matches the Agent SDK's actual
+shape. The Agent SDK is an agent/session API, not an AI SDK `LanguageModel`
+factory, so forcing it into `KtxLlmProvider.getModel(...)` would create a false
+abstraction and leave page triage broken.
+
+### Rejected: agent-runner-only backend
+
+This was the previous version of the spec. It made work-unit and reconciliation
+agent loops possible, but it did not cover page triage or light extraction.
+Because `ktx ingest` uses those non-agent LLM calls for document-like sources,
+this does not satisfy the updated requirement.
+
+### Rejected for MVP: Claude Code OpenAI proxy
+
+Using a proxy or `claude -p` subprocess would avoid some TypeScript adapter work,
+but it would add another protocol boundary, make tool control harder to prove,
+and move away from the official Agent SDK API.
+
+## Architecture
+
+```text
+ktx ingest
+  -> createLocalBundleIngestRuntime(...)
+     -> resolveIngestLlmRuntime(...)
+        -> AI SDK runtime
+           - KtxGenerationPort via generateText / Output.object
+           - AgentRunnerPort via current AgentRunnerService
+        -> Claude Code runtime
+           - KtxGenerationPort via Agent SDK query()
+           - AgentRunnerPort via ClaudeAgentSdkRunnerService
+
+  -> PageTriageService
+     -> generation.generateText({ role: "triage", ... })
+
+  -> IngestBundleRunner stages
+     -> agentRunner.runLoop({ modelRole, toolSet, stepBudget, ... })
+```
+
+The runtime is selected once at the context-runtime DI boundary. The main ingest
+integration point remains `resolveAgentRunner` /
+`createLocalBundleIngestRuntime` in
+`packages/context/src/ingest/local-bundle-runtime.ts`, but the function should
+evolve from "resolve an agent runner plus optional AI SDK provider" into
+"resolve the ingest LLM runtime ports." The memory-agent construction path in
+`packages/context/src/memory/local-memory.ts` needs the same port treatment.
+
+`packages/cli/src/runtime.ts` is the Python-runtime command handler; it is not
+the agent-runner or generation-runtime integration point.
+
+## Config design
+
+The plan should make `claude-code` a first-class config value, not a hidden
+side-channel. Recommended shape:
+
+```yaml
+llm:
+  provider:
+    backend: claude-code
+  models:
+    default: sonnet
+    triage: haiku
+    candidateExtraction: sonnet
+    curator: sonnet
+    reconcile: sonnet
+```
+
+Implementation implications:
+
+- Extend `KTX_LLM_BACKENDS` in `packages/context/src/project/config.ts` and
+  `KtxLlmBackend` in `packages/llm/src/types.ts`.
+- Update setup, status, doctor, and local provider resolution so
+  `claude-code` does not fall through to `gateway`.
+- For `claude-code`, do not construct a fake AI SDK `LanguageModel`. Construct
+  the Claude Code generation/runtime ports.
+- Keep `llm.models` as the per-role binding source. The Claude Code runtime maps
+  each KTX role to the configured model string for the current call. The plan
+  must decide and test the accepted model aliases, for example `sonnet`,
+  `opus`, `haiku`, or full Claude model IDs supported by the SDK.
+- If a non-ingest path such as `ktx scan` sees `backend: claude-code` before it
+  has been ported, it must fail fast with a clear "not supported for this command
+  yet" message. It must not silently route to gateway.
+
+## Claude Agent SDK runtime behavior
+
+Every Agent SDK call must be isolated and deterministic enough for KTX ingest.
+Use explicit options even when SDK defaults currently match the desired value.
+
+```ts
+query({
+  prompt,
+  options: {
+    cwd: project.projectDir,
+    systemPrompt,
+    model: resolveModel(modelRole),
+    maxTurns: stepBudget,
+    settingSources: [],
+    skills: [],
+    plugins: [],
+    mcpServers: { ktx: createSdkMcpServer({ name: "ktx", tools }) },
+    tools: ["mcp__ktx__*"],
+    allowedTools: ["mcp__ktx__*"],
+    permissionMode: "dontAsk",
+    disallowedTools: [
+      "Agent",
+      "Task",
+      "AskUserQuestion",
+      "Bash",
+      "Read",
+      "Edit",
+      "Write",
+      "Glob",
+      "Grep",
+      "WebFetch",
+      "WebSearch",
+      "TodoWrite"
+    ],
+    persistSession: false
+  }
+});
+```
+
+The plan must confirm the exact option names against the pinned SDK version, but
+the required outcome is fixed:
+
+- Filesystem settings are not loaded. `settingSources: []` is explicit, and the
+  implementation should assert from the SDK init message that no unexpected
+  settings-derived commands, skills, agents, or MCP servers are active.
+- Skills are disabled with `skills: []`, and plugins are disabled with
+  `plugins: []`.
+- Only KTX MCP tools are available and auto-approved. `allowedTools` alone is
+  not sufficient because the current SDK docs describe it as auto-approval, not
+  restriction. Use `tools`, `permissionMode: "dontAsk"`, and explicit
+  `disallowedTools` for built-ins.
+- Built-ins are denied even if a future SDK default changes.
+- `cwd` is `project.projectDir`, resolved at startup via `resolveKtxProjectDir`,
+  not `process.cwd()`.
+- Sessions are not persisted for ingest unless the plan identifies a concrete
+  debugging feature that needs persistence.
+
+For non-agent text generation, use the same isolated runtime with no MCP tools,
+`maxTurns: 1`, and no filesystem settings. For structured outputs, use the Agent
+SDK's JSON-schema output format and convert KTX's Zod schemas at the boundary.
+
+## Tool boundary
+
+The final `RunLoopParams.toolSet` cannot remain a raw AI SDK `Record<string,
+Tool>` if two backends must consume it. The plan must define a backend-neutral
+tool descriptor for the **final** tool map handed to the runner:
+
+```ts
+interface KtxRuntimeToolDescriptor<TInput, TOutput> {
+  name: string;
+  description: string;
+  inputSchema: z.ZodObject<z.ZodRawShape>;
+  execute(input: TInput): Promise<ToolOutput<TOutput>>;
+}
+```
+
+Every composed tool entry must preserve the descriptor, including:
+
+- `BaseTool` outputs from factory toolsets.
+- Source-specific raw tools such as `emit_historic_sql_evidence` in
+  `packages/context/src/ingest/local-bundle-runtime.ts:543-556`.
+- Stage-local tools in `buildWuToolSet` and `buildReconcileToolSet`
+  (`packages/context/src/ingest/stages/build-wu-context.ts`,
+  `packages/context/src/ingest/stages/build-reconcile-context.ts`).
+- Inline `load_skill`, read/raw/span, stage/diff, eviction, and emit tools in
+  `packages/context/src/ingest/ingest-bundle.runner.ts`.
+- Memory-agent `load_skill` in
+  `packages/context/src/memory/memory-agent.service.ts`.
+- The `withVerificationLedger` wrapping layer.
+
+The AI SDK adapter converts descriptors to `tool(...)`. The Claude Code adapter
+converts descriptors to Agent SDK `tool(name, description, schema.shape,
+handler)` entries inside `createSdkMcpServer(...)`. KTX tool handlers return
+`{ markdown, structured }`; the Claude adapter returns the markdown as text
+content and may include structured JSON in the text only if a caller needs it.
+
+Non-object schemas are unsupported for `claude-code` and must be rejected at
+startup with a clear error. In practice KTX tool inputs are already `z.object`.
+
+## Stop reasons and failures
+
+The Claude runner maps the SDK's typed result message to
+`RunLoopStopReason = "budget" | "natural" | "error"`:
+
+- `subtype: "error_max_turns"` or `stop_reason: "max_turns"` -> `"budget"`.
+- `subtype: "success"` -> `"natural"`.
+- Other error subtypes or non-null unsuccessful stop reasons -> `"error"`.
+
+`Stop` hooks are not the authoritative stop-reason source because they do not
+carry the terminal reason. They remain useful for lifecycle logging. Tool failure
+counting should use `PostToolUseFailure` and feed the same mechanism that
+`stage-3-work-units.ts` checks through `toolFailureCount?(wu.unitKey)`.
+
+## Auth and setup
+
+`ktx setup` must validate that Claude Code SDK auth is usable, not just that
+`~/.claude/` exists. Acceptable validation strategies:
+
+- A minimal SDK probe call with `settingSources: []`, `tools: []`, and
+  `maxTurns: 1`.
+- An SDK-provided account/auth status method if the pinned version exposes one.
+- A docs-endorsed file-presence check only if the official SDK docs explicitly
+  state that it proves auth usability.
+
+Failure copy should tell the user to authenticate Claude Code locally with the
+Claude Code CLI, then rerun setup or ingest.
+
+## Documentation impact
+
+Docs updates are required because this changes user-visible setup and ingest
+behavior:
+
+- `docs-site/content/docs/getting-started/quickstart.mdx`
+- `docs-site/content/docs/cli-reference/ktx-setup.mdx`
+- `docs-site/content/docs/guides/building-context.mdx`
+- Any config reference page that documents `llm.provider.backend`
+
+The docs must say that `claude-code` uses the user's own local Claude Code
+session. Do not describe it as a way for KTX to resell, pool, or productize
+Claude subscription limits.
+
+## Verified evidence
+
+- Current `KtxLlmProvider` returns AI SDK `LanguageModel` instances and only
+  supports `anthropic`, `vertex`, and `gateway`
+  (`packages/llm/src/types.ts`, `packages/llm/src/model-provider.ts`).
+- Project config currently accepts `llm.provider.backend: none | anthropic |
+  vertex | gateway` (`packages/context/src/project/config.ts`).
+- `resolveAgentRunner(...)` currently requires an AI SDK `llmProvider`, and
+  page triage is only constructed when that provider exists
+  (`packages/context/src/ingest/local-bundle-runtime.ts`).
+- Page triage and light extraction are non-agent LLM calls using
+  `llmProvider.getModel("triage")` and AI SDK `generateText`
+  (`packages/context/src/ingest/page-triage/page-triage.service.ts`).
+- The Agent SDK TypeScript reference documents `settingSources` defaulting to
+  no filesystem settings, `allowedTools` as auto-approval rather than
+  restriction, `permissionMode: "dontAsk"`, `tools`, `disallowedTools`,
+  `maxTurns`, `mcpServers`, `cwd`, `persistSession`, and SDK result/hook
+  message shapes.
+- The Agent SDK MCP docs show registering MCP servers in `query()` options and
+  using `allowedTools` for MCP tool access.
+- The Agent SDK skills docs say discovered skills can be controlled with the
+  `skills` option and disabled with `[]`; the runtime should set this
+  explicitly.
+
+## Open items for the implementation plan
+
+1. Confirm exact TypeScript option names and result-message discriminants
+   against the pinned `@anthropic-ai/claude-agent-sdk` version.
+2. Decide whether to port `ktx scan` non-agent LLM consumers in the same pass or
+   fail them clearly for `claude-code` until a scan-specific plan exists.
+3. Define the final `KtxGenerationPort` and `AgentRunnerPort` file locations and
+   package exports.
+4. Define model alias validation for `sonnet`, `opus`, `haiku`, and full model
+   IDs.
+5. Define the auth probe and make setup/status/doctor report actionable
+   messages.
+6. Write tests that prove page triage is constructed and called under
+   `llm.provider.backend: claude-code`.
+7. Write tests that prove a raw built-in Claude Code tool request is denied and
+   only `mcp__ktx__*` tools are available during ingest.