docs: expand claude-code spec to full llm parity

Andrey Avtomonov 2026-05-15 14:30:02 +02:00
parent 1899f763f6
commit f7fbacf229

@ -1,150 +1,175 @@
# Brainstorm: `claude-code` backend for end-to-end KTX ingest
# Brainstorm: `claude-code` backend with full KTX LLM parity
Adds a `claude-code` selection that makes **all `ktx ingest` LLM work** run
through `@anthropic-ai/claude-agent-sdk`, reusing the user's existing local
Claude Code authentication. The user experience stays the same: users run
`ktx ingest`; the backend is selected in `ktx.yaml`.
Adds a `claude-code` backend that gives KTX full parity with the existing
`ANTHROPIC_API_KEY`-based `anthropic` backend for **all KTX LLM calls**. The
backend uses `@anthropic-ai/claude-agent-sdk` and reuses the user's existing
local Claude Code authentication. Users select it in `ktx.yaml`.
This is not an implementation plan. It is the revised design after iterating on
the brainstorm with the requirement that **all KTX ingest capabilities must work
with `claude-code`**. The follow-up implementation plan should be written
separately.
This is not an implementation plan. It is the revised design after expanding
the requirement from "`ktx ingest` works with Claude Code" to "every KTX LLM
call works with Claude Code." The follow-up implementation plan should be
written separately.
## Core decision
`claude-code` is no longer only an agent-runner backend. It is an ingest-capable
LLM runtime that covers both kinds of LLM work used by ingest:
`claude-code` is a first-class global LLM backend. Any code path that currently
works with `llm.provider.backend: anthropic` must work with
`llm.provider.backend: claude-code`, unless it is not an LLM call at all.
- **Agent loops**: work-unit execution, reconciliation, context-candidate
curation pagination, and memory-agent ingestion paths that call
`agentRunner.runLoop(...)`.
- **Non-agent generation**: page triage and light extraction, which currently
call `KtxLlmProvider` directly through `generateText`.
This includes:
The implementation must not make page triage silently disappear when the user
chooses `claude-code`. Today `PageTriageService` is only constructed when
`resolveAgentRunner(...)` returns an AI SDK `llmProvider`
(`packages/context/src/ingest/local-bundle-runtime.ts:684-693`). Under the new
design, ingest gets a generation runtime for `claude-code`, so page triage and
light extraction still run.
- Agent loops implemented through `AgentRunnerService.runLoop(...)`.
- Text generation through `generateKtxText(...)`.
- Structured object generation through `generateKtxObject(...)`.
- Local ingest and MCP-triggered local ingest flows.
- Page triage and light extraction.
- Context-candidate curation and reconciliation.
- Memory capture.
- Scan/enrichment internals and relationship LLM proposals.
- Future KTX LLM call sites that use the shared runtime boundary.
Commands that do not use LLMs do not need special Claude Code behavior. There
must be no silent fallback from `claude-code` to gateway, Anthropic API-key
execution, or deterministic output.
## Goals
- Let a KTX user run every `ktx ingest` mode against their existing local Claude
Code session without provisioning `ANTHROPIC_API_KEY`, Vertex credentials, or
an AI Gateway key.
- Cover scheduled pulls, upload ingest, Metabase fan-out, page triage, light
extraction, context-candidate curation, work-unit execution, reconciliation,
memory capture invoked from ingest, and source-specific tools such as
historic-SQL evidence emission.
- Preserve KTX's per-stage tool curation. Each stage exposes exactly the KTX
tools it already selected; Claude Code built-ins, filesystem-discovered MCP
servers, hooks, skills, plugins, agents, and slash commands must not expand
the tool surface.
- Let a KTX user run all KTX LLM-backed behavior through their existing local
Claude Code session without provisioning `ANTHROPIC_API_KEY`, Vertex
credentials, or an AI Gateway key.
- Preserve the existing user-facing CLI and MCP behavior. `claude-code` changes
how LLM calls execute, not which KTX workflows exist.
- Preserve role-based model selection. `llm.models.default`, `triage`,
`candidateExtraction`, `curator`, `reconcile`, and `repair` remain the source
of model selection for every LLM call.
- Preserve KTX's curated tool boundaries. Claude Code built-ins,
filesystem-discovered MCP servers, hooks, skills, plugins, agents, and slash
commands must not expand the tool surface for KTX agent loops.
- Keep embeddings independent. Claude does not provide embeddings; users keep
configuring `ingest.embeddings` as they do today.
configuring `ingest.embeddings` and scan/enrichment embeddings as they do
today.
- Fail fast with a clear message if local Claude Code authentication is not
usable.
## Non-goals
- **Non-ingest LLM surfaces.** The required target is end-to-end `ktx ingest`.
Other internal LLM consumers are out of scope for this spec. The config and
runtime design must still avoid accidental gateway fall-through when
`llm.provider.backend` is `claude-code`.
- **Tool-call repair parity.** The AI SDK runner uses
- **Embedding parity.** Embeddings remain separate from LLM execution.
- **Tool-call repair parity in the first pass.** The AI SDK runner uses
`experimental_repairToolCall` (`packages/llm/src/repair.ts:35-88`). The Claude
Agent SDK has no transparent same-step repair hook. In the MVP, the model
either self-corrects on the next turn after seeing the schema error, or the
call counts toward the normal tool-failure count.
- **OTEL telemetry parity.** The AI SDK runner uses `experimental_telemetry`.
The Agent SDK exposes hooks such as `PostToolUseFailure` and `SessionEnd`, but
no drop-in OTEL switch. MVP ships without telemetry parity on this backend.
- **OTEL telemetry parity in the first pass.** The AI SDK runner uses
`experimental_telemetry`. The Agent SDK exposes hooks such as
`PostToolUseFailure` and `SessionEnd`, but no drop-in OTEL switch. MVP ships
without telemetry parity on this backend.
- **Productizing Claude subscription limits.** Documentation must frame this as
"use your own local Claude Code session," not as a third-party Claude Max or
Claude.ai product feature.
## Approaches considered
### Recommended: Ingest LLM runtime port
### Recommended: global LLM runtime port
Introduce a backend-neutral KTX LLM runtime port for operations, not just model
construction:
```ts
interface KtxGenerationPort {
interface KtxLlmRuntimePort {
  generateText(input: KtxGenerateTextInput): Promise<string>;
  generateObject<T>(input: KtxGenerateObjectInput<T>): Promise<T>;
}
interface AgentRunnerPort {
  runLoop(params: RunLoopParams): Promise<RunLoopResult>;
  runAgentLoop(params: RunLoopParams): Promise<RunLoopResult>;
}
```
The existing AI SDK implementation adapts `KtxLlmProvider` to these ports. The
new Claude Code implementation uses `query()` from
`@anthropic-ai/claude-agent-sdk`. Ingest services depend on the ports:
The existing `anthropic`, `vertex`, and `gateway` backends implement the runtime
through the AI SDK and existing `KtxLlmProvider`. The new `claude-code` backend
implements the same runtime through `@anthropic-ai/claude-agent-sdk`.
- `PageTriageService` depends on `KtxGenerationPort`, not raw
`KtxLlmProvider`.
- `generateKtxText` / `generateKtxObject` become thin helpers over the
generation port or move behind it.
- `AgentRunnerService` and `ClaudeAgentSdkRunnerService` both implement
`AgentRunnerPort`.
This is the recommended approach because KTX call sites need operations:
"generate text," "generate a structured object," and "run an agent loop." They
do not inherently need direct access to an AI SDK `LanguageModel`. The Agent SDK
is a session/agent API, not an AI SDK model factory, so the runtime port avoids
pretending those APIs are the same.
This is the recommended approach because it matches the Agent SDK's actual
shape. The Agent SDK is an agent/session API, not an AI SDK `LanguageModel`
factory, so forcing it into `KtxLlmProvider.getModel(...)` would create a false
abstraction and leave page triage broken.
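To make the "operations, not model construction" idea concrete, here is a minimal sketch of a call site that depends only on a text-generation port. Every name in it (`TextGenerationPort`, `FakeAiSdkRuntime`, `FakeClaudeCodeRuntime`, `triagePage`, and the trimmed-down role list) is hypothetical and exists only for illustration; the real KTX input types and runtimes are not shown in this spec.

```typescript
// Hypothetical, simplified shapes for illustration only.
type KtxModelRole = "default" | "triage" | "curator";

interface TextInput {
  role: KtxModelRole;
  prompt: string;
}

// The port exposes an operation, not an AI SDK LanguageModel.
interface TextGenerationPort {
  generateText(input: TextInput): Promise<string>;
}

// Stand-in for a runtime that would wrap the existing KtxLlmProvider.
class FakeAiSdkRuntime implements TextGenerationPort {
  async generateText(input: TextInput): Promise<string> {
    return `[ai-sdk:${input.role}] ${input.prompt}`;
  }
}

// Stand-in for a runtime that would call the Agent SDK instead.
class FakeClaudeCodeRuntime implements TextGenerationPort {
  async generateText(input: TextInput): Promise<string> {
    return `[claude-code:${input.role}] ${input.prompt}`;
  }
}

// The call site depends only on the port, so it is identical for both backends.
async function triagePage(runtime: TextGenerationPort, page: string): Promise<string> {
  return runtime.generateText({ role: "triage", prompt: `Classify: ${page}` });
}
```

Because `triagePage` never asks for a model object, swapping the backend does not change the call site, which is the property the recommended approach relies on.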
### Rejected: fake AI SDK `LanguageModel` for Claude Code
### Rejected: agent-runner-only backend
Trying to make Claude Code look like an AI SDK `LanguageModel` would be brittle.
The Agent SDK owns session execution, permissions, MCP tools, structured output,
and result messages. Those semantics do not map cleanly onto a normal
`getModel(...)` return value.
This was the previous version of the spec. It made work-unit and reconciliation
agent loops possible, but it did not cover page triage or light extraction.
Because `ktx ingest` uses those non-agent LLM calls for document-like sources,
this does not satisfy the updated requirement.
### Rejected: branch at every call site
### Rejected for MVP: Claude Code OpenAI proxy
Using a proxy or `claude -p` subprocess would avoid some TypeScript adapter work,
but it would add another protocol boundary, make tool control harder to prove,
and move away from the official Agent SDK API.
Adding `if backend === "claude-code"` around each LLM call would work in the
short term but would duplicate prompt wrapping, structured output handling,
debug logging, tool conversion, auth checks, and error mapping. It would also
make future LLM call sites easy to miss.
## Architecture
```text
ktx ingest
-> createLocalBundleIngestRuntime(...)
-> resolveIngestLlmRuntime(...)
-> AI SDK runtime
- KtxGenerationPort via generateText / Output.object
- AgentRunnerPort via current AgentRunnerService
-> Claude Code runtime
- KtxGenerationPort via Agent SDK query()
- AgentRunnerPort via ClaudeAgentSdkRunnerService
ktx.yaml
llm.provider.backend: anthropic | vertex | gateway | claude-code
llm.models.<role>: model alias or model ID
-> PageTriageService
-> generation.generateText({ role: "triage", ... })
createLocalKtxLlmRuntimeFromConfig(project.config.llm)
-> AiSdkKtxLlmRuntime
- wraps existing KtxLlmProvider
- generateText / Output.object / AgentRunnerService
-> ClaudeCodeKtxLlmRuntime
- uses @anthropic-ai/claude-agent-sdk query()
- implements text, object, and agent-loop operations
-> IngestBundleRunner stages
-> agentRunner.runLoop({ modelRole, toolSet, stepBudget, ... })
All KTX LLM call sites
-> KtxLlmRuntimePort
```
The runtime is selected once at the context-runtime DI boundary. The main ingest
integration point remains `resolveAgentRunner` /
`createLocalBundleIngestRuntime` in
`packages/context/src/ingest/local-bundle-runtime.ts`, but the function should
evolve from "resolve an agent runner plus optional AI SDK provider" into
"resolve the ingest LLM runtime ports." The memory-agent construction path in
`packages/context/src/memory/local-memory.ts` needs the same port treatment.
The runtime is selected at the same boundaries that currently construct an
`llmProvider` or `AgentRunnerService`:
`packages/cli/src/runtime.ts` is the Python-runtime command handler; it is not
the agent-runner or generation-runtime integration point.
- `packages/context/src/llm/local-config.ts`
- `packages/context/src/ingest/local-bundle-runtime.ts`
- `packages/context/src/memory/local-memory.ts`
- `packages/context/src/scan/local-scan.ts`
- `packages/context/src/mcp/local-project-ports.ts`
- Any CLI setup/status/doctor code that validates LLM readiness
After the change, services should not need to know whether the configured
backend is AI SDK based or Claude Code based. They call the runtime operation
they need.
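One way to picture the single selection point is a function that maps the configured backend string to a runtime kind and refuses anything it does not recognize. This is a sketch under assumptions: `RuntimeKind` and `selectRuntimeKind` are hypothetical names, and treating `none` as a "disabled" kind is an illustrative choice rather than something this spec defines.

```typescript
// Hypothetical runtime kinds; "disabled" for backend "none" is an assumption.
type RuntimeKind = "disabled" | "ai-sdk" | "claude-code";

// Maps the raw llm.provider.backend string from ktx.yaml to a runtime kind.
function selectRuntimeKind(backend: string): RuntimeKind {
  switch (backend) {
    case "none":
      return "disabled";
    case "anthropic":
    case "vertex":
    case "gateway":
      return "ai-sdk";
    case "claude-code":
      return "claude-code";
    default:
      // No silent fallback to gateway: unknown values are a configuration error.
      throw new Error(`Unsupported llm.provider.backend: "${backend}"`);
  }
}
```

Services would receive the resulting runtime and never inspect the backend string themselves.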
## LLM call-site migration
The implementation plan must migrate every current KTX LLM call site to the
runtime port:
- `packages/context/src/llm/generation.ts`: `generateKtxText` and
`generateKtxObject` become runtime-backed helpers or are folded into the
runtime.
- `packages/context/src/agent/agent-runner.service.ts`: the AI SDK agent loop
becomes the AI SDK implementation of `runAgentLoop`.
- `packages/context/src/ingest/page-triage/page-triage.service.ts`: page triage
and light extraction depend on `KtxLlmRuntimePort`, not raw `KtxLlmProvider`.
- `packages/context/src/scan/description-generation.ts`: AI descriptions use
the runtime text-generation operation.
- `packages/context/src/scan/relationship-llm-proposal.ts`: relationship
proposals use the runtime object-generation operation.
- `packages/context/src/ingest/stages/stage-3-work-units.ts`,
`packages/context/src/ingest/stages/stage-4-reconciliation.ts`,
`packages/context/src/ingest/context-candidates/curator-pagination.service.ts`,
and `packages/context/src/memory/memory-agent.service.ts`: agent loops use the
runtime agent-loop operation or a thin `AgentRunnerPort` backed by it.
- Test helpers and MCP local project ports that inject `llmProvider` or
`agentRunner` must either inject the runtime port or use compatibility test
adapters during the migration.
The plan must include a grep-based audit so new or overlooked `getModel(...)`,
`generateKtxText(...)`, `generateKtxObject(...)`, `AgentRunnerService`, and
`llmProvider` usages are either migrated or explicitly shown not to be LLM
runtime calls.
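The audit could be as simple as a pattern scan over source text. The sketch below works on an in-memory map of file contents so it stays self-contained; `AuditHit`, `LLM_CALL_SITE_PATTERNS`, and `auditLlmCallSites` are hypothetical names, and a real audit would more likely run `grep` or a similar tool over the repository.

```typescript
interface AuditHit {
  file: string;
  line: number; // 1-based line number within the file
  pattern: string;
}

// The identifiers this spec asks the audit to look for.
const LLM_CALL_SITE_PATTERNS: Array<[string, RegExp]> = [
  ["getModel(", /\bgetModel\(/],
  ["generateKtxText(", /\bgenerateKtxText\(/],
  ["generateKtxObject(", /\bgenerateKtxObject\(/],
  ["AgentRunnerService", /\bAgentRunnerService\b/],
  ["llmProvider", /\bllmProvider\b/],
];

// Returns one hit per (file, line, pattern) match.
function auditLlmCallSites(files: Record<string, string>): AuditHit[] {
  const hits: AuditHit[] = [];
  for (const file of Object.keys(files)) {
    files[file].split("\n").forEach((text, index) => {
      for (const [pattern, regex] of LLM_CALL_SITE_PATTERNS) {
        if (regex.test(text)) hits.push({ file, line: index + 1, pattern });
      }
    });
  }
  return hits;
}
```

Each hit then needs a human decision: migrate it to the runtime port, or record why it is not an LLM call.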
## Config design
The plan should make `claude-code` a first-class config value, not a hidden
side-channel. Recommended shape:
The config should make `claude-code` a first-class backend:
```yaml
llm:
@ -156,28 +181,28 @@ llm:
candidateExtraction: sonnet
curator: sonnet
reconcile: sonnet
repair: sonnet
```
Implementation implications:
- Extend `KTX_LLM_BACKENDS` in `packages/context/src/project/config.ts` and
`KtxLlmBackend` in `packages/llm/src/types.ts`.
- Update setup, status, doctor, and local provider resolution so
`claude-code` does not fall through to `gateway`.
- For `claude-code`, do not construct a fake AI SDK `LanguageModel`. Construct
the Claude Code generation/runtime ports.
- Update setup, status, doctor, schema generation, examples, and docs so
`claude-code` is understood everywhere `anthropic` is understood.
- Update `createKtxLlmProvider` / `createModelFactory` so unsupported backend
values throw instead of falling through to gateway.
- Keep `llm.models` as the per-role binding source. The Claude Code runtime maps
each KTX role to the configured model string for the current call. The plan
must decide and test the accepted model aliases, for example `sonnet`,
`opus`, `haiku`, or full Claude model IDs supported by the SDK.
- If non-ingest code sees `backend: claude-code` before it has been ported, it
must fail fast with a clear unsupported-backend message. It must not silently
route to gateway.
each KTX role to the configured model string for the current call.
- Define accepted model aliases, such as `sonnet`, `opus`, and `haiku`, and full
model IDs supported by the pinned SDK version.
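A rough sketch of per-role model resolution with alias validation follows. It makes two assumptions that this spec does not settle: that a role without its own entry falls back to `llm.models.default`, and that a full model ID can be recognized by a `claude-` prefix. `KtxModelsConfig`, `MODEL_ALIASES`, and `resolveModelForRole` are hypothetical names.

```typescript
type KtxModelRole =
  | "default"
  | "triage"
  | "candidateExtraction"
  | "curator"
  | "reconcile"
  | "repair";

type KtxModelsConfig = Partial<Record<KtxModelRole, string>>;

// Aliases named in this spec; the accepted set must be confirmed against the pinned SDK.
const MODEL_ALIASES = new Set<string>(["sonnet", "opus", "haiku"]);

function resolveModelForRole(models: KtxModelsConfig, role: KtxModelRole): string {
  // Assumption: roles without an explicit entry use llm.models.default.
  const configured = models[role] ?? models.default;
  if (!configured) {
    throw new Error(`No model configured for role "${role}" and no llm.models.default set`);
  }
  // Assumption: anything starting with "claude-" is treated as a full model ID.
  if (MODEL_ALIASES.has(configured) || configured.startsWith("claude-")) {
    return configured;
  }
  throw new Error(`Model "${configured}" for role "${role}" is not accepted by the claude-code backend`);
}
```

Validating at config-load time, rather than at the first LLM call, would surface a bad alias during `ktx setup` instead of midway through an ingest run.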
## Claude Agent SDK runtime behavior
Every Agent SDK call must be isolated and deterministic enough for KTX ingest.
Use explicit options even when SDK defaults currently match the desired value.
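Every Agent SDK call must be isolated from the user's ambient Claude Code
environment: filesystem settings, skills, plugins, and built-in tools must not
influence KTX execution. Use explicit options even when SDK defaults currently
match the desired value.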
For agent loops with tools:
```ts
query({
@ -213,33 +238,43 @@ query({
});
```
For plain text generation:
- Use the same `query()` runtime with `maxTurns: 1`.
- Pass `settingSources: []`, `skills: []`, `plugins: []`, `tools: []`, and
`permissionMode: "dontAsk"`.
- Do not expose MCP tools unless the KTX call explicitly passed tools.
- Return the final result message text.
For structured object generation:
- Use the Agent SDK structured output option for JSON schema output.
- Convert KTX Zod schemas at the runtime boundary.
- Parse and validate the returned object with the original KTX schema before
returning it to the caller.
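The "parse and validate before returning" step might look like the sketch below. It assumes the runtime receives the structured result as JSON text, which must be checked against the pinned SDK. `SchemaLike` is a hypothetical minimal interface (a Zod schema's `parse` method has this shape), and `TriageDecision` with its hand-written schema is an invented example so the block runs without dependencies.

```typescript
// Minimal schema shape; Zod schemas expose a compatible parse(value) method.
interface SchemaLike<T> {
  parse(value: unknown): T;
}

// Validates backend output with the caller's original schema before returning it.
function parseStructuredResult<T>(rawText: string, schema: SchemaLike<T>): T {
  let candidate: unknown;
  try {
    candidate = JSON.parse(rawText);
  } catch (cause) {
    throw new Error(`Structured output was not valid JSON: ${String(cause)}`);
  }
  return schema.parse(candidate);
}

// Invented example type and hand-written validator, for illustration only.
interface TriageDecision {
  keep: boolean;
  reason: string;
}

const triageDecisionSchema: SchemaLike<TriageDecision> = {
  parse(value: unknown): TriageDecision {
    if (typeof value !== "object" || value === null) {
      throw new Error("expected an object");
    }
    const record = value as Record<string, unknown>;
    const keep = record["keep"];
    const reason = record["reason"];
    if (typeof keep !== "boolean" || typeof reason !== "string") {
      throw new Error("expected { keep: boolean, reason: string }");
    }
    return { keep, reason };
  },
};
```

Validating with the original schema keeps the caller's contract independent of how faithfully the JSON-schema conversion captured it.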
The plan must confirm the exact option names against the pinned SDK version, but
the required outcome is fixed:
- Filesystem settings are not loaded. `settingSources: []` is explicit, and the
implementation should assert from the SDK init message that no unexpected
settings-derived commands, skills, agents, or MCP servers are active.
settings-derived commands, skills, agents, plugins, or MCP servers are active.
- Skills are disabled with `skills: []`, and plugins are disabled with
`plugins: []`.
- Only KTX MCP tools are available and auto-approved. `allowedTools` alone is
not sufficient because the current SDK docs describe it as auto-approval, not
restriction. Use `tools`, `permissionMode: "dontAsk"`, and explicit
`disallowedTools` for built-ins.
- `allowedTools` alone is not sufficient because the current SDK docs describe
it as auto-approval, not restriction. Use `tools`, `permissionMode:
"dontAsk"`, and explicit `disallowedTools` for built-ins.
- Built-ins are denied even if a future SDK default changes.
- `cwd` is `project.projectDir`, resolved at startup via `resolveKtxProjectDir`,
not `process.cwd()`.
- Sessions are not persisted for ingest unless the plan identifies a concrete
debugging feature that needs persistence.
For non-agent text generation, use the same isolated runtime with no MCP tools,
`maxTurns: 1`, and no filesystem settings. For structured outputs, use the Agent
SDK's JSON-schema output format and convert KTX's Zod schemas at the boundary.
- Sessions are not persisted unless the plan identifies a concrete debugging
feature that needs persistence.
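The required outcome above could be captured in one options builder plus a guard on the reported tool list. This is a sketch, not the SDK's API: the option names are the ones this spec mentions and still need confirming against the pinned SDK version, the built-in tool names are illustrative, and `IsolatedQueryOptions`, `buildIsolatedQueryOptions`, and `assertOnlyKtxTools` are hypothetical helpers that only build and check plain objects.

```typescript
// Option names mirror this spec; confirm each against the pinned SDK version.
interface IsolatedQueryOptions {
  settingSources: string[];
  skills: string[];
  plugins: string[];
  tools: string[];
  allowedTools: string[];
  disallowedTools: string[];
  permissionMode: "dontAsk";
  maxTurns: number;
  cwd: string;
  persistSession: boolean;
}

// Illustrative built-in names to deny explicitly; the real list needs auditing.
const DENIED_BUILT_INS = ["Bash", "Read", "Write", "Edit", "WebFetch", "WebSearch"];

function buildIsolatedQueryOptions(input: {
  projectDir: string;
  ktxToolNames: string[];
  maxTurns: number;
}): IsolatedQueryOptions {
  return {
    settingSources: [], // no filesystem settings
    skills: [],
    plugins: [],
    tools: [], // no built-in tools
    allowedTools: input.ktxToolNames.map((name) => `mcp__ktx__${name}`),
    disallowedTools: [...DENIED_BUILT_INS],
    permissionMode: "dontAsk",
    maxTurns: input.maxTurns,
    cwd: input.projectDir, // the resolved project dir, not process.cwd()
    persistSession: false,
  };
}

// Guard for the tool list reported at session start (message shape is assumed).
function assertOnlyKtxTools(activeTools: string[]): void {
  const unexpected = activeTools.filter((name) => !name.startsWith("mcp__ktx__"));
  if (unexpected.length > 0) {
    throw new Error(`Unexpected tools active in claude-code session: ${unexpected.join(", ")}`);
  }
}
```

For plain text generation the same builder would be called with an empty `ktxToolNames` list and `maxTurns: 1`.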
## Tool boundary
The final `RunLoopParams.toolSet` cannot remain a raw AI SDK `Record<string,
Tool>` if two backends must consume it. The plan must define a backend-neutral
tool descriptor for the **final** tool map handed to the runner:
Agent-loop tools cannot remain only raw AI SDK `Record<string, Tool>` values if
two backends must consume them. The plan must define a backend-neutral tool
descriptor for the final tool map handed to an agent loop:
```ts
interface KtxRuntimeToolDescriptor<TInput, TOutput> {
@ -254,10 +289,8 @@ Every composed tool entry must preserve the descriptor, including:
- `BaseTool` outputs from factory toolsets.
- Source-specific raw tools such as `emit_historic_sql_evidence` in
`packages/context/src/ingest/local-bundle-runtime.ts:543-556`.
- Stage-local tools in `buildWuToolSet` and `buildReconcileToolSet`
(`packages/context/src/ingest/stages/build-wu-context.ts`,
`packages/context/src/ingest/stages/build-reconcile-context.ts`).
`packages/context/src/ingest/local-bundle-runtime.ts`.
- Stage-local tools in `buildWuToolSet` and `buildReconcileToolSet`.
- Inline `load_skill`, read/raw/span, stage/diff, eviction, and emit tools in
`packages/context/src/ingest/ingest-bundle.runner.ts`.
- Memory-agent `load_skill` in
@ -267,8 +300,8 @@ Every composed tool entry must preserve the descriptor, including:
The AI SDK adapter converts descriptors to `tool(...)`. The Claude Code adapter
converts descriptors to Agent SDK `tool(name, description, schema.shape,
handler)` entries inside `createSdkMcpServer(...)`. KTX tool handlers return
`{ markdown, structured }`; the Claude adapter returns the markdown as text
content and may include structured JSON in the text only if a caller needs it.
`{ markdown, structured }`; the Claude adapter returns markdown as text content
and may include structured JSON only if a caller needs it.
Non-object schemas are unsupported for `claude-code` and must be rejected at
startup with a clear error. In practice KTX tool inputs are already `z.object`.
@ -287,10 +320,14 @@ carry the terminal reason. They remain useful for lifecycle logging. Tool failur
counting should use `PostToolUseFailure` and feed the same mechanism that
`stage-3-work-units.ts` checks through `toolFailureCount?(wu.unitKey)`.
For text and object generation, SDK authentication, billing, rate-limit,
permission, max-turn, structured-output, and execution errors must map to the
same error surfaces that KTX uses for the Anthropic API-key backend.
## Auth and setup
`ktx setup` must validate that Claude Code SDK auth is usable, not just that
`~/.claude/` exists. Acceptable validation strategies:
`ktx setup`, status, and doctor flows must validate that Claude Code SDK auth is
usable, not just that `~/.claude/` exists. Acceptable validation strategies:
- A minimal SDK probe call with `settingSources: []`, `tools: []`, and
`maxTurns: 1`.
@ -299,17 +336,18 @@ counting should use `PostToolUseFailure` and feed the same mechanism that
state that it proves auth usability.
Failure copy should tell the user to authenticate Claude Code locally with the
Claude Code CLI, then rerun setup or ingest.
Claude Code CLI, then rerun setup or the command they attempted.
## Documentation impact
Docs updates are required because this changes user-visible setup and ingest
behavior:
Docs updates are required because this changes user-visible setup and LLM
provider behavior:
- `docs-site/content/docs/getting-started/quickstart.mdx`
- `docs-site/content/docs/cli-reference/ktx-setup.mdx`
- `docs-site/content/docs/guides/building-context.mdx`
- Any config reference page that documents `llm.provider.backend`
- Any status or doctor docs that describe LLM readiness
The docs must say that `claude-code` uses the user's own local Claude Code
session. Do not describe it as a way for KTX to resell, pool, or productize
@ -322,17 +360,24 @@ Claude subscription limits.
(`packages/llm/src/types.ts`, `packages/llm/src/model-provider.ts`).
- Project config currently accepts `llm.provider.backend: none | anthropic |
vertex | gateway` (`packages/context/src/project/config.ts`).
- `resolveAgentRunner(...)` currently requires an AI SDK `llmProvider`, and
page triage is only constructed when that provider exists
(`packages/context/src/ingest/local-bundle-runtime.ts`).
- Page triage and light extraction are non-agent LLM calls using
`llmProvider.getModel("triage")` and AI SDK `generateText`
- `generateKtxText` and `generateKtxObject` are shared non-agent generation
helpers (`packages/context/src/llm/generation.ts`).
- `AgentRunnerService` is the shared AI SDK agent-loop implementation
(`packages/context/src/agent/agent-runner.service.ts`).
- Page triage and light extraction currently use raw `KtxLlmProvider`
(`packages/context/src/ingest/page-triage/page-triage.service.ts`).
- The Agent SDK TypeScript reference documents `settingSources` defaulting to
no filesystem settings, `allowedTools` as auto-approval rather than
restriction, `permissionMode: "dontAsk"`, `tools`, `disallowedTools`,
`maxTurns`, `mcpServers`, `cwd`, `persistSession`, and SDK result/hook
message shapes.
- Scan/enrichment internals currently use `createLocalKtxLlmProviderFromConfig`,
`generateKtxText`, and `generateKtxObject`
(`packages/context/src/scan/local-scan.ts`,
`packages/context/src/scan/description-generation.ts`,
`packages/context/src/scan/relationship-llm-proposal.ts`).
- Local ingest and MCP local project ports inject `llmProvider` and
`agentRunner` today (`packages/context/src/ingest/local-bundle-runtime.ts`,
`packages/context/src/mcp/local-project-ports.ts`).
- The Agent SDK TypeScript reference documents `settingSources` defaulting to no
filesystem settings, `allowedTools` as auto-approval rather than restriction,
`permissionMode: "dontAsk"`, `tools`, `disallowedTools`, `maxTurns`,
`mcpServers`, `cwd`, `persistSession`, and SDK result/hook message shapes.
- The Agent SDK MCP docs show registering MCP servers in `query()` options and
using `allowedTools` for MCP tool access.
- The Agent SDK skills docs say discovered skills can be controlled with the
@ -343,13 +388,17 @@ Claude subscription limits.
1. Confirm exact TypeScript option names and result-message discriminants
against the pinned `@anthropic-ai/claude-agent-sdk` version.
2. Define the final `KtxGenerationPort` and `AgentRunnerPort` file locations and
package exports.
2. Define the final `KtxLlmRuntimePort` file location and package exports.
3. Define model alias validation for `sonnet`, `opus`, `haiku`, and full model
IDs.
4. Define the auth probe and make setup/status/doctor report actionable
messages.
5. Write tests that prove page triage is constructed and called under
`llm.provider.backend: claude-code`.
6. Write tests that prove a raw built-in Claude Code tool request is denied and
only `mcp__ktx__*` tools are available during ingest.
5. Run a repo-wide audit for all LLM call sites and migrate each one to the
runtime boundary.
6. Write tests proving `claude-code` works for text generation, structured
object generation, and agent-loop execution.
7. Write tests proving page triage, scan/enrichment internals, memory capture,
MCP-triggered local ingest, and normal local ingest all use the
`claude-code` runtime when configured.
8. Write tests proving a raw built-in Claude Code tool request is denied and
only `mcp__ktx__*` tools are available during KTX agent loops.