apunkt/ktx

mirror of https://github.com/Kaelio/ktx.git synced 2026-06-10 08:05:14 +02:00

Andrey Avtomonov b565e44a22

feat: add claude-code llm backend with runtime port (#115 )

* docs: revise claude-code ingest backend spec

* docs: keep claude-code spec focused on ingest

* docs: expand claude-code spec to full llm parity

* Refine claude-code backend spec after adversarial review iteration 1

* Refine claude-code backend spec after adversarial review iteration 2

* Refine claude-code backend spec after adversarial review iteration 3

* feat: recognize claude-code llm backend

* feat: add ktx llm runtime port

* feat: add claude-code llm runtime

* feat: route non-agent llm calls through runtime

* feat: run ingest agents through llm runtime

* feat: support claude-code setup and status

* test: verify claude-code backend runtime

* docs: add claude-code backend v1 runtime plan

* fix: close claude-code runtime isolation checks

* fix: warn on claude-code prompt caching during setup

* chore: verify claude-code v1 closure

* docs: add claude-code backend v1 isolation closure plan

* fix: update claude-code ingest setup guidance

* docs: add claude-code backend v1 ingest guidance closure plan

* docs: align claude-code isolation spec with sdk metadata

* test: cover claude-code host discovery metadata

* fix: tolerate claude-code host discovery metadata

* docs: clarify claude-code host discovery metadata

* docs: add claude-code auth-probe isolation fix plan

* chore: prepare kaelio ktx rc1 release

* chore: add semantic release workflow

* fix: unblock ci checks

* chore(release): 0.1.0-rc.1

* feat: add Claude Code model selection to setup

* fix: keep git maintenance attached in local repos

2026-05-16 12:06:34 +02:00

34 KiB

Raw Permalink Blame History

Brainstorm: `claude-code` backend with full KTX LLM parity

Adds a claude-code backend that gives KTX full parity with the existing ANTHROPIC_API_KEY-based anthropic backend for all KTX LLM calls. The backend uses @anthropic-ai/claude-agent-sdk and reuses the user's existing local Claude Code authentication. Users select it in ktx.yaml.

This is not an implementation plan. It is the revised design after expanding the requirement from "ktx ingest works with Claude Code" to "every KTX LLM call works with Claude Code." The follow-up implementation plan should be written separately.

Core decision

claude-code is a first-class global LLM backend. Any code path that currently works with llm.provider.backend: anthropic must work with llm.provider.backend: claude-code, unless it is not an LLM call at all.

This includes:

Agent loops implemented through AgentRunnerService.runLoop(...).
Text generation through generateKtxText(...).
Structured object generation through generateKtxObject(...).
Local ingest and MCP-triggered local ingest flows.
Page triage and light extraction.
Context-candidate curation and reconciliation.
Memory capture.
Scan/enrichment internals and relationship LLM proposals.
Future KTX LLM call sites that use the shared runtime boundary.

Commands that do not use LLMs do not need special Claude Code behavior. There must be no silent fallback from claude-code to gateway, Anthropic API-key execution, or deterministic output.

Goals

Let a KTX user run all KTX LLM-backed behavior through their existing local Claude Code session without provisioning ANTHROPIC_API_KEY, Vertex credentials, or an AI Gateway key.
Preserve the existing user-facing CLI and MCP behavior. claude-code changes how LLM calls execute, not which KTX workflows exist.
Preserve role-based model selection. llm.models.default, triage, candidateExtraction, curator, reconcile, and repair remain the source of model selection for every LLM call.
Preserve KTX's curated tool boundaries. Claude Code built-ins, filesystem-discovered MCP servers, hooks, skills, plugins, agents, and slash commands must not become invokable in KTX agent loops. The Agent SDK init message may still report host-discovered slash commands, skills, and agents; KTX treats that metadata as diagnostic only and restricts execution through tools: [], exact KTX MCP allowedTools, disallowedTools, and deny-by-default canUseTool.
Keep embeddings independent. Claude does not provide embeddings; users keep configuring ingest.embeddings and scan/enrichment embeddings as they do today.
Fail fast with a clear message if local Claude Code authentication is not usable.

Non-goals

Embedding parity. Embeddings remain separate from LLM execution.
Tool-call repair parity in the first pass. The AI SDK runner uses experimental_repairToolCall (packages/llm/src/repair.ts:35-88). The Claude Agent SDK has no transparent same-step repair hook. MVP behavior is next-turn self-correction from schema errors or a normal tool-failure count.
OTEL telemetry parity in the first pass. The AI SDK runner uses experimental_telemetry. The Agent SDK exposes hooks such as PostToolUseFailure and SessionEnd, but no drop-in OTEL switch. MVP ships without telemetry parity on this backend.
Productizing Claude subscription limits. Documentation must frame this as "use your own local Claude Code session," not as a third-party Claude Max or Claude.ai product feature.

Approaches considered

Recommended: global LLM runtime port

Introduce a backend-neutral KTX LLM runtime port for operations, not just model construction:

interface KtxLlmRuntimePort {
  generateText(input: KtxGenerateTextInput): Promise<string>;
  generateObject<T>(input: KtxGenerateObjectInput<T>): Promise<T>;
  runAgentLoop(params: RunLoopParams): Promise<RunLoopResult>;
}

The existing anthropic, vertex, and gateway backends implement the runtime through the AI SDK and existing KtxLlmProvider. The new claude-code backend implements the same runtime through @anthropic-ai/claude-agent-sdk.

This is the recommended approach because KTX call sites need operations: "generate text," "generate a structured object," and "run an agent loop." They do not inherently need direct access to an AI SDK LanguageModel. The Agent SDK is a session/agent API, not an AI SDK model factory, so the runtime port avoids pretending those APIs are the same.

Rejected: fake AI SDK `LanguageModel` for Claude Code

Trying to make Claude Code look like an AI SDK LanguageModel would be brittle. The Agent SDK owns session execution, permissions, MCP tools, structured output, and result messages. Those semantics do not map cleanly onto a normal getModel(...) return value.

Rejected: branch at every call site

Adding if backend === "claude-code" around each LLM call would work briefly but would duplicate prompt wrapping, structured output handling, debug logging, tool conversion, auth checks, and error mapping. It would also make future LLM call sites easy to miss.

Architecture

ktx.yaml
  llm.provider.backend: anthropic | vertex | gateway | claude-code
  llm.models.<role>: model alias or model ID

createLocalKtxLlmRuntimeFromConfig(project.config.llm)
  -> AiSdkKtxLlmRuntime
     - wraps existing KtxLlmProvider
     - generateText / Output.object / AgentRunnerService
  -> ClaudeCodeKtxLlmRuntime
     - uses @anthropic-ai/claude-agent-sdk query()
     - implements text, object, and agent-loop operations

All KTX LLM call sites
  -> KtxLlmRuntimePort

The runtime is selected at the same boundaries that currently construct an llmProvider or AgentRunnerService:

packages/context/src/llm/local-config.ts
packages/context/src/ingest/local-bundle-runtime.ts
packages/context/src/memory/local-memory.ts
packages/context/src/scan/local-scan.ts
packages/context/src/mcp/local-project-ports.ts
Any CLI setup/status/doctor code that validates LLM readiness

After the change, services should not need to know whether the configured backend is AI SDK based or Claude Code based. They call the runtime operation they need.

LLM call-site migration

The implementation plan must migrate every current KTX LLM call site to the runtime port:

packages/context/src/llm/generation.ts: generateKtxText and generateKtxObject become runtime-backed helpers or are folded into the runtime.
packages/context/src/agent/agent-runner.service.ts: the AI SDK agent loop becomes the AI SDK implementation of runAgentLoop.
packages/context/src/ingest/page-triage/page-triage.service.ts: page triage and light extraction depend on KtxLlmRuntimePort, not raw KtxLlmProvider.
packages/context/src/scan/description-generation.ts: AI descriptions use the runtime text-generation operation.
packages/context/src/scan/relationship-llm-proposal.ts: relationship proposals use the runtime object-generation operation.
packages/context/src/ingest/stages/stage-3-work-units.ts, packages/context/src/ingest/stages/stage-4-reconciliation.ts, packages/context/src/ingest/context-candidates/curator-pagination.service.ts, and packages/context/src/memory/memory-agent.service.ts: agent loops use the runtime agent-loop operation or a thin AgentRunnerPort backed by it.
Test helpers and MCP local project ports that inject llmProvider or agentRunner must either inject the runtime port or use compatibility test adapters during the migration.

The plan must include a grep-based audit so new or overlooked getModel(...), generateKtxText(...), generateKtxObject(...), AgentRunnerService, and llmProvider usages are either migrated or explicitly proven non-runtime.

Config design

The config should make claude-code a first-class backend:

llm:
  provider:
    backend: claude-code
  models:
    default: sonnet
    triage: haiku
    candidateExtraction: sonnet
    curator: sonnet
    reconcile: sonnet
    repair: sonnet

Implementation implications:

Extend KTX_LLM_BACKENDS in packages/context/src/project/config.ts and KtxLlmBackend in packages/llm/src/types.ts.
Update setup, status, doctor, schema generation, examples, and docs so claude-code is understood everywhere anthropic is understood.
Update createKtxLlmProvider / createModelFactory so unsupported backend values throw instead of falling through to gateway.
Keep llm.models as the per-role binding source. The Claude Code runtime maps each KTX role to the configured model string for the current call.
Define accepted model aliases, such as sonnet, opus, and haiku, and full model IDs supported by the pinned SDK version.

Claude Agent SDK runtime behavior

Every Agent SDK call must be isolated enough for KTX execution. Use explicit options even when SDK defaults currently match the desired value.

For agent loops with tools:

query({
  prompt,
  options: {
    cwd: project.projectDir,
    systemPrompt,
    model: resolveModel(modelRole),
    maxTurns: stepBudget,
    settingSources: [],
    skills: [],
    plugins: [],
    mcpServers: { ktx: createSdkMcpServer({ name: "ktx", tools }) },
    tools: [],
    allowedTools: [/* exact mcp__ktx__<toolName> ids generated from the tool map */],
    canUseTool: ktxCanUseTool,
    permissionMode: "dontAsk",
    persistSession: false,
    env: ktxClaudeCodeEnv
  }
});

ktxClaudeCodeEnv is the controlled environment described in "Agent SDK environment and auth boundary" below; it must be passed on every KTX query() call.

For plain text generation:

Use the same query() runtime with maxTurns: 1.
Pass settingSources: [], skills: [], plugins: [], tools: [], permissionMode: "dontAsk", persistSession: false, and env: ktxClaudeCodeEnv.
Do not expose MCP tools unless the KTX call explicitly passed tools.
Return the final result message text.

For structured object generation:

Use the same query() runtime with the Agent SDK structured output option for JSON schema output, plus the same isolation tuple including env: ktxClaudeCodeEnv.
Convert KTX Zod schemas at the runtime boundary.
Parse and validate the returned object with the original KTX schema before returning it to the caller.

The plan must confirm the exact option names against the pinned SDK version, but the required outcome is fixed:

Filesystem settings are not loaded. The SDK's documented default for an omitted settingSources is ["user", "project", "local"] (@anthropic-ai/claude-agent-sdk@0.3.142 sdk.d.ts:1686-1695), which would inherit the user's Claude Code filesystem settings. Every KTX query() call site - agent loops, text generation, object generation, and the auth probe - MUST pass settingSources: [] explicitly, along with skills: [], plugins: [], tools: [], persistSession: false, and no mcpServers entries other than the KTX MCP server (omitted entirely when the call site does not expose tools). The implementation MUST assert from the SDK init message that the controlled execution surface matches KTX's expectations:
- message.tools equals the exact generated KTX MCP tool ids for the current call.
- message.mcp_servers equals the expected KTX MCP server set: [] when the call exposes no tools, or ["ktx"] when it does.
- message.plugins is empty.
The implementation MUST NOT reject a run solely because message.slash_commands, message.skills, or message.agents contain host-discovered names. In @anthropic-ai/claude-agent-sdk@0.3.142, those fields can report host discovery even when KTX passes the isolation options. They are not part of the KTX execution surface when tools: [], allowedTools, disallowedTools, and deny-by-default canUseTool are set.
skills: [] is a context filter in the pinned SDK (sdk.d.ts:1697-1718): unlisted skills are hidden from the model's skill listing and rejected by the Skill tool, but discovered skill names may still appear in init metadata. KTX must still pass skills: [].
Plugins are disabled with plugins: [], and the runtime asserts that message.plugins is empty in the init message.
Built-in tools are disabled by setting tools: []. The pinned SDK type (@anthropic-ai/claude-agent-sdk@0.3.142, sdk.d.ts) documents tools as the base set of built-in tools, with [] meaning "disable all built-ins"; tools does not accept MCP tool ids and cannot be used to restrict MCP availability.
MCP tool availability is granted by registering the KTX MCP server through mcpServers. The SDK does not document a wildcard like mcp__ktx__* for any tool field; KTX must enumerate exact generated MCP tool ids of the form mcp__ktx__<toolName> (derived from the tool map handed to createSdkMcpServer) wherever a list of tool ids is required.
Pre-approval under permissionMode: "dontAsk" is configured by listing those same exact mcp__ktx__<toolName> ids in allowedTools (documented as auto-allow without prompting). Treat allowedTools as auto-approval, not restriction.
Defense-in-depth restriction uses canUseTool. The KTX runtime supplies a canUseTool handler that allows only tool names in the current KTX MCP tool map and denies everything else, so host-discovered slash commands, skills, agents, future SDK defaults, or a misconfigured MCP server cannot expand the execution surface.
disallowedTools MUST additionally list the current built-in tool names (Agent, Task, AskUserQuestion, Bash, Read, Edit, Write, Glob, Grep, WebFetch, WebSearch, TodoWrite) as redundant insurance.
cwd is project.projectDir, resolved at startup via resolveKtxProjectDir, not process.cwd().
Sessions are not persisted unless the plan identifies a concrete debugging feature that needs persistence.

Agent SDK environment and auth boundary

The Agent SDK's query() option env (@anthropic-ai/claude-agent-sdk@0.3.142 sdk.d.ts:1265-1279) is the environment passed to the Claude Code child process and defaults to process.env. Without an explicit env, the SDK inherits the parent's environment, including any ANTHROPIC_API_KEY, ANTHROPIC_AUTH_TOKEN, ANTHROPIC_BASE_URL, gateway/AI-Gateway tokens, GOOGLE_APPLICATION_CREDENTIALS / CLOUD_ML_REGION (Vertex), and AWS_* (Bedrock) credentials — any of which can switch the Claude Code CLI's authentication source to API-key or another provider, bypassing the user's local Claude Code session. That would silently violate the core requirement that claude-code runs through the user's existing local Claude Code session and that there is no silent fallback to gateway, Anthropic API-key, or other provider execution.

Every claude-code query() call site - agent loops, text generation, object generation, and the auth probe - MUST pass an explicit env (ktxClaudeCodeEnv) constructed from process.env with the following denylist removed:

ANTHROPIC_API_KEY
ANTHROPIC_AUTH_TOKEN
ANTHROPIC_BASE_URL
ANTHROPIC_MODEL (provider-routing override)
ANTHROPIC_VERTEX_PROJECT_ID, CLOUD_ML_REGION, GOOGLE_APPLICATION_CREDENTIALS, GOOGLE_CLOUD_PROJECT
AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, AWS_REGION, AWS_PROFILE
CLAUDE_CODE_USE_BEDROCK, CLAUDE_CODE_USE_VERTEX
Any future provider-routing variables the pinned SDK version documents

The denylist is the source of truth and lives next to the runtime constructor so adding a variable is a single-file change.

Acceptance criteria:

The constructed ktxClaudeCodeEnv does not contain any denylisted key, and this is verified by a unit test that seeds each denylisted key in a fake process.env.
The auth probe fails with the same "authenticate Claude Code locally" message even when ANTHROPIC_API_KEY (or any other denylisted credential) is present in process.env and no valid local Claude Code session exists.
Every KTX-originated query() invocation is spied to assert that env was passed and that it does not contain any denylisted key; the test fails if any code path falls back to the SDK default process.env.
The "no silent fallback" rule is preserved end-to-end: a machine with ANTHROPIC_API_KEY set but no local Claude Code authentication still fails setup/status/doctor on claude-code.

Tool boundary

Agent-loop tools cannot remain only raw AI SDK Record<string, Tool> values if two backends must consume them. The plan must define a backend-neutral tool descriptor for the final tool map handed to an agent loop:

interface KtxRuntimeToolDescriptor<TInput, TOutput> {
  name: string;
  description: string;
  inputSchema: z.ZodObject<z.ZodRawShape>;
  execute(input: TInput): Promise<KtxRuntimeToolOutput<TOutput>>;
}

interface KtxRuntimeToolOutput<TOutput> {
  // What the model sees as the tool_result content. Always a markdown string;
  // never a raw JS object. This matches BaseTool's existing
  // `toModelOutput` contract (`packages/context/src/tools/base-tool.ts:154-162`)
  // which sends only markdown to the LLM.
  markdown: string;
  // Out-of-band payload preserved for tool callers (transcripts, debug,
  // verification ledger, downstream KTX consumers). Not sent to the model.
  structured?: TOutput;
}

Every composed tool entry must produce this descriptor shape, including:

BaseTool outputs from factory toolsets, which already return { markdown, structured }.
Source-specific raw tools such as emit_historic_sql_evidence in packages/context/src/ingest/local-bundle-runtime.ts.
Stage-local tools in buildWuToolSet and buildReconcileToolSet.
Inline load_skill, read/raw/span, stage/diff, eviction, and emit tools in packages/context/src/ingest/ingest-bundle.runner.ts.
Memory-agent load_skill in packages/context/src/memory/memory-agent.service.ts.
The withVerificationLedger wrapping layer, whose markdown/structured guard outputs (packages/context/src/ingest/tools/verification-ledger.tool.ts:40-97) already match the contract.

Tool output contract

The runtime defines a single output contract for both backends so the model sees the same content regardless of provider:

Model-visible content: the markdown field, mapped to the Agent SDK tool handler return as { content: [{ type: "text", text: markdown }] } for claude-code, and surfaced through the existing toModelOutput markdown path for AI SDK backends. The model never sees raw JS objects.
Structured payload: the optional structured field, preserved on the in-process tool-result envelope for transcript/debug capture, the verification ledger, and any KTX caller that introspects results. The Claude adapter does not put structured JSON into model-visible content unless an individual call site explicitly opts in.
Normalization of existing raw tools: tools that today return a bare string (e.g. load_skill "Skill not available" responses in packages/context/src/ingest/ingest-bundle.runner.ts:697-721 and :924-936, and packages/context/src/memory/memory-agent.service.ts:128-152) must be wrapped at the descriptor boundary so markdown is the string and structured is omitted. Tools that today return a plain object (e.g. skill payload { name, content, skillDirectory }) must be wrapped so markdown is a deterministic human-readable rendering (e.g. the skill body with a header) and the original object is preserved on structured. No KTX tool may return a raw object as the model-visible payload on the Claude Code backend, because the Agent SDK MCP handler will otherwise stringify it and drop the structured fields.
AI SDK parity: the AI SDK adapter MUST preserve BaseTool's existing toModelOutput markdown-only behavior. Migrating BaseTool-derived tools to the descriptor must not start sending structured JSON to the model.

The AI SDK adapter converts descriptors to tool(...) with a toModelOutput that emits markdown only. The Claude Code adapter converts descriptors to Agent SDK tool(name, description, schema.shape, handler) entries inside createSdkMcpServer(...) and returns { content: [{ type: "text", text: markdown }] }.

Non-object schemas are unsupported for claude-code and must be rejected at startup with a clear error. In practice KTX tool inputs are already z.object.

Stop reasons and failures

The Claude runner maps the SDK's typed SDKResultMessage (union of SDKResultSuccess and SDKResultError in @anthropic-ai/claude-agent-sdk@0.3.142, sdk.d.ts) to RunLoopStopReason = "budget" | "natural" | "error". The mapping must consider three typed signals in this precedence order, because each successive signal may be present where the previous one is absent:

subtype: "error_max_turns" -> "budget"; "success" -> "natural"; other error subtypes ("error_during_execution", "error_max_budget_usd", "error_max_structured_output_retries") -> "error".
terminal_reason (optional TerminalReason field on both success and error results): "max_turns" -> "budget"; "completed" -> "natural"; any other terminal reason such as "blocking_limit", "rapid_refill_breaker", "prompt_too_long", "image_error", "model_error", "aborted_streaming", "aborted_tools", "stop_hook_prevented", "hook_stopped", or "tool_deferred" -> "error".
The assistant message stop_reason: "max_turns" -> "budget"; any other non-null unsuccessful stop reason -> "error".

A max_turns signal arriving through any of the three sources must map to "budget"; the runner MUST NOT classify a max-turn termination as "natural" or as a generic "error" because it was reported via terminal_reason instead of subtype.

Stop hooks are not the authoritative stop-reason source because they do not carry the terminal reason. They remain useful for lifecycle logging. Tool failure counting should use PostToolUseFailure and feed the same mechanism that stage-3-work-units.ts checks through toolFailureCount?(wu.unitKey).

For text and object generation, SDK authentication, billing, rate-limit, permission, max-turn, structured-output, and execution errors must map to the same error surfaces that KTX uses for the Anthropic API-key backend.

Agent-loop progress callbacks

RunLoopParams.onStepFinish (packages/context/src/agent/agent-runner.service.ts:20) is part of the current agent-loop contract. The AI SDK runner increments stepIndex on each generateText step and invokes the callback (agent-runner.service.ts:83-97). KTX consumers depend on this: packages/context/src/ingest/ingest-bundle.runner.ts:782 emits work_unit_step events from it, and :1036 / :1089 update reconciliation progress for the user-visible "Reconciling results · step N" status.

The claude-code runner MUST preserve onStepFinish semantics:

It MUST invoke onStepFinish exactly once per assistant turn (i.e. once per step the SDK reports), incrementing stepIndex starting at 1.
The plan MUST name the concrete SDK stream event used as the step boundary (the implementation plan picks one of the documented assistant/result message events from the pinned SDK version and justifies it). The chosen event must produce the same stepIndex count as the AI SDK runner for an equivalent run: N tool-using turns yield N callbacks.
Callback errors MUST be caught and logged at warn level without aborting the loop, matching agent-runner.service.ts:90-96.
stepBudget passed to the callback MUST equal the maxTurns configured on the SDK query() call.

Acceptance criteria:

A claude-code agent loop run with stepBudget: N produces N work_unit_step events when the loop runs to budget.
A reconciliation run under claude-code produces the same updateProgress calls (count and stepIndex / stepBudget ratio) as the Anthropic API-key backend for an equivalent fixture.
An onStepFinish callback that throws does not surface the error as the loop result.

Prompt caching parity

packages/llm/src/types.ts:44, :61 exposes llm.promptCaching as a config field, and the AI SDK message builder (packages/llm/src/message-builder.ts:62-114, :141-218) applies anthropic.cacheControl: { type: "ephemeral", ttl } markers to the system message, the last history message, and sorted tools, with TTLs split into systemTtl, toolsTtl, and historyTtl. model-provider.test.ts:276 verifies caching is enabled by default with those three TTLs.

The Agent SDK does not expose KTX's marker-based contract. The closest mechanism is systemPrompt: string[] with SYSTEM_PROMPT_DYNAMIC_BOUNDARY (sdk.d.ts:1746-1799), which marks a static prefix as cacheable but provides no per-tool, per-history, or per-TTL knobs.

For the claude-code backend, the spec treats llm.promptCaching as partial parity:

The Claude runtime MAY map a non-empty static system prefix to a cacheable systemPrompt array using SYSTEM_PROMPT_DYNAMIC_BOUNDARY when cacheSystem is enabled in the resolved KtxPromptCachingConfig. The implementation plan decides whether to ship this mapping in the first pass or defer it.
cacheTools, cacheHistory, and the systemTtl / toolsTtl / historyTtl fields have no Agent SDK equivalent. The runtime MUST NOT silently drop them: when a user sets non-default values under llm.promptCaching and the backend is claude-code, status/doctor and the setup wizard MUST surface that these fields are ignored on this backend.
Docs under docs-site/content/docs/ MUST document this divergence in the same pages that describe claude-code setup, so users do not assume the TTL/tool/history knobs apply.

Acceptance criteria:

A claude-code runtime constructed from a config with default promptCaching does not throw and does not pass KTX cacheControl markers to the Agent SDK (the AI-SDK-only markers stay on the AI SDK path).
A claude-code runtime constructed from a config with non-default promptCaching values yields a warning surfaced through doctor/status output identifying the ignored fields.

Auth and setup

ktx setup, status, and doctor flows must validate that Claude Code SDK auth is usable, not just that ~/.claude/ exists. Acceptable validation strategies:

A minimal SDK probe call with settingSources: [], skills: [], plugins: [], tools: [], persistSession: false, no mcpServers, env: ktxClaudeCodeEnv, and maxTurns: 1. The probe MUST NOT rely on the SDK's documented default for any of these fields, because the default for settingSources is ["user", "project", "local"] (loads filesystem settings) and the default for env is process.env (can route auth through ANTHROPIC_API_KEY or other provider credentials and hide a missing local Claude Code session). See "Agent SDK environment and auth boundary" above for the env denylist. The auth probe MUST tolerate init messages with non-empty slash_commands, skills, and agents when message.tools is empty, message.mcp_servers is empty, message.plugins is empty, and the query options contain the KTX isolation tuple. Host discovery metadata is not an auth failure.
An SDK-provided account/auth status method if the pinned version exposes one.
A docs-endorsed file-presence check only if the official SDK docs explicitly state that it proves auth usability.

Failure copy should tell the user to authenticate Claude Code locally with the Claude Code CLI, then rerun setup or the command they attempted.

Documentation impact

Docs updates are required because this changes user-visible setup and LLM provider behavior:

docs-site/content/docs/getting-started/quickstart.mdx
docs-site/content/docs/cli-reference/ktx-setup.mdx
docs-site/content/docs/guides/building-context.mdx
Any config reference page that documents llm.provider.backend
Any status or doctor docs that describe LLM readiness

The docs must say that claude-code uses the user's own local Claude Code session. Do not describe it as a way for KTX to resell, pool, or productize Claude subscription limits.

Verified evidence

Current KtxLlmProvider returns AI SDK LanguageModel instances and only supports anthropic, vertex, and gateway (packages/llm/src/types.ts, packages/llm/src/model-provider.ts).
Project config currently accepts llm.provider.backend: none | anthropic | vertex | gateway (packages/context/src/project/config.ts).
generateKtxText and generateKtxObject are shared non-agent generation helpers (packages/context/src/llm/generation.ts).
AgentRunnerService is the shared AI SDK agent-loop implementation (packages/context/src/agent/agent-runner.service.ts).
Page triage and light extraction currently use raw KtxLlmProvider (packages/context/src/ingest/page-triage/page-triage.service.ts).
Scan/enrichment internals currently use createLocalKtxLlmProviderFromConfig, generateKtxText, and generateKtxObject (packages/context/src/scan/local-scan.ts, packages/context/src/scan/description-generation.ts, packages/context/src/scan/relationship-llm-proposal.ts).
Local ingest and MCP local project ports inject llmProvider and agentRunner today (packages/context/src/ingest/local-bundle-runtime.ts, packages/context/src/mcp/local-project-ports.ts).
The Agent SDK TypeScript reference (@anthropic-ai/claude-agent-sdk@0.3.142, sdk.d.ts:1690-1697 and the sdk.mjs runtime default ["user","project","local"]) documents settingSources defaulting to loading user, project, and local filesystem settings when omitted; passing [] is the explicit opt-out ("SDK isolation mode"). The same reference documents allowedTools as auto-approval rather than restriction, canUseTool as the programmatic permission handler, permissionMode: "dontAsk", tools as the base built-in set with [] meaning "disable all built-ins" and no MCP-id support, disallowedTools, maxTurns, mcpServers, cwd, persistSession, and SDK result/hook message shapes.
SDKResultMessage = SDKResultSuccess | SDKResultError in @anthropic-ai/claude-agent-sdk@0.3.142 (sdk.d.ts); both variants expose an optional terminal_reason: TerminalReason, where TerminalReason includes 'max_turns' | 'completed' alongside other terminal reasons.
The Agent SDK MCP docs and SDK examples (e.g. Context7 /nothflare/claude-agent-sdk-docs custom-tools guide) show registering MCP servers in query() options and listing exact mcp__<server>__<tool> ids in allowedTools; no SDK doc or type currently documents a wildcard form.
BaseTool's toModelOutput already sends only markdown to the model while preserving structured output for callers (packages/context/src/tools/base-tool.ts:154-162); some raw AI SDK tools in packages/context/src/ingest/ingest-bundle.runner.ts:697-721, :924-936 and packages/context/src/memory/memory-agent.service.ts:128-152 currently return bare strings or plain objects and must be normalized at the descriptor boundary so both backends preserve the contract.
The Agent SDK skills docs say the skills option is a context filter rather than a sandbox. KTX must pass skills: [], but must not assert that message.skills is empty in the SDK init message.
Options.env in @anthropic-ai/claude-agent-sdk@0.3.142 (sdk.d.ts:1265-1279) is the environment passed to the Claude Code process and defaults to process.env. Without an explicit env, the SDK inherits the parent environment, including any provider-routing variables (ANTHROPIC_API_KEY, Vertex/Bedrock credentials, gateway tokens) that could change the active authentication source of the Claude Code CLI and hide a missing local Claude Code session.

Open items for the implementation plan

Confirm exact TypeScript option names and result-message discriminants against the pinned @anthropic-ai/claude-agent-sdk version.
Define the final KtxLlmRuntimePort file location and package exports.
Define model alias validation for sonnet, opus, haiku, and full model IDs.
Define the auth probe and make setup/status/doctor report actionable messages.
Run a repo-wide audit for all LLM call sites and migrate each one to the runtime boundary.
Write tests proving claude-code works for text generation, structured object generation, and agent-loop execution.
Write tests proving page triage, scan/enrichment internals, memory capture, MCP-triggered local ingest, and normal local ingest all use the claude-code runtime when configured.
Write tests proving a raw built-in Claude Code tool request is denied, host-discovered Skill/Agent/SlashCommand requests are denied by canUseTool, and only exact mcp__ktx__* tools are allowed during KTX agent loops.
Write a test that asserts every KTX-originated query() invocation (agent loop, text generation, object generation, auth probe) is called with settingSources: [], skills: [], plugins: [], tools: [], and persistSession: false, by spying on the SDK entry point. The test must fail if any path falls back to SDK defaults for those fields. The test must also prove that non-empty host-discovered slash_commands, skills, and agents in the init message do not fail the auth probe or runtime when the controlled tool, MCP server, and plugin surfaces match KTX expectations.
Write a test that asserts onStepFinish is invoked the expected number of times for a fixed-budget claude-code agent loop, including the work-unit and reconciliation progress paths.
Write a test that asserts every KTX-originated query() invocation (agent loop, text generation, object generation, auth probe) is called with an explicit env and that none of the denylisted provider-routing variables (ANTHROPIC_API_KEY, ANTHROPIC_AUTH_TOKEN, ANTHROPIC_BASE_URL, ANTHROPIC_MODEL, ANTHROPIC_VERTEX_PROJECT_ID, CLOUD_ML_REGION, GOOGLE_APPLICATION_CREDENTIALS, GOOGLE_CLOUD_PROJECT, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, AWS_REGION, AWS_PROFILE, CLAUDE_CODE_USE_BEDROCK, CLAUDE_CODE_USE_VERTEX) are present in that env, by seeding each variable in a fake process.env. The test must also assert that the auth probe still fails when ANTHROPIC_API_KEY is set in process.env but no local Claude Code session exists.

34 KiB Raw Permalink Blame History

Brainstorm: claude-code backend with full KTX LLM parity

Core decision

Goals

Non-goals

Approaches considered

Recommended: global LLM runtime port

Rejected: fake AI SDK LanguageModel for Claude Code

Rejected: branch at every call site

Architecture

LLM call-site migration

Config design

Claude Agent SDK runtime behavior

Agent SDK environment and auth boundary

Tool boundary

Tool output contract

Stop reasons and failures

Agent-loop progress callbacks

Prompt caching parity

Auth and setup

Documentation impact

Verified evidence

Open items for the implementation plan

34 KiB

Raw Permalink Blame History

Brainstorm: `claude-code` backend with full KTX LLM parity

Rejected: fake AI SDK `LanguageModel` for Claude Code