chore(workspace): gate dead-code with knip production mode (#196)
* refactor(workspace): relocate @ktx/llm source into packages/cli/src/llm
* refactor(workspace): rewrite @ktx/llm imports to relative paths
* refactor(workspace): fold internal packages into cli
* chore(workspace): gate dead-code with knip production mode
Turn on production-mode knip plus an autofix run in pre-commit and the
`pnpm dead-code` script, document the `/** @internal */` convention for
test-only exports in AGENTS.md, annotate test-only exports across the
CLI with that JSDoc, and drop dead exports/wrappers the new gate
surfaced (e.g. `cli-project.ts`, `lookerRuntimeSourceToFileAdapterSource`,
`createLocalScanEnrichmentProvidersFromConfig`,
`PGLITE_OWNER_PROCESS_BACKEND_CAPABILITIES`, stale type re-exports).
Replace the loose `ignoreIssues` allowlist in `knip.json` with explicit
production entries so cross-package barrel leaks are caught.
* refactor(cli): delete internal barrel index.ts files
The 34 `index.ts` re-export barrels inside `packages/cli/src/` were
holdovers from the pre-fold multi-workspace structure. Post-fold-in they
served no production purpose: external consumers go through the single
package main entry, and in-repo callers mostly imported through them
only because the path was short. Internally, knip flagged most barrel
re-exports as production-dead (only reached via tests).
This change:
- Deletes every internal barrel except `packages/cli/src/index.ts`
(the published package entry).
- Rewrites ~270 source/test files to import each name directly from
the file that defines it.
- Moves `tools/warehouse-verification/index.ts` to
`create-warehouse-verification-tools.ts` (the function it defined
locally) and updates its single consumer.
- Renames `search/backend-conformance.ts` → `.test-utils.ts` to match
the existing test-helper file convention.
- Deletes 13 dead test-only chains (dbt-descriptions/*,
live-database/extracted-schema, live-database/structural-sync,
relationship-* feedback/review chain) plus their tests and a
cascading orphan integration test.
- Updates test mocks that pointed at deleted barrel paths
(notion-client, connector barrels in scan/local-scan-connectors
tests) to mock the source files instead.
- Points the maintainer benchmark script
(`scripts/relationship-benchmark-report.mjs`) at source files
instead of `dist/context/scan/index.js`.
- Drops the barrel `!` entries from `knip.json`; adds explicit
production entries only for the benchmark code reached via dist by
the maintainer script.
Net: 413 files changed, ~1.2k insertions, ~9.4k deletions.
`pnpm run dead-code` (Biome + knip default + knip production) and
`pnpm run type-check` are clean; 2277 tests pass.
* refactor(workspace): rename @ktx/cli to @kaelio/ktx and pack it directly
Promote the CLI workspace package to the public name `@kaelio/ktx` and
drop the separate `scripts/build-public-npm-package.mjs` wrapper. The
CLI package is now publishable in place (`publishConfig.access: public`,
`provenance: true`), so artifact packing uses `pnpm pack` against
`packages/cli/` instead of assembling a parallel package tree.
Updates all workspace filter invocations, docs, tests, and release
readiness checks to reference the new package name, and folds the
tarball-name helper into `scripts/public-npm-release-metadata.mjs`.
* docs: align "agent clients" and "data agents" terminology
Replace "client agents" with "agent clients" and "database agents" with
"data agents" across AGENTS.md, README.md, the docs-site copy, and the
matching setup-agents test description, matching the canonical
vocabulary in docs/terminology.md.
Also moves packages/cli/tsconfig.json's tsBuildInfoFile from
node_modules/.cache/ to dist/.tsbuildinfo so incremental builds survive
node_modules reinstalls.
* refactor(release): single source of truth for package version
Make packages/cli/package.json the single source of truth for the
@kaelio/ktx version. publicNpmPackageVersion() now reads it directly,
so artifact filenames, release-readiness checks, and the Python wheel
version all derive from one field. The duplicate
release-policy.json.publicNpmPackageVersion is removed.
Previously the two fields could drift: tarballs were named
kaelio-ktx-0.4.1.tgz while internally containing
@kaelio/ktx@0.0.0-private.
- update-public-release-version.mjs rewrites both Python pyproject.toml
files (ktx-daemon, ktx-sl) alongside the npm package.jsons,
normalizing the version for PEP 440 (e.g. 0.1.0-rc.2 -> 0.1.0rc2).
- semantic-release-config.cjs adds the two pyproject.toml files to
@semantic-release/git assets so the release commit back to main
carries every version source in lockstep.
- The six "?? '0.0.0-private'" fallback literals across the CLI are
replaced with "?? getKtxCliPackageInfo().version", and
createDefaultKtxMcpServer makes its version arg required.
- docs/release.md describes the actual commit-back model: the dev tree
always reflects the most recent release; no sentinel pin to
maintain.
Verified: pnpm run artifacts:build now produces
kaelio-ktx-0.4.1.tgz and kaelio_ktx-0.4.1-py3-none-any.whl with
@kaelio/ktx@0.4.1 inside. Full type-check, dead-code, and
2287 vitests + 173 script tests pass.
* refactor(cli): inject embedding provider resolution and detect sentence-transformers runtime
Make resolveProjectEmbeddingProvider and runtimeIo injectable in ingest and
scan command entrypoints so tests can stub them, and teach
resolvePublicIngestRuntimeRequirements to flag the local-embeddings runtime
feature when ktx.yaml selects sentence-transformers.
* chore(cli): mark buildLocalStatsStatus and LocalStatsStatus as @internal
Both symbols are consumed only by status-project.test.ts. Annotating with
/** @internal */ keeps knip's production-mode check clean without changing
runtime behavior.
* fix(cli): use real package metadata in print-command-tree
The stubbed package name embedded a forbidden product identifier that
tripped the boundary check in CI. Read the metadata from package.json
instead — keeps the rendered tree unchanged and removes a duplicate
source of truth.
* feat(cli): show embedding coverage in `ktx status`, drop duplicate disk counts
Inline `(N embedded)` next to the Wiki scope counts and Semantic-layer
source counts, computed with `SUM(embedding_json IS NOT NULL)` over
`knowledge_pages` and `local_sl_sources`. Rename the "Knowledge" label to
"Wiki" (canonical per `docs/terminology.md`) and rename the matching
`localStats.knowledgePages` field to `localStats.wikiPages`.
Drop `wiki=N md` and `semantic-layer=N yaml` from the Disk row — those
duplicated the per-surface rows above. Disk now reports only actual byte
usage (db, cache, raw-sources). The unused `wikiGlobalMarkdownCount` /
`semanticLayerYamlCount` fields, the `isMarkdownEntry` / `isYamlEntry`
helpers, and the `filter` arg on `summarizeDir` are removed.
2026-05-21 15:28:58 +02:00
|
|
|
import { KtxMessageBuilder, splitKtxSystemMessages } from '../../llm/message-builder.js';
|
|
|
|
|
import type { KtxLlmProvider } from '../../llm/types.js';
|
2026-05-16 12:06:34 +02:00
|
|
|
import { generateText, Output, stepCountIs, type FlexibleSchema, type TelemetrySettings, type ToolSet } from 'ai';
|
|
|
|
|
import type { z } from 'zod';
|
chore(workspace): gate dead-code with knip production mode (#196)
* refactor(workspace): relocate @ktx/llm source into packages/cli/src/llm
* refactor(workspace): rewrite @ktx/llm imports to relative paths
* refactor(workspace): fold internal packages into cli
* chore(workspace): gate dead-code with knip production mode
Turn on production-mode knip plus an autofix run in pre-commit and the
`pnpm dead-code` script, document the `/** @internal */` convention for
test-only exports in AGENTS.md, annotate test-only exports across the
CLI with that JSDoc, and drop dead exports/wrappers the new gate
surfaced (e.g. `cli-project.ts`, `lookerRuntimeSourceToFileAdapterSource`,
`createLocalScanEnrichmentProvidersFromConfig`,
`PGLITE_OWNER_PROCESS_BACKEND_CAPABILITIES`, stale type re-exports).
Replace the loose `ignoreIssues` allowlist in `knip.json` with explicit
production entries so cross-package barrel leaks are caught.
* refactor(cli): delete internal barrel index.ts files
The 34 `index.ts` re-export barrels inside `packages/cli/src/` were
holdovers from the pre-fold multi-workspace structure. Post-fold-in they
served no production purpose: external consumers go through the single
package main entry, and in-repo callers mostly imported through them
only because the path was short. Internally, knip flagged most barrel
re-exports as production-dead (only reached via tests).
This change:
- Deletes every internal barrel except `packages/cli/src/index.ts`
(the published package entry).
- Rewrites ~270 source/test files to import each name directly from
the file that defines it.
- Moves `tools/warehouse-verification/index.ts` to
`create-warehouse-verification-tools.ts` (the function it defined
locally) and updates its single consumer.
- Renames `search/backend-conformance.ts` → `.test-utils.ts` to match
the existing test-helper file convention.
- Deletes 13 dead test-only chains (dbt-descriptions/*,
live-database/extracted-schema, live-database/structural-sync,
relationship-* feedback/review chain) plus their tests and a
cascading orphan integration test.
- Updates test mocks that pointed at deleted barrel paths
(notion-client, connector barrels in scan/local-scan-connectors
tests) to mock the source files instead.
- Points the maintainer benchmark script
(`scripts/relationship-benchmark-report.mjs`) at source files
instead of `dist/context/scan/index.js`.
- Drops the barrel `!` entries from `knip.json`; adds explicit
production entries only for the benchmark code reached via dist by
the maintainer script.
Net: 413 files changed, ~1.2k insertions, ~9.4k deletions.
`pnpm run dead-code` (Biome + knip default + knip production) and
`pnpm run type-check` are clean; 2277 tests pass.
* refactor(workspace): rename @ktx/cli to @kaelio/ktx and pack it directly
Promote the CLI workspace package to the public name `@kaelio/ktx` and
drop the separate `scripts/build-public-npm-package.mjs` wrapper. The
CLI package is now publishable in place (`publishConfig.access: public`,
`provenance: true`), so artifact packing uses `pnpm pack` against
`packages/cli/` instead of assembling a parallel package tree.
Updates all workspace filter invocations, docs, tests, and release
readiness checks to reference the new package name, and folds the
tarball-name helper into `scripts/public-npm-release-metadata.mjs`.
* docs: align "agent clients" and "data agents" terminology
Replace "client agents" with "agent clients" and "database agents" with
"data agents" across AGENTS.md, README.md, the docs-site copy, and the
matching setup-agents test description, matching the canonical
vocabulary in docs/terminology.md.
Also moves packages/cli/tsconfig.json's tsBuildInfoFile from
node_modules/.cache/ to dist/.tsbuildinfo so incremental builds survive
node_modules reinstalls.
* refactor(release): single source of truth for package version
Make packages/cli/package.json the single source of truth for the
@kaelio/ktx version. publicNpmPackageVersion() now reads it directly,
so artifact filenames, release-readiness checks, and the Python wheel
version all derive from one field. The duplicate
release-policy.json.publicNpmPackageVersion is removed.
Previously the two fields could drift: tarballs were named
kaelio-ktx-0.4.1.tgz while internally containing
@kaelio/ktx@0.0.0-private.
- update-public-release-version.mjs rewrites both Python pyproject.toml
files (ktx-daemon, ktx-sl) alongside the npm package.jsons,
normalizing the version for PEP 440 (e.g. 0.1.0-rc.2 -> 0.1.0rc2).
- semantic-release-config.cjs adds the two pyproject.toml files to
@semantic-release/git assets so the release commit back to main
carries every version source in lockstep.
- The six "?? '0.0.0-private'" fallback literals across the CLI are
replaced with "?? getKtxCliPackageInfo().version", and
createDefaultKtxMcpServer makes its version arg required.
- docs/release.md describes the actual commit-back model: the dev tree
always reflects the most recent release; no sentinel pin to
maintain.
Verified: pnpm run artifacts:build now produces
kaelio-ktx-0.4.1.tgz and kaelio_ktx-0.4.1-py3-none-any.whl with
@kaelio/ktx@0.4.1 inside. Full type-check, dead-code, and
2287 vitests + 173 script tests pass.
* refactor(cli): inject embedding provider resolution and detect sentence-transformers runtime
Make resolveProjectEmbeddingProvider and runtimeIo injectable in ingest and
scan command entrypoints so tests can stub them, and teach
resolvePublicIngestRuntimeRequirements to flag the local-embeddings runtime
feature when ktx.yaml selects sentence-transformers.
* chore(cli): mark buildLocalStatsStatus and LocalStatsStatus as @internal
Both symbols are consumed only by status-project.test.ts. Annotating with
/** @internal */ keeps knip's production-mode check clean without changing
runtime behavior.
* fix(cli): use real package metadata in print-command-tree
The stubbed package name embedded a forbidden product identifier that
tripped the boundary check in CI. Read the metadata from package.json
instead — keeps the rendered tree unchanged and removes a duplicate
source of truth.
* feat(cli): show embedding coverage in `ktx status`, drop duplicate disk counts
Inline `(N embedded)` next to the Wiki scope counts and Semantic-layer
source counts, computed with `SUM(embedding_json IS NOT NULL)` over
`knowledge_pages` and `local_sl_sources`. Rename the "Knowledge" label to
"Wiki" (canonical per `docs/terminology.md`) and rename the matching
`localStats.knowledgePages` field to `localStats.wikiPages`.
Drop `wiki=N md` and `semantic-layer=N yaml` from the Disk row — those
duplicated the per-surface rows above. Disk now reports only actual byte
usage (db, cache, raw-sources). The unused `wikiGlobalMarkdownCount` /
`semanticLayerYamlCount` fields, the `isMarkdownEntry` / `isYamlEntry`
helpers, and the `filter` arg on `summarizeDir` are removed.
2026-05-21 15:28:58 +02:00
|
|
|
import { noopLogger, type KtxLogger } from '../../context/core/config.js';
|
feat(cli): add ingest LLM rate-limit governor with paced retries (#261)
* feat(cli): add ingest rate limit governor
* feat(cli): wire ingest rate-limit config
* feat(cli): report provider rate-limit signals
* feat(cli): show ingest rate-limit waits
* fix(cli): complete rate-limit event coverage
* fix(cli): abort ingest provider calls cleanly
* fix(cli): propagate ingest cancellation
* fix(cli): reject pre-aborted ingest rate-limit waits
* fix(cli): honor Claude rate-limit reset waits
* fix(cli): retry thrown Codex rate-limit failures
* fix(cli): type Claude rate-limit result details
* fix(cli): emit ingest rate-limit countdowns from rejected signals
* fix(cli): report ai sdk rate-limit header utilization
* fix(cli): gate LLM rate-limit retries on the governor budget
The AI SDK and Codex runtimes retried 429 / opaque rate-limit failures up
to 6-7 times with no backoff when constructed without a RateLimitGovernor
(scan, memory, setup) or with pacing disabled, ignoring Retry-After and
worsening the limit. The outer retry loop only cooperates with the
governor's pause, so without active pacing there is no backoff to apply.
Route the retry bound through a single source: RateLimitGovernor
.maxRetryAttempts(), which returns retry.maxAttempts when enabled and 1
(no outer retry) when absent or disabled. All three runtimes (ai-sdk,
codex, claude-code) now use it, so ingest.rateLimit.retry.maxAttempts
genuinely controls attempts and the hard-coded 6 (plus Codex's off-by-one
extra attempt) is gone. Backend-native retry (e.g. the AI SDK's maxRetries)
still handles transient 429s.
Also correct the ktx.yaml docs for maxWaitMs (caps each wait, not the whole
run) and maxAttempts, and sync uv.lock ktx-sl/ktx-daemon to 0.9.0.
2026-06-05 12:10:27 +02:00
|
|
|
import { isAbortError } from '../core/abort.js';
|
2026-05-16 12:06:34 +02:00
|
|
|
import { summarizeKtxLlmDebugRequest, type KtxLlmDebugRequestRecorder } from './debug-request-recorder.js';
|
feat(cli): add ingest LLM rate-limit governor with paced retries (#261)
* feat(cli): add ingest rate limit governor
* feat(cli): wire ingest rate-limit config
* feat(cli): report provider rate-limit signals
* feat(cli): show ingest rate-limit waits
* fix(cli): complete rate-limit event coverage
* fix(cli): abort ingest provider calls cleanly
* fix(cli): propagate ingest cancellation
* fix(cli): reject pre-aborted ingest rate-limit waits
* fix(cli): honor Claude rate-limit reset waits
* fix(cli): retry thrown Codex rate-limit failures
* fix(cli): type Claude rate-limit result details
* fix(cli): emit ingest rate-limit countdowns from rejected signals
* fix(cli): report ai sdk rate-limit header utilization
* fix(cli): gate LLM rate-limit retries on the governor budget
The AI SDK and Codex runtimes retried 429 / opaque rate-limit failures up
to 6-7 times with no backoff when constructed without a RateLimitGovernor
(scan, memory, setup) or with pacing disabled, ignoring Retry-After and
worsening the limit. The outer retry loop only cooperates with the
governor's pause, so without active pacing there is no backoff to apply.
Route the retry bound through a single source: RateLimitGovernor
.maxRetryAttempts(), which returns retry.maxAttempts when enabled and 1
(no outer retry) when absent or disabled. All three runtimes (ai-sdk,
codex, claude-code) now use it, so ingest.rateLimit.retry.maxAttempts
genuinely controls attempts and the hard-coded 6 (plus Codex's off-by-one
extra attempt) is gone. Backend-native retry (e.g. the AI SDK's maxRetries)
still handles transient 429s.
Also correct the ktx.yaml docs for maxWaitMs (caps each wait, not the whole
run) and maxAttempts, and sync uv.lock ktx-sl/ktx-daemon to 0.9.0.
2026-06-05 12:10:27 +02:00
|
|
|
import type { RateLimitGovernor, RateLimitProvider, RateLimitSignal } from './rate-limit-governor.js';
|
2026-05-16 12:06:34 +02:00
|
|
|
import { createAiSdkToolSet } from './runtime-tools.js';
|
|
|
|
|
import type {
|
|
|
|
|
KtxGenerateObjectInput,
|
|
|
|
|
KtxGenerateTextInput,
|
|
|
|
|
KtxLlmRuntimePort,
|
feat(cli): profile ingest runs and split model vs tool time (#249)
* feat(cli): profile ingest runs to find where wall-clock time goes
Add opt-in profiling for `ktx ingest`. Each timed phase, work unit, and
agent loop now records durationMs / step count / token usage in the
trace, and a post-run aggregator rolls them up into a "where did the
time go" report printed to stderr.
Enable per run with KTX_PROFILE_INGEST (1/true -> human table, json ->
raw structured profile) or persistently via `ingest.profile` in
ktx.yaml. The json form emits raw milliseconds, token counts, and a
summary.headline one-line diagnosis so coding agents can parse it
directly; json wins when both env and config request profiling.
- runtime-port: RunLoopMetrics (totalMs, usage, stepCount,
stepBoundariesMs) plus onMetrics callbacks on text/object generation
- ai-sdk + claude-code runtimes: capture per-loop timing and token usage
- work-unit-executor and stages 3/4: thread metrics into trace events
- ingest-bundle.runner: time worktree / triage / clustering / index /
reconcile / squash phases and emit the profile in a finally block
(best-effort; never affects the run outcome)
- ingest-profile: new trace+transcript aggregator with table/json formatters
- config: ingest.profile flag; docs: profiling section in ktx-ingest.mdx
* fix(cli): flush tool-call logs before reading ingest profile
Tool transcripts are appended fire-and-forget so the agent hot path never
blocks on logging. The ingest profiler read them before the writes settled,
so per-work-unit toolMs (and the model-vs-tool split derived from it) could
be incomplete. Track in-flight appends and expose flushToolCallLogs() —
bounded by a timeout so it can never hang — and flush before the profiler
reads the transcript.
2026-06-01 15:49:17 +02:00
|
|
|
LlmTokenUsage,
|
2026-05-16 12:06:34 +02:00
|
|
|
RunLoopParams,
|
|
|
|
|
RunLoopResult,
|
|
|
|
|
} from './runtime-port.js';
|
|
|
|
|
|
chore(workspace): gate dead-code with knip production mode (#196)
* refactor(workspace): relocate @ktx/llm source into packages/cli/src/llm
* refactor(workspace): rewrite @ktx/llm imports to relative paths
* refactor(workspace): fold internal packages into cli
* chore(workspace): gate dead-code with knip production mode
Turn on production-mode knip plus an autofix run in pre-commit and the
`pnpm dead-code` script, document the `/** @internal */` convention for
test-only exports in AGENTS.md, annotate test-only exports across the
CLI with that JSDoc, and drop dead exports/wrappers the new gate
surfaced (e.g. `cli-project.ts`, `lookerRuntimeSourceToFileAdapterSource`,
`createLocalScanEnrichmentProvidersFromConfig`,
`PGLITE_OWNER_PROCESS_BACKEND_CAPABILITIES`, stale type re-exports).
Replace the loose `ignoreIssues` allowlist in `knip.json` with explicit
production entries so cross-package barrel leaks are caught.
* refactor(cli): delete internal barrel index.ts files
The 34 `index.ts` re-export barrels inside `packages/cli/src/` were
holdovers from the pre-fold multi-workspace structure. Post-fold-in they
served no production purpose: external consumers go through the single
package main entry, and in-repo callers mostly imported through them
only because the path was short. Internally, knip flagged most barrel
re-exports as production-dead (only reached via tests).
This change:
- Deletes every internal barrel except `packages/cli/src/index.ts`
(the published package entry).
- Rewrites ~270 source/test files to import each name directly from
the file that defines it.
- Moves `tools/warehouse-verification/index.ts` to
`create-warehouse-verification-tools.ts` (the function it defined
locally) and updates its single consumer.
- Renames `search/backend-conformance.ts` → `.test-utils.ts` to match
the existing test-helper file convention.
- Deletes 13 dead test-only chains (dbt-descriptions/*,
live-database/extracted-schema, live-database/structural-sync,
relationship-* feedback/review chain) plus their tests and a
cascading orphan integration test.
- Updates test mocks that pointed at deleted barrel paths
(notion-client, connector barrels in scan/local-scan-connectors
tests) to mock the source files instead.
- Points the maintainer benchmark script
(`scripts/relationship-benchmark-report.mjs`) at source files
instead of `dist/context/scan/index.js`.
- Drops the barrel `!` entries from `knip.json`; adds explicit
production entries only for the benchmark code reached via dist by
the maintainer script.
Net: 413 files changed, ~1.2k insertions, ~9.4k deletions.
`pnpm run dead-code` (Biome + knip default + knip production) and
`pnpm run type-check` are clean; 2277 tests pass.
* refactor(workspace): rename @ktx/cli to @kaelio/ktx and pack it directly
Promote the CLI workspace package to the public name `@kaelio/ktx` and
drop the separate `scripts/build-public-npm-package.mjs` wrapper. The
CLI package is now publishable in place (`publishConfig.access: public`,
`provenance: true`), so artifact packing uses `pnpm pack` against
`packages/cli/` instead of assembling a parallel package tree.
Updates all workspace filter invocations, docs, tests, and release
readiness checks to reference the new package name, and folds the
tarball-name helper into `scripts/public-npm-release-metadata.mjs`.
* docs: align "agent clients" and "data agents" terminology
Replace "client agents" with "agent clients" and "database agents" with
"data agents" across AGENTS.md, README.md, the docs-site copy, and the
matching setup-agents test description, matching the canonical
vocabulary in docs/terminology.md.
Also moves packages/cli/tsconfig.json's tsBuildInfoFile from
node_modules/.cache/ to dist/.tsbuildinfo so incremental builds survive
node_modules reinstalls.
* refactor(release): single source of truth for package version
Make packages/cli/package.json the single source of truth for the
@kaelio/ktx version. publicNpmPackageVersion() now reads it directly,
so artifact filenames, release-readiness checks, and the Python wheel
version all derive from one field. The duplicate
release-policy.json.publicNpmPackageVersion is removed.
Previously the two fields could drift: tarballs were named
kaelio-ktx-0.4.1.tgz while internally containing
@kaelio/ktx@0.0.0-private.
- update-public-release-version.mjs rewrites both Python pyproject.toml
files (ktx-daemon, ktx-sl) alongside the npm package.jsons,
normalizing the version for PEP 440 (e.g. 0.1.0-rc.2 -> 0.1.0rc2).
- semantic-release-config.cjs adds the two pyproject.toml files to
@semantic-release/git assets so the release commit back to main
carries every version source in lockstep.
- The six "?? '0.0.0-private'" fallback literals across the CLI are
replaced with "?? getKtxCliPackageInfo().version", and
createDefaultKtxMcpServer makes its version arg required.
- docs/release.md describes the actual commit-back model: the dev tree
always reflects the most recent release; no sentinel pin to
maintain.
Verified: pnpm run artifacts:build now produces
kaelio-ktx-0.4.1.tgz and kaelio_ktx-0.4.1-py3-none-any.whl with
@kaelio/ktx@0.4.1 inside. Full type-check, dead-code, and
2287 vitests + 173 script tests pass.
* refactor(cli): inject embedding provider resolution and detect sentence-transformers runtime
Make resolveProjectEmbeddingProvider and runtimeIo injectable in ingest and
scan command entrypoints so tests can stub them, and teach
resolvePublicIngestRuntimeRequirements to flag the local-embeddings runtime
feature when ktx.yaml selects sentence-transformers.
* chore(cli): mark buildLocalStatsStatus and LocalStatsStatus as @internal
Both symbols are consumed only by status-project.test.ts. Annotating with
/** @internal */ keeps knip's production-mode check clean without changing
runtime behavior.
* fix(cli): use real package metadata in print-command-tree
The stubbed package name embedded a forbidden product identifier that
tripped the boundary check in CI. Read the metadata from package.json
instead — keeps the rendered tree unchanged and removes a duplicate
source of truth.
* feat(cli): show embedding coverage in `ktx status`, drop duplicate disk counts
Inline `(N embedded)` next to the Wiki scope counts and Semantic-layer
source counts, computed with `SUM(embedding_json IS NOT NULL)` over
`knowledge_pages` and `local_sl_sources`. Rename the "Knowledge" label to
"Wiki" (canonical per `docs/terminology.md`) and rename the matching
`localStats.knowledgePages` field to `localStats.wikiPages`.
Drop `wiki=N md` and `semantic-layer=N yaml` from the Disk row — those
duplicated the per-surface rows above. Disk now reports only actual byte
usage (db, cache, raw-sources). The unused `wikiGlobalMarkdownCount` /
`semanticLayerYamlCount` fields, the `isMarkdownEntry` / `isYamlEntry`
helpers, and the `filter` arg on `summarizeDir` are removed.
2026-05-21 15:28:58 +02:00
|
|
|
interface AgentTelemetryPort {
|
2026-05-16 12:06:34 +02:00
|
|
|
createTelemetry(tags: Record<string, string>): TelemetrySettings;
|
|
|
|
|
}
|
|
|
|
|
|
feat(cli): profile ingest runs and split model vs tool time (#249)
* feat(cli): profile ingest runs to find where wall-clock time goes
Add opt-in profiling for `ktx ingest`. Each timed phase, work unit, and
agent loop now records durationMs / step count / token usage in the
trace, and a post-run aggregator rolls them up into a "where did the
time go" report printed to stderr.
Enable per run with KTX_PROFILE_INGEST (1/true -> human table, json ->
raw structured profile) or persistently via `ingest.profile` in
ktx.yaml. The json form emits raw milliseconds, token counts, and a
summary.headline one-line diagnosis so coding agents can parse it
directly; json wins when both env and config request profiling.
- runtime-port: RunLoopMetrics (totalMs, usage, stepCount,
stepBoundariesMs) plus onMetrics callbacks on text/object generation
- ai-sdk + claude-code runtimes: capture per-loop timing and token usage
- work-unit-executor and stages 3/4: thread metrics into trace events
- ingest-bundle.runner: time worktree / triage / clustering / index /
reconcile / squash phases and emit the profile in a finally block
(best-effort; never affects the run outcome)
- ingest-profile: new trace+transcript aggregator with table/json formatters
- config: ingest.profile flag; docs: profiling section in ktx-ingest.mdx
* fix(cli): flush tool-call logs before reading ingest profile
Tool transcripts are appended fire-and-forget so the agent hot path never
blocks on logging. The ingest profiler read them before the writes settled,
so per-work-unit toolMs (and the model-vs-tool split derived from it) could
be incomplete. Track in-flight appends and expose flushToolCallLogs() —
bounded by a timeout so it can never hang — and flush before the profiler
reads the transcript.
2026-06-01 15:49:17 +02:00
|
|
|
interface MaybeUsage {
|
|
|
|
|
inputTokens?: number;
|
|
|
|
|
outputTokens?: number;
|
|
|
|
|
totalTokens?: number;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
function toLlmTokenUsage(usage: MaybeUsage | undefined): LlmTokenUsage {
|
|
|
|
|
if (!usage) {
|
|
|
|
|
return {};
|
|
|
|
|
}
|
|
|
|
|
return {
|
|
|
|
|
...(usage.inputTokens !== undefined ? { inputTokens: usage.inputTokens } : {}),
|
|
|
|
|
...(usage.outputTokens !== undefined ? { outputTokens: usage.outputTokens } : {}),
|
|
|
|
|
...(usage.totalTokens !== undefined ? { totalTokens: usage.totalTokens } : {}),
|
|
|
|
|
};
|
|
|
|
|
}
|
|
|
|
|
|
2026-05-16 12:06:34 +02:00
|
|
|
export interface AiSdkKtxLlmRuntimeDeps {
|
|
|
|
|
llmProvider: KtxLlmProvider;
|
|
|
|
|
telemetry?: AgentTelemetryPort;
|
|
|
|
|
logger?: KtxLogger;
|
|
|
|
|
debugRequestRecorder?: KtxLlmDebugRequestRecorder;
|
feat(cli): add ingest LLM rate-limit governor with paced retries (#261)
* feat(cli): add ingest rate limit governor
* feat(cli): wire ingest rate-limit config
* feat(cli): report provider rate-limit signals
* feat(cli): show ingest rate-limit waits
* fix(cli): complete rate-limit event coverage
* fix(cli): abort ingest provider calls cleanly
* fix(cli): propagate ingest cancellation
* fix(cli): reject pre-aborted ingest rate-limit waits
* fix(cli): honor Claude rate-limit reset waits
* fix(cli): retry thrown Codex rate-limit failures
* fix(cli): type Claude rate-limit result details
* fix(cli): emit ingest rate-limit countdowns from rejected signals
* fix(cli): report ai sdk rate-limit header utilization
* fix(cli): gate LLM rate-limit retries on the governor budget
The AI SDK and Codex runtimes retried 429 / opaque rate-limit failures up
to 6-7 times with no backoff when constructed without a RateLimitGovernor
(scan, memory, setup) or with pacing disabled, ignoring Retry-After and
worsening the limit. The outer retry loop only cooperates with the
governor's pause, so without active pacing there is no backoff to apply.
Route the retry bound through a single source: RateLimitGovernor
.maxRetryAttempts(), which returns retry.maxAttempts when enabled and 1
(no outer retry) when absent or disabled. All three runtimes (ai-sdk,
codex, claude-code) now use it, so ingest.rateLimit.retry.maxAttempts
genuinely controls attempts and the hard-coded 6 (plus Codex's off-by-one
extra attempt) is gone. Backend-native retry (e.g. the AI SDK's maxRetries)
still handles transient 429s.
Also correct the ktx.yaml docs for maxWaitMs (caps each wait, not the whole
run) and maxAttempts, and sync uv.lock ktx-sl/ktx-daemon to 0.9.0.
2026-06-05 12:10:27 +02:00
|
|
|
rateLimitGovernor?: Pick<RateLimitGovernor, 'waitForReady' | 'report' | 'maxRetryAttempts'>;
|
2026-05-16 12:06:34 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
function hasTools(tools: Record<string, unknown>): boolean {
|
|
|
|
|
return Object.keys(tools).length > 0;
|
|
|
|
|
}
|
|
|
|
|
|
feat(cli): add ingest LLM rate-limit governor with paced retries (#261)
* feat(cli): add ingest rate limit governor
* feat(cli): wire ingest rate-limit config
* feat(cli): report provider rate-limit signals
* feat(cli): show ingest rate-limit waits
* fix(cli): complete rate-limit event coverage
* fix(cli): abort ingest provider calls cleanly
* fix(cli): propagate ingest cancellation
* fix(cli): reject pre-aborted ingest rate-limit waits
* fix(cli): honor Claude rate-limit reset waits
* fix(cli): retry thrown Codex rate-limit failures
* fix(cli): type Claude rate-limit result details
* fix(cli): emit ingest rate-limit countdowns from rejected signals
* fix(cli): report ai sdk rate-limit header utilization
* fix(cli): gate LLM rate-limit retries on the governor budget
The AI SDK and Codex runtimes retried 429 / opaque rate-limit failures up
to 6-7 times with no backoff when constructed without a RateLimitGovernor
(scan, memory, setup) or with pacing disabled, ignoring Retry-After and
worsening the limit. The outer retry loop only cooperates with the
governor's pause, so without active pacing there is no backoff to apply.
Route the retry bound through a single source: RateLimitGovernor
.maxRetryAttempts(), which returns retry.maxAttempts when enabled and 1
(no outer retry) when absent or disabled. All three runtimes (ai-sdk,
codex, claude-code) now use it, so ingest.rateLimit.retry.maxAttempts
genuinely controls attempts and the hard-coded 6 (plus Codex's off-by-one
extra attempt) is gone. Backend-native retry (e.g. the AI SDK's maxRetries)
still handles transient 429s.
Also correct the ktx.yaml docs for maxWaitMs (caps each wait, not the whole
run) and maxAttempts, and sync uv.lock ktx-sl/ktx-daemon to 0.9.0.
2026-06-05 12:10:27 +02:00
|
|
|
function modelProviderName(model: unknown): RateLimitProvider {
|
|
|
|
|
const provider = (model as { provider?: string }).provider ?? '';
|
|
|
|
|
return provider.includes('vertex') || provider.includes('google') ? 'vertex' : 'anthropic-api';
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
interface HeaderLimitPair {
|
|
|
|
|
limit: string;
|
|
|
|
|
remaining: string;
|
|
|
|
|
rateLimitType: string;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
const RATE_LIMIT_HEADER_PAIRS: HeaderLimitPair[] = [
|
|
|
|
|
{
|
|
|
|
|
limit: 'anthropic-ratelimit-requests-limit',
|
|
|
|
|
remaining: 'anthropic-ratelimit-requests-remaining',
|
|
|
|
|
rateLimitType: 'rpm',
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
limit: 'anthropic-ratelimit-tokens-limit',
|
|
|
|
|
remaining: 'anthropic-ratelimit-tokens-remaining',
|
|
|
|
|
rateLimitType: 'tpm',
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
limit: 'anthropic-ratelimit-input-tokens-limit',
|
|
|
|
|
remaining: 'anthropic-ratelimit-input-tokens-remaining',
|
|
|
|
|
rateLimitType: 'itpm',
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
limit: 'anthropic-ratelimit-output-tokens-limit',
|
|
|
|
|
remaining: 'anthropic-ratelimit-output-tokens-remaining',
|
|
|
|
|
rateLimitType: 'otpm',
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
limit: 'x-ratelimit-limit-requests',
|
|
|
|
|
remaining: 'x-ratelimit-remaining-requests',
|
|
|
|
|
rateLimitType: 'rpm',
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
limit: 'x-ratelimit-limit-tokens',
|
|
|
|
|
remaining: 'x-ratelimit-remaining-tokens',
|
|
|
|
|
rateLimitType: 'tpm',
|
|
|
|
|
},
|
|
|
|
|
];
|
|
|
|
|
|
|
|
|
|
function normalizeHeaders(headers: unknown): Record<string, string> {
|
|
|
|
|
if (!headers || typeof headers !== 'object') {
|
|
|
|
|
return {};
|
|
|
|
|
}
|
|
|
|
|
const get = (headers as { get?: unknown }).get;
|
|
|
|
|
if (typeof get === 'function') {
|
|
|
|
|
const out: Record<string, string> = {};
|
|
|
|
|
for (const pair of RATE_LIMIT_HEADER_PAIRS) {
|
|
|
|
|
const limit = get.call(headers, pair.limit);
|
|
|
|
|
const remaining = get.call(headers, pair.remaining);
|
|
|
|
|
if (typeof limit === 'string') out[pair.limit] = limit;
|
|
|
|
|
if (typeof remaining === 'string') out[pair.remaining] = remaining;
|
|
|
|
|
}
|
|
|
|
|
return out;
|
|
|
|
|
}
|
|
|
|
|
return Object.fromEntries(
|
|
|
|
|
Object.entries(headers as Record<string, unknown>)
|
|
|
|
|
.filter((entry): entry is [string, string | number] => typeof entry[1] === 'string' || typeof entry[1] === 'number')
|
|
|
|
|
.map(([key, value]) => [key.toLowerCase(), String(value)]),
|
|
|
|
|
);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
function numericHeader(headers: Record<string, string>, key: string): number | undefined {
|
|
|
|
|
const value = Number(headers[key]);
|
|
|
|
|
return Number.isFinite(value) && value >= 0 ? value : undefined;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
function utilizationForPair(headers: Record<string, string>, pair: HeaderLimitPair): number | undefined {
|
|
|
|
|
const limit = numericHeader(headers, pair.limit);
|
|
|
|
|
const remaining = numericHeader(headers, pair.remaining);
|
|
|
|
|
if (limit === undefined || remaining === undefined || limit <= 0) {
|
|
|
|
|
return undefined;
|
|
|
|
|
}
|
|
|
|
|
return 1 - Math.min(limit, remaining) / limit;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
function aiSdkHeaderRateLimitSignal(provider: RateLimitProvider, result: unknown): RateLimitSignal | undefined {
|
|
|
|
|
const headers = normalizeHeaders((result as { response?: { headers?: unknown } }).response?.headers);
|
|
|
|
|
let best: { utilization: number; rateLimitType: string } | undefined;
|
|
|
|
|
for (const pair of RATE_LIMIT_HEADER_PAIRS) {
|
|
|
|
|
const utilization = utilizationForPair(headers, pair);
|
|
|
|
|
if (utilization === undefined) {
|
|
|
|
|
continue;
|
|
|
|
|
}
|
|
|
|
|
if (!best || utilization > best.utilization) {
|
|
|
|
|
best = { utilization, rateLimitType: pair.rateLimitType };
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
if (!best) {
|
|
|
|
|
return undefined;
|
|
|
|
|
}
|
|
|
|
|
return {
|
|
|
|
|
provider,
|
|
|
|
|
status: 'allowed',
|
|
|
|
|
rateLimitType: best.rateLimitType,
|
|
|
|
|
utilization: Number(best.utilization.toFixed(4)),
|
|
|
|
|
};
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
function retryAfterMs(error: unknown): number | undefined {
|
|
|
|
|
const value = (error as { retryAfter?: unknown }).retryAfter;
|
|
|
|
|
if (typeof value === 'number' && Number.isFinite(value) && value > 0) {
|
|
|
|
|
return value < 1_000 ? value * 1_000 : value;
|
|
|
|
|
}
|
|
|
|
|
return undefined;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
function isAiSdkRateLimitError(error: unknown): boolean {
|
|
|
|
|
const record = error as { name?: string; statusCode?: number; status?: number };
|
|
|
|
|
return record.name === 'TooManyRequestsError' || record.statusCode === 429 || record.status === 429;
|
|
|
|
|
}
|
|
|
|
|
|
2026-05-16 12:06:34 +02:00
|
|
|
export class AiSdkKtxLlmRuntime implements KtxLlmRuntimePort {
|
|
|
|
|
private readonly logger: KtxLogger;
|
|
|
|
|
|
|
|
|
|
constructor(private readonly deps: AiSdkKtxLlmRuntimeDeps) {
|
|
|
|
|
this.logger = deps.logger ?? noopLogger;
|
|
|
|
|
}
|
|
|
|
|
|
feat(cli): add ingest LLM rate-limit governor with paced retries (#261)
* feat(cli): add ingest rate limit governor
* feat(cli): wire ingest rate-limit config
* feat(cli): report provider rate-limit signals
* feat(cli): show ingest rate-limit waits
* fix(cli): complete rate-limit event coverage
* fix(cli): abort ingest provider calls cleanly
* fix(cli): propagate ingest cancellation
* fix(cli): reject pre-aborted ingest rate-limit waits
* fix(cli): honor Claude rate-limit reset waits
* fix(cli): retry thrown Codex rate-limit failures
* fix(cli): type Claude rate-limit result details
* fix(cli): emit ingest rate-limit countdowns from rejected signals
* fix(cli): report ai sdk rate-limit header utilization
* fix(cli): gate LLM rate-limit retries on the governor budget
The AI SDK and Codex runtimes retried 429 / opaque rate-limit failures up
to 6-7 times with no backoff when constructed without a RateLimitGovernor
(scan, memory, setup) or with pacing disabled, ignoring Retry-After and
worsening the limit. The outer retry loop only cooperates with the
governor's pause, so without active pacing there is no backoff to apply.
Route the retry bound through a single source: RateLimitGovernor
.maxRetryAttempts(), which returns retry.maxAttempts when enabled and 1
(no outer retry) when absent or disabled. All three runtimes (ai-sdk,
codex, claude-code) now use it, so ingest.rateLimit.retry.maxAttempts
genuinely controls attempts and the hard-coded 6 (plus Codex's off-by-one
extra attempt) is gone. Backend-native retry (e.g. the AI SDK's maxRetries)
still handles transient 429s.
Also correct the ktx.yaml docs for maxWaitMs (caps each wait, not the whole
run) and maxAttempts, and sync uv.lock ktx-sl/ktx-daemon to 0.9.0.
2026-06-05 12:10:27 +02:00
|
|
|
private async generateTextWithRateLimitRetry<T>(
|
|
|
|
|
provider: RateLimitProvider,
|
|
|
|
|
abortSignal: AbortSignal | undefined,
|
|
|
|
|
run: () => Promise<T>,
|
|
|
|
|
): Promise<T> {
|
|
|
|
|
// maxRetryAttempts() returns 1 when no governor is present or pacing is
|
|
|
|
|
// disabled, so a 429 throws immediately instead of hammering the provider
|
|
|
|
|
// with no backoff; the AI SDK's own maxRetries still handles transient 429s.
|
|
|
|
|
const maxAttempts = this.deps.rateLimitGovernor?.maxRetryAttempts() ?? 1;
|
|
|
|
|
let attempt = 0;
|
|
|
|
|
while (true) {
|
|
|
|
|
await this.deps.rateLimitGovernor?.waitForReady(abortSignal);
|
|
|
|
|
try {
|
|
|
|
|
const result = await run();
|
|
|
|
|
const signal = aiSdkHeaderRateLimitSignal(provider, result);
|
|
|
|
|
if (signal) {
|
|
|
|
|
this.deps.rateLimitGovernor?.report(signal);
|
|
|
|
|
}
|
|
|
|
|
return result;
|
|
|
|
|
} catch (error) {
|
|
|
|
|
if (isAbortError(error) || !isAiSdkRateLimitError(error) || attempt >= maxAttempts - 1) {
|
|
|
|
|
throw error;
|
|
|
|
|
}
|
|
|
|
|
attempt += 1;
|
|
|
|
|
const retryAfter = retryAfterMs(error);
|
|
|
|
|
this.deps.rateLimitGovernor?.report({
|
|
|
|
|
provider,
|
|
|
|
|
status: 'rejected',
|
|
|
|
|
rateLimitType: 'http_429',
|
|
|
|
|
...(retryAfter !== undefined ? { retryAfterMs: retryAfter } : {}),
|
|
|
|
|
});
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2026-05-16 12:06:34 +02:00
|
|
|
async generateText(input: KtxGenerateTextInput): Promise<string> {
|
|
|
|
|
const model = this.deps.llmProvider.getModel(input.role);
|
|
|
|
|
if ((model as { provider?: string }).provider === 'deterministic') {
|
|
|
|
|
return `Deterministic description for ${input.prompt.slice(0, 64).trim() || 'data source'}`;
|
|
|
|
|
}
|
|
|
|
|
const tools = createAiSdkToolSet(input.tools ?? {});
|
|
|
|
|
const built = new KtxMessageBuilder(this.deps.llmProvider).wrapSimple({
|
|
|
|
|
system: input.system,
|
|
|
|
|
messages: [{ role: 'user', content: input.prompt }],
|
|
|
|
|
tools,
|
|
|
|
|
model,
|
|
|
|
|
});
|
|
|
|
|
const split = splitKtxSystemMessages(built.messages);
|
feat(cli): profile ingest runs and split model vs tool time (#249)
* feat(cli): profile ingest runs to find where wall-clock time goes
Add opt-in profiling for `ktx ingest`. Each timed phase, work unit, and
agent loop now records durationMs / step count / token usage in the
trace, and a post-run aggregator rolls them up into a "where did the
time go" report printed to stderr.
Enable per run with KTX_PROFILE_INGEST (1/true -> human table, json ->
raw structured profile) or persistently via `ingest.profile` in
ktx.yaml. The json form emits raw milliseconds, token counts, and a
summary.headline one-line diagnosis so coding agents can parse it
directly; json wins when both env and config request profiling.
- runtime-port: RunLoopMetrics (totalMs, usage, stepCount,
stepBoundariesMs) plus onMetrics callbacks on text/object generation
- ai-sdk + claude-code runtimes: capture per-loop timing and token usage
- work-unit-executor and stages 3/4: thread metrics into trace events
- ingest-bundle.runner: time worktree / triage / clustering / index /
reconcile / squash phases and emit the profile in a finally block
(best-effort; never affects the run outcome)
- ingest-profile: new trace+transcript aggregator with table/json formatters
- config: ingest.profile flag; docs: profiling section in ktx-ingest.mdx
* fix(cli): flush tool-call logs before reading ingest profile
Tool transcripts are appended fire-and-forget so the agent hot path never
blocks on logging. The ingest profiler read them before the writes settled,
so per-work-unit toolMs (and the model-vs-tool split derived from it) could
be incomplete. Track in-flight appends and expose flushToolCallLogs() —
bounded by a timeout so it can never hang — and flush before the profiler
reads the transcript.
2026-06-01 15:49:17 +02:00
|
|
|
const startedAt = Date.now();
|
feat(cli): add ingest LLM rate-limit governor with paced retries (#261)
* feat(cli): add ingest rate limit governor
* feat(cli): wire ingest rate-limit config
* feat(cli): report provider rate-limit signals
* feat(cli): show ingest rate-limit waits
* fix(cli): complete rate-limit event coverage
* fix(cli): abort ingest provider calls cleanly
* fix(cli): propagate ingest cancellation
* fix(cli): reject pre-aborted ingest rate-limit waits
* fix(cli): honor Claude rate-limit reset waits
* fix(cli): retry thrown Codex rate-limit failures
* fix(cli): type Claude rate-limit result details
* fix(cli): emit ingest rate-limit countdowns from rejected signals
* fix(cli): report ai sdk rate-limit header utilization
* fix(cli): gate LLM rate-limit retries on the governor budget
The AI SDK and Codex runtimes retried 429 / opaque rate-limit failures up
to 6-7 times with no backoff when constructed without a RateLimitGovernor
(scan, memory, setup) or with pacing disabled, ignoring Retry-After and
worsening the limit. The outer retry loop only cooperates with the
governor's pause, so without active pacing there is no backoff to apply.
Route the retry bound through a single source: RateLimitGovernor
.maxRetryAttempts(), which returns retry.maxAttempts when enabled and 1
(no outer retry) when absent or disabled. All three runtimes (ai-sdk,
codex, claude-code) now use it, so ingest.rateLimit.retry.maxAttempts
genuinely controls attempts and the hard-coded 6 (plus Codex's off-by-one
extra attempt) is gone. Backend-native retry (e.g. the AI SDK's maxRetries)
still handles transient 429s.
Also correct the ktx.yaml docs for maxWaitMs (caps each wait, not the whole
run) and maxAttempts, and sync uv.lock ktx-sl/ktx-daemon to 0.9.0.
2026-06-05 12:10:27 +02:00
|
|
|
const request = {
|
2026-05-16 12:06:34 +02:00
|
|
|
model,
|
|
|
|
|
temperature: input.temperature ?? 0,
|
|
|
|
|
...(split.system ? { system: split.system } : {}),
|
|
|
|
|
messages: split.messages,
|
|
|
|
|
tools: built.tools as ToolSet,
|
feat(cli): add ingest LLM rate-limit governor with paced retries (#261)
* feat(cli): add ingest rate limit governor
* feat(cli): wire ingest rate-limit config
* feat(cli): report provider rate-limit signals
* feat(cli): show ingest rate-limit waits
* fix(cli): complete rate-limit event coverage
* fix(cli): abort ingest provider calls cleanly
* fix(cli): propagate ingest cancellation
* fix(cli): reject pre-aborted ingest rate-limit waits
* fix(cli): honor Claude rate-limit reset waits
* fix(cli): retry thrown Codex rate-limit failures
* fix(cli): type Claude rate-limit result details
* fix(cli): emit ingest rate-limit countdowns from rejected signals
* fix(cli): report ai sdk rate-limit header utilization
* fix(cli): gate LLM rate-limit retries on the governor budget
The AI SDK and Codex runtimes retried 429 / opaque rate-limit failures up
to 6-7 times with no backoff when constructed without a RateLimitGovernor
(scan, memory, setup) or with pacing disabled, ignoring Retry-After and
worsening the limit. The outer retry loop only cooperates with the
governor's pause, so without active pacing there is no backoff to apply.
Route the retry bound through a single source: RateLimitGovernor
.maxRetryAttempts(), which returns retry.maxAttempts when enabled and 1
(no outer retry) when absent or disabled. All three runtimes (ai-sdk,
codex, claude-code) now use it, so ingest.rateLimit.retry.maxAttempts
genuinely controls attempts and the hard-coded 6 (plus Codex's off-by-one
extra attempt) is gone. Backend-native retry (e.g. the AI SDK's maxRetries)
still handles transient 429s.
Also correct the ktx.yaml docs for maxWaitMs (caps each wait, not the whole
run) and maxAttempts, and sync uv.lock ktx-sl/ktx-daemon to 0.9.0.
2026-06-05 12:10:27 +02:00
|
|
|
...(input.abortSignal ? { abortSignal: input.abortSignal } : {}),
|
2026-05-16 12:06:34 +02:00
|
|
|
...(hasTools(tools)
|
|
|
|
|
? {
|
|
|
|
|
experimental_repairToolCall: this.deps.llmProvider.repairToolCallHandler({
|
|
|
|
|
source: `ktx-${input.role}`,
|
|
|
|
|
}),
|
|
|
|
|
}
|
|
|
|
|
: {}),
|
feat(cli): add ingest LLM rate-limit governor with paced retries (#261)
* feat(cli): add ingest rate limit governor
* feat(cli): wire ingest rate-limit config
* feat(cli): report provider rate-limit signals
* feat(cli): show ingest rate-limit waits
* fix(cli): complete rate-limit event coverage
* fix(cli): abort ingest provider calls cleanly
* fix(cli): propagate ingest cancellation
* fix(cli): reject pre-aborted ingest rate-limit waits
* fix(cli): honor Claude rate-limit reset waits
* fix(cli): retry thrown Codex rate-limit failures
* fix(cli): type Claude rate-limit result details
* fix(cli): emit ingest rate-limit countdowns from rejected signals
* fix(cli): report ai sdk rate-limit header utilization
* fix(cli): gate LLM rate-limit retries on the governor budget
The AI SDK and Codex runtimes retried 429 / opaque rate-limit failures up
to 6-7 times with no backoff when constructed without a RateLimitGovernor
(scan, memory, setup) or with pacing disabled, ignoring Retry-After and
worsening the limit. The outer retry loop only cooperates with the
governor's pause, so without active pacing there is no backoff to apply.
Route the retry bound through a single source: RateLimitGovernor
.maxRetryAttempts(), which returns retry.maxAttempts when enabled and 1
(no outer retry) when absent or disabled. All three runtimes (ai-sdk,
codex, claude-code) now use it, so ingest.rateLimit.retry.maxAttempts
genuinely controls attempts and the hard-coded 6 (plus Codex's off-by-one
extra attempt) is gone. Backend-native retry (e.g. the AI SDK's maxRetries)
still handles transient 429s.
Also correct the ktx.yaml docs for maxWaitMs (caps each wait, not the whole
run) and maxAttempts, and sync uv.lock ktx-sl/ktx-daemon to 0.9.0.
2026-06-05 12:10:27 +02:00
|
|
|
};
|
|
|
|
|
const result = await this.generateTextWithRateLimitRetry(modelProviderName(model), input.abortSignal, () => generateText(request));
|
feat(cli): profile ingest runs and split model vs tool time (#249)
* feat(cli): profile ingest runs to find where wall-clock time goes
Add opt-in profiling for `ktx ingest`. Each timed phase, work unit, and
agent loop now records durationMs / step count / token usage in the
trace, and a post-run aggregator rolls them up into a "where did the
time go" report printed to stderr.
Enable per run with KTX_PROFILE_INGEST (1/true -> human table, json ->
raw structured profile) or persistently via `ingest.profile` in
ktx.yaml. The json form emits raw milliseconds, token counts, and a
summary.headline one-line diagnosis so coding agents can parse it
directly; json wins when both env and config request profiling.
- runtime-port: RunLoopMetrics (totalMs, usage, stepCount,
stepBoundariesMs) plus onMetrics callbacks on text/object generation
- ai-sdk + claude-code runtimes: capture per-loop timing and token usage
- work-unit-executor and stages 3/4: thread metrics into trace events
- ingest-bundle.runner: time worktree / triage / clustering / index /
reconcile / squash phases and emit the profile in a finally block
(best-effort; never affects the run outcome)
- ingest-profile: new trace+transcript aggregator with table/json formatters
- config: ingest.profile flag; docs: profiling section in ktx-ingest.mdx
* fix(cli): flush tool-call logs before reading ingest profile
Tool transcripts are appended fire-and-forget so the agent hot path never
blocks on logging. The ingest profiler read them before the writes settled,
so per-work-unit toolMs (and the model-vs-tool split derived from it) could
be incomplete. Track in-flight appends and expose flushToolCallLogs() —
bounded by a timeout so it can never hang — and flush before the profiler
reads the transcript.
2026-06-01 15:49:17 +02:00
|
|
|
input.onMetrics?.({ totalMs: Date.now() - startedAt, usage: toLlmTokenUsage(result.totalUsage ?? result.usage) });
|
2026-05-16 12:06:34 +02:00
|
|
|
if (typeof result.text !== 'string') {
|
|
|
|
|
throw new Error('KTX LLM text generation returned no text');
|
|
|
|
|
}
|
|
|
|
|
return result.text;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
async generateObject<TOutput, TSchema extends z.ZodType<TOutput>>(
|
|
|
|
|
input: KtxGenerateObjectInput<TOutput, TSchema>,
|
|
|
|
|
): Promise<TOutput> {
|
|
|
|
|
const model = this.deps.llmProvider.getModel(input.role);
|
|
|
|
|
const tools = createAiSdkToolSet(input.tools ?? {});
|
|
|
|
|
const built = new KtxMessageBuilder(this.deps.llmProvider).wrapSimple({
|
|
|
|
|
system: input.system,
|
|
|
|
|
messages: [{ role: 'user', content: input.prompt }],
|
|
|
|
|
tools,
|
|
|
|
|
model,
|
|
|
|
|
});
|
|
|
|
|
const split = splitKtxSystemMessages(built.messages);
|
feat(cli): profile ingest runs and split model vs tool time (#249)
* feat(cli): profile ingest runs to find where wall-clock time goes
Add opt-in profiling for `ktx ingest`. Each timed phase, work unit, and
agent loop now records durationMs / step count / token usage in the
trace, and a post-run aggregator rolls them up into a "where did the
time go" report printed to stderr.
Enable per run with KTX_PROFILE_INGEST (1/true -> human table, json ->
raw structured profile) or persistently via `ingest.profile` in
ktx.yaml. The json form emits raw milliseconds, token counts, and a
summary.headline one-line diagnosis so coding agents can parse it
directly; json wins when both env and config request profiling.
- runtime-port: RunLoopMetrics (totalMs, usage, stepCount,
stepBoundariesMs) plus onMetrics callbacks on text/object generation
- ai-sdk + claude-code runtimes: capture per-loop timing and token usage
- work-unit-executor and stages 3/4: thread metrics into trace events
- ingest-bundle.runner: time worktree / triage / clustering / index /
reconcile / squash phases and emit the profile in a finally block
(best-effort; never affects the run outcome)
- ingest-profile: new trace+transcript aggregator with table/json formatters
- config: ingest.profile flag; docs: profiling section in ktx-ingest.mdx
* fix(cli): flush tool-call logs before reading ingest profile
Tool transcripts are appended fire-and-forget so the agent hot path never
blocks on logging. The ingest profiler read them before the writes settled,
so per-work-unit toolMs (and the model-vs-tool split derived from it) could
be incomplete. Track in-flight appends and expose flushToolCallLogs() —
bounded by a timeout so it can never hang — and flush before the profiler
reads the transcript.
2026-06-01 15:49:17 +02:00
|
|
|
const startedAt = Date.now();
|
feat(cli): add ingest LLM rate-limit governor with paced retries (#261)
* feat(cli): add ingest rate limit governor
* feat(cli): wire ingest rate-limit config
* feat(cli): report provider rate-limit signals
* feat(cli): show ingest rate-limit waits
* fix(cli): complete rate-limit event coverage
* fix(cli): abort ingest provider calls cleanly
* fix(cli): propagate ingest cancellation
* fix(cli): reject pre-aborted ingest rate-limit waits
* fix(cli): honor Claude rate-limit reset waits
* fix(cli): retry thrown Codex rate-limit failures
* fix(cli): type Claude rate-limit result details
* fix(cli): emit ingest rate-limit countdowns from rejected signals
* fix(cli): report ai sdk rate-limit header utilization
* fix(cli): gate LLM rate-limit retries on the governor budget
The AI SDK and Codex runtimes retried 429 / opaque rate-limit failures up
to 6-7 times with no backoff when constructed without a RateLimitGovernor
(scan, memory, setup) or with pacing disabled, ignoring Retry-After and
worsening the limit. The outer retry loop only cooperates with the
governor's pause, so without active pacing there is no backoff to apply.
Route the retry bound through a single source: RateLimitGovernor
.maxRetryAttempts(), which returns retry.maxAttempts when enabled and 1
(no outer retry) when absent or disabled. All three runtimes (ai-sdk,
codex, claude-code) now use it, so ingest.rateLimit.retry.maxAttempts
genuinely controls attempts and the hard-coded 6 (plus Codex's off-by-one
extra attempt) is gone. Backend-native retry (e.g. the AI SDK's maxRetries)
still handles transient 429s.
Also correct the ktx.yaml docs for maxWaitMs (caps each wait, not the whole
run) and maxAttempts, and sync uv.lock ktx-sl/ktx-daemon to 0.9.0.
2026-06-05 12:10:27 +02:00
|
|
|
const request = {
|
2026-05-16 12:06:34 +02:00
|
|
|
model,
|
|
|
|
|
temperature: input.temperature ?? 0,
|
|
|
|
|
...(split.system ? { system: split.system } : {}),
|
|
|
|
|
messages: split.messages,
|
|
|
|
|
tools: built.tools as ToolSet,
|
feat(cli): add ingest LLM rate-limit governor with paced retries (#261)
* feat(cli): add ingest rate limit governor
* feat(cli): wire ingest rate-limit config
* feat(cli): report provider rate-limit signals
* feat(cli): show ingest rate-limit waits
* fix(cli): complete rate-limit event coverage
* fix(cli): abort ingest provider calls cleanly
* fix(cli): propagate ingest cancellation
* fix(cli): reject pre-aborted ingest rate-limit waits
* fix(cli): honor Claude rate-limit reset waits
* fix(cli): retry thrown Codex rate-limit failures
* fix(cli): type Claude rate-limit result details
* fix(cli): emit ingest rate-limit countdowns from rejected signals
* fix(cli): report ai sdk rate-limit header utilization
* fix(cli): gate LLM rate-limit retries on the governor budget
The AI SDK and Codex runtimes retried 429 / opaque rate-limit failures up
to 6-7 times with no backoff when constructed without a RateLimitGovernor
(scan, memory, setup) or with pacing disabled, ignoring Retry-After and
worsening the limit. The outer retry loop only cooperates with the
governor's pause, so without active pacing there is no backoff to apply.
Route the retry bound through a single source: RateLimitGovernor
.maxRetryAttempts(), which returns retry.maxAttempts when enabled and 1
(no outer retry) when absent or disabled. All three runtimes (ai-sdk,
codex, claude-code) now use it, so ingest.rateLimit.retry.maxAttempts
genuinely controls attempts and the hard-coded 6 (plus Codex's off-by-one
extra attempt) is gone. Backend-native retry (e.g. the AI SDK's maxRetries)
still handles transient 429s.
Also correct the ktx.yaml docs for maxWaitMs (caps each wait, not the whole
run) and maxAttempts, and sync uv.lock ktx-sl/ktx-daemon to 0.9.0.
2026-06-05 12:10:27 +02:00
|
|
|
...(input.abortSignal ? { abortSignal: input.abortSignal } : {}),
|
2026-05-16 12:06:34 +02:00
|
|
|
...(hasTools(tools)
|
|
|
|
|
? {
|
|
|
|
|
experimental_repairToolCall: this.deps.llmProvider.repairToolCallHandler({
|
|
|
|
|
source: `ktx-${input.role}`,
|
|
|
|
|
}),
|
|
|
|
|
}
|
|
|
|
|
: {}),
|
|
|
|
|
output: Output.object({ schema: input.schema as unknown as FlexibleSchema<TOutput> }),
|
feat(cli): add ingest LLM rate-limit governor with paced retries (#261)
* feat(cli): add ingest rate limit governor
* feat(cli): wire ingest rate-limit config
* feat(cli): report provider rate-limit signals
* feat(cli): show ingest rate-limit waits
* fix(cli): complete rate-limit event coverage
* fix(cli): abort ingest provider calls cleanly
* fix(cli): propagate ingest cancellation
* fix(cli): reject pre-aborted ingest rate-limit waits
* fix(cli): honor Claude rate-limit reset waits
* fix(cli): retry thrown Codex rate-limit failures
* fix(cli): type Claude rate-limit result details
* fix(cli): emit ingest rate-limit countdowns from rejected signals
* fix(cli): report ai sdk rate-limit header utilization
* fix(cli): gate LLM rate-limit retries on the governor budget
The AI SDK and Codex runtimes retried 429 / opaque rate-limit failures up
to 6-7 times with no backoff when constructed without a RateLimitGovernor
(scan, memory, setup) or with pacing disabled, ignoring Retry-After and
worsening the limit. The outer retry loop only cooperates with the
governor's pause, so without active pacing there is no backoff to apply.
Route the retry bound through a single source: RateLimitGovernor
.maxRetryAttempts(), which returns retry.maxAttempts when enabled and 1
(no outer retry) when absent or disabled. All three runtimes (ai-sdk,
codex, claude-code) now use it, so ingest.rateLimit.retry.maxAttempts
genuinely controls attempts and the hard-coded 6 (plus Codex's off-by-one
extra attempt) is gone. Backend-native retry (e.g. the AI SDK's maxRetries)
still handles transient 429s.
Also correct the ktx.yaml docs for maxWaitMs (caps each wait, not the whole
run) and maxAttempts, and sync uv.lock ktx-sl/ktx-daemon to 0.9.0.
2026-06-05 12:10:27 +02:00
|
|
|
};
|
|
|
|
|
const result = await this.generateTextWithRateLimitRetry(modelProviderName(model), input.abortSignal, () => generateText(request));
|
feat(cli): profile ingest runs and split model vs tool time (#249)
* feat(cli): profile ingest runs to find where wall-clock time goes
Add opt-in profiling for `ktx ingest`. Each timed phase, work unit, and
agent loop now records durationMs / step count / token usage in the
trace, and a post-run aggregator rolls them up into a "where did the
time go" report printed to stderr.
Enable per run with KTX_PROFILE_INGEST (1/true -> human table, json ->
raw structured profile) or persistently via `ingest.profile` in
ktx.yaml. The json form emits raw milliseconds, token counts, and a
summary.headline one-line diagnosis so coding agents can parse it
directly; json wins when both env and config request profiling.
- runtime-port: RunLoopMetrics (totalMs, usage, stepCount,
stepBoundariesMs) plus onMetrics callbacks on text/object generation
- ai-sdk + claude-code runtimes: capture per-loop timing and token usage
- work-unit-executor and stages 3/4: thread metrics into trace events
- ingest-bundle.runner: time worktree / triage / clustering / index /
reconcile / squash phases and emit the profile in a finally block
(best-effort; never affects the run outcome)
- ingest-profile: new trace+transcript aggregator with table/json formatters
- config: ingest.profile flag; docs: profiling section in ktx-ingest.mdx
* fix(cli): flush tool-call logs before reading ingest profile
Tool transcripts are appended fire-and-forget so the agent hot path never
blocks on logging. The ingest profiler read them before the writes settled,
so per-work-unit toolMs (and the model-vs-tool split derived from it) could
be incomplete. Track in-flight appends and expose flushToolCallLogs() —
bounded by a timeout so it can never hang — and flush before the profiler
reads the transcript.
2026-06-01 15:49:17 +02:00
|
|
|
input.onMetrics?.({ totalMs: Date.now() - startedAt, usage: toLlmTokenUsage(result.totalUsage ?? result.usage) });
|
2026-05-16 12:06:34 +02:00
|
|
|
if (result.output == null) {
|
|
|
|
|
throw new Error('KTX LLM object generation returned no output');
|
|
|
|
|
}
|
|
|
|
|
return result.output as TOutput;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
async runAgentLoop(params: RunLoopParams): Promise<RunLoopResult> {
|
|
|
|
|
let stepIndex = 0;
|
feat(cli): profile ingest runs and split model vs tool time (#249)
* feat(cli): profile ingest runs to find where wall-clock time goes
Add opt-in profiling for `ktx ingest`. Each timed phase, work unit, and
agent loop now records durationMs / step count / token usage in the
trace, and a post-run aggregator rolls them up into a "where did the
time go" report printed to stderr.
Enable per run with KTX_PROFILE_INGEST (1/true -> human table, json ->
raw structured profile) or persistently via `ingest.profile` in
ktx.yaml. The json form emits raw milliseconds, token counts, and a
summary.headline one-line diagnosis so coding agents can parse it
directly; json wins when both env and config request profiling.
- runtime-port: RunLoopMetrics (totalMs, usage, stepCount,
stepBoundariesMs) plus onMetrics callbacks on text/object generation
- ai-sdk + claude-code runtimes: capture per-loop timing and token usage
- work-unit-executor and stages 3/4: thread metrics into trace events
- ingest-bundle.runner: time worktree / triage / clustering / index /
reconcile / squash phases and emit the profile in a finally block
(best-effort; never affects the run outcome)
- ingest-profile: new trace+transcript aggregator with table/json formatters
- config: ingest.profile flag; docs: profiling section in ktx-ingest.mdx
* fix(cli): flush tool-call logs before reading ingest profile
Tool transcripts are appended fire-and-forget so the agent hot path never
blocks on logging. The ingest profiler read them before the writes settled,
so per-work-unit toolMs (and the model-vs-tool split derived from it) could
be incomplete. Track in-flight appends and expose flushToolCallLogs() —
bounded by a timeout so it can never hang — and flush before the profiler
reads the transcript.
2026-06-01 15:49:17 +02:00
|
|
|
const startedAt = Date.now();
|
|
|
|
|
const stepBoundariesMs: number[] = [];
|
2026-05-16 12:06:34 +02:00
|
|
|
try {
|
|
|
|
|
const model = this.deps.llmProvider.getModel(params.modelRole);
|
|
|
|
|
const tools = createAiSdkToolSet(params.toolSet);
|
|
|
|
|
const builder = new KtxMessageBuilder(this.deps.llmProvider);
|
|
|
|
|
const built = builder.wrapSimple({
|
|
|
|
|
system: params.systemPrompt,
|
|
|
|
|
messages: [{ role: 'user', content: params.userPrompt }],
|
|
|
|
|
tools,
|
|
|
|
|
model,
|
|
|
|
|
});
|
|
|
|
|
const promptMessages = splitKtxSystemMessages(built.messages);
|
|
|
|
|
|
|
|
|
|
await this.deps.debugRequestRecorder?.record(
|
|
|
|
|
summarizeKtxLlmDebugRequest({
|
|
|
|
|
operationName: params.telemetryTags.operationName ?? 'ktx-agent-runner',
|
|
|
|
|
source: params.telemetryTags.source,
|
|
|
|
|
jobId: params.telemetryTags.jobId,
|
|
|
|
|
unitKey: params.telemetryTags.unitKey,
|
|
|
|
|
modelRole: params.modelRole,
|
|
|
|
|
modelId: (model as { modelId?: string }).modelId ?? params.modelRole,
|
|
|
|
|
messages: built.messages,
|
|
|
|
|
tools: built.tools as Record<string, { providerOptions?: unknown }>,
|
|
|
|
|
}),
|
|
|
|
|
);
|
|
|
|
|
|
feat(cli): add ingest LLM rate-limit governor with paced retries (#261)
* feat(cli): add ingest rate limit governor
* feat(cli): wire ingest rate-limit config
* feat(cli): report provider rate-limit signals
* feat(cli): show ingest rate-limit waits
* fix(cli): complete rate-limit event coverage
* fix(cli): abort ingest provider calls cleanly
* fix(cli): propagate ingest cancellation
* fix(cli): reject pre-aborted ingest rate-limit waits
* fix(cli): honor Claude rate-limit reset waits
* fix(cli): retry thrown Codex rate-limit failures
* fix(cli): type Claude rate-limit result details
* fix(cli): emit ingest rate-limit countdowns from rejected signals
* fix(cli): report ai sdk rate-limit header utilization
* fix(cli): gate LLM rate-limit retries on the governor budget
The AI SDK and Codex runtimes retried 429 / opaque rate-limit failures up
to 6-7 times with no backoff when constructed without a RateLimitGovernor
(scan, memory, setup) or with pacing disabled, ignoring Retry-After and
worsening the limit. The outer retry loop only cooperates with the
governor's pause, so without active pacing there is no backoff to apply.
Route the retry bound through a single source: RateLimitGovernor
.maxRetryAttempts(), which returns retry.maxAttempts when enabled and 1
(no outer retry) when absent or disabled. All three runtimes (ai-sdk,
codex, claude-code) now use it, so ingest.rateLimit.retry.maxAttempts
genuinely controls attempts and the hard-coded 6 (plus Codex's off-by-one
extra attempt) is gone. Backend-native retry (e.g. the AI SDK's maxRetries)
still handles transient 429s.
Also correct the ktx.yaml docs for maxWaitMs (caps each wait, not the whole
run) and maxAttempts, and sync uv.lock ktx-sl/ktx-daemon to 0.9.0.
2026-06-05 12:10:27 +02:00
|
|
|
const request = {
|
2026-05-16 12:06:34 +02:00
|
|
|
model,
|
|
|
|
|
temperature: 0,
|
|
|
|
|
stopWhen: stepCountIs(params.stepBudget),
|
|
|
|
|
experimental_telemetry: this.deps.telemetry?.createTelemetry(params.telemetryTags) ?? this.deps.llmProvider.telemetryConfig(),
|
|
|
|
|
experimental_repairToolCall: this.deps.llmProvider.repairToolCallHandler({
|
|
|
|
|
source: params.telemetryTags.operationName ?? 'ktx-agent-runner',
|
|
|
|
|
}),
|
|
|
|
|
...(promptMessages.system ? { system: promptMessages.system } : {}),
|
|
|
|
|
messages: promptMessages.messages,
|
|
|
|
|
tools: built.tools as ToolSet,
|
feat(cli): add ingest LLM rate-limit governor with paced retries (#261)
* feat(cli): add ingest rate limit governor
* feat(cli): wire ingest rate-limit config
* feat(cli): report provider rate-limit signals
* feat(cli): show ingest rate-limit waits
* fix(cli): complete rate-limit event coverage
* fix(cli): abort ingest provider calls cleanly
* fix(cli): propagate ingest cancellation
* fix(cli): reject pre-aborted ingest rate-limit waits
* fix(cli): honor Claude rate-limit reset waits
* fix(cli): retry thrown Codex rate-limit failures
* fix(cli): type Claude rate-limit result details
* fix(cli): emit ingest rate-limit countdowns from rejected signals
* fix(cli): report ai sdk rate-limit header utilization
* fix(cli): gate LLM rate-limit retries on the governor budget
The AI SDK and Codex runtimes retried 429 / opaque rate-limit failures up
to 6-7 times with no backoff when constructed without a RateLimitGovernor
(scan, memory, setup) or with pacing disabled, ignoring Retry-After and
worsening the limit. The outer retry loop only cooperates with the
governor's pause, so without active pacing there is no backoff to apply.
Route the retry bound through a single source: RateLimitGovernor
.maxRetryAttempts(), which returns retry.maxAttempts when enabled and 1
(no outer retry) when absent or disabled. All three runtimes (ai-sdk,
codex, claude-code) now use it, so ingest.rateLimit.retry.maxAttempts
genuinely controls attempts and the hard-coded 6 (plus Codex's off-by-one
extra attempt) is gone. Backend-native retry (e.g. the AI SDK's maxRetries)
still handles transient 429s.
Also correct the ktx.yaml docs for maxWaitMs (caps each wait, not the whole
run) and maxAttempts, and sync uv.lock ktx-sl/ktx-daemon to 0.9.0.
2026-06-05 12:10:27 +02:00
|
|
|
...(params.abortSignal ? { abortSignal: params.abortSignal } : {}),
|
2026-05-16 12:06:34 +02:00
|
|
|
onStepFinish: async () => {
|
|
|
|
|
stepIndex += 1;
|
feat(cli): profile ingest runs and split model vs tool time (#249)
* feat(cli): profile ingest runs to find where wall-clock time goes
Add opt-in profiling for `ktx ingest`. Each timed phase, work unit, and
agent loop now records durationMs / step count / token usage in the
trace, and a post-run aggregator rolls them up into a "where did the
time go" report printed to stderr.
Enable per run with KTX_PROFILE_INGEST (1/true -> human table, json ->
raw structured profile) or persistently via `ingest.profile` in
ktx.yaml. The json form emits raw milliseconds, token counts, and a
summary.headline one-line diagnosis so coding agents can parse it
directly; json wins when both env and config request profiling.
- runtime-port: RunLoopMetrics (totalMs, usage, stepCount,
stepBoundariesMs) plus onMetrics callbacks on text/object generation
- ai-sdk + claude-code runtimes: capture per-loop timing and token usage
- work-unit-executor and stages 3/4: thread metrics into trace events
- ingest-bundle.runner: time worktree / triage / clustering / index /
reconcile / squash phases and emit the profile in a finally block
(best-effort; never affects the run outcome)
- ingest-profile: new trace+transcript aggregator with table/json formatters
- config: ingest.profile flag; docs: profiling section in ktx-ingest.mdx
* fix(cli): flush tool-call logs before reading ingest profile
Tool transcripts are appended fire-and-forget so the agent hot path never
blocks on logging. The ingest profiler read them before the writes settled,
so per-work-unit toolMs (and the model-vs-tool split derived from it) could
be incomplete. Track in-flight appends and expose flushToolCallLogs() —
bounded by a timeout so it can never hang — and flush before the profiler
reads the transcript.
2026-06-01 15:49:17 +02:00
|
|
|
stepBoundariesMs.push(Date.now() - startedAt);
|
2026-05-16 12:06:34 +02:00
|
|
|
if (!params.onStepFinish) {
|
|
|
|
|
return;
|
|
|
|
|
}
|
|
|
|
|
try {
|
|
|
|
|
await params.onStepFinish({ stepIndex, stepBudget: params.stepBudget });
|
|
|
|
|
} catch (err) {
|
|
|
|
|
this.logger.warn(
|
|
|
|
|
`[agent-runner] onStepFinish callback threw; ignoring: ${
|
|
|
|
|
err instanceof Error ? err.message : String(err)
|
|
|
|
|
}`,
|
|
|
|
|
);
|
|
|
|
|
}
|
|
|
|
|
},
|
feat(cli): add ingest LLM rate-limit governor with paced retries (#261)
* feat(cli): add ingest rate limit governor
* feat(cli): wire ingest rate-limit config
* feat(cli): report provider rate-limit signals
* feat(cli): show ingest rate-limit waits
* fix(cli): complete rate-limit event coverage
* fix(cli): abort ingest provider calls cleanly
* fix(cli): propagate ingest cancellation
* fix(cli): reject pre-aborted ingest rate-limit waits
* fix(cli): honor Claude rate-limit reset waits
* fix(cli): retry thrown Codex rate-limit failures
* fix(cli): type Claude rate-limit result details
* fix(cli): emit ingest rate-limit countdowns from rejected signals
* fix(cli): report ai sdk rate-limit header utilization
* fix(cli): gate LLM rate-limit retries on the governor budget
The AI SDK and Codex runtimes retried 429 / opaque rate-limit failures up
to 6-7 times with no backoff when constructed without a RateLimitGovernor
(scan, memory, setup) or with pacing disabled, ignoring Retry-After and
worsening the limit. The outer retry loop only cooperates with the
governor's pause, so without active pacing there is no backoff to apply.
Route the retry bound through a single source: RateLimitGovernor
.maxRetryAttempts(), which returns retry.maxAttempts when enabled and 1
(no outer retry) when absent or disabled. All three runtimes (ai-sdk,
codex, claude-code) now use it, so ingest.rateLimit.retry.maxAttempts
genuinely controls attempts and the hard-coded 6 (plus Codex's off-by-one
extra attempt) is gone. Backend-native retry (e.g. the AI SDK's maxRetries)
still handles transient 429s.
Also correct the ktx.yaml docs for maxWaitMs (caps each wait, not the whole
run) and maxAttempts, and sync uv.lock ktx-sl/ktx-daemon to 0.9.0.
2026-06-05 12:10:27 +02:00
|
|
|
};
|
|
|
|
|
const result = await this.generateTextWithRateLimitRetry(modelProviderName(model), params.abortSignal, () => generateText(request));
|
feat(cli): profile ingest runs and split model vs tool time (#249)
* feat(cli): profile ingest runs to find where wall-clock time goes
Add opt-in profiling for `ktx ingest`. Each timed phase, work unit, and
agent loop now records durationMs / step count / token usage in the
trace, and a post-run aggregator rolls them up into a "where did the
time go" report printed to stderr.
Enable per run with KTX_PROFILE_INGEST (1/true -> human table, json ->
raw structured profile) or persistently via `ingest.profile` in
ktx.yaml. The json form emits raw milliseconds, token counts, and a
summary.headline one-line diagnosis so coding agents can parse it
directly; json wins when both env and config request profiling.
- runtime-port: RunLoopMetrics (totalMs, usage, stepCount,
stepBoundariesMs) plus onMetrics callbacks on text/object generation
- ai-sdk + claude-code runtimes: capture per-loop timing and token usage
- work-unit-executor and stages 3/4: thread metrics into trace events
- ingest-bundle.runner: time worktree / triage / clustering / index /
reconcile / squash phases and emit the profile in a finally block
(best-effort; never affects the run outcome)
- ingest-profile: new trace+transcript aggregator with table/json formatters
- config: ingest.profile flag; docs: profiling section in ktx-ingest.mdx
* fix(cli): flush tool-call logs before reading ingest profile
Tool transcripts are appended fire-and-forget so the agent hot path never
blocks on logging. The ingest profiler read them before the writes settled,
so per-work-unit toolMs (and the model-vs-tool split derived from it) could
be incomplete. Track in-flight appends and expose flushToolCallLogs() —
bounded by a timeout so it can never hang — and flush before the profiler
reads the transcript.
2026-06-01 15:49:17 +02:00
|
|
|
return {
|
|
|
|
|
stopReason: 'natural',
|
|
|
|
|
metrics: {
|
|
|
|
|
totalMs: Date.now() - startedAt,
|
|
|
|
|
stepCount: stepIndex,
|
|
|
|
|
stepBoundariesMs,
|
|
|
|
|
usage: toLlmTokenUsage(result.totalUsage ?? result.usage),
|
|
|
|
|
},
|
|
|
|
|
};
|
2026-05-16 12:06:34 +02:00
|
|
|
} catch (error) {
|
feat(cli): add ingest LLM rate-limit governor with paced retries (#261)
* feat(cli): add ingest rate limit governor
* feat(cli): wire ingest rate-limit config
* feat(cli): report provider rate-limit signals
* feat(cli): show ingest rate-limit waits
* fix(cli): complete rate-limit event coverage
* fix(cli): abort ingest provider calls cleanly
* fix(cli): propagate ingest cancellation
* fix(cli): reject pre-aborted ingest rate-limit waits
* fix(cli): honor Claude rate-limit reset waits
* fix(cli): retry thrown Codex rate-limit failures
* fix(cli): type Claude rate-limit result details
* fix(cli): emit ingest rate-limit countdowns from rejected signals
* fix(cli): report ai sdk rate-limit header utilization
* fix(cli): gate LLM rate-limit retries on the governor budget
The AI SDK and Codex runtimes retried 429 / opaque rate-limit failures up
to 6-7 times with no backoff when constructed without a RateLimitGovernor
(scan, memory, setup) or with pacing disabled, ignoring Retry-After and
worsening the limit. The outer retry loop only cooperates with the
governor's pause, so without active pacing there is no backoff to apply.
Route the retry bound through a single source: RateLimitGovernor
.maxRetryAttempts(), which returns retry.maxAttempts when enabled and 1
(no outer retry) when absent or disabled. All three runtimes (ai-sdk,
codex, claude-code) now use it, so ingest.rateLimit.retry.maxAttempts
genuinely controls attempts and the hard-coded 6 (plus Codex's off-by-one
extra attempt) is gone. Backend-native retry (e.g. the AI SDK's maxRetries)
still handles transient 429s.
Also correct the ktx.yaml docs for maxWaitMs (caps each wait, not the whole
run) and maxAttempts, and sync uv.lock ktx-sl/ktx-daemon to 0.9.0.
2026-06-05 12:10:27 +02:00
|
|
|
if (isAbortError(error)) {
|
|
|
|
|
throw error;
|
|
|
|
|
}
|
2026-05-16 12:06:34 +02:00
|
|
|
const err = error instanceof Error ? error : new Error(String(error));
|
|
|
|
|
this.logger.warn(`[agent-runner] loop failed: ${err.message}`);
|
feat(cli): profile ingest runs and split model vs tool time (#249)
* feat(cli): profile ingest runs to find where wall-clock time goes
Add opt-in profiling for `ktx ingest`. Each timed phase, work unit, and
agent loop now records durationMs / step count / token usage in the
trace, and a post-run aggregator rolls them up into a "where did the
time go" report printed to stderr.
Enable per run with KTX_PROFILE_INGEST (1/true -> human table, json ->
raw structured profile) or persistently via `ingest.profile` in
ktx.yaml. The json form emits raw milliseconds, token counts, and a
summary.headline one-line diagnosis so coding agents can parse it
directly; json wins when both env and config request profiling.
- runtime-port: RunLoopMetrics (totalMs, usage, stepCount,
stepBoundariesMs) plus onMetrics callbacks on text/object generation
- ai-sdk + claude-code runtimes: capture per-loop timing and token usage
- work-unit-executor and stages 3/4: thread metrics into trace events
- ingest-bundle.runner: time worktree / triage / clustering / index /
reconcile / squash phases and emit the profile in a finally block
(best-effort; never affects the run outcome)
- ingest-profile: new trace+transcript aggregator with table/json formatters
- config: ingest.profile flag; docs: profiling section in ktx-ingest.mdx
* fix(cli): flush tool-call logs before reading ingest profile
Tool transcripts are appended fire-and-forget so the agent hot path never
blocks on logging. The ingest profiler read them before the writes settled,
so per-work-unit toolMs (and the model-vs-tool split derived from it) could
be incomplete. Track in-flight appends and expose flushToolCallLogs() —
bounded by a timeout so it can never hang — and flush before the profiler
reads the transcript.
2026-06-01 15:49:17 +02:00
|
|
|
return {
|
|
|
|
|
stopReason: 'error',
|
|
|
|
|
error: err,
|
|
|
|
|
metrics: { totalMs: Date.now() - startedAt, stepCount: stepIndex, stepBoundariesMs, usage: {} },
|
|
|
|
|
};
|
2026-05-16 12:06:34 +02:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|