feat(cli): profile ingest runs and split model vs tool time (#249)

* feat(cli): profile ingest runs to find where wall-clock time goes

  Add opt-in profiling for `ktx ingest`. Each timed phase, work unit, and agent loop now records durationMs / step count / token usage in the trace, and a post-run aggregator rolls them up into a "where did the time go" report printed to stderr.

  Enable per run with KTX_PROFILE_INGEST (1/true -> human table, json -> raw structured profile) or persistently via `ingest.profile` in ktx.yaml. The json form emits raw milliseconds, token counts, and a summary.headline one-line diagnosis so coding agents can parse it directly; json wins when both env and config request profiling.

  - runtime-port: RunLoopMetrics (totalMs, usage, stepCount, stepBoundariesMs) plus onMetrics callbacks on text/object generation
  - ai-sdk + claude-code runtimes: capture per-loop timing and token usage
  - work-unit-executor and stages 3/4: thread metrics into trace events
  - ingest-bundle.runner: time worktree / triage / clustering / index / reconcile / squash phases and emit the profile in a finally block (best-effort; never affects the run outcome)
  - ingest-profile: new trace+transcript aggregator with table/json formatters
  - config: ingest.profile flag; docs: profiling section in ktx-ingest.mdx

* fix(cli): flush tool-call logs before reading ingest profile

  Tool transcripts are appended fire-and-forget so the agent hot path never blocks on logging. The ingest profiler read them before the writes settled, so per-work-unit toolMs (and the model-vs-tool split derived from it) could be incomplete. Track in-flight appends and expose flushToolCallLogs(), bounded by a timeout so it can never hang, and flush before the profiler reads the transcript.
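
The flush fix described above can be illustrated with a minimal sketch. This is not the code from the commit: only the name `flushToolCallLogs` comes from the message, while `trackAppend`, the `inFlight` set, the default timeout, and the boolean return value are assumptions made for illustration. The idea is to remember every fire-and-forget append, then race "all appends settled" against a timer so the flush can never hang.

```typescript
// Hypothetical sketch; everything except the name flushToolCallLogs is assumed.
const inFlight = new Set<Promise<void>>();

/** Register a fire-and-forget append so a later flush can wait for it. */
function trackAppend(write: Promise<void>): void {
  // Swallow failures: logging errors must never reach the agent hot path.
  const settled: Promise<void> = write
    .catch(() => undefined)
    .finally(() => {
      inFlight.delete(settled);
    });
  inFlight.add(settled);
}

/**
 * Wait for pending appends, but never longer than timeoutMs.
 * Resolves true if everything settled, false if the timeout won.
 */
async function flushToolCallLogs(timeoutMs = 2000): Promise<boolean> {
  if (inFlight.size === 0) return true;

  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<boolean>((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });
  // Tracked promises never reject (see trackAppend), so Promise.all is safe.
  const drained = Promise.all([...inFlight]).then(() => true);

  try {
    return await Promise.race([drained, timedOut]);
  } finally {
    clearTimeout(timer); // do not keep the process alive after the race
  }
}
```

Under these assumptions, a profiler would call `await flushToolCallLogs()` once, just before reading the transcript, and treat a `false` result as "toolMs may be incomplete" rather than as an error.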
2026-06-13 08:15:14 +02:00 · 2026-06-01 15:49:17 +02:00 · 2026-06-01 15:49:17 +02:00 · 21744fc520
commit 21744fc520
parent 22ddf5524c
20 changed files with 1243 additions and 56 deletions
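
The precedence rule from the commit message ("json wins when both env and config request profiling") can be sketched as a small resolver. This is an illustrative assumption, not the code in the changed files: `ProfileMode`, `fromEnv`, `fromConfig`, and `resolveProfileMode` are hypothetical names, and treating unrecognized values as "off" is a guess about behavior the message does not specify.

```typescript
// Hypothetical sketch of the env/config precedence; all names are assumed.
type ProfileMode = "off" | "table" | "json";

/** Interpret KTX_PROFILE_INGEST: 1/true -> table, json -> json. */
function fromEnv(raw: string | undefined): ProfileMode {
  if (raw === "json") return "json";
  if (raw === "1" || raw === "true") return "table";
  return "off"; // assumption: unset or unrecognized values disable profiling
}

/** Interpret ingest.profile from ktx.yaml: true -> table, "json" -> json. */
function fromConfig(value: boolean | "json" | undefined): ProfileMode {
  if (value === "json") return "json";
  return value === true ? "table" : "off";
}

/** Combine both sources; json wins whenever either source asks for it. */
function resolveProfileMode(
  env: string | undefined,
  config: boolean | "json" | undefined,
): ProfileMode {
  const modes: ProfileMode[] = [fromEnv(env), fromConfig(config)];
  if (modes.includes("json")) return "json";
  if (modes.includes("table")) return "table";
  return "off";
}
```

With this shape, `resolveProfileMode("1", "json")` and `resolveProfileMode("json", true)` both yield `"json"`, which matches the documented rule regardless of which side requested the table form.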
--- a/docs-site/content/docs/cli-reference/ktx-ingest.mdx
+++ b/docs-site/content/docs/cli-reference/ktx-ingest.mdx
@@ -143,6 +143,42 @@ verbosity:
 KTX_INGEST_TRACE_LEVEL=trace ktx ingest metabase
 ```

+### Profiling a slow ingest
+
+Each timed phase and work unit records a `durationMs` in the trace, and each
+agent loop records its step count and token usage. To see where wall-clock time
+went, enable profiling and **ktx** prints a rolled-up breakdown to stderr at the
+end of the run. There are two ways to turn it on, and two output formats.
+
+Turn it on per run with the `KTX_PROFILE_INGEST` environment variable, or
+persistently with `ingest.profile` in `ktx.yaml` (useful for CI or while
+iterating on a slow source):
+
+```bash
+KTX_PROFILE_INGEST=1 ktx ingest metabase       # human-readable table
+KTX_PROFILE_INGEST=json ktx ingest metabase    # raw JSON for coding agents
+```
+
+```yaml
+ingest:
+  profile: true   # human table; use "json" for the machine-readable form
+```
+
+Both formats report total wall time, time per phase, and the slowest work units,
+splitting each work unit's agent-loop time into model time versus tool-execution
+time. The `json` form emits the full structured profile (raw milliseconds and
+token counts, stable keys) plus a `summary.headline` one-line diagnosis, so a
+coding agent can parse it directly instead of scraping the table. If both the env
+var and the config request profiling, `json` wins. Example headline:
+
+```text
+Slowest phase: reconciliation (2m 05s, 48% of wall time). 2 work units (1 failed), ~88% model generation vs ~12% tools.
+```
+
+Work units run serially by default (`ingest.workUnits.maxConcurrency` is `1`);
+raise it in `ktx.yaml` if the profile shows the run is bound by serialized
+work-unit agent loops.
+
 ## Common errors

 | Error | Cause | Recovery |