Make telemetry reliable across interrupts and headless installs

Three reliability gaps surfaced while auditing why PostHog numbers were
untrustworthy:

1. Interrupted commands lost their events. capture() is fire-and-forget and the
   only flush guarantee lived in a finally block, which SIGINT/SIGTERM skip — so
   Ctrl-C'ing a long ingest or an MCP client killing 'ktx mcp stdio' dropped the
   command event and any queued events. Add SIGINT/SIGTERM handlers (real-process
   entry only; never under test/programmatic io) that mark the active command
   span aborted, emit it, drain the emitter, then exit. Idempotent with the
   normal finally path via the single-consume command span.

2. Headless-first installs were invisible. loadTelemetryIdentity refused to mint
   an installId unless stdout was a TTY, so a machine whose first run was an
   IDE-launched MCP server or a script emitted nothing, ever. Mint on first run
   regardless of surface (still honoring CI/DO_NOT_TRACK/KTX_TELEMETRY_DISABLED),
   writing the one-time notice to stderr — safe under the MCP stdio protocol,
   which reserves stdout. Drop the now-unused stdoutIsTTY option.

3. No guard against silent emit regressions (the 0.7.0 scan_completed blackout).
   Add tests: the shared executePublicIngestTarget chokepoint emits exactly one
   ingest_completed on success and on the preflight-failure branch, and a
   database target invokes the scan that emits scan_completed; plus coverage for
   the aborted-flush helper.

Identity is unchanged otherwise: every event still attributes to the installId
in ~/.ktx/telemetry.json. No event/field changes, so Node<->Python schema parity
is untouched. Docs updated to reflect first-run-on-any-surface activation.
This commit is contained in:
Andrey Avtomonov 2026-06-02 23:19:37 +02:00
parent 2334a4b6e3
commit cb6a67c2d7
7 changed files with 219 additions and 66 deletions

View file

@ -6,11 +6,10 @@ description: Understand what usage telemetry ktx collects and how to opt out.
**ktx** collects aggregated usage telemetry so maintainers can see
which commands work, where setup fails, and which parts of the data-agent
workflow need improvement. Telemetry is opt-out: it turns on the first time you
run **ktx** in an interactive terminal, which prints a one-time notice. From
then on the same install also reports background activity that has no terminal
of its own, such as the local MCP server your agent calls. It stays disabled in
CI, whenever an opt-out is set, and until that first interactive run has shown
the notice.
run **ktx** in any way — an interactive command, a script, or an
agent-launched MCP server — and prints a one-time notice (to the terminal when
there is one, otherwise to standard error). It stays disabled in CI and whenever
an opt-out is set.
## Opt out