* feat: add codex sdk runner foundation
* feat: parse codex runtime events
* feat: expose codex runtime mcp tools
* feat: add codex llm runtime
* feat: wire codex llm backend
* test: avoid Array.fromAsync in codex runner test
* docs: document codex llm backend
* fix: tighten codex runtime config ownership
* fix: use codex sdk env and thread options
* fix: parse codex sdk event shapes
* test: add codex backend live smoke
* docs: clarify codex backend isolation
* fix: drive codex loop metrics from mcp events
* fix: enforce codex local step budget
* docs: disclose codex isolation limits
* fix: count all codex agent steps and stream step callbacks live
The agent-loop step budget only counted completed mcp_tool_call items, so
built-in command_execution steps (which the public Codex SDK/CLI surface can
still expose) never decremented the budget, letting ingest/reconciliation run
past stepBudget until Codex stopped on its own. onStepFinish was also replayed
only after the whole stream drained, so live work_unit_step / reconciliation
progress appeared stuck until the Codex process exited.
collectEvents is now the single live step accumulator: it counts every
completed agent-action item via a shared isCompletedAgentStep predicate
(command_execution, mcp_tool_call, file_change, web_search), fires onStepFinish
as each step completes, and enforces the budget on that broader count. A
no-tool turn still counts as one step. toolFailures stays MCP-specific, since a
non-zero command exit is normal agent exploration, not a loop failure.
* test: align ingest llm-guard assertions with codex backend
The skip-llm ingest guard message now lists codex as a valid backend and
mentions a Claude Code/Codex session plus a codex setup hint, but this slow
suite test still asserted the pre-codex wording. Update it to match the
production message (already covered by the local-bundle-runtime unit test) and
add the codex setup-line assertion.
* fix: treat codex error:null tool calls as success
The Codex SDK serializes error: null on successful mcp_tool_call items, so
the failure check (item.error !== undefined) flagged every successful tool
call as failed with the empty-payload default "Codex turn failed". This
killed every ingest work unit under the codex backend before it could
produce a patch.
Key on status === 'failed' (authoritative, always set) and only treat a
populated error object as a failure. Add a regression test built from a
verbatim real-SDK event capture.
* fix: default codex backend to gpt-5.5 and report real probe errors
The previous default gpt-5.3-codex is an API-key-only model that the OpenAI
API rejects under ChatGPT-account (subscription) auth, so codex status/setup
failed with a misleading "authentication is not usable" message even though
auth was fine.
- Default codex model is now gpt-5.5 (works on both subscription and API-key
auth); the curated setup picker offers gpt-5.5 / gpt-5.4 / gpt-5.4-mini and
keeps free-form entry for account-specific ids (e.g. gpt-5.3-codex-spark).
- runCodexAuthProbe now distinguishes "model not available" from an auth
failure and surfaces the real API error: collectEvents retains stream
events when the SDK throws on a non-zero exit, and the API error JSON
envelope is unwrapped to its human-readable message.
- The Codex isolation warning now renders inside the clack setup frame.
- Docs updated to gpt-5.5 with a note that *-codex ids require API-key auth.
* fix: require llm.models.default in status and match codex probe remediation
Status reported a project ready when a non-none LLM backend was configured
without llm.models.default, but the runtime (resolveModelSlots) hard-requires
it, so ingest/scan/memory threw after `ktx status` said the project was usable.
buildLlmStatus now fails for any non-none backend missing models.default and no
longer invents a fallback model for claude-code/codex.
Codex probe failures now carry a category-matched fix: a model-access failure
steers the user at llm.models.default instead of the auth/install remediation.
runCodexAuthProbe returns the fix and status consumes it; the message stays
self-sufficient so setup output is unchanged.
Docs: README now lists the codex backend and local Codex auth; ktx-setup.mdx
states --llm-model only accepts codex/default or gpt-*/codex-* ids.
Repaired four doctor fixtures that configured a backend without models.default
(the now-correctly-blocked config) and added coverage for the new behavior.
* feat(cli): profile ingest runs to find where wall-clock time goes
Add opt-in profiling for `ktx ingest`. Each timed phase, work unit, and
agent loop now records durationMs / step count / token usage in the
trace, and a post-run aggregator rolls them up into a "where did the
time go" report printed to stderr.
Enable per run with KTX_PROFILE_INGEST (1/true -> human table, json ->
raw structured profile) or persistently via `ingest.profile` in
ktx.yaml. The json form emits raw milliseconds, token counts, and a
summary.headline one-line diagnosis so coding agents can parse it
directly; json wins when both env and config request profiling.
- runtime-port: RunLoopMetrics (totalMs, usage, stepCount,
stepBoundariesMs) plus onMetrics callbacks on text/object generation
- ai-sdk + claude-code runtimes: capture per-loop timing and token usage
- work-unit-executor and stages 3/4: thread metrics into trace events
- ingest-bundle.runner: time worktree / triage / clustering / index /
reconcile / squash phases and emit the profile in a finally block
(best-effort; never affects the run outcome)
- ingest-profile: new trace+transcript aggregator with table/json formatters
- config: ingest.profile flag; docs: profiling section in ktx-ingest.mdx
* fix(cli): flush tool-call logs before reading ingest profile
Tool transcripts are appended fire-and-forget so the agent hot path never
blocks on logging. The ingest profiler read them before the writes settled,
so per-work-unit toolMs (and the model-vs-tool split derived from it) could
be incomplete. Track in-flight appends and expose flushToolCallLogs() —
bounded by a timeout so it can never hang — and flush before the profiler
reads the transcript.
* feat(completion): complete known argument values
* fix(completion): hide Commander-hidden subcommands from completions
Replace the `__`-prefix name heuristic with Commander's `_hidden` flag so
internal subcommands registered with { hidden: true } (e.g. `mcp serve-internal`)
are excluded from completions, mirroring `ktx --help`.
* test: cover wiki and sl read command routing
* test: cover raw wiki and sl reads
* feat: add wiki read command
* feat: add sl read command
* feat: complete read command entity names
* docs: document wiki and sl read commands
* test: include read commands in command tree
* feat(sl): read and validate unique sources by name
* feat(sl): make read and validate connection id optional
* fix(completion): dedupe semantic source names
* docs(sl): document connection-optional read and validate
* fix(sl): require connection id for query command
* docs(sl): clarify query connection requirement
* fix(completion): don't resolve option values as subcommands
resolveCommand skipped flag tokens but not the value consumed by a
value-taking option in the `--flag value` form, so a connection id like
`query` was matched as the `sl query` subcommand and yielded no `sl`
completions. Track value-taking options and skip their consumed value
before matching subcommands.
* test(telemetry): assert first-run notice via TELEMETRY_NOTICE constant
CI (which tests this branch merged with main) failed because #243 changed
the first-run notice wording in identity.ts (dropped "anonymous") but left
this test grepping for the old literal 'ktx collects anonymous usage data',
so indexOf returned -1. Assert against the exported TELEMETRY_NOTICE
constant instead so the test tracks the source of truth and cannot drift
when the notice text changes again.
* fix(cli): derive ingest outcomes from saved artifacts
* fix(cli): treat artifact-producing ingests with failures as partial
* fix(cli): route memory-flow run status through shared ingest outcome
* fix(cli): treat partial ingest as saved context in setup status
* test(cli): align memory-flow replay expectations with partial ingests
The Claude Code runtime counted every SDKAssistantMessage with
parent_tool_use_id === null as a step, but the SDK emits extra messages
within a single num_turns round-trip — `stop_reason: 'pause_turn'`
continuations and errored partials it retries internally. The local
counter then outran maxTurns and the ingest HUD rendered confusing
ratios like `step 69/40`.
Filter both cases in collectResult so stepIndex tracks num_turns and
stays bounded by the work-unit stepBudget.
The emit_historic_sql_evidence tool took rawPath as LLM-supplied input,
so projection actions frequently lacked defensible raw paths and every
row in bundle_ingest_reports fell through as actionType: 'skipped' with
null artifact metadata, hiding the wiki pages and SL merges the run had
actually produced (KLO-698).
The tool now reads the work unit's rawFiles from session.allowedRawPaths
and stores them on the evidence envelope; the projection emits actions
with those paths, and stale/archive actions are anchored to manifest.json
so they also surface as non-skipped provenance rows.
* feat(cli): define full warehouse dialect contract
* test(cli): keep dialect edge tests focused
* fix(cli): stabilize dialect contract foundation
* refactor(connectors): own read-only query preparation
* refactor(connectors): resolve dialects through registry
* refactor(connectors): keep concrete dialect classes internal
* chore(workspace): enforce dialect import boundary
* refactor(cli): resolve relationship dialect at scan boundary
* refactor(cli): use dialect display parsing for entity details
* refactor(cli): use dialect display parsing for warehouse catalog
* refactor(cli): use dialect SQL in relationship workflows
* test(cli): verify solid dialect scan workflow closure
* test: split cli tests from source tree
* refactor(cli): standardize BigQuery scope listing
* feat(sqlite): implement connector scope listing
* test(connectors): cover required table listing
* feat(cli): add warehouse driver registry
* refactor(setup): route scope discovery through driver registry
* refactor(cli): route local query execution through driver registry
* refactor(historic-sql): route dialect support through driver registry
* refactor(cli): test warehouse connections through driver registry
* fix(cli): close driver registry type export gaps
* Improve setup daemon diagnostics
* refactor(setup): centralize rail-prefixed diagnostics + query-history fallback
Extract errorMessage, writePrefixedLines, and flushPrefixedBufferedCommandOutput
into clack.ts so the setup wizard, managed daemons, and embedding/agent steps
share one rail-formatted writer. setup-databases.ts also adds a
"disable query history and retry" option when the schema-context build fails
and query history is the likely culprit, surfaced via a new
failed-query-history-unavailable status.
* fix(cli): carry catalog through the picker so BigQuery/Snowflake/SQL Server scope filters match
The setup picker's KtxTableListEntry was a 2-level { schema, name }, so
qualifiedTableId always wrote db.name into enabled_tables. When BigQuery,
Snowflake, or SQL Server later ran fast ingest, their introspect step filtered
the scope set with scopedTableNames(scope, { catalog: projectId|database, db })
— catalog was non-null on the introspect side but null in the scope refs, so
every entry was rejected, the live-database adapter staged zero table files,
and detect() failed with 'Adapter "live-database" did not recognize fetched
source output'.
Align the picker boundary with the canonical 3-level KtxTableRef:
- Add catalog: string | null to KtxTableListEntry.
- BigQuery/Snowflake/SQL Server listTables populate catalog from the
resolved projectId / database; Postgres/MySQL/ClickHouse/SQLite set null.
- qualifiedTableId emits catalog.schema.name when catalog is non-null
(resolveEnabledTables already accepts the 3-part shape) and
schemasFromEnabledTables now goes through parseDottedTableEntry so it
recovers the schema correctly from both 2-part and 3-part entries.
- Export parseDottedTableEntry from enabled-tables.ts (@internal) for picker
reuse.
Update listTables expectations in all seven connector tests and the setup /
picker test fixtures. Add a picker regression test that covers the
catalog-bearing round-trip (save + refine).
* fix(cli): allow debug telemetry under opt-out env