apunkt/ktx - bitfreedom.net: free all bits, everywhere

apunkt/ktx

mirror of https://github.com/Kaelio/ktx.git synced 2026-06-07 07:55:13 +02:00

Author	SHA1	Message	Date
Andrey Avtomonov	494618ab14	feat: add codex llm backend for ktx runtime work (#253 ) * feat: add codex sdk runner foundation * feat: parse codex runtime events * feat: expose codex runtime mcp tools * feat: add codex llm runtime * feat: wire codex llm backend * test: avoid Array.fromAsync in codex runner test * docs: document codex llm backend * fix: tighten codex runtime config ownership * fix: use codex sdk env and thread options * fix: parse codex sdk event shapes * test: add codex backend live smoke * docs: clarify codex backend isolation * fix: drive codex loop metrics from mcp events * fix: enforce codex local step budget * docs: disclose codex isolation limits * fix: count all codex agent steps and stream step callbacks live The agent-loop step budget only counted completed mcp_tool_call items, so built-in command_execution steps (which the public Codex SDK/CLI surface can still expose) never decremented the budget, letting ingest/reconciliation run past stepBudget until Codex stopped on its own. onStepFinish was also replayed only after the whole stream drained, so live work_unit_step / reconciliation progress appeared stuck until the Codex process exited. collectEvents is now the single live step accumulator: it counts every completed agent-action item via a shared isCompletedAgentStep predicate (command_execution, mcp_tool_call, file_change, web_search), fires onStepFinish as each step completes, and enforces the budget on that broader count. A no-tool turn still counts as one step. toolFailures stays MCP-specific, since a non-zero command exit is normal agent exploration, not a loop failure. * test: align ingest llm-guard assertions with codex backend The skip-llm ingest guard message now lists codex as a valid backend and mentions a Claude Code/Codex session plus a codex setup hint, but this slow suite test still asserted the pre-codex wording. Update it to match the production message (already covered by the local-bundle-runtime unit test) and add the codex setup-line assertion. * fix: treat codex error:null tool calls as success The Codex SDK serializes error: null on successful mcp_tool_call items, so the failure check (item.error !== undefined) flagged every successful tool call as failed with the empty-payload default "Codex turn failed". This killed every ingest work unit under the codex backend before it could produce a patch. Key on status === 'failed' (authoritative, always set) and only treat a populated error object as a failure. Add a regression test built from a verbatim real-SDK event capture. * fix: default codex backend to gpt-5.5 and report real probe errors The previous default gpt-5.3-codex is an API-key-only model that the OpenAI API rejects under ChatGPT-account (subscription) auth, so codex status/setup failed with a misleading "authentication is not usable" message even though auth was fine. - Default codex model is now gpt-5.5 (works on both subscription and API-key auth); the curated setup picker offers gpt-5.5 / gpt-5.4 / gpt-5.4-mini and keeps free-form entry for account-specific ids (e.g. gpt-5.3-codex-spark). - runCodexAuthProbe now distinguishes "model not available" from an auth failure and surfaces the real API error: collectEvents retains stream events when the SDK throws on a non-zero exit, and the API error JSON envelope is unwrapped to its human-readable message. - The Codex isolation warning now renders inside the clack setup frame. - Docs updated to gpt-5.5 with a note that -codex ids require API-key auth. fix: require llm.models.default in status and match codex probe remediation Status reported a project ready when a non-none LLM backend was configured without llm.models.default, but the runtime (resolveModelSlots) hard-requires it, so ingest/scan/memory threw after `ktx status` said the project was usable. buildLlmStatus now fails for any non-none backend missing models.default and no longer invents a fallback model for claude-code/codex. Codex probe failures now carry a category-matched fix: a model-access failure steers the user at llm.models.default instead of the auth/install remediation. runCodexAuthProbe returns the fix and status consumes it; the message stays self-sufficient so setup output is unchanged. Docs: README now lists the codex backend and local Codex auth; ktx-setup.mdx states --llm-model only accepts codex/default or gpt-/codex- ids. Repaired four doctor fixtures that configured a backend without models.default (the now-correctly-blocked config) and added coverage for the new behavior.	2026-06-02 13:57:11 +02:00
Andrey Avtomonov	21744fc520	feat(cli): profile ingest runs and split model vs tool time (#249 ) * feat(cli): profile ingest runs to find where wall-clock time goes Add opt-in profiling for `ktx ingest`. Each timed phase, work unit, and agent loop now records durationMs / step count / token usage in the trace, and a post-run aggregator rolls them up into a "where did the time go" report printed to stderr. Enable per run with KTX_PROFILE_INGEST (1/true -> human table, json -> raw structured profile) or persistently via `ingest.profile` in ktx.yaml. The json form emits raw milliseconds, token counts, and a summary.headline one-line diagnosis so coding agents can parse it directly; json wins when both env and config request profiling. - runtime-port: RunLoopMetrics (totalMs, usage, stepCount, stepBoundariesMs) plus onMetrics callbacks on text/object generation - ai-sdk + claude-code runtimes: capture per-loop timing and token usage - work-unit-executor and stages 3/4: thread metrics into trace events - ingest-bundle.runner: time worktree / triage / clustering / index / reconcile / squash phases and emit the profile in a finally block (best-effort; never affects the run outcome) - ingest-profile: new trace+transcript aggregator with table/json formatters - config: ingest.profile flag; docs: profiling section in ktx-ingest.mdx * fix(cli): flush tool-call logs before reading ingest profile Tool transcripts are appended fire-and-forget so the agent hot path never blocks on logging. The ingest profiler read them before the writes settled, so per-work-unit toolMs (and the model-vs-tool split derived from it) could be incomplete. Track in-flight appends and expose flushToolCallLogs() — bounded by a timeout so it can never hang — and flush before the profiler reads the transcript.	2026-06-01 15:49:17 +02:00
Andrey Avtomonov	d320d54ab2	feat(cli): shell completion for commands, flags, and entity names (#244 ) * feat(completion): complete known argument values * fix(completion): hide Commander-hidden subcommands from completions Replace the `__`-prefix name heuristic with Commander's `_hidden` flag so internal subcommands registered with { hidden: true } (e.g. `mcp serve-internal`) are excluded from completions, mirroring `ktx --help`. * test: cover wiki and sl read command routing * test: cover raw wiki and sl reads * feat: add wiki read command * feat: add sl read command * feat: complete read command entity names * docs: document wiki and sl read commands * test: include read commands in command tree * feat(sl): read and validate unique sources by name * feat(sl): make read and validate connection id optional * fix(completion): dedupe semantic source names * docs(sl): document connection-optional read and validate * fix(sl): require connection id for query command * docs(sl): clarify query connection requirement * fix(completion): don't resolve option values as subcommands resolveCommand skipped flag tokens but not the value consumed by a value-taking option in the `--flag value` form, so a connection id like `query` was matched as the `sl query` subcommand and yielded no `sl` completions. Track value-taking options and skip their consumed value before matching subcommands. * test(telemetry): assert first-run notice via TELEMETRY_NOTICE constant CI (which tests this branch merged with main) failed because #243 changed the first-run notice wording in identity.ts (dropped "anonymous") but left this test grepping for the old literal 'ktx collects anonymous usage data', so indexOf returned -1. Assert against the exported TELEMETRY_NOTICE constant instead so the test tracks the source of truth and cannot drift when the notice text changes again.	2026-05-31 23:44:33 +02:00
Andrey Avtomonov	2e5f7f25aa	feat: report MCP client telemetry (#242 )	2026-05-30 18:00:25 +02:00
Andrey Avtomonov	25f639fba2	feat: trim MCP query response payloads (#240 )	2026-05-30 17:54:24 +02:00
Andrey Avtomonov	53a6f8d111	fix(cli): treat artifact-producing ingests with failures as partial (#238 ) * fix(cli): derive ingest outcomes from saved artifacts * fix(cli): treat artifact-producing ingests with failures as partial * fix(cli): route memory-flow run status through shared ingest outcome * fix(cli): treat partial ingest as saved context in setup status * test(cli): align memory-flow replay expectations with partial ingests	2026-05-30 00:42:59 +02:00
Andrey Avtomonov	6837ab253d	fix(cli): align ingest step counter with SDK num_turns (#225 ) The Claude Code runtime counted every SDKAssistantMessage with parent_tool_use_id === null as a step, but the SDK emits extra messages within a single num_turns round-trip — `stop_reason: 'pause_turn'` continuations and errored partials it retries internally. The local counter then outran maxTurns and the ingest HUD rendered confusing ratios like `step 69/40`. Filter both cases in collectResult so stepIndex tracks num_turns and stays bounded by the work-unit stepBudget.	2026-05-28 02:09:53 +02:00
Andrey Avtomonov	1071f9d1c9	fix(ingest): attribute historic-sql evidence writes in bundle report (#220 ) The emit_historic_sql_evidence tool took rawPath as LLM-supplied input, so projection actions frequently lacked defensible raw paths and every row in bundle_ingest_reports fell through as actionType: 'skipped' with null artifact metadata, hiding the wiki pages and SL merges the run had actually produced (KLO-698). The tool now reads the work unit's rawFiles from session.allowedRawPaths and stores them on the evidence envelope; the projection emits actions with those paths, and stale/archive actions are anchored to manifest.json so they also surface as non-skipped provenance rows.	2026-05-26 12:21:53 +02:00
Andrey Avtomonov	56985b7e09	test: split cli tests from source tree (#216 ) * feat(cli): define full warehouse dialect contract * test(cli): keep dialect edge tests focused * fix(cli): stabilize dialect contract foundation * refactor(connectors): own read-only query preparation * refactor(connectors): resolve dialects through registry * refactor(connectors): keep concrete dialect classes internal * chore(workspace): enforce dialect import boundary * refactor(cli): resolve relationship dialect at scan boundary * refactor(cli): use dialect display parsing for entity details * refactor(cli): use dialect display parsing for warehouse catalog * refactor(cli): use dialect SQL in relationship workflows * test(cli): verify solid dialect scan workflow closure * test: split cli tests from source tree * refactor(cli): standardize BigQuery scope listing * feat(sqlite): implement connector scope listing * test(connectors): cover required table listing * feat(cli): add warehouse driver registry * refactor(setup): route scope discovery through driver registry * refactor(cli): route local query execution through driver registry * refactor(historic-sql): route dialect support through driver registry * refactor(cli): test warehouse connections through driver registry * fix(cli): close driver registry type export gaps * Improve setup daemon diagnostics * refactor(setup): centralize rail-prefixed diagnostics + query-history fallback Extract errorMessage, writePrefixedLines, and flushPrefixedBufferedCommandOutput into clack.ts so the setup wizard, managed daemons, and embedding/agent steps share one rail-formatted writer. setup-databases.ts also adds a "disable query history and retry" option when the schema-context build fails and query history is the likely culprit, surfaced via a new failed-query-history-unavailable status. * fix(cli): carry catalog through the picker so BigQuery/Snowflake/SQL Server scope filters match The setup picker's KtxTableListEntry was a 2-level { schema, name }, so qualifiedTableId always wrote db.name into enabled_tables. When BigQuery, Snowflake, or SQL Server later ran fast ingest, their introspect step filtered the scope set with scopedTableNames(scope, { catalog: projectId\|database, db }) — catalog was non-null on the introspect side but null in the scope refs, so every entry was rejected, the live-database adapter staged zero table files, and detect() failed with 'Adapter "live-database" did not recognize fetched source output'. Align the picker boundary with the canonical 3-level KtxTableRef: - Add catalog: string \| null to KtxTableListEntry. - BigQuery/Snowflake/SQL Server listTables populate catalog from the resolved projectId / database; Postgres/MySQL/ClickHouse/SQLite set null. - qualifiedTableId emits catalog.schema.name when catalog is non-null (resolveEnabledTables already accepts the 3-part shape) and schemasFromEnabledTables now goes through parseDottedTableEntry so it recovers the schema correctly from both 2-part and 3-part entries. - Export parseDottedTableEntry from enabled-tables.ts (@internal) for picker reuse. Update listTables expectations in all seven connector tests and the setup / picker test fixtures. Add a picker regression test that covers the catalog-bearing round-trip (save + refine). * fix(cli): allow debug telemetry under opt-out env	2026-05-26 08:49:05 +02:00

9 commits