apunkt/ktx - bitfreedom.net: free all bits, everywhere

mirror of https://github.com/Kaelio/ktx.git synced 2026-06-22 08:38:08 +02:00

Author	SHA1	Message	Date
Kevin Messiaen	6c815ef529	feat(duckdb): cross-database federation via derived DuckDB connection (#295 ) * feat(duckdb): add @duckdb/node-api dependency for federation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(connectors): extract resolveStringReference to shared module Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(connectors): route all identical connectors through shared resolveStringReference Collapse the 5 remaining private copies in bigquery, clickhouse, mysql, snowflake, and sqlserver into the shared module. Fix a latent bug in the shared module where `~/path` was incorrectly sliced (dropping only `~`, leaving the leading `/` and making resolve() ignore homedir). Add a tilde-expansion test that caught the bug and now covers that branch. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(sl): reserve _ktx_ connection-id prefix for virtual connections Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(connections): derive virtual federated connection from compatible members Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(duckdb): federated executor builds READ_ONLY attaches and runs SQL Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(duckdb): close federated DuckDB instance and escape quotes in attach url Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(sl): union member source directories for _ktx_federated Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(query): route _ktx_federated through DuckDB executor Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(sl): use duckdb dialect for federated query compilation Bypass assertSafeConnectionId for _ktx_federated in resolveLocalConnectionId and loadComputableSources, and resolve the compute dialect to 'duckdb' when connectionId is FEDERATED_CONNECTION_ID instead of falling through to the default postgres lookup. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(duckdb): end-to-end cross-catalog federated join Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(duckdb): harden federated join test with multi-book join-key coverage Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(ingest): keep declared cross-DB joins to federated siblings Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(setup): surface federated connection availability after adding a member Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore(setup): mark federationNoticeFor @internal for dead-code gate Also marks attachTypeForDriver, buildAttachStatements, and isReservedConnectionId @internal — all three are exported solely for unit-test access with no production cross-file consumer. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs(concepts): document cross-database federation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs(concepts): correct sqlite two-part naming in federation doc Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(duckdb): quote federated catalog alias so hyphenated connection ids attach * refactor(duckdb): single-source federation driver list, dedup attach loads Collapse the parallel ATTACH_COMPATIBLE_DRIVERS set and ATTACH_TYPE_BY_DRIVER map into one map in federation.ts whose keys are the membership rule. Replace FederatedMember.config (read only via a type-erasing cast) with a typed url field extracted at derive time. Emit INSTALL/LOAD once per distinct driver type instead of once per member. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(duckdb): close federated DuckDB instance on connect failure; dedup id validation Wrap the federated DuckDB instance in its own try/finally so a failing connect() or a throwing connection.closeSync() no longer leaks the native instance. 
Route setup-sources connection-id validation through the canonical assertSafeConnectionId so the reserved _ktx_ prefix guard applies there too. Derive the federated dialect through sqlAnalysisDialectForDriver instead of a hardcoded literal. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(federation): carry member connection config and projectDir on FederatedMember Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(federation): resolve per-member attach targets via canonical connector resolvers Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(federation): quote mysql attach-string values like postgres Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(federation): resolve member attach targets via canonical resolvers, supporting sqlite path: Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(federation): thread projectDir through deriveFederatedConnection callers Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(federation): add shared project read-only SQL executor that routes _ktx_federated Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(federation): exercise shared executor default federated path with real DuckDB Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(federation): route ingest query executor through shared executor Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(federation): route MCP sql_execution _ktx_federated through shared executor Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(federation): preserve cross-DB joins to federated siblings in manifest re-emit Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(federation): preserve declared cross-DB joins through scan re-ingest Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * 
refactor(federation): document sibling-ref invariant, drop unsafe casts in test Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(federation): namespace federated source names by member to avoid collisions Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(federation): document member-namespaced federated source names Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(federation): preserve member SSL/search_path in attach, classify federated MCP errors Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(federation): simplify federated dispatch and parallelize sibling reads Dedup the federated driver ternary in local-query, derive the prefixed source.name from the already-built name, drop the duplicated error in federatedAttachTarget's exhaustive switch, inline the one-line cleanupConnector wrapper, and parallelize federatedSiblingTargets' shard reads (was sequential await-in-for on the scan hot path). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(federation): carry headerTypes through shared SQL executor Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(federation): add shared federated connection listing builder Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(federation): route ktx sql through shared executor for _ktx_federated parity Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(federation): show _ktx_federated in ktx connection list Surfaces the virtual federated connection in the output of `ktx connection list` so agents and users can discover cross-database querying when 2+ attach-compatible connections are configured. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(federation): surface _ktx_federated in MCP connection_list Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(federation): ktx sql federated cross-file join end-to-end Drive runKtxSql with the real federated DuckDB executor against two on-disk sqlite files, stubbing only SQL validation. The test surfaced that the JSON output path could not serialize bigint values DuckDB returns for integer columns; printJson now coerces bigint to JSON numbers, matching the plain/pretty paths. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(federation): document direct _ktx_federated query surface Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(federation): coerce DuckDB bigint to number in shared federated executor DuckDB returns integer columns as JS bigint, which JSON.stringify cannot serialize. The CLI --json path worked around this with a replacer, but the MCP sql_execution tool serializes via plain JSON.stringify and crashed on any federated query selecting an integer column. Coerce bigint to Number once in executeFederatedQuery so every consumer (CLI, MCP, ingest, SL) gets a JSON-safe result, and remove the now-redundant CLI replacer. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(federation): simplify driver map and collapse forked MCP SQL path - Replace the identity-valued ATTACH_TYPE_BY_DRIVER record with a ATTACH_COMPATIBLE_DRIVERS Set; the driver name doubles as the attach type, so the map encoded nothing beyond membership. - Switch federatedAttachTarget directly on the driver with a default throw, dropping the unreachable post-switch throw and its comment. 
- Route the MCP sql_execution standard-connection case through the shared executeProjectReadOnlySql instead of reimplementing the connector create/capability-check/execute/cleanup ceremony, so federated and standard connections share one execution path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore(federation): allowlist placeholder credentials for detect-secrets The federation doc example URL and the federated-attach test fixtures use literal placeholder credentials that trip detect-secrets. Mark them with line-scoped pragma allowlist comments so a real secret added later is still caught. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(federation): correct SL addressing, join pruning, and id-quoting guidance - Federated SL list/search records carry the virtual `_ktx_federated` connection id (member origin stays in the prefixed source name), so rows round-trip to `ktx sl -c _ktx_federated read` and the fts index no longer clobbers per-connection partitions. - Prune semantic-layer joins by membership in the connection's own source set instead of matching the target's first dotted segment against other connection ids; a same-connection join whose target name collides with a sibling connection id is preserved, and orphan targets that would poison the planner are dropped. - Document double-quoting for connection ids that are not bare SQL identifiers (e.g. "books-db".public.books) in the federated naming hint, the sl-query rejection error, and the federation docs. - Preserve exact federated BIGINT values beyond 2^53 as strings instead of rounding, and steer the setup federation notice to raw SQL against `_ktx_federated`. * fix(federation): carry ssl:true into postgres URL attach target A postgres member configured with `url` plus `ssl: true` resolved to both a connectionString and an ssl flag, but the federated attach builder early-returned the bare URL and dropped the ssl intent. 
DuckDB then handed libpq a URL with no sslmode, so the URL path silently diverged from the discrete-field path (which emits sslmode=require) and from the direct scan path (which enforces TLS). Append sslmode=require to the URL when the member sets ssl, unless the URL already pins a stronger sslmode. --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Andrey Avtomonov <andreybavt@gmail.com>	2026-06-15 15:01:39 +00:00
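The bigint handling described in the commit above (coerce DuckDB `bigint` values so that `JSON.stringify` does not throw, while keeping values beyond 2^53 exact as strings) could look roughly like the following. This is a minimal sketch: `toJsonSafe` and `coerceRow` are hypothetical names for illustration, not the functions inside `executeFederatedQuery`.

```typescript
// Hypothetical helpers: names and shapes are assumptions, not ktx's actual code.
const MAX_SAFE = BigInt(Number.MAX_SAFE_INTEGER);
const MIN_SAFE = BigInt(Number.MIN_SAFE_INTEGER);

// Convert a single cell value into something JSON.stringify can serialize.
function toJsonSafe(value: unknown): unknown {
  if (typeof value === "bigint") {
    // Safe-range integers become plain numbers; anything larger stays exact as a string.
    return value >= MIN_SAFE && value <= MAX_SAFE ? Number(value) : value.toString();
  }
  return value;
}

// Apply the coercion to every column of a result row.
function coerceRow(row: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(row)) {
    out[key] = toJsonSafe(row[key]);
  }
  return out;
}

const row = coerceRow({ id: BigInt(42), big: BigInt("9007199254740993"), title: "Dune" });
const json = JSON.stringify(row);
console.log(json); // {"id":42,"big":"9007199254740993","title":"Dune"}
```

Doing the coercion once at the executor boundary, as the commit message describes, means every consumer receives the same JSON-safe rows instead of each output path carrying its own replacer.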
Andrey Avtomonov	663eaff940	feat(cli): setup progress spinners, Tab-to-select, and banner polish (#296 ) * fix(cli): double the height of the setup banner t crossbar * fix(cli): unify setup multi-select hints and make Tab the select key The six interactive multi-select surfaces in `ktx setup` documented three different hint voices, one had no hint at all, and they named two different select keys (Space vs Tab). Tab is the only key that can toggle selection without colliding with type-to-search input, so make it the single documented select key everywhere and compose every hint from one shared fragment vocabulary in prompt-navigation.ts. - Register `updateSettings({ aliases: { tab: 'space' } })` so Tab toggles flat multiselects; the alias applies only to non-text prompts, leaving typed search input (schema/Notion) untouched. - Add the missing hint to the agent-targets prompt and drop the stray "Space to select … Esc …" info line plus the now-dead writeSetupInfo helper. - Replace the schema-scope ad-hoc hint with the searchable-multiselect voice and standardize "filter" -> "search" vocabulary. - Delete DEFAULT_TREE_PICKER_HELP_TEXT and the unused TreePickerChrome.helpText seam; render the shared tree hint instead. * refactor(cli): show LLM check progress for every setup backend Rename runLlmHealthCheckWithProgress to validateModelWithProgress and wrap the Claude subscription and Codex auth probes in the same spinner progress as the Anthropic API and Vertex backends, so each backend shows consistent "Checking <provider> LLM" output during setup. * feat(cli): add ktx-orange progress spinners to setup steps Add a shared runWithCliSpinner helper and a TTY-aware createCliSpinner: an animated clack spinner in a terminal, and a static stderr-only spinner before raw-mode pickers (the table tree picker and demo tour), where the animated spinner's stdin grab would otherwise corrupt the next prompt. 
Wrap the slow setup waits in progress spinners: managed runtime install, embedding daemon start + first-run model download, embeddings health check, the connection-test gate, and source validation / dbt clone / Metabase discovery. Recolor every spinner frame from clack's magenta to the ktx mascot orange (#FF8A4C) via the static helper and clack's styleFrame option.	2026-06-12 16:43:10 +02:00
Andrey Avtomonov	00cdf2de90	refactor: enforce ktx naming and AGENTS.md compliance sweep (#289 ) Align the tree with AGENTS.md/CLAUDE.md conventions: - Rewrite user-facing strings, docs, and tests to lowercase `ktx` (no bare uppercase `KTX` tokens remain outside literal identifiers). - Drop the legacy `historicSql` migration path and its now-unused helpers, per the no-backward-compat rule. - Remove `as unknown as` / `any` casts: narrow `BaseTool` generics to `z.ZodObject`, add a typed `createLookerClient`, and delete the dead `getParametersSchema`/`toAnthropicFormat` pre-AI-SDK helpers. - Use `InvalidArgumentError` for Commander parse failures. - Finish the adapter→connector prose conversion in the `ktx.yaml` docs while keeping the literal `adapters` config key.	2026-06-11 13:49:45 +02:00
Andrey Avtomonov	c2beaf7d55	feat(setup): wizard prompt tweaks and quieter query-history filter output (#259 ) Setup wizard flow tweaks: - Add a reveal-tail password prompt (reveal-password-prompt.ts) that unmasks the last few characters of a typed/pasted secret, and wire it into the setup prompt adapter in place of clack's password(); adds the @clack/core dep. - Reorder wizard select options: surface "Paste a key" before the environment-variable option across embeddings/models/sources, promote Metabase/Notion in the source list, put Git URL before Local path, reorder the Notion crawl-mode choices, and relabel the sources "Done" action. Query-history filter picker output: - Collapse the per-template parse-failure lines into a single count in the setup output and route the full template-id list to --debug stderr. - Model parse failures as a structured parseFailedTemplateIds field instead of warning strings. - Add a privacy-safe query_history_filter_completed telemetry event (counts/enums only), mirrored into the Python daemon schema.	2026-06-04 14:11:08 +02:00
Andrey Avtomonov	ce1516b357	feat(cli): consistent connection setup recovery and build-time gate (#257 ) * feat(cli): block context build when a required connection fails its live test A context build can take several minutes, so a connection that is unreachable or misconfigured should stop the build up front instead of failing partway through. Before the build starts, run a live connection test for every primary- and context-source connection the build depends on. Each test's output is captured in a discarded buffer so raw error text (and database paths) never reach the user; failures are surfaced only by connection id and connector type, with a pointer to `ktx connection test <id>` for the underlying error. - Interactive setup lets the user fix the connection and retry without restarting, re-resolving targets so an added/removed/reconfigured connection is honored. - `--no-input` exits non-zero and writes a failed context state with a failureReason, so scripts stop early and setup never reads as ready. Extract the buffered command IO helper out of setup-databases into src/io/buffered-command-io.ts so both call sites share one implementation. * feat(cli): use recovery primitive for database setup * feat(cli): use recovery primitive for source setup * docs: document setup connection recovery * fix(cli): close database recovery gaps * fix(cli): target failing project in gate hint and preserve missing-input Address two review findings on the connection-recovery work: - The connection-gate failure hint emitted `ktx connection test <id>` with no --project-dir, so a setup run started with `--project-dir ./analytics` pointed users at cwd/KTX_PROJECT_DIR instead of the project that just failed. Emit the resolved project dir, matching the contextBuildCommands convention. - The non-interactive database configure path returned `cancelled`, which the recovery primitive collapses to `failed`. 
Sibling paths still report `missing-input` for absent flags, so incomplete-flag runs were indistinguishable from real connection failures. The database wrapper now tracks the configure missing-input signal and restores the `missing-input` step status; the shared primitive keeps its four outcomes.	2026-06-03 11:08:46 +00:00
Andrey Avtomonov	637891f030	fix(cli): align Notion setup credential to --source-auth-token-ref (#236 ) Notion's setup path read --source-api-key-ref while writing the auth_token_ref config field, so --source-auth-token-ref was silently dropped. Align Notion to the flag=field convention every other connector follows: it now reads --source-auth-token-ref, and --source-api-key-ref becomes Metabase-only. Also add validation rejecting any credential-ref flag not applicable to the chosen --source, with a pointer to the correct flag, closing the silent-drop class for all connectors. Update CLI-reference docs, the ktx skill Notion example, and tests. Fixes KLO-724.	2026-05-29 17:23:46 +02:00
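The validation added in the commit above (reject a credential-ref flag that does not apply to the chosen `--source`, and point at the correct one) can be sketched as a lookup table. Only the two flag names and the Notion/Metabase pairing come from the commit message; the table, the function name `assertCredentialFlagApplies`, and the error wording are assumptions made for illustration.

```typescript
// Hypothetical table: only the Notion and Metabase pairings are stated in the commit message.
const CREDENTIAL_FLAG_BY_SOURCE: Record<string, string> = {
  notion: "--source-auth-token-ref",
  metabase: "--source-api-key-ref",
};

// Throw when a credential-ref flag is passed for a source that expects a different one.
function assertCredentialFlagApplies(source: string, flag: string): void {
  const expected = CREDENTIAL_FLAG_BY_SOURCE[source];
  if (expected !== undefined && flag !== expected) {
    throw new Error(`${flag} does not apply to --source ${source}; use ${expected} instead`);
  }
}

// Accepted: Notion reads --source-auth-token-ref.
assertCredentialFlagApplies("notion", "--source-auth-token-ref");

// Rejected: --source-api-key-ref is Metabase-only.
let rejection = "";
try {
  assertCredentialFlagApplies("notion", "--source-api-key-ref");
} catch (error) {
  rejection = (error as Error).message;
}
console.log(rejection);
```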
Andrey Avtomonov	56985b7e09	test: split cli tests from source tree (#216 ) * feat(cli): define full warehouse dialect contract * test(cli): keep dialect edge tests focused * fix(cli): stabilize dialect contract foundation * refactor(connectors): own read-only query preparation * refactor(connectors): resolve dialects through registry * refactor(connectors): keep concrete dialect classes internal * chore(workspace): enforce dialect import boundary * refactor(cli): resolve relationship dialect at scan boundary * refactor(cli): use dialect display parsing for entity details * refactor(cli): use dialect display parsing for warehouse catalog * refactor(cli): use dialect SQL in relationship workflows * test(cli): verify solid dialect scan workflow closure * test: split cli tests from source tree * refactor(cli): standardize BigQuery scope listing * feat(sqlite): implement connector scope listing * test(connectors): cover required table listing * feat(cli): add warehouse driver registry * refactor(setup): route scope discovery through driver registry * refactor(cli): route local query execution through driver registry * refactor(historic-sql): route dialect support through driver registry * refactor(cli): test warehouse connections through driver registry * fix(cli): close driver registry type export gaps * Improve setup daemon diagnostics * refactor(setup): centralize rail-prefixed diagnostics + query-history fallback Extract errorMessage, writePrefixedLines, and flushPrefixedBufferedCommandOutput into clack.ts so the setup wizard, managed daemons, and embedding/agent steps share one rail-formatted writer. setup-databases.ts also adds a "disable query history and retry" option when the schema-context build fails and query history is the likely culprit, surfaced via a new failed-query-history-unavailable status. 
* fix(cli): carry catalog through the picker so BigQuery/Snowflake/SQL Server scope filters match The setup picker's KtxTableListEntry was a 2-level { schema, name }, so qualifiedTableId always wrote db.name into enabled_tables. When BigQuery, Snowflake, or SQL Server later ran fast ingest, their introspect step filtered the scope set with scopedTableNames(scope, { catalog: projectId|database, db }); catalog was non-null on the introspect side but null in the scope refs, so every entry was rejected, the live-database adapter staged zero table files, and detect() failed with 'Adapter "live-database" did not recognize fetched source output'. Align the picker boundary with the canonical 3-level KtxTableRef: - Add catalog: string | null to KtxTableListEntry. - BigQuery/Snowflake/SQL Server listTables populate catalog from the resolved projectId / database; Postgres/MySQL/ClickHouse/SQLite set null. - qualifiedTableId emits catalog.schema.name when catalog is non-null (resolveEnabledTables already accepts the 3-part shape) and schemasFromEnabledTables now goes through parseDottedTableEntry so it recovers the schema correctly from both 2-part and 3-part entries. - Export parseDottedTableEntry from enabled-tables.ts (@internal) for picker reuse. Update listTables expectations in all seven connector tests and the setup / picker test fixtures. Add a picker regression test that covers the catalog-bearing round-trip (save + refine). * fix(cli): allow debug telemetry under opt-out env	2026-05-26 08:49:05 +02:00
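The catalog round-trip described in the commit above can be illustrated with a small sketch. The names `KtxTableListEntry`, `qualifiedTableId`, and `parseDottedTableEntry` come from the commit message, but the bodies below are assumptions written for illustration; in particular, the naive split on `.` ignores quoted identifiers that contain dots.

```typescript
// Shape taken from the commit message; the function bodies are illustrative assumptions.
interface KtxTableListEntry {
  catalog: string | null;
  schema: string;
  name: string;
}

// Emit catalog.schema.name when a catalog is present, otherwise schema.name.
function qualifiedTableId(entry: KtxTableListEntry): string {
  return entry.catalog !== null
    ? `${entry.catalog}.${entry.schema}.${entry.name}`
    : `${entry.schema}.${entry.name}`;
}

// Recover the parts from either a 2-part or a 3-part dotted entry.
function parseDottedTableEntry(id: string): KtxTableListEntry {
  const parts = id.split(".");
  if (parts.length === 3) {
    return { catalog: parts[0], schema: parts[1], name: parts[2] };
  }
  if (parts.length === 2) {
    return { catalog: null, schema: parts[0], name: parts[1] };
  }
  throw new Error(`expected schema.name or catalog.schema.name, got: ${id}`);
}

const threePart = qualifiedTableId({ catalog: "my-project", schema: "sales", name: "orders" });
const twoPart = qualifiedTableId({ catalog: null, schema: "public", name: "books" });
console.log(threePart, parseDottedTableEntry(threePart).schema); // my-project.sales.orders sales
console.log(twoPart, parseDottedTableEntry(twoPart).schema); // public.books public
```

With both sides agreeing on the 3-part shape, a scope entry saved by the picker compares equal to what the introspect step produces for catalog-bearing warehouses.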
Andrey Avtomonov	b0dd13ce7c	feat(telemetry): anonymous posthog usage telemetry across node cli and python daemon (#205 ) * feat: add telemetry phase 1 * feat: add node telemetry event catalog * feat: add telemetry event helpers * feat: emit setup and connection telemetry * feat: emit connection and stack telemetry * feat: emit ingest and scan telemetry * feat: emit query telemetry * feat: emit sampled mcp telemetry * docs: expand telemetry event catalog * feat: add telemetry schema sync artifact * feat: pass telemetry project id to semantic daemon * feat: add daemon telemetry foundation * feat: emit semantic daemon telemetry * feat: emit daemon lifecycle telemetry * docs: document full telemetry event catalog * feat(telemetry): dim first-run notice * feat(telemetry): show first-run notice before command output * feat(telemetry): wire ktx PostHog project for live ingestion * docs(telemetry): drop posthog project name and host from storage section * docs(telemetry): trim to general overview and disclaimer * docs(agents): add short telemetry guidelines * feat(telemetry): enable posthog geoip enrichment * docs(telemetry): drop ip-geoip note from public overview * refactor(telemetry): drop no-op groupIdentify, rely on capture groups field * fix(telemetry): respect CI kill switch in python daemon identity * fix(sql): route table-count analysis to existing analyze-batch endpoint * fix(telemetry): emit install_first_run from notice path and derive flagsPresent from commander * fix(telemetry): read package info via getKtxCliPackageInfo to satisfy boundary check * fix(telemetry): make python identity env={} bypass os.environ and unset CI in tests * fix(telemetry): unset CI kill switch in cli-program-telemetry tests	2026-05-22 18:18:47 +02:00
Andrey Avtomonov	c87d14a554	feat(cli): redesign database scope picker for searchable schema-first setup (#203 ) * feat: add searchable setup prompt pickers * fix: make snowflake scope discovery single query * fix: make bigquery table discovery schema scoped * fix: honor mysql and clickhouse database scope * feat: wire schema scope discovery for all relational setup drivers * feat: add schema-first database scope picker * test: update setup prompt stubs for type-check * docs: document database scope picker fields * Fix database setup edit preservation --------- Co-authored-by: Andrey Avtomonov <7889985+andreybavt@users.noreply.github.com>	2026-05-22 14:22:11 +02:00
Andrey Avtomonov	2366b00301	chore(workspace): gate dead-code with knip production mode (#196 ) * refactor(workspace): relocate @ktx/llm source into packages/cli/src/llm * refactor(workspace): rewrite @ktx/llm imports to relative paths * refactor(workspace): fold internal packages into cli * chore(workspace): gate dead-code with knip production mode Turn on production-mode knip plus an autofix run in pre-commit and the `pnpm dead-code` script, document the `/** @internal */` convention for test-only exports in AGENTS.md, annotate test-only exports across the CLI with that JSDoc, and drop dead exports/wrappers the new gate surfaced (e.g. `cli-project.ts`, `lookerRuntimeSourceToFileAdapterSource`, `createLocalScanEnrichmentProvidersFromConfig`, `PGLITE_OWNER_PROCESS_BACKEND_CAPABILITIES`, stale type re-exports). Replace the loose `ignoreIssues` allowlist in `knip.json` with explicit production entries so cross-package barrel leaks are caught. * refactor(cli): delete internal barrel index.ts files The 34 `index.ts` re-export barrels inside `packages/cli/src/` were holdovers from the pre-fold multi-workspace structure. Post-fold-in they served no production purpose: external consumers go through the single package main entry, and in-repo callers mostly imported through them only because the path was short. Internally, knip flagged most barrel re-exports as production-dead (only reached via tests). This change: - Deletes every internal barrel except `packages/cli/src/index.ts` (the published package entry). - Rewrites ~270 source/test files to import each name directly from the file that defines it. - Moves `tools/warehouse-verification/index.ts` to `create-warehouse-verification-tools.ts` (the function it defined locally) and updates its single consumer. - Renames `search/backend-conformance.ts` → `.test-utils.ts` to match the existing test-helper file convention.
- Deletes 13 dead test-only chains (dbt-descriptions/, live-database/extracted-schema, live-database/structural-sync, relationship-feedback/review chain) plus their tests and a cascading orphan integration test. - Updates test mocks that pointed at deleted barrel paths (notion-client, connector barrels in scan/local-scan-connectors tests) to mock the source files instead. - Points the maintainer benchmark script (`scripts/relationship-benchmark-report.mjs`) at source files instead of `dist/context/scan/index.js`. - Drops the barrel `!` entries from `knip.json`; adds explicit production entries only for the benchmark code reached via dist by the maintainer script. Net: 413 files changed, ~1.2k insertions, ~9.4k deletions. `pnpm run dead-code` (Biome + knip default + knip production) and `pnpm run type-check` are clean; 2277 tests pass. * refactor(workspace): rename @ktx/cli to @kaelio/ktx and pack it directly Promote the CLI workspace package to the public name `@kaelio/ktx` and drop the separate `scripts/build-public-npm-package.mjs` wrapper. The CLI package is now publishable in place (`publishConfig.access: public`, `provenance: true`), so artifact packing uses `pnpm pack` against `packages/cli/` instead of assembling a parallel package tree. Updates all workspace filter invocations, docs, tests, and release readiness checks to reference the new package name, and folds the tarball-name helper into `scripts/public-npm-release-metadata.mjs`. * docs: align "agent clients" and "data agents" terminology Replace "client agents" with "agent clients" and "database agents" with "data agents" across AGENTS.md, README.md, the docs-site copy, and the matching setup-agents test description, matching the canonical vocabulary in docs/terminology.md. Also moves packages/cli/tsconfig.json's tsBuildInfoFile from node_modules/.cache/ to dist/.tsbuildinfo so incremental builds survive node_modules reinstalls.
* refactor(release): single source of truth for package version Make packages/cli/package.json the single source of truth for the @kaelio/ktx version. publicNpmPackageVersion() now reads it directly, so artifact filenames, release-readiness checks, and the Python wheel version all derive from one field. The duplicate release-policy.json.publicNpmPackageVersion is removed. Previously the two fields could drift: tarballs were named kaelio-ktx-0.4.1.tgz while internally containing @kaelio/ktx@0.0.0-private. - update-public-release-version.mjs rewrites both Python pyproject.toml files (ktx-daemon, ktx-sl) alongside the npm package.jsons, normalizing the version for PEP 440 (e.g. 0.1.0-rc.2 -> 0.1.0rc2). - semantic-release-config.cjs adds the two pyproject.toml files to @semantic-release/git assets so the release commit back to main carries every version source in lockstep. - The six "?? '0.0.0-private'" fallback literals across the CLI are replaced with "?? getKtxCliPackageInfo().version", and createDefaultKtxMcpServer makes its version arg required. - docs/release.md describes the actual commit-back model: the dev tree always reflects the most recent release; no sentinel pin to maintain. Verified: pnpm run artifacts:build now produces kaelio-ktx-0.4.1.tgz and kaelio_ktx-0.4.1-py3-none-any.whl with @kaelio/ktx@0.4.1 inside. Full type-check, dead-code, and 2287 vitests + 173 script tests pass. * refactor(cli): inject embedding provider resolution and detect sentence-transformers runtime Make resolveProjectEmbeddingProvider and runtimeIo injectable in ingest and scan command entrypoints so tests can stub them, and teach resolvePublicIngestRuntimeRequirements to flag the local-embeddings runtime feature when ktx.yaml selects sentence-transformers. * chore(cli): mark buildLocalStatsStatus and LocalStatsStatus as @internal Both symbols are consumed only by status-project.test.ts. 
Annotating with /** @internal */ keeps knip's production-mode check clean without changing runtime behavior. * fix(cli): use real package metadata in print-command-tree The stubbed package name embedded a forbidden product identifier that tripped the boundary check in CI. Read the metadata from package.json instead, which keeps the rendered tree unchanged and removes a duplicate source of truth. * feat(cli): show embedding coverage in `ktx status`, drop duplicate disk counts Inline `(N embedded)` next to the Wiki scope counts and Semantic-layer source counts, computed with `SUM(embedding_json IS NOT NULL)` over `knowledge_pages` and `local_sl_sources`. Rename the "Knowledge" label to "Wiki" (canonical per `docs/terminology.md`) and rename the matching `localStats.knowledgePages` field to `localStats.wikiPages`. Drop `wiki=N md` and `semantic-layer=N yaml` from the Disk row, since those duplicated the per-surface rows above. Disk now reports only actual byte usage (db, cache, raw-sources). The unused `wikiGlobalMarkdownCount` / `semanticLayerYamlCount` fields, the `isMarkdownEntry` / `isYamlEntry` helpers, and the `filter` arg on `summarizeDir` are removed.	2026-05-21 15:28:58 +02:00
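The PEP 440 normalization mentioned in the commit above (for example `0.1.0-rc.2` -> `0.1.0rc2`) can be sketched as a small mapping from a semver pre-release tag to the PEP 440 pre-release segment. The function name `toPep440` is hypothetical, and this sketch only handles plain releases plus `alpha`, `beta`, and `rc` tags; what `update-public-release-version.mjs` actually supports is not stated in the log.

```typescript
// Hypothetical helper: covers only MAJOR.MINOR.PATCH with an optional alpha/beta/rc tag.
function toPep440(version: string): string {
  const match = /^(\d+\.\d+\.\d+)(?:-(alpha|beta|rc)\.(\d+))?$/.exec(version);
  if (match === null) {
    throw new Error(`unsupported version: ${version}`);
  }
  const release = match[1];
  const label = match[2];
  const num = match[3];
  if (label === undefined) {
    return release; // plain release: already valid under PEP 440
  }
  // PEP 440 spells pre-release segments as a, b, and rc, with no separators.
  const short = label === "alpha" ? "a" : label === "beta" ? "b" : "rc";
  return `${release}${short}${num}`;
}

console.log(toPep440("0.1.0-rc.2")); // 0.1.0rc2
console.log(toPep440("0.4.1")); // 0.4.1
```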
Andrey Avtomonov	d1c84e5564	fix: improve setup wizard behavior (#127 ) * fix: improve setup wizard behavior * fix: derive runtime versions from release metadata * test: validate metabase source mapping requirements * Fix boundary check release identifiers	2026-05-17 19:15:09 +02:00
Andrey Avtomonov	f8db99811a	feat(context): add driver-discriminated connection schemas (#96 ) * refactor(context): export and describe mapping shape schemas * feat(context): add driver-schemas module with warehouse drivers * feat(context): add metabase, looker, lookml driver schemas with mappings * feat(context): add notion, dbt, metricflow driver schemas * refactor(context): make connectionSchema a driver-discriminated union * chore(context): re-export KtxConnectionConfig from project package * docs(context): add connection driver schema plan * chore(secrets): allowlist example credentials in driver-schemas fixtures * test(cli): align metabase fixtures with required api_url field The driver-discriminated union added in this branch now requires api_url for metabase connections and a known driver for warehouses. Update slow CLI test fixtures and assertions so they exercise the new schema: - ingest.test-utils.ts: add api_url to the prod-metabase fixture. - setup.test.ts: switch metabase fixture from 'url' to 'api_url'. - local-scan-connectors.test.ts: invalid-driver/missing-driver tests now expect the schema error from loadKtxProject (parse-time rejection).	2026-05-15 00:08:11 +02:00
Andrey Avtomonov	52dd89481c	fix(cli): keep ktx setup alive when a dbt git clone fails (#88 ) Wraps the validation clone in defaultValidateDbt so auth or network failures surface as a clean validation error instead of an unhandled RepoFetchError that exits the wizard. Verifies pasted tokens with testGitRepo before saving them as a secret so bad tokens are caught at paste time. In interactive setup, validation failures now bounce the user back to source selection (with a "Edit the connection or pick a different source" hint) instead of killing the process; --source flag mode still exits with failed as before.	2026-05-14 14:39:50 +02:00
Andrey Avtomonov	b00c1a11a9	feat: merge ingest and scan * docs: add CLI component reuse guidance * docs: add unified ingest ux design * Refine unified ingest UX design after adversarial review iteration 1 * Refine unified ingest UX design after adversarial review iteration 2 * Refine unified ingest UX design after adversarial review iteration 3 * feat(cli): route public connection ingest command * feat(cli): hide standalone scan from public help * feat(cli): plan public ingest depth and query history * feat(cli): execute public database ingest facets * feat(ingest): read connection query history config * fix(cli): use public ingest wording * fix(config): stop generating ingest adapter allow lists * docs: document public ingest command * test: align ingest surface expectations * docs: add unified ingest public CLI surface plan * feat(cli): preflight deep public ingest readiness * feat(setup): store query history in connection context * feat(setup): store database context depth * feat(setup): verify context readiness by database depth * fix(setup): keep context build foreground only * fix(config): reject reserved ingest connection ids * test: close unified ingest v1 expectations * docs: add unified ingest v1 closure plan * fix(ingest): bypass adapter allow-list for public source ingest * fix(ingest): honor query history window intent * fix(ingest): hide scan internals from public database ingest * feat(ingest): use foreground view for interactive public ingest * fix(setup): use schema context and query history wording * test(cli): verify unified ingest public output * docs: add unified ingest v1 public output closure plan * fix(setup): forward query history flags * fix(setup): prompt for postgres query history * fix(status): report query history readiness * fix(ingest): remove legacy public guidance * fix(ingest): polish foreground retry copy * docs(examples): use unified query history wording * chore(ingest): finish public query history cleanup * docs: add unified 
ingest v1 query history status cleanup plan * test(docs): cover unified ingest public docs * docs: align ingest CLI reference with unified UX * docs: update context build guides for unified ingest * docs: update setup and primary source ingest wording * docs: stop advertising adapter-backed example ingest * docs: close unified ingest public docs gaps * docs: add unified ingest v1 docs site closure plan * fix: render unified ingest foreground warnings * fix: explain query history schema order * fix: add public ingest retry guidance * fix: align setup next steps with unified ingest * fix: remove scan wording from demo progress * test: verify unified ingest ux closure * docs: add unified ingest v1 foreground and retry closure plan * fix(cli): preserve query-history pull config in public ingest * fix(cli): omit hidden commands from docs command tree * test(cli): close unified ingest final public surface checks * docs: add unified ingest v1 final public surface closure plan * fix(cli): use public source labels in ingest reports * fix(cli): suppress low-level public ingest output * test(cli): verify unified ingest public plain output * docs: add unified ingest v1 public plain output closure plan * fix(cli): add public ingest copy sanitizers * fix(cli): sanitize public ingest progress copy * fix(cli): rename setup schema scope prompt * docs(plan): add progress copy closure; test: align setup back-nav fixture Adds the iter9 plan and updates the setup back-navigation test fixture to pass disableQueryHistory plus listSchemas/listTables stubs that the unified ingest setup step now requires. 
* docs(plan): add final ux labels plan with narrowed label scans * fix(cli): aggregate unsupported query-history warnings * fix(cli): align setup database labels * test(cli): fix setup database test type-check * fix(cli): remove primary-source wording from setup output * test(cli): verify unified ingest setup closure * docs(plan): add unified ingest v1 verification copy closure plan * fix(cli): remove top-level scan command * fix(cli): remove legacy ingest and wiki commands * Merge scan into ingest flow * feat(cli): split ingest progress into per-phase rows, rename work units to tasks Each database target in the unified ingest dashboard now renders one row per real subprocess (Schema, then Query history when enabled) instead of a single combined bar. Each phase has its own monotonic 0-100% bar so the progress never snaps back to zero when historic-sql starts after scan completes. Completed phases keep their final bar, summary, and elapsed time visible as an inline audit trail; queued and skipped phases are shown explicitly. Also rename user-facing "work units" / "Failed work units" to "tasks" / "Failed tasks" in ingest output and parseIngestSummary. The parser still accepts the legacy "Work units:" wording in captured output for backward compat. Internal memory-flow event names and type fields are left alone. * Fix test harness failures * Fix CI smoke checks --------- Co-authored-by: Andrey Avtomonov <7889985+andreybavt@users.noreply.github.com>	2026-05-14 01:43:06 +02:00
Andrey Avtomonov	1a472cf3ed	fix: clean up ktx yaml config parameters (#75 ) * fix: clean up ktx yaml config parameters * fix: align ci smoke checks with status output * test: update artifact smoke status assertion	2026-05-14 01:27:31 +02:00
Luca Martial	9ecb8cb119	feat(cli): add edit flow for setup connections (#77 ) * feat(cli): add edit flow for primary database connections in setup Allow users to edit existing primary database connections during setup instead of only adding new ones. Preselects existing values (URL, schemas, tables) so users can adjust without re-entering everything. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(cli): add edit flow for context source connections in setup Allow users to edit existing context source connections during setup. Preselects existing values (URLs, credentials, repo details) and offers a "Keep existing credential" option for sensitive fields. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(cli): rename "Add more" to "Add additional" in primary sources menu Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-13 17:22:59 -04:00
Andrey Avtomonov	754e4a9039	feat(cli): improve setup progress UX (#69 )	2026-05-13 17:01:48 +02:00
Andrey Avtomonov	d7147f9ca1	feat: rename project wiki directory (#66 ) * feat: rename project wiki directory * test: fix wiki skill ordering expectations * Show configured context sources in setup	2026-05-13 16:05:58 +02:00
Andrey Avtomonov	97da9919e9	refactor: remove legacy compatibility paths (#64 ) * refactor: remove legacy compatibility paths * fix: support legacy metabase native queries * test: use canonical semantic layer descriptions * Rename CLI description * Recover setup scan from SQLite ABI mismatch * Remove legacy product name from CLI help	2026-05-13 15:55:00 +02:00
Andrey Avtomonov	e1e9c4af91	fix(cli): clean up connection commands (#62 ) * fix(cli): clean up connection commands * test(cli): update connection smoke coverage * Fix setup output formatting * fix notion setup picker exit	2026-05-13 15:04:50 +02:00
Andrey Avtomonov	b75576279c	fix: store Metabase mappings in ktx.yaml (#61 ) * fix: store Metabase mappings in ktx.yaml * docs: note KTX has no public users * refactor: drop setup progress compatibility	2026-05-13 13:55:21 +02:00
Andrey Avtomonov	b9e0a746af	feat(cli): clean up dev command surface (#57 ) * feat(cli): clean up dev command surface * test: align CI expectations with CLI cleanup * test(cli): update slow test command expectations	2026-05-13 12:00:08 +02:00
Luca Martial	8ceb3bc7b9	Confirm skipped optional setup selections	2026-05-12 18:23:03 -07:00
Luca Martial	fcdf5234c6	Merge pull request #45 from Kaelio/luca/klo-654-improve-indents feat(cli): add box-drawing prefixes to setup messages	2026-05-12 19:58:55 -04:00
Luca Martial	07ac71ea7c	feat(cli): add box-drawing prefixes to remaining setup stdout messages Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-12 16:58:09 -07:00
Luca Martial	dbfee6b453	feat(cli): migrate all setup steps to use local state for completion tracking Update every setup step to write completed_steps to .ktx/setup/state.json instead of ktx.yaml, stripping legacy entries from config on write. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-12 16:26:23 -07:00
Luca Martial	60457e9407	Improve schema setup and Notion ingest UX (#14 ) * Improve schema setup and Notion ingest UX * Handle Postgres network scan failures * WIP: save local changes before main merge * Refine setup prompt choices * Tighten ingest reconciliation guidance * Commit setup config updates * Canonicalize unmapped fallback details * Count reconciliation actions in reports * Harden semantic layer source validation * Return wiki content after edits * Validate SL sources against manifests * Validate wiki refs before writes * Simplify CLI next steps * Clarify agent setup summary * Surface dbt target SL sources * Recover SL write fallbacks * Preserve failed context build metadata * Track raw paths for ingest actions * test(cli): update seeded demo expectations * fix(ingest): scope fallback recovery checks * fix(sl): tighten source validation guards * fix(wiki): ignore empty embedding vectors * Improve Notion ingest UX * Enforce flat wiki keys * test(context): update wiki key assertion --------- Co-authored-by: Andrey Avtomonov <andreybavt@gmail.com>	2026-05-12 22:56:58 +02:00
Luca Martial	c82989119b	Update setup and ingest flows	2026-05-10 23:13:17 -07:00
Luca Martial	440a07d0d2	Summarize connector mapping validation	2026-05-10 16:19:19 -07:00
Luca Martial	1b5a9fe120	Improve connector credential setup UX	2026-05-10 16:12:51 -07:00
Andrey Avtomonov	3ce510b55b	rename klo to ktx	2026-05-10 23:51:24 +02:00
Andrey Avtomonov	1a42152e6f	Initial open-source release	2026-05-10 23:12:26 +02:00

32 commits